Boot loader updates and power interruption


We have a fair number of units in the field. They are installed in cars and trucks and are designed to switch on a computer when the ignition is on and to send a shutdown command to the computer when the ignition turns off. The design assumed the MCU would stay powered for as long as the unit is installed; however, these units are now being installed in trucks that have emergency isolation switches that remove power.

 

As these units were designed with firmware updates in mind, a few of us are now worried that if power to the MCU is interrupted while the firmware is updating, the boot loader might be corrupted. A couple of the engineers have said that at low voltages the logic levels will be violated, along with setup and hold times, and you might corrupt a page of flash you didn't even intend to write to.

 

The MCU setup is as follows:

ATmega88PA

Powered by 3.3V

BOD set to 1.8V

Using internal 8MHz RC oscillator

Boot Lock bits not set

Boot loader section enabled

 

The datasheet gives some explanation of what happens during a reset, under the heading of Preventing Flash Corruption (26.2.3) of the ATmega48A/PA/88A/PA/168A/PA/328/P datasheet

"If a reset occurs while a write operation is in progress, the write operation will be completed provided that the power supply voltage is sufficient."

 

Does this statement guarantee that if the power interruption happens in the middle of an erase or program operation, that operation is immediately terminated to protect against corruption? An engineer here believes this is a bit ambiguous.

 

We can deal with corruption in the application section, i.e. a failed write: the boot loader stores the CRC, checks it on startup, and enters a recovery mode if the check fails.
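For readers following along, the startup check described above could look roughly like the sketch below. This is not the code from our units: the CRC variant (CRC-16/CCITT-FALSE), the function names and the plain-pointer access to the image are all assumptions for illustration. On a real AVR the image bytes would be read from program memory, not through an ordinary pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, no reflection.
 * The choice of CRC is an assumption; the thread does not say which one is used. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000) {
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* Hypothetical helper: true when the application image matches the CRC
 * that the boot loader stored after the last successful update. */
bool app_image_is_valid(const uint8_t *image, size_t len, uint16_t stored_crc)
{
    return crc16_ccitt(image, len) == stored_crc;
}
```

If `app_image_is_valid()` returns false, the boot loader would stay in recovery mode and not jump to the application.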

The boot loader does guard against writing over itself via a simple page comparison. The reason the boot lock bits weren't set was an early requirement that the boot loader could be upgraded, although this is no longer the case.

We can instruct the SW to only initiate firmware updates while the vehicle is in motion (but we still might run the risk, e.g. when GPS data is not available).

 

Just a few questions:

 

  • Are there any Microchip / ex-Atmel employees who can clarify whether switching off power in the middle of an SPM write can corrupt any page other than the one being written at the time of power loss?
  • If we set the BLB lock bits that control SPM access to the boot loader section (something that fortunately can be done in code), does this set up hardware gating within the AVR to prevent writes to the boot loader pages, or would it likely suffer the same fate?
  • Should we learn from this and add more bulk capacitance, so that the voltage stays higher for longer after the brown-out reset and the write can complete?

 

Thanks community


adam3141 wrote:

The datasheet gives some explanation of what happens during a reset, under the heading of Preventing Flash Corruption (26.2.3) of the ATmega48A/PA/88A/PA/168A/PA/328/P datasheet

"If a reset occurs while a write operation is in progress, the write operation will be completed provided that the power supply voltage is sufficient."

 

Does this statement guarantee that if the power interruption happens in the middle of an erase or program operation, that operation is immediately terminated to protect against corruption? An engineer here believes this is a bit ambiguous.

I would say the `provided that the power supply voltage is sufficient` clause applies, i.e. a supply that is enough to run the charge pumps is certainly more than enough to hold any CMOS levels.

 

adam3141 wrote:
If we set the BLB fuse bits that control SPM access to the Boot loader section (something that can be fortunately done in code) does this set up hardware gating within the AVR to prevent writes to the boot loader pages, or would it likely suffer the same fate.

On some Atmel MCUs the fuses only apply after a reset, so you might like to check that detail.
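As a rough illustration of setting the lock bit from code, here is a minimal sketch using the avr-libc `boot_lock_bits_set_safe()` macro from `<avr/boot.h>`. The function name `protect_boot_section_from_spm` is made up for this example, and the `#else` branch is only a host-side stand-in so the sketch can be compiled off-target. It says nothing about whether the protection takes effect immediately or only after a reset, which is exactly the detail to check.

```c
#include <stdint.h>

#if defined(__AVR__)
#include <avr/io.h>
#include <avr/boot.h>
#else
/* Host-side stand-ins (assumptions, for off-target compilation only).
 * Verify the BLB11 bit position against the lock bit byte table
 * in the datasheet before relying on it. */
#define BLB11 4
#define _BV(bit) (1u << (bit))
static uint8_t last_lock_mask;
#define boot_lock_bits_set_safe(mask) (last_lock_mask = (uint8_t)(mask))
#endif

/* Hypothetical helper: program BLB11 so that SPM is no longer allowed to
 * write to the boot loader section. A bit that is "set" in the mask is
 * written as 0 (programmed). Lock bits can only be cleared again by a
 * chip erase, so this is a one-way step in the field. */
void protect_boot_section_from_spm(void)
{
    boot_lock_bits_set_safe(_BV(BLB11));
}
```

On the target this would have to run from the boot loader section, since that is where SPM is allowed to execute.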

 

adam3141 wrote:

Powered by 3.3V

BOD set to 1.8V

...

Learn from this and add more bulk capacitance so that the voltages stays higher for longer after the brownout reset to ensure completion of the write

Maybe you can nudge the BOD setting up a little, to give more decay margin ?

 

You could test this, to be completely sure. E.g. we have fed a function generator into an LM317 to give a triangle-wave power supply that can vary between any min and max values, or you could use an MCU pin to drop Vcc with delays very close to a flash write.

Last Edited: Mon. May 20, 2019 - 04:50 AM

adam3141 wrote:
... The reason the boot lock bit wasn't set was some early requirement that the boot loader could be upgraded - although this is now not the case.

ATmega48A, ATmega48PA, ATmega88A, ATmega88PA, ATmega168A, ATmega168PA, ATmega328, ATmega328P datasheet

27. Boot Loader Support – Read-While-Write Self-Programming

...

27.8.11 Preventing Flash Corruption

...

Flash corruption can easily be avoided by following these design recommendations (one is sufficient):

1. If there is no need for a Boot Loader update in the system, program the Boot Loader Lock bits to prevent any Boot Loader software updates.

[2. BOD (internal or external), 3. LVD causes sleep]

...

via ATmega88PA - 8-bit AVR Microcontrollers

 

"Dare to be naïve." - Buckminster Fuller


Who-me wrote:
Maybe you can nudge the BOD setting up a little, to give more decay margin ?
And more margin by an external BOD (better precision and accuracy than the internal BOD)

AVR180: External Brown-out Protection

 

"Dare to be naïve." - Buckminster Fuller

Last Edited: Mon. May 20, 2019 - 02:19 PM

Thanks guys

 

I'm looking more towards the possibility of remedying this via a 'risky' code update, or gaining more confidence in the units that are already out there. Unfortunately, it is a very difficult and expensive process to go out and reprogram them with a higher BOD level. Modifying hardware is a no-go. Future hardware might indeed employ more bulk capacitance, along with a higher BOD level, to ensure it can complete a write with sufficient voltage.

 

Who-me: What worries the other engineers and me about the statement that the write operation will be completed provided the power supply voltage is sufficient is that there is no mention of a minimum write voltage for the flash cell. The BOD gives a minimum value of 1.7V, so if I start an erase/program cycle and at that moment the voltage drops to 1.7V, the BOD circuit will hold the device in reset. According to the statement, the write will continue "so long as the power supply voltage is sufficient", which can take 4.5ms. The voltage will continue to drop to the point where it gets into the logic danger zone. The amount of capacitance just will not hold up the MCU for 4.5ms.
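To put rough numbers on the hold-up problem, here is a minimal sketch. It assumes the MCU draws a constant current while the supply capacitance discharges, so C = I * t / (V_start - V_min). The 4.5ms comes from the paragraph above; the load current and the voltage window are hypothetical inputs, because no minimum flash write voltage is given.

```c
/* Capacitance (in farads) needed to keep the rail between v_start and v_min
 * for holdup_time_s while a constant load_current_a is drawn.
 * Constant-current discharge is a simplifying assumption.
 * Returns a negative value if the voltage window is not positive. */
double holdup_capacitance_farads(double load_current_a, double holdup_time_s,
                                 double v_start, double v_min)
{
    double delta_v = v_start - v_min;
    if (delta_v <= 0.0) {
        return -1.0;
    }
    return load_current_a * holdup_time_s / delta_v;
}
```

With made-up figures of 5mA, 4.5ms and a 3.3V to 1.8V window, `holdup_capacitance_farads(0.005, 0.0045, 3.3, 1.8)` comes out at about 15 µF. The window that actually matters, from the BOD trip point down to whatever the flash needs, is much narrower and is the unknown here.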

 

There is no question that there will be flash corruption, and I can deal with corruption in the application section. The question is: when we start to get into this danger zone, will the MCU terminate the write before the setup and hold times for the flash cell are violated? The worry is that a logic 0 might be misread as a logic 1 (or vice versa) in the flash page address, meaning my write to page 0x14 becomes a write to page 0x54, which might be in the boot loader, for instance. I would hope the hardware block in the AVR would terminate the write before it could get into this danger zone. I just am not sure, as the wording wasn't detailed enough. I prefer things to be quite explicit; maybe it is and I am being too pedantic.
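The 0x14 to 0x54 example is a single flipped address bit (0x14 ^ 0x40). A small sketch of how one could list which application pages sit one bit flip away from the boot section is below. The single-bit-flip model is purely an assumption about the failure mode, and the page count and boot start page are parameters to be taken from the datasheet and the BOOTSZ fuse setting, not facts about our units.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if flipping any single bit of the page number yields a page
 * at or above boot_start_page, i.e. inside the boot loader section.
 * page_addr_bits is the number of bits in a page number (7 for 128 pages). */
bool page_has_boot_alias(uint16_t page, uint8_t page_addr_bits,
                         uint16_t boot_start_page)
{
    for (uint8_t bit = 0; bit < page_addr_bits; bit++) {
        uint16_t alias = (uint16_t)(page ^ (1u << bit));
        if (alias >= boot_start_page) {
            return true;
        }
    }
    return false;
}
```

Under this model, one workaround idea would be to treat writes to pages with a boot alias as the riskiest ones, though that only helps if single-bit address errors are really how the failure would show up.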

 

gchapman, I did read that bit in the Atmel datasheet regarding the BLB bits and I am hoping this is the saving grace as this can be applied via a code update.


adam3141 wrote:

Who-me: What worries the other engineers and me about the statement that the write operation will be completed provided the power supply voltage is sufficient is that there is no mention of a minimum write voltage for the flash cell. The BOD gives a minimum value of 1.7V, so if I start an erase/program cycle and at that moment the voltage drops to 1.7V, the BOD circuit will hold the device in reset. According to the statement, the write will continue "so long as the power supply voltage is sufficient", which can take 4.5ms. The voltage will continue to drop to the point where it gets into the logic danger zone. The amount of capacitance just will not hold up the MCU for 4.5ms.

 

That was why I suggested raising the BOD as high as is practical, and others suggested adding an external reset sense.

 

adam3141 wrote:
The worry is a logic 1 might be mis-interpreted as a logic 0 for the actual flash page meaning my write to page 0x14 will be a write to page 0x54 which might be in the boot loader for instance. I would hope the hardware block in the AVR would terminate the write before it could get into this danger zone. I just am not sure as the wording wasn't detailed enough. I prefer things to be quite explicit - maybe it is and I am being too pedantic.
 

If you are at those levels of paranoia, you are best off testing some on the bench, trying to enter these unsafe areas.

E.g. you should target a moving write location and pattern, and also check all other locations for 'no change' after every write update.

Some chips have charge bleed effects, and others use a page write approach, where they read a row of flash cells, erase, and then write back the new data. A starve failure there could affect anything on the same page.
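For the 'no change' check, a host-side comparison of two flash read-outs is probably the easiest route. The sketch below is only an illustration: the function name, and the idea of dumping the flash before and after each interrupted write, are assumptions about how such a bench test might be set up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counts the pages, other than target_page, whose contents differ between
 * two flash images of num_pages * page_size bytes each. Any non-zero result
 * means the interrupted write disturbed a page it was not aimed at. */
size_t count_collateral_pages(const uint8_t *before, const uint8_t *after,
                              size_t num_pages, size_t page_size,
                              size_t target_page)
{
    size_t changed = 0;
    for (size_t p = 0; p < num_pages; p++) {
        if (p == target_page) {
            continue; /* the page being written is allowed to change */
        }
        if (memcmp(before + p * page_size, after + p * page_size,
                   page_size) != 0) {
            changed++;
        }
    }
    return changed;
}
```

Run over many power-drop timings and a moving target page, a result that stays at zero would give some confidence, though it cannot prove the absence of a rare failure.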


Who-me wrote:

That was why I suggested raising the BOD as high as is practical, and others suggested adding an external reset sense.

 

As I said in an earlier comment, this is a no-go. Fine for future hardware, not practical for current hardware.

 

adam3141 wrote:
The worry is a logic 1 might be mis-interpreted as a logic 0 for the actual flash page meaning my write to page 0x14 will be a write to page 0x54 which might be in the boot loader for instance. I would hope the hardware block in the AVR would terminate the write before it could get into this danger zone. I just am not sure as the wording wasn't detailed enough. I prefer things to be quite explicit - maybe it is and I am being too pedantic.
 

Who-me wrote:

If you are at those levels of paranoia, you are best off testing some on the bench, trying to enter these unsafe areas.

E.g. you should target a moving write location and pattern, and also check all other locations for 'no change' after every write update.

Some chips have charge bleed effects, and others use a page write approach, where they read a row of flash cells, erase, and then write back the new data. A starve failure there could affect anything on the same page.

 

Well, this level of paranoia comes from an engineer with ASIC development experience, and without a clear-cut explanation of the exact process we cannot just assume it won't corrupt the boot loader. That is why in one of my first comments I asked if there was an ex-Atmel or current Microchip employee who might be able to clarify that specific statement. I have heard that a few of them frequent this site.

 

It is a good idea to bench test it, don't get me wrong; we just don't have the time or manpower to do these things. Possibly we could try such a thing further down the line for future projects, although the humble AVR might give way to ARM chips for our future development.

 

We have workarounds which will mitigate this risk somewhat but would still like to be sure.

 

Thanks anyway

Last Edited: Tue. May 21, 2019 - 06:27 AM