Blank XMEGA returned by customer after in-field programming


We have a bootloader application that programs firmware changes to an update region.  Recently a customer returned a board after a failed attempt at programming.  They told me they were successful updating it, but after a few iterations they discovered the failure.  They had three separate boards fail with the same symptom out of 30 or so.

 

The Atmel was completely blank when they sent it back, even the bootloader region was blank (all bits were high) and the fuse settings had changed to execute from the application region after reset.  Only the EEPROM was not erased.  I was able to recover the board using an ICE3.

 

A cursory glance at the code shows soft protection that restricts writes to the update region only.  It's not my bootloader, so I am reviewing it to see what could have gone wrong.  Obviously we should at least be setting the lock bits to protect the bootloader and fuse regions, but I would like to know how it is even possible to erase the chip so completely, just so I have something to go on while reviewing the code.  It's an ATxmega32A4U.
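For anyone reviewing a similar bootloader, the "soft protection" mentioned above usually boils down to an address range check before every page write. A minimal sketch, assuming the update region is the 32 KB application section at byte addresses 0x0000 to 0x7FFF; the names `flash_region_t`, `UPDATE_REGION` and `write_allowed` are made up for illustration and are not from the actual bootloader:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical region descriptor: [start, end) in flash byte addresses. */
typedef struct {
    uint32_t start;
    uint32_t end;   /* one past the last writable byte */
} flash_region_t;

/* Assumed layout: 32 KB application section, boot section above it.
   Check both numbers against the datasheet and the linker settings. */
const flash_region_t UPDATE_REGION = { 0x0000u, 0x8000u };

/* True only if the whole range [addr, addr + len) lies inside the region.
   Written so that addr + len cannot overflow. */
bool write_allowed(const flash_region_t *r, uint32_t addr, uint32_t len)
{
    if (len == 0u || addr < r->start || addr >= r->end) {
        return false;
    }
    return len <= r->end - addr;
}
```

A check like this only guards the bootloader's own write routine. It does nothing against a stray jump that lands directly on an SPM sequence, which is why the lock bits matter as well.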

 

Brian


Maybe the customer's been sneakily trying to update or read your code with an ICE ... ?

 


Which bootloader are you using? For example, AVR109-based ones have a "chip erase" command that the user can activate quite easily if there is no protection. I'm using xboot and had that problem before I realized I needed to add some protection (to activate the bootloader you must send a special string, not just ESC).
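A rough sketch of the "special string" idea, in case it helps. This is not xboot's actual code; the key `ENTRY_KEY` and the `entry_guard_*` names are hypothetical. The bootloader feeds every received byte to the guard and only accepts commands once the full key has arrived in order:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unlock key. Its first character occurs only once in the
   key, which keeps the simple restart logic below exact. */
static const char ENTRY_KEY[] = "BL-ENTER-7f3a";

typedef struct {
    size_t matched;   /* number of key bytes matched so far */
} entry_guard_t;

void entry_guard_reset(entry_guard_t *g)
{
    g->matched = 0u;
}

/* Feed one received byte; returns true once the whole key has been seen. */
bool entry_guard_feed(entry_guard_t *g, uint8_t byte)
{
    const size_t key_len = sizeof ENTRY_KEY - 1u;

    if (byte == (uint8_t)ENTRY_KEY[g->matched]) {
        g->matched++;
    } else {
        /* Mismatch: start over, letting this byte begin a new attempt. */
        g->matched = (byte == (uint8_t)ENTRY_KEY[0]) ? 1u : 0u;
    }
    if (g->matched == key_len) {
        g->matched = 0u;
        return true;
    }
    return false;
}

/* Convenience helper: feed a whole string, report whether it unlocked. */
bool entry_guard_feed_all(entry_guard_t *g, const char *s)
{
    bool unlocked = false;
    for (; *s != '\0'; ++s) {
        if (entry_guard_feed(g, (uint8_t)*s)) {
            unlocked = true;
        }
    }
    return unlocked;
}
```

With something like this in front of the command parser, a lone ESC or line noise on the serial port cannot reach an erase command.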


Is it possible to do a chip erase from the bootloader?  I took a look at the manual and it seemed like it required "external programming".

 

Could it be the flash was locked?  Maybe locked flash looks like it is erased.


steve17 wrote:

Is it possible to do a chip erase from the bootloader?  I took a look at the manual and it seemed like it required "external programming".

 

It seems so. I don't know how complete the erase is. http://www.avrfreaks.net/forum/c...

 


Okay, it seems the bootloader can program the bootloader.  But it can't do the "erase the entire bootloader section" command.  So if it managed to erase the whole bootloader, page by page, it was a pretty neat trick.  Or possibly there is a discrepancy between that chip and the manual.
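To put a number on "page by page", here is a back-of-the-envelope sketch. The section sizes and the 256-byte page size are what I believe apply to the ATxmega32A4U, but treat them as assumptions and check the datasheet; `pages_in` is just a helper name for this example:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ATxmega32A4U figures; verify against the datasheet. */
#define APP_SECTION_BYTES   (32u * 1024u)
#define BOOT_SECTION_BYTES  (4u * 1024u)
#define FLASH_PAGE_BYTES    256u

/* Number of flash pages needed to cover a section (rounded up). */
uint32_t pages_in(uint32_t section_bytes)
{
    return (section_bytes + FLASH_PAGE_BYTES - 1u) / FLASH_PAGE_BYTES;
}
```

Under those assumptions the boot section is 16 pages and the application section 128, so wiping everything one page at a time would take 144 separate, successful erase operations rather than a single stray one.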

 

There is something called "chip erase".  As I understand the manual, it erases everything except the user signature row, and also erases the EEPROM if the EESAVE fuse isn't programmed.  It's the only way to unlock the lock bits.  The "chip erase" can only be done with "external programming".  On the A4 Xmegas, I think this means PDI.

 

I'm confused about fuses.  I can't see anything in the manual that says the bootloader can change them.  There is a curious statement that says external programming can change them to a more secure setting.  I don't know what that means.  

 

So I can't explain how the "go to bootloader after reset" fuse was changed without external programming.  There must be a malfunction in my brain, or the chip, or the manual.

 

 


Silly question: is BOD enabled in these chips?


clawson wrote:
Silly question: is BOD enabled in these chips?

 

I see where you are going.  With an SPM sequence somewhere in the code, rogue execution under brown-out conditions could trash something?  But if all of the flash was indeed erased, I'd discount that.  Corruption maybe, but not a complete overwrite.

 

I have many scores of production AVR8 apps; none have a bootloader or any other use of SPM.  While there were indeed EEPROM corruption battles over the years, I've never had a single instance of a failure where the flash seemed corrupted.  I'd guess high six to low seven figures of AVR8s deployed.  If some of these units, even a few, had failed, I'd have seen them come back and analyzed them.

 

Perhaps Xmega is different?

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


clawson wrote:

Silly question: is BOD enabled in these chips?

I think you may be on to something.  I believe erasing flash memory requires more power than a chip normally requires.  This could cause the power supply voltage to sag too low and cause the chip to lose its mind.

 

I've never needed a BOD but a weak power supply might be the culprit. 

 

Maybe I should use a BOD.  Will it act quickly enough to handle the flash erase power pulse?

 

As I understand it, flash memory got its name because it erases in a flash.  The downside is it causes all the lights in the neighborhood to dim.


theusch wrote:
I see where you are going.
Yup, that is exactly where I was headed.

 

I thought it might be rogue execution of the SPM in the BLS. I seem to remember that Xmega also have some kind of "just erase everything" NVM command available though I may have mis-remembered that, I don't spend too much time studying Xmega.


clawson wrote:
I seem to remember that Xmega also have some kind of "just erase everything" NVM command

A bit of poking:

28.4.4 Write/Execute Protection
Most command triggers are protected from accidental modification/execution during self-programming. This is done
using the configuration change protection (CCP) feature, which requires a special write or execute sequence in order to
change a bit or execute an instruction.
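To make the CCP idea concrete, here is a small host-side model (plain C, not register-level code). As far as I know the SPM signature is 0x9D, the I/O register signature is 0xD8, and the window is four instruction cycles, but treat those as assumptions and check the manual; the `ccp_model_t` type and the function names are invented for this sketch, and the model ignores the finer rules about what closes the window early:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CCP_SIGNATURE_SPM    0x9Du  /* assumed: unlocks SPM */
#define CCP_SIGNATURE_IOREG  0xD8u  /* assumed: unlocks protected I/O registers */
#define CCP_WINDOW_CYCLES    4u     /* assumed length of the timed window */

/* Simplified model: only tracks the SPM unlock window. */
typedef struct {
    uint8_t spm_window;   /* cycles left in which SPM is allowed */
} ccp_model_t;

/* Writing the SPM signature opens the window; other values do not. */
void ccp_write(ccp_model_t *m, uint8_t value)
{
    if (value == CCP_SIGNATURE_SPM) {
        m->spm_window = CCP_WINDOW_CYCLES;
    }
}

/* One instruction cycle passes without SPM being executed. */
void ccp_tick(ccp_model_t *m)
{
    if (m->spm_window > 0u) {
        m->spm_window--;
    }
}

/* Attempt SPM: succeeds only inside the window, which it then closes. */
bool ccp_try_spm(ccp_model_t *m)
{
    bool ok = m->spm_window > 0u;
    m->spm_window = 0u;
    return ok;
}
```

The point of the model: a stray SPM on its own does nothing. Runaway code has to hit the signature write and the SPM back to back, which makes an accidental erase of every page much less plausible.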
 

28.9 Preventing NVM Corruption
During periods when the VCC voltage is below the minimum operating voltage for the device, the result from a flash
memory write can be corrupt, as supply voltage is too low for the CPU and the flash to operate properly. To ensure that
the voltage is sufficient enough during a complete programming sequence of the flash memory, a voltage detector using
the POR threshold (VPOT+) level is enabled. During chip erase and when the PDI is enabled the brownout detector (BOD)
is automatically enabled at its configured level.
 

Erase app section in one command, but not all flash or all bootloader:



There is a curious statement that says external programming can change them to a more secure setting.  I don't know what that means.  

When you look carefully at the protection settings, there are several different possible settings.

If one connects a PDI programmer one can always lock down more of the chip, but you can't unlock a section of the chip that is already locked, without doing a master chip erase.
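One way to picture the "more secure only" rule is a host-side model in which lock bits can only go from 1 (unlocked) to 0 (locked) until a chip erase sets them all back to 1. This is an illustration of the rule, not the NVM controller's actual interface; the function names are made up, and the example value 0xBF assumes the boot-section lock field sits in bits 7:6 with 10 meaning write-locked, which should be checked against the manual:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: BLBB[1:0] in bits 7:6, value 10 = boot section write-locked. */
#define LOCKBITS_ALL_UNLOCKED   0xFFu
#define LOCKBITS_BOOT_WLOCK     0xBFu

/* Model of a lock-bit write: bits can be cleared (locked) but never set,
   so the result is the AND of the current and requested values. */
uint8_t lockbits_after_write(uint8_t current, uint8_t requested)
{
    return (uint8_t)(current & requested);
}

/* Model of a chip erase: the only operation that unlocks everything. */
uint8_t lockbits_after_chip_erase(void)
{
    return LOCKBITS_ALL_UNLOCKED;
}
```

In this model, asking for "all unlocked" on a chip whose boot section is already write-locked changes nothing; only the chip erase path gets back to 0xFF.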

 

 I seem to remember that Xmega also have some kind of "just erase everything" NVM command available though

Yes and no.  There is such a command, IIRC, but at least in the early Xmegas (A series), there was a hardware error that made this command fail.

The workaround, from a programming perspective, was to erase each page, one at a time.

That has probably been fixed, but the point of mentioning it is that one ought to look carefully at the Errata for a given chip if one either wants to use that feature, or perhaps blame some event upon that instruction.

 

I'd also suggest the OP check the layout.  The Xmegas have a lot of pins allocated to Vcc and AVcc, and it is important that all of them are properly powered, all of the grounds are properly grounded, and that all of them are properly bypassed for in-spec, reliable operation.  That can be a lot of bypass caps for the bigger chips!  If the chip is improperly bypassed it can act very strangely.

 

JC 

 

Edit: Typo

Last Edited: Wed. Apr 19, 2017 - 06:06 PM

DocJC wrote:

 

 

There is a curious statement that says external programming can change them to a more secure setting.  I don't know what that means.  

 

When you look carefully at the protection settings, there are several different possible settings.

If one connects a PDI programmer one can always lock down more of the chip, but you can't unlock a section of the chip that is already locked, without doing a master chip erase.

 

Yes, but the "them" in my post referred to fuses.  More secure lock bits is obvious, but what are more secure fuses?  Here's the line from the manual.

 

 

 


There is an NVM command to erase almost everything.  It's called "chip erase".  It seems like that is what is happening.  The bootloader is not supposed to be able to run it, only "external programmers" are allowed to do it.  Or maybe I don't understand the manual.  Or maybe the manual is wrong.

 

 

 

If you are wondering what the superscript "1" is for, wonder no longer.

 

Last Edited: Wed. Apr 19, 2017 - 08:28 PM

steve17 wrote:
It's called "chip erase". It seems like that is what is happening. The bootloader is not supposed to be able to run it, only "external programmers" are allowed to do it.

I'd +1 on that guess.  Now, if this happened in the field at multiple sites -- not so much.  At a single site -- perhaps someone was poking at it.

 

"Poking at it" more likely if the app is like a paintball gun or engine controller or similar, that people like to hack at.



I've been traveling, so I haven't had a chance to thank everyone for all the helpful replies.  It is still a mystery.  I think it is probable that there was an escape in our process that may be skewing the facts.  Based on the responses, it sounds more like a combination of an escape in the production process and some invalid assumptions from the customer.  I just don't see how all the data points that we are assuming can be reconciled with each other.  When I find out what went wrong I will share it with the group.

 

Brian