device/program just stops working / USB not recognized anymore


Hello community,

I'm not sure if this is the right forum topic, but I'll give it a try.

 

I use an ATxmega128A4U with a bootloader and a USB application. The device works fine: I can communicate over USB, put the device into bootloader mode and update the firmware with the FLIP application.

But after a while, a few of the devices "stop working". When they are plugged into a computer, they are either recognized as a bootloader device or not recognized at all. So my basic question is: why does this happen? We are not aware of any event that precedes this behaviour.

 

So to split it up, two cases:

1. The device is only recognized as a bootloader device. Normally, when I put the device into bootloader mode and then unplug it and plug it back into the PC, the application starts again. That does not happen in this case: it just stays in bootloader mode and has to be reprogrammed, e.g. with FLIP.

 

2. Device isn't recognized at all. I am only able to reprogram it via programming adapter and atmel studio. I can reprogram bootloader and application and everything is fine again.

 

Has anyone had similar experiences? And more importantly: does anyone have an idea what could cause this, or where to start digging for a cause? Could it be an electrical or a programming issue?

 

Any answer is appreciated and I would be happy to provide any further information needed.

It would be very bad if this happened at our customers abroad and they had to send the devices back...

 

Greetings,

AP1


In case 1, it seems that the bootloader no longer considers the application valid, so it stays in bootloader mode.

 

Case 2 might be the opposite: the bootloader considers the application valid and exits, but the application crashes.
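Since we don't know what check this particular bootloader uses, here is a minimal sketch of one common scheme, purely as an illustration: a post-build step appends a CRC-16 to the application image, and the bootloader only jumps to the application when the CRC matches and the entry pin is not asserted. The layout (CRC in the last two bytes, little-endian), the function names and the array-based view of flash are assumptions; on the real device the bytes would be read from flash. Under such a scheme, case 1 corresponds to a failed check, and case 2 to a passed check followed by an application crash.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, no reflection. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Hypothetical layout: the last two bytes of the image hold the CRC
   (little-endian) of everything before them. */
static bool app_is_valid(const uint8_t *image, size_t image_len)
{
    if (image_len < 2)
        return false;
    uint16_t stored = (uint16_t)(image[image_len - 2] | (image[image_len - 1] << 8));
    return crc16_ccitt(image, image_len - 2) == stored;
}

/* Stay in the bootloader if the entry pin is asserted or the application check fails. */
static bool stay_in_bootloader(bool entry_pin_asserted, const uint8_t *image, size_t image_len)
{
    return entry_pin_asserted || !app_is_valid(image, image_len);
}
```

A check like this only detects a damaged image; it says nothing about why the image changed, so it complements rather than replaces the flash comparison suggested next.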

 

What you could do is program a device and check that it operates correctly, then read back the entire flash.

Next, take a "defective" unit, read back its full flash too, and see whether there are any differences (there should not be, of course...).
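To make that comparison less tedious, you could convert both hex files to raw binary images of the same length and list only the ranges that differ, instead of eyeballing two hex files. A minimal host-side sketch (the function name is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Print each contiguous range where two flash images differ (if out is not NULL)
   and return the number of such ranges. */
static size_t report_diff_ranges(const uint8_t *good, const uint8_t *bad,
                                 size_t len, FILE *out)
{
    size_t ranges = 0;
    size_t i = 0;
    while (i < len) {
        if (good[i] == bad[i]) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < len && good[i] != bad[i])
            i++;
        if (out != NULL)
            fprintf(out, "0x%05zX-0x%05zX (%zu bytes)\n", start, i - 1, i - start);
        ranges++;
    }
    return ranges;
}
```

A few small ranges would point at per-device data such as a serial number, while differences scattered across the whole image suggest a different firmware version, or leftovers in areas that were not erased before programming.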

 

Side question: do you use EEPROM storage, and could corrupted EEPROM data cause the application to crash?


Thanks for your quick reply!

 

As for your first approach, I already tried that, but it didn't give me a reliable result. I took another device, programmed it and immediately read the flash back. Then I compared the hex file I used for programming with the hex file read back from the flash, and there were so many differences...

I also compared the flash read-back from the defective unit with several versions of the original hex files (since I don't currently know which firmware version was programmed on the defective device) and with the read-back of a working device (which could have had another version). But there, too, there were too many differences for a useful comparison.

 

And yes, I use the EEPROM for storing calibration data. The EEPROM is written once, and at every power-on the data is read from the EEPROM into my variables for computing the measurement values. But the devices are tested and working after storing the data... Still, I'll keep this in mind.

Last Edited: Fri. Jul 26, 2019 - 01:02 PM

AP1 wrote:
2. Device isn't recognized at all. I am only able to reprogram it via programming adapter and atmel studio. I can reprogram bootloader and application and everything is fine again.

Or the loaded application is so big that it overwrites part of the bootloader, and thus the bootloader fails the next time it is used.

At least that explains why reloading the bootloader makes it work again.
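One cheap way to rule this out is to compute the highest address the Intel HEX file actually writes and compare it with the application section size. The sketch below handles the standard record types (00 data, 01 end of file, 02 extended segment address, 04 extended linear address) and verifies each record's checksum. `APP_SECTION_SIZE` = 0x20000 (128 KB, with the 8 KB boot section placed after it) is my assumption for the ATxmega128A4U; please check it against the datasheet. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define APP_SECTION_SIZE 0x20000UL /* assumption: 128 KB application section */

/* Parse two hex digits; returns -1 on a non-hex character (never reads past a NUL). */
static int hex_byte(const char *s)
{
    int v = 0;
    for (int k = 0; k < 2; k++) {
        char c = s[k];
        v <<= 4;
        if (c >= '0' && c <= '9') v |= c - '0';
        else if (c >= 'A' && c <= 'F') v |= c - 'A' + 10;
        else if (c >= 'a' && c <= 'f') v |= c - 'a' + 10;
        else return -1;
    }
    return v;
}

/* Read record bytes until the first non-hex character; returns the byte count. */
static int read_record(const char *p, uint8_t *rec, int max)
{
    int n = 0;
    while (n < max) {
        int b = hex_byte(p + 2 * n);
        if (b < 0) break;
        rec[n++] = (uint8_t)b;
    }
    return n;
}

/* On success, sets *end to one past the highest data address and returns true. */
static bool ihex_end_address(const char *text, uint32_t *end)
{
    uint32_t base = 0, max_end = 0;
    uint8_t rec[260];
    for (const char *p = strchr(text, ':'); p != NULL; p = strchr(p + 1, ':')) {
        int n = read_record(p + 1, rec, (int)sizeof rec);
        if (n < 5 || n != rec[0] + 5) return false;   /* length byte must match */
        uint8_t sum = 0;
        for (int i = 0; i < n; i++) sum = (uint8_t)(sum + rec[i]);
        if (sum != 0) return false;                    /* all bytes sum to 0 mod 256 */
        uint32_t offset = ((uint32_t)rec[1] << 8) | rec[2];
        switch (rec[3]) {
        case 0x00: /* data */
            if (rec[0] > 0 && base + offset + rec[0] > max_end)
                max_end = base + offset + rec[0];
            break;
        case 0x01: /* end of file */
            *end = max_end;
            return true;
        case 0x02: /* extended segment address: base = value * 16 */
            if (rec[0] != 2) return false;
            base = (((uint32_t)rec[4] << 8) | rec[5]) << 4;
            break;
        case 0x04: /* extended linear address: upper 16 address bits */
            if (rec[0] != 2) return false;
            base = (((uint32_t)rec[4] << 8) | rec[5]) << 16;
            break;
        default:   /* 03/05 start-address records carry no flash data */
            break;
        }
    }
    return false; /* no end-of-file record */
}

static bool fits_app_section(uint32_t end)
{
    return end <= APP_SECTION_SIZE;
}
```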

 

Jim

 

Click Link: Get Free Stock: Retire early! PM for strategy

share.robinhood.com/jamesc3274
get $5 free gold/silver https://www.onegold.com/join/713...

 

 

 

 


Thanks for your suggestion, Jim. I don't think the program is too big, though, because the majority of the devices work, and the defective devices worked at first as well.


AP1 wrote:

Thanks for your quick reply!

 

for your first approach, I already tried that. But that didn't lead me to a reliable result. I took another device, and programmed it and immediately downloaded the flash again. Then I compared the hex file which I used for programming and the hex file I downloaded from the flash and there were so many differences...

 

Well, then something is seriously wrong.

When you program your device, the read-back hex file should match the original hex file.

There may be two differences. First, you have a bootloader section that is not in your hex file but does show up in the read-back; that is a separate memory block, so it should be easy to see and recognize.

Second, you may store a serial number somewhere, which causes a difference in one specific area.

 

I assume there are no parameters stored in the EEPROM that can cause the application to crash?

Or that at some point the EEPROM layout was changed (fields added), so old data is still in the EEPROM and wrong values are read, because you only reprogrammed the application without making sure the EEPROM content was correct again.
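A common defence against both problems is to store the calibration data as a record with a marker, a layout version and a CRC, and to fall back to safe defaults (and flag the error) when any of them does not match. A minimal sketch; the field names, marker value and defaults are hypothetical, and on the device the raw bytes would come from something like avr-libc's `eeprom_read_block()`:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CAL_MAGIC   0xCA1Bu /* hypothetical marker value */
#define CAL_VERSION 2u      /* bump whenever the record layout changes */

typedef struct {
    uint16_t magic;
    uint16_t version;
    float    gain;
    float    offset;
    uint16_t crc;           /* CRC over all fields before this one */
} cal_record_t;

static const cal_record_t CAL_DEFAULTS = { CAL_MAGIC, CAL_VERSION, 1.0f, 0.0f, 0 };

/* CRC-16/CCITT-FALSE over a byte buffer. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021) : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Fill in marker, version and CRC before writing the record to EEPROM. */
static void cal_seal(cal_record_t *rec)
{
    rec->magic = CAL_MAGIC;
    rec->version = CAL_VERSION;
    rec->crc = crc16_ccitt((const uint8_t *)rec, offsetof(cal_record_t, crc));
}

/* Validate raw EEPROM bytes; on any mismatch, load defaults and return false. */
static bool cal_load(const uint8_t *raw, cal_record_t *out)
{
    cal_record_t rec;
    memcpy(&rec, raw, sizeof rec);
    bool ok = rec.magic == CAL_MAGIC
           && rec.version == CAL_VERSION
           && rec.crc == crc16_ccitt((const uint8_t *)&rec, offsetof(cal_record_t, crc));
    *out = ok ? rec : CAL_DEFAULTS;
    return ok;
}
```

The version field covers the "layout changed, old data still in EEPROM" case; the CRC covers bit-level corruption.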

 

 

 


OK, I tried programming and reading back again. This time I erased the chip first, and now the files are identical (read-back and original hex file). But the differences from the "corrupted" file are still too big to give any clue (I tried with different versions). The only thing I can tell (and maybe it's interesting) is that the bootloader section seems to be intact in the corrupted device (the one that wasn't recognized at all by the computer).

 

meslomp wrote:

 

I assume there are no parameters stored in the EEPROm that can cause the application to crash?

 

or that at some point the eeprom content has been changed ( stuff added) and now there is old data in the prom and wrong values are read as you only reprogram the application, but not have made sure the eeprom content is correct again.

 

 

 

 

Well, I store calibration data and a serial number (used for USB identification) in the EEPROM. But the only fix I applied to the corrupted device was reprogramming the flash. I did not touch the EEPROM (apart from reading it out, in case it got erased). The device now works fine: the serial number is still there, and the calibration data seem OK as well, because I get plausible measurement values.

 

Our current guess is that EMC problems (electromagnetic interference) or an unstable supply voltage corrupt the flash program. Do you think that is possible?


Well, I would not expect the flash to get corrupted during 'normal' operation, unless besides writing to the EEPROM you also write to flash at that time.

 

What could happen (I don't know your bootloader) is that the user application gets updated without a full erase of the application section. In that case, if older applications were bigger than the new version, you end up with leftover "garbage" after the end of the new code.
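The effect can be shown with a small host-side simulation: an update that only rewrites the bytes covered by the new image leaves the tail of a previous, larger image in place, while erasing the whole application area first does not. This is simplified (real flash is erased and written in pages), and the function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ERASED 0xFFu /* erased flash reads back as 0xFF */

/* Simulated update: with full_erase, the whole application area is wiped first;
   without it, only the bytes covered by the new image are rewritten. */
static void simulate_update(uint8_t *flash, size_t flash_len,
                            const uint8_t *image, size_t image_len, int full_erase)
{
    if (image_len > flash_len)
        return; /* image does not fit: refuse */
    if (full_erase)
        memset(flash, ERASED, flash_len);
    memcpy(flash, image, image_len);
}

/* Return the offset of the first non-erased byte after the image,
   or flash_len if the tail is clean. */
static size_t find_stale_tail(const uint8_t *flash, size_t flash_len, size_t image_len)
{
    for (size_t i = image_len; i < flash_len; i++)
        if (flash[i] != ERASED)
            return i;
    return flash_len;
}
```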

It would be interesting to take a good device, use it until it fails, and then check whether the flash did get corrupted at that point.

 

There are still a lot of options open. It might even be the bootloader that is the problem.

 


meslomp wrote:

 

What could happen ( I do not know your bootloader) is that the user application is updated without a full application erase. In that case if you have older applications that were bigger than the newer version you will end up with "garbage" at a certain point.

 

 

 

 

Well, the devices that have failed so far were normal production units: they were manufactured, programmed, calibrated, tested and shipped to the customer. So there should not be any old garbage code in the flash.

 

meslomp wrote:

 

It would be interesting to see if you have a good device and then play with it till it fails and see if the flash did get corrupted at that point.

 

 

 

Well, yes, we could try that, but since we have no idea how to force this failure, it would be digging in the dark... Besides, the devices that failed at the customers had been working in the field for almost a year, so it could take a while to reproduce the failure :-D

 

meslomp wrote:

 

There are still a lot of options open. It might even be the bootloader that is the problem.

 

 

 

I'm open to every option ;-)
I have to admit that I don't use the original bootloader unchanged. I modified the source code because we wanted to use a different bootloader pin. But that's it: only the pin number differs.

Comparing the read-back of the corrupted device with that of a freshly programmed device, the bootloader sections were identical.

 

Thanks again for your effort and your ideas!


Are the fuses set to enable the BOD, and if so, at what level compared to VCC?

Just thinking out loud....

If the BOD is not enabled, a runaway program (caused by voltage fluctuations) could execute the flash-write code and change something in the flash.
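Besides enabling the BOD, a common software guard is to make the flash-write routine refuse to run unless a RAM "key" was set just before the call, so that code reached by a wild jump usually bails out. A minimal sketch; the key value and all names are hypothetical, and `stub_write` stands in for the real NVM/SPM sequence. It narrows the window but is not a substitute for a correctly configured BOD, since a jump past the check would still write.

```c
#include <stdbool.h>
#include <stdint.h>

#define NVM_UNLOCK_KEY 0x5AA5u /* hypothetical magic value */

typedef void (*page_writer_t)(uint32_t addr, const uint8_t *data);

static volatile uint16_t nvm_unlock = 0;
static unsigned writes_performed = 0;

/* Host stand-in for the real page-write sequence: just counts calls. */
static void stub_write(uint32_t addr, const uint8_t *data)
{
    (void)addr;
    (void)data;
    writes_performed++;
}

/* Arm the key immediately before a legitimate write. */
static void nvm_arm(void)
{
    nvm_unlock = NVM_UNLOCK_KEY;
}

/* Refuse to write unless armed; the key is single-use and cleared on entry. */
static bool flash_write_page_guarded(uint32_t addr, const uint8_t *data, page_writer_t do_write)
{
    uint16_t key = nvm_unlock;
    nvm_unlock = 0;
    if (key != NVM_UNLOCK_KEY)
        return false;
    do_write(addr, data);
    return true;
}
```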

 

Jim

 


Yep, apparently the brown-out detector (BOD) can cure a case of an insane CPU caused by an out-of-range VCC.

 

I guess "bootloader mode" means forcing a pin high or low.

 

If the XMEGA won't boot at all, I'd think either the bootloader flash is corrupted or the BOOTRST fuse is disabled.
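When reading the fuses back (e.g. in Atmel Studio's Device Programming dialog), the bits relevant to this thread can be decoded as below. The bit positions are my reading of the XMEGA AU manual (FUSEBYTE2: BOOTRST bit 6, BODPD bits 1:0; FUSEBYTE5: BODACT bits 5:4, BODLEVEL bits 2:0), so please verify them against the datasheet. XMEGA fuse bits are active low: 0 means programmed. The BODLEVEL-to-voltage mapping is left to the datasheet table rather than guessed here.

```c
#include <stdint.h>

/* BOD mode encoding shared by BODPD and BODACT (as read from the manual). */
typedef enum {
    BOD_RESERVED   = 0,
    BOD_SAMPLED    = 1,
    BOD_CONTINUOUS = 2,
    BOD_DISABLED   = 3
} bod_mode_t;

/* BOOTRST (FUSEBYTE2 bit 6): programmed (0) means reset jumps to the bootloader. */
static int reset_vector_is_bootloader(uint8_t fusebyte2)
{
    return (fusebyte2 & 0x40u) == 0;
}

/* BODPD (FUSEBYTE2 bits 1:0): BOD behaviour in power-down modes. */
static bod_mode_t bod_power_down_mode(uint8_t fusebyte2)
{
    return (bod_mode_t)(fusebyte2 & 0x03u);
}

/* BODACT (FUSEBYTE5 bits 5:4): BOD behaviour in active and idle modes. */
static bod_mode_t bod_active_mode(uint8_t fusebyte5)
{
    return (bod_mode_t)((fusebyte5 >> 4) & 0x03u);
}

/* BODLEVEL (FUSEBYTE5 bits 2:0): raw code; map to volts using the datasheet. */
static uint8_t bod_level_code(uint8_t fusebyte5)
{
    return (uint8_t)(fusebyte5 & 0x07u);
}
```

With an erased (all-ones) FUSEBYTE2, this decodes as "reset to application" and "BOD disabled", which would fit both suspicions raised above.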

 

The calibration data can also be stored in the User Signature Row.  This may be less susceptible to damage.