xmega 32a4u flash corruption


Hi all, I have an issue with a product that is basically a 12V DC motor controller.

Recently I have had some boards returned with corrupted flash.

Reprogramming the board fixes the issue and all works well again.

I suspect this is happening when a motor jams and large currents are drawn.

I don't have a bootloader.

I have the BOD enabled.

My FAE suggested that it could be caused by noise on the PDI pin causing the flash to be written to.

I normally leave the programming header pins unconnected, as the PDI pin has an internal pull-down.

Has anyone ever heard of this issue?

Should I tie the programming pin to GND or VCC?

 

Thanks in advance

 

Paul


Normally having the PDI pins floating is fine, but if you have a very noisy environment with motors running it could be an issue I suppose. Can you reproduce the problem? That is the first step, so that you can test the fix.

 

One simple solution is to install removable jumpers to VCC on PDI. It needs to be VCC because one of the PDI pins also resets the MCU if pulled low.

 

You can also use the lock bits to prevent writes to flash memory IIRC.


Hi Mojo-chan, thanks for the reply.

Unfortunately we cannot replicate the issue.

Just one of those rare things.

The reset pin doubles as the PDI clock, so maybe that's what you are referring to.

 

Paul


Sorry, I forgot to mention that the lock bits are on.

 

Paul


In that case it's rather odd, I'm not sure what would cause it. Do you have a bootloader?

 

Oh, and I seem to recall that it's a good idea to clear the NVM registers after using them.


It's very odd. No bootloader, as stated in my original post.

Thanks again


What is the MCU supply voltage and what is the BOD level?


MCU supply is 3.3V and the BOD level is 3V.


Where are you located, Mojo-chan? Just out of interest.


Should be okay... Maybe there are some big currents, like back EMF or something, reaching the MCU? Could be through ground/supply planes, that sort of thing. I'm at a loss, to be honest.


UK and Japan, depends on the time of year :-) Why do you ask?


Perhaps you could post part of your schematic. Did you follow the schematic checklist? Ferrite on the supply etc?


Was just interested.

Can't post schematics due to NDA.

The design has been out in the field for a while now, and all appropriate design rules have been well and truly followed.

 

paul


I've noticed some susceptibility to static damage, but beyond that I can't think of anything else.

 

What sort of corruption are you seeing? Erased pages, random data?


So I guess the question to ask is: is it good practice to leave the programming pins unconnected in a high-noise environment, or is it best practice to tie them to a known state?

 

 


Well, the schematic checklist only suggests a 10k pull-up on reset if there is likely to be noise there, and nothing on the other line. I can't really see how noise would cause more than a reset, I mean what are the chances of the noise executing some random command on the NVM controller? Seems more likely it would be noise through the supply or GPIO pins. BOD is supposed to protect you, but there is only so much you can do.

 

Is it blanked pages or random data in the pages? Is it one page or a number of pages?

 

Is BOD in polled mode or constant?
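To answer the "blanked or random, one page or many" question with data rather than by eye, one option is to read the flash back from a returned board and compare it page by page against the released image. The following is only a rough host-side sketch in C. The 256-byte `PAGE_SIZE` and the names `classify_page` and `summarize_dump` are assumptions for illustration, not anything from the device tools, so check the page size against the datasheet for the actual part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed flash page size in bytes; verify against the datasheet. */
#define PAGE_SIZE 256u

typedef enum { PAGE_INTACT = 0, PAGE_ERASED = 1, PAGE_ALTERED = 2 } page_state_t;

/* Classify one page of a read-back dump against the reference image. */
page_state_t classify_page(const uint8_t *dump, const uint8_t *ref)
{
    if (memcmp(dump, ref, PAGE_SIZE) == 0)
        return PAGE_INTACT;            /* page matches the released image */

    for (size_t i = 0; i < PAGE_SIZE; i++) {
        if (dump[i] != 0xFF)
            return PAGE_ALTERED;       /* differs and holds non-blank data */
    }
    return PAGE_ERASED;                /* differs and every byte reads 0xFF */
}

/* Count pages per state. counts[] is indexed by page_state_t.
 * len must be a whole number of pages. */
void summarize_dump(const uint8_t *dump, const uint8_t *ref,
                    size_t len, size_t counts[3])
{
    assert(len % PAGE_SIZE == 0);
    counts[PAGE_INTACT] = counts[PAGE_ERASED] = counts[PAGE_ALTERED] = 0;

    for (size_t off = 0; off < len; off += PAGE_SIZE)
        counts[classify_page(dump + off, ref + off)]++;
}
```

A result dominated by erased pages would point towards a stray erase command, while scattered altered bytes in otherwise valid pages would look more like disturbed cells or an interrupted write.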


I wouldn't have thought noise would cause this issue, but it was brought up by my FAE, as he had other customers with the same problem.

The flash corruption is random, as would be expected with noise.

BOD is set to constant mode.

 

Paul


So many questions and so little info, NDA notwithstanding...

 

You have to determine how the uC is getting corrupted before you can fix it, otherwise you are just guessing and doing trial and error...

 

Is the motor controller a chip or a large PCB of its own, commercial design or in house design?

Is the motor controller optically coupled to the uC or direct coupled?

Does the uC share a common ground with the motor?

Does the uC share a common power source with the motor?

If a common ground is present, is it "star" connected?

Hard to verify your uC powering when you just say it's all correct...

Is AVcc tied to Vcc?

Is every Vcc and Ground tied to Vcc and Ground?

Do they ALL have bypass caps, and how close to the uC pins are they?

 

Does the uC PCB sit near the power supply cables for the motor?

Does it sit near the motor itself?

 

How many "failures" have you had, and how many devices are out in the field?

How many board hours of operation total, and hence the mean time to failure?

Are these early failures, or at any age of board usage?

Is the same motor / setup responsible for multiple failures, or are the failures distributed across all of the installations?
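For the board-hours question, the arithmetic is just total operating hours across the fleet divided by the number of failures. Here is a tiny C sketch of that. The function name `mttf_hours` and any numbers fed into it are hypothetical, and it assumes every unit has run for roughly the same number of hours, which a real fleet will not.

```c
#include <assert.h>

/* Rough mean time to failure in board hours.
 * units:          number of boards in the field
 * hours_per_unit: assumed average operating hours per board
 * failures:       number of confirmed failures
 * Returns -1.0 when there are no failures, since MTTF is undefined then. */
double mttf_hours(double units, double hours_per_unit, unsigned failures)
{
    assert(units >= 0.0 && hours_per_unit >= 0.0);

    if (failures == 0u)
        return -1.0;

    return (units * hours_per_unit) / (double)failures;
}
```

As a made-up example, 1000 boards at 2000 hours each with 4 failures works out to 500,000 board hours per failure, which gives a feel for how long a bench mock-up might need to run before it shows anything.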

 

How long are the PDI PCB traces on the board?

 

How similar are the setups that are failing?

(Same setups, same motors, same physical setups, or all unique?)

 

The questions could go on and on.

 

What can you try?

 

Add opto coupling to the uC and the motor driver on a PCB or two.

Put the uC in a Faraday cage.

Cut the PDI PCB traces adjacent to the uC with a razor knife.

Power the uC from a large car battery and a linear regulator, separate from the motor and its power supply.

If the uC and the motor driver are not co-located on a PCB, then connect them with twisted shielded cabling.

Verify that the motor driver is up to spec for the load it is experiencing.

 

Build a mock up of the installation that fails.

Either an identical setup, or something as close as possible.

Rig it to continually stall / overload, in a manner similar to what happens in practice, so you can better analyze the failure process, and hopefully reproduce it.

 

Protect EVERY signal coming into and exiting the uC.

As mentioned, it could be a static electricity spark from a User touching a control panel causing a problem, also.

You haven't confirmed that it is the motor / motor driver causing the problem.

You haven't confirmed that it is an EMI - PDI input problem.

 

Put a second PCB next to the first one, very close, same orientation, but no motor connected to it.

Use the same power supply, etc.

When the in-use PCB fails, does the second one fail?

 

How did you actually verify that the flash was overwritten?

What Fuses are you setting to lock the chip, and yet allowing you to unlock the chip to upload the flash post - failure?

 

Unless you have 100% verified that the flash was overwritten, it could still be a rare occurrence of an ISR / critical race / volatile variable / etc. software problem.
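One way to separate "flash really changed" from "software fell over" is to checksum the image and compare it with a value recorded at release time. Below is a minimal sketch in C using a plain software CRC-32 (the common reflected polynomial 0xEDB88320). The names `crc32_compute` and `image_is_intact` are made up for illustration, and how the expected CRC gets stored and where the image bytes come from are left as assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320, init and final XOR
 * of 0xFFFFFFFF). Slow but small, with no lookup table. */
uint32_t crc32_compute(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);

    uint32_t crc = 0xFFFFFFFFUL;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320UL;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFUL;
}

/* Returns 1 if the image still matches the CRC recorded at release time,
 * 0 otherwise. expected_crc is assumed to be stored somewhere safe. */
int image_is_intact(const uint8_t *image, size_t len, uint32_t expected_crc)
{
    return crc32_compute(image, len) == expected_crc;
}
```

Run against a read-back dump on a PC, a mismatch confirms the flash contents changed. A match on a board that still misbehaves would push the investigation back towards software.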

 

I recall one GPS / mapping project where I spent months debugging / troubleshooting a rarely occurring, sporadic error, which I was sure was a hardware issue, only to eventually find a software cause (one that required a unique set of circumstances, which rarely occurred, to trigger it).

 

It can be very tough to sort out a poorly defined, episodically crashing system.

 

JC


Hi JC, thanks for your detailed post. 

All correct layout procedures have been followed so no issue there.

Turns out the problem was external: our customer supplies the leads that power our board, with an inline fuse attached.

The fuse holders were of poor quality, so the fuse did not make a solid connection.

Hence a lot of noise was produced under load.

 

Paul 


Thanks for the update, but that is still quite worrying. I've had a couple of reports of partially wiped memory lately in some 128A3Us, but nothing confirmed. If it is possible to corrupt memory with bad power, even with BOD enabled, I'm worried.


BOD won't protect you from voltage spikes outside the absolute maximum ratings. If heavy loads resulted in 'make-and-break' power events due to a fussy fuse holder, inductive effects could easily result in large voltage spikes (depending on many things, of course). If you hit your AVR with a 3 ns 20V spike, all bets are off.


 


BOD won't protect against supply noise, as joeyman mentioned. A few other phenomena can occur that have nothing to do with exceeding VCC or GND. One is a very fast spike (faster than the 1000 ns minimum reset time of the part). This can cause the reset state machine in the part to lock up or fail to execute the reset properly.

 

Another good one is if your power is intermittent and you are just using an RC filter on the reset line. Eventually the reset signal droops slowly into no-man's-land, and then all bets are off. Just an FYI, I'm having the same issue as Paul, using an ATxmega64D4. The problem occurs rarely, with no supply noise, but I can force the issue by quickly turning the power off and on (yes, with an RC reset). The next version of the board will have a WDT with voltage detect (which includes delays). Note that this RC reset issue has been around since the 8031s first roamed the earth. Also note that I adopted this little board from my predecessor. Trust me, if it is my fault I'm really hard on myself... You'll know when.

Jorrell


Watchdogs are really worth having, and these days a basic voltage monitor is sub 10c so unless you have absolutely no room left on the PCB I strongly recommend one in every design.


Mojo-chan, I agree 100%; I've been doing that for a long time. It just surprised me that the inherited design didn't have all the important stuff. Plenty of room on this one, i.e. enough for a SOT23-5.

Jorrell