Application does not run after power cycle


ATmega128, custom design, external 8 MHz crystal, programmed with Atmel-ICE.

 

Hi,

So I have a custom app running on a custom board. I'm in the final stages of development and doing my verification testing.

This is an automotive app: it uses ignition power to wake up and run; otherwise it is always powered but sleeping (idle mode).

I've noticed on one device (I need to check more of them) that if I power cycle, wait a few seconds, then run, it's fine. But if I wait 20-30 minutes, the app won't run.

It will run just fine if I reflash the part.

 

I just noticed this last night, so I don't have a lot of data, but it sure looks like the flash is getting corrupted.

I do read EEPROM on startup, but only write it in response to user action.

 

Any thoughts?

thanks

Keith

Keith Vasilakes

Firmware engineer

Minnesota


I've never seen flash corruption in the 300k+ AVRs I've used, but a missing ISR for an enabled interrupt, yes!

 

Jim

 

 

(Possum Lodge oath) Quando omni flunkus, moritati.

"I thought growing old would take longer"

 


keith v wrote:
I just noticed this last night, so I don't have a lot of data, but it sure looks like the flash is getting corrupted.
keith v wrote:
Any thoughts?

First, verify your hypotheses.  I've got well over six figures of AVR8 in the field, and have no verified cases of flash corruption.

 

1)  Use your programming tool to verify your flash contents.  If a verify fails, then note the changed bytes and what the values are.  That should give some hints.

 

2)  Do you have SPM in your code anywhere?

 

3)  Do you have proper brownout detection and action?

 

4)  Does this symptom appear on bench testing, or only in the automotive environment?

 

[I'll wager a virtual cold one that the cause will turn out to be an uninitialized variable...any warnings about using a value before set?]

 

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Thanks,

I'd be extremely surprised to hear of flash issues.

 

I really just wanted to verify that there aren't any known issues with the device.

 

Yeah unhandled interrupts and stuck in a trap.

 

I plan on verifying tonight

SPM? Like self-modifying code? Nope!

No brown out detection.

Bench testing, disconnect power and reconnect

No compiler Warnings. I hate uninitialized variables

 

Another thought is that the oscillator is not starting up: mis-valued caps or something.

It's a new board build, so it's definitely possible.

 

 

Keith Vasilakes

Firmware engineer

Minnesota


keith v wrote:
No brown out detection.

And an automotive app?  You are playing with fire.  Cruisin' for a bruisin'.  Or whatever adage you care to use.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


And using the full swing oscillator mode is usually recommended for the M128..and yes the BOD!

 

I do some repairs on boards (made by others) with the M128 and I have seen missing flash code; however, these boards do have a bootloader (SPM involved) and I have the feeling that it could come down to operator error during firmware updates, i.e. bootloader still there but main code gone or at least non-functional.

John Samperi

Ampertronics Pty. Ltd.

www.ampertronics.com.au

* Electronic Design * Custom Products * Contract Assembly

Last Edited: Tue. Dec 27, 2016 - 07:02 PM

I certainly agree with Lee and JS, Brown Out Detection certainly needs to be enabled.

 

Also, as a vehicle app, I assume the power is WELL FILTERED against Load Dump type spec's, that includes both the power rail, AND every sensor / input that comes from something connected to the vehicle power bus.

Starting up the vehicle is one of the noisiest times on the power rails, and hence on any signal input, as well, although that is actually a continuous problem on vehicle apps.

 

With your description so far, you really don't know if the uC "failed" at some point in the 20-30 minute idle time, or "failed" right with engine startup.

My first guess would be that it fails during engine startup, with noise / glitch trashing the poor micro's digital bits.

 

Do you have a "spare" LED on the PCB?  (A project can never have too many LED's, I think that was Moore's Second Law of computers (smirk)).

If you simply flash an LED 5 times on power up you have confirmed that your clock is running, and that the micro is running.

With the temperature extremes of vehicle apps, that is a good idea anyways, not just for initial debugging / validation. 
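
A power-up blink along those lines could be sketched roughly as below. This is only an illustration: `led_write` and `delay_ms` are hypothetical hooks, stubbed here so the logic can be run on a PC. On the real board they would drive whatever spare pin the LED gets hacked onto and call a busy-wait delay.

```c
#include <assert.h>

/* Hypothetical hardware hooks. On the target these would set or clear a
 * spare port pin and busy-wait; here they only record what was asked for. */
static unsigned led_edges;        /* number of LED writes requested */
static unsigned long waited_ms;   /* total delay requested */
static int led_state;             /* last value written to the LED */

static void led_write(int on)
{
    led_state = on;
    led_edges++;
}

static void delay_ms(unsigned ms)
{
    waited_ms += ms;
}

/* Blink the LED `count` times. Meant to be called once, as early as possible
 * after reset, so a visible blink shows that reset released and the clock
 * is running. */
static void power_up_blink(unsigned count, unsigned half_period_ms)
{
    assert(count > 0);
    for (unsigned i = 0; i < count; i++) {
        led_write(1);
        delay_ms(half_period_ms);
        led_write(0);
        delay_ms(half_period_ms);
    }
}
```

Calling something like `power_up_blink(5, 100)` first thing in `main()` gives five blinks in about a second. If the LED stays dark after a cold start, the part is not getting as far as running code.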

 

Since you have your "spare" LED sitting there, might as well put it to good use after startup, also.

How about waking up every now and then and flashing the LED.

Don't need to do that in the final code, perhaps, but would be helpful now.

With that you can see if the micro is still behaving during your 20-30 minute wait.

 

Next, as is always the case, you really, really, want to find the underlying cause of the problem, otherwise it is very difficult to be certain that you have ever actually fixed the problem.

Do you have a good, decent BW DSO, so you can watch the power bus and other signals on start up, and see if the noise gets through to the micro's pins?

Watch the vehicle bus on one channel, the pins on the micro with another channel.

 

Gotta mention the occasional culprit on custom PCB's, although I suspect you know this one well:

Are ALL Vcc and AVcc; and ALL Ground pins on the micro connected to V+ & Ground?

Are they ALL by-passed with 0.1 uF caps?

Do you have lots of extra filtering on the power supply rail?

 

Almost forgot:

Related to the BOD requirement.

What does your main V+ do during the engine crank?

It isn't unusual for it to dip quite low during the crank, especially if the engine is cold, (your neck of the woods!).

Easy to watch the power supply during startup for this one, also.

 

An "easy" check here, of course, is to power the PCB from a battery, with a common Ground to the vehicle's power bus.

That wouldn't explain, however, why it works with the short delay, but not with the long one.

 

You mentioned:

But If I wait 20-30 minutes the app won't run.

 

Anytime I see that kind of symptom I also think one has to consider Stack Overflow, etc., where the problem is somewhat time dependent.

When the app runs long enough, and Interrupt X has fired enough times, etc., system crashes.

 

Related to that, also, is variables that get trashed after a period of time, as at some point an interrupt fires while the variable is being updated, totally trashing it.

(I spent 3 months tracking down an atomicity bug in a GPS tracking / mapping app a few years ago, and still have nightmares of the darn bug!)

Again, with a few second delay, you get lucky and the program works.

Wait a bit, (20-30 minutes in this case) and the bug has a chance to manifest itself.

Is the micro waking up and doing anything during the 20-30 min null period?
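
To make the atomicity point concrete, here is a minimal sketch of guarding a multi-byte variable shared with an interrupt. Everything in it is an assumption for illustration: `irq_save_disable` and `irq_restore` are host stand-ins for the interrupt-enable flag, and `tick_count` and `timer_isr` are made-up names. With avr-libc the same guard is usually written with `ATOMIC_BLOCK` from `<util/atomic.h>`.

```c
#include <assert.h>
#include <stdint.h>

/* Host stand-in (hypothetical) for the global interrupt-enable flag. */
static int irq_enabled = 1;

static int irq_save_disable(void)
{
    int was_enabled = irq_enabled;
    irq_enabled = 0;
    return was_enabled;
}

static void irq_restore(int was_enabled)
{
    irq_enabled = was_enabled;
}

/* Multi-byte counter written by a timer ISR. An 8-bit CPU reads it one byte
 * at a time, so an interrupt between those reads can produce a torn value. */
static volatile uint32_t tick_count;

/* Stand-in for the ISR body; it only runs while interrupts are enabled. */
static void timer_isr(void)
{
    if (irq_enabled) {
        tick_count++;
    }
}

/* Take a consistent copy of the counter for use in main-line code. */
static uint32_t ticks_snapshot(void)
{
    int saved = irq_save_disable();
    assert(!irq_enabled);            /* the copy below cannot be interrupted */
    uint32_t copy = tick_count;
    irq_restore(saved);              /* put the flag back as it was */
    return copy;
}
```

Main-line code then only ever reads the counter through `ticks_snapshot()`. Restoring the saved state, rather than blindly re-enabling, keeps the helper safe to call from code that already has interrupts off.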

 

So, like always, one of the first steps is determining: "Is this a HW problem or a SW problem?".

Bench testing with clean signals will go a long way towards helping make that determination.

SW bugs should still show up, the HW bugs will disappear.

 

Good luck with your project!

 

JC

 

Edit: Typo

 

Last Edited: Tue. Dec 27, 2016 - 07:25 PM

theusch wrote:

keith v wrote:
No brown out detection.

And an automotive app?  You are playing with fire.  Cruisin' for a bruisin'.  Or whatever adage you care to use.

 

ok ok Sheesh I'll put one in.

 

Seriously, I am planning on doing one along with a watchdog, just don't have one YET

Keith Vasilakes

Firmware engineer

Minnesota


All discussion immaterial at this point IMO -- confirm flash contents integrity first.  A simple manual verify pass (at least with my toolset).

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Lots of great input.

Thanks!

 

The power supply is pretty well filtered and protected. It was designed by a pretty decent EE.

But what do you mean "WELL FILTERED against Load Dump type spec's, " ??

 

I'll check the VCC lines as described.

 

Sadly all my LEDs are run by a SPI LED controller ( I know I know )

I do have a spare pin I can hack one onto however, or I'll just watch it with my logic analyzer

 

I think there's a misunderstanding.

The device works just fine powered on for as long as I care to run it.

I don't think it's a stack overflow, buffer overrun or uninitialized variable (but of course it's a possibility).

 

Using a lab bench supply, not even in a vehicle.

The problem is restarting after power off, with no power applied.

After 20 minutes of power off, THAT'S when the problem is.

And yes I mean physically removing the +12v.

 

Short power cycles, < 1 minute, are just fine.

 

Keith Vasilakes

Firmware engineer

Minnesota

Last Edited: Tue. Dec 27, 2016 - 08:06 PM

theusch wrote:
All discussion immaterial at this point IMO -- confirm flash contents integrity first.

...because you said ...

keith v wrote:
it sure looks like the flash is getting corrupted.

Let's prove or disprove that.

 

And:

theusch wrote:
[I'll wager a virtual cold one that the cause will turn out to be an uninitialized variable...

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Got it.

 

I thought the scenario was powering up the uC, then waiting 20 min for the vehicle power to turn on, initiating the uC coming out of sleep mode and starting some code.

 

Enable the BOD and see if the problem persists.

 

Also, do you have an RC on the startup pin?

 

Obviously, in addition to code issues, one of the hardware differences between a power up from a brief power down interval, and from a 20 minute power down interval, is capacitors that are partially charged up when you do the brief power down interval restart.

 

A power up after a lengthy time off has to supply the in rush current for all of the caps in the power supply circuitry, as well as the rest of the circuitry.

Is your power supply up to the task? (Probably so, if it is a real bench supply...)

 

The AVRs have to have a monotonically rising power supply voltage on startup to avoid a possible internal lock up.

One of the secondary roles of the RC on the Reset\ pin is to delay part of the internal reset and HW initialization, slightly, until after the initial power on transition.

 

If you have an RC on the Reset\ pin, then the power on cycling is not likely to be the issue.

 

I still think you need to hack in the LED and flash it on power up / start up to validate that the oscillator is starting correctly.

Verify your PCB layout and that both of the Xtal caps are actually connected to ground.

In theory, (hard core theory for this one), the two Xtal caps should not be identical.

In practice, they never are, but their (even slight) asymmetry is part of the ext oscillator's startup and oscillating, and not stalling on power up.

 

Since it sounds like you have a bunch of PCB's, you could (temporarily) fuse a couple to run on the internal RC Osc, to see if eliminating the external Xtal and oscillator changed anything.

 

JC

Last Edited: Wed. Dec 28, 2016 - 12:26 AM

I've had this problem in the past.  I had to reflash to get it to run.  I think I used a bootloader, but I'm not sure.  I'm pretty sure it was not flash corruption and it was not BOD.  I don't remember what it was.

 

Does the problem happen when the Atmel ICE is not connected?

 

Last Edited: Wed. Dec 28, 2016 - 01:35 AM

It's probably been mentioned by others, but the Atmel ICE should be able to verify the flash.


keith v wrote:
I do read EEPROM on startup, but only write it in response to user action.

 

theusch wrote:
[I'll wager a virtual cold one that the cause will turn out to be an uninitialized variable...any warnings about using a value before set?]

 

Hmmm,

There was a thread a while back (or two threads, for that matter) about the first couple of bytes of EEPROM becoming corrupted because of a slowly decaying power supply and no brown-out detection.  The cure was to enable brown-out detection AND move the EEPROM data a couple of bytes in, as it was always the first byte or two in the EEPROM that would get messed up.

 

If the info you read from the EEPROM initializes a variable then you might want to get that BOD set up, AND move the data a couple of bytes in.
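
As a rough sketch of that idea (offset the data and do not trust it blindly), something like the following could work. The offset, the magic byte, the record layout and the default limits are all hypothetical. The EEPROM is represented by a plain array so the check can run on a PC; on the target the bytes would come from avr-libc's `eeprom_read_block()` starting at the same offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EE_DATA_OFFSET 4u     /* hypothetical: leave the first EEPROM bytes unused */
#define EE_MAGIC       0xA5u  /* hypothetical marker for "record was written" */

typedef struct {
    uint8_t magic;
    uint8_t limit_low;    /* hypothetical user-set limits */
    uint8_t limit_high;
    uint8_t check;        /* inverted sum of the three bytes above */
} settings_t;

static uint8_t settings_check(const settings_t *s)
{
    return (uint8_t)~(uint8_t)(s->magic + s->limit_low + s->limit_high);
}

/* Copy a record out of a raw EEPROM image and validate it.
 * Returns 1 if the stored record is valid, 0 if defaults were substituted. */
static int settings_load(const uint8_t *image, size_t image_len, settings_t *out)
{
    assert(image != NULL && out != NULL);
    if (image_len >= EE_DATA_OFFSET + sizeof *out) {
        memcpy(out, image + EE_DATA_OFFSET, sizeof *out);
        if (out->magic == EE_MAGIC && out->check == settings_check(out)) {
            return 1;
        }
    }
    /* Erased or corrupted record: fall back to known-safe defaults. */
    out->magic = EE_MAGIC;
    out->limit_low = 0;
    out->limit_high = 255;
    out->check = settings_check(out);
    return 0;
}
```

With this, an erased (all 0xFF) or partly trashed record falls back to defaults instead of initializing a variable with garbage.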

 

Jim

I would rather attempt something great and fail, than attempt nothing and succeed - Fortune Cookie

 

"The critical shortage here is not stuff, but time." - Johan Ekdahl

 

"Step N is required before you can do step N+1!" - ka7ehk

 

"If you want a career with a known path - become an undertaker. Dead people don't sue!" - Kartman

"Why is there a "Highway to Hell" and only a "Stairway to Heaven"? A prediction of the expected traffic load?"  - Lee "theusch"

 

Speak sweetly. It makes your words easier to digest when at a later date you have to eat them ;-)  - Source Unknown


Atmel Studio6.2/AS7, DipTrace, Quartus, MPLAB, RSLogix user


Well, I found a couple of things. In no particular order:

* The crystal is 16 MHz, not sure where I got 8...

  ** But the xtal has 10 pF caps on it.....

* The reset line has 100K ohms to VCC and 0.01 uF to gnd, which seems a little high in resistance from what I've seen

* All VCC lines are tied together with a single 0.1 uF to gnd

* AVCC has 100 ohms to VCC and 0.1 uF to gnd

 

* Full swing osc: maybe slightly better

* BOD set to 2.7 V: may be better

   ** VCC is 5 V, should I use a BOD of 4.3 V?

 

* Attaching the ICE starts the program (sure makes me think reset or oscillator)

* FW verifies OK

* EEPROM corruption should have no effect: the values are not used in the state it starts up in, and they are just limit values, so all values are valid.

* Didn't get an LED on yet
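
For reference, the numbers in the list above can be sanity-checked with two textbook formulas: the RC time constant of the reset network, and the series combination of the two crystal caps plus stray capacitance. The sketch below is just that arithmetic. The stray capacitance is an assumption (a few pF is a common rule of thumb), and the crystal's rated load capacitance has to come from its own datasheet, which is not given here.

```c
#include <assert.h>

/* Time constant of the reset RC network, in seconds. */
static double rc_time_constant_s(double r_ohm, double c_farad)
{
    assert(r_ohm > 0.0 && c_farad > 0.0);
    return r_ohm * c_farad;
}

/* Load capacitance seen by the crystal, in pF: the two caps act in series,
 * and stray capacitance from pins and traces adds on top. */
static double crystal_load_pf(double c1_pf, double c2_pf, double stray_pf)
{
    assert(c1_pf > 0.0 && c2_pf > 0.0 && stray_pf >= 0.0);
    return (c1_pf * c2_pf) / (c1_pf + c2_pf) + stray_pf;
}
```

`rc_time_constant_s(100e3, 0.01e-6)` works out to about 1 ms for the reset line, and `crystal_load_pf(10.0, 10.0, 3.0)` gives about 8 pF with an assumed 3 pF of stray. If the crystal is specified for a noticeably larger load, the 10 pF caps are on the small side.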

 

Keith Vasilakes

Firmware engineer

Minnesota


*fullswing osc maybe slightly better

At 8 MHz and above, full swing oscillator is mandatory for the M128; some people swear that full swing cures a lot of unexplained phenomena regardless of the crystal frequency.

 

The truth IS out there.......

 

For what it's worth I use 15 pF for anything from 8-16 MHz crystals.

** VCC is 5 V, should I use a BOD of 4.3 V?

That's what I use.

John Samperi

Ampertronics Pty. Ltd.

www.ampertronics.com.au

* Electronic Design * Custom Products * Contract Assembly


A separate by-pass cap for each set of power pins would be a "better" approach, and the one generally recommended.

The caps should be as close to the micro's pins as you can get them, and that is hard to do when you share a cap.

 

The Atmel Hardware App Note, AVR042, recommends an LC filter feeding AVcc if one is using the ADC or analog comparator.

On many ( all ?) AVR's AVcc powers Port A as well, the details are in the fine print.

So, how much current are you drawing through Port A pins, and hence what is your Vdrop across the RC filter's resistor?

AVcc has to be close to Vcc for proper operation, details in the data sheet which I don't have handy at the moment.
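
The drop across the filter resistor is just Ohm's law, so it is easy to put numbers on. A small sketch, with the caveat that the allowed AVcc-to-Vcc difference is passed in as a parameter rather than hard-coded: the figure usually quoted for the ATmega128 is 0.3 V, but treat that as an assumption and confirm it in the data sheet.

```c
#include <assert.h>

/* Voltage dropped across the series resistor feeding AVcc, in volts. */
static double filter_drop_v(double r_ohm, double current_a)
{
    assert(r_ohm >= 0.0 && current_a >= 0.0);
    return r_ohm * current_a;
}

/* Returns 1 if the drop keeps AVcc within the allowed distance of Vcc. */
static int avcc_close_enough(double drop_v, double max_diff_v)
{
    assert(max_diff_v > 0.0);
    return drop_v <= max_diff_v;
}
```

With the 100 ohm resistor on this board, 1 mA through AVcc drops about 0.1 V, while 20 mA of pin load would drop about 2 V, far more than any reasonable limit.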

 

For an easy hardware test you could tack in a couple 0.1 uF caps across the power pin pairs, short out (or remove and jumper) the R in the AVcc RC filter, and add your LED so you can easily see whenever the uC resets.

 

JC


So is it the case that the mega runs when the programmer is attached and does not run otherwise?   This is a fairly common occurrence around here.   Two causes I've seen are a bad ground that the programmer attachment fixes, and a resistor on the reset pin. 


More testing, it seems way better if not perfect. Starts up every time, cold or warm.

 

I do think there are some improvements I'll make on the next spin, multiple VCC caps, Osc cap values, reset RC values, LC filter on ADC VCC, but I think they are minor at this point.

The major solution was the full swing osc and the BOD at 4.3v.

 

Thanks for your help

Keith

Keith Vasilakes

Firmware engineer

Minnesota