Firmware Seems to Break on ATMega328P

#1

Hello,

 

I am having a weird issue with a board based on the ATMega328P: the firmware seems to "break" (it stops responding to serial commands) after a little while. That period can include being turned on and off again several times, with a few commands sent/received each time. Once it breaks, it can only be fixed by re-uploading the code to the micro (power cycling does not fix it). I am not even sure where to begin with sharing the code, as it is a large-ish project. Basically, it is a board using an FTDI chip to send serial commands back and forth to the micro, which then reads an ADC (MCP3204) and controls a DC motor (using a SN754410).

 

I was using Optiboot at first so that I could upload code over the FTDI, but I tried uploading the code directly to the micro with no bootloader, and the problem persisted. My fuse settings are L:0xFF, H:0xDE, E:0x05.

 

Any ideas? If there is any information I left out that could be useful please let me know. Thanks for your help!

 

Sincerely,

Billy K

Last Edited: Fri. Nov 10, 2017 - 10:05 PM
#2

How about posting your code?

Jim

If you want a career with a known path - become an undertaker. Dead people don't sue! - Kartman

Atmel Studio6.2/AS7, DipTrace, Quartus, MPLAB user

#3

It's a larger project, so I zipped it and posted it here:

https://drive.google.com/file/d/...

#4

BillyK wrote:
can only be fixed by re-uploading the code to the micro (power cycling does not fix it)

Are you saving stuff to EEPROM or Flash?

 

If you do, and managed to save some "bad" value which "crashed" the code - then that could persist across a power cycle ...

 

Once the code has got into its "misbehaving" state, can you read out the flash contents ... ?

 

#5

I am saving stuff to EEPROM but not flash. However, even if a bad value is being written it shouldn't cause the firmware to stop completely, since the value is just being saved as a calibration value (such as a delay time). If the value is bad it won't be calibrated properly but it should still respond to commands.

 

I haven't tried reading out the flash contents, but I'll give it a shot. I never attempt to modify the flash anywhere in my code. What could be causing it to become corrupt?

#6

Transients from the motor might be getting back into the micro. Do you have the brownout detect enabled (BOD)?

#7

I do, but this problem occurs even when the motor is not connected.

#8

BillyK wrote:
stops responding to serial commands

Since any computer is always doing something [1] it would be very interesting to try to find what it is doing when it is misbehaving.

 

Do you have any means of "instrumenting" your code and letting it output rudimentary trace "info" to e.g. one (or preferably several) LEDs?

 

Keeping an open mind (i.e. not pre-convincing yourself about the nature of the bug), place trace-points at different spots to try to deduce where execution is when the system misbehaves.
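
A minimal sketch of the trace-point idea, assuming three spare output pins (for example the one on-board LED plus two header pins) that together show a 3-bit code. The names `trace_port`, `TRACE_MASK` and `trace_point()` are hypothetical and do not come from the posted project; on the real board `trace_port` would be a port register such as `PORTB`, and the pins would need to be configured as outputs first.

```c
#include <stdint.h>

/* Stand-in for a real port register (e.g. PORTB on the target).
 * Assumption: the three trace LEDs sit on bits 0..2 of that port. */
static volatile uint8_t trace_port;

#define TRACE_MASK 0x07u

/* Show a 3-bit trace code on the LEDs without disturbing the other pins. */
void trace_point(uint8_t id)
{
    trace_port = (uint8_t)((trace_port & ~TRACE_MASK) | (id & TRACE_MASK));
}
```

Calling `trace_point(1)` at the top of the main loop, `trace_point(2)` before the command parser, and so on, means that whatever code is frozen on the LEDs when the board stops responding points at the last trace-point that was reached.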

 

Unless the system is of a highly real-time nature, an on-chip debugger might serve you very well. Free-run a debug session until the problem occurs, then break and start single-stepping etc. to see what goes on. Note that breaking or single-stepping in code that is timing- or "data-flow"-critical might provoke problems that on the surface could look like part of the problem, while in fact they are an effect of e.g. data overruns because of the slow/staggered execution and response when stepping/breaking.

 

So, you got an ICE?

 


[1] Not entirely true for all computers, I know...

"He used to carry his guitar in a gunny sack, or sit beneath the tree by the railroad track. Oh the engineers would see him sitting in the shade, Strumming with the rhythm that the drivers made. People passing by, they would stop and say, "Oh, my, what that little country boy could play!" [Chuck Berry]

 

"Some questions have no answers."[C Baird] "There comes a point where the spoon-feeding has to stop and the independent thinking has to start." [C Lawson] "There are always ways to disagree, without being disagreeable."[E Weddington] "Words represent concepts. Use the wrong words, communicate the wrong concept." [J Morin] "Persistence only goes so far if you set yourself up for failure." [Kartman]

#9

I do have one LED on the board, as well as a couple of GPIOs brought out to headers that I could stick LEDs into. I do not have an ICE :( The unfortunate part is that I have no way to actively trigger the fault - sometimes it happens quickly (within a few minutes), sometimes it happens after an hour, sometimes it doesn't happen until the board is plugged into someone else's computer :/

#10

One of the common sources of lockup after some operating time is stack overflow. That, however, SHOULD be corrected by a reset. 

 

It might be waiting for something to happen, and that something never occurs (maybe because something got turned off, or some variable got changed to an unexpected value). I'm thinking of something like waiting for an ADC conversion to complete when the ADC has been disabled, so that the start-conversion command is never acted upon.
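
One way to guard against that kind of endless wait is to bound every polling loop. The following is only a sketch under that assumption: `wait_ready()`, `ready_fn` and the two stand-in predicates are hypothetical names used for illustration, and on the target the predicate would test a real status flag.

```c
#include <stdbool.h>
#include <stdint.h>

typedef bool (*ready_fn)(void);

/* Poll is_ready() at most max_polls times.
 * Returns true if it became ready, false if the wait timed out. */
bool wait_ready(ready_fn is_ready, uint32_t max_polls)
{
    while (max_polls-- > 0u) {
        if (is_ready()) {
            return true;
        }
    }
    return false;
}

/* Stand-in predicates for trying the sketch on a PC. */
bool never_ready(void) { return false; }

static uint8_t polls_until_ready = 3;
bool ready_on_third_poll(void) { return --polls_until_ready == 0u; }
```

When `wait_ready()` returns false the caller can report an error code and carry on, so a peripheral that never answers shows up as a message instead of a silent hang.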

 

Can you put it on a debugger and let it run until it "breaks", then do a manual break, determine where it is when you halted it, then see where it goes as you single step from there?

 

Jim

Jim Wagner Oregon Research Electronics, Consulting Div. Tangent, OR, USA http://www.orelectronics.net

#11

Unfortunately, I do not have an ICE - the only thing I have is a USBTinyISP. The infinite waiting idea does make sense, but I can't imagine what it would be waiting for - the ADC is an external chip and it never explicitly "waits" for a conversion. It just clocks in the required number of cycles and then reads the MISO pin at the specified times - even if the ADC was doing nothing it would just read garbage, which even then shouldn't cause a lockup.

 

Are there any decent simulators that include EEPROM that can also emulate serial commands being sent? Perhaps that is something I could pursue.

#12

BillyK wrote:
shouldn't cause the firmware to stop completely

But, as Johan suggested, how do you know that it has stopped "completely"?

 

How would you distinguish "stopped completely" from just "stopped accepting serial commands" ... ?

 

BillyK wrote:
If the value is bad it won't be calibrated properly but it should still respond to commands.

Should it?

 

If, say, the timing messes up the baud rate - that would stop it responding to serial commands ...

 

I don't know what you're "calibrating" - so don't know how messing-up that calibration might affect things.

 

But grossly out-of-range values might cause strange under- and/or overflow effects ...

 

Again, as Johan suggested, "instrumenting" the code could help ...
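
A minimal sketch of range-checking a stored calibration value before using it, assuming the calibration is a 16-bit delay in milliseconds. The limits, the default and the name `sanitize_delay_ms()` are all made up for illustration and are not taken from the posted project. On the target the `stored` argument would come from something like avr-libc's `eeprom_read_word()`.

```c
#include <stdint.h>

/* Hypothetical limits for a motor-run delay in milliseconds. */
#define DELAY_MS_MIN       10u
#define DELAY_MS_MAX     5000u
#define DELAY_MS_DEFAULT  250u

/* Return a usable delay. Erased AVR EEPROM reads back as 0xFF bytes,
 * so 0xFFFF is treated as "never calibrated". Anything outside the
 * allowed range falls back to the default as well. */
uint16_t sanitize_delay_ms(uint16_t stored)
{
    if (stored == 0xFFFFu) {
        return DELAY_MS_DEFAULT;
    }
    if (stored < DELAY_MS_MIN || stored > DELAY_MS_MAX) {
        return DELAY_MS_DEFAULT;
    }
    return stored;
}
```

With a check like this, a corrupted EEPROM cell can still give a wrong calibration, but it cannot turn into a delay of a minute or more that looks like a dead board.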

#13

BillyK wrote:
I do not have an ICE

Sounds like time to get one, then!

#14

The timing doesn't affect the baud rate at all - it determines how long the device should move the motor for. Either way, it seems like some more intense debugging using other hardware is necessary.

#15

Try adding a heartbeat count output, that just puts out 1 2 3 4 5... so you know something reliable is still happening (rather than say a stack hangup into neverland). 
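
A sketch of such a heartbeat, assuming it is driven from the main loop and that the caller sends the formatted line out over the UART. `heartbeat_t` and `heartbeat_tick()` are hypothetical names, and the "HB n" line format is just an example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t period;  /* main-loop passes between beats */
    uint32_t passes;  /* passes counted since the last beat */
    uint16_t count;   /* beats emitted so far */
} heartbeat_t;

/* Call once per main-loop pass. When a beat is due, formats "HB <n>\r\n"
 * into buf and returns its length; otherwise returns 0. */
int heartbeat_tick(heartbeat_t *hb, char *buf, size_t len)
{
    if (++hb->passes < hb->period) {
        return 0;
    }
    hb->passes = 0;
    hb->count++;
    return snprintf(buf, len, "HB %u\r\n", (unsigned)hb->count);
}
```

If the count keeps climbing while commands go unanswered, the main loop is alive and the fault is in the command path. If the count stops, execution is stuck somewhere else.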

 

Stop executing different things: quit reading the ADC (or just simply return some fixed value), and quit running motors (even if given a command to do so). Stop turning on that relay, skip waiting for that temperature, etc. Basically, go into different "rooms" and unplug more and more items until the system no longer crashes. Using divide and conquer you can often (not always) quickly zoom in on the region that is most likely the problem area. Hopefully the system quits halting before you disable all functionality. You may wind up saying "aha, it quits hanging whenever we stop querying bearing #3's temperature", then discover an associated deadlock triggered under a rare combination of conditions.
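
One possible way to organise that unplugging is a feature mask, so each subsystem can be switched off without deleting code. This is a sketch only: the feature names, `ADC_STUB_VALUE` and `read_adc_guarded()` are hypothetical, and `fake_adc_read()` merely stands in for the real ADC driver.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    FEAT_ADC          = 1u << 0,
    FEAT_MOTOR        = 1u << 1,
    FEAT_EEPROM_WRITE = 1u << 2
};

/* Everything starts enabled; clear bits one at a time while hunting. */
static uint8_t features_enabled = FEAT_ADC | FEAT_MOTOR | FEAT_EEPROM_WRITE;

void feature_disable(uint8_t mask) { features_enabled &= (uint8_t)~mask; }
bool feature_on(uint8_t mask)      { return (features_enabled & mask) != 0u; }

#define ADC_STUB_VALUE 2048u  /* fixed reading used while the ADC is "unplugged" */

/* Read the ADC through real_read, or return the fixed value if disabled. */
uint16_t read_adc_guarded(uint16_t (*real_read)(void))
{
    return feature_on(FEAT_ADC) ? real_read() : ADC_STUB_VALUE;
}

/* Stand-in for the real ADC driver, for trying the sketch on a PC. */
uint16_t fake_adc_read(void) { return 1234u; }
```

If the hang disappears with one bit cleared, that subsystem (or its interaction with the rest) becomes the next place to look.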

When in the dark remember-the future looks brighter than ever.

#16

If it accepts serial commands presumably the same UART can be used for output? If so I'd follow Johan's advice by instrumenting the code with liberal printf() as it enters/exits functions and conditional sections. Try to isolate where it is executing when communication has stopped.

#17

If you spit things out on the UART, make sure it is nothing too repetitive. Many things occur much too fast for the UART or PC terminal to keep up, which may itself cause a crash or other side effects. Things that happen a few times a second or minute are good candidates, but a state value changing 1000 times a second is not. For something like that, you can output only when there is a change, but even that might be too often.
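
A sketch of "output only on change, and not too often", assuming a millisecond tick is available from a timer. `change_reporter_t` and `should_report()` are hypothetical names, and the minimum interval is whatever the UART and terminal can comfortably absorb.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t  last;             /* last value that was reported */
    bool     primed;           /* false until the first report */
    uint32_t last_report_ms;   /* time of the last report */
    uint32_t min_interval_ms;  /* minimum spacing between reports */
} change_reporter_t;

/* Returns true when value should be printed: it differs from the last
 * reported value and the minimum interval has elapsed. A change that is
 * suppressed by the interval is reported later if it is still present. */
bool should_report(change_reporter_t *r, uint8_t value, uint32_t now_ms)
{
    if (r->primed) {
        if (value == r->last) {
            return false;
        }
        if ((now_ms - r->last_report_ms) < r->min_interval_ms) {
            return false;
        }
    }
    r->last = value;
    r->primed = true;
    r->last_report_ms = now_ms;
    return true;
}
```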


#18

When you write to UDR, wait for UDR to become empty before writing the next character. You COULD have serial problems, especially at the other end, if you change UDR before it is empty.

 

Jim


#19

You COULD have serial problems, especially at the other end, if you change the UDR before it is empty.

Worst case is a dropped character.  UDR is latched to the shift register (and vice-versa for RX), so you won't get corrupted characters, but if you write to UDR while UDRE is clear, the write will be ignored.
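
For reference, a sketch of a polled transmit routine that waits for UDRE before each write, using the ATmega328P register names `UCSR0A`, `UDRE0` and `UDR0`. The `#else` branch is an assumption added only so the sketch also compiles on a PC: it defines stand-in variables that always look "empty" and does no real I/O. `uart_putc()` and `uart_puts()` are illustrative names.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#else
/* Host stand-ins (assumption, not real hardware): the data register
 * always appears empty, and UDR0 just remembers the last byte written. */
#define UDRE0 5
static volatile uint8_t UCSR0A = (uint8_t)(1u << UDRE0);
static volatile uint8_t UDR0;
#endif

/* Send one character, waiting until the data register is empty first
 * so that the write is not ignored. */
void uart_putc(char c)
{
    while (!(UCSR0A & (1u << UDRE0))) {
        /* busy-wait for the transmit buffer */
    }
    UDR0 = (uint8_t)c;
}

/* Send a NUL-terminated string one character at a time. */
void uart_puts(const char *s)
{
    while (*s != '\0') {
        uart_putc(*s++);
    }
}
```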

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"Read a lot.  Write a lot."

"We see a lot of arses on handlebars around here." - [J Ekdahl]