Strange Device Reset or Power Fluctuation

#1

Hello Fellow Freaks!

 

So I've got my small handheld game console made (home brew double sided PCB). Now everything seems to work properly except for one major issue.

The device runs for only a short time before it appears to reset continuously. If I switch it off and immediately back on, the resets start again within 15-60 seconds. If I leave the device off for about 5 minutes and then turn it back on, it runs for 3 minutes or more before this starts to happen.

The CPU is an ATmega644P (TQFP) clocked at 20 MHz, with a 5 volt regulator. It is currently powered by an 11.1 volt lithium polymer battery.

I have checked the fuses to ensure that the watchdog and brown-out detection are disabled, so as far as I can tell it shouldn't be caused by either of these.

 

If I probe the regulator output voltage, I get a steady 5.0 volts when the device is working properly. Once it begins resetting (I am not sure "reset" is the right word, as I don't know that is what is happening), the output dips below 5.0 volts (sometimes as low as 4.4 volts) and then returns to 5.0 volts. If I probe the supply voltage, before and during the erratic behavior, I get a steady 12.57 volts or so (after a full charge), and it never seems to dip like the regulated voltage. The regulator has a maximum input voltage of 15 volts and a 1.2 volt dropout.

Here are a couple of ideas I have. What are your thoughts?

- I do not have any bypass capacitors on the GLCD power pins (pins #2 and #19 of header SV2). Could this cause such a voltage drop and trigger a power-on reset?

- Is the 10 uF capacitor (C8) on the supply voltage an acceptable value? If not, could it be causing this behavior?

 

I am really not sure how to continue debugging this problem. I believe it's most likely an electrical issue. I have attached the schematic, top and bottom board layers, and an image of the board when it is working properly.

 

Any ideas of what I should try here?


This topic has a solution.

My digital portfolio: www.jamisonjerving.com

My game company: www.polygonbyte.com

Last Edited: Thu. Dec 24, 2015 - 06:19 PM

#2

Just a stab in the dark, are you using the full swing oscillator mode?  If not, what happens when you do?

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

#3

Yes. My low fuse is currently 0xF7. CKSEL3..1 = 011 (Full Swing Crystal Oscillator), CKSEL0 = 1 (Crystal Oscillator, slowly rising power). CKDIV8 and CKOUT = 1 (unprogrammed). I have also tried with CKSEL0 = 0 (fast rising power) but neither had any effect.
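
For anyone checking the arithmetic, here is a small host-side sketch that splits a low fuse byte into its fields. It assumes the ATmega644P low-fuse layout (CKDIV8 in bit 7, CKOUT in bit 6, SUT1..0 in bits 5..4, CKSEL3..0 in bits 3..0). The struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper type: one field per low-fuse bit group.
 * A fuse bit that reads 1 is "unprogrammed". */
typedef struct {
    uint8_t ckdiv8; /* bit 7 */
    uint8_t ckout;  /* bit 6 */
    uint8_t sut;    /* bits 5..4 (SUT1..0) */
    uint8_t cksel;  /* bits 3..0 (CKSEL3..0) */
} low_fuse_fields;

/* Split a low fuse byte into its fields (assumed ATmega644P layout). */
low_fuse_fields decode_low_fuse(uint8_t lfuse)
{
    low_fuse_fields f;
    f.ckdiv8 = (uint8_t)((lfuse >> 7) & 0x01u);
    f.ckout  = (uint8_t)((lfuse >> 6) & 0x01u);
    f.sut    = (uint8_t)((lfuse >> 4) & 0x03u);
    f.cksel  = (uint8_t)(lfuse & 0x0Fu);
    return f;
}
```

Calling decode_low_fuse(0xF7) gives CKDIV8 = 1, CKOUT = 1, SUT = 3 (binary 11) and CKSEL = 7 (binary 0111), which matches CKSEL3..1 = 011 and CKSEL0 = 1 as described above.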

 

I should also mention that JTAG never seems to have an issue reading voltage, device signature, or programming, even when the device acts erratic.


#4

Jamison wrote:
The device runs for only a short time before it appears to reset continuously. If I switch it off and immediately back on, the resets start again within 15-60 seconds. If I leave the device off for about 5 minutes and then turn it back on, it runs for 3 minutes or more before this starts to happen.

This may actually suggest a software issue.  If the behaviour is different for short power-cycles than it is for long-power cycles (i.e. power left off for an extended period), then it >>may<< point the finger at an uninitialised automatic variable somewhere, or another C 'gotcha' like a buffer overrun.

 

SRAM is not initialised by hardware, and its contents remain unchanged across a reset.  The C runtime provided by avr-gcc only sets up static and global variables; any other variable has a known value after a reset only if the code specifically initialises it.  When Vcc is allowed to drop below about 0.2V, however, the contents of SRAM begin to get corrupted.  An AVR which has been fully powered off (Vcc drops to zero) will not, as you might think, power up with SRAM all zero.  SRAM contents after a full power off will effectively be random.  I say effectively, because it isn't random every time.  It looks random to you and me, but it's actually the result of process variations and differs from device to device.  Bottom line: the different behaviour for a 15-second power down (where the power supply caps may not discharge fully, or at least not below 0.2V or so) and a 5-minute power down (where the caps are likely fully discharged) may be a symptom of this kind of coding error.

 

Of course, it may indeed be a hardware problem as you have suggested.  There isn't much to go on.

 

I would suggest trying to rule out at least uninitialised variables by combing through your code and making sure that every single variable, array, and struct is in fact initialised, whether or not you think it's necessary, and see what that does to the behaviour.  Hopefully it won't be time wasted.
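
As a minimal sketch of what "initialise everything" can look like in C: static and global objects get a known value from the startup code, but automatic (local) variables do not, so each one is given an explicit starting value. The names below (frame_buffer, rect_t, clear_frame_buffer, count_lit_bytes, empty_rect) and the 1024-byte size are assumptions made up for illustration, not taken from the actual project.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_BUFFER_SIZE 1024u  /* assumed size, for illustration only */

/* Static storage: cleared by the C startup code before main() runs. */
static uint8_t frame_buffer[FRAME_BUFFER_SIZE];

/* Hypothetical rectangle type used by drawing code. */
typedef struct {
    uint8_t x, y, width, height;
} rect_t;

/* Explicitly reset the buffer instead of relying on earlier contents. */
void clear_frame_buffer(void)
{
    memset(frame_buffer, 0, sizeof frame_buffer);
}

/* Count non-zero bytes in the buffer. */
uint16_t count_lit_bytes(void)
{
    uint16_t count = 0;  /* without "= 0" the starting value is indeterminate */
    for (uint16_t i = 0; i < FRAME_BUFFER_SIZE; i++) {
        if (frame_buffer[i] != 0) {
            count++;
        }
    }
    return count;
}

/* Return a rectangle with every member set explicitly. */
rect_t empty_rect(void)
{
    rect_t r = {0, 0, 0, 0};
    return r;
}
```

The point is only that every local gets a value at its declaration, so a leftover stack byte from before the reset cannot change the program's behaviour.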

#5
This reply has been marked as the solution.

1. The REG117 data sheet says: "All models require an output capacitor for proper operation, and for improving high-frequency load regulation; a 10µF tantalum capacitor is recommended. Aluminum electrolytic types of 50µF or greater can also be used."

You have no output capacitor, only a few 100nF ceramics.

 

2. The regulator tab is isolated therefore you have no heatsinking. I suspect your regulator is getting hot and going into thermal shutdown.
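
To put a rough number on point 2: a linear regulator turns (Vin - Vout) x Iload into heat. The sketch below is only an estimate. The 12.57 V and 5.0 V figures come from the first post, but the 100 mA load current and the 100 °C/W junction-to-ambient thermal resistance used in the note afterwards are placeholder assumptions, not measured or datasheet values. The function names are made up for illustration.

```c
/* Heat dissipated in a linear regulator, in watts. */
double regulator_dissipation_w(double vin_v, double vout_v, double load_a)
{
    return (vin_v - vout_v) * load_a;
}

/* Estimated junction temperature rise above ambient, in degrees C.
 * theta_ja_c_per_w is the junction-to-ambient thermal resistance
 * (an assumed value unless taken from the regulator's datasheet). */
double junction_rise_c(double power_w, double theta_ja_c_per_w)
{
    return power_w * theta_ja_c_per_w;
}
```

With those assumed numbers, regulator_dissipation_w(12.57, 5.0, 0.1) is about 0.76 W, and junction_rise_c at 100 °C/W is about 76 °C above ambient. The real figures depend on the actual load current and on how much copper the tab is soldered to.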

#6

Nothing to do with your issue, but I would also avoid tying AREF to VCC.  An error in code can destroy the ADC.  AVCC is already available as an internal voltage reference, so there is no value in applying it to the external reference at AREF.  Just tie AREF to GND via a 100 nF cap.

#7

joeymorin wrote:
This may actually suggest a software issue.

 

So, have you [properly] trapped/logged/cleared MCUSR and analyzed the results?  (IME I'd also suggest logging "uptime" in this case to the list of logged values to see how fast things are cascading.  In this case a "noinit" variable would be simple and effective?  )

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

#8

joeymorin wrote:
I would suggest trying to rule out at least uninitialised variables by combing through your code and making sure that every single variable, array, and struct is in fact initialised, whether or not you think it's necessary, and see what that does to the behaviour.  Hopefully it won't be time wasted.

My first thought was a software issue as well. With my minimal test software, there were only a few variables to check: a 1024-byte array for the frame buffer and a few local variables for the GLCD draw functions.

 

joeymorin wrote:
Nothing to do with your issue, but I would also avoid tying AREF to VCC.  An error in code can destroy the ADC.  AVCC is already available as an internal voltage reference, so there is no value in applying it to the external reference at AREF.  Just tie AREF to GND via a 100 nF cap.

Thank you for the additional circuit design tip. I will be sure to remember this when designing future circuits.

 

theusch wrote:
So, have you [properly] trapped/logged/cleared MCUSR and analyzed the results?  (IME I'd also suggest logging "uptime" in this case to the list of logged values to see how fast things are cascading.  In this case a "noinit" variable would be simple and effective?  )

I loaded a simple program today that copied the MCUSR register and cleared it immediately in main. I then drew these values on the GLCD. The GLCD was able to draw the data to the screen fast enough for me to read the values. Every time it reset, it showed it as a Power-On Reset (all other bits were always zero). I believe this goes along with exactly what fingar says above (see my comments below for the solution).
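
For reference, a minimal sketch of decoding a saved copy of MCUSR. It assumes the ATmega644P bit layout (PORF bit 0, EXTRF bit 1, BORF bit 2, WDRF bit 3, JTRF bit 4). The masks are defined locally with an RST_ prefix so the sketch also compiles off-target, and reset_cause_name is a made-up helper. This is an illustration only, not the program that was actually loaded.

```c
#include <stdint.h>

/* Assumed MCUSR bit masks (ATmega644P layout), defined locally. */
#define RST_PORF  (1u << 0)  /* power-on reset */
#define RST_EXTRF (1u << 1)  /* external reset pin */
#define RST_BORF  (1u << 2)  /* brown-out reset */
#define RST_WDRF  (1u << 3)  /* watchdog reset */
#define RST_JTRF  (1u << 4)  /* JTAG reset */

/* Return a short label for the reset cause held in a saved MCUSR value.
 * Several flags can be set at once if the register was never cleared,
 * so the order of the checks below is a design choice, not a rule. */
const char *reset_cause_name(uint8_t saved_mcusr)
{
    if (saved_mcusr & RST_WDRF)  return "watchdog";
    if (saved_mcusr & RST_BORF)  return "brown-out";
    if (saved_mcusr & RST_EXTRF) return "external";
    if (saved_mcusr & RST_JTRF)  return "JTAG";
    if (saved_mcusr & RST_PORF)  return "power-on";
    return "unknown";
}
```

A saved value of 0x01 (only PORF set, as observed here) maps to "power-on".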

 

fingar wrote:

1. The REG117 data sheet says   All models require an output capacitor for proper operation, and for improving high-frequency load regulation; a 10µF tantalum capacitor is recommended. Aluminum electrolytic types of 50µF or greater can also be used.

You have no output capacitor only a few 100nF ceramics.

 

2. The regulator tab is isolated therefore you have no heatsinking. I suspect your regulator is getting hot and going into thermal shutdown.

1. The specific device I am using is the AZ1117CH. Its datasheet does not specifically call for a tantalum capacitor and actually mentions that the device is compatible with low-ESR ceramic capacitors. According to the datasheet, the 10uF cap I am using on the input voltage is within spec, but I do not have the 22uF cap on the output voltage. That's what I get for using the microcontroller's datasheet for capacitor selection and brushing over the datasheet for the power regulator. Lesson learned there.

2. For some reason, in Eagle CAD, that tab is part of the device package but not connected to anything. This was the problem. I soldered the tab to the +5 V plane on the top layer and everything has now been running for about 40 minutes (still running right now as I type this!).

 

I want to thank you all for the support and the additional tips that I need to consider in the future. Everything appears to be working properly now!


#9

Jamison wrote:
I loaded a simple program today that copied the MCUSR register and cleared it immediately in main.

While that worked here and was useful, in the general case that may not be good enough for a "robust" AVR8 app.  In particular, for said "robustness" the WDRF is cleared early to avoid possible cascading WD resets.  Just sayin' ...

 

And indeed, POR vs. BOR implies a [nearly] complete power loss rather than a momentary demand surge of some time.


#10

theusch wrote:
While that worked here and was useful, in the general case that may not be good enough for a "robust" AVR8 app.  In particular, for said "robustness" the WDRF is cleared early to avoid possible cascading WD resets.  Just sayin' ...

I'm not sure I understand. On the first line of main I copied MCUSR, and on the second line I cleared the entire MCUSR register. Is there a better or more accurate way to do this if I need to run such a test in the future? Or are you saying that writing the data to the GLCD is inefficient?

 

theusch wrote:
And indeed, POR vs. BOR implies a [nearly] complete power loss rather than a momentary demand surge of some time.

I thought maybe my multimeter just wasn't able to detect the dip to zero volts as it seems to happen very quickly. It's not exactly of high quality, just a $40 meter from RadioShack. I do not currently have an oscilloscope to see the entire waveform.

 

[Edit] Removed my quote

[Edit2] Fixed a quote tag


Last Edited: Thu. Dec 24, 2015 - 07:18 PM

#11

Jamison wrote:
The first line of main,...

That may be too late.  Not usually, but sometimes.

 

https://www.avrfreaks.net/forum/w...

 

There are others, but I can't find the threads.  The short summary:  As the WD changes to the shortest timeout value when it kicks in, in a large app it may be more than that time before main() is reached.  So the WD kicks in again...rinse and repeat.

 

How to "solve" it depends on your toolchain.  For "robust" operation, trap and clear the status register including WDRF early in the C prologue.

 

 


#12

theusch wrote:
That may be too late.  Not usually, but sometimes.

 

https://www.avrfreaks.net/forum/w...

 

There are others, but I can't find the threads.  The short summary:  As the WD changes to the shortest timeout value when it kicks in, in a large app it may be more than that time before main() is reached.  So the WD kicks in again...rinse and repeat.

 

How to "solve" it depends on your toolchain.  For "robust" operation, trap and clear the status register including WDRF early in the C prologue.

Ah, I understand now. After looking at the LSS file, main gets called at around 0x22E with call being 4 cycles, so that's 562 cycles before the first op code happens in main. The simulator reflects this closely (cycle counter is 564 with the break point on the first line of main). If that is the case, a pure assembly solution with code that calls immediately from the reset vector at 0x0000 and then copies and clears could be done in very few cycles.


Last Edited: Thu. Dec 24, 2015 - 08:13 PM

#13

theusch wrote:
How to "solve" it depends on your toolchain.

(which you haven't stated)  With GCC, there is a "hook" at init2 or init3...

 

500 cycles at a normal clock speed is fine.  But there are those that have run over 8ms with a slowish clock and a lot of startup init.
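
To put numbers on that, a small sketch converting a startup cycle count into time at a given clock. The 562-cycle and 20 MHz figures come from earlier in the thread. The function name and the slow-clock case in the note below are made up for illustration, and the watchdog timeout to compare against should be taken from the datasheet for the part in use.

```c
#include <stdint.h>

/* Time taken by `cycles` CPU cycles at `f_cpu_hz`, in nanoseconds.
 * Integer division, so the result is rounded down. The multiplication
 * can overflow for very large cycle counts (above roughly 1.8e10). */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t f_cpu_hz)
{
    return (cycles * UINT64_C(1000000000)) / f_cpu_hz;
}
```

cycles_to_ns(562, 20000000) gives 28100 ns, or about 28 µs. By contrast, a hypothetical 10000 startup cycles at 1 MHz would take 10 ms, which is the kind of case where clearing WDRF in main() can come too late.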

 

 


#14

Jamison wrote:
main gets called at around 0x22E with call being 4 cycles, so that's 562 cycles before the first op code happens in main. The simulator reflects this closely (cycle counter is 564 with the break point on the first line of main).

That's a coincidence.  You can't really extrapolate the number of cycles executed before main by the address at which the call to main is placed.  The C runtime startup code includes two loops, one to clear .bss variables, and one to initialise .data variables.  If one or both of those sections is sizeable, there could conceivably be many thousands of cycles before the call to main.

 

As Lee has mentioned, the proper way to handle this (for avr-gcc) is to use one of the .initN sections.  Likely .init3 would be best.  See here:

http://www.nongnu.org/avr-libc/user-manual/mem_sections.html

 

The wdt.h library has a specific example:

http://www.nongnu.org/avr-libc/user-manual/group__avr__watchdog.html
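
A minimal sketch of that approach, modelled on the pattern shown in the avr-libc watchdog documentation: a function placed in .init3 copies MCUSR into a variable, clears the register, and disables the watchdog. Because .init3 runs before the startup code initialises .data and clears .bss, the saved copy is placed in .noinit so it is not wiped afterwards. The names save_and_clear_mcusr, EARLY_INIT and NOINIT are made up for illustration, and the #else branch is only a host stand-in (an assumption added here) so the logic can be compiled and exercised off-target.

```c
#include <stdint.h>

#if defined(__AVR__)
#include <avr/io.h>
#include <avr/wdt.h>
/* Place the function in .init3 so it runs early in the C prologue. */
#define EARLY_INIT __attribute__((naked, used, section(".init3")))
/* Keep the variable out of the .bss clearing loop. */
#define NOINIT     __attribute__((section(".noinit")))
#else
/* Host stand-ins, for illustration only: a fake register and a no-op. */
static uint8_t MCUSR = 0;
static void wdt_disable(void) {}
#define EARLY_INIT
#define NOINIT
#endif

/* Copy of the reset flags, readable later from main(). */
uint8_t mcusr_mirror NOINIT;

void save_and_clear_mcusr(void) EARLY_INIT;
void save_and_clear_mcusr(void)
{
    mcusr_mirror = MCUSR;  /* remember why we reset */
    MCUSR = 0;             /* clear all flags, including WDRF */
    wdt_disable();         /* stop the watchdog before the long init loops */
}
```

On the AVR the function is never called explicitly; the linker places it in the startup sequence. main() can then read mcusr_mirror at leisure, for example to draw it on the GLCD.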

 

EDIT: typos

Last Edited: Fri. Dec 25, 2015 - 04:44 PM

#15

+1 on Thermal shut down as a likely cause.

 

You can put your finger on the regulator to see if it is getting too hot.

 

You certainly need an in-spec output capacitor, also.  Tack one onto the PCB if you don't have one installed.

 

JC