Watchdog ignores resets?


Recently a bug was discovered in some code I wrote that caused the system to reset unexpectedly. I quickly diagnosed the problem as a watchdog timeout and set out to insert a wdt_reset() call where one was apparently needed. The problem is that even though I call the reset, the watchdog still seems to time out.

The function where this behavior occurs is one that is run at system startup. All it is doing is waiting on other parts of the system to initialize and report that they are OK to go. Code follows:

void fun()
{
  for(;;)
  {
    wdt_reset();
    procedure1();
    while(component1_NotReady){wdt_reset();}
    wdt_reset();
    procedure2();
    while(component2_NotReady){wdt_reset();}
    procedure3();
    wdt_reset();
    procedure4();
    wdt_reset();
    if(systemReady)
      break;
  }
  return;
}

The problem showed up in the rare times when the system was NOT ready and my for(;;) loop was forced to run a second time to try initialization again.

The loop runs once perfectly fine, but the watchdog timer appears to reset the system at procedure2() during the second iteration. If I shorten a loop within procedure2() I can avoid this behavior, but that breaks other things.

Alternatively I can execute a ridiculous number of wdt_resets within procedure2 and things are fine, but I feel that I shouldn't have to do this because the loop ran without a WDT timeout the first time.

I use a polling method here because the system requires all components to be online and if something fails then there's no reason to move forward.

I'm using an ATmega324P. Watchdog fuse is not set, but the timer is configured before fun() is called to expire every "250ms" (I use quotes because practice shows that this can vary wildly with Vcc).

Any ideas as to what could be causing this?

Apologies if my search of the manuals and forums missed something obvious.

Thanks in advance.


How long does procedure2 take to execute? Have you tried increasing your watchdog timeout to 0.5sec or more?


My simple response would be to lengthen the time before the watchdog fires to 500 ms (or longer?). It may be that when you are having the "system not ready" problem, whatever happens in procedure2() needs that extra time.

You could also extend the time around just procedure2() and then shorten it again afterwards. A little more code, but not that bad.

Of course, all this changing of the WatchDog time will need to be done according to the procedures set out in the datasheet. Check out what your compiler generates to be sure you meet the cycle time restrictions on changing WatchDog parameters. You appear to be using gcc/avr-libc, so be sure you have your optimization set to -O1 or better.
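A minimal sketch of the "extend, then shorten" idea, assuming avr-libc's <avr/wdt.h>, whose wdt_enable() performs the timed change sequence itself. The wrapper name run_procedure2_relaxed() and the empty procedure2() body are placeholders, and the #else branch is only a host-side stand-in so the snippet compiles off-target:

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/wdt.h>   /* wdt_enable(), wdt_reset(), WDTO_250MS, WDTO_500MS */
#else
/* Host-side stand-ins (assumption: only used to build the sketch off-target). */
#define WDTO_250MS 4
#define WDTO_500MS 5
static uint8_t wdt_timeout;                        /* last timeout selected */
static void wdt_enable(uint8_t t) { wdt_timeout = t; }
static void wdt_reset(void) {}
#endif

/* Placeholder for the original procedure2(); the real body is not shown in the thread. */
static void procedure2(void) {}

/* Run procedure2() under a relaxed watchdog, then restore the normal period. */
static void run_procedure2_relaxed(void)
{
    wdt_reset();               /* start from a fresh count */
    wdt_enable(WDTO_500MS);    /* relax to roughly 500 ms for the slow step */
    procedure2();
    wdt_reset();               /* clear the count before tightening again */
    wdt_enable(WDTO_250MS);    /* back to the usual 250 ms */
}
```

In fun(), the call to procedure2() would be replaced by run_procedure2_relaxed(). Whether 500 ms is enough still has to be measured on the real hardware.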

Stu

Edit 1: I'm unsure about your "timeout varies wildly with Vcc" comment. IIRC, the Watchdog is a binary counter running off the system clock. Check your datasheet, since I'm sure it talks about this. Are you, perchance, using the internal oscillator for a clock? That would explain the variance, since your system clock would not be terribly stable with Vcc. Using an external oscillator or crystal should keep the frequency solid w.r.t. Vcc.

Engineering seems to boil down to: Cheap. Fast. Good. Choose two. Sometimes choose only one.

Newbie? Be sure to read the thread Newbie? Start here!


Switching the WDT timeout to 500ms for this part of the code seems to have resolved the problem, thanks.

In response to stu's comment:

stu_san wrote:
Edit 1: I'm unsure about your "timeout varies wildly with Vcc" comment. IIRC, the Watchdog is a binary counter running off the system clock. Check your datasheet, since I'm sure it talks about this. Are you, perchance, using the internal oscillator for a clock? That would explain the variance, since your system clock would not be terribly stable with Vcc. Using an external oscillator or crystal should keep the frequency solid w.r.t. Vcc.

The ATmega324P datasheet says that the WDT is clocked from an internal 128kHz source, presumably an RC oscillator. This is probably why I have seen such deviation in the past.
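As a rough illustration of why the nominal figures are only approximate, here is a small sketch of the arithmetic. It assumes the prescaler table as I remember it from the datasheet (2K oscillator cycles at the lowest setting, doubling with each step), so check it against your copy. The function name wdt_timeout_ms() is made up for this example:

```c
#include <stdint.h>

#define WDT_OSC_NOMINAL_HZ 128000UL   /* nominal watchdog oscillator frequency */

/* Timeout in ms for prescaler setting wdp (0..9) when the watchdog
 * oscillator runs at osc_hz. Assumes 2048 cycles at wdp = 0 and a
 * doubling per step. */
static uint32_t wdt_timeout_ms(uint8_t wdp, uint32_t osc_hz)
{
    uint32_t cycles = 2048UL << wdp;                   /* 2K, 4K, ... 1024K */
    return (uint32_t)(((uint64_t)cycles * 1000U) / osc_hz);
}
```

With these assumptions the "250ms" setting (wdp = 4, 32K cycles) works out to 256 ms at exactly 128 kHz, and the timeout scales inversely with whatever frequency the RC oscillator actually runs at for a given Vcc and temperature.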

As far as WDTs go, are there any guidelines for the "proper usage" of a watchdog? I could make it really short and put wdt_reset everywhere, but that would increase the likelihood of errant execution just resetting the watchdog anyway. Making it too long could cause me to miss deadlines.

Just a curiosity as I never really received any formal education on "the proper usage of watchdog timers," just, "This is what watchdog timers do. They are good."

Thanks again for your help.


That's a balance you'll have to determine for your application. The wdt timeout is application specific, or even operation specific. In my designs I'm often changing the WDT period depending on what operation is being performed.

BTW, code like this:

while(component1_NotReady){wdt_reset();}

defeats the purpose of a WDT altogether, since your AVR will never get reset if you get stuck in this loop. I suggest you either (A) increase the WDT period before entering the wait loop and then not reset the WDT during the loop, or (B) add a counter and a delay routine in the loop to set a maximum run-time for the loop itself before exiting in a failed state.
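Option "B" could look something like the sketch below. It assumes avr-libc (<avr/wdt.h> and <util/delay.h>, which needs F_CPU defined). The names wait_ready(), delay_1ms() and component1_ready() are invented for the example, component1_NotReady is assumed to be a plain flag, and the #else branch is just a host-side stand-in:

```c
#include <stdbool.h>
#include <stdint.h>

#ifdef __AVR__
#include <avr/wdt.h>      /* wdt_reset() */
#include <util/delay.h>   /* _delay_ms(); requires F_CPU to be defined */
static void delay_1ms(void) { _delay_ms(1); }
#else
/* Host-side stand-ins (assumption: only used to build the sketch off-target). */
static void wdt_reset(void) {}
static void delay_1ms(void) {}
#endif

/* Assumed to be set/cleared elsewhere, e.g. by the component's driver or ISR. */
static volatile bool component1_NotReady = true;

static bool component1_ready(void) { return !component1_NotReady; }

/* Poll ready() about once per millisecond for at most max_ms milliseconds.
 * Returns true if the component came up, false on timeout. Kicking the
 * watchdog here is safe because the loop can no longer run forever. */
static bool wait_ready(bool (*ready)(void), uint16_t max_ms)
{
    for (uint16_t waited = 0; waited < max_ms; waited++) {
        if (ready())
            return true;
        wdt_reset();
        delay_1ms();
    }
    return ready();   /* one last look before reporting failure */
}
```

The caller then decides what "failed" means, for example `if (!wait_ready(component1_ready, 100)) { /* retry or report the fault */ }`.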

Writing code is like having sex.... make one little mistake, and you're supporting it for life.


glitch wrote:

BTW, code like this:

while(component1_NotReady){wdt_reset();}

defeats the purpose of a WDT altogether, since your AVR will never get reset if you get stuck in this loop. I suggest you either (A) increase the WDT period before entering the wait loop and then not reset the WDT during the loop, or (B) add a counter and a delay routine in the loop to set a maximum run-time for the loop itself before exiting in a failed state.

Good call. I actually do limit the other loop with a counter, but for some reason neglected the first. Thanks!


sjackson wrote:
...I never really received any formal education on "the proper usage of watchdog timers," just, "This is what watchdog timers do. They are good."

Watchdog timers are used for as many reasons as there are applications, but in essence they supply a way to automatically recover from unforeseen (and potentially harmful) hangs in the application.

For example, watchdogs are commonly placed in robotic mechanisms to prevent harm to the robot and to the humans and other robots around it in the event of a firmware "hang". I'm sure that all of the robots used to manufacture cars have some watchdog timers, since you don't want an arm with a welding tip to go completely nuts. :shock:

Another classic example is in satellites, since there is not going to be anyone up there to push the "reset" button if ground control sends something stupid. (Okay, except for telling the Mars Express to auger into the planet -- that's a little unrecoverable :lol: ).

As glitch has said, it is sometimes just as easy, if not easier, to place a simple counter around a loop. In the code I write, that is a very common feature.

In fact, in my application (servo controller for an optical drive) I practically never use the watchdog timer. I prefer to watch for the timeouts where I know they will occur so I can customize the response. I also layer timeouts -- the local loop has a counter, but the command handler also has a global timeout as well as a "break" function available from the user. Granted, the command handler timeout could be a watchdog, but then all of the current state would be lost.

(I know someone is going to jump in here and say, "your state isn't necessarily lost -- your watchdog reset handler could reinstate it". Yeah, I know. It just seemed easier to not do things that way. Personal preference - another factor in the watchdog saga.)

At any rate, I don't think you will find a tutorial on watchdog timers. Hopefully, though, the above just gave you a clue.

Stu

PS: That's the problem with those Terminator robots - no watchdog code!

If (about_to_kill_human())
   {RESET();}



stu_san wrote:

If (about_to_kill_human())
   {RESET();}

Apparently, you do not write software for the military.

If (About to Kill Human(s)) then
   IFF (Identify Friend or Foe)
   if Friendly then
       Reset
   else
       Terminate
   endif
endif

It will be a bad day at work, however, if you forget your RFID ID Badge and leave it at home on the dresser.

JC


Kevin wrote:
Stu wrote:

If (about_to_kill_human())
   {RESET();}

Apparently, you do not write software for the military.

That's why they don't talk about "smart" bombs anymore, but instead "laser-guided munitions". The "smart" bombs kept quoting Kant. :P When they got to Kierkegaard they just stopped working. :roll: :wink:

Stu

PS: My son is in the US Air Force, so all of the above is with my tongue firmly planted in my cheek.
