We have a simple controller built around an ATmega1284 microcontroller. We have been building these for years and have never had any problems (with the micros, at least).
Last week we encountered two boards that failed the routine production test. Both boards showed odd software misbehaviour. Sometimes the main-line code seemed to hang while some interrupt-driven code kept running; at other times the main-line user interface kept working while the interrupt-driven communication code stopped.
After I experimented with the two boards for a while, the problem got harder to replicate and then disappeared. On a whim, I put the boards outside (it's winter) and let them cool off for a while. When I brought them back in, the problem was back immediately!
I have spent most of today looking into clock fuse settings, the ceramic resonator, the reset line, the power supply, and stray conductance on the PCB, and none of those appear to be the cause of the problem. That seems to leave only the microcontroller itself.
Has anyone experienced this kind of temperature-dependent failure, where the micro does not execute its software properly?
Any advice would be most welcome.
Bert Menkveld
bert@greentronics.com