I've found a 100% reproducible crash bug in my ATmega1284P project, where by "crash" I mean my custom AVR board either locks up or appears to jump back to the beginning of main() and start the program anew. It's a timing-sensitive issue that occurs while communicating with another device, so debugging in an emulator isn't an option. I'm also using ALL the pins, so connecting a hardware debugger isn't an option either. I do have an LCD display that I can print diagnostic info to, if I can catch the error in some kind of exception handler. I could maybe even write code to unwind the stack to see how it got there.
My best guess is that I've got a bug that overwrites RAM, corrupting the stack, and causing the AVR to jump to a bogus address when it pops a return address off the stack. I was hoping to add a handler for some kind of "invalid instruction" exception to help catch this, but from what I can see, the 8-bit AVR has no such concept.
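One check I can think of for telling a real hardware reset from a wild jump to address 0 is to save and clear MCUSR very early at startup. If the program restarts and none of the reset flags are set, the CPU got to the reset vector without an actual reset. Below is a minimal sketch of that idea. The bit positions are the ones in the ATmega1284P datasheet, the early-capture function follows the pattern in the avr-libc watchdog documentation, and `reset_cause`, `saved_mcusr`, `capture_reset_cause` and the short tags are names I made up for illustration.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#include <avr/wdt.h>
#endif

/* MCUSR flag masks, bit positions per the ATmega1284P datasheet. */
#define RST_POWER_ON  (1u << 0)  /* PORF  */
#define RST_EXTERNAL  (1u << 1)  /* EXTRF */
#define RST_BROWN_OUT (1u << 2)  /* BORF  */
#define RST_WATCHDOG  (1u << 3)  /* WDRF  */
#define RST_JTAG      (1u << 4)  /* JTRF  */

/* Hypothetical helper: turn a saved MCUSR value into a short tag for the LCD.
 * Power-on is checked first because other flags can also be set during
 * power-up. */
const char *reset_cause(uint8_t mcusr)
{
    if (mcusr & RST_POWER_ON)  return "PWR";
    if (mcusr & RST_WATCHDOG)  return "WDT";
    if (mcusr & RST_BROWN_OUT) return "BOD";
    if (mcusr & RST_EXTERNAL)  return "EXT";
    if (mcusr & RST_JTAG)      return "JTAG";
    /* No flag set: execution reached address 0 without a hardware reset. */
    return "JUMP0";
}

#ifdef __AVR__
/* Copy of MCUSR; .noinit keeps the startup code from zeroing it. */
uint8_t saved_mcusr __attribute__((section(".noinit")));

/* Runs before main(): save the flags, then clear them so that the next
 * start is unambiguous. */
void capture_reset_cause(void)
    __attribute__((naked, used, section(".init3")));
void capture_reset_cause(void)
{
    saved_mcusr = MCUSR;
    MCUSR = 0;
    wdt_disable();
}
#endif
```

With this in place, main() could print `reset_cause(saved_mcusr)` on the LCD at every start. Seeing "JUMP0" after a crash would support the corrupted-return-address theory, while "WDT" or "BOD" would mean one of those is enabled after all.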
Any good suggestions on ways to go about debugging something like this?
Other possible explanations:
1. Electrical problems, bad voltages, shorts, etc. Possible, but I strongly doubt it. I've been using this board for months, and except for this one case, I've never seen any flakiness.
2. Brown-out or watchdog timer. These are disabled (unless I've done it wrong). Also, sometimes the AVR doesn't reset at all and just hangs, which is not what I'd expect from the BOD or watchdog.
3. Some other interrupt firing that I didn't think was enabled, and that there's no handler written for.
4. Stack colliding with the heap or static data. avr-size says I'm using 14032 bytes of RAM in my data segment (of 16K total), which leaves about 2.3 KB (16384 - 14032 = 2352 bytes) for the stack. There's no dynamic memory allocation, and the stack shouldn't use more than a couple of hundred bytes.
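For items 3 and 4, here is a rough sketch of two checks I could add. The first is a catch-all handler for unexpected interrupts: by default avr-libc sends any vector without a handler to the reset vector, which would look exactly like a restart. The second is "stack painting": fill the free RAM between the end of static data and the stack with a marker byte, then later count how many marker bytes are still intact. The marker value, the 16-byte safety margin, and the function names are my own assumptions. `_end`, `SP`, `RAMEND` and `BADISR_vect` come from the avr-libc toolchain.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xC5u  /* arbitrary marker byte (assumption) */

/* Fill the region [lo, hi) with the marker byte. */
void paint_region(uint8_t *lo, uint8_t *hi)
{
    while (lo < hi)
        *lo++ = STACK_PAINT;
}

/* Count marker bytes still intact, scanning upward from the low end.
 * The stack grows downward, so this is the space it never reached. */
size_t untouched_bytes(const uint8_t *lo, const uint8_t *hi)
{
    size_t n = 0;
    while (lo < hi && *lo++ == STACK_PAINT)
        n++;
    return n;
}

#ifdef __AVR__
#include <avr/io.h>
#include <avr/interrupt.h>

extern uint8_t _end;  /* linker symbol: first byte after static data */

/* Call first thing in main(): paint from _end up to a little below the
 * current stack pointer (the margin protects the frames in use here). */
void paint_stack(void)
{
    paint_region(&_end, (uint8_t *)SP - 16);
}

/* Bytes between static data and the deepest point the stack has reached.
 * A result of 0 means the stack ran into static data. */
size_t stack_headroom(void)
{
    return untouched_bytes(&_end, (uint8_t *)RAMEND);
}

/* Catch-all for any interrupt that fires without a handler of its own. */
ISR(BADISR_vect)
{
    /* Hypothetical: print a message on the LCD here, then halt
     * instead of silently restarting. */
    for (;;) { }
}
#endif
```

Printing `stack_headroom()` on the LCD just before the point where the crash usually happens should show whether the stack really stays within a couple of hundred bytes. If the board halts in the `BADISR_vect` handler instead of restarting, item 3 is the culprit.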
Any ideas or suggestions of things to try will be appreciated. Thanks!