Help with debugging: interrupt does not get handled

#1

I'd appreciate some help with debugging strategy for a large Arduino application.

 

The MCU is an ATmega1284P on a board of my own design. The application is 4000+ lines of code plus multiple 3rd party libraries, so it's difficult to post anything relevant here. The interrupt services an MCP2515 CAN bus controller: the ISR takes the incoming message and places it in a circular buffer.

 

When I create a trivial application with all the 3rd party libraries and exercise the hardware, everything works fine.

 

When I add (a lot of) application code, the interrupt stops working. When I say 'stops working', the ISR doesn't run despite the interrupt signal being correctly asserted by the MCP2515, as observed with my logic analyser. No other hardware or 3rd party library uses interrupts, or suspends them.

 

The only thing I can think of is a buffer overrun or similar trashing the Arduino ISR despatch table**, but the table is perfectly fine. The relevant entry still points to my ISR and I can even call the ISR directly through the pointer in the table.

 

I've hacked some debug code into the Arduino source to return the function pointer and to return a variable that is updated when the interrupt handler runs.

I've also added some code in the Arduino interrupt handler that flip-flops an LED. This does not happen, nor is my variable changed, so I assume the Arduino interrupt handler is not firing.

 

But, a trashed despatch table could point anywhere and is likely to cause random crashes, which does not happen. Everything else works fine *except* this interrupt.

 

What is the 'missing link' between the interrupt signal being asserted and the ISR firing? And how could an application code bug mess this up?

 

Unfortunately, I don't know enough to be able to read the chip's registers and determine whether they've been messed with. But then, I can't imagine how a code bug could do this.

 

** For those who don't know, the Arduino core despatches interrupts indirectly via a table, which is populated by the attachInterrupt() function, per the source here: (interrupt despatch at line 379): https://github.com/MCUdude/Might...

 

Thanks.

 

 

 

Last Edited: Fri. Jul 2, 2021 - 11:50 PM
#2

If it stops executing the ISR when code is added, then very likely the code is doing something to prevent it. I would check that the global interrupt enable is on. Perhaps something disabled the global enable and did not turn it back on? It will either be the global enable or the enable for that specific interrupt. Could other parts of the code be using a different pin change interrupt and turned yours off while configuring that other one?
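A minimal sketch of that check, written so it also compiles on a PC. The bit positions (I flag in bit 7 of SREG, INT0 enable in bit 0 of EIMSK) are from the ATmega1284P datasheet; the function and enum names are hypothetical, invented for this illustration.

```c
#include <stdint.h>

/* Bit positions per the ATmega1284P datasheet. */
#define SREG_I_BIT  7  /* global interrupt enable (I flag) in SREG */
#define EIMSK_INT0  0  /* INT0 enable bit in EIMSK */

/* Hypothetical names, invented for this sketch. */
typedef enum { INT0_OK, INT0_GLOBAL_OFF, INT0_MASKED } int0_state;

/* Classify snapshots of SREG and EIMSK. */
int0_state int0_check(uint8_t sreg, uint8_t eimsk)
{
    if (!(sreg & (1u << SREG_I_BIT)))
        return INT0_GLOBAL_OFF;   /* something left interrupts disabled */
    if (!(eimsk & (1u << EIMSK_INT0)))
        return INT0_MASKED;       /* INT0 itself has been switched off */
    return INT0_OK;               /* both enables set: look elsewhere */
}

/* Human-readable text for printing over a serial port. */
const char *int0_state_name(int0_state s)
{
    switch (s) {
    case INT0_GLOBAL_OFF: return "global interrupts disabled (SREG I=0)";
    case INT0_MASKED:     return "INT0 masked (EIMSK bit 0 = 0)";
    default:              return "INT0 enabled";
    }
}
```

On the target this could be called as int0_check(SREG, EIMSK) at a few points in loop(), printing the result, to narrow down where an enable bit gets lost.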

 

Jim

 

Until Black Lives Matter, we do not have "All Lives Matter"!

 

 

Last Edited: Sat. Jul 3, 2021 - 05:45 PM
#3

Take things out until it starts to work then add back one by one until it quits.

Happy Trails,

Mike

JaxCoder.com

#4

mike32217 wrote:

Take things out until it starts to work then add back one by one until it quits.

 

What I was hoping to avoid with a slightly more intelligent approach, but yes, that's my fallback strategy. I'm not sure if I'm naturally lazy or whether I just like to work smart ;)

#5

ka7ehk wrote:

If it stops executing the ISR when code is added, then very likely the code is doing something to prevent it. I would check that the global interrupt enable is on. Perhaps something disabled the global enable and did not turn it back on? It will either be the global enable or the enable for that specific interrupt. Could other parts of the code be using a different pin change interrupt and turned yours off while configuring that other one?

 

Jim

 

I even brute-forced a call to sei() at the top of my loop ! I've grep'd all the library code and nothing is touching that interrupt, just the usual Arduino timers and the SPI code.

 

I think I'm right in thinking that a memory overwrite can't affect CPU registers.

#6

obdevel wrote:

 

What I was hoping to avoid with a slightly more intelligent approach, but yes, that's my fallback strategy. I'm not sure if I'm naturally lazy or whether I just like to work smart

 

Sometimes brute force is the only answer.  We all like to work more intelligently, that's not being lazy it's trying to understand the problem so as not to reproduce it and learn from it.

 

Good luck!

Happy Trails,

Mike

JaxCoder.com

#7

Does the flag get set without executing the ISR, or is the flag also absent?

 

To find out, at least partially, generate the interrupt, then break-all, then look at the IO register display to see the state of the flag bit. Now, something else might, theoretically, clear it (like accidentally clearing the entire register) but that is the first thing to try.
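One caveat when reading the flag, sketched below under stated assumptions. Per the ATmega1284P datasheet, INTF0 in EIFR only latches in the edge and any-change modes; in low-level mode (ISC01:ISC00 = 00, which is what attachInterrupt() with LOW selects) the flag always reads 0, so an absent flag proves nothing there. The function name is hypothetical.

```c
#include <stdint.h>

#define EICRA_ISC0_MASK  0x03u  /* ISC01:ISC00, sense control for INT0 */
#define EIFR_INTF0       0      /* INT0 flag bit in EIFR */

/*
 * Interpret snapshots of EICRA and EIFR for INT0.
 * Returns  1 if a request is latched,
 *          0 if no request is latched,
 *         -1 if INT0 is in low-level mode, where the flag is always
 *            cleared and therefore cannot be used as evidence.
 */
int int0_flag_state(uint8_t eicra, uint8_t eifr)
{
    if ((eicra & EICRA_ISC0_MASK) == 0)
        return -1;
    return (eifr >> EIFR_INTF0) & 1;
}
```

If this returns 1 while the ISR never runs, the request is being latched and an enable bit is the likely culprit; if it returns 0 in an edge mode, the pin or sense configuration is the place to look.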

 

Jim

 


 

 

Last Edited: Sun. Jul 4, 2021 - 05:08 AM
#8

obdevel wrote:
What I was hoping to avoid with a slightly more intelligent approach

You can do it: use JTAG, and then you can debug such problems in no time.

Seems that an Arduino on ATmega1284P is at the brink of the code integrity.

#9

obdevel wrote:
When I say 'stops working', the ISR doesn't run

Now which ISR might that be ? You neglected to tell us.

 

Is ISR Code short enough to post for a sanity check ?

 


Edit

obdevel wrote:
For those who don't know, the Arduino core despatches interrupts indirectly via a table, which is populated by the attachInterrupt() function, per the source here:

Are you using this mechanism ?

 

Last Edited: Sun. Jul 4, 2021 - 09:40 AM
#10

grohote wrote:

Seems that an Arduino on ATmega1284P is at the brink of the code integrity.

 

I'm not sure what the processor has to do with it, other than it's likely to be a larger, more complex application with more code to go wrong. It's just another AVR with more memory, pins and timers.

#11

N.Winterbottom wrote:

obdevel wrote:
When I say 'stops working', the ISR doesn't run

Now which ISR might that be ? You neglected to tell us.

 

Is ISR Code short enough to post for a sanity check ?

 


Edit

obdevel wrote:
For those who don't know, the Arduino core despatches interrupts indirectly via a table, which is populated by the attachInterrupt() function, per the source here:

Are you using this mechanism ?

 

 

It's hardware interrupt 0 (INT0). It is set active-low, so the port pin is pulled up using the internal pull-up. All other interrupts (timers, SPI) are unaffected.

 

The ISR isn't running at all, as the first three lines of code (i) flip an LED, and (ii) set a global flag. Neither of these things happens.

 

As I said, it works perfectly well for a smaller test application using all the libraries and exercising the hardware. Something in my application code is causing the ISR not to run and whilst the obvious candidate is a memory overwrite, I don't see how this could have the observed effect. To do that it would have to either (i) drive the pin permanently high, (ii) change the interrupt register, or (iii) change the dispatch table.

 

Thanks for your help everyone but I'll have to try the brute force approach. If nothing else, it'll force me to refactor the code to be more modular, now rather than later.

 

 

 

 

#12

obdevel wrote:
It's just another AVR with more memory, pins and timers.

Pay the price, then, for shoveling the software in.

This buffer that does not works- can you try more tricks to learn to which address the Int jumps, or if not, why not.

Who can help you without more information provided, and you refuse to obey Mike idea- to remove some software first.

 

#13

I've also added some code in the Arduino interrupt handler that flip-flops an LED. This does not happen

If you want to rely on it, make sure it DOES happen for the shorter ("working") version.  Otherwise it can lead you way down a wrong path (such as the LED installed backward & it would never work in the first place---so you chase the wrong direction)...note that flip flop could happen too fast to see...so a triggering scope is somewhat preferred.

 

Is this bad IRQ supposed to be "always on"?  If it is enabled only for certain paths, maybe you aren't always traversing that pathway in your code.

If the IRQ signal were extremely short, it could be missed (but would likely be the same regardless of code size)...By the way, does your code do something to the CAN chip, that  might affect its interrupt generation (though it sounds like you are saying your analyzer indicates an IRQ signal found coming from the CAN chip).   If the IRQ doesn't return properly, but dumps out to normal code, main code could run (perhaps resuming with hidden or noticeable stack troubles), but further IRQ's would be blocked (due to improper exit).

 

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

#14

Yes, I deliberately flip-flop the LED rather than pulse it. It's initialised on and each subsequent interrupt flips it on to off, off to on, etc, etc. Again, this works for the minimum example. I can also trigger the interrupt in the working code by jumpering the interrupt pin to ground, so it's unlikely to be a transient electrical issue or the CAN controller chip.

 

The interrupt is setup at the beginning and not subsequently touched. It's the only interrupt I handle 'explicitly'; others are Arduino's use of timers, SPI, etc, which all still work as expected.

 

Thanks for all the ideas so far. Just trying to write a sensible question has suggested some more debug approaches.

 

#15

grohote wrote:

obdevel wrote:
It's just another AVR with more memory, pins and timers.

Pay the price, then, for shoveling the software in.

This buffer that does not works- can you try more tricks to learn to which address the Int jumps, or if not, why not.

Who can help you without more information provided, and you refuse to obey Mike idea- to remove some software first.

 

 

That's a strange response. Maybe it's a language problem, but saying "don't do large projects, with or without Arduino" seems a strange philosophy.

 

I didn't reject Mike's suggestion. Rather, I was asking for 'smarter' approaches. Brute force should be the last resort, not the first :)

 

#16

When I add (a lot of) application code, the interrupt stops working. 

Is the rest of the app (keypad, motor, LEDs, display, buzzer, whatever) still working? Is it only the IRQ/buffer failing?

If you are misusing a pointer (or have some indirection errors), the code might improperly seem to "work", by sheer luck of the draw, but start colliding (and maybe crashing) when a larger version starts using more code and ram space.  Also, pay attention for collisions due to wraparounds.  

 

Have you examined the asm code to see what has actually been generated?  Anything been unexpectedly optimized away?  Sometimes you literally get to tell the compiler "thanks for nothing".


Last Edited: Sun. Jul 4, 2021 - 08:06 PM
#17

Your #14 is in contradiction with previous information.

Sorry, I should not offend you. Or Arduino, I know how they are important to AVR community.

#18

obdevel wrote:
saying "don't do large projects, without a debugger" is definitely a strange philosophy

There - I fixed that for you.

 

I'm sure that if you owned a debugger you'd have diagnosed it by now. Perhaps not fixed the bug but you'd know why the ISR didn't run.

obdevel wrote:
the port pin is pulled up using the internal pull-up

One thought: Does that Pullup remain ON. If it were inadvertently switched OFF that could cause your bug.

 

#19

In your search for things that might change the configuration of that pin change interrupt, do not forget to look for changes at the register level. After all, some do

REGNAME |= (1<<PINn)

while others might do

REGNAME = 0x37

or

REGNAME = 0b00010101

And, while you are looking at registers, make certain that the pin stays as an input, that the pull-up remains on, that the registers with enables all stay on, that the global interrupt enable stays on, and such. For pin changes, there are quite a few registers involved. Were it a Mega0 or other new one, you would also need to verify that pin mapping and such remains correct.
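To make the difference between those two styles concrete, here is a small host-runnable sketch. It uses an ordinary variable as a stand-in for a register such as EIMSK (where INT0 is bit 0 and INT1 is bit 1 on the ATmega1284P); the function names are hypothetical.

```c
#include <stdint.h>

#define INT0_BIT 0
#define INT1_BIT 1

/* Read-modify-write: sets INT1 and leaves every other bit alone. */
void enable_int1_rmw(volatile uint8_t *reg)
{
    *reg |= (uint8_t)(1u << INT1_BIT);
}

/* Whole-register assignment: sets INT1 and silently clears the rest. */
void enable_int1_assign(volatile uint8_t *reg)
{
    *reg = (uint8_t)(1u << INT1_BIT);
}

/* Returns 1 if the INT0 enable bit is still set in the given value. */
int int0_enabled(uint8_t reg_value)
{
    return (reg_value >> INT0_BIT) & 1;
}
```

Starting from a value of 0x01 (INT0 enabled), the first function leaves 0x03 and the second leaves 0x02, which is exactly the kind of silent disable worth grepping for. A search for the register name alone finds both forms.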

 

Jim

 


 

 

#20

Well, I've found the error and worked around it. It's in a 3rd party library that I've been using for years but never this particular path through the code. I've reported the bug to the maintainer so it will hopefully get fixed at some point. I see no logical reason why it triggered the problem - which irks me - and the workaround is slightly sub-optimal. But it happily runs my full test suite, so I'm happy too.

 

I really don't get the 'anti Arduino' hate. I have CAN bus, a TFT display with touchscreen UI, and an SD card all sharing the SPI bus. And an RTC chip plus outboard ESP32 NTP time source on the I2C bus. There's a mini language interpreter that runs programs from the SD card for automating operations. Next step is to use the Optiboot write-to-flash functionality as an alternative to the SD card. I really don't think all that would be achievable in four weeks of spare time if you were starting from bare metal. I may only be a hobbyist but I have hundreds of customers who use my designs and expect them to work. I say 'customers' but I do it for pleasure, not money. 

 

My professional background is larger systems and when working across multiple systems there is never a common dev environment or debugger. In the 90s I maintained a codebase of over a million lines of C code across 30 versions of Unix, Windows, Mac and VMS. #ifdef and printf() have served me well over the years.

 

I did try to import the project into AS7 but it was far too large. As for reviewing the ASM, the current avr-objdump disassembly output is 59000 lines.

 

I appreciate I may be pushing the boundaries of what is reasonably achievable with a 10 year old 8-bit chip, but in my defence, nobody told me I shouldn't :)

 

Thank you to those who made helpful and encouraging comments and suggestions. If nothing else, it helped me think through the problem logically and ask (hopefully) better questions.

 

#21

 There's a mini language interpreter that runs programs from the SD card for automating operations

How does that work?  Is there a good lib for that?  Sounds interesting! 


#22

obdevel wrote:
I think I'm right in thinking that a memory overwrite can't affect CPU registers.
Given that CPU registers R0 to R31 are mapped to appear at RAM addresses 0x00 to 0x1F, I'm not sure where you got the impression that a memory write can't affect CPU registers. If you were to use:

uint8_t * p;

p = (uint8_t *)13;

*p = 0xA5;

then this just puts 0xA5 into AVR register R13. Things can get even more complex if the registers involved are actually being used by the C compiler at the time. Suppose this was:

*p = 0xA5;
p++;
*p = 0x5A;

but the compiler happened to be keeping 'p' in registers with its low byte in R13. Then the first write puts 0xA5 into R13, so p becomes 0x00A5. The p++ makes that 0x00A6, and so 0x5A gets written into location 0x00A6.
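The same reasoning extends beyond R0 to R31. On classic AVRs such as the ATmega1284P the I/O registers sit in the same data address space, directly above the CPU registers, so a stray write can also hit an interrupt enable. The sketch below models that layout on a PC. The region boundaries are the standard ones for this family; the EIMSK data address of 0x3D is an assumption taken from the datasheet register summary and should be checked against your part. All names are hypothetical.

```c
#include <stdint.h>

#define IO_START      0x20u   /* I/O registers begin after R0..R31 */
#define EXT_IO_START  0x60u   /* extended I/O registers */
#define SRAM_START    0x100u  /* first byte of internal SRAM */
#define ADDR_EIMSK    0x3Du   /* assumed data-space address of EIMSK */

typedef enum { REGION_CPU_REG, REGION_IO, REGION_EXT_IO, REGION_SRAM } region;

/* Say which part of the data space an address falls into. */
region classify(uint16_t addr)
{
    if (addr < IO_START)     return REGION_CPU_REG;
    if (addr < EXT_IO_START) return REGION_IO;
    if (addr < SRAM_START)   return REGION_EXT_IO;
    return REGION_SRAM;
}

/* Model of a buggy clear loop running over a byte array that stands in
   for the AVR data space. A wrong start address is all it takes. */
void clear_bytes(uint8_t *mem, uint16_t start, uint16_t count)
{
    for (uint16_t i = 0; i < count; i++)
        mem[start + i] = 0;
}
```

With the model array holding 0x01 at ADDR_EIMSK, a call such as clear_bytes(mem, 0x3A, 8) zeroes it, which on real hardware would disable INT0 while leaving the dispatch table and everything else untouched.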