Instruction Line?

I am trying to find an elusive and intermittent bug in my code. I am fairly confident that the problem is somewhere in a routine where the MCU is interacting with other chips (so simulation will not really work for me). Unfortunately, that doesn't really narrow it down, as there are many such routines in the code. I do not have an on-chip debugger (only an AVRISP mk2). Is there a way to determine what instruction was executed last (program pointer?)?

Meaning, if my WDT trips, can I determine where in memory/code I was and print that out via the UART?

Science is not consensus. Science is numbers.

If you are using the WDT interrupt, then you should be able to divine the address from the stack; you just have to know how far back in the stack the address lies.
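A minimal sketch of the decoding step, assuming a device with a 2-byte program counter (128 KB of flash or less) and that you have already read the two bytes just above SP on entry to a naked WDT ISR. The AVR pushes the return address low byte first, so the high byte ends up at SP+1 and the low byte at SP+2. The value is a word address, so it has to be doubled to match the byte addresses shown in an avr-objdump listing. The helper name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: convert the two bytes found at SP+1 and SP+2
 * (on entry to a naked ISR) into a byte address for the .lss listing.
 * Assumes a 2-byte PC, i.e. 128 KB of flash or less. */
uint32_t stacked_return_byte_address(const uint8_t *above_sp)
{
    /* above_sp[0] is the byte at SP+1 (high byte of the word address),
     * above_sp[1] is the byte at SP+2 (low byte). */
    uint16_t word_address =
        (uint16_t)(((uint16_t)above_sp[0] << 8) | above_sp[1]);

    /* The AVR program counter counts 16-bit words; listings use bytes. */
    return (uint32_t)word_address * 2u;
}
```

For example, stacked bytes 0x01 0x2C give word address 0x012C, i.e. byte address 0x0258 in the listing: the instruction the ISR would have returned to.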

Regards,
Steve A.

The Board helps those that help themselves.

Quote:

Meaning, if my WDT trips, can I determine where in memory/code I was and print that out via the UART?

Not directly. However, modern AVR models have WDT "interrupt" and "interrupt+reset" modes, so you could set up the WD interrupt and find the return address on the stack. If it were me, I wouldn't try to do USART comms from within the WD interrupt but would probably "log" to EEPROM.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

You really need a debugger! You can use LEDs to show which function was executing: set an LED on entry and clear it on exit. It is a slow technique, but it can help narrow down the area. You can also check the stack level in an ISR and light an LED if it is bad.
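One way to "check the stack level" is to compare SP against the end of the statically allocated data plus a safety margin. This is a sketch under stated assumptions: the function name and the 32-byte margin are hypothetical, and it assumes no malloc() heap sits between the static data and the stack. Both values are passed in as plain numbers so the check can be exercised off-target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical safety margin between the stack and the static data. */
#define STACK_MARGIN_BYTES 32u

/* The AVR stack grows downward from RAMEND, so a *smaller* SP means a
 * deeper stack. Returns false once SP has come within STACK_MARGIN_BYTES
 * of the end of .data/.bss (assumes no malloc() heap in between). */
bool stack_level_ok(uint16_t sp, uint16_t static_data_end)
{
    return sp > (uint16_t)(static_data_end + STACK_MARGIN_BYTES);
}
```

On the target this might be called from an ISR that already runs periodically, e.g. `if (!stack_level_ok(SP, (uint16_t)&__heap_start)) { /* light the LED */ }`, with `extern char __heap_start;` at file scope. `__heap_start` is the end-of-static-data symbol that the avr-libc linker scripts usually provide.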

A debugger could make quick work of finding the defect, but if it doesn't help within a day or two, consider using a logic analyzer.
Ideally there's a spare port on the AVR to connect to a logic analyzer.
This technique is also used by a certain major competitor to Atmel.
Some logic analyzers may not keep up with the AVR if the code writes one byte to the port per source code line; this may have a work-around using Gray code.
Some logic analyzers will also decode TWI, SPI, USART, RS422/485, etc.
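The Gray-code work-around can be sketched like this: number the checkpoints consecutively and write gray(n) = n ^ (n >> 1) to the spare port instead of n, so that consecutive checkpoints differ in exactly one bit. One commonly cited benefit is that a single-line transition cannot be misread as some unrelated intermediate value when port lines switch with slight skew. The function names are hypothetical; on the AVR the result would simply be written to the spare port (e.g. `PORTC = checkpoint_code(7);`).

```c
#include <stdint.h>

/* Convert a sequential checkpoint number into its reflected binary
 * Gray code. Consecutive inputs give outputs differing in one bit. */
uint8_t checkpoint_code(uint8_t n)
{
    return (uint8_t)(n ^ (n >> 1));
}

/* Inverse mapping, handy when decoding an analyzer capture back to n:
 * each result bit is the XOR of all code bits at or above it. */
uint8_t checkpoint_number(uint8_t code)
{
    uint8_t n = code;
    for (uint8_t shift = 1; shift < 8; shift <<= 1) {
        n ^= (uint8_t)(n >> shift);
    }
    return n;
}
```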
Ref.
Troubleshooting real-time software issues using a logic analyzer by David B. Stewart (InHand Electronics, Inc.) (embedded.com; FEBRUARY 27, 2012)
sigrok (try SUPPORTED HARDWARE, Logic analyzers, and its comparison)

"Dare to be naïve." - Buckminster Fuller

Which AVR? If it's one of the old-school parts, you might be able to build a mega16-based JTAG ICE. Otherwise, splash out $50 on a Dragon.

Quote:
Is there a way to determine what instruction was executed last (program pointer?)?

I do not get your idea.
If the watchdog triggered then that is because:
    a) some software test that allows the wdr didn't pass, so the WDT IRQ plus unwinding the stack will not give you more information than the (failed) test itself; or
    b) the PC got random values, so even unwinding the stack will not give you any information.

Do you use assert.h? Perhaps you should.
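Since this failure happens with nothing watching the console, one hedged variation on assert() is a macro that records where the check failed instead of aborting, so the WDT ISR (or the next boot) can report it. `CHECK` and `g_check_failed_line` are hypothetical names. On the AVR the variable could be placed in `.noinit` (`__attribute__((section(".noinit")))`) so that it survives a watchdog reset.

```c
#include <stdint.h>

/* Line number of the first failed CHECK(), or 0 if none has failed.
 * On the AVR this could live in .noinit to survive a watchdog reset. */
volatile uint16_t g_check_failed_line = 0;

/* Like assert(), but records __LINE__ and carries on instead of aborting.
 * Only the first failure is kept, so later checks cannot overwrite it. */
#define CHECK(cond)                                         \
    do {                                                    \
        if (!(cond) && g_check_failed_line == 0) {          \
            g_check_failed_line = (uint16_t)__LINE__;       \
        }                                                   \
    } while (0)
```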

No RSTDISBL, no fun!

Quote:

I do not get your idea.
If the watchdog triggered then that is because:

Or c) the program got stuck in a loop.

So for c), finding the PC (and perhaps more of the stacked PCs) when the watchdog fires would give info on where the app is sticking.

I agree that a debugger would be really useful here (though my skill at using it may need some developing). However, for the moment, I do not have one.

[Famous last words ahead] Barring tachyons hitting my MCU, there is only a single place in the code that could potentially turn into an infinite loop. While I have not been able to find the bug yet, I am fairly confident I have excluded that single loop.

I like the idea of toggling an LED at entry/exit of a series of points in the code, because it is short (a single instruction). I have tried using print statements to narrow down the bug location, but I believe that this is a timing issue with comms to another chip (either SPI or UART). Whenever I add the print statements, the timing changes and I cannot replicate the bug at all (which is intermittent at "best").

I have not used assert.h. I just read the page on the avr-libc website, and I am not entirely sure how best to use its functionality to help me in this particular case.

@theusch/koschi: My WDT is configured for interrupt only. I [perhaps erroneously?] don't really care how long the WDT ISR takes at this time, because I am only using it to find the bug. For that reason I don't really care about how long fprintf takes. However, if necessary, I absolutely could log to EEPROM. How would I determine return value from within the ISR?

How about an oscilloscope/logic analyser? Do you have access to one?

The problem with trying to debug timing-sensitive stuff with LEDs or, even worse, printf()s is that they operate in the human realm. With a scope you can take a pin high/low in 2 machine cycles and use that with a scope/analyser to catch the moment of entry to some part of the code. Go wild and make a double pulse at exit (so it's discernible). You can do stuff like this with only the most infinitesimal impact on existing timing. In some senses this is actually better than a debugger when it comes to diagnosing timing-sensitive stuff.

I do have access to a 'scope. Unfortunately, as this is a very small board, I do not have ready access to any pins. If absolutely necessary, I can break into the power cable (there are a few PGOOD signals in the bundle) and use one of those lines as a monitor point. However, I think I will first try turning an LED on/off around code blocks. I know that recognizing blinks is done at human speed, but whether the LED stays on or off will tell me whether the bug is between the two statements (turn on and turn off). I can then shrink the code block until I isolate the problem spot. One "useful" aspect of this bug is that when the WDT interrupt trips, the code never kicks out, i.e. the WDT keeps tripping over and over. This means that whatever block is hanging will continue to hang (allowing me to keep an LED in an on state).

Did you mention the AVR part number? Do you have a UART? I have done putchar('1'); putchar('2'); etc. after each subroutine in the main loop or init, or inside the suspected hang condition.

Imagecraft compiler user

Quote:
How would I determine return value from within the ISR?

You mean the PC to reti to.

If your IRQ is naked then that is easy. Just get the SP, and above it you can find your PC.

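uint8_t *sp = (uint8_t *)SP; // naked ISR: the return address sits just above SP
// high byte at SP+1, low byte at SP+2; it is a word address (2-byte PC assumed)
uint16_t ProgramCounterToRETI = ((uint16_t)sp[1] << 8) | sp[2];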

But if IRQ is not naked (I bet it is not) then you need some tweaks or a more generic stack-unwinding procedure. I have never seen run-time support for that (except the unwinder used for C++ exception handling). Usually the debugger does the dirty work and, using the debug info in the ELF file, shows the full call stack without problems.

Your watchdog interrupt only has to store the PC in SRAM and check a GPIO pin to see if you want to "log current state to EEPROM" and stop.

In fact the ISR(WDT_vect) only has to check a pin. In regular execution it just returns, so it is harmless. The logging operation is only required once you have already got stuck. Then you can unroll stacks, dump memory or whatever you want.

I still reckon it is easier to buy a Dragon.

David.

hobbss,

If you find that printf statements change the timing of the process you are trying to trace, use putchar instead to output single-character "tracers". The resulting printout is a bit cryptic, and you can't overuse it or the printout will be too voluminous to analyse and will have the same damping effect as the printfs. It also helps to choose the code characters symbolically: e.g. "(" for entry into a routine and ")" for exit from the same routine.

You want to set the baud rate as high as possible to minimize interference with the main process you are trying to trace. You might also write a special putchar() routine that doesn't wait for the transmit buffer to empty before sending the character, in order to minimize the burden the tracer commands place on the main process. Obviously, though, you need to use this judiciously to avoid totally jamming the UART's transmit FIFO.
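A somewhat safer variant of the non-waiting putchar() is to queue tracer characters in a small ring buffer drained by the UART's data-register-empty interrupt, and to drop characters (rather than block) when the queue is full. This swaps a ring buffer in for the "write without waiting" approach, so it is a different technique with the same goal. The names and the 32-byte size are assumptions. Only the queue is shown here; on the target, `trace_pop()` would be called from the USART UDRE ISR and its character written to the data register, with that interrupt enabled after each `trace_put()`.

```c
#include <stdbool.h>
#include <stdint.h>

#define TRACE_BUF_SIZE 32u            /* must be a power of two */

static volatile uint8_t trace_buf[TRACE_BUF_SIZE];
static volatile uint8_t trace_head;   /* written only by trace_put() */
static volatile uint8_t trace_tail;   /* written only by trace_pop() */
static volatile uint8_t trace_dropped;

/* Queue one tracer character; never waits. Returns false if dropped. */
bool trace_put(char c)
{
    uint8_t next = (uint8_t)((trace_head + 1u) & (TRACE_BUF_SIZE - 1u));
    if (next == trace_tail) {         /* full: drop rather than block */
        if (trace_dropped < 255u) trace_dropped++;
        return false;
    }
    trace_buf[trace_head] = (uint8_t)c;
    trace_head = next;
    return true;
}

/* Fetch the next character to transmit; false if the queue is empty. */
bool trace_pop(char *out)
{
    if (trace_tail == trace_head) return false;
    *out = (char)trace_buf[trace_tail];
    trace_tail = (uint8_t)((trace_tail + 1u) & (TRACE_BUF_SIZE - 1u));
    return true;
}
```

With one writer (main code) and one reader (the ISR), each index is only ever written by one side, and single-byte accesses are atomic on the AVR, so no extra locking is needed. The queue holds at most 31 characters.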

If you are using a micro with multiple UARTs, consider using one for printf statements and the other for the single character tracer codes. In use, set up two side-by-side sessions of TeraTerm on your PC and enjoy the show! The root cause of the problem will usually become apparent after a few minutes of viewing.

Quote:
But if IRQ is not naked (I bet it is not)

Probably not, but since he has stated that the ISR is there only to find the return address, it might as well be made that way.

Regards,
Steve A.

Quote:

Quote:
But if IRQ is not naked (I bet it is not)

Probably not, but since he has stated that the ISR is there only to find the return address, it might as well be made that way.

Oh, my, here we go again with the "naked". I went off on that tangent just yesterday...

Yes, I thought of that. If the stack situation is unknown, with an arbitrary return address on the stack, then IMO a "core dump" is needed for the post mortem, just like in the good old days. I'd suggest storing the SP, along with a chunk of memory working upwards from the SP. Why? During the post mortem of the first successful capture of the PC, it may well turn out that it points to an innocent routine, e.g. strlen() or whatever.

Now the core dump needs to be broken down using the stack frame of that routine to find what called it. Wash, rinse, repeat.
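The "core dump" can be sketched as a fixed-size record: the SP value, a copy of the bytes just above it, and a magic number so a later boot can tell a valid dump from blank EEPROM. In a naked WDT ISR you would pass `SP` and `(const uint8_t *)SP + 1`, then write the record out with avr-libc's `eeprom_update_block()`. Here the "RAM" is an ordinary array so the capture can run off-target. The record layout, the 32-byte size and the magic value are all assumptions.

```c
#include <stdint.h>
#include <string.h>

#define DUMP_BYTES 32u
#define DUMP_MAGIC 0xC0DEu

/* Hypothetical post-mortem record: small enough for most AVR EEPROMs. */
typedef struct {
    uint16_t magic;              /* DUMP_MAGIC when the record is valid */
    uint16_t sp;                 /* stack pointer at the moment of capture */
    uint8_t  stack[DUMP_BYTES];  /* bytes at SP+1, SP+2, ... */
} core_dump_t;

/* Fill *rec from the memory just above the stack pointer.
 * On target, assumes at least DUMP_BYTES bytes between SP and RAMEND. */
void core_dump_capture(core_dump_t *rec, uint16_t sp, const uint8_t *above_sp)
{
    rec->magic = DUMP_MAGIC;
    rec->sp = sp;
    memcpy(rec->stack, above_sp, DUMP_BYTES);
}
```

The first two bytes of `stack[]` are then the stacked return address (high byte first), and the rest is what the post mortem has to break down frame by frame.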

Back when I was your age, core dumps were all we had. ;)

I apologize for the lag in my response (putting out different fires today). I appreciate all the input.

Using the LED approach, I believe that I have narrowed down the most probable source of the problem (though even the single instruction to set and clear the LED bit changed the timing enough to make the occurrence frequency of the bug drop from about 66% to 10%).

The sequence is:
1. Wake 3rd party chip from sleep (toggle MCU I/O line with 10ms hold high).
2. Delay 30ms
3. Enable buffers on SPI lines between MCU and 3rd party chip.
4. Return to regular code - which MAY attempt to talk to 3rd party chip immediately via SPI bus.

The third-party chip has a complicated wake-up routine (including a short period of trying to be SPI master and looking for some non-volatile memory). The 30ms delay mentioned above SHOULD allow that period to elapse. It appears that sometimes it does not. However, simply increasing the delay does not appear to solve the issue.

I am not sure what a naked ISR is - is it one that simply gets the stack pointer (i.e. no other instructions)? How does one get the stack pointer?

@theusch: I really like this suggestion. Could you provide a little more guidance? Specifically how to access the SP and memory above it. I know how to log info.

For the record: avr-gcc.

Quote:
I am not sure what a naked ISR is

Assuming you are a GCC guy, it is an ISR that explicitly tells the compiler not to save (push) any registers on entry or restore (pop) them before the reti. With avr-gcc's ISR_NAKED the compiler also omits the final reti, so you must add reti() yourself if the ISR is meant to return. If an ISR does push and pop registers, those bytes sit on the same stack as ProgramCounterToRETI, and as the number of pushes is compiler-dependent (it can vary with the compiler's mood and the -O level), I have no idea how this could be done in general*. It is definitely doable, as GDB manages it in all circumstances, whether the ISR is naked or not.

Quote:
How does one get the stack pointer?

:)
A "stack pointer" is the name of one of the registers in an AVR core. It is named SP and usually consists of two 8-bit chunks called SPL and SPH.

*If by chance someone knows how to do some basic run-time stack unwinding in C (just to know the caller/s) then I would gladly listen to that story. It would be great to have an assert() that not only reports the file and line that failed, but also the callers.

Quote:

Specifically how to access the SP and memory above it.

AVR-LibC makes the machine stack pointer available to you as a symbol called "SP" (inspired choice, eh?). It's defined in common.h (which is included from io.h). It's defined (usually, depending on the AVR) as an _SFR_IO16(), so you can just read/write it like you do with ADCW, TCNT1, OCR1A or whatever. Your "core dump" is probably going to look something like this:

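#include <stdio.h>
#include <stdint.h>
#include <avr/io.h>
#include <avr/interrupt.h>

void dump16(uint8_t * addr) {
	uint8_t i;
	printf("%04X: ", (uint16_t)addr);
	for (i = 0; i < 16; i++) {
		printf("%02X ", addr[i]);
	}
	for (i = 0; i < 16; i++) {
		// printable ASCII (0x20..0x7E) as-is, anything else as '.'
		printf("%c", ((addr[i] > 0x1F) && (addr[i] < 0x7F)) ? addr[i] : '.');
	}
	printf("\r\n");
}

ISR(WDT_vect, ISR_NAKED) {
	// Naked: nothing was saved on entry and there is no reti(),
	// so this handler must never return.
	__asm__ __volatile__ ("clr __zero_reg__");	// compiled C code assumes r1 == 0
	uint16_t sp = SP;	// read SP once, before any call uses the stack
	uint16_t i;
	for (i = sp; i <= (sp + 96); i += 16) {
		dump16((uint8_t *)i);
	}
	for (;;) {
		// park here for the post mortem
	}
}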

BTW I tested that dump routine in the simulator using this:

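char buff[200];

// Capture stdout in RAM so the output can be inspected in the simulator.
int myputchar(char c, FILE *stream) {
	static uint8_t ptr = 0;
	(void)stream;
	buff[ptr] = c;
	ptr++;
	if (ptr > 199) ptr = 0;	// wrap around at the end of buff[]
	return 0;
}

FILE str = FDEV_SETUP_STREAM(myputchar, NULL, _FDEV_SETUP_WRITE);

int main(void) {
	stdout = &str;
	// ... call dump16() on the memory of interest here ...
	for (;;) {
	}
}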

Then I just watched buff[] in memory to see if the output looked right.

What's really nice about avr-gcc 4.7.2 is that by just changing:

void dump16(uint8_t * addr) {

to be

void dump16(__flash const uint8_t * addr) {

I could use the same to dump flash memory (which held more "interesting" data than the RAM!).

Cliff -- thanks for the example. I will put it to use this weekend...
