[SOLVED] Instruction skipping after spurious interrupt

Go To Last Post
9 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Greetings everyone.

During development on an AT32UC3A3 I ran into a peculiar issue that left me stumped because I don't trust the only explanation I came up with after three weeks of debugging. The symptoms are that the CPU appears to skip a single instruction when globally unmasking interrupts after a PDCA/USART race condition.

Sorry for the wordy post, but I find this issue quite interesting and would like to share as much information as possible.

Context:
Centered around an AT32UC3A3 I'm developing a dual-channel bus master device for a high speed bus for robotic applications. I'm basically shovelling data from the USB connection to the USARTs and back. For the USARTs I'm extensively using the PDCA to transfer data and then have the USARTs tell me when they're done transmitting (TXEMPTY). At several stages of the project I encountered odd behaviour with seemingly random hangs, exceptions or data corruption, but it was almost impossible to track down the bugs (classic Heisenbugs), until one of them decided to stick around.

To cut this long story short and to avoid anyone having to dig through all that unrelated clutter, I reduced the issue to a simplified demo program, which reproduces the issue without fail (see attachment).

The race condition:
Observant readers might've already caught the issue: When activating the PDCA to shovel data to the USART and then enabling the USART's TXEMPTY interrupt, there's the possibility that the PDCA might not yet have transferred any data to the USART when the TXEMPTY interrupt is enabled. Globally masking interrupts should obviously be used, but the PDCA might've not transferred anything when interrupts are globally unmasked.

A spontaneous but not-too-helpful idea might involve the reminder from the intc.c documentation: When enabling or disabling periphery interrupt sources one should stall the CPU by immediately reading from the same register before proceeding. However, in this case this would only ensure that the PDCA has received the enable command before the USART TXEMPTY interrupt source is enabled. The race condition persists because the PDCA still might not have transferred anything yet. (This can be tested with the example program as well.)

Either way, I expect one of two possible scenarios to happen:
1) The delay between enabling the PDCA and enabling the USART's TXEPTY interrupt is large enough so that the PDCA has transferred some data when the TXEMPTY interrupt is enabled. All's well.
2) The delay between enabling the PDCA and enabling the USART's TXEPTY interrupt is too short and the USART complains about not having any data yet by causing a TXEMPTY interrupt. In this case one could query the PDCA and return from the interrupt handler immediately if the PDCA is not done yet. This would waste some CPU cycles, but everything should be executed properly.

However, there appears to be a third scenario (or otherwise I wouldn't be writing this), which the attached demo program will show.

About the demo program:
The attached demo program is an AVR32 Studio project for compilation with the avr32-gcc toolchain and is based on some of the Atmel example projects. It's written for the AT32UC3A3 but has been tested on an AT32UC3C1 (engineering sample) as well, where it seems to behave exactly the same. The program runs off the internal RC-oscillator and assumes no external hardware except for a serial port connected to one of the USARTs. All settings should be configurable with the included configuration file, so I should run on pretty much every UC3 microcontroller.

What the program does:
Triggered by a TC it runs several iterations of trying to cause the race condition. In every iteration the delay between enabling the PDCA and enabling the USART's TXEMPTY interrupt is reduced by one clock cycle with the help of a nop-slide.
Every iteration does this:
- Print a message about the iteration and the delay,
- fill a test array with 0x88,
- trigger the race condition with variable delay,
- fill the test array with 0x11,
- print the test array,
- print a message when the PDCA transfer completed successfully.
If a spurious interrupt occurs, an additional message is printed at that time.

What I expected:
With long delays I expected scenario 1 to happen, presenting me an array full of 0x11 and no spurious interrupt message.
With short delays I expected scenario 2 to happen, presenting me an array full of 0x11 and a spurious interrupt message.

What I got:
Sorry if this spoils the end of the movie...

Instruction skip demo.
Iteration 0, 12 nops
00: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
10: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
20: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
30: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
PDCA ok
Iteration 1, 11 nops
00: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
10: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
20: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
30: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
PDCA ok
Iteration 2, 10 nops
00: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
10: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
20: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
30: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
PDCA ok
Iteration 3, 9 nops
00: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
10: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
20: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
30: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
PDCA ok
Iteration 4, 8 nops
00: 11 11 11 88 11 11 11 11 11 11 11 11 11 11 11 11
10: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
20: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
30: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
PDCA ok
Iteration 5, 7 nops
00: 11 11 11 88 11 11 11 11 11 11 11 11 11 11 11 11
10: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
20: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
30: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
PDCA ok
Iteration 6, 6 nops
== spurious interrupt ==
00: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
10: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
20: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
30: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
PDCA ok
Iteration 7, 5 nops
== spurious interrupt ==
00: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
10: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
20: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
30: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
PDCA ok

"¦and then it goes on with more spurious interrupts.

So what happened in iteration 4 and 5 and how is that possible?

The output is the same on the A3 and the C1. Different compiler optimisation settings cause the odd behaviour to appear in one iteration earlier or later.

The relevant assembler code looks like this, where r10 contains the address of the test array and r11 contains the test pattern 0x11:

csrf  0x10         // Globally unmask interrupts
st.b  r10[0x0],r11
st.b  r10[0x1],r11
st.b  r10[0x2],r11
st.b  r10[0x3],r11 // What happened here?
st.b  r10[0x4],r11
st.b  r10[0x5],r11
st.b  r10[0x6],r11

"¦ etc.

Quite obviously there's one instruction that's not being executed and if this was a bug report, I'd stop here, and if you want to try out the program yourself and come up with your own hypothesis, read no further.

Here's my hypothesis:
With the time that the USART's TXEMPTY interrupt signal is active gradually reducing, there appears to be a time-frame of two cycles in which the signal is too short to properly initiate interrupt handling, or, in other words, while the CPU is initiating interrupt handling, the interrupt signal quickly disappears and the CPU messes up the program counter when aborting interrupt handling. After all, no interrupt is executed.
As the UC3 core is probably the identical on all microcontrollers, the same behaviour can be seen with the A3 and the C1.

This left me with two possibilities (again):
1) I messed up real good somewhere and none of my fellow local hackers caught it,
2) or Atmel messed up and there's a real bug in the microcontroller, which went unnoticed until now.

As microcontrollers from the UC3 family seem to be widely used by now, even running Linux, possibility 2 sounds exceedingly unlikely. Ironically, the A3 series has even been promoted from their preliminary status just this month.

So here we are after this wall of text: I would really like to know if someone else can reproduce this behaviour with their setup, possibly even on other series besides A3 and C1. Running the test on a microcontroller from the AT32UC3B series would be especially interesting, due to their different interrupt handling routine.

Comments, ideas and solutions are welcome.

Attachment(s): 

Last Edited: Mon. Dec 19, 2011 - 02:36 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Not sure whether this problem is too boring or too complicated, but it would really help if someone could try to verify this by just running the test program on their AT32UC3 of choice.
Thanks.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Just in case anyone was actually interested in this: Atmel managed to track down the issue after three month and as it turns out, it was possibility 2.

Apparently it’s a “just discovered and rare errata” where interrupt handling can lead to unpredictable behaviour if the interrupt signal persists for less than 3 clock cycles. The workaround is that the programmer has to make sure that no interrupt can be disabled when the global interrupt mask flag is cleared.

Their response left open whether only all AVR32 with the UC3A core are affected (UC3A, UC3C, UC3L) or the UC3B as well.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

After reading your detailed description, the easiest workaround that comes to mind:

1. enable the pdca
2. poll the USART CSR TXEMPTY bit until it goes to zero (i.e. the pdca has transfered the first byte to the THR)
3. enable the TXEMPTY interrupt (IER.TXEMPTY = 1)

I would think that would solve the problem you describe with minimal wasted time. I can't imagine that the polling in step 2 above would amount to more than single digit microseconds.

Letting the smoke out since 1978

 

 

 

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I did a hasty review of my dma-driven usart code after I read post #3.
In my case I enable the pdca via a routine before I enable the TXEMPTY interrupt, so that is more that 3 clock cycles delay, but I think I may add what digitalDan suggests., just to be safe.

Thanks catweax for the 'heads-up' and the detective work.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

digitalDan wrote:
1. enable the pdca
2. poll the USART CSR TXEMPTY bit until it goes to zero (i.e. the pdca has transfered the first byte to the THR)
3. enable the TXEMPTY interrupt (IER.TXEMPTY = 1)

I would think that would solve the problem you describe with minimal wasted time. I can't imagine that the polling in step 2 above would amount to more than single digit microseconds.

Keep in mind that the polling in step 2 involves the CPU accessing the peripheral bus, which then cannot be accessed immediately by the PDCA, so you’re probably wasting more time than you think.

The workaround I use and suggest:

1. Poll for TXRDY. Unless you just did a transfer this should take only one polling cycle.
2. Copy the first byte from your data into the THR. The USART (or TWI or whatever) will immediately have something to transfer, unsetting TXEMPTY.
3. Start the PDCA to transfer n-1 bytes, starting from the second byte.
4. Enable TXEMPTY interrupt.

This way you can’t have any (relevant) access conflicts between the PDCA and the CPU and your transfer will start immediately.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Why not use the PDCA transfer complete interrupt instead of the USART's Tx Empty interrupt? That's what I always do...

Daniel Campora http://www.wipy.io

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Won’t help you if you need to know when all bits have actually been transmitted out of the chip.

I need to know because I’m driving a half-duplex RS-485 line and have to disable the transmitter when done sending. If I do that when the PDCA is done, the last up to two bytes won’t actually be transmitted on the RS-485 line.

In theory the USART’s RS-485 mode could do that for me, but only one of the UC3A3’s USARTs supports that mode and I’m driving two independent buses.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Hi Catweax, just want to report that when using TWI in combination with PDC, indeed the same thing can happen! My UC3A0512 skipped 1 instruction and landed into the 'unpredictable', e.g. like here:

8000df64: 2f fd sub sp,-4
8000df66: d8 22 popm r4-r7,pc <= skipped this
8000df68: 80 02 ld.sh r2,r0[0x0] <= executed this (which is just data)
8000df6a: 33 ec mov r12,62

These are the hardest bugs ever to track down, as the MCU will crash with an exception at some unrelated or non-existing address. Also, the chances of occurring are often small. It occurred more frequently when other interrupts took more time to execute (while keeping the Global Interrupt Mask set).

For me the solution was to carefully select the TWI/PDC interrupt signals that I'm interested in.

It took me a week to find this out and your elaborate post gave the confirmation that I was looking for.

Thanks for that, Hein