Greetings everyone.
During development on an AT32UC3A3 I ran into a peculiar issue that left me stumped because I don't trust the only explanation I came up with after three weeks of debugging. The symptoms are that the CPU appears to skip a single instruction when globally unmasking interrupts after a PDCA/USART race condition.
Sorry for the wordy post, but I find this issue quite interesting and would like to share as much information as possible.
Context:
Centered around an AT32UC3A3 I'm developing a dual-channel bus master device for a high speed bus for robotic applications. I'm basically shovelling data from the USB connection to the USARTs and back. For the USARTs I'm extensively using the PDCA to transfer data and then have the USARTs tell me when they're done transmitting (TXEMPTY). At several stages of the project I encountered odd behaviour with seemingly random hangs, exceptions or data corruption, but it was almost impossible to track down the bugs (classic Heisenbugs), until one of them decided to stick around.
To cut this long story short and to avoid anyone having to dig through all that unrelated clutter, I reduced the issue to a simplified demo program, which reproduces the issue without fail (see attachment).
The race condition:
Observant readers might've already caught the issue: When activating the PDCA to shovel data to the USART and then enabling the USART's TXEMPTY interrupt, there's the possibility that the PDCA might not yet have transferred any data to the USART when the TXEMPTY interrupt is enabled. Globally masking interrupts should obviously be used, but the PDCA might've not transferred anything when interrupts are globally unmasked.
A spontaneous but not-too-helpful idea might involve the reminder from the intc.c documentation: When enabling or disabling periphery interrupt sources one should stall the CPU by immediately reading from the same register before proceeding. However, in this case this would only ensure that the PDCA has received the enable command before the USART TXEMPTY interrupt source is enabled. The race condition persists because the PDCA still might not have transferred anything yet. (This can be tested with the example program as well.)
Either way, I expect one of two possible scenarios to happen:
1) The delay between enabling the PDCA and enabling the USART's TXEPTY interrupt is large enough so that the PDCA has transferred some data when the TXEMPTY interrupt is enabled. All's well.
2) The delay between enabling the PDCA and enabling the USART's TXEPTY interrupt is too short and the USART complains about not having any data yet by causing a TXEMPTY interrupt. In this case one could query the PDCA and return from the interrupt handler immediately if the PDCA is not done yet. This would waste some CPU cycles, but everything should be executed properly.
However, there appears to be a third scenario (or otherwise I wouldn't be writing this), which the attached demo program will show.
About the demo program:
The attached demo program is an AVR32 Studio project for compilation with the avr32-gcc toolchain and is based on some of the Atmel example projects. It's written for the AT32UC3A3 but has been tested on an AT32UC3C1 (engineering sample) as well, where it seems to behave exactly the same. The program runs off the internal RC-oscillator and assumes no external hardware except for a serial port connected to one of the USARTs. All settings should be configurable with the included configuration file, so I should run on pretty much every UC3 microcontroller.
What the program does:
Triggered by a TC it runs several iterations of trying to cause the race condition. In every iteration the delay between enabling the PDCA and enabling the USART's TXEMPTY interrupt is reduced by one clock cycle with the help of a nop-slide.
Every iteration does this:
- Print a message about the iteration and the delay,
- fill a test array with 0x88,
- trigger the race condition with variable delay,
- fill the test array with 0x11,
- print the test array,
- print a message when the PDCA transfer completed successfully.
If a spurious interrupt occurs, an additional message is printed at that time.
What I expected:
With long delays I expected scenario 1 to happen, presenting me an array full of 0x11 and no spurious interrupt message.
With short delays I expected scenario 2 to happen, presenting me an array full of 0x11 and a spurious interrupt message.
What I got:
Sorry if this spoils the end of the movie...
Instruction skip demo. Iteration 0, 12 nops 00: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 10: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 20: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 30: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 PDCA ok Iteration 1, 11 nops 00: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 10: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 20: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 30: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 PDCA ok Iteration 2, 10 nops 00: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 10: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 20: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 30: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 PDCA ok Iteration 3, 9 nops 00: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 10: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 20: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 30: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 PDCA ok Iteration 4, 8 nops 00: 11 11 11 88 11 11 11 11 11 11 11 11 11 11 11 11 10: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 20: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 30: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 PDCA ok Iteration 5, 7 nops 00: 11 11 11 88 11 11 11 11 11 11 11 11 11 11 11 11 10: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 20: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 30: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 PDCA ok Iteration 6, 6 nops == spurious interrupt == 00: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 10: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 20: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 30: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 PDCA ok Iteration 7, 5 nops == spurious interrupt == 00: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 10: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 20: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 30: 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 PDCA ok
"¦and then it goes on with more spurious interrupts.
So what happened in iteration 4 and 5 and how is that possible?
The output is the same on the A3 and the C1. Different compiler optimisation settings cause the odd behaviour to appear in one iteration earlier or later.
The relevant assembler code looks like this, where r10 contains the address of the test array and r11 contains the test pattern 0x11:
csrf 0x10 // Globally unmask interrupts st.b r10[0x0],r11 st.b r10[0x1],r11 st.b r10[0x2],r11 st.b r10[0x3],r11 // What happened here? st.b r10[0x4],r11 st.b r10[0x5],r11 st.b r10[0x6],r11
"¦ etc.
Quite obviously there's one instruction that's not being executed and if this was a bug report, I'd stop here, and if you want to try out the program yourself and come up with your own hypothesis, read no further.
Here's my hypothesis:
With the time that the USART's TXEMPTY interrupt signal is active gradually reducing, there appears to be a time-frame of two cycles in which the signal is too short to properly initiate interrupt handling, or, in other words, while the CPU is initiating interrupt handling, the interrupt signal quickly disappears and the CPU messes up the program counter when aborting interrupt handling. After all, no interrupt is executed.
As the UC3 core is probably the identical on all microcontrollers, the same behaviour can be seen with the A3 and the C1.
This left me with two possibilities (again):
1) I messed up real good somewhere and none of my fellow local hackers caught it,
2) or Atmel messed up and there's a real bug in the microcontroller, which went unnoticed until now.
As microcontrollers from the UC3 family seem to be widely used by now, even running Linux, possibility 2 sounds exceedingly unlikely. Ironically, the A3 series has even been promoted from their preliminary status just this month.
So here we are after this wall of text: I would really like to know if someone else can reproduce this behaviour with their setup, possibly even on other series besides A3 and C1. Running the test on a microcontroller from the AT32UC3B series would be especially interesting, due to their different interrupt handling routine.
Comments, ideas and solutions are welcome.