Working around the short interrupt - instruction skip bug

Hi all!

So, as promised, I have put together an article on how to deal with the AVR32's instruction-skipping bug that follows a "short" interrupt signal.

The bug itself was first discovered by Catweax, and I later ran into the same problem independently of him. After some discussion in the respective threads we reached a good understanding of what happens here, so, more importantly, it is now possible to give some guidance on how to identify this bug in a program and how to avoid it once found.

If you are interested in the details, the following threads describe the problem and how we worked out its nature:

SHORT DESCRIPTION of the processor problem

If a "short" interrupt signal (one lasting just a few processor cycles) arrives from any source at the peripheral level while interrupts are globally enabled via the status register, the interrupt might not be processed, yet a few cycles after the event an instruction will be skipped.

Such "short" signals may come from writing to the "idr" register of any peripheral, and also from any event (such as a DMA access) that can asynchronously take away an enabled interrupt condition. How broad this latter category is remains unknown, so you should probably be wary wherever an interrupt's set and clear events happen asynchronously to each other.

IDENTIFYING the situation

Whether this bug affects a particular piece of software is hard to tell. The symptoms are usually relatively rare, seemingly random and very odd, unexpected behaviors (depending on which instructions happen to follow a "short" signal). To decide whether this is what is happening in a failing program, consider the following:

The problem cannot show up in sections of code where interrupts are globally disabled (using the status register). Likewise, interrupts that are disabled by priority (using the status register) cannot produce this bug between the respective mtsr's.

For the situation to occur, a "short" signal is required. Such a signal is most likely caused by writing to an "idr" register while the respective interrupt source could fire. The bug may then manifest as an instruction being skipped, rarely, a few cycles after this "idr" write.

Another common source of a "short" signal is initiating DMA transfers that might take away interrupt sources (for example a TxEmpty interrupt when writing to a USART).

So in general, if you think the program may be misbehaving because of this processor bug, first check for any writes to "idr" registers (disabling peripheral interrupts), then think carefully about your interrupt sources: what may raise an interrupt condition and what could take it away. If the raising and the taking away can happen asynchronously to each other, you may well observe this problem at some point.

WORKING AROUND the problem

Depending on the situation you discovered, there are several ways to solve it.

If you used "idr" - "ier" pairs to guard short critical sections, the simplest fix is to replace them with global or priority-based interrupt disable-enable using the status register (mtsr instruction).
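To illustrate the replacement described above, here is a minimal sketch of a critical section guarded by the global interrupt mask instead of an "idr" - "ier" pair. It is only a sketch under assumptions: I assume the GM flag is bit 16 of the status register and that the avr32-gcc builtins `__builtin_mfsr`, `__builtin_ssrf` and `__builtin_csrf` are available. The helper names (`irq_save_and_disable`, `irq_restore`, `shared_counter_increment`) and the `fake_sr` host stand-in are made up for illustration and are not from the threads.

```c
#include <stdint.h>

/* Assumption: GM (global interrupt mask) is bit 16 of the AVR32 status register. */
#define SR_GM_BIT  16
#define SR_GM_MASK (1u << SR_GM_BIT)

#if defined(__AVR32__)
/* Target build: avr32-gcc builtins; system register 0 is assumed to be SR. */
static inline uint32_t sr_read(void) { return (uint32_t)__builtin_mfsr(0); }
static inline void sr_set_gm(void)   { __builtin_ssrf(SR_GM_BIT); }
static inline void sr_clear_gm(void) { __builtin_csrf(SR_GM_BIT); }
#else
/* Host build: a plain variable stands in for SR so the logic can be tried out. */
static uint32_t fake_sr;
static inline uint32_t sr_read(void) { return fake_sr; }
static inline void sr_set_gm(void)   { fake_sr |= SR_GM_MASK; }
static inline void sr_clear_gm(void) { fake_sr &= ~SR_GM_MASK; }
#endif

/* Mask all interrupts and return the previous GM state (hypothetical helper). */
static inline uint32_t irq_save_and_disable(void)
{
    uint32_t was_masked = sr_read() & SR_GM_MASK;
    sr_set_gm();
    return was_masked;
}

/* Re-enable interrupts only if they were enabled before the critical section. */
static inline void irq_restore(uint32_t was_masked)
{
    if (!was_masked) {
        sr_clear_gm();
    }
}

/* Example shared state that an interrupt handler might also touch. */
static volatile uint32_t shared_counter;

/* The critical section: no "idr"/"ier" write, so no "short" signal is created. */
static inline void shared_counter_increment(void)
{
    uint32_t state = irq_save_and_disable();
    shared_counter++;
    irq_restore(state);
}
```

Saving and restoring the previous GM state, rather than unconditionally enabling, keeps the helper safe to call from code that already runs with interrupts masked. A priority-based variant would set and restore the priority mask bits of the status register in the same way.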

When using DMA, make sure there are no hazards, most importantly keeping in mind that the DMA transfer will not start immediately when you enable it (See Catweax's topic above for more on this - ).

In general, when you suspect a "short" interrupt signal (other than one caused directly by an "idr" write), try to work around it in a way that eliminates the hazard. These "short" signals may not only trigger this processor bug, but may also indicate a design flaw in your program that should be corrected anyway.

If you absolutely need to use an "idr" (on an interrupt which could occur at that time), keep in mind that this processor fault will strike. You can only cover it up with the proper use of global or priority-based interrupt disable-enable (See my topic for more on this - ). If you need this, you should probably also experiment with how the bug happens for your interrupt source (which instructions are likely to get skipped after writing the "idr") to make sure you put enough padding between the "idr" write and the global (or priority) enable. As a template for this, you can download the test case in the aforementioned topic (be sure to "patch" it with the later developed "liocheck.c" to make it practical) and modify it to your needs.
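As a rough sketch of the "idr under global disable, plus padding" idea above: the code below masks interrupts globally, writes the "idr", executes a straight-line run of nops, and only then restores the global mask. Everything here is an assumption for illustration: the GM flag being bit 16 of the status register, the avr32-gcc builtins, the helper names (`guarded_idr_write`, `pad_after_idr`), and especially the padding length of 8 nops, which is a placeholder you would have to determine by experiment for your own interrupt source.

```c
#include <stdint.h>

/* Assumption: GM (global interrupt mask) is bit 16 of the AVR32 status register. */
#define SR_GM_BIT  16
#define SR_GM_MASK (1u << SR_GM_BIT)

#if defined(__AVR32__)
/* Target build: avr32-gcc builtins; system register 0 is assumed to be SR. */
static inline uint32_t sr_read(void) { return (uint32_t)__builtin_mfsr(0); }
static inline void sr_set_gm(void)   { __builtin_ssrf(SR_GM_BIT); }
static inline void sr_clear_gm(void) { __builtin_csrf(SR_GM_BIT); }
/* Placeholder padding: 8 straight-line nops, so a skipped one does no harm. */
static inline void pad_after_idr(void)
{
    __asm__ volatile("nop\n\tnop\n\tnop\n\tnop\n\tnop\n\tnop\n\tnop\n\tnop");
}
#else
/* Host build: a plain variable stands in for SR; padding is a no-op. */
static uint32_t fake_sr;
static inline uint32_t sr_read(void) { return fake_sr; }
static inline void sr_set_gm(void)   { fake_sr |= SR_GM_MASK; }
static inline void sr_clear_gm(void) { fake_sr &= ~SR_GM_MASK; }
static inline void pad_after_idr(void) { }
#endif

/*
 * Disable a peripheral interrupt via its "idr" register while interrupts are
 * globally masked, then pad before restoring the previous global mask state.
 * `idr` points at the peripheral's interrupt disable register and `mask`
 * selects the interrupt bit(s); both are supplied by the caller.
 */
static inline void guarded_idr_write(volatile uint32_t *idr, uint32_t mask)
{
    uint32_t was_masked = sr_read() & SR_GM_MASK;

    sr_set_gm();          /* global disable before touching the "idr" */
    *idr = mask;          /* the write that may produce the "short" signal */
    pad_after_idr();      /* keep the skip window inside harmless nops */

    if (!was_masked) {
        sr_clear_gm();    /* re-enable only if interrupts were enabled before */
    }
}
```

The padding is written as straight-line nops rather than a loop on purpose: if the skipped instruction were a loop branch or counter update, the padding length would no longer be predictable.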


Hope this helps in flattening these bugs! :)

Last Edited: Thu. Oct 11, 2012 - 10:08 AM

Thanks for the info, especially since this bug is not covered in the errata of the current datasheets. It might be some time until Atmel updates all the datasheets concerned.