GPIO polling jitter


Hello,

I am using an AT32UC3A3256 running at 66MHz. I poll a GPIO pin continuously via the local bus and toggle an LED via the local bus when it goes high. Monitoring both of these signals with an oscilloscope I see a jitter on the LED drive signal of 6 clock cycles that I can’t explain by examining the .lss file. The signal that I am polling has a rise time of around 5ns (one clock cycle is approximately 15.2ns) and the assembly of my polling loop consists of 3 instructions. Below is the relevant part of the .lss file:                                               

			while((AVR32_GPIO_LOCAL.port[2].pvr & (1 << 11)) == 0);
8000783e:	ee f8 02 60 	ld.w	r8,r7[608]
80007842:	e2 18 08 00 	andl	r8,0x800,COH
80007846:	cf c0       	breq	8000783e <main+0x7a2>
			AVR32_GPIO_LOCAL.port[0].ovrt = 1 << 0;
80007848:	ef 46 00 5c 	st.w	r7[92],r6

From the above code I would expect a jitter of 2 clock cycles. Can anyone explain why I see a jitter of 6 clock cycles?

Thanks.

This topic has a solution.
Last Edited: Wed. Sep 10, 2014 - 01:40 PM

Perhaps I am looking at it incorrectly, but I see 4 cycles (minimum) and 6 cycles (maximum) to toggle the output.

Last Edited: Wed. Sep 10, 2014 - 07:39 AM

I agree. That is why I would expect a jitter of 2 cycles, not 6 cycles as I observe with an oscilloscope.


I suspect there is another while loop wrapped around the .lss snippet you pasted.

 

From what I can tell, you are using the input pin to "gate" the toggling of the output pin.

Are you saying the toggle rate of the output pin varies by 90ns while the input signal is high?

 

Or are you trying to make the output pin "follow" the value of the input pin? I'm not entirely sure what you are trying to achieve.

 

 

If we examine the .lss output in detail, we can start by looking at

while((AVR32_GPIO_LOCAL.port[2].pvr & (1 << 11)) == 0);

There are 3 instructions, all in theory single-cycle, if the branch is not taken (input pin is "1"). If the input pin is "0", the breq instruction is going to need two additional clock cycles to perform the branch. Considering you are running at 66MHz, you must have enabled 1 wait state for your FLASH, adding, in this particular case, another clock cycle to the branch because the first instruction jumped to is 4 bytes long and not aligned on a word address, leaving us with a total of 6 cycles for this bit.

Assuming there is another while(1) loop wrapped around your pasted code, we can apply the same analysis to it. This time around the branch at 0x80007846 is not going to be taken, so the instruction only needs one clock cycle to complete. The st.w instruction is adding another cycle and the assumed branch instruction following it is going to take 4 cycles to get back to 0x8000783E leaving us with 8 clock cycles in the outer loop.

 

Ignoring propagation delays, the output pin is going to start toggling no less than 4 cycles and no more than 10 cycles after the rising edge of the input signal.

Last Edited: Wed. Sep 10, 2014 - 10:20 AM

Thanks very much for the reply.

 

There is another loop in my code but it is not quite as you have assumed. I am using the UC3 to read parallel data from an external device (a JPEG camera). When there is valid data to be read the camera sets a "valid data" signal high. The camera then outputs 512 bytes of data which is read by the UC3 by reading a parallel port 512 times at the same rate that the camera produces data. The camera then clears the "valid data" signal and my code then loops back to

while((AVR32_GPIO_LOCAL.port[2].pvr & (1 << 11)) == 0);

to wait until the "valid data" signal goes high again and then read the next 512 bytes of data and so on. I only inserted the code to toggle the LED (the st.w instruction) so that I could monitor the jitter in the "reaction time" of my code. You are correct in saying that the delay between the "valid data" signal going high and the LED toggling varies by 90ns, which equates to 6 clock cycles, but my code doesn't loop back to

while((AVR32_GPIO_LOCAL.port[2].pvr & (1 << 11)) == 0);

as soon as the LED is toggled. Basically, my code waits in the above while loop until the "valid data" signal goes high and then does some things, at the end of which, the "valid data" signal goes low and my code loops back to the above while loop and waits until "valid data" goes high to repeat the process. It is the variation in the time that it takes to start doing these things after "valid data" goes high that I can't completely explain. What you say about the FLASH wait state makes sense. (Yes, I have enabled 1 wait state for the FLASH.) This adds another cycle to the jitter that I wasn't aware of. Excuse my ignorance but is there a way to make the compiler word align the address of a particular piece of code?

 

Even considering the FLASH wait state I still see 3 more cycles of jitter than I can explain but your reply has certainly been of help. Thanks again.

Last Edited: Wed. Sep 10, 2014 - 01:54 PM

Ah...

So let's dig into it.

In the best case scenario, the input pin goes high just a nanosecond before the pin value register is sampled at 0x8000783E. The branch at 0x80007846 is not taken, and the output pin is toggled 4 clock cycles later.

In the worst case scenario, the input pin goes high just a nanosecond after the pin value register is sampled at 0x8000783E. The branch at 0x80007846 is taken, eating 4 clock cycles instead of just one. Next, the input pin is sampled again, and as it has gone high, the branch at 0x80007846 isn't taken this time around, and the output pin is toggled 10 clock cycles after the rising edge.

Here is the instruction flow for the latter case:

ld.w    r8,r7[608]      // 1 clock cycle. Input pin is sampled as "0"
andl    r8,0x800,COH    // 1 clock cycle. Input pin is now "1" but it changed too late to get sampled
breq    8000783e        // 4 clock cycles, as the branch is taken
ld.w    r8,r7[608]      // 1 clock cycle. Input pin is sampled as "1"
andl    r8,0x800,COH    // 1 clock cycle.
breq    8000783e        // 1 clock cycle, as the branch is not taken.
st.w    r7[92],r6       // 1 clock cycle. Output pin is toggled

 

As you can see, the response time is going to be anywhere between 4 and 10 clock cycles.
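The arithmetic behind the 4 to 10 cycle window can be written out as a small host-side C sketch. The per-instruction cycle counts are the ones quoted in this thread (1 cycle each, 4 for a taken breq), not datasheet values, and the function names are made up for illustration.

```c
#include <assert.h>

/* Cycle costs as quoted in this thread (assumptions, not datasheet figures). */
enum {
    CYCLES_LD_W           = 1, /* ld.w: sample the pin value register   */
    CYCLES_ANDL           = 1, /* andl: mask the bit of interest        */
    CYCLES_BREQ_NOT_TAKEN = 1, /* breq falls through                    */
    CYCLES_BREQ_TAKEN     = 4, /* breq loops back (flush + refetch)     */
    CYCLES_ST_W           = 1  /* st.w: toggle the output pin           */
};

/* Edge arrives just before the ld.w sample: one pass, branch not taken. */
static inline unsigned best_case_cycles(void)
{
    return CYCLES_LD_W + CYCLES_ANDL + CYCLES_BREQ_NOT_TAKEN + CYCLES_ST_W;
}

/* Edge arrives just after the ld.w sample: one wasted pass with a taken
 * branch, followed by the best-case pass. */
static inline unsigned worst_case_cycles(void)
{
    return CYCLES_LD_W + CYCLES_ANDL + CYCLES_BREQ_TAKEN + best_case_cycles();
}

/* Jitter is the spread between the two cases. */
static inline unsigned jitter_cycles(void)
{
    return worst_case_cycles() - best_case_cycles();
}

/* Convert a cycle count to nanoseconds for a given CPU clock in Hz. */
static inline double cycles_to_ns(unsigned cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return (double)cycles * 1e9 / clock_hz;
}
```

With these numbers the jitter equals one full pass through the loop with a taken branch (1 + 1 + 4 = 6 cycles), and `cycles_to_ns(jitter_cycles(), 66e6)` comes out at roughly 90.9 ns, in line with the 90 ns seen on the oscilloscope.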

 

I'm not sure how you could make the compiler align particular bits of C code.

Have you tried compiling using -O3 and checked the delay and jitter?


Thanks again for the reply.

 

I will experiment with different optimisation levels to see what effect they have on the jitter but for timing reasons I can only use -O1 for my project.

 

Are you saying that the breq takes 4 cycles if the branch is taken, or are you including the cycles required to repeat the ld.w, andl and breq instructions? If you are including the cycles required to repeat these instructions then I think you may have doubled up in your analysis of the timing. If you are saying that the breq instruction takes 4 cycles on its own if the branch is taken then that would explain exactly what I am observing.


The breq instruction, in your case, needs 4 clock cycles to complete if the branch is taken, and one to complete if it's not.

All the change-of-flow instructions are evaluated in a single clock cycle. The extra cycles are for flushing the pipeline and loading the next instruction to be executed. This usually takes 2 extra clock cycles, but in some cases it can add 3 or more. One example would be jumping to an extended format (4 byte) instruction that isn't word-aligned, as the instruction has to be loaded in two separate operations.

For conditional change-of-flow instructions, you are only going to get the pipeline flush "penalty" if the condition(s) are met. This also applies to conditions that always evaluate as true, like bral and even rjmp, because there are no branch-prediction features in the AVR32 architecture.
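As a rough sketch, the cost of a taken branch as described above could be modelled like this in C. It is a simplification of the explanation in this post only: 1 cycle to evaluate, 2 cycles of flush and refetch, and 1 more when the target is a 4-byte instruction that is not word-aligned (with 1 flash wait state). The function names are hypothetical and the real core may behave differently.

```c
#include <stdint.h>

/* Returns nonzero if addr lies on a 4-byte (word) boundary. */
static inline int is_word_aligned(uint32_t addr)
{
    return (addr & 0x3u) == 0;
}

/* Simplified model of a TAKEN branch, per the description in this thread:
 *   1 cycle  to evaluate the branch
 * + 2 cycles to flush the pipeline and fetch the target
 * + 1 cycle  if the target is an extended (4-byte) instruction that is not
 *            word-aligned, because it is fetched in two operations.
 * This is an assumption-based model, not a datasheet formula. */
static inline unsigned taken_branch_cycles(uint32_t target_addr,
                                           unsigned target_len_bytes)
{
    unsigned cycles = 1u + 2u;
    if (target_len_bytes == 4u && !is_word_aligned(target_addr)) {
        cycles += 1u;
    }
    return cycles;
}
```

For the loop in the first post, the branch target 0x8000783e holds a 4-byte ld.w and sits on a halfword boundary, so `taken_branch_cycles(0x8000783Eu, 4)` gives the 4 cycles quoted above, while a word-aligned target would give 3 under this model.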

 

As I said before, I don't know how to align stuff in C, but maybe you could use the same trick I'm using in assembler:

.p2alignw 2, 0xD703

This is going to align the instruction following it to a 2^n byte memory address, filling the space with nop instructions. In this case n=2 for a 4-byte alignment.

You could try putting that directive in some in-line assembly just ahead of

while((AVR32_GPIO_LOCAL.port[2].pvr & (1 << 11)) == 0);

I don't know how to properly format this in C, as I don't write C/C++ for micro-controllers but maybe someone else can pitch in...?


__asm__ ( ".p2alignw 2, 0xD703" );
will inline that ASM in GCC.
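A minimal sketch of how that inline directive might sit in front of the polling loop follows. The macro and function names are hypothetical, and the `__AVR32__` guard is there only so the same file also builds on a host compiler, where the directive is replaced by a no-op. The compiler is free to place instructions (for example an address load) between the asm statement and the first instruction of the loop, so check the .lss output rather than assuming the loop entry really ends up word-aligned.

```c
#include <stdint.h>

#if defined(__AVR32__)
/* Pad with AVR32 nop opcodes (0xD703) up to the next 4-byte boundary. */
#define ALIGN_NEXT_INSN_TO_WORD() __asm__ volatile (".p2alignw 2, 0xD703")
#else
/* Host build: no alignment directive, so the sketch still compiles. */
#define ALIGN_NEXT_INSN_TO_WORD() ((void)0)
#endif

/* Busy-wait until any bit selected by mask reads as 1 in the register
 * pointed to by pvr. pvr is a hypothetical stand-in for a pin value
 * register such as AVR32_GPIO_LOCAL.port[2].pvr. */
static inline void wait_until_high(volatile const uint32_t *pvr, uint32_t mask)
{
    ALIGN_NEXT_INSN_TO_WORD();
    while ((*pvr & mask) == 0) {
        /* keep polling */
    }
}
```

On the target, the call would pass the address of the pin value register and the mask `1 << 11` from the first post, with a cast if the header declares the field with a different type.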

Last Edited: Thu. Sep 11, 2014 - 09:02 AM

DukerX and mikech, thanks very much for the information. I'll play around with .p2alignw and let you know what I see. Just for your information and comment if you like, I originally compiled with optimisation -O1 and got the code that I pasted above that resulted in 6 cycles of jitter. I then compiled with optimisation -O0 and got 4 instructions in the while loop (or breq loop) instead of the 3 I got with optimisation -O1. No surprises there but the jitter is still observed to be 6 cycles. The LED toggles later with -O0 than it does with -O1 but the jitter is still the same.


I tried .p2alignw 2, 0xD703 but it didn't always word align my code with -O1 optimisation but .p2alignw did always work. Anyway, the timing jitter didn't change whether my code was word aligned or not. I did find that with optimisation -O3 the jitter seemed to decrease to 5 cycles, even though the breq loop still consisted of 3 instructions.

 

One strange thing I noticed was that everything seemed to work normally, and jitter was unaffected, if I set the flash to zero wait states even though I am running at 66 MHz. It is only when I exceed 66 MHz that I have to set the flash to 1 wait state. I'm curious to hear any comments on this.

Last Edited: Fri. Sep 19, 2014 - 01:07 AM

Horse2929 wrote:

One strange thing I noticed was that everything seemed to work normally, and jitter was unaffected, if I set the flash to zero wait states even though I am running at 66 MHz. It is only when I exceed 66 MHz that I have to set the flash to 1 wait state. I'm curious to hear any comments on this.

That sounds awfully familiar... I had set my AT32UC3A3256 to 0 wait states while running at 66 MHz by accident a while ago... and it worked perfectly. In winter. Sometime during late spring I noticed weird hang-ups appearing more and more often and in summer the device was unusable. However, it worked perfectly when I put a cooled metal block on top of the chip. As it turns out, at least my device worked with 0 wait states at 66 MHz when it was cool enough, but it required 1 wait state to work reliably at all (room) temperatures.

Maybe it’s the same for you?

This reply has been marked as the solution. 

I think that DukerX is correct with his explanation of why I was observing 6 cycles of jitter with my polling loop. If the breq branch is taken the pipeline must be flushed and the pipeline is 3 cycles long, so the breq instruction will take 4 cycles in total. Thanks to all who contributed. I have certainly learnt a few things.