Interrupt Timing Conundrum


I'm having difficulty understanding the timing of an ISR. I'm using Timer/Counter0 to both toggle OC0A output and generate an interrupt. The ISR toggles output pin D2 while a loop in main() toggles output pin D0.

I'm using Atmel Studio 6.1 with an ATmega32U2 and looking at the output using a logic analyzer.

Here is the program:

#define F_CPU 16000000UL // @ 5.0V

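#include <avr/io.h>        // DDRx, PORTx, PINx and timer registers
#include <avr/interrupt.h> // sei(), ISR()
#include <avr/wdt.h>       // wdt_disable()
#include <avr/power.h>     // clock_prescale_set()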


void Init_LED(void)
{
   // Initialize LED on Mattair Board (D0)
   DDRD  |= (1 << DDD0);  // output
   PORTD |= (1 << PORTD0); // output High = LED On
}


void Init_OutD2(void)
{
   // Pin D2 is an output
   DDRD  |= (1 << DDD2);  // output
   PORTD |= (1 << PORTD2); // set output High
}


void Init_Tmr0(void) // output square wave on OC0A=PB7
{
   // Mode 2: CTC (WGM0[2:0] = 2), TOP=OCR0A
	
   // PB7 is an output
   DDRB  |= (1<<DDB7);  
	
   // CTC with TOP=OCR0A
   // Toggle OC0A on Compare Match
   TCCR0A = ( (1<<COM0A0) | (1<<WGM01) ); 
	
   // TCCR0B=0x03, Timer CLK=clk(I/O)/64 (From prescaler)
   TCCR0B = ( (1<<CS01) | (1<<CS00) );   
	
   OCR0A = 23;     // TOP, div by  24
	       
   TIMSK0 = 0x02;  // Output Compare Match_A Interrupt Enabled
}


void SetupHardware(void)
{
   // Disable watchdog if enabled by bootloader/fuses
   MCUSR &= ~(1 << WDRF);
   wdt_disable();

   // Set clock division
   clock_prescale_set(clock_div_1);  // 16MHz/1 = 16MHz for CPU @ 5.0V
	
   Init_LED();
   Init_OutD2();
   Init_Tmr0();
}


int main(void)
{
   SetupHardware();
   sei();
   while(1)
   {
      // Toggle LED on output D0
      PIND = (1 << PIND0);    
   }	
}


ISR ( TIMER0_COMPA_vect ) // interrupt from timer0
{
  // Toggle Output Pin D2
  PIND = (1 << PIND2);
}
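
As a rough cross-check of the timer setup in the program above, the compare-match period can be worked out on a PC. This is only a sketch: the helper names `ctc_period_cycles` and `oc_toggle_freq_millihertz` are made up for illustration, and it assumes the usual CTC behaviour where the counter runs from 0 to OCR0A inclusive.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles between successive compare-match events in CTC mode.
 * Assumption: the counter counts 0..ocr inclusive, i.e. (ocr + 1) timer
 * ticks, and each timer tick lasts `prescaler` CPU cycles. */
static uint32_t ctc_period_cycles(uint32_t prescaler, uint8_t ocr)
{
    assert(prescaler > 0);
    return prescaler * ((uint32_t)ocr + 1u);
}

/* The OC pin toggles once per compare match, so one full output period
 * spans two matches. The result is in millihertz to avoid floating point. */
static uint64_t oc_toggle_freq_millihertz(uint32_t f_cpu_hz,
                                          uint32_t prescaler, uint8_t ocr)
{
    uint64_t cycles_per_output_period = 2ull * ctc_period_cycles(prescaler, ocr);
    return ((uint64_t)f_cpu_hz * 1000ull) / cycles_per_output_period;
}
```

With a prescaler of 64 and OCR0A = 23 this gives 1536 CPU cycles between interrupts (96 µs at 16 MHz) and an OC0A square wave of roughly 5.208 kHz.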

This is from the .lss file for main():

When there is no interrupt, output pin D0 is toggled every 3 cpu clock cycles.

Next is the .lss for the ISR:

The entire interrupt duration is 31 cpu clock cycles (5 cycles to get to the ISR and 26 cycles for the ISR).

So far, so good. It matches what is expected from the data sheet.

Here's where things start to get funky:

I have aligned the "out" instruction in the ISR with the transition of pin D2. The instruction begins with the falling edge of the cpu clock and the output is asserted on the rising edge of the clock.

Edit: This is wrong! The instruction begins and ends with the rising edge of the clock. See this post.

You can see that the interrupt duration stretches the low phase of pin D0 from the normal 3 cycles to 34 cycles, which agrees with the 31 cycles calculated for the interrupt.

Two cycles are marked with a red "?". It appears that the two cycle rjmp instruction in main() is being divided by the ISR, even though the interrupt flag was asserted in the clock period prior to the out instruction.

Now look at the following trace:

The rising edge of OC0A is coincident with the falling edge of pin D0. Again, aligning the "out" instruction with the falling edge of pin D2, there is also a gap of one cycle (marked with a red "?") after the rjmp and before the first cycle of the interrupt. Also, it appears that the last cycle of reti instruction in the ISR occurs at the same time as the "out" instruction in main().

It is as though the falling edge of D2 is one cycle late.

I have no explanation for this.

Last Edited: Mon. Jan 20, 2014 - 06:52 AM

And when, compared to the start of the OUT instruction, does the actual change in the pin value happen? I certainly don't know.

Regards,
Steve A.

The Board helps those that help themselves.


Koshchi wrote:
And when, compared to the start of the OUT instruction, does the actual change in the pin value happen? I certainly don't know.

If the pin voltage change for the 'out' instruction is shifted one or more clock cycles to the right (pipe lining?) then that would be true for both D2 and D0. This would be the equivalent of shifting everything shown below the trace window to the left.

The problem is the number of clock cycles between the edge of D2 and the following edge of D0. This wouldn't change if the pin change is offset from the instruction for both 'out' instructions.

Following that line of thought, the conundrum would vanish if the output was delayed one clock cycle for only the 'out' instruction in the ISR (which changes D2). Is that possible? The 'out' instruction in the ISR is followed by a 'pop' instruction while the 'out' instruction in the main() loop is followed by an 'rjmp' instruction. Does that make a difference as to when the pin voltage changes relative to the instruction?


Quote:
Following that line of thought, the conundrum would vanish if the output was delayed one clock cycle for only the 'out' instruction in the ISR
Which is of course a ridiculous assumption.

Then when is the interrupt flag set in relation to when OC0A goes high? And when the interrupt flag goes high, how long before the interrupt mechanism recognizes it? If the interrupt flag goes high on the cycle just before a new opcode starts, does the interrupt happen before or after that opcode? (My guess would be after).

I think that you simply have an erroneous assumption about ISR timing.

Regards,
Steve A.


Last Edited: Mon. Jan 13, 2014 - 06:07 PM

You've done a lot of digging. Diagrams well annotated.

Let's examine ISR sequence, from what I know. You can correct me where I go wrong.

-- There are "wildcard" operations that affect this cycle counting. For example, CPU clock stopped for four cycles during EEPROM read.

-- Interaction with other ISRs can "throw off" cycle counting with interrupt latency and such.

-- But steady-state simple program such as shown, the above factors don't come into play and a more-or-less steady state operation should occur.

-- Why "more-or-less"? What is the first thing that needs to happen?

Quote:
If an interrupt occurs during execution of a multi-cycle instruction, this instruction is completed before the interrupt is served.

So the current instruction needs to be finished first. Are you synchronous with the ISR over time? I'd expect to sometimes fire during the OUT and sometimes during the RJMP and see some jitter. But you don't?

Now, Let's examine your 5 cycles and your ? cycle. I'd look at it quite differently...

-- 1-3 Finish current instruction (Or is that 0 to n?) In this example, just perhaps a one cycle difference so no big deal?

-- 4 cycles to get to the vector

-- 3 cycles to take the JMP

So I'd say minimum of 8 cycles to get to the ISR. I suppose you could say that a one-cycle instruction takes 0 extra cycles. To me, I don't care in any of my apps about sub-cycle timing or exactly when the output pin goes high in relation to the AVR clock.

[Hmmm--"in relation to the AVR clock" -- add another channel watching CLKO, and/or XTAL2?]

Quote:
After four clock cycles the program vector address for the actual interrupt handling routine is executed. During this four clock cycle period, the Program Counter is pushed onto the Stack.
The vector is normally a jump to the interrupt routine, and this jump takes three clock cycles. If an interrupt occurs during execution of a multi-cycle instruction, this instruction is completed before the interrupt is served.

Now I've lost track of the question. ;) But of no matter, after I challenged the basic assumption of 5 cycles plus ?.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Koshchi wrote:
Which is of course a ridiculous assumption.

Unless that's how the AVR works. Challenge your assumptions. I'm not saying that that is how it works, only posing a hypothesis.

If you assume that only the 'out' in the ISR is shifted to the left by one cpu clock cycle, everything falls into place. For both Figure 1 and Figure 2.

When the interrupt flag is set can not be directly observed, only derived from the output on D2, calculating backwards.

As I see it, the critical parameter is the number of clock cycles from the transition of D2 to the transition of D0. The problem is that the observed doesn't match the calculated.

Quote:
I think that you simply have an erroneous assumption about ISR timing.

Something ain't right! But what?


theusch wrote:
So I'd say minimum of 8 cycles to get to the ISR.

From the datasheet for the ATmega32U2:

Quote:
6.8.1 Interrupt Response Time

The interrupt execution response for all the enabled AVR interrupts is five clock cycles minimum.
After five clock cycles the program vector address for the actual interrupt handling routine is executed.

Also, from my OP:

Quote:
You can see that the interrupt duration stretches the low phase of pin D0 from the normal 3 cycles to 34 cycles, which agrees with the 31 cycles calculated for the interrupt.

The calculated and the observed match.
It takes 5 clock cycles before executing the first instruction in the ISR.


How do you figure interrupt response should be 5 cycles? It is 4 cycles, plus the JMP in the vector table. JMP is three cycles. Some devices use an RJMP at two cycles.

Look at the .lss file, I'd bet it has RJMP instructions in the vector table. That's 6 cycles and your missing cycle is now accounted for...

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 


Ah, OK I see that the 32U2 has a 5 cycle response time.

That leaves you with another mystery. The RJMP takes 2 cycles more. If it's a JMP, then 3 cycles. You have only one '?' cycle...

What does the .lss for the vector table show?

If you want to be absolutely sure how long an interrupt takes to service:

void Init_Tmr0(void) {
  TCCR0A = (1<<WGM01);
  TCCR0B = (1<<CS00);
  OCR0A = 0;
  DDRB |= (1<<PB0);
  TIMSK0 = (1<<OCIE0A);
}

ISR(TIMER0_COMPA_vect, ISR_NAKED) {
  PINB |= (1<<PB0);  // whichever port/bit you want
  __asm__ __volatile__ ("sei $ reti");
}

int main(void) {
  Init_Tmr0();
  sei();
  while(1);
}

This will starve main() completely, and will toggle PB0 at a rate equal exactly to the round-trip cycle cost of the ISR.

The only instructions that will execute are:

(r)jmp __vector_19 ; (2)3 cycles
sbi PINB, 0 ; 2 cycles
sei ; 1 cycles
reti ; 5 cycles

Quote:

The calculated and the observed match.
It takes 5 clock cycles before executing the first instruction in the ISR.

No it doesn't. Let's do it one more time...

-- Finish current instruction. Now, you tell me whether this is 0/1/2/3/... Does it in fact sometimes hit during the two-cycle instruction? Then you should be seeing some jitter.
-- 5 cycles to service the interrupt, according to Joey.
-- 3 cycles for (generic) JMP. 2 if RJMP is substituted in the vector table.

In no case are the above going to add up to five. Or six.


theusch wrote:
Quote:

The calculated and the observed match.
It takes 5 clock cycles before executing the first instruction in the ISR.

No it doesn't. Let's do it one more time...

-- Finish current instruction. Now, you tell me whether this is 0/1/2/3/... Does it in fact sometimes hit during the two-cycle instruction? Then you should be seeing some jitter.
-- 5 cycles to service the interrupt, according to Joey.
-- 3 cycles for (generic) JMP. 2 if RJMP is substituted in the vector table.

In no case are the above going to add up to five. Or six.

If you look at Figures 1 & 2 in my OP, you can see that pin D0 stays low for 34 clock cycles. When there is no interrupt, it stays low for 3 clock cycles. The difference is 31 clock cycles.

The ISR is 26 clock cycles. 31-26=5. That's 5 cycles before executing the first instruction of the ISR. This agrees with the data sheet which I quoted in my post of 13 January at 11:48pm.

If the interrupt lead-in is more than 5 cycles, how do you explain pin D0 staying low for the 34 clock cycles?
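
The subtraction in this post can be written out as a small host-side sketch. The function names are hypothetical, and the result is only as trustworthy as the inputs: the 34 and 3 come from the logic analyzer, while the 26 is a sum of per-instruction costs taken from the datasheet.

```c
#include <assert.h>

/* Cycles the interrupt takes away from main(): the stretched low phase of
 * D0 minus the normal low phase. */
static int interrupt_total_cycles(int stretched_low_cycles, int normal_low_cycles)
{
    assert(stretched_low_cycles >= normal_low_cycles);
    return stretched_low_cycles - normal_low_cycles;
}

/* Whatever is left once the ISR's own instructions are subtracted is the
 * lead-in (everything before the first instruction of the ISR). */
static int lead_in_cycles(int total_cycles, int isr_cycles)
{
    assert(total_cycles >= isr_cycles);
    return total_cycles - isr_cycles;
}
```

Note that the lead-in is a remainder: if any instruction inside the ISR really costs one cycle less than assumed, the inferred lead-in grows by the same amount.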


Chuck99 wrote:
If you look at Figures 1 & 2 in my OP, you can see that pin D0 stays low for 34 clock cycles. When there is no interrupt, it stays low for 3 clock cycles. The difference is 31 clock cycles.

The ISR is 26 clock cycles. 31-26=5. That's 5 cycles before executing the first instruction of the ISR. This agrees with the data sheet

No it does not. As Lee and I have been saying:
In the datasheet, Atmel wrote:
6.8.1 Interrupt Response Time
    The interrupt execution response for all the enabled AVR interrupts is five clock cycles minimum. After five clock cycles the program vector address for the actual interrupt handling routine is executed. During these five clock cycle period, the Program Counter is pushed onto the Stack. The vector is normally a jump to the interrupt routine, and this jump takes three clock cycles.
You are still ignoring the clear statement that a 3-cycle JMP follows the initial 5-cycle latency during which the PC is pushed onto the stack. If you were to account for that you would see in fact that your ISR is shorter than it possibly can be.
Quote:
If the interrupt lead-in is more than 5 cycles, how do you explain pin D0 staying low for the 34 clock cycles?
I suspect a datasheet errata w.r.t. both interrupt response time and cycle cost of RET/RETI.

The problem seems to be with the actual width of the program counter (PC). There are several references throughout the datasheet to the PC being 3 bytes wide. There are also references to it being 16-bits wide.

However, for a 32 KB flash device like the ATmega32U2 the PC is only 16 bits (2 bytes) wide. For these devices, only two bytes are pushed onto the stack for CALL, two are popped from the stack by a RET/RETI, and only two are placed on the stack when servicing an interrupt before the vector is executed.

All of these operations take 4 cycles, not the 5 indicated in many places in the datasheet.

Atmel is guilty of countless copy/paste datasheet errata. I believe this to be a shining example. The AVR Instruction Set Manual is clear on the cycle cost of all of these operations. Devices with a 16-bit PC have a cost of 4 cycles for these operations, while 22-bit PC devices have a cost of 5 cycles. Despite text in the datasheet to the contrary, the 32U2 must have a 16-bit PC.

Think about it. On devices with more than 64 KB of flash, the RAMPX, RAMPY, RAMPZ, RAMPD, and EIND registers are required to fully access flash. The 32U2 has no such registers.

Looking again at your .lss, the ISR cost is 25 cycles, not 26. The interrupt response time is 4 cycles, not 5. Add to that the 3 cycles for the JMP that is the vector to the ISR, and you've got a total of 32 cycles. If the vector is a 2-cycle RJMP (you still haven't answered that question), then you've got a total of 31 cycles.

I urge you to run the code I suggested above to determine the real round-trip cycle cost of an ISR. Assuming the erratum exists, the round-trip cycle cost of the ISR should be 14 cycles with a JMP vector, or 13 cycles with an RJMP vector. If there is no erratum, you would see 16 with JMP, 15 with RJMP.

JJ

Chuck99 wrote:
theusch wrote:
So I'd say minimum of 8 cycles to get to the ISR.

From the datasheet for the ATmega32U2:

Quote:
6.8.1 Interrupt Response Time

The interrupt execution response for all the enabled AVR interrupts is five clock cycles minimum.
After five clock cycles the program vector address for the actual interrupt handling routine is executed.



Yeah, but that says that after 5 cycles the instruction at the vector is executed. That instruction is 2 cycles for an RJMP or 3 for a JMP, so doesn't that make the total time to reach the entry of the ISR 7 or 8? The two cycles you seem to be "missing" are surely the time for the RJMP in the vector table, aren't they?

(I know it's a 32K micro but these days AS6 defaults to -Wl,-relax so will be using RJMPs not JMPs in the vector table if the ISR code is in range).


I'd have a bit more interest if OP tells the importance of this quest.

That said, I saw the 5ns notation on the logic analyzer trace above. It might be indeed interesting to "analyze" a trivial ISR similar to the above, but also include a fast channel with the AVR clock. E.g. See where the output pin actually changes w.r.t. the clock.

"Trivial" ISR is IMO "SBI PINX, " and "RETI" right in the vector table entry. Note that this will help to clear up the 34-5 and similar.

Another area of interest to me is why OP is apparently seeing consistent results, and no jitter from "finishing current instruction".


Quote:

if OP tells the importance of this quest.

My £5 on video generation (isn't it always when the navel fluff gets inspected this deeply?).


theusch wrote:
I'd have a bit more interest if OP tells the importance of this quest.

clawson wrote:
My £5 on video generation (isn't it always when the navel fluff gets inspected this deeply?).

This started the evening of January 1. I had a couple hours to kill and testing the ISR timing has been on my to-do list for over a year. Just curiosity.

Mostly, I wanted to know how long it took to decrement a variable (byte, word, long, long long) but also how long it took to execute the first 'real' instruction. Interrupt response and duration.

It was when the results didn't jibe with what I expected that it became more of a challenge. I've spent far more time on this than I ever expected.

As for the number of cycles for the interrupt lead-in (I'm using lead-in to describe the part of the interrupt that precedes the Prologue), at first I thought it was as you two and joeymorin describe: 5 cycles plus the 3 cycle jmp in the vector table. But that didn't match the data in the logic analyzer trace. (BTW, the LA is sampling every 8ns, not 5ns.)

The low pulse of pin D0 is stretched by the interrupt from 3 cycles to 34 cycles. This is consistent from sample to sample no matter where the interrupt occurs. The entire interrupt process consumes 31 clock cycles. (The Clock shown in the two figures in the OP is CLKO output on pin PC7.)
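
For anyone repeating the measurement, here is a sketch of how an interval read off the logic analyzer could be turned into CPU cycles. The helper name `interval_ns_to_cycles` is invented for this example, and it assumes the interval is given in whole nanoseconds and that the CPU clock really runs at the nominal frequency.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a time interval from the logic analyzer into CPU cycles, rounded
 * to the nearest whole cycle. The math is done in picoseconds so that the
 * 62.5 ns period of a 16 MHz clock stays an integer (62500 ps). */
static uint32_t interval_ns_to_cycles(uint32_t interval_ns, uint32_t f_cpu_hz)
{
    assert(f_cpu_hz > 0);
    uint64_t period_ps = 1000000000000ull / f_cpu_hz;
    assert(period_ps > 0);
    uint64_t interval_ps = (uint64_t)interval_ns * 1000ull;
    return (uint32_t)((interval_ps + period_ps / 2) / period_ps);
}
```

With 8 ns sampling, each edge can be off by up to one sample, so a measured interval can be wrong by up to 16 ns. That is still well under half of a 62.5 ns cycle, which is why rounding to the nearest cycle should be safe here.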

In the OP I showed the .lss output for the ISR.
I calculate a total of 26 clock cycles. That leaves 5 cycles for the lead-in.

If in fact the lead-in (including the jmp in the vector table) is 8 cycles, where did I make a mistake?

Here is the entry in the vector table:

 4c:	0c 94 78 00 	jmp	0xf0	; 0xf0 <__vector_19>

Again, it was the timing displayed by the logic analyzer that made me think the lead-in was only 5 clock cycles.

I appreciate any insight any of you can provide.


theusch wrote:
"Trivial" ISR is IMO "SBI PINX, " and "RETI" right in the vector table entry.

Sounds like a good idea.
How do I do that?


joeymorin wrote:
If the vector is a 2-cycle RJMP (you still haven't answered that question), then you've got a total of 31 cycles.

From the .lss file:

4c:	0c 94 78 00 	jmp	0xf0	; 0xf0 <__vector_19>

The program uses a 3 cycle jmp.
The numbers still don't add up.

If I can implement theusch's idea of using SBI in the vector table, that should clear up the reti timing.

Edit:
Even though I've been working with the AVR for a few years, I still suffer from the habit of believing what's in the datasheet. :?


Program from joeymorin (Posted: Jan 14, 2014 - 01:37 AM):

#include <avr/io.h>
#include <avr/interrupt.h>


void Init_Tmr0(void) {
	TCCR0A = (1<<WGM01);
	TCCR0B = (1<<CS00);
	OCR0A = 0;
	DDRD |= (1<<PD0);
	TIMSK0 = (1<<OCIE0A);
}

ISR(TIMER0_COMPA_vect, ISR_NAKED) {
	PIND |= (1<<PD0);  // whichever port/bit you want
	__asm__ __volatile__ ("sei $ reti");
}

int main(void) {
	Init_Tmr0();
	sei();
	while(1);
}

From the .lss file:

 4c:	0c 94 4f 00 	jmp	0x9e	; 0x9e <__vector_19>

----------------------

ISR(TIMER0_COMPA_vect, ISR_NAKED) {
	PIND |= (1<<PD0);  // whichever port/bit you want
  9e:	48 9a       	sbi	0x09, 0	; 9
	__asm__ __volatile__ ("sei $ reti");
  a0:	78 94       	sei
  a2:	18 95       	reti

-----------------------

int main(void) {
	Init_Tmr0();
  a4:	0e 94 46 00 	call	0x8c	; 0x8c 
	sei();
  a8:	78 94       	sei
  aa:	ff cf       	rjmp	.-2      	; 0xaa 

From the datasheet for the ATmega32U2

sbi    2
sei    1
reti   5
---------
Total  8

(I don't think the sei is actually needed since the reti sets the i bit.)

This is the logic analyzer image:

From the LA image, the interrupt uses 13 clock cycles. 13-8=5. 5 clock cycles for the lead-in.

Is this correct?


Chuck99 wrote:
(I don't think the sei is actually needed since the reti sets the i bit.)
It is required for the test.

Yes, reti sets the I bit, but an AVR will execute at least one instruction after the I bit is set, before any pending interrupts are serviced. Without the sei, that instruction would be the next one in the interrupted code in main(). We want to starve main() altogether so that our observations are exclusively of the round-trip cycle cost of the ISR.

Inserting an sei before the reti means that the 'one' instruction that gets executed is the reti, still part of the ISR. After which the next pending interrupt is immediately serviced without running any code from main().

The timer configuration (CTC mode with a prescaler of 1 and a resolution of 1) ensures that there is always a pending interrupt.

Quote:
From the LA image, the interrupt uses 13 clock cycles. 13-8=5. 5 clock cycles for the lead-in.
You already know that the 'lead-in' as you're calling it includes the 3 cycles of the jmp instruction that is the vector. This would leave 2 cycles for the push of PC onto the stack, which the datasheet says takes 5 cycles.

If the datasheet is correct we should see:

PC => stack ; 5 cycles
JMP vector  ; 3 cycles
sbi PIND, 0 ; 2 cycles
sei         ; 1 cycle
reti        ; 5 cycles
----------------------
             16 cycles

This doesn't agree with your LA capture.

Now let's assume the datasheet errata exists:

PC => stack ; 4 cycles
JMP vector  ; 3 cycles
sbi PIND, 0 ; 2 cycles
sei         ; 1 cycle
reti        ; 4 cycles
----------------------
             14 cycles

This doesn't agree either (unless my own arithmetic is off ;) ... I'm starting to go cross-eyed...)
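
The two tallies above can be double-checked with a few lines of host-side C. This is just a sketch: the struct and the names `DATASHEET` and `SUSPECTED_ERRATUM` are invented here, and the cycle counts are the ones listed in the two tables, not independently confirmed values.

```c
#include <assert.h>

/* Per-step cycle costs of one pass through the starved-main() test ISR. */
struct isr_costs {
    int pc_push;     /* interrupt response: PC pushed onto the stack */
    int vector_jump; /* jmp in the vector table */
    int sbi;
    int sei;
    int reti;
};

/* Hypothesis 1: the datasheet figures are right. */
static const struct isr_costs DATASHEET = { 5, 3, 2, 1, 5 };

/* Hypothesis 2: a 16-bit PC makes the push and reti one cycle cheaper. */
static const struct isr_costs SUSPECTED_ERRATUM = { 4, 3, 2, 1, 4 };

static int round_trip_cycles(struct isr_costs c)
{
    assert(c.pc_push >= 0 && c.vector_jump >= 0 && c.sbi >= 0 &&
           c.sei >= 0 && c.reti >= 0);
    return c.pc_push + c.vector_jump + c.sbi + c.sei + c.reti;
}
```

Both hypotheses overshoot the 13 cycles seen on the logic analyzer, by 3 and by 1 respectively, so at least one of the individual costs must still be wrong.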

Now I'm confused.

Let's break it down a bit more:

ISR(TIMER0_COMPA_vect, ISR_NAKED) {
   PIND |= (1<<PD0);  // whichever port/bit you want
   __asm__ __volatile__ ("call foo%=      \n"
                         "sei             \n\t"
                       "foo%=:            \n\t"
                         "reti            \n\t"
                         "ret             ");
}

This will add two instructions to the ISR, a call and a ret. How long does your LA say the ISR is now? 21 cycles? 22 cycles? 23 cycles? Something else?

You could do the same thing to confirm how long a jmp (or any other instruction) really is.

JJ

Chuck99 wrote:
theusch wrote:
"Trivial" ISR is IMO "SBI PINX, " and "RETI" right in the vector table entry.
Sounds like a good idea.
How do I do that?
I'd recommend inserting an sei before the reti for the reasons discussed above:
#define __SFR_OFFSET 0
#include <avr/io.h>

        .section .text

        .global main

main:
        ldi     r16,    1<<WGM01
        out     TCCR0A, r16
        ldi     r16,    1<<CS00
        out     TCCR0B, r16
        ldi     r16,    0
        out     OCR0A,  r16
        ldi     r16,    1<<PD0
        out     DDRD,   r16
        ldi     r16,    1<<OCIE0A
        sts     TIMSK0, r16
loop:
        rjmp    loop

        .section timer0_compa_vect,"ax",@progbits 

        sbi     PIND,   0
        sei
        reti

        .end

avr-gcc -Wall -mmcu=atmega32u2 -nostartfiles -save-temps -Wl,--section-start=timer0_compa_vect=0x4c isr_timing_test.S -o isr_timing_test.elf

Disassembly of section .text:

00000000 <__ctors_end>:
   0:	02 e0       	ldi	r16, 0x02	; 2
   2:	04 bd       	out	0x24, r16	; 36
   4:	01 e0       	ldi	r16, 0x01	; 1
   6:	05 bd       	out	0x25, r16	; 37
   8:	00 e0       	ldi	r16, 0x00	; 0
   a:	07 bd       	out	0x27, r16	; 39
   c:	01 e0       	ldi	r16, 0x01	; 1
   e:	0a b9       	out	0x0a, r16	; 10
  10:	02 e0       	ldi	r16, 0x02	; 2
  12:	00 93 6e 00 	sts	0x006E, r16

00000016 <loop>:
  16:	ff cf       	rjmp	.-2      	; 0x16 <loop>

Disassembly of section timer0_compa_vect:

0000004c <_end-0x8000b4>:
  4c:	48 9a       	sbi	0x09, 0	; 9
  4e:	78 94       	sei
  50:	18 95       	reti

avr-objcopy -O ihex isr_timing_test.elf isr_timing_test.hex

:1000000002E004BD01E005BD00E007BD01E00AB962
:0800100002E000936E00FFCF37
:06004C00489A7894189513
:00000001FF

joeymorin wrote:
Chuck99 wrote:
(I don't think the sei is actually needed since the reti sets the i bit.)
It is required for the test.

Yes, reti sets the I bit, but an AVR will execute at least one instruction after the I bit is set, before any pending interrupts are serviced. Without the sei, that instruction would be the next one in the interrupted code in main(). We want to starve main() altogether so that our observations are exclusively of the round-trip cycle cost of the ISR.

Inserting an sei before the reti means that the 'one' instruction that gets executed is the reti, still part of the ISR. After which the next pending interrupt is immediately serviced without running any code from main().

The timer configuration (CTC mode with a prescaler of 1 and a resolution of 1) ensures that there is always a pending interrupt.


Cute. Very cute.


joeymorin wrote:
Let's break it down a bit more:
ISR(TIMER0_COMPA_vect, ISR_NAKED) {
   PIND |= (1<<PD0);  // whichever port/bit you want
   __asm__ __volatile__ ("call foo%=      \n"
                         "sei             \n\t"
                       "foo%=:            \n\t"
                         "reti            \n\t"
                         "ret             ");
}

This will add two instructions to the ISR, a call and a ret. How long does your LA say the ISR is now? 21 cycles? 22 cycles? 23 cycles? Something else?


The new program with Call added to ISR:

#include <avr/io.h>
#include <avr/interrupt.h>

void Init_Tmr0(void) {
	TCCR0A = (1<<WGM01);
	TCCR0B = (1<<CS00);
	OCR0A = 0;
	DDRD |= (1<<PD0);
	TIMSK0 = (1<<OCIE0A);
}


ISR(TIMER0_COMPA_vect, ISR_NAKED) {
	PIND |= (1<<PD0);  // whichever port/bit you want
	__asm__ __volatile__ ("call foo      \n"
	"sei             \n\t"
	"foo:            \n\t"
	"reti            \n\t"
	"ret             ");
}

int main(void) {
	Init_Tmr0();
	sei();
	while(1);
}

The .lss file:

 4c:	0c 94 4f 00 	jmp	0x9e	; 0x9e <__vector_19>

-----------------

ISR(TIMER0_COMPA_vect, ISR_NAKED) {
	PIND |= (1<<PD0);  // whichever port/bit you want
  9e:	48 9a       	sbi	0x09, 0	; 9
	__asm__ __volatile__ ("call foo      \n"
  a0:	0e 94 53 00 	call	0xa6	; 0xa6 
  a4:	78 94       	sei

000000a6 :
  a6:	18 95       	reti
  a8:	08 95       	ret

------------------

int main(void) {
	Init_Tmr0();
  aa:	0e 94 46 00 	call	0x8c	; 0x8c 
	sei();
  ae:	78 94       	sei
  b0:	ff cf       	rjmp	.-2      	; 0xb0 

Here is the LA capture:

ISR(TIMER0_COMPA_vect, ISR_NAKED) {
	PIND |= (1<<PD0);  // whichever port/bit you want
	__asm__ __volatile__ ("call foo      \n"
	"sei             \n\t"
	"foo:            \n\t"
	"reti            \n\t"
	"ret             ");
}

Ah crap. Copy paste error, sorry. Label foo is in wrong place.

Should be:

ISR(TIMER0_COMPA_vect, ISR_NAKED) {
	PIND |= (1<<PD0);  // whichever port/bit you want
	__asm__ __volatile__ ("call foo      \n"
	"sei             \n\t"
	"reti            \n\t"
	"foo:            \n\t"
	"ret             ");
}

joeymorin wrote:
Ah crap. Copy paste error, sorry. Label foo is in wrong place.

Should be:

ISR(TIMER0_COMPA_vect, ISR_NAKED) {
	PIND |= (1<<PD0);  // whichever port/bit you want
	__asm__ __volatile__ ("call foo      \n"
	"sei             \n\t"
	"reti            \n\t"
	"foo:            \n\t"
	"ret             ");
}

Program:

#include <avr/io.h>
#include <avr/interrupt.h>


void Init_Tmr0(void) {
	TCCR0A = (1<<WGM01);
	TCCR0B = (1<<CS00);
	OCR0A = 0;
	DDRD |= (1<<PD0);
	TIMSK0 = (1<<OCIE0A);
}


ISR(TIMER0_COMPA_vect, ISR_NAKED) {
	PIND |= (1<<PD0);  // whichever port/bit you want
	__asm__ __volatile__ ("call foo      \n"
		"sei             \n\t"
		"reti            \n\t"
		"foo:            \n\t"
		"ret             ");
}

int main(void) {
	Init_Tmr0();
	sei();
	while(1);
}

Parts of .lss file:

4c:	0c 94 4f 00 	jmp	0x9e	; 0x9e <__vector_19>

---------------

0000009e <__vector_19>:
}
ISR(TIMER0_COMPA_vect, ISR_NAKED) {
	PIND |= (1<<PD0);  // whichever port/bit you want
  9e:	48 9a       	sbi	0x09, 0	; 9
	__asm__ __volatile__ ("call foo      \n"
  a0:	0e 94 54 00 	call	0xa8	; 0xa8 
  a4:	78 94       	sei
  a6:	18 95       	reti

000000a8 <foo>:
  a8:	08 95       	ret

000000aa <main>:
	"foo:            \n\t"
	"ret             ");
}

---------------------

int main(void) {
	Init_Tmr0();
  aa:	0e 94 46 00 	call	0x8c	; 0x8c
	sei();
  ae:	78 94       	sei
  b0:	ff cf       	rjmp	.-2      	; 0xb0

LA capture image:


Chuck99 wrote:
LA capture image:
So 21 cycles. That's 8 cycles more than without call/ret, suggesting that call and ret each take 4 cycles.

This is in conflict with what the datasheet has to say about it, but in keeping with what the AVR Instruction Set Manual tells us about devices with <= 64 KB flash.
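
The arithmetic behind that estimate can be sketched in a few lines of host-side C. This is only an illustration: the function name is made up here, and it assumes the added call and ret cost the same number of cycles, as the post does. The 21-cycle figure and the 13-cycle baseline are the measurements discussed in this thread.

```c
#include <assert.h>

/* Hypothetical helper (not from the thread): split the extra cycles seen
 * after adding `count` instructions evenly across those instructions.
 * Assumes every added instruction has the same cost. */
int extra_cycles_per_instruction(int with_cycles, int baseline_cycles, int count)
{
    assert(count > 0);
    assert(with_cycles >= baseline_cycles);
    return (with_cycles - baseline_cycles) / count;
}
```

Calling extra_cycles_per_instruction(21, 13, 2) gives 4, i.e. 4 cycles each for call and ret.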

Other than being able to sleep at night, I can't see a really compelling reason to keep investigating... however I can appreciate your desire to solve a mystery ;)

We've still got a discrepancy of 1 cycle somewhere. Try some of the other test programs suggested. Concoct your own. You can't test the PC => stack action performed by the interrupt servicing mechanism directly, but you can test the rest of them, many of them without an interrupt.

Here's an easy one:

while (1) {
  PIND |= (1<<PD0);
  sei();    // test 1 with a single sei
  // sei(); // test 2 with both
}

Here's another:

while (1) {
  PIND |= (1<<PD0);
  PIND |= (1<<PD0);
}

Once you've nailed down the real cycle cost of each of:

    jmp
    sbi
    sei
    reti
... you can infer the cycle cost of the PC => stack action of the interrupt.

Although we haven't confirmed it directly, reti should also be 4 cycles. Can you think of a way to unambiguously prove its cycle cost?

How about:

  PIND |= (1<<PD0);
  __asm__ __volatile__ ("call foo%= \n\t"
                        "reti       \n"
                      "foo%=:       \n\t"
                        "reti       \n");

... followed by:

  PIND |= (1<<PD0);
  __asm__ __volatile__ ("call foo%= \n\t"
                        "ret        \n"
                      "foo%=:       \n\t"
                        "reti       \n");

See how it will work?

Now jmp:

while (1) {
  PIND |= (1<<PD0);
  PIND |= (1<<PD0);
  __asm__ __volatile__ ("jmp foo%= $ foo%=:");
  PIND |= (1<<PD0);
  PIND |= (1<<PD0);
}

We do seem to have established that the datasheet is at least partly in error.

Just for kicks, can you post the hex file for your 13-cycle test?

JJ

joeymorin wrote:
Just for kicks, can you post the hex file for your 13-cycle test?

Attached is the hex file for interrupt duration of 13 clock cycles.


Chuck99 wrote:
Attached is the hex file for interrupt duration of 13 clock cycles.
I thought I'd just be sure that the vector was indeed a jmp and not an rjmp...

Labels added by me:


   0:	0c 94 3a 00 	jmp	0x74	;  0x74   
   

   4:	0c 94 44 00 	jmp	0x88	;  0x88   
   8:	0c 94 44 00 	jmp	0x88	;  0x88   
   c:	0c 94 44 00 	jmp	0x88	;  0x88   
  10:	0c 94 44 00 	jmp	0x88	;  0x88   
  14:	0c 94 44 00 	jmp	0x88	;  0x88   
  18:	0c 94 44 00 	jmp	0x88	;  0x88   
  1c:	0c 94 44 00 	jmp	0x88	;  0x88   
  20:	0c 94 44 00 	jmp	0x88	;  0x88   
  24:	0c 94 44 00 	jmp	0x88	;  0x88   
  28:	0c 94 44 00 	jmp	0x88	;  0x88   
  2c:	0c 94 44 00 	jmp	0x88	;  0x88   
  30:	0c 94 44 00 	jmp	0x88	;  0x88   
  34:	0c 94 44 00 	jmp	0x88	;  0x88   
  38:	0c 94 44 00 	jmp	0x88	;  0x88   
  3c:	0c 94 44 00 	jmp	0x88	;  0x88   
  40:	0c 94 44 00 	jmp	0x88	;  0x88   
  44:	0c 94 44 00 	jmp	0x88	;  0x88   
  48:	0c 94 44 00 	jmp	0x88	;  0x88   


  4c:	0c 94 4f 00 	jmp	0x9e	;  0x9e


  50:	0c 94 44 00 	jmp	0x88	;  0x88   
  54:	0c 94 44 00 	jmp	0x88	;  0x88   
  58:	0c 94 44 00 	jmp	0x88	;  0x88   
  5c:	0c 94 44 00 	jmp	0x88	;  0x88   
  60:	0c 94 44 00 	jmp	0x88	;  0x88   
  64:	0c 94 44 00 	jmp	0x88	;  0x88   
  68:	0c 94 44 00 	jmp	0x88	;  0x88   
  6c:	0c 94 44 00 	jmp	0x88	;  0x88   
  70:	0c 94 44 00 	jmp	0x88	;  0x88   


  74:	11 24       	eor	r1, r1
  76:	1f be       	out	0x3f, r1	; 63
  78:	cf ef       	ldi	r28, 0xFF	; 255
  7a:	d4 e0       	ldi	r29, 0x04	; 4
  7c:	de bf       	out	0x3e, r29	; 62
  7e:	cd bf       	out	0x3d, r28	; 61


  80:	0e 94 52 00 	call	0xa4	;  0xa4 
  84:	0c 94 56 00 	jmp	0xac	;  0xac

  88:	0c 94 00 00 	jmp	0	;  0x0

  8c:	82 e0       	ldi	r24, 0x02	; 2
  8e:	84 bd       	out	0x24, r24	; 36
  90:	91 e0       	ldi	r25, 0x01	; 1
  92:	95 bd       	out	0x25, r25	; 37
  94:	17 bc       	out	0x27, r1	; 39
  96:	50 9a       	sbi	0x0a, 0	; 10
  98:	80 93 6e 00 	sts	0x006E, r24
  9c:	08 95       	ret

  9e:	48 9a       	sbi	0x09, 0	; 9
  a0:	78 94       	sei
  a2:	18 95       	reti

  a4:	0e 94 46 00 	call	0x8c	;  0x8c
  a8:	78 94       	sei
  aa:	ff cf       	rjmp	.-2      	;  0xaa

  ac:	f8 94       	cli
  ae:	ff cf       	rjmp	.-2      	;  0xae

... still a mystery where that 1 cycle is getting cut...


Testing the timing of sbi, sei, jmp, call, reti.

;Assy_Test_ISR_Timing_A2
;
;No Interrupts Used - Test timing of sbi, sei, jmp, call, and reti.

#define __SFR_OFFSET 0
#include <avr/io.h>

        .section .text

        .global main

main:
        ldi     r16,    1<<PD0
        out     DDRD,   r16         ;Pin D0 is output

loop:
        sbi     PIND,   0           ;toggle D0
        sbi     PIND,   0           ;toggle D0
        sei                         ;B-A
        sbi     PIND,   0           ;toggle D0
        sbi     PIND,   0           ;toggle D0
        jmp     SKIP1               ;D-C
L1:
        rjmp    L1

SKIP1:
        sbi     PIND,   0           ;toggle D0
        sbi     PIND,   0           ;toggle D0
        call    C1                  ;F-E (call)
                                    ;H-G (reti)
        sbi     PIND,   0           ;toggle D0
        sbi     PIND,   0           ;toggle D0

        nop                         ;add spacer to identify start of loop
        nop
        nop
        nop
        nop
        nop
        nop
        nop
        rjmp    loop

C1:
        sbi     PIND,   0           ;toggle D0
        sbi     PIND,   0           ;toggle D0
        reti                        ;substituting reti for ret
        .end

Instruction   Clock Cycles
   sbi           2
   sei           1
   jmp           3
   call          4
   reti          4
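
For anyone repeating this, here is a rough host-side C sketch of how a table like this falls out of the capture. Each instruction under test sits between two sbi toggles, so the edge-to-edge gap should be one sbi plus the instruction itself. The function names are made up for illustration, the 16 MHz clock is assumed from F_CPU in the first post, and the gap values in the usage note are the ones implied by the table, not readings taken from the capture.

```c
#include <assert.h>

/* Convert a logic-analyzer interval to CPU cycles, rounding to nearest.
 * f_cpu_hz is an assumption: 16 MHz matches F_CPU in the first post. */
unsigned long ns_to_cycles(unsigned long long interval_ns, unsigned long f_cpu_hz)
{
    return (unsigned long)((interval_ns * f_cpu_hz + 500000000ULL) / 1000000000ULL);
}

/* Cost of the instruction sitting between two sbi toggles: the edge-to-edge
 * gap covers one full sbi plus the instruction under test. */
unsigned long bracketed_cost(unsigned long gap_cycles, unsigned long sbi_cycles)
{
    assert(gap_cycles >= sbi_cycles);
    return gap_cycles - sbi_cycles;
}
```

With back-to-back sbi edges 125 ns apart, ns_to_cycles(125, 16000000UL) gives 2 cycles for sbi. A 375 ns gap across the call is 6 cycles, and bracketed_cost(6, 2) leaves 4 for the call.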

I was finally able to get working joeymorin's assembly language program that puts the ISR in the vector table. (Link)

#define __SFR_OFFSET 0
#include <avr/io.h>

        .section .text

        .global main

main:
        ldi     r16,    1<<WGM01
        out     TCCR0A, r16
        ldi     r16,    1<<CS00
        out     TCCR0B, r16
        ldi     r16,    0
        out     OCR0A,  r16
        ldi     r16,    1<<PD0
        out     DDRD,   r16
        ldi     r16,    1<<OCIE0A
        sts     TIMSK0, r16
        sei
loop:
        rjmp    loop

        .section timer0_compa_vect,"ax",@progbits

        sbi     PIND,   0
        sei
        reti

        .end 

So it looks like it takes 3 clock cycles from the start of the interrupt until the first instruction in the interrupt vector table is executed.

The entry in the vector table is usually a jmp instruction to the ISR. The jmp instruction also uses 3 clock cycles, which results in the 6 cycle lead-in to the ISR.
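
As a cross-check, the cycle budget can be added up in a small host-side C sketch. The helper name and its parameters are hypothetical; the per-instruction costs are the ones measured earlier in this thread on the ATmega32U2, and the sketch simply assumes the costs add linearly.

```c
/* Cycle costs as measured earlier in this thread on the ATmega32U2. */
enum { CYC_SBI = 2, CYC_SEI = 1, CYC_JMP = 3, CYC_RETI = 4 };

/* Hypothetical helper: total cycles from interrupt start to the end of reti.
 * entry_cycles - cycles until the first vector-table instruction executes
 * via_jmp      - nonzero if the vector entry is a jmp to the ISR
 * body_cycles  - cycles spent in the ISR body before reti */
int isr_round_trip(int entry_cycles, int via_jmp, int body_cycles)
{
    int vector_cost = via_jmp ? CYC_JMP : 0;
    return entry_cycles + vector_cost + body_cycles + CYC_RETI;
}
```

With a 3-cycle entry, isr_round_trip(3, 1, CYC_SBI + CYC_SEI) comes to 13, matching the 13-cycle test. A 4-cycle entry would give 14, which is the 1-cycle discrepancy being chased.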


Chuck99 wrote:
I was finally able to get working
Ah... sorry about the missing sei in main...
Quote:
So it looks like it takes 3 clock cycles from the start of the interrupt until the first instruction in the interrupt vector table is executed.

The entry in the vector table is usually a jmp instruction to the ISR. The jmp instruction also uses 3 clock cycles, which results in the 6 cycle lead-in to the ISR.

That is an interesting result, to say the least, but not surprising given the last dozen or so posts.

I'm wondering how it can be that 2 bytes can be pushed onto the stack in 3 cycles. Perhaps it is related to the single-level instruction pipeline used by AVR. Remember that we've established the condition that the OCF0A flag gets set on every single tick of the system clock. If the instruction pipeline is interrupt-aware, this might explain the shortened PC=>stack operation, i.e. if it can get some kind of 'head start' with associated logic w.r.t. pushing the PC onto the stack.

Mind you, I think your original 31 (34?) cycle experiment might conflict with this theory, but if we can arrange for the OCF0A flag to get set every 14 cycles (the expected round-trip cycle cost of the ISR), we may be able to test it:

#define __SFR_OFFSET 0
#include <avr/io.h>

#define TIMER_RES 14

        .section .text

reset:  jmp     main

        .section timer0_compa_vect,"ax",@progbits 

        sbi     PIND,   0
        sei
        reti

        .global main

main:
        ldi     r17,    0
        ldi     r18,    1
        ldi     r16,    1<<WGM01
        out     TCCR0A, r16
        ldi     r16,    1<<CS00
        out     TCCR0B, r16
        ldi     r16,    TIMER_RES-1
        out     OCR0A,  r16
        ldi     r16,    1<<PD0
        out     DDRD,   r16
        ldi     r16,    1<<OCIE0A
        sts     TIMSK0, r16
        sei
loop:
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        out     PINB,   r18
        out     PINB,   r17
        rjmp    loop
        .end

Now we have an interrupt being generated every 14 cycles. Main meanwhile is able (if uninterrupted) to toggle PB0 every tick of the system clock.

We still have the sei preceding the reti, which has the potential to starve main completely if there is a pending interrupt.

The theory is that if indeed the PC=>stack action takes only 3 cycles for a 'newly seen' interrupt condition, then main will nevertheless be able to squeeze an out instruction toggling PB0 between each invocation of the ISR (ignoring the rjmp at the end of the 96 out instructions), which should also happen every 14 cycles.

If however the interrupt response time for a 'just in time' interrupt condition is actually 4 cycles (as almost all documentation suggests [errata notwithstanding]), then main will be completely starved. After entry into the ISR the first time, PB0 would remain unchanging.

We are now officially chasing spirits ;)

JJ

Here's the new program:

;from joeymorin @avrfreaks
;https://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&p=1128386&sid=459222dd2c82b8ca2faffa30074afcee#1128386
;
;Testing interrupt response time
;OCF0A flag to get set every 14 cycles
;interrupt being generated every 14 cycles
;ISR toggles output Pin D2
;Main toggles PD0 every tick of the system clock.
;[changed r17 from 0 to 1]
;
#define __SFR_OFFSET 0
#include <avr/io.h>

#define TIMER_RES 14

        .section .text

reset:  jmp     main

        .section timer0_compa_vect,"ax",@progbits

        sbi     PIND,   2			;toggle output Pin D2
        sei
        reti

        .global main

main:
;        ldi     r17,    0
        ldi     r17,    1			;was 0
        ldi     r18,    1
        ldi     r16,    1<<WGM01
        out     TCCR0A, r16
        ldi     r16,    1<<CS00
        out     TCCR0B, r16
        ldi     r16,    TIMER_RES-1
        out     OCR0A,  r16
        ldi     r16,    5			;(1<<PD0) | (1<<PD2)
        out     DDRD,   r16
        ldi     r16,    1<<OCIE0A
        sts     TIMSK0, r16
        sei							;enable global interrupts
loop:
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        out     PIND,   r18
        out     PIND,   r17
        rjmp    loop
        .end 

Here's the Logic Analyzer Capture:


Chuck99 wrote:
Here's the new program:
OK. I'm sure it must seem like I'm making little mistakes on purpose... but really it's just that I'm incompetent ;)

I'd calculated a resolution of 14 based on one of the earlier tests with a jmp vector. I should have said 11:

PC => stack ; 4 cycles (expected)
sbi PINx, n ; 2 cycles
sei         ; 1 cycle
reti        ; 4 cycles
----------------------
             11 cycles

I also incorrectly composed the main loop with nop-like writes of 0 to PINx! Thanks for catching that...

However the test is still revealing. My error of 3 extra ticks in the timer resolution should result in 3 out instructions and 3 edges on PD0.

I see 4 edges, suggesting the PC => stack action really does take only 3 cycles, even for a newly arrived interrupt.
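
The edge-count reasoning can be written down as a small host-side C sketch. The function is hypothetical and uses the same simplified model as the post: whatever is left of the timer period after the ISR round trip is filled with 1-cycle out instructions in main. It ignores the rjmp at the end of the loop and the datasheet's note that one main-program instruction executes before a pending interrupt is served.

```c
/* Hypothetical helper: number of 1-cycle main-loop instructions that fit
 * between ISR invocations, under the simplified model described above.
 * timer_res     - timer period in CPU cycles (TIMER_RES)
 * entry_cycles  - assumed cost of the PC => stack action
 * body_cycles   - ISR body before reti (sbi + sei = 3 in this test)
 * reti_cycles   - cost of reti (4 as measured in this thread) */
int main_slots_per_period(int timer_res, int entry_cycles,
                          int body_cycles, int reti_cycles)
{
    int isr_cost = entry_cycles + body_cycles + reti_cycles;
    int slots = timer_res - isr_cost;
    return slots > 0 ? slots : 0;   /* clamp: the model has no negative slots */
}
```

For TIMER_RES 14, a 4-cycle entry predicts main_slots_per_period(14, 4, 3, 4) = 3 edges on PD0 per period, while a 3-cycle entry predicts 4, which is what the capture shows.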

I would propose one more (hopefully final) test, with the timer resolution correctly set, but with one additional change: the 2-cycle sbi in the ISR is replaced with a 1-cycle out, dropping the expected round-trip cycle cost to 10:

#define __SFR_OFFSET 0
#include <avr/io.h>

#define TIMER_RES 10
#define LOOP_PIN 0
#define ISR_PIN 2

        .section .text

reset:  jmp     main

        .section timer0_compa_vect,"ax",@progbits

        out     PIND,   r17     ; toggle output Pin D2
        sei
        reti

        .global main

main:
        ldi     r17,    1<<ISR_PIN
        ldi     r16,    1<<WGM01
        out     TCCR0A, r16
        ldi     r16,    1<<CS00
        out     TCCR0B, r16
        ldi     r16,    TIMER_RES-1
        out     OCR0A,  r16
        ldi     r16,    5      ;(1<<PD0) | (1<<PD2)
        out     DDRD,   r16
        ldi     r16,    1<<OCIE0A
        sts     TIMSK0, r16
        sei             ;enable global interrupts
        ldi     r16,    1<<LOOP_PIN
loop:
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        rjmp    loop
        .end

With luck I haven't introduced any further errors ;)

I'm pretty sure we'll still see evidence that the PC => stack is in fact 3 cycles.

This raises the question of why this might be the case, when documentation (errata notwithstanding) says otherwise.

It might be important to remember that we have been testing a synchronous timer interrupt. If we were to test with an asynchronous timer, or with an inherently asynchronous external interrupt source (INTn, PCINTn, ACI, etc.), I wonder what we might find. Of course the timing of the external interrupt source would have to be carefully controlled. We'd probably have to make adjustments equivalent to small fractions of a system clock cycle to reveal if and exactly where the synchronising logic 'kicks in', and whether or not that ever results in the advertised 4-cycle PC => stack action.

I certainly don't possess the equipment for such an investigation. Neither do I really have the inclination at this point ;)

I'm marginally satisfied that we've (you've) discovered an anomaly. I'd be interested in knowing what Atmel have to say on the subject.

What I might have patience for is running the same tests you've already conducted on an ATtiny85, mostly because I'm equipped to fairly easily capture the relevant data. I don't have an LA, but I have a home-brew test rig that should do the trick. Can't promise when I'll get around to it.

I am also curious what the recently silent elders of the forum have to say about your results... anyone actually still reading?... anyone care to chime in? Other than the obvious 'WHAT A COLOSSAL WASTE OF TIME!', that is :)

JJ

joeymorin wrote:
This raises the question of why this might be the case, when documentation (errata notwithstanding) says otherwise.

Are you implying that Atmel made a mistake in the data sheet? No, that's impossible! ;)
joeymorin wrote:
I am also curious what the recently silent elders of the forum have to say about your results... anyone actually still reading?... anyone care to chime in? Other than the obvious 'WHAT A COLOSSAL WASTE OF TIME!', that is :)

Not an elder of the forum, just an elder. :D

Are you kidding? This thread is great, reminds me of the days when we used to try to find out what the unused opcodes were in early MPUs. Great detective work guys. We should do more of this, so count me in on further trials and testing.

I've been needing to replace my broken logic analyzer for a while now, this thread has given me the reason to do so.

"I may make you feel but I can't make you think" - Jethro Tull - Thick As A Brick

"void transmigratus(void) {transmigratus();} // recursio infinitus" - larryvc

"It's much more practical to rely on the processing powers of the real debugger, i.e. the one between the keyboard and chair." - JW wek3

"When you arise in the morning think of what a privilege it is to be alive: to breathe, to think, to enjoy, to love." -  Marcus Aurelius


larryvc wrote:
Are you implying that Atmel made a mistake in the data sheet? No, that's impossible! ;)
I realise this isn't a revelation. However to date the datasheet errata I've come across have generally been either ambiguities of wording, or copy/paste errors from other datasheets.

This one, if proven, seems on the face of it to be rather more serious. So far only the 32U2 has been subjected to testing, but I can't think of a compelling reason why other AVRs based on essentially the same core (read: most devices) would behave differently.

Up to this point it (to me) has made sense that reads/writes to SRAM by whichever mechanism is used would take the same amount of time. On the mega core in question, that's 2 cycles. Whether an sts/lds, an st/ld, or a push/pop was used, we could count on 2-cycles-per-byte-accessed. Similarly, a call/ret would be 4 cycles each for the two-byte PC getting pushed/popped to/from the stack. Again, similarly, an interrupt/reti on the same device would be 4 cycles each.

Of course the one wrinkle in this theory is what the AVR Instruction Set Manual has to say about devices with a 22-bit PC. The extra cost for the additional byte involved in call/ret and interrupt/reti is advertised as only 1 cycle (rather than the 2 that one might expect) for a total cost of 5 cycles for these operations on devices with > 64 KB flash. Mind you, since the advertised cost of a 2-byte interrupt PC => stack action is 4 and seems in reality to be 3, I'm rather past the point of wanting to guess.
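
To make the per-byte model concrete, here is a minimal host-side C sketch. The names are hypothetical, and the 2-cycles-per-byte figure is just the rule of thumb from this post, not something taken from a datasheet table.

```c
#include <assert.h>

/* Rule of thumb from the post: each SRAM byte access costs 2 cycles. */
enum { CYCLES_PER_SRAM_BYTE = 2 };

/* Cycles the per-byte model predicts for moving a PC of pc_bytes bytes
 * to or from the stack. */
int modelled_pc_transfer_cycles(int pc_bytes)
{
    assert(pc_bytes > 0);
    return CYCLES_PER_SRAM_BYTE * pc_bytes;
}

/* How many cycles short of the model an advertised or observed figure is. */
int model_shortfall(int pc_bytes, int actual_cycles)
{
    return modelled_pc_transfer_cycles(pc_bytes) - actual_cycles;
}
```

The model gives 4 cycles for a 2-byte PC and 6 for a 3-byte PC. Against that, model_shortfall(3, 5) is 1 for the advertised 22-bit PC cost, and model_shortfall(2, 3) is also 1 for the interrupt entry observed on the 32U2.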

My interest lies in determining whether other <= 64 KB flash devices exhibit the advertised PC => stack interrupt cycle cost, and also how any > 64 KB flash devices behave. Do they in fact take 5 cycles as advertised? 4 cycles? 6? Something else?

It's really only curiosity at this point. I am not AtomicZombie, and I don't typically need to be concerned with single cycles for any real application. But I'm curious to learn what Atmel have to say w.r.t. why the interrupt PC => stack takes a cycle less than seems reasonable. What is different about the interrupt mechanism that exempts it from what appears to be a fairly immutable SRAM per-byte cycle cost? I'm also open to criticism over the testing methodology. Is there something we're doing that is giving the false impression that the PC => stack action is taking only 3 cycles? Have the test results been incorrectly interpreted?

JJ

The program:

;from joeymorin @avrfreaks (11 cycles this time)
;https://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&p=1128548#1128548
;
;Testing interrupt response time
;OCF0A flag to get set every 11 cycles
;interrupt being generated every 11 cycles
;ISR toggles output Pin D2
;Main toggles PD0 every tick of the system clock.
;[changed r17 from 0 to 1]

#define __SFR_OFFSET 0
#include <avr/io.h>

#define TIMER_RES 9 //use 9, 10, and 11
#define LOOP_PIN 0
#define ISR_PIN 2

        .section .text

reset:  jmp     main

        .section timer0_compa_vect,"ax",@progbits

        out     PIND,   r17     ; toggle output Pin D2
        sei
        reti

        .global main

main:
        ldi     r17,    1<<ISR_PIN
        ldi     r16,    1<<WGM01
        out     TCCR0A, r16
        ldi     r16,    1<<CS00
        out     TCCR0B, r16
        ldi     r16,    TIMER_RES-1
        out     OCR0A,  r16
        ldi     r16,    5			;(1<<PD0) | (1<<PD2)
        out     DDRD,   r16
        ldi     r16,    1<<OCIE0A
        sts     TIMSK0, r16
        sei							;enable global interrupts
        ldi     r16,    1<<LOOP_PIN
loop:
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        out     PIND,   r16
        rjmp    loop
        .end 

Here's the LA capture image:


Chuck99 wrote:

#define TIMER_RES 9 //use 9, 10, and 11

Good call.
Quote:
Here's the LA capture image:
Seems to be very clear that, under these specific circumstances, the 32U2 takes 3 cycles to put the PC onto the stack and jump to the TIMER0 COMPA vector. Not 4 cycles as expected. And not 5 cycles as advertised in the 8/16/32U2 datasheet.

Hmph.

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 


joeymorin wrote:
Seems to be very clear that, under these specific circumstances, the 32U2 takes 3 cycles to put the PC onto the stack and jump to the TIMER0 COMPA vector. Not 4 cycles as expected. And not 5 cycles as advertised in the 8/16/32U2 datasheet.

joeymorin, thanks for all your help. I was spinning my wheels and going nowhere before. I appreciate all the time and effort that you put into this.

A hat-tip to theusch for suggesting putting the ISR in the vector table.

Koshchi wrote:
Which is of course a ridiculous assumption.

You're right, it was a ridiculous assumption.

What was found:
1) 3 clock cycles after the start of the interrupt, the instruction in the vector table executes.

2) If the instruction in the vector table is a jmp, the first instruction in the ISR executes 6 clock cycles after the interrupt starts, not 5 and not 8.

3) reti takes 4 clock cycles, not 5.

This resolves all the problems that I encountered.
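
The three findings above can be tallied in a small host-side C sketch (plain C, not AVR code). The measured values are the ones reported in this thread, and the datasheet values are the ones quoted here. The 3-cycle cost of jmp is taken from the AVR instruction set manual, and the struct and function names are made up for illustration.

```c
#include <assert.h>

/* Cycle costs for one interrupt round trip on a 16-bit-PC AVR. */
struct irq_cost {
    int entry; /* interrupt start until the vector-table instruction executes */
    int jmp;   /* jmp from the vector table to the ISR (assumed 3 cycles) */
    int reti;  /* return from interrupt */
};

/* Values measured in this thread on the ATmega32U2. */
static const struct irq_cost MEASURED = { 3, 3, 4 };

/* Values quoted in this thread from the 8/16/32U2 datasheet. */
static const struct irq_cost DATASHEET = { 5, 3, 5 };

/* Cycles from interrupt start until the first ISR instruction executes. */
int first_isr_instruction(struct irq_cost c)
{
    assert(c.entry >= 0 && c.jmp >= 0);
    return c.entry + c.jmp;
}

/* Entry plus exit overhead, excluding the ISR body itself. */
int round_trip_overhead(struct irq_cost c)
{
    assert(c.reti >= 0);
    return first_isr_instruction(c) + c.reti;
}
```

With the measured values this gives 6 cycles to the first ISR instruction and 10 cycles of overhead per interrupt. The datasheet values would give 8 and 13.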

Figure 1 from the OP:

Here's Figure 1 modified with the new data:

Figure 2 from the OP:

Here's Figure 2 modified with the new data:

The Universe is once again in balance. Yeehaa!


Chuck99 wrote:
thanks for all your help.
Sorry for the innumerable small bugs I introduced into the test code (missing sei, 0 => PINx, missing comma, etc.)... must have had you scratching your head :)

UPDATE:

The ATtiny85 has an advertised interrupt PC => stack cost of 4 cycles. I've run tests similar to those conducted on the ATmega32U2. They show that the real cycle cost is 3.

Again, this may be an artefact of the use of a synchronous timer interrupt, and asynchronous external interrupts might exhibit the advertised 4 cycle cost at least some of the time.

Someone else will have to take it from here ;)

Quote:

3) reti takes 4 clock cycles, not 5.

Why is this a surprise?
opcode manual wrote:
Cycles:
4 devices with 16-bit PC
5 devices with 22-bit PC
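
As a rough illustration of the opcode manual excerpt, here is a host-side C sketch that maps flash size to PC width and then to the reti cycle count. The flash-size rule (up to 64K words, i.e. 128 KB, fits a 16-bit PC) is an assumption made for this sketch, and both function names are hypothetical.

```c
#include <assert.h>

/* Assumed rule: up to 64K words (128 KB) of flash fits a 16-bit PC;
   anything larger is treated as a 22-bit-PC device. */
int pc_bits_for_flash(long flash_bytes)
{
    assert(flash_bytes > 0);
    return (flash_bytes / 2 <= 65536L) ? 16 : 22;
}

/* reti cycle count as listed in the opcode manual excerpt above. */
int reti_cycles(int pc_bits)
{
    assert(pc_bits == 16 || pc_bits == 22);
    return (pc_bits == 16) ? 4 : 5;
}
```

For the ATmega32U2 (32 KB flash), reti_cycles(pc_bits_for_flash(32L * 1024)) gives 4, which agrees with the measurement in this thread rather than with the datasheet.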


It was a surprise to me because the datasheet for the ATmega32U2 lists reti as using 5 clock cycles.

clawson wrote:
Quote:

3) reti takes 4 clock cycles, not 5.

Why is this a surprise?
opcode manual wrote:
Cycles:
4 devices with 16-bit PC
5 devices with 22-bit PC


Quote:

the datasheet for the ATmega32U2 lists reti as using 5 clock cycles.

Atmel data is always "fun" - the game is to guess which of the conflicting documents holds the truth (if any) ;-)

(people would probably have more respect for Atmel documentation if they actually took on board corrections given to them).


I want to correct a minor error in the OP:

Quote:
The instruction begins with the falling edge of the cpu clock and the output is asserted on the rising edge of the clock.

This is wrong!
The instruction begins and ends with the rising edge of the clock, as shown in Figure 6-5 of the datasheet:

This doesn't affect the results of our tests, but I wanted to set the record straight.

Here is Figure 1, modified to show the instructions beginning and ending on the rising edge of the clock.


Oh no, the claims of the discovery of Cold Fusion have seemingly been over exaggerated again! I would say that all the testing should be done again with some new tests thrown in as well. :shock: :)

"I may make you feel but I can't make you think" - Jethro Tull - Thick As A Brick

"void transmigratus(void) {transmigratus();} // recursio infinitus" - larryvc

"It's much more practical to rely on the processing powers of the real debugger, i.e. the one between the keyboard and chair." - JW wek3

"When you arise in the morning think of what a privilege it is to be alive: to breathe, to think, to enjoy, to love." -  Marcus Aurelius


larryvc wrote:
I would say that all the testing should be done again ...

Fat chance!

Quote:
... with some new tests thrown in as well.

Maybe.


I know it's been more than 4 years, but just thought I'd report back that I (finally) ran similar tests on an m328p, and the results are the same.  Interrupt response time (i.e. PC->stack) takes 3 cycles, not the 4 cycles claimed in the datasheet.

 

Can anyone shed any light on this discrepancy?  Other than 'datasheet errata', that is ;-)

Maybe it differs according to the size of the PC?

 

Jim

Jim Wagner Oregon Research Electronics, Consulting Div. Tangent, OR, USA http://www.orelectronics.net


ka7ehk wrote:
Maybe it differs according to the size of the PC?
Indeed it does.  The mystery, however, is why it differs from the datasheet claim of 4 cycles for a 16-bit PC.  I have no devices with a >16-bit PC on which to test the theory that they will exhibit a 4-cycle interrupt response time vs. the datasheet claim of 5 cycles.

joeymorin wrote:
Interrupt response time (i.e. PC->stack) takes 3 cycles, not the 4 cycles claimed in the datasheet.
How have you tested that?

 

I would assume that "response time" != "time needed for vectoring".

Apparently it is 3 cycles for the vectoring and 1 cycle of delay between the actual event and the start of the vectoring. I guess this delay cycle has something to do with the "always execute one more instruction before servicing a pending interrupt". And because the main code is executed during this delay cycle, you don't notice it with the kind of test the OP has done.
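
A minimal host-side C sketch of this hypothesis, assuming a main loop made of single-cycle instructions such as the unrolled out PIND, r16 test loop. It is a model of the explanation above, not a measurement, and the function names and the 'M'/'V'/'I' markers are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Write one character per CPU cycle into buf, starting at cycle 0:
     'M' = a main-loop instruction executes (including the delay cycles),
     'V' = vectoring (PC pushed to the stack, jump to the vector),
     'I' = the vector-table instruction executes.
   The interrupt event is assumed to occur at the start of event_cycle.
   Returns the number of characters written, excluding the terminator. */
size_t trace_interrupt(char *buf, size_t buf_len, int event_cycle,
                       int delay_cycles, int vectoring_cycles)
{
    assert(event_cycle >= 0 && delay_cycles >= 0 && vectoring_cycles >= 0);
    int vector_cycle = event_cycle + delay_cycles + vectoring_cycles;
    assert(buf_len >= (size_t)vector_cycle + 2);

    for (int c = 0; c < vector_cycle; c++) {
        buf[c] = (c < event_cycle + delay_cycles) ? 'M' : 'V';
    }
    buf[vector_cycle] = 'I';
    buf[vector_cycle + 1] = '\0';
    return (size_t)vector_cycle + 1;
}

/* Count how many cycles in a trace carry the given marker. */
int count_cycles(const char *trace, char marker)
{
    int n = 0;
    for (; *trace != '\0'; trace++) {
        if (*trace == marker) {
            n++;
        }
    }
    return n;
}
```

With the event at cycle 2, one delay cycle and three vectoring cycles, the trace reads MMMVVVI. The vector instruction runs 4 cycles after the event, but a pin toggled by the main loop only stops for the 3 'V' cycles, which is what a test that watches the loop pin would see.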

Stefan Ernst
