## Determining ISR duration


I would like to go through and figure out the time duration of the different interrupts I am using. My main purpose is to figure out the slowest possible response I can expect from the time I receive a byte on the UART to the time the RXC interrupt fires.

I will probably be measuring this, making changes, and repeating, so I want to use a quick method that may not be perfectly accurate.

My plan is to set an unused pin high at the beginning of the interrupt, then set it low at the end of the interrupt, and measure the pulse with my DSO. Of course this does not include the prologue, epilogue, and vector jumps.

Here is an example:

```
ISR(TIMER1_OVF_vect){
    //This is used for long delays in main
    L1_ON
    slow_counter.word++;

    if(inverse_velocity_overflow < 12000){
        inverse_velocity_overflow += 1500;
        if((inverse_velocity_overflow >= 3000) && ((inverse_velocity_overflow - 1500) > encoder.inverse_vel))
            encoder.inverse_vel = inverse_velocity_overflow - 1500;
    }
    else encoder.inverse_vel = inverse_velocity_overflow;
    L1_OFF
}
```

My oscilloscope measures this at 4.92 us.

Now I compensate.

```ISR(TIMER1_OVF_vect){
21a:	1f 92       	push	r1
21c:	0f 92       	push	r0
21e:	0f b6       	in	r0, 0x3f	; 63
220:	0f 92       	push	r0
222:	11 24       	eor	r1, r1
224:	2f 93       	push	r18
226:	3f 93       	push	r19
228:	8f 93       	push	r24
22a:	9f 93       	push	r25
//This is used for long delays in main
L1_ON
22c:	2b 9a       	sbi	0x05, 3	; 5
slow_counter.word++;
22e:	80 91 d1 02 	lds	r24, 0x02D1
232:	90 91 d2 02 	lds	r25, 0x02D2
236:	01 96       	adiw	r24, 0x01	; 1
238:	90 93 d2 02 	sts	0x02D2, r25
23c:	80 93 d1 02 	sts	0x02D1, r24

if(inverse_velocity_overflow < 12000){
240:	80 91 d3 02 	lds	r24, 0x02D3
244:	90 91 d4 02 	lds	r25, 0x02D4
248:	80 5e       	subi	r24, 0xE0	; 224
24a:	9e 42       	sbci	r25, 0x2E	; 46
24c:	90 f5       	brcc	.+100    	; 0x2b2 <__vector_15+0x98>
inverse_velocity_overflow += 1500;
24e:	80 91 d3 02 	lds	r24, 0x02D3
252:	90 91 d4 02 	lds	r25, 0x02D4
256:	84 52       	subi	r24, 0x24	; 36
258:	9a 4f       	sbci	r25, 0xFA	; 250
25a:	90 93 d4 02 	sts	0x02D4, r25
25e:	80 93 d3 02 	sts	0x02D3, r24
if((inverse_velocity_overflow >= 3000) && ((inverse_velocity_overflow - 1500) > encoder.inverse_vel))
262:	80 91 d3 02 	lds	r24, 0x02D3
266:	90 91 d4 02 	lds	r25, 0x02D4
26a:	88 5b       	subi	r24, 0xB8	; 184
26c:	9b 40       	sbci	r25, 0x0B	; 11
26e:	b8 f0       	brcs	.+46     	; 0x29e <__vector_15+0x84>
270:	80 91 d3 02 	lds	r24, 0x02D3
274:	90 91 d4 02 	lds	r25, 0x02D4
278:	20 91 02 01 	lds	r18, 0x0102
27c:	30 91 03 01 	lds	r19, 0x0103
280:	8c 5d       	subi	r24, 0xDC	; 220
282:	95 40       	sbci	r25, 0x05	; 5
284:	28 17       	cp	r18, r24
286:	39 07       	cpc	r19, r25
288:	50 f4       	brcc	.+20     	; 0x29e <__vector_15+0x84>
encoder.inverse_vel = inverse_velocity_overflow - 1500;
28a:	80 91 d3 02 	lds	r24, 0x02D3
28e:	90 91 d4 02 	lds	r25, 0x02D4
292:	8c 5d       	subi	r24, 0xDC	; 220
294:	95 40       	sbci	r25, 0x05	; 5
296:	90 93 03 01 	sts	0x0103, r25
29a:	80 93 02 01 	sts	0x0102, r24
}
else encoder.inverse_vel = inverse_velocity_overflow;
L1_OFF
29e:	2b 98       	cbi	0x05, 3	; 5
}
2a0:	9f 91       	pop	r25
2a2:	8f 91       	pop	r24
2a4:	3f 91       	pop	r19
2a6:	2f 91       	pop	r18
2a8:	0f 90       	pop	r0
2aa:	0f be       	out	0x3f, r0	; 63
2ac:	0f 90       	pop	r0
2ae:	1f 90       	pop	r1
2b0:	18 95       	reti
if(inverse_velocity_overflow < 12000){
inverse_velocity_overflow += 1500;
if((inverse_velocity_overflow >= 3000) && ((inverse_velocity_overflow - 1500) > encoder.inverse_vel))
encoder.inverse_vel = inverse_velocity_overflow - 1500;
}
else encoder.inverse_vel = inverse_velocity_overflow;
2b2:	80 91 d3 02 	lds	r24, 0x02D3
2b6:	90 91 d4 02 	lds	r25, 0x02D4
2ba:	90 93 03 01 	sts	0x0103, r25
2be:	80 93 02 01 	sts	0x0102, r24
2c2:	ed cf       	rjmp	.-38     	; 0x29e <__vector_15+0x84>

000002c4 <__vector_28>:
L1_OFF
}```

So we have 7 push, 7 pop, 1 eor, 1 in, 1 out and 1 reti. We can also subtract the sbi and the cbi.

So I get 14*2 + 1 + 1 + 1 + 5 - 4 = 32 clock cycles.

And finally

ATMEGA644p Datasheet wrote:
The interrupt execution response for all the enabled AVR interrupts is five clock cycles minimum. After five clock cycles the program vector address for the actual interrupt handling routine is executed. During these five clock cycle period, the Program Counter is pushed onto the Stack. The vector is normally a jump to the interrupt routine, and this jump takes three clock cycles. If an interrupt occurs during execution of a multi-cycle instruction, this instruction is completed before the interrupt is served.

So if I assume the worst case of interrupting a 5-cycle instruction, then I would add 4 cycles for the instruction to finish + 5 + 3 = 12 more cycles.

So putting it all together, the max time this will take at 12 MHz is 4.92 + (32+12)/12 = 8.6 us.
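The arithmetic above can be sketched as a small host-side C helper. This is only an illustration of the calculation in this post; the function and parameter names are made up, and the cycle counts are the ones counted above, not values taken from a datasheet table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: worst-case ISR time in microseconds.
 * measured_us      - pulse width seen on the scope (pin high to pin low)
 * overhead_cycles  - prologue/epilogue cycles not covered by the pulse
 * latency_cycles   - cycles before the ISR body starts (finish current
 *                    instruction + interrupt response + vector jump)
 * f_cpu_mhz        - CPU clock in MHz */
double isr_worst_case_us(double measured_us,
                         uint16_t overhead_cycles,
                         uint16_t latency_cycles,
                         double f_cpu_mhz)
{
    assert(f_cpu_mhz > 0.0);
    return measured_us + (double)(overhead_cycles + latency_cycles) / f_cpu_mhz;
}
```

With the numbers from this post, `isr_worst_case_us(4.92, 32, 12, 12.0)` comes out at roughly 8.6 us.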

Can someone please confirm that I am doing this correctly?

Basically, yes, although there might be some more details to know.

Plus the longest time interrupts are disabled in main, if any (and you should have some, as you are using two-byte variables in the timer interrupt). Note that interrupts are not enabled until the instruction *after* sei() (or SREG restore) is executed. (If my memory serves me well, the same applies to RETI.)

Btw. you could shorten your timer interrupt by making a local non-volatile copy of the volatile variables.
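A minimal sketch of that idea, written as a plain function so it can be compiled and run on a host. The globals below are stand-ins for the ones in the first post (their real types are an assumption here, and `slow_counter` is left out for brevity); the point is that the volatile variable is read once into a local and written back once.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the globals used by the timer ISR (types assumed). */
volatile uint16_t inverse_velocity_overflow;
volatile uint16_t encoder_inverse_vel;   /* stands in for encoder.inverse_vel */

/* Body of the timer overflow ISR using a local non-volatile copy. */
void timer1_ovf_body(void)
{
    uint16_t ovf = inverse_velocity_overflow;      /* single volatile read */

    if (ovf < 12000) {
        uint16_t before = ovf;                     /* value prior to the increment */
        ovf += 1500;
        if (ovf >= 3000 && before > encoder_inverse_vel)
            encoder_inverse_vel = before;
        inverse_velocity_overflow = ovf;           /* single volatile write */
    } else {
        encoder_inverse_vel = ovf;
    }
    assert(inverse_velocity_overflow == ovf);      /* local copy and global agree on exit */
}
```

Because the compiler no longer has to reload the variable for every expression, most of the repeated lds pairs seen in the listing should disappear.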

Is the timer the only interrupt in the system, besides the Rx interrupt? If not, the calculation gets even more complex, involving priorities, periodicity, and the various patterns in which interrupts can overlap.

JW

wek wrote:
Btw. you could shorten your timer interrupt by making a local non-volatile copy of the volatile variables.
That is an excellent idea. I didn't think of this.

Yes I will definitely consider delays for atomic access.

Here are the interrupts I am using in order of priority:

```
PCINT3_vect                 //This can occur at a maximum of 6.25 kHz, duration is 12.2 us
TIMER1_OVF_vect             //This occurs at 1 kHz always, 8.6 us duration
TIMER0_OVF_vect             //This occurs at 4.68 kHz when active, 2.5 us duration
USART1_RX_vect              //I want to know how fast I can trigger this, duration is about 4.2 us
USART1_TX_vect, ISR_NAKED   //This shouldn't matter
```

I think I can just add them up. I get a worst case of 27.5 us between when the RX interrupt flag goes high and when the RX_vect ISR finishes executing, not including delays for atomic access.

+ 8/12 = .7 us for the way I use atomic access.

dpaulsen wrote:

`USART1_TX_vect, ISR_NAKED	//This shouldn't matter`

This *does* matter, though. Imagine, that this interrupt just started, when the character Rx-ing is finished and the Rx interrupt triggers, but during the Tx ISR all the other interrupts trigger, too. Then those will be served first. The general "formula" is then "the longest of those which have lower priority; plus all of higher priority; plus the longest interrupt disable in main (plus the one instruction after them); plus the "natural" interrupt latency (the 5 cycles plus length of longest instruction) (assuming interrupts occur sparsely enough so they don't repeatedly trigger during this "longest event"; assuming interrupt disables in "main" are far enough from each other).
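As a rough sketch, the rule of thumb above can be written as a host-side C function. The names are hypothetical, all durations are in microseconds, and the same assumptions apply (interrupts sparse enough not to re-trigger, interrupt-disabled sections far apart).

```c
#include <assert.h>
#include <stddef.h>

/* Worst-case latency before a given ISR starts:
 * longest lower-priority ISR + all higher-priority ISRs
 * + longest interrupts-disabled stretch in main + natural latency. */
double worst_case_latency_us(const double *higher_prio_us, size_t n_higher,
                             const double *lower_prio_us, size_t n_lower,
                             double longest_irq_off_us,
                             double natural_latency_us)
{
    assert(higher_prio_us != NULL || n_higher == 0);
    assert(lower_prio_us != NULL || n_lower == 0);

    double total = longest_irq_off_us + natural_latency_us;
    double longest_lower = 0.0;

    for (size_t i = 0; i < n_higher; i++)      /* every higher-priority ISR may run first */
        total += higher_prio_us[i];

    for (size_t i = 0; i < n_lower; i++)       /* only one lower-priority ISR can be in progress */
        if (lower_prio_us[i] > longest_lower)
            longest_lower = lower_prio_us[i];

    return total + longest_lower;
}
```

Purely as an illustration: with 12.2, 8.6 and 2.5 us above the Rx interrupt, a 1.3 us ISR below it, 0.7 us with interrupts off, and 9 cycles of natural latency at 12 MHz (0.75 us), this gives about 26 us.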

JW

wek wrote:
dpaulsen wrote:

`USART1_TX_vect, ISR_NAKED	//This shouldn't matter`

This *does* matter, though. Imagine, that this interrupt just started, when the character Rx-ing is finished and the Rx interrupt triggers, but during the Tx ISR all the other interrupts trigger, too. Then those will be served first. The general "formula" is then "the longest of those which have lower priority; plus all of higher priority; plus the longest interrupt disable in main (plus the one instruction after them); plus the "natural" interrupt latency (the 5 cycles plus length of longest instruction) (assuming interrupts occur sparsely enough so they don't repeatedly trigger during this "longest event"; assuming interrupt disables in "main" are far enough from each other).

JW

Thanks. Very educational.

Fortunately in my case

```
ISR(USART1_TX_vect, ISR_NAKED){
    PORTD &= ~(1<<PD4);
    reti();
}
```

is only 18 clock cycles = 1.3 us.

dpaulsen wrote:

Fortunately in my case
```
ISR(USART1_TX_vect, ISR_NAKED){
    PORTD &= ~(1<<PD4);
    reti();
}
```

is only 18 clock cycles = 1.3 us.

I don't like this, too fragile. You should consider to move this to assembler.

Also, this is supposed to result in one cbi and one reti (and the ISR call, which is 3 cycles - note the 5 cycles is the total "natural" latency, including cycles for interrupt sync and polling) - are you sure the sum is 18 cycles?

JW

I didn't think too hard about it, but here is my math:

Finish existing instruction: 4 worst case
Flag activates within 5 cycles
Vector jump is 3 cycles
cbi 1
reti 5

4 + 5 + 3 + 1 + 5 = 18

Let me know if I made a mistake

Uhm, well, maybe. It's too late here... ;-)

1:03 AM is definitely late. Maybe you should get some sleep. :)

I very much appreciate your assistance.

wek wrote:
I don't like this, too fragile. You should consider to move this to assembler.

Per your suggestion I took a look at the assembler code and it had an unexpected result. I am going to start a new thread for this.

Assuming you are using the standard USART hardware, you have two bytes of buffer behind UDRx - the one byte "buffer" and the one byte incoming character register. That means that you won't get a data overflow before the start bit of the third incoming character. It also means that, after reading UDRx, you should check the RXCx bit of UCSRxA (usually) to see if there is another character waiting for you.

The upshot of all this is that, even at high incoming data rates, you have almost twice the time to pick up characters from the USART than you might have thought. That may make your calculations a little easier.

One other note: Hopefully your receive routine is dumping the characters into a circular buffer so you can do your processing in non-interrupt time.
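A minimal sketch of such a circular buffer in portable C, so it can be tried on a host. The names and the buffer size are assumptions for illustration; on the AVR, the put function would be called from the receive ISR with the byte read from UDRx, and the get function from main.

```c
#include <assert.h>
#include <stdint.h>

#define RX_BUF_SIZE 16u   /* hypothetical size; power of two so indices wrap with a mask */

typedef struct {
    volatile uint8_t data[RX_BUF_SIZE];
    volatile uint8_t head;   /* next write position, advanced by the ISR */
    volatile uint8_t tail;   /* next read position, advanced by main */
} rx_ring_t;

/* Store one byte. Returns 1 on success, 0 if the buffer is full (byte dropped). */
int rx_ring_put(rx_ring_t *rb, uint8_t byte)
{
    uint8_t next = (uint8_t)((rb->head + 1u) & (RX_BUF_SIZE - 1u));
    if (next == rb->tail)
        return 0;                       /* one slot stays empty to tell full from empty */
    rb->data[rb->head] = byte;
    rb->head = next;
    return 1;
}

/* Fetch the oldest byte. Returns 1 on success, 0 if the buffer is empty. */
int rx_ring_get(rx_ring_t *rb, uint8_t *out)
{
    assert(out != 0);
    if (rb->head == rb->tail)
        return 0;
    *out = rb->data[rb->tail];
    rb->tail = (uint8_t)((rb->tail + 1u) & (RX_BUF_SIZE - 1u));
    return 1;
}
```

With single-byte indices and one writer per index, main can call `rx_ring_get()` without disabling interrupts on an 8-bit AVR; with wider indices that would need a second look.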

Just a thought.

Stu

Engineering seems to boil down to: Cheap. Fast. Good. Choose two. Sometimes choose only one.

Let's add some perspective here. You appear to be running at 14MHz.

If your UART is @ 115200 baud you get RX every 87us. Now most regular RX service routines can execute within about 10us. You should have very few problems.

You can run your actual ISR() through the simulator to get an exact # cycles.

As Stu has mentioned, you can occasionally be held up with other ISR()'s. Providing you service the RX within 165us, you should never miss a trick.
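For reference, the character period follows directly from the baud rate and the number of bits per frame. A small host-side sketch (the function name is made up; 10 bits per frame assumes 8 data bits, one start bit, one stop bit and no parity):

```c
#include <assert.h>

/* Time to shift in one UART frame, in microseconds. */
double uart_frame_time_us(unsigned bits_per_frame, double baud)
{
    assert(baud > 0.0);
    return (double)bits_per_frame * 1e6 / baud;
}
```

`uart_frame_time_us(10, 115200.0)` gives about 86.8 us, which is the "every 87us" figure above.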

Even if you have to do heavy maths in the foreground code, a reasonably sized circular RX buffer will keep you safe.

Of course there can always be worst cases. But a sensible design will cope with peak demands and manage just fine over a period.

David.

stu_san wrote:
It also means that, after reading UDRx, you should check the RXCx bit of UCSRxA (usually) to see if there is another character waiting for you.

I like this idea!

Here is a more detailed description of what is occurring.

1) Mega644 sends 1 byte over UART to a Mega88, which interprets it as a command to start
2) M88 completes 5 A/D conversions
3) M88 sends all 10 bytes over UART
4) M88 sends a trailing zero byte

Here is my code on the M644 for sending the command (I am pretty sure this is breaking some style rules) :oops: I should probably change this into an inline function.

```
//Commands:
#define IDLE    0
#define RAW     1
#define BOUNCE  2

#define GET_RAW {rx1_data.size = 10; rx1_data.byte_num = 0; rx1_data.err_flags = 0; tx1_URT(RAW);}
```

Here is my code on the M644 for receiving the data:

```
ISR(USART1_RX_vect){
    uint8_t byte_num_nv = rx1_data.byte_num;

    //Read the error flags before UDR1; reading UDR1 changes them
    rx1_data.err_flags = (UCSR1A & ((1<<FE1)|(1<<DOR1)|(1<<UPE1)));
    rx1_data.block[byte_num_nv] = UDR1;
    if(byte_num_nv < rx1_data.size)
        rx1_data.byte_num = byte_num_nv + 1;
}
```

As for my hardware:

Both MCUs are at 12 MHz.
I am using two RS485 driver chips.
My UART is running at 500 kHz.

Just for the hell of it, I tried my UART at 1.5 MHz and it worked fine. But because this didn't give me any real benefit I lowered it again.

This might also help

```
typedef union {
    volatile uint8_t block[11];
    volatile struct {
        uint16_t raw_gyro;
        uint16_t raw_z;
        uint16_t raw_y;
        uint16_t raw_x;
        uint16_t raw_hall;
        uint8_t end;
        uint8_t byte_num;
        uint8_t size;
        uint8_t err_flags;
    };
} rx_block_t;
```

How can you make a union between an array of 11 bytes and a struct that contains 14 bytes?

clawson wrote:
How can you make a union between an array of 11 bytes and a struct that contains 14 bytes?
:oops: I wasn't sure about this when I first did it, but it seems to work fine. :oops:

I remember when I originally did this I looked in my K&R and it was reticent on this. I ASSumed that the compiler would give me a warning if it wasn't standard practice.

If you are asking me this, then I am guessing it is probably not guaranteed to work for all compilers and I should probably fix it.

Well the point is that if you just fill in a variable of type rx_block_t called foo.block[] you cannot (in theory) set foo.byte_num, foo.size, foo.err_flags but in reality you can just write 14 bytes to the array spilling 3 bytes over its boundary and you will hit those fields.

But really, just change [11] to [14] ;-)

You are buffering the UART anyway, and know that an ADC only occurs when you ask for it.

So you could receive it at any reasonable speed. 500k baud seems a little overkill for 10 bytes. I could understand if you were transferring megabytes.

David.

clawson wrote:
Well the point is that if you just fill in a variable of type rx_block_t called foo.block[] you cannot (in theory) set foo.byte_num, foo.size, foo.err_flags but in reality you can just write 14 bytes to the array spilling 3 bytes over its boundary and you will hit those fields.

But really, just change [11] to [14] ;-)

Yes I will fix it.

I assumed that compiler would put dummy bytes at the end of the block[11].

david.prentice wrote:
You are buffering the UART anyway, and know that an ADC only occurs when you ask for it.

So you could receive it at any reasonable speed. 500k baud seems a little overkill for 10 bytes. I could understand if you were transferring megabytes.

David.

I am trying to keep track of a very rapidly accelerating/decelerating object. I have found that I can do this if I sample the data at 1 kHz.

This means I have 1000 us to sample, process, and send over UART.

:edit spelling

Last Edited: Thu. Aug 26, 2010 - 05:03 PM

dpaulsen wrote:
clawson wrote:
Well the point is that if you just fill in a variable of type rx_block_t called foo.block[] you cannot (in theory) set foo.byte_num, foo.size, foo.err_flags but in reality you can just write 14 bytes to the array spilling 3 bytes over its boundary and you will hit those fields.

But really, just change [11] to [14] ;-)

Yes I will fix it.

I assumed that compiler would put dummy bytes at the end of the block[11].

Actually I think it does do this. I wrote a dummy program and compiled with [11] and [14]. In both cases the .bss was 14 bytes.

Nevertheless I will assume this is not good practice, to be on the safe side.

Quote:

In both cases the .bss was 14 bytes.

That's what I said about writing 3 bytes beyond the array bound. The compiler will make the data the size of the largest item in the union but for documentary purposes for the next reader of the code it just makes sense to make it 14 so they don't sit scratching their head thinking "how can you get 14 into 11?"
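The sizing behaviour can be checked on a host with a few lines of C. This is a sketch with hypothetical type names (a named struct is used instead of the anonymous one so it compiles as plain C99); the exact numbers assume no padding between the members, which holds for the usual ABIs since every uint16_t sits at an even offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as the struct half of the union in this thread. */
typedef struct {
    uint16_t raw_gyro;
    uint16_t raw_z;
    uint16_t raw_y;
    uint16_t raw_x;
    uint16_t raw_hall;
    uint8_t end;
    uint8_t byte_num;
    uint8_t size;
    uint8_t err_flags;
} payload_t;

typedef union {
    uint8_t block[11];     /* deliberately shorter than the struct */
    payload_t data;
} demo_block_t;

/* Returns the size of the union; it is at least the size of its largest member. */
size_t demo_block_size(void)
{
    assert(sizeof(demo_block_t) >= sizeof(payload_t));
    return sizeof(demo_block_t);
}
```

On a typical compiler `demo_block_size()` returns 14 and `offsetof(payload_t, byte_num)` is 11, so the first byte past `block[10]` is exactly where `byte_num` lives.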

It looks like this works but the struct{} member of the union is no longer anonymous:

```
typedef struct {
    uint16_t raw_gyro;
    uint16_t raw_z;
    uint16_t raw_y;
    uint16_t raw_x;
    uint16_t raw_hall;
    uint8_t end;
    uint8_t byte_num;
    uint8_t size;
    uint8_t err_flags;
} data_t;

typedef union {
    volatile uint8_t block[sizeof(data_t)];
    volatile data_t data;
} rx_block_t;
```

Quote:

On the flip side, one benefit to [11] is that if you ever accidentally code an assignment to rx1_data.block[11] you will get a compiler warning.

There is no need for the slave mega to send ADC data 1000 times a second.

Think about it. You can let the Slave monitor the speed continuously (or very often).

The Slave can maintain a running average.

When the Master AVR wants to know the speed it asks the Slave to send the 'Average Speed'. You may also have other requests like 'Maximum Speed' or 'What gear is the motor running in'.

David.

Here is the final solution I settled on.

First I change my RX_COMPLETE interrupt per STU's suggestion:

```
ISR(USART1_RX_vect){
    uint8_t byte_num_nv = rx1_data.byte_num;
    do{
        //Read the error flags before UDR1; reading UDR1 changes them
        rx1_data.err_flags = (UCSR1A & ((1<<FE1)|(1<<DOR1)|(1<<UPE1)));
        rx1_data.block[byte_num_nv] = UDR1;
        if(byte_num_nv < rx1_data.size)
            byte_num_nv++;
    }while(RX1_COMPLETE);   //Another byte may already be waiting in the receive buffer
    rx1_data.byte_num = byte_num_nv;
}
```

With this change here is my new worst case scenario:
1) A new byte is received immediately after the while(RX1_COMPLETE); check and then all the interrupts fire at the same time.
2) Epilogue of RXC is 1.75 us
3) PCINT3 executes, 11.84 us
4) T1 Overflow, 7.18 us
5) T0_Overflow 2.5 us
6) RX_COMPLETE prologue + first few instructions, 2.59 us

So this gives a total of 25.8 us between the time it checks the RX_COMPLETE flag and the time it reads the next UDR1.

On the other side the time for 2 bytes to shift in is (11 bits @ 500 kHz) 44 us. This means I can send the data full bore.

I could even send the data full bore at 750 kHz which would give 29.3 us. In fact I tested it at 750 and it works fine with STU's suggestion.
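The margin check above can be sketched in host-side C. The function names are made up for illustration; the step durations are the ones listed in this post, and the two-frame window relies on the two-level receive buffer described earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of the worst-case steps, in microseconds. */
double total_delay_us(const double *steps_us, size_t n)
{
    assert(steps_us != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += steps_us[i];
    return total;
}

/* Time for two frames to shift in, i.e. how long UDR1 can go unread
 * before the receive buffer overruns, in microseconds. */
double overrun_window_us(unsigned bits_per_frame, double baud)
{
    assert(baud > 0.0);
    return 2.0 * (double)bits_per_frame * 1e6 / baud;
}

/* 1 if the worst-case delay fits inside the window, 0 otherwise. */
int delay_fits(double delay_us, double window_us)
{
    return delay_us < window_us;
}
```

With the five steps above the sum is about 25.9 us, against a window of 44 us at 500 kHz and about 29.3 us at 750 kHz, so both rates leave some margin.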

Thanks everyone for your help on this matter.

-Daniel

david.prentice wrote:
There is no need for the slave mega to send ADC data 1000 times a second.

Think about it. You can let the Slave monitor the speed continuously (or very often).

The Slave can maintain a running average.

When the Master AVR wants to know the speed it asks the Slave to send the 'Average Speed'. You may also have other requests like 'Maximum Speed' or 'What gear is the motor running in'.

David.

Thanks for the suggestion but you don't quite understand my application.

I will get in trouble if I say too much but here is a little more info.

The M88 monitors a 3 axis accelerometer, a gyro, and a position sensor. The M88 is moving through space erratically. The M88 computes the acceleration vector direction and magnitude. If the magnitude is calm enough it updates its orientation in space. If it is too great it uses a Riemann sum to integrate the gyro signal and update the current orientation in space. It's actually more complicated than this but you get the idea.
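For readers unfamiliar with the term, a Riemann sum here just means accumulating rate times sample period. A generic host-side sketch, not the actual implementation used in this project (names, units and the use of floating point are all assumptions):

```c
#include <assert.h>
#include <stddef.h>

/* Left Riemann sum of an angular-rate signal.
 * rate_dps - gyro samples in degrees per second
 * dt_s     - sample period in seconds (0.001 for 1 kHz sampling)
 * Returns the starting angle plus the integrated change, in degrees. */
double integrate_gyro_deg(const double *rate_dps, size_t n,
                          double dt_s, double angle0_deg)
{
    assert(rate_dps != NULL || n == 0);
    double angle = angle0_deg;
    for (size_t i = 0; i < n; i++)
        angle += rate_dps[i] * dt_s;   /* one rectangle per sample */
    return angle;
}
```

Four samples of 10 deg/s at 1 kHz add 0.04 degrees to the starting angle.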

The M644 monitors a motor encoder and implements servo control via software. It also monitors the orientation data from the M88 and makes decisions via a state machine as to what angular position the motor needs to be set to.

The trick is that the motor angular position needs to be precisely coordinated with the sensor data. I have a very responsive motor that can accelerate to full speed in about 10 ms.

dpaulsen wrote:
clawson wrote:
How can you make a union between an array of 11 bytes and a struct that contains 14 bytes?
:oops: I wasn't sure about this when I first did it, but it seems to work fine. :oops:

I remember when I originally did this I looked in my K&R and it was reticent on this. I ASSumed that the compiler would give me a warning if it wasn't standard practice.

If you are asking me this, then I am guessing it is probably not guaranteed to work for all compilers and I should probably fix it.

The anonymous struct is less than portable,
but different union members can have different sizes.
The array size should be 11 or 14 depending on what is desired.

Iluvatar is the better part of Valar.

Michael,

Surely he's using the array to UART_RX then effectively casting an interpretation onto the bytes using the struct part of the union. Presumably then he will be receiving a full struct{}'s worth of data which is 14, not 11 bytes.

Cliff

clawson wrote:
Michael,

Surely he's using the array to UART_RX then effectively casting an interpretation onto the bytes using the struct part of the union. Presumably then he will be receiving a full struct{}'s worth of data which is 14, not 11 bytes.

Cliff

No this is not the case. I intentionally left the block size at [11] because that is the portion that contains the UART data. byte_num, size, and error flags are all information used to control the UART. A full struct{}'s worth of data is 11.

Then you should document that at least with a comment or perhaps:

```
typedef union {
    struct {
        volatile uint8_t uart_block[11];
        volatile uint8_t not_transferred_by_uart[3];
    };
    volatile struct {
        uint16_t raw_gyro;
        uint16_t raw_z;
        uint16_t raw_y;
        uint16_t raw_x;
        uint16_t raw_hall;
        uint8_t end;
        uint8_t byte_num;
        uint8_t size;
        uint8_t err_flags;
    };
} rx_block_t;
```

Always write software with the NEXT reader in mind (which could be you in 9 months when you've forgotten why something was done as it was)

clawson wrote:
Then you should document that at least with a comment or perhaps:

```
typedef union {
    struct {
        volatile uint8_t uart_block[11];
        volatile uint8_t not_transferred_by_uart[3];
    };
    volatile struct {
        uint16_t raw_gyro;
        uint16_t raw_z;
        uint16_t raw_y;
        uint16_t raw_x;
        uint16_t raw_hall;
        uint8_t end;
        uint8_t byte_num;
        uint8_t size;
        uint8_t err_flags;
    };
} rx_block_t;
```

Nice. Then you would get the best of both worlds: very readable, yet you still get the benefit of a compiler warning if you try to access uart_block[11]