ISR timing

Go To Last Post
9 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I am doing something I have never done before - working out how long an ISR will take to operate.

The following code snippet works on my hardware, but I would like to optimise the timing.

ISR (TIMER0_OVF_vect) {
uint8_t ChState= 0;
uint8_t DataBuffer= PortComp;
if (ChData[0] >= DataBuffer) ChState|= (1<<CH1);			// 8 Clock Cycles
if (ChData[1] >= DataBuffer) ChState|= (1<<CH2);			// 6 Clock Cycles
if (ChData[2] >= DataBuffer) ChState|= (1<<CH3);			// 6 Clock Cycles
if (ChData[3] >= DataBuffer) ChState|= (1<<CH4);			// 6 Clock Cycles
if (ChData[4] >= DataBuffer) ChState|= (1<<CH5);			// 6 Clock Cycles
if (ChData[5] >= DataBuffer) ChState|= (1<<CH6);			// 6 Clock Cycles
if (ChData[6] >= DataBuffer) ChState|= (1<<CH7);			// 6 Clock Cycles
if (ChData[7] >= DataBuffer) ChState|= (1<<CH8);			// 6 Clock Cycles
PORTB= ChState;												// 1 Clock Cycle

if (DataBuffer &(1<<7))										// 3 Clock Cycle
	 {
	 DataBuffer++;											// 3 Clock Cycle				
	 DataBuffer &= ~(1<<7);									// 1 Clock Cycle
	 if (DataBuffer == 0) DataBuffer= 0x80;						// 5 Clock Cycle				
	 }
else DataBuffer |= (1<<7);									// 2 Clock Cycle
PortComp= DataBuffer;										// 3 Clock Cycle
}

I compiled the code and looked at the .lss file. From there I calculated the total number of Clock cycles from section 4.4 of Atmel's AVR Assembler User Guide, using the highest number if multiple were available ( for example BRNE) per line of code.

I conclude the ISR will take max of 82 clock cycles to run.

When i use the simulator step through the code the cycle counter is more like 255.

Which method is correct OR is there a better way to do this? Any good tuts would also be appreciated.

Cheers
Matt

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Studio will give you a true count of cycles, so my guess is you made a mistake somewhere (either with studio, or in your calculation), or are not comparing apples to apples. Your calculation, while what you have seems reasonable, it may not include entry and exit overhead of the ISR. (though it shouldn't account for more than 2x of what you already have)

place a break point at the interrupt vector, note the cycle count. then single step through till you return, and then subtract the initial count from the final count, this will be the true cycle time of your ISR, for that pass.

Writing code is like having sex.... make one little mistake, and you're supporting it for life.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

The following idiom will accumulate the bits you're collecting in ChState at only four clocks per bit instead of six:

        ld temp, X+
        cp temp, DataBuffer
        ror answer
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

that all depends on the code inside the ISR this gives howmany registers have to be popped and pushed.

and don;t forgat that it might also take a while for the ISR to fire at all... starting off with just four clock cycles to get the interrupt down to the controlelr. then a jump to the isr...

and what if another interrupt is already running at that time......

it is a nice thing to investigate, but is dependant on a large number of variables you don;t have control over.....

I'm curious for the results

regards

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Matt_edwards wrote:
The following code snippet works on my hardware, but I would like to optimise the timing.

Then you should post a compileable code (used includes, define used variables and name the AVR-target).

Peter

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

danni wrote:
Matt_edwards wrote:
The following code snippet works on my hardware, but I would like to optimise the timing.

Then you should post a compileable code (used includes, define used variables and name the AVR-target).

Peter

No Worries.

Target is Mega16 running at 16MHz

Quote:

//PDM Library (pulse density modulation lib, C version)
//Author: Hendrik Hölscher

#include
#include
#include

volatile uint8_t ChData[16]; //array of pam vals
volatile uint8_t i;
volatile uint8_t j;

#define CH1 (PA0)
#define CH2 (PA1)
#define CH3 (PA2)
#define CH4 (PA3)
#define CH5 (PA4)
#define CH6 (PA5)
#define CH7 (PA6)
#define CH8 (PA7)

#define CH9 (PB0)
#define CH10 (PB1)
#define CH11 (PB2)
#define CH12 (PB3)
#define CH13 (PB4)
#define CH14 (PB5)
#define CH15 (PB6)
#define CH16 (PB7)

uint8_t PdmCompare =1;

void init_pdm(void)
{
DDRA = 0xFF;
PORTA = 0;

DDRB = 0xFF;
PORTB = 0;

TCCR0 = (1<<CS00); //set T0 @sys clk
TIMSK |= (1<<TOIE0); //enable overflow irq

// --- TIMER1_OVF irq ---
// selected time = 100 ms (1600000 ticks)
// prescaler = 64 (4 us ... 262.144 ms)
TCCR1B = (1<<CS11)|(1<<CS10);
TCNT1H = 158;
TCNT1L = 88;
TIMSK |= (1<<TOIE1);

}

// ----------------------------------------------------------------------------
// irq code (timer_overflow: TIMER0_OVF)
// Timer/Counter1 Overflow
// ----------------------------------------------------------------------------
ISR (TIMER0_OVF_vect) {
uint8_t ChState1= 0;
uint8_t ChState2= 0;
uint8_t DataBuffer= PdmCompare;
if (ChData[0] >= DataBuffer) ChState1|= (1<<CH1); //compare channels
if (ChData[1] >= DataBuffer) ChState1|= (1<<CH2);
if (ChData[2] >= DataBuffer) ChState1|= (1<<CH3);
if (ChData[3] >= DataBuffer) ChState1|= (1<<CH4);
if (ChData[4] >= DataBuffer) ChState1|= (1<<CH5);
if (ChData[5] >= DataBuffer) ChState1|= (1<<CH6);
if (ChData[6] >= DataBuffer) ChState1|= (1<<CH7);
if (ChData[7] >= DataBuffer) ChState1|= (1<<CH8);
PORTA = ChState1;
if (ChData[8] >= DataBuffer) ChState2|= (1<<CH8); //compare channels
if (ChData[9] >= DataBuffer) ChState2|= (1<<CH9);
if (ChData[10] >= DataBuffer) ChState2|= (1<<CH10);
if (ChData[11] >= DataBuffer) ChState2|= (1<<CH11);
if (ChData[12] >= DataBuffer) ChState2|= (1<<CH12);
if (ChData[13] >= DataBuffer) ChState2|= (1<<CH13);
if (ChData[14] >= DataBuffer) ChState2|= (1<<CH14);
if (ChData[15] >= DataBuffer) ChState2|= (1<<CH15);
PORTB = ChState2;

if (DataBuffer &(1<<7))
{
DataBuffer++; //increment compare register
DataBuffer &= ~(1<<7); //clear MSB
if (DataBuffer == 0) DataBuffer= 0x80;
}
else DataBuffer |= (1<<7); //set MSB
PdmCompare= DataBuffer;
}

// ----------------------------------------------------------------------------
// irq code (timer_overflow: TIMER1_OVF)
// Timer/Counter1 Overflow
// ----------------------------------------------------------------------------
ISR(TIMER1_OVF_vect) // timer_overflow
{
// restart counter
TCNT1H = 158;
TCNT1L = 88;
if (i >= 255) j=0;
if (i <= 0) j=1;
if (j <= 0) i--;
if (j >= 1) i++;
}

int main(void)
{
cli();
i = 0;
init_pdm();
sei();
for(;;)
{
ChData[0]= i;
ChData[1]= i;
ChData[2]= i;
ChData[3]= i;
ChData[4]= i;
ChData[5]= i;
ChData[6]= i;
ChData[7]= i;
ChData[8]= i;
ChData[9]= i;
ChData[10]= i;
ChData[11]= i;
ChData[12]= i;
ChData[13]= i;
ChData[14]= i;
ChData[15]= i;
};
}

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I compiled your code with regular -Os and the ISR() took 128 cycles from entry to RETI. (187 cycles with -O0)

Now the code is going to take an extra few cycles when the conditionals are true. If you are looking for a worst case then this will be when every conditional is true.

I am not sure quite what you are trying to do. If you happen to know what you are doing, you may be able to use some intelligence in the tests and test several conditions at once.

But as a general technique for timing an ISR(), just set a breakpoint at the actual vector, and another one at the associated RETI. On entry you clear the cycle counter, run with F5 and read the cycles at the RETI breakpoint.

IMHO, the compiler has generated very efficient code. Give it an improved algorithm and you may see a realistic improvement. Otherwise you will struggle to shave any cycles at all.

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I tried concocting a paraphrase of the OP's ISR in gas assembly to compare with David's measurement of 128 cycles from entry to RETI. The assembler version does that in 99 cycles, and always takes constant time (it has no conditional branches). It also throws in an automatic increment of the OP's "PdmCompare" variable on each overflow, which IMHO is preferable to using up another timer, and produces better results anyway. Here's a simple "main" file that initializes the various duty-cycle settings and then just sits back to watch the blinking bits:

#include 
#include 
#include 

extern volatile uint8_t PdmCompare;
extern volatile uint8_t ChData_A[8];
extern volatile uint8_t ChData_B[8];

int main(void)
{
    // Set up some arbitrary values for the individual PWMs:
    //
    for (uint8_t index = 8; index--; )
    {
            ChData_A
= 13 * index; ChData_B
= 240 - 7 * index; } sei(); while (1); }

And here's the Timer0 ISR module:

; twoportPWM.s - Timer0 ISR for generating PWM signals on two I/O ports
;
;
; The bulk of this file is a giant interrupt-service routine for Timer 0's
; overflow interrupt. 

#include 


    ;--------------
    .section .bss ;
    ;--------------

; This variable increments every Timer0 overflow.  No particular reason
; for it to be global, but then, there's no particular reason to keep
; it private, either:
;
PdmCompare:  .space 1
;
.global PdmCompare


; Next the array of eight duty factors for the individual bits
; of PortA, followed by the eight duty factors for PortB.  The
; ordering is from bit zero upwards.  These are our public interfaces.
;
ChData_A:      .space 8
ChData_B:      .space 8
;
;
.global ChData_A
.global ChData_B


; Here's our initializer code ------------------------------------------------
;
.section .init7, "ax", @progbits

        ; We'll be using both PortA and PortB as output:
        ;
        ldi r16, 0xff
        out _SFR_IO_ADDR(DDRA), R16
        out _SFR_IO_ADDR(DDRB), R16

        in r16, _SFR_IO_ADDR(TIMSK)
        ori R16, 1
        out _SFR_IO_ADDR(TIMSK), R16  ; Enable overflow interrupt for Timer 0

        ldi R16, 1  
        out _SFR_IO_ADDR(TCCR0), R16  ; Start Timer0 in 1X prescale mode

;------------------- initializer code ends ---------------------------

            ;----------------------------
            ; Timer 0 Interrupt service |
            ;---------------------------- 

     .section .text

; We need some temporary variables: one indirect indexing pointer,
; one register to accumulate the patterns to output, one register to
; hold the current phase angle, and one register to load the individual
; port pin duty factor settings into.  Set up some symbols for the registers:
;
#define statusSave R2
#define outbyte    R3
#define sawtooth   R4
#define tempPWM    R5

; In the interest of execution speed, we'll "unroll" what would ordinarily
; be iterative loops for generating the patterns to be sent out the ports.
; Each bit takes three instructions to generate; here's a macro to compress
; those instructions down into one:

#define GENBIT\
   ld tempPWM, X+       $\
   cp sawtooth, tempPWM $\
   ror outbyte $

        .global TIMER0_OVF_vect

TIMER0_OVF_vect:
        push statusSave
        in statusSave, _SFR_IO_ADDR(SREG)
        push outbyte
        push sawtooth
        push tempPWM
        push r26
        push r27

        ldi r26, lo8(PdmCompare)
        ldi r27, hi8(PdmCompare)

        ; Have the PWM "phase" autoincrement each cycle:
        ;
        ld sawtooth, X
        inc sawtooth
        st X+, sawtooth

        ; Build the pattern of eight comparisons for loading into PortA.
        ; The bit for a channel comes from the carry flag's state after
        ; pretending to subtract the current global "ramp" level from
        ; that channel's setting, which produces a zero value when
        ; "ChData[i] >= ramp".  We want "1" bits for this case instead,
        ; so we'll have to invert our result, but we'll wait 'till we
        ; can do it for all eight bits at once. 
        ; 
        GENBIT GENBIT GENBIT GENBIT
        GENBIT GENBIT GENBIT GENBIT
        ;
        COM outbyte
        out _SFR_IO_ADDR(PORTA), outbyte
        
        ; Now for the PortB bits:
        ;
        GENBIT GENBIT GENBIT GENBIT
        GENBIT GENBIT GENBIT GENBIT
        ;
        COM outbyte
        out _SFR_IO_ADDR(PORTB), outbyte


        pop r27
        pop r26
        pop tempPWM
        pop sawtooth
        pop outbyte
        out _SFR_IO_ADDR(SREG), statusSave
        pop statusSave
        reti
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Now that I see what the algorithm was trying to do, I like your solution.

This is a classic case of C not being able to use the Carry flag. And all micros can do very useful stuff with the carry flag.

David.