Volatile or Register?

Go To Last Post
9 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Hi guys,

I'm trying to squeeze all the optimization I can out of GCC, and wouldn't mind some input.

I have global variables that are updated in an interrupt. I've noticed when I use the volatile keyword GCC generates a pair of load/store instructions for each statement that accesses the variable. This makes sense outside the ISR, where the variable can change at any time, but gets repetitive inside the ISR where I know I have exclusive access.

I've included some sample code below to show what I'm talking about. It's a simple program that samples a button (active low) on each timer interrupt and stores the last 8 samples in a shift register. When the main loop detects the button is debounced (i.e. all 8 bits low) it turns on a LED.

Disassembly is included for the relevant portion of the ISR. Notice the extra lds/sds.

Is it somehow possible to make GCC treat the variable as volatile for only a region of code (i.e. in the "critical section")?

My other option is allocating a register to hold the variable. I'd prefer that approach, though I'm wary of the danger of it being trampled by library calls. So far the only documentation I could find says R2-R7 "should" be safe. I wish that were a "definitely". Other than manually inspecting the final asm output of my program, is there any way to report what registers are used by all the library calls in my code?

Another issue.. in the second snippet of asm you can see GCC emits code to shuffle the register around needlessly (into R24). I just can't win!

I've noticed in general that GCC doesn't seem to do a very good job of optimization on register variables. Is this because the avr-libc library is precompiled and doesn't count on you using them? Is there any way to re-run my final compiled & linked program back through the optimizer to look for stuff like this?

Any C attribute magic I'm missing to make things more compact? (aside from writing the ISR in ASM..)

#include 
#include 

#define LED PINB0
#define SWT PINB1

volatile unsigned char samples_isr;
//register unsigned char samples_isr asm("r2");
unsigned char samples;

int main() {

  // Init
  DDRB  |= (1<<LED);    // configure LED pin as output
  PORTB |= (1<<SWT);    // enable pull-up
  TCCR0  = 0b010;       // prescaler 8
  TIMSK |= (1<<TOIE0);  // enable Timer0 interrupt
  samples_isr = 0xFF;   // start with button inactive

  while(1) {

    // Critical section: read values from ISR
    cli();
    samples = samples_isr;
    sei();

    if (samples == 0) { 
      // Button is debounced; turn on LED
      PORTB |= (1<<LED);
    } else {
      PORTB &= ~(1<<LED);
    }

  }
}

ISR(TIMER0_OVF0_vect) {
    
  // Shift left to make room for new sample
  samples_isr <<= 1;

  // Put latest sample into the rightmost bit
  if (PINB & (1<<SWT)) {
    samples_isr |= 0b00000001;
  } else {
    // rightmost bit already 0
  }  
    
}

Using volatile:

samples_isr <<= 1;
58:  lds  r24,    0x0060  ;load
5c:  add  r24,    r24
5e:  sts  0x0060, r24

if (PINB & (1<<SWT)) {
62:  sbis  0x16,  1      ; PINB
64:  rjmp  .+10          ; 0x70 (epilogue)

samples_isr |= 0b00000001
66:  lds  r24,    0x0060 ; load again
6a:  ori  r24,    0x01
6c:  sts  0x0060, r24
}

Using register:

samples_isr <<= 1
46:  mov  r24,    r2     ; shuffle
48:  add  r24,    r24
4a:  mov  r2,     r24    ; shuffle back

if (PINB & (1<<SWT)) {
4c:  sbis 0x16,   1      ; PINB
4e:  rjmp .+6            ; 0x56 (epilogue)

samples_isr |= 0b00000001
50:  eor  r2,     r2     ; I don't like this
52:  inc  r2             ; very much either
54:  or   r2,     r24
}
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

volatile is acting exactly as it should. Unfortunately you cannot tell GCC to treat the variable differently inside the ISR, so instead, use a local temporary variable inside the ISR. [essentially manually doing what you want the compiler to do]

I don't suggest using register vars unless you are willing to re-compile the entire library excluding any use of your reserved registers.

Writing code is like having sex.... make one little mistake, and you're supporting it for life.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0
50:  eor  r2,     r2     ; I don't like this 
52:  inc  r2             ; very much either 
54:  or   r2,     r24 

There are no immediate opcodes for the lower 16 registers, so there is not much you can do about this one, though the compiler could have made it:

ori r24, 0x01
move r2, r24

But of course, it can only do that because it did the shuffling earlier.

But if you are really concerned about 4 clock cycles, write it by hand in assembly.

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Another option is to not make it volatile generally, but to cast it to volatile at the point(s) where you need a guaranteed reread/rewrite.

Or you put the ISR in a separate compilation unit and simply omit the "volatile" in the extern declaration there.

Stefan Ernst

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Koschi, forget ORI. It should have used SBI.

sternst, neat suggestion. Tried it. Interestingly, this works..

ATOMIC_BLOCK(ATOMIC_FORCEON) { 
  samples = (volatile unsigned char)samples_isr;
}

..but this doesn't..

cli();
samples = (volatile unsigned char)samples_isr;
sei();

..as in, the latter still eliminates the assignment.

And taking "volatile" off the declaration only eliminated line 66 in the ISR. Interesting that the compiler insists on storing my variable right after it's used in the first statement.. but is fine with using the "cached" (R24) value in the second. Are my expectations too high?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

This is what glitch suggests:

ISR(TIMER0_OVF0_vect) {
  uint8_t samples;
   
  samples = samples_isr;
  // Shift left to make room for new sample
  samples <<= 1;

  // Put latest sample into the rightmost bit
  if (PINB & (1<<SWT)) {
    samples++;
  } else {
    // rightmost bit already 0
  } 
   
  samples_isr = samples;
} 

and it results in

  48 0028 1F92      		push __zero_reg__
  49 002a 0F92      		push __tmp_reg__
  50 002c 0FB6      		in __tmp_reg__,__SREG__
  51 002e 0F92      		push __tmp_reg__
  52 0030 1124      		clr __zero_reg__
  53 0032 8F93      		push r24
  54               	/* prologue end (size=6) */
  55 0034 8091 0000 		lds r24,samples_isr
  56 0038 880F      		lsl r24
  57 003a B199      		sbic 54-0x20,1
  58 003c 8F5F      		subi r24,lo8(-(1))
  59               	.L11:
  60 003e 8093 0000 		sts samples_isr,r24
  61               	/* epilogue: frame size=0 */
  62 0042 8F91      		pop r24
  63 0044 0F90      		pop __tmp_reg__
  64 0046 0FBE      		out __SREG__,__tmp_reg__
  65 0048 0F90      		pop __tmp_reg__
  66 004a 1F90      		pop __zero_reg__
  67 004c 1895      		reti

However, optimisation of this sort of rather trivial ISRs down to the last tick is best to be done in asm:

ISR(TIMER0_OVF_vect, ISR_NAKED) {
   
  __asm__ (
  "push __tmp_reg__  \n\t"
  "in __tmp_reg__,__SREG__  \n\t"
  "push __tmp_reg__  \n\t"
  "lds  __tmp_reg__, samples_isr \n\t"
  "lsl  __tmp_reg__  \n\t"
  "sbic %[_PINB], %[_SWT] \n\t"
  "inc  __tmp_reg__  \n\t"
  "sts  samples_isr, __tmp_reg__ \n\t"
  "pop  __tmp_reg__ \n\t"
  "out  __SREG__, __tmp_reg__ \n\t"
  "pop  __tmp_reg__ \n\t"
  "reti  \n\t"
  :
  : [_PINB] "I" (_SFR_IO_ADDR(PINB))
  , [_SWT]  "M" (SWT)
  );
  
} 

getting rid of the unnecessary parts of prologue/epilogue (__zero_reg__ save and clear, r24 save), which in the C version form a significant part of the ISR:

  78 004e 0F92      		push __tmp_reg__  
  79 0050 0FB6      		in __tmp_reg__,__SREG__  
  80 0052 0F92      		push __tmp_reg__  
  81 0054 0090 0000 		lds  __tmp_reg__, samples_isr 
  82 0058 000C      		lsl  __tmp_reg__  
  83 005a B199      		sbic 22, 1 
  84 005c 0394      		inc  __tmp_reg__  
  85 005e 0092 0000 		sts  samples_isr, __tmp_reg__ 
  86 0062 0F90      		pop  __tmp_reg__ 
  87 0064 0FBE      		out  __SREG__, __tmp_reg__ 
  88 0066 0F90      		pop  __tmp_reg__ 
  89 0068 1895      		reti  

With using 2 register variables or some unused SFRs this could be optimised further if needed.

JW

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

rkagerer wrote:
Koschi, forget ORI. It should have used SBI.

sternst, neat suggestion. Tried it. Interestingly, this works..

ATOMIC_BLOCK(ATOMIC_FORCEON) { 
  samples = (volatile unsigned char)samples_isr;
}

..but this doesn't..

cli();
samples = (volatile unsigned char)samples_isr;
sei();

..as in, the latter still eliminates the assignment.

And taking "volatile" off the declaration only eliminated line 66 in the ISR. Interesting that the compiler insists on storing my variable right after it's used in the first statement.. but is fine with using the "cached" (R24) value in the second. Are my expectations too high?

Why the ATOMIC() or cli()/sei() stuff around
accessing samples_isr and using a temporary local in main()?
I could understand if it was a 16 bit value to prevent seeing a partially updated value but it is only 8 bits.

Why not access isr_samples directly in main()?
Is the real code more complicated than this example?

--- bill

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

Koschi, forget ORI. It should have used SBI.

SBI only applies to peripheral (I/O) registers, not CPU registers. The equivalent to SBI when working with CPU registers is SBR, and it just so happens that SBR and ORI both assemble down to the exact same binary code.

By the way, what type of AVR device are you using? Is it a device with General Purpose I/O Registers? (GPIORn)
If so, especially if at least one of them is located within the bit-accessible region of I/O space, then using them judiciously might offer some improvement over your baseline implementation, without the need to perform too many back-flips such as typecasting, inline assembly or explicit variable caching.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Older device; no GPIOR's.

Quote:
SBI only applies to peripheral (I/O) registers, not CPU registers. The equivalent to SBI when working with CPU registers is SBR, and it just so happens that SBR and ORI both assemble down to the exact same binary code.
OOPS! Quite right, quite right. Sorry Koschi! I guess there isn't any way to collapse it down to one instruction without using regs above R15.

Bill.. I was focusing on the pattern and used an 8-bit value just to keep things simpler, but you're right if I used a 16-bit one the motivation would be clearer. Let's just pretend that there's more code in main() and that I need a consistent view of samples throughout the iteration.