This code is a snippet from a software UART; I've stripped the timing and control code to focus on the title issue.
    #include <avr/io.h>

    int main () {
        /* Sample bits */
        uint8_t c = 0;
        for (uint8_t i = 0; i < 8; i++) {
            /* Take sample */
            //c >>= 1;
            c /= 2;
            if ( PIND & (1 << PIND6) ) {
                c |= 0x80;
            }
        }
        return c;
    }
There is a line where I could use either a right shift by 1 or a division by 2, and I expected avr-gcc to generate the same code for both once optimizations are applied. To my surprise, the division version compiles to better code than the shift version!
I tried several versions of avr-gcc. The shift code is worse because the uint8_t variable is promoted to 16 bits, shifted, then truncated back to 8 bits:
    ; code using right shift
    00000080 <main>:
      80:   98 e0   ldi   r25, 0x08   ; 8
      82:   80 e0   ldi   r24, 0x00   ; 0
      84:   28 2f   mov   r18, r24
      86:   30 e0   ldi   r19, 0x00   ; 0
      88:   35 95   asr   r19
      8a:   27 95   ror   r18
      8c:   82 2f   mov   r24, r18
      8e:   4e 99   sbic  0x09, 6     ; 9
      90:   80 68   ori   r24, 0x80   ; 128
      92:   91 50   subi  r25, 0x01   ; 1
      94:   b9 f7   brne  .-18        ; 0x84 <main+0x4>
      96:   90 e0   ldi   r25, 0x00   ; 0
      98:   08 95   ret
When using the division by 2, the code is what I expected:
    ; code using division
    00000080 <main>:
      80:   98 e0   ldi   r25, 0x08   ; 8
      82:   80 e0   ldi   r24, 0x00   ; 0
      84:   86 95   lsr   r24
      86:   4e 99   sbic  0x09, 6     ; 9
      88:   80 68   ori   r24, 0x80   ; 128
      8a:   91 50   subi  r25, 0x01   ; 1
      8c:   d9 f7   brne  .-10        ; 0x84 <main+0x4>
      8e:   90 e0   ldi   r25, 0x00   ; 0
      90:   08 95   ret
This makes a real difference in a tight loop where timing is critical. I don't understand why the variable is promoted to 16 bits only in the shift case. Does the C standard mandate this? Opinions welcome.
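For reference, as far as I can tell from the C standard, the integer promotions apply to the operands of both `>>` and `/`, so at the language level both expressions are evaluated as `int`. Below is a minimal host-side sketch (C11, not AVR-specific) that checks this at compile time and confirms that the two forms give the same 8-bit result for every input. The helper names `halve_by_shift`, `halve_by_divide` and `count_mismatches` are hypothetical and not part of the UART code. It says nothing about why avr-gcc emits different instructions; it only shows that the two forms are equivalent for `uint8_t`.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Both operators promote their uint8_t operand to int before operating,
   so the result type is int in either case (checked at compile time). */
static_assert(_Generic((uint8_t)0 >> 1, int: 1, default: 0),
              "uint8_t >> 1 has type int");
static_assert(_Generic((uint8_t)0 / 2, int: 1, default: 0),
              "uint8_t / 2 has type int");

/* Hypothetical helpers mirroring the two variants of the loop body. */
static uint8_t halve_by_shift(uint8_t c)  { c >>= 1; return c; }
static uint8_t halve_by_divide(uint8_t c) { c /= 2;  return c; }

/* Returns how many of the 256 possible inputs give different results. */
static unsigned count_mismatches(void)
{
    unsigned mismatches = 0;
    for (unsigned v = 0; v <= UINT8_MAX; v++) {
        if (halve_by_shift((uint8_t)v) != halve_by_divide((uint8_t)v)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

Calling `count_mismatches()` from a small `main` on a PC should return 0, since a non-negative value shifted right by one is the same as that value divided by two.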