This simple program shows a poor optimization by GCC:

    #include <avr/io.h>

    int main () {
        OCR1A = (PIND << 8) | PIND;
    }

gcc 4.8.1 and 5.4.0 give:

    int main () {
        OCR1A = (PIND << 8) | PIND;
      80:   29 b1         in    r18, 0x09   ; 9
      82:   89 b1         in    r24, 0x09   ; 9
      84:   90 e0         ldi   r25, 0x00   ; 0
      86:   92 2b         or    r25, r18
      88:   90 93 89 00   sts   0x0089, r25
      8c:   80 93 88 00   sts   0x0088, r24
      90:   80 e0         ldi   r24, 0x00   ; 0
      92:   90 e0         ldi   r25, 0x00   ; 0
      94:   08 95         ret

I also tested gcc 7.2.0, which I use for its C++17 support. It's even worse:

    00000080 <main>:
      80:   89 b1         in    r24, 0x09   ; 9
      82:   99 b1         in    r25, 0x09   ; 9
      84:   89 27         eor   r24, r25
      86:   98 27         eor   r25, r24
      88:   89 27         eor   r24, r25
      8a:   90 93 89 00   sts   0x0089, r25 ; 0x800089 <__TEXT_REGION_LENGTH__+0x7e0089>
      8e:   80 93 88 00   sts   0x0088, r24 ; 0x800088 <__TEXT_REGION_LENGTH__+0x7e0088>
      92:   90 e0         ldi   r25, 0x00   ; 0
      94:   80 e0         ldi   r24, 0x00   ; 0
      96:   08 95         ret

Yes, using three eor instructions to exchange r24 and r25 is very clever. Very clever, but completely useless: reversing the two in instructions (reading into r25 first, then r24) would achieve the same result, and so would swapping r24 and r25 in the two sts instructions.
I'm using the -Os flag, but the other optimization levels don't seem to make any difference.
Is there one of those obscure gcc options I usually miss that might fix this?