I've found that avr-gcc automatically generates loop counters even when I don't want it to.
Check out this example code that counts how many pins in PORT D are set (this is just an example to illustrate the problem). I'm using the Arduino IDE.
    int main() {
        byte mask = 1, count = 0;
        do
            if (PIND & mask) count++;
        while (mask <<= 1);
        return count;
    }
This loop executes 8 times, because the mask variable will eventually overflow its 8 bit size and become zero, stopping the loop. Now check out the assembly output:
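To make the trip count easy to check off-target, here is a minimal host-side sketch of the same loop. It is not the Arduino code itself: `pind` is a hypothetical stand-in for the `PIND` register, `uint8_t` replaces Arduino's `byte`, and the `iterations` out-parameter is an assumption added only to make the number of passes observable.

```c
#include <stdint.h>

/* Host-side model of the loop in the question.
   `pind` is a hypothetical stand-in for the PIND register.
   `iterations` is an illustrative out-parameter (may be a null pointer). */
uint8_t count_set_pins(uint8_t pind, unsigned *iterations)
{
    uint8_t mask = 1, count = 0;
    unsigned n = 0;

    do {
        if (pind & mask)
            count++;
        n++;
    } while (mask <<= 1);   /* stored back into 8 bits, so 0x80 << 1 becomes 0 */

    if (iterations)
        *iterations = n;
    return count;
}
```

The mask takes the values 0x01, 0x02, ..., 0x80 before the shift result no longer fits in 8 bits, so the body runs once per bit regardless of the input value.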
    00000080 <main>:
    int main() {
      80:   28 e0       ldi   r18, 0x08   ; 8
      82:   30 e0       ldi   r19, 0x00   ; 0
        byte mask=1, count=0;
      84:   80 e0       ldi   r24, 0x00   ; 0
      86:   91 e0       ldi   r25, 0x01   ; 1
        do {
            if (PIND & mask) count++;
      88:   49 b1       in    r20, 0x09   ; 9
      8a:   49 23       and   r20, r25
      8c:   09 f0       breq  .+2         ; 0x90 <main+0x10>
      8e:   8f 5f       subi  r24, 0xFF   ; 255
    int main() {
        byte mask=1, count=0;
        do {
      90:   99 0f       add   r25, r25
      92:   21 50       subi  r18, 0x01   ; 1
      94:   31 09       sbc   r19, r1
      96:   c1 f7       brne  .-16        ; 0x88 <main+0x8>
            if (PIND & mask) count++;
        } while (mask <<=1);
        return count;
    }
      98:   90 e0       ldi   r25, 0x00   ; 0
      9a:   08 95       ret
The compiler noticed that the loop executes 8 times, so it decided to create a counter of its own accord. Worse, it creates a 16-bit counter in r18 and r19 to hold the value 8, and then decrements it inside the loop (addresses 0x92 and 0x94) instead of branching on the result of the shift. The compiler could at least have been smart enough to see that 8 fits in 8 bits; after all, the ATmega is an 8-bit processor and 16-bit operations have extra costs.
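For readers who don't want to trace the assembly, this is a rough C rendering of what the listing with the counter does, as I read it. It is only a sketch: `pind` is a hypothetical stand-in for `PIND`, the function and variable names are mine, and the register comments map each line back to the listing.

```c
#include <stdint.h>

/* Approximate C equivalent of the listing that uses the extra counter.
   `pind` is a hypothetical stand-in for the PIND register. */
uint8_t count_set_pins_with_counter(uint8_t pind)
{
    uint16_t trips = 8;      /* r19:r18, loaded at 0x80 and 0x82 */
    uint8_t  count = 0;      /* r24 */
    uint8_t  mask  = 1;      /* r25 */

    do {
        if (pind & mask)     /* in / and / breq */
            count++;         /* subi r24, 0xFF (adds 1) */
        mask += mask;        /* add r25, r25 (shift left by one) */
    } while (--trips != 0);  /* subi / sbc / brne on the 16-bit counter */

    return count;
}
```

The loop exit no longer depends on `mask` reaching zero; the shift is still performed, but only the separate counter decides when to stop.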
If I disable this "optimization" feature of gcc using the option "--param max-iterations-to-track=1", the code I was expecting is generated:
    00000080 <main>:
    int main() {
        byte mask=1, count=0;
      80:   80 e0       ldi   r24, 0x00   ; 0
      82:   91 e0       ldi   r25, 0x01   ; 1
        do {
            if (PIND & mask) count++;
      84:   29 b1       in    r18, 0x09   ; 9
      86:   29 23       and   r18, r25
      88:   09 f0       breq  .+2         ; 0x8c <main+0xc>
      8a:   8f 5f       subi  r24, 0xFF   ; 255
    int main() {
        byte mask=1, count=0;
        do {
      8c:   99 0f       add   r25, r25
      8e:   d1 f7       brne  .-12        ; 0x84 <main+0x4>
            if (PIND & mask) count++;
        } while (mask <<=1);
        return count;
    }
      90:   90 e0       ldi   r25, 0x00   ; 0
      92:   08 95       ret
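So disabling this optimization makes the code both smaller (8 bytes, i.e. four instructions: ret is at 0x92 instead of 0x9a) and faster (two fewer instructions in every loop iteration). I hope there aren't many cases like this inside gcc. Should I report this to the avr-gcc developers? I mean, it's not really a bug, just non-optimal code generation.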