Since a left shift on the AVR takes one clock cycle per bit, and multiplications take two cycles, isn't it faster in some cases to use a single multiply by a power of two instead of a series of shifts?
For example, the C code
    char r20;
    // load something into r20
    r20 <<= 3;
produces the following assembly when compiled for the ATmega168 with -O3:
    clr r21
    sbrc r20,7
    com r21
    lsl r20
    rol r21
    lsl r20
    rol r21
    lsl r20
    rol r21
That's 9 cycles! So why doesn't it just do
    ldi r21,8
    muls r20,r21
which is a much more reasonable 3 cycles (1 for ldi, 2 for muls)?