## Shifting slower than multiplication?

6 posts / 0 new
Author
Message

Since a left shift on the AVR takes one clock cycle per bit, and multiplications take two cycles, isn't it faster in some cases to use a single multiply by a power of two instead of a series of shifts?

For example, the C code

```char r20;
r20 <<= 3;```

produces the following assembly when compiled for atmega168 with -O3:

```clr r21
sbrc r20,7
com r21
lsl r20
rol r21
lsl r20
rol r21
lsl r20
rol r21```

That's 9 cycles! So why doesn't it just do

```ldi r21,8
muls r20,r21```

which is a much more reasonable 3 cycles?

Quote:

which is a much more reasonable 3 cycles?

Ummm--where is the result after your 3 cycles?

There is a bit of apples and oranges in your example since you are forcing the C compiler to do sign work and also by rules (unless you take a different approach) forcing it to do 16-bit operations.

>>If<< you actually have an 8-bit variable in R20 ov the AVR, and >>if<< you coerce the compiler into 8-bit operations, >>then<< you may end up with three shifts/three cycles and the result is in the right place.

In your example you have to move the result to the right place. This then makes it a wash with up to 4 shifts for speed.

If you >>want<< an multiply, then why not >>tell<< the compiler to do the multiply instead of the shift? What do you get then?

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

The problem here is due to the integer promotion rules. The calculation is performing a 3 bit shift on a 16 bit value rather than an 8 bit one. This because the calculation is temporarily promoted to int according to the C language rules. The missed optimization here is not seeing that the upper byte is not used, so it can be discarded, in which case the 3 shifts would equal the cycles for the multiply. It would still be more efficient shifting in this case, because of additional register usage by mult. (remember the result ends up in r0:r1, so you will need to move your result back to the source register)

Writing code is like having sex.... make one little mistake, and you're supporting it for life.

Ah, hadn't read the reference closely enough, didn't notice the result ends up in r0:r1. Still, if you only care about an 8-bit value, it's only one extra instruction to copy the result of the multiply back to the register you want it in.

theusch: If I say "g *= 8" instead, it still gets compiled into the shift I illustrated above. If I multiply by a non-power-of two, it gets compiled to a multiply.

Of course, nothing's stopping me from using inline assembler if I need to. Was curious, I guess that CPUs without a barrel shifter break the "use shifts instead of multiplies" mantra of x86-land.

I think you'll find that the resulting multiply will be done as a 16bit multiply (integer promotion, remember) which will take more than the 9 cycles of the 16 bit shift.

It does appear to be a missed optimization in the sense that it did not reduce the calculation to 8 bits, as the upper 8 are never used.

You'll also find that the multiply, even if done as 8 bit, will take at least 5 cycles, as R1 needs to be restored to 0 after. (R1 is assumed to always be 0 in GCC [an unfortunate early implementation decision that is now hard to change])

Writing code is like having sex.... make one little mistake, and you're supporting it for life.

Quote:

theusch: If I say "g *= 8" instead, it still gets compiled into the shift I illustrated above.

Well, that depends on how you >>force<< the compiler to do signed 8-bit operations. [Does it really make sense in the general case to shift a signed value left? What would you expect to be the final value when asking for a shift of, say, -123 left 3? Do you still expect it to be negative or not? this appears to be at least a semi-troll question.

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.