I have a question about how avr-gcc handles large (16x16) multiplies. I'm working with an XMega128A1 and am using avr-gcc 4.3.2.
I have a project that will require lots of 16x16 multiplies, and in general I need all 32 bits of the result. I also need to do them in as few cycles as possible.
I've tried several constructs. Here are two:
int ia, ib;
long lc;
lc = (long) ia*ib;
This one does not work. The generated code builds the 16-bit result and then sign-extends that to 32 bits, which is not what I need.
lc = (long)ia * (long)ib;
This produces the correct result but wastes quite a few cycles. It first sign-extends ia and ib, then calls the routine __mulsi3, which does a full 32x32 multiply. My guess is that this takes ~50% more cycles than needed.
My question: is there a way to get the compiler to do just the 16x16-to-32-bit multiply? I can build my own using inline assembly, but I was hoping I wouldn't have to.
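To make the difference between the two results concrete, here is a minimal sketch using fixed-width types so it behaves the same on a PC as on the AVR. The function names mul_truncated and mul_widened are mine, purely for illustration, and the truncating version relies on gcc's wrap-around behaviour when converting an out-of-range value to int16_t.

```c
#include <stdint.h>

/* Models the unwanted result: only the low 16 bits of the product
   survive, and those are then sign-extended to 32 bits.
   Assumption: the out-of-range conversion to int16_t wraps modulo
   2^16 (implementation-defined in C, but this is what gcc does). */
static int32_t mul_truncated(int16_t a, int16_t b)
{
    return (int32_t)(int16_t)((int32_t)a * b);
}

/* Models the wanted result: both operands are widened first, so the
   full 32-bit signed product is kept. */
static int32_t mul_widened(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}
```

For example, with both operands equal to 300, mul_widened gives 90000 while mul_truncated gives 24464 (90000 modulo 65536).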
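For reference, the hand-built version I have in mind would follow the usual partial-product decomposition, where each 8x8 product corresponds to one hardware MUL. Below is a rough C sketch of that idea, not the assembly itself. It is unsigned only (a signed version would need sign handling on the high bytes), and the name mul16x16_32_partial is just something I made up for this post.

```c
#include <stdint.h>

/* Unsigned 16x16 -> 32 multiply built from four 8x8 -> 16 partial
   products. Illustrative sketch only; a real AVR routine would do
   each partial product with MUL and accumulate with ADD/ADC. */
static uint32_t mul16x16_32_partial(uint16_t a, uint16_t b)
{
    uint8_t al = (uint8_t)(a & 0xFF);   /* low byte of a  */
    uint8_t ah = (uint8_t)(a >> 8);     /* high byte of a */
    uint8_t bl = (uint8_t)(b & 0xFF);   /* low byte of b  */
    uint8_t bh = (uint8_t)(b >> 8);     /* high byte of b */

    /* Widen before multiplying so nothing is lost where int is 16 bits. */
    uint32_t ll = (uint32_t)al * bl;    /* weight 2^0  */
    uint32_t lh = (uint32_t)al * bh;    /* weight 2^8  */
    uint32_t hl = (uint32_t)ah * bl;    /* weight 2^8  */
    uint32_t hh = (uint32_t)ah * bh;    /* weight 2^16 */

    return ll + ((lh + hl) << 8) + (hh << 16);
}
```

That is four 8x8 multiplies plus the additions, versus the full 32x32 routine the compiler calls now.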