Although I can often do better than gcc's code generation, I recently found it can optimize constant loading to avoid using ldr. For example:
num = 8192;
will generate:
movs rN, #128
lsls rN, rN, #6
But when I write ARM asm, I write it as:
ldr rN, =8192
I'm wondering if there are any assembler tricks that would automatically convert the ldr into movs + lsls when the value can be composed from an 8-bit immediate + a shift.
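
To make the condition concrete, here is a minimal C sketch of the check I have in mind. The helper name `split_imm8_shift` is hypothetical (it is not part of gcc or gas), and it assumes the only question is whether a 32-bit value equals an 8-bit immediate shifted left by some amount:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: try to express `value`
 * as (imm8 << shift) with imm8 in 0..255. On success, store the parts
 * and return true; otherwise return false (ldr would still be needed). */
static bool split_imm8_shift(uint32_t value, uint32_t *imm8, unsigned *shift)
{
    unsigned s = 0;

    if (value == 0) {            /* zero is just "movs rN, #0" */
        *imm8 = 0;
        *shift = 0;
        return true;
    }

    /* Strip trailing zero bits; they become the shift amount. */
    while ((value & 1u) == 0) {
        value >>= 1;
        s++;
    }

    if (value > 0xFFu)           /* remaining bits do not fit in 8 bits */
        return false;

    *imm8 = value;
    *shift = s;
    return true;
}
```

For 8192 this yields imm8 = 1 and shift = 13 rather than gcc's 128 and 6. The decomposition is not unique, and both pairs produce the same value.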