As everyone knows, shifting a 24-bit variable (3 bytes) right takes 3 instructions per bit shifted when it is done inline, without a loop.
Per bit:
ASR ByteH ; if signed, or LSR if unsigned
ROR ByteM
ROR ByteL
Shifting by 6 bits therefore costs 18 clock cycles; if this is part of a frequently repeated routine, it can easily eat time.
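As a reference point, the per-bit sequence can be modelled on a PC in C. This is only a sketch of the unsigned (LSR) case; the struct and function names are made up for illustration and are not part of the original routine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 24-bit value held in three byte registers. */
typedef struct { uint8_t h, m, l; } u24_regs;

/* One pass of LSR ByteH / ROR ByteM / ROR ByteL (unsigned case). */
static void shift_right_once(u24_regs *v)
{
    uint8_t carry = v->h & 1u;                     /* LSR: bit 0 goes to carry */
    v->h = (uint8_t)(v->h >> 1);

    uint8_t next = v->m & 1u;                      /* ROR: carry enters bit 7 */
    v->m = (uint8_t)((v->m >> 1) | (carry << 7));
    carry = next;

    v->l = (uint8_t)((v->l >> 1) | (carry << 7));  /* ROR on the low byte */
}

/* Shifting by n bits repeats the three-instruction pass n times. */
static uint32_t shift_right_n(uint32_t abc, unsigned n)
{
    assert(abc <= 0xFFFFFFu);
    u24_regs v = { (uint8_t)(abc >> 16), (uint8_t)(abc >> 8), (uint8_t)abc };
    for (unsigned i = 0; i < n; i++)
        shift_right_once(&v);
    return ((uint32_t)v.h << 16) | ((uint32_t)v.m << 8) | v.l;
}
```

Each pass corresponds to the three instructions above, so the loop count times three gives the instruction count of the unrolled version.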
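Replacing the shifts with a multiplication by a power of two (in the spirit of reciprocal multiplication) can save some cycles.
To shift ABC (24 bits) right by "n" bits:
Compute ABC x 2^(8-n), then drop the least significant byte of the product; the remaining three bytes go back into ABC.
For "n" from 4 to 7 this saves clock cycles.
If ABC is signed and negative, the top n bits of the resulting "A" must be padded with 1s; this is not shown below.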
If n=4, D=0x10
If n=5, D=0x08
If n=6, D=0x04
If n=7, D=0x02
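The equivalence, and the sign padding mentioned above, can be checked with a small C model. This is a sketch under the assumption that the 24-bit value sits in the low bits of a uint32_t; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 24-bit shift right by n (1..7) using one multiply by D = 2^(8-n). */
static uint32_t shr24_mul_unsigned(uint32_t abc, unsigned n)
{
    assert(n >= 1 && n <= 7);
    uint32_t d = 1u << (8 - n);                /* n=4 -> 0x10, n=5 -> 0x08, n=6 -> 0x04 */
    uint32_t product = (abc & 0xFFFFFFu) * d;  /* at most 31 bits, fits in uint32_t */
    return (product >> 8) & 0xFFFFFFu;         /* drop the least significant byte */
}

/* Signed variant: same multiply, then pad the top n bits with 1s if the
   original value was negative (bit 23 set). */
static uint32_t shr24_mul_signed(uint32_t abc, unsigned n)
{
    uint32_t r = shr24_mul_unsigned(abc, n);
    if (abc & 0x800000u)
        r |= (0xFFFFFFu << (24 - n)) & 0xFFFFFFu;  /* set the top n bits of "A" */
    return r;
}
```

For n=6 the padding mask works out to 0xFC on byte A, so on the target a single OR on A, taken only when the original A had bit 7 set, would be enough.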
      A    B    C
x               D
---  ---  ---  ---
          CDH   /     (CDL is dropped)
ADH  ADL        /
     BDH  BDL   /
---  ---  ---  ---
 A    B    C    /
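Option A (10 instructions, counted here as 10 clock cycles; on cores where MUL takes 2 cycles, as on AVR, the three MULs make it 13), assuming a variable Zero = 0: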
MUL A, D      ; R1:R0 = A x D
MOV A, R1     ; ADH
MOV X, R0     ; ADL
MUL C, D      ; R1:R0 = C x D
MOV C, R1     ; CDH
MUL B, D      ; R1:R0 = B x D
MOV B, X      ; ADL
ADD C, R0     ; CDH + BDL
ADC B, R1     ; ADL + BDH + cy
ADC A, ZERO   ; ADH + cy
Option B: the same instructions in a different order, same number of clock cycles.
MUL C, D      ; R1:R0 = C x D
MOV C, R1     ; CDH
MUL A, D      ; R1:R0 = A x D
MOV A, R1     ; ADH
MOV X, R0     ; ADL
MUL B, D      ; R1:R0 = B x D
MOV B, X      ; ADL
ADD C, R0     ; CDH + BDL
ADC B, R1     ; ADL + BDH + cy
ADC A, ZERO   ; ADH + cy
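To double-check the carry chain used by both options, here is a byte-wise C model with three 8x8 multiplies whose high and low bytes are summed column by column. It is a sketch only; the function name and the way carries are modelled with 16-bit sums are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-wise model of the multiply-based shift: a, b, c are the bytes of ABC
   (a is the most significant) and d is 2^(8-n). */
static uint32_t shr24_bytewise(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    uint16_t ad = (uint16_t)(a * d);   /* MUL A, D -> ADH:ADL */
    uint16_t cd = (uint16_t)(c * d);   /* MUL C, D -> CDH:CDL (CDL is dropped) */
    uint16_t bd = (uint16_t)(b * d);   /* MUL B, D -> BDH:BDL */

    /* ADD C, R0 : CDH + BDL, bit 8 of the sum plays the role of carry */
    uint16_t sum_c = (uint16_t)((cd >> 8) + (bd & 0xFFu));
    /* ADC B, R1 : ADL + BDH + cy */
    uint16_t sum_b = (uint16_t)((ad & 0xFFu) + (bd >> 8) + (sum_c >> 8));
    /* ADC A, ZERO : ADH + cy */
    uint16_t sum_a = (uint16_t)((ad >> 8) + (sum_b >> 8));

    return ((uint32_t)(sum_a & 0xFFu) << 16)
         | ((uint32_t)(sum_b & 0xFFu) << 8)
         | (uint32_t)(sum_c & 0xFFu);
}
```

For example, with ABC = 0xABCDEF and D = 0x04 (n=6) the partial products are 0x02AC, 0x0334 and 0x03BC, and the columns add up to 0x02AF37, the same as 0xABCDEF >> 6.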
Any ideas for doing it in fewer than 10 clock cycles?
The example above runs in 10 clock cycles regardless of "n" for unsigned values.
With conventional shifting, 6 bits cost 18 clock cycles, whether unsigned (LSR) or signed (ASR).