Unsigned division of 32 bit by 16 bit giving 32 quot &16

Go To Last Post
3 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Hi,

i wrote a machine language routine dividing a 32 bit unsigned by an 16 bit unsigned. Result is a 32 bit quotient and a 32 bit remainder.

I think about optimizing it. But for now it is OK.
If somebody is interested, please feel free to copy.
I think there is comment enough to understand it.

Any suggestions or comments or hints about it?

Attachment(s): 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

HKarl, here are my suggestions:

1) The Result(Quotient) does not need to be cleared during loop initialization, since it is completely overwritten inside the loop.
2) Use a countdown-to-zero loop-counter, to eliminate the "CPI R21, 0x20".
3) Avoid branches by moving the loop termination check from the top of the loop to the bottom. [ turn it into a "do {} while()" loop ]
4) Dividend and Quotient can share the same regs, since Dividend is shifted out one bit at a time, and Quotient is shifted in one bit at a time. This will save you four ROL ops.
5) Use LSL instead of CLC & ROL
6) You have an awful lot of branches for your 16bit compare. Replace it all with a CP, CPC, and BRLO.

--Brian

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Hi bcb,

thank you for your helpfull suggestions:
1, 2, 4 and 5 are quite usefull. nr of cycles decreased by about 30 percent. (for given dividend and divisor)

About 3, I think, that branch at loop start is 32 times false (which is 1 cycle) and one time true.
At loop end it would be 32 true (which is 2 cycles) and one time false.
I know, it is often worth to think about invert and moving conditions on loops.

I had a look at the comparison and removed a superflous jmp.
But using cp, cpc and brlo, would reduce code size by increasing run time, because it must ever check both bytes before the decision. I asume you mean something like:

cp Low1, Low2
cpc High1, High2
brlo IS_LOWER

Would you like to comment, bcb?