## Fast AVR Code for Dividing 16-Bit Number by 8-Bit Number

6 posts / 0 new
Author
Message

On the previous thread “Versatile AVR Code for Dividing 16 Bits by 8-Bit Constant”, the trick that lets the code be rather fast is that the long division by reg_N is made already by the compiler. Therefore, if one has a free memory space for 256 words, a 256-word table could be added which includes the results of reg_Kd [(2*256*256/reg_N+1)/2] for all possible reg_N [2 to 255]; the first 2 entries of the table could be of any number.

```;==============================================================
; Division of 16-bit register {A} by 8-bit {N}
;==============================================================

; {R}={A}/{N} = r17:r16 / {N} = r17:r16 * {Kd} /256 /256 then rounding

; {A} dividend in r17:r16 [0 to 65535]
; {N} constant divisor in r22 [N=0 to 255], see Note_1, Note_2 below
; {R} result in r21:r20 , {A}/{N} rounded
; {Kd} = round(256*256/{N},0), the division constant [from KtTable:]
; {Kr} = INT[ (1-{N})/2 ]    , the rounding constant
; Used registers: ZH,ZL,r19,r18,r14,r13,r1,r0
; 50 48 words, excluding RET and RCALL
; 53, 56, 57, or 58 cycles = 46c|49c|50c|51c code + 4c RET + 3c RCALL
; 55, 58, 59, or 60 cycles = 48c|51c|52c|53c code + 4c RET + 3c RCALL

D16var8:
TST   r22                   ;1abcd
BREQ  DOoverF               ;1abcd

CPI   r22, 1                ;1abcd
BREQ  DO_same               ;1abcd

LDI   ZH, high(KdTable)     ;1abcd, address in words, 0x??00
MOV   ZL, r22               ;1abcd
LSL   ZL                    ;1abcd
ROL   ZH                    ;1abcd
LPM   r13, Z+               ;3abcd
LPM   r18, Z+               ;3abcd
MOV   r13, r18              ;1abcd
LPM   r14, Z                ;3abcd, 14abcd
LPM   r18, Z+               ;3abcd
MOV   r14, r18              ;1abcd, 16abcd
; r14:r13 = reg_Kd

; multiplicand in r17:r16 = {A}
; multiplier   in r14:r13 = {Kd}
; mul. result  in r21:r20:r19:r18
; valid result in r21:r20
MUL   r17, r14              ;2abcd
MOVW  r21:r20, r1:r0        ;1abcd
MUL   r16, r13              ;2abcd
MOVW  r19:r18, r1:r0        ;1abcd
MUL   r17, r13              ;2abcd
CLR   r13                   ;1abcd
ADD   r19, r0               ;1abcd
ADC   r20, r1               ;1abcd
ADC   r21, r13              ;1abcd
MUL   r14, r16              ;2abcd
ADD   r19, r0               ;1abcd
ADC   r20, r1               ;1abcd
ADC   r21, r13              ;1abcd +17abcd= 32abcd
; {B} = r21:r20 = r17:r16 * r14:r13 /256/256 = {A}*{Kd}/256/256
; {R} = {B} or {B}+1

; for rounding
MOV   r13, r22              ;1abcd
; r13 = {N}

MUL   r20, r13              ;2abcd
MOVW  r19:r18, r1:r0        ;1abcd
MUL   r21, r13              ;2abcd
ADD   r19, r0               ;1abcd, +7abcd= 39abcd
; {C} = r19:r18 = {B}*{N} = r21:r20 * r13

; the following conditions were deduced empirically
; if( Carry_1=0, {R}={B}, if( Zero_2=1 OR Carry_2=0, {R}={B}, {R}={B}+1 ) )

SUB   r18, r16              ;1abcd
SBC   r19, r17              ;1abcd
; {D1} = r19:r18 = {C} - {A} = r19:r18 - r17:r16

BRCC  DIV_ret               ;2a|1bcd +4a=[43a]
; if Carry_1=0, {R}={B}

MOV   r13,  r22             ;1bcd
DEC   r13                   ;1bcd
LSR   r13                   ;1bcd
; r13 = - reg_Kr

ADD   r18,  r13             ;1bcd
CLR   r13                   ;1bcd
ADC   r19,  r13             ;1bcd
; {D2} = r19:r18 = {D1} + {-Kr} = r19:r18 + {-Kr}

BREQ  DIV_ret               ;2b|1cd, +11b=[50b]
; if Zero_2=1, {R}={B}

BRCC  DIV_ret               ;2c|1d, +12c=[51c]
; if Carry_2=0, {R}={B}

SUBI  r20,  low(-1)         ;1d
SBCI  r21, high(-1)         ;1d, +13d=[52d]
; {R}={B}+1

DIV_ret:
RET                         ;4
; {R} = r21:r20 = {B} or {B}+1 [ {A}/{N} rounded ]

DOoverF:
SER   r20
SER   r21
RET
; {R} = r21:r20 = 0xFFFF

DO_same:
MOV   r20, r16
MOV   r21, r17
RET

;====================================

.org 0x??00

KdTable:
.dw   0,0,32768,21845,16384,13107,10923,9362
.dw   8192,7282,6554,5958,5461,5041,4681,4369
.dw   4096,3855,3641,3449,3277,3121,2979,2849
.dw   2731,2621,2521,2427,2341,2260,2185,2114
.dw   2048,1986,1928,1872,1820,1771,1725,1680
.dw   1638,1598,1560,1524,1489,1456,1425,1394
.dw   1365,1337,1311,1285,1260,1237,1214,1192
.dw   1170,1150,1130,1111,1092,1074,1057,1040
.dw   1024,1008,993,978,964,950,936,923
.dw   910,898,886,874,862,851,840,830
.dw   819,809,799,790,780,771,762,753
.dw   745,736,728,720,712,705,697,690
.dw   683,676,669,662,655,649,643,636
.dw   630,624,618,612,607,601,596,590
.dw   585,580,575,570,565,560,555,551
.dw   546,542,537,533,529,524,520,516
.dw   512,508,504,500,496,493,489,485
.dw   482,478,475,471,468,465,462,458
.dw   455,452,449,446,443,440,437,434
.dw   431,428,426,423,420,417,415,412
.dw   410,407,405,402,400,397,395,392
.dw   390,388,386,383,381,379,377,374
.dw   372,370,368,366,364,362,360,358
.dw   356,354,352,350,349,347,345,343
.dw   341,340,338,336,334,333,331,329
.dw   328,326,324,323,321,320,318,317
.dw   315,314,312,311,309,308,306,305
.dw   303,302,301,299,298,297,295,294
.dw   293,291,290,289,287,286,285,284
.dw   282,281,280,279,278,277,275,274
.dw   273,272,271,270,269,267,266,265
.dw   264,263,262,261,260,259,258,257
/*
Note_1:
If N=0, the returned result [r21”r20] is 65535 [the highest possible number for it]. It could be replaced by setting an overflow flag for example.
Note_2:
In case reg_N [r22] won’t be 0 or 1, testing r22 at the beginning of the code could be removed with DOoverF: and DO_same: which deal with r22=0 and r22=1.
*/
```

The code above is the one which I use in my programs. Sorry, it is surely not the optimum one (for speed or words number) to implement this division algorithm. It simply suits the common template of my assembly codes.

Last Edited: Sun. Feb 20, 2022 - 10:23 AM

Very nice!!

```  SUBI  r20,  low(-1)         ;1d
SBCI  r21, high(-1)         ;1d, +13d=[52d]```

why not adiw r21:r20, 1 ??  (that may not be a savings, can't recall) , but is does shorten the listing, and typos

Testing for div by 1 is prob not too worthwhile, since it impacts all of the others (biases efficiency towards div by 1).

the final

` MOV   r13,  r22             ;1bcd`

Why not just forget the mov and use r22....since r22 is never used anywhere after this point (unless you are trying to preserve it for future call)

FUN!

Edit: must convert to higher registers for ADIW (r21:r20 not good)

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Last Edited: Sun. Feb 20, 2022 - 11:03 AM

avrcandies wrote:

Very nice!!

```  SUBI  r20,  low(-1)         ;1d
SBCI  r21, high(-1)         ;1d, +13d=[52d]```

why not adiw r21:r20, 1 ??  (that may not be a savings, can't recall) , but is does shorten the listing, and typos

Testing for div by 1 is prob not too worthwhile, since it impacts all of the others (biases efficiency towards div by 1).

FUN!

Hi,

You meant "why not adiw r25:r24, 1"... right?

You are right.

I forgot adding on OP, the following personal note:

The following code is the one which I use in my programs. Sorry, the code below (now above) is surely not the optimum one (for speed or words number) to implement this division algorithm. It simply suits the common template of my assembly codes.

avrcandies wrote:

the final

` MOV   r13,  r22             ;1bcd`

Why not just forget the mov and use r22....since r22 is never used anywhere after this point (unless you are trying to preserve it for future call)

Your suggestion is good. But, as you mentioned, I usually like to preserve it for subsequent functions.

Please feel free to add any further suggestions which could be useful for others.

Last Edited: Sun. Feb 20, 2022 - 10:36 AM

You meant "why not adiw r25:r24, 1"... right?

Since you are using r21:r20 right there (subtracting -1),..maybe you can do similar similar with r25:r24...i forgot the the lower pairs don't support it (the compiler will complain!)

Optimizing is fun; it can be addictive drug, making us zombies roaming the land of AVR. When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Last Edited: Sun. Feb 20, 2022 - 11:04 AM

avrcandies wrote:
Optimizing is fun; it can be addictive drug, making us zombies roaming the land of AVR. By the way, don't you know that the same algorithm could be applied to divide a 24-bit number by 16-bit one?

But, in this case, the reg_N Kd table would be 256 times longer [65536 words)!!!

Therefore, it could be useful/practical in case reg_N is constant [2 to 64435 65535] so that Kd could be calculated in advance by the compiler using .equ or .set.

Last Edited: Sun. Feb 20, 2022 - 01:34 PM