Fast AVR Code for Dividing 16-Bit Number by 8-Bit Number

Go To Last Post
6 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

On the previous thread “Versatile AVR Code for Dividing 16 Bits by 8-Bit Constant”, the trick that lets the code be rather fast is that the long division by reg_N is made already by the compiler. Therefore, if one has a free memory space for 256 words, a 256-word table could be added which includes the results of reg_Kd [(2*256*256/reg_N+1)/2] for all possible reg_N [2 to 255]; the first 2 entries of the table could be of any number.
 

;==============================================================
; Division of 16-bit register {A} by 8-bit {N}
;==============================================================

; {R}={A}/{N} = r17:r16 / {N} = r17:r16 * {Kd} /256 /256 then rounding

; {A} dividend in r17:r16 [0 to 65535]
; {N} constant divisor in r22 [N=0 to 255], see Note_1, Note_2 below
; {R} result in r21:r20 , {A}/{N} rounded
; {Kd} = round(256*256/{N},0), the division constant [from KtTable:]
; {Kr} = INT[ (1-{N})/2 ]    , the rounding constant
; Used registers: ZH,ZL,r19,r18,r14,r13,r1,r0
; 50 48 words, excluding RET and RCALL
; 53, 56, 57, or 58 cycles = 46c|49c|50c|51c code + 4c RET + 3c RCALL
; 55, 58, 59, or 60 cycles = 48c|51c|52c|53c code + 4c RET + 3c RCALL 

D16var8:
    TST   r22                   ;1abcd
    BREQ  DOoverF               ;1abcd

    CPI   r22, 1                ;1abcd
    BREQ  DO_same               ;1abcd

    LDI   ZH, high(KdTable)     ;1abcd, address in words, 0x??00
    MOV   ZL, r22               ;1abcd
    LSL   ZL                    ;1abcd
    ROL   ZH                    ;1abcd
    LPM   r13, Z+               ;3abcd
    LPM   r18, Z+               ;3abcd
    MOV   r13, r18              ;1abcd
    LPM   r14, Z                ;3abcd, 14abcd
    LPM   r18, Z+               ;3abcd
    MOV   r14, r18              ;1abcd, 16abcd
; r14:r13 = reg_Kd

; multiplicand in r17:r16 = {A}
; multiplier   in r14:r13 = {Kd}
; mul. result  in r21:r20:r19:r18
; valid result in r21:r20
    MUL   r17, r14              ;2abcd
    MOVW  r21:r20, r1:r0        ;1abcd
    MUL   r16, r13              ;2abcd
    MOVW  r19:r18, r1:r0        ;1abcd
    MUL   r17, r13              ;2abcd
    CLR   r13                   ;1abcd
    ADD   r19, r0               ;1abcd
    ADC   r20, r1               ;1abcd
    ADC   r21, r13              ;1abcd
    MUL   r14, r16              ;2abcd
    ADD   r19, r0               ;1abcd
    ADC   r20, r1               ;1abcd
    ADC   r21, r13              ;1abcd +17abcd= 32abcd
; {B} = r21:r20 = r17:r16 * r14:r13 /256/256 = {A}*{Kd}/256/256
; {R} = {B} or {B}+1

; for rounding
    MOV   r13, r22              ;1abcd
; r13 = {N}

    MUL   r20, r13              ;2abcd
    MOVW  r19:r18, r1:r0        ;1abcd
    MUL   r21, r13              ;2abcd
    ADD   r19, r0               ;1abcd, +7abcd= 39abcd
; {C} = r19:r18 = {B}*{N} = r21:r20 * r13

; the following conditions were deduced empirically
; if( Carry_1=0, {R}={B}, if( Zero_2=1 OR Carry_2=0, {R}={B}, {R}={B}+1 ) )

    SUB   r18, r16              ;1abcd
    SBC   r19, r17              ;1abcd
; {D1} = r19:r18 = {C} - {A} = r19:r18 - r17:r16

    BRCC  DIV_ret               ;2a|1bcd +4a=[43a]
; if Carry_1=0, {R}={B}

    MOV   r13,  r22             ;1bcd
    DEC   r13                   ;1bcd
    LSR   r13                   ;1bcd
; r13 = - reg_Kr

    ADD   r18,  r13             ;1bcd
    CLR   r13                   ;1bcd
    ADC   r19,  r13             ;1bcd
; {D2} = r19:r18 = {D1} + {-Kr} = r19:r18 + {-Kr}

    BREQ  DIV_ret               ;2b|1cd, +11b=[50b]
; if Zero_2=1, {R}={B}

    BRCC  DIV_ret               ;2c|1d, +12c=[51c]
; if Carry_2=0, {R}={B}

    SUBI  r20,  low(-1)         ;1d
    SBCI  r21, high(-1)         ;1d, +13d=[52d]
; {R}={B}+1

DIV_ret:
    RET                         ;4
; {R} = r21:r20 = {B} or {B}+1 [ {A}/{N} rounded ]

DOoverF:
    SER   r20
    SER   r21
    RET
; {R} = r21:r20 = 0xFFFF

DO_same:
    MOV   r20, r16
    MOV   r21, r17
    RET

;====================================

    .org 0x??00

KdTable:
    .dw   0,0,32768,21845,16384,13107,10923,9362
    .dw   8192,7282,6554,5958,5461,5041,4681,4369
    .dw   4096,3855,3641,3449,3277,3121,2979,2849
    .dw   2731,2621,2521,2427,2341,2260,2185,2114
    .dw   2048,1986,1928,1872,1820,1771,1725,1680
    .dw   1638,1598,1560,1524,1489,1456,1425,1394
    .dw   1365,1337,1311,1285,1260,1237,1214,1192
    .dw   1170,1150,1130,1111,1092,1074,1057,1040
    .dw   1024,1008,993,978,964,950,936,923
    .dw   910,898,886,874,862,851,840,830
    .dw   819,809,799,790,780,771,762,753
    .dw   745,736,728,720,712,705,697,690
    .dw   683,676,669,662,655,649,643,636
    .dw   630,624,618,612,607,601,596,590
    .dw   585,580,575,570,565,560,555,551
    .dw   546,542,537,533,529,524,520,516
    .dw   512,508,504,500,496,493,489,485
    .dw   482,478,475,471,468,465,462,458
    .dw   455,452,449,446,443,440,437,434
    .dw   431,428,426,423,420,417,415,412
    .dw   410,407,405,402,400,397,395,392
    .dw   390,388,386,383,381,379,377,374
    .dw   372,370,368,366,364,362,360,358
    .dw   356,354,352,350,349,347,345,343
    .dw   341,340,338,336,334,333,331,329
    .dw   328,326,324,323,321,320,318,317
    .dw   315,314,312,311,309,308,306,305
    .dw   303,302,301,299,298,297,295,294
    .dw   293,291,290,289,287,286,285,284
    .dw   282,281,280,279,278,277,275,274
    .dw   273,272,271,270,269,267,266,265
    .dw   264,263,262,261,260,259,258,257
/*
Note_1:
If N=0, the returned result [r21”r20] is 65535 [the highest possible number for it]. It could be replaced by setting an overflow flag for example.
Note_2:
In case reg_N [r22] won’t be 0 or 1, testing r22 at the beginning of the code could be removed with DOoverF: and DO_same: which deal with r22=0 and r22=1.
*/

 

ADDED:

The code above is the one which I use in my programs. Sorry, it is surely not the optimum one (for speed or words number) to implement this division algorithm. It simply suits the common template of my assembly codes.

Last Edited: Sun. Feb 20, 2022 - 10:23 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Very nice!!

 

  SUBI  r20,  low(-1)         ;1d
    SBCI  r21, high(-1)         ;1d, +13d=[52d]

why not adiw r21:r20, 1 ??  (that may not be a savings, can't recall) , but is does shorten the listing, and typos

 

Testing for div by 1 is prob not too worthwhile, since it impacts all of the others (biases efficiency towards div by 1).

 

the final 

 MOV   r13,  r22             ;1bcd

Why not just forget the mov and use r22....since r22 is never used anywhere after this point (unless you are trying to preserve it for future call) 

 

FUN!

Edit: must convert to higher registers for ADIW (r21:r20 not good)

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Last Edited: Sun. Feb 20, 2022 - 11:03 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

avrcandies wrote:

Very nice!!

 

  SUBI  r20,  low(-1)         ;1d
    SBCI  r21, high(-1)         ;1d, +13d=[52d]

why not adiw r21:r20, 1 ??  (that may not be a savings, can't recall) , but is does shorten the listing, and typos

 

Testing for div by 1 is prob not too worthwhile, since it impacts all of the others (biases efficiency towards div by 1).

 

FUN!

 

Hi,

You meant "why not adiw r25:r24, 1"... right?

You are right.

 

I forgot adding on OP, the following personal note:

The following code is the one which I use in my programs. Sorry, the code below (now above) is surely not the optimum one (for speed or words number) to implement this division algorithm. It simply suits the common template of my assembly codes.

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

avrcandies wrote:

the final 

 MOV   r13,  r22             ;1bcd

Why not just forget the mov and use r22....since r22 is never used anywhere after this point (unless you are trying to preserve it for future call) 

 

Your suggestion is good. But, as you mentioned, I usually like to preserve it for subsequent functions.

 

Added:

Please feel free to add any further suggestions which could be useful for others.

 

Last Edited: Sun. Feb 20, 2022 - 10:36 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

You meant "why not adiw r25:r24, 1"... right?

Since you are using r21:r20 right there (subtracting -1),..maybe you can do similar similar with r25:r24...i forgot the the lower pairs don't support it (the compiler will complain!)

Optimizing is fun; it can be addictive drug, making us zombies roaming the land of AVR. cool

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Last Edited: Sun. Feb 20, 2022 - 11:04 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

avrcandies wrote:
Optimizing is fun; it can be addictive drug, making us zombies roaming the land of AVR. cool

 

By the way, don't you know that the same algorithm could be applied to divide a 24-bit number by 16-bit one?

But, in this case, the reg_N Kd table would be 256 times longer [65536 words)!!!

Therefore, it could be useful/practical in case reg_N is constant [2 to 64435 65535] so that Kd could be calculated in advance by the compiler using .equ or .set.

 

Last Edited: Sun. Feb 20, 2022 - 01:34 PM