## Versatile AVR Code for Dividing 16 Bits by 8-Bit Constant

53 posts / 0 new
Author
Message

Lately, I needed dividing two 8-bit registers by various 8-bit constants. I ended up having one common code for all constant divisors [1 to 255] [2 to 255].

I liked sharing it here since I couldn’t find a similar/equivalent one by searching on the internet. But it is possible that I missed it and someone else did it already.

Its speed is not bad [39/34 cycles, max/min ; excluding RET and RCALL].

It uses 33 words.

To use this code (posted below), only the constant divisor ‘reg_N’ needs to be changed.

By the way, I checked the rounding of this division code for every constant divisor [1 to 255] [2 to 255] and for all dividend values [0 to 65535].

Kerim

Added: An updated version on post #5 , it suits better the registers of 'my' assembly programs (not recommended).

Edited: Another version on post #37 , so far, it is the optimum version of the code, in term of speed (35 cycles) and memory space (29 words).

By avrcandies

Edited: Another version on post #45 , another good version of the code.

By grohote

```
;==============================================================
; *R* Division of 16-bit register {A} by 8-bit constant {N} ***
;==============================================================

; {R}={A}/{N} = r17:r16 / {N} = r17:r16 * {Kd} /256 /256 then rounding

; {A} dividend in r17:r16 [0 to 65535]
; {N} constant divisor [1 to 255]
; {R} result in r21:r20 , {A}/{N} rounded
; {Kd} = round(256*256/{N},0), the division constant
; {Kr} = INT[ (1-{N})/2 ]    , the rounding constant
; used registers: r19,r18,r14,r13,r1,r0
; 33 words , excluding RET and RCALL
; 41, 44, 45, or 46 cycles = 34c|37c|38c|39c code + 4c RET + 3c RCALL

.set reg_N = 113                  ; {N}
.set reg_Kd=(2*256*256/reg_N+1)/2 ; {Kd}
.set reg_Kr=(1-reg_N)/2           ; {Kr}

D16_nnn:
LDI   r18,  low(reg_Kd)     ;1abcd
MOV   r13, r18              ;1abcd
LDI   r18, high(reg_Kd)     ;1abcd
MOV   r14, r18              ;1abcd, 4abcd
; r14:r13 = reg_Kd

; multiplicand in r17:r16 = {A}
; multiplier   in r14:r13 = {Kd}
; mul. result  in r21:r20:r19:r18
; valid result in r21:r20
MUL   r17, r14              ;2abcd
MOVW  r21:r20, r1:r0        ;1abcd
MUL   r16, r13              ;2abcd
MOVW  r19:r18, r1:r0        ;1abcd
MUL   r17, r13              ;2abcd
CLR   r13                   ;1abcd
MUL   r14, r16              ;2abcd
ADC   r21, r13              ;1abcd +17abcd= 21abcd
; {B} = r21:r20 = r17:r16 * r14:r13 /256/256 = {A}*{Kd}/256/256
; {R} = {B} or {B}+1

; for rounding
LDI   r18, reg_N            ;1abcd
MOV   r13, r18              ;1abcd
; r13 = {N}

MUL   r20, r13              ;2abcd
MOV   r18, r0               ;1abcd
MOV   r19, r1               ;1abcd
MUL   r21, r13              ;2abcd
ADD   r19, r0               ;1abcd, +9abcd= 30abcd
; {C} = r19:r18 = {B}*{N} = r21:r20 * r13

; the following conditions were deduced empirically
; if( Carry_1=0, {R}={B}, if( Zero_2=1 OR Carry_2=0, {R}={B}, {R}={B}+1 ) )

SUB   r18, r16              ;1abcd
SBC   r19, r17              ;1abcd
; {D1} = r19:r18 = {C} - {A} = r19:r18 - r17:r16

BRCC  DIV_ret               ;2a|1bcd +4a=[34a]
; if Carry_1=0, {R}={B}

SUBI  r18,  low(reg_Kr)     ;1bcd
SBCI  r19, high(reg_Kr)     ;1bcd
; {D2} = r19:r18 = {D1} - {Kr} = r19:r18 - {Kr}

BREQ  DIV_ret               ;2b|1cd, +7b=[37b]
; if Zero_2=1, {R}={B}

BRCC  DIV_ret               ;2c|1d, +8c=[38c]
; if Carry_2=0, {R}={B}

SUBI  r20,  low(-1)         ;1d
SBCI  r21, high(-1)         ;1d, +9d=[39d]
; {R}={B}+1

DIV_ret:
RET                         ;4
; {R} = r21:r20 = {B} or {B}+1 [ {A}/{N} rounded ]
```

Last Edited: Tue. Jan 18, 2022 - 10:15 AM

Sweet!

Keys to wealth:

Invest for cash flow, not capital gains!

Wealth is attracted, not chased!

Income is proportional to how many you serve!

If you want something you've never had...

...you must be willing to do something you've never done!

Lets go Brandon!

Neat!!  I'm giving it a try.

noticed:

```; for rounding
LDI   r18, reg_N            ;1abcd
MOV   r13, r18              ;1abcd```

I think can simply be one line:  ldi r16, reg_N     ...r16 allows the immediate

then below this line: edit everywhere r13 with r16  and r16 with r13

------------

``` MOV   r18, r0               ;1abcd
MOV   r19, r1               ;1abcd```

of course is just movw r19:r18, r1:r0     ...(you also use this exact same line somewhere else above)

-------------

if you use r24:r23 everywhere, instead of  r21:r20

```    SUBI  r20,  low(-1)         ;1d
SBCI  r21, high(-1)         ;1d, +9d=[39d]```

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Last Edited: Wed. Jan 12, 2022 - 12:34 AM

avrcandies wrote:

Neat!!  I'm giving it a try.

noticed:

```; for rounding
LDI   r18, reg_N            ;1abcd
MOV   r13, r18              ;1abcd```

I think can simply be one line:  ldi r16, reg_N     ...r16 allows the immediate

then below this line: edit everywhere r13 with r16  and r16 with r13

Please note that r17:r16 hold the 16-bit dividend at the entry of the code.

------------

avrcandies wrote:

``` MOV   r18, r0               ;1abcd
MOV   r19, r1               ;1abcd```

of course is just movw r19:r18, r1:r0     ...(you also use this exact same line somewhere else above)

Thank you. I missed doing it in my last revision. And this saves 1 cycle and 1 word.

-------------

avrcandies wrote:

if you use r24:r23 everywhere, instead of  r21:r20

```    SUBI  r20,  low(-1)         ;1d
SBCI  r21, high(-1)         ;1d, +9d=[39d]```

I guess you meant: adiw r25:r24, 1

You are right and this also saves 1 word.

In my actual study (of this code), I tried not using these two registers for a personal reason.

Before I noticed that ADIW works also for r25:r24 besides X,Y and Z, I used, almost always, the bits of r25 and r24 (if not of r23 too) as flags in my assembly programs (they were in hundreds). So I seldom use adiw r25:r24, K when there is no other option while writing in a new program.

Thank you again for your interesting remarks.

Following the remarks of ‘avrcandies’, the updated code is:

```;==============================================================
; *R* Division of 16-bit register {A} by 8-bit constant {N} ***
;==============================================================

; {R}={A}/{N} = r17:r16 / {N} = r17:r16 * {Kd} /256 /256 then rounding

; {A} dividend in r17:r16 [0 to 65535]
; {N} constant divisor [1 to 255]
; {R} result in r25:r24 , {A}/{N} rounded
; {Kd} = round(256*256/{N},0), the division constant
; {Kr} = INT[ (1-{N})/2 ]    , the rounding constant
; used registers: r19,r18,r14,r13,r1,r0
; 31 words , excluding RET and RCALL
; 40, 43, 44, or 45 cycles = 33c|36c|37c|38c code + 4c RET + 3c RCALL

.set reg_N = 113                  ; {N}
.set reg_Kd=(2*256*256/reg_N+1)/2 ; {Kd}
.set reg_Kr=(1-reg_N)/2           ; {Kr}

D16_nnn:
LDI   r18,  low(reg_Kd)     ;1abcd
MOV   r13, r18              ;1abcd
LDI   r18, high(reg_Kd)     ;1abcd
MOV   r14, r18              ;1abcd, 4abcd
; r14:r13 = reg_Kd

; multiplicand in r17:r16 = {A}
; multiplier   in r14:r13 = {Kd}
; mul. result  in r25:r24:r19:r18
; valid result in r25:r24
MUL   r17, r14              ;2abcd
MOVW  r25:r24, r1:r0        ;1abcd
MUL   r16, r13              ;2abcd
MOVW  r19:r18, r1:r0        ;1abcd
MUL   r17, r13              ;2abcd
CLR   r13                   ;1abcd
MUL   r14, r16              ;2abcd
ADC   r25, r13              ;1abcd +17abcd= 21abcd
; {B} = r25:r24 = r17:r16 * r14:r13 /256/256 = {A}*{Kd}/256/256
; {R} = {B} or {B}+1

; for rounding
LDI   r18, 9                ;1abcd
MOV   r13, r18              ;1abcd
; r13 = {N}

MUL   r24, r13              ;2abcd
MOVW  r19:r18, r1:r0        ;1abcd
MUL   r25, r13              ;2abcd
ADD   r19, r0               ;1abcd, +8abcd= 29abcd
; {C} = r19:r18 = {B}*{N} = r25:r24 * r13

; the following conditions were deduced empirically
; if( Carry_1=0, {R}={B}, if( Zero_2=1 OR Carry_2=0, {R}={B}, {R}={B}+1 ) )

SUB   r18, r16              ;1abcd
SBC   r19, r17              ;1abcd
; {D1} = r19:r18 = {C} - {A} = r19:r18 - r17:r16

BRCC  DIV_ret               ;2a|1bcd +4a=[33a]
; if Carry_1=0, {R}={B}

SUBI  r18,  low(reg_Kr)     ;1bcd
SBCI  r19, high(reg_Kr)     ;1bcd
; {D2} = r19:r18 = {D1} - {Kr} = r19:r18 - {Kr}

BREQ  DIV_ret               ;2b|1cd, +7b=[36b]
; if Zero_2=1, {R}={B}

BRCC  DIV_ret               ;2c|1d, +8c=[37c]
; if Carry_2=0, {R}={B}

; {R}={B}+1

DIV_ret:
RET                         ;4
; {R} = r25:r24 = {B} or {B}+1 [ {A}/{N} rounded ]
```

Edited:

```     LDI r18, 9 ;1abcd
above should be
LDI r18, reg_N ;1abcd```

Last Edited: Wed. Jan 12, 2022 - 10:00 PM

I started to read the code, and found reg_Kr=(1-reg_N)/2

which seems to be 16b negative number, you are fetching low and high of them.

It is l'bit puzzling to me.

grohote wrote:

I started to read the code, and found reg_Kr=(1-reg_N)/2

which seems to be 16b negative number, you are fetching low and high of them.

It is l'bit puzzling to me.

The maximum value of reg_N is 255. Therefore the minimum value of reg_Kr is -127 which is 15b negative number (0xFF81) for SUBI and SBCI. The same applies for other values of reg_N.

Last Edited: Wed. Jan 12, 2022 - 01:09 PM

Why you have dual KD description, please:

; {Kd} = round(256*256/{N},0), the division constant
.set reg_Kd=(2*256*256/reg_N+1)/2 ; {Kd}

My last question is: did you checked the case N=1 (for Min and Max of input number)?

Please note that r17:r16 hold the 16-bit dividend at the entry of the code.

Sorry, I should say swap r13 & r16 everywhere:

R16 is never used as part of a word command

R16 is not used for any immediate operations

So it can be swapped with any other register

R13 is never used as part of a word command

R13 would like to use immediate mode, but it can't

So  these two can be swapped

``` for rounding
LDI   r16, 9                ;1abcd```

So another line is saved!

That may work always, but in any cases where the caller uses a movw  for loading r17:r16, then it would need

split into 2 operations  (r17, r13) , taking back the gain in those situations.

of course r17:r13  vs r17:r16 doesn't is not as "beautiful"

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

grohote wrote:

Why you have dual KD description, please:

; {Kd} = round(256*256/{N},0), the division constant
.set reg_Kd=(2*256*256/reg_N+1)/2 ; {Kd}

My last question is: did you checked the case N=1 (for Min and Max of input number)?

Now I got your question and you are right, thank you.

Indeed, reg_Kd=0 (not 65536) if reg_N=1.

I didn't notice this case because most of my tests were made on an Excel sheet which gave me 65536 instead of 0. It is was my mistake for not adding something like MOD(xxx,65536) so that the test fails for reg_N=1. Fortunately, no one is interested in using the code for reg_N=1.

About the two Kd descriptions, the first one is just descriptive while the second one is how to be applied for the code.

Thank you again for pointing out my mistake. I had to write on my OP that reg_N is from 2 to 255 (not from 1 to 255).

Actually, after I tried the r13 r16 swap, I noticed you can ALSO  fix up the first ldi (it becomes ldi r16, low(reg_kd)  ANOTHER line saved

```D16_nnn:
LDI   r18,  low(reg_Kd)     ;1abcd
MOV   r13, r18              ;1abcd
LDI   r18, high(reg_Kd)     ;1abcd
MOV   r14, r18              ;1abcd, 4abcd```

then I thought swap r14 & r17 too...ANOTHER line saved:

I noticed the end result due to swap ended with A in r14:r13   ...oddball

so I swapped  r14:r13  for  r15:r14   (edit: replace r14 with r15, then edit: replace r13 with r14)

here is what I see after edits (******  let me know if you want me to remove this from the post ****, I don't want to make confusion)

```        ; {R}={A}/{N} = r15:r14 / {N} = r15:r14 * {Kd} /256 /256 then rounding

; {A} dividend in r15:r14 [0 to 65535]
; {N} constant divisor [1 to 255]
; {R} result in r25:r24 , {A}/{N} rounded
; {Kd} = round(256*256/{N},0), the division constant
; {Kr} = INT[ (1-{N})/2 ]    , the rounding constant
; used registers: r19,r18,r17,r16,r1,r0
; 31 words , excluding RET and RCALL
; 40, 43, 44, or 45 cycles = 33c|36c|37c|38c code + 4c RET + 3c RCALL

.set reg_N = 113                  ; {N}
.set reg_Kd=(2*256*256/reg_N+1)/2 ; {Kd}
.set reg_Kr=(1-reg_N)/2           ; {Kr}

D16_nnn:
LDI   r16, low(reg_Kd)     ;1abcd
LDI   r17, high(reg_Kd)     ;1abcd
; r17:r16 = reg_Kd

; multiplicand in r15:r14 = {A}
; multiplier   in r17:r16 = {Kd}
; mul. result  in r25:r24:r19:r18
; valid result in r25:r24
MUL   r15, r17              ;2abcd
MOVW  r25:r24, r1:r0        ;1abcd
MUL   r14, r16              ;2abcd
MOVW  r19:r18, r1:r0        ;1abcd
MUL   r15, r16              ;2abcd
CLR   r16                   ;1abcd
MUL   r17, r14              ;2abcd
ADC   r25, r16              ;1abcd +17abcd= 21abcd
; {B} = r25:r24 = r15:r14 * r17:r16 /256/256 = {A}*{Kd}/256/256
; {R} = {B} or {B}+1

; for rounding
LDI   r16, reg_N             ;1abcd NOTE: UPDATED PER POSTER'S CORRECTION
; r16 = {N}

MUL   r24, r16              ;2abcd
MOVW  r19:r18, r1:r0        ;1abcd
MUL   r25, r16              ;2abcd
ADD   r19, r0               ;1abcd, +8abcd= 29abcd
; {C} = r19:r18 = {B}*{N} = r25:r24 * r16

; the following conditions were deduced empirically
; if( Carry_1=0, {R}={B}, if( Zero_2=1 OR Carry_2=0, {R}={B}, {R}={B}+1 ) )

SUB   r18, r14              ;1abcd
SBC   r19, r15              ;1abcd
; {D1} = r19:r18 = {C} - {A} = r19:r18 - r15:r14

BRCC  DIV_ret               ;2a|1bcd +4a=[33a]
; if Carry_1=0, {R}={B}

SUBI  r18,  low(reg_Kr)     ;1bcd
SBCI  r19, high(reg_Kr)     ;1bcd
; {D2} = r19:r18 = {D1} - {Kr} = r19:r18 - {Kr}

BREQ  DIV_ret               ;2b|1cd, +7b=[36b]
; if Zero_2=1, {R}={B}

BRCC  DIV_ret               ;2c|1d, +8c=[37c]
; if Carry_2=0, {R}={B}

; {R}={B}+1

DIV_ret:
RET                         ;4
; {R} = r25:r24 = {B} or {B}+1 [ {A}/{N} rounded ]
```

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Last Edited: Thu. Jan 13, 2022 - 05:26 AM

from the codes, note r18 is not used anywhere here & r18 is overwritten just below this code block

so that line could be:  mov r19, r1, though at the moment it offers no gain, might allow more register playing around.

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

True, I think that r19 r20 r21 are in a group, so not an error.

avrcandies wrote:
here is what I see after edits (******  let me know if you want me to remove this from the post ****, I don't want to make confusion)

It is fine to me in the least. I even thank you for your work.

For instance, on my (so in yours) updated code, there is a mistyping:

```; for rounding
LDI   r18, 9                ;1abcd

should be

; for rounding
LDI   r18, reg_N            ;1abcd
```

Last Edited: Wed. Jan 12, 2022 - 10:28 PM

A minor improvement could be made by removing the extra label  ‘DIV_ret’ and using 'PC+x' instead of it in the conditional branch instructions:

```; the following conditions were deduced empirically
; if( Carry_1=0, {R}={B}, if( Zero_2=1 OR Carry_2=0, {R}={B}, {R}={B}+1 ) )

SUB   r18, r16              ;1abcd
SBC   r19, r17              ;1abcd
; {D1} = r19:r18 = {C} - {A} = r19:r18 - r17:r16

BRCC  PC+6                  ;2a|1bcd +4a=[33a]<<<<<<<<<<<<<<<<<<
; if Carry_1=0, {R}={B}

SUBI  r18,  low(reg_Kr)     ;1bcd
SBCI  r19, high(reg_Kr)     ;1bcd
; {D2} = r19:r18 = {D1} - {Kr} = r19:r18 - {Kr}

BREQ  PC+3                  ;2b|1cd, +7b=[36b] <<<<<<<<<<<<<<<<<<
; if Zero_2=1, {R}={B}

BRCC  PC+2                  ;2c|1d, +8c=[37c] <<<<<<<<<<<<<<<<<<
; if Carry_2=0, {R}={B}

SUBI  r20,  low(-1)         ;1d
SBCI  r21, high(-1)         ;1d, +9d=[38d]
; {R}={B}+1

RET                         ;4
; {R} = r21:r20 = {B} or {B}+1 [ {A}/{N} rounded ]
```

This helps avoiding the duplication of this label (DIV_ret) in case more than one code (more than one division by reg_N) is needed in a program.

This is a nice code you have created!

I edited #11 to reflect using reg_N (instead of using 9)

With all the edits--maybe time to rerun through your automated numeric checker?

You will be interested in reading here some interesting articles about division and modulus:

http://homepage.cs.uiowa.edu/~dw...

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

`{A} dividend in r15:r14 [0 to 65535]`

It can be a problem /I have not investigated further/.

Each decent divide-program does have a checker to avoid involving of 0 in division.

Each decent divide-program does have a checker to avoid involving of 0 in division.

Prob not good here since the divisor is compiled (you are likely the checker).  but the comments should match his statement

I ended up having one common code for all constant divisors [1 to 255].

You could toss in some macro stuff to throw a build exception if zero is attempted for the build.

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

You could toss in some macro stuff to throw a build exception if zero is attempted for the build

It is dividend I like to discuss, I did not checked further. Are you sure r15:r14 can be 0 on entry, please.

It is dividend I like to discuss, I did not checked further. Are you sure r15:r14 can be 0 on entry, please.

Oh...that's odd request... Well, there is nothing illegal about dividing 0 by some nonzero, usually you want to prevent division BY 0 & that is what is usually discussed.

We'd have to look whether this program has any hiccups when the numerator is zero, but it should calculate fine like any other value. May be a corner case?

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Not odd indeed.

Zero divided by some number is zero.

Any number divided by zero is infinite.

Zero divided by zero is any number.

In an AVR math, division needs to be protected from such results.

Last Edited: Thu. Jan 13, 2022 - 07:17 AM

avrcandies wrote:

This is a nice code you have created!

I edited #11 to reflect using reg_N (instead of using 9)

With all the edits--maybe time to rerun through your automated numeric checker?

You will be interested in reading here some interesting articles about division and modulus:

http://homepage.cs.uiowa.edu/~dw...

To deduce the 3 conditional branches, I used an Excel sheet.

If you like I will clean it, as possible, and attach it here. But since its actual file size is rather big (about 13MB) I will also remove almost all duplicated rows, say above the one of reg_N=10. Then, on your side, you will copy the last row (reg_N=10, here) and pasted it down to reg_N=65535. (Hint: for moving down fast on a sheet, I usually chose the 25% scale view on the menu, at the right bottom of the sheet, so that a page includes about 4 times the rows number of the normal view of 100%).

I also wrote a checker program (written in assembly as it is the case of all my programs) and did the check by using the simulator (assuming the device is ATmega32A).

In my products, I use ATmega8L(or 8A) because it is available locally and costs the less. But for the checker program, the table's size of the calculated rounded results for a specific reg_N is 8192 bytes. So I wrote it for ATmega32A instead (I used hearing of it since it is also available locally and I use it when many pins are needed). If you like I will also clean it and attach it here.

Last Edited: Thu. Jan 13, 2022 - 10:06 AM

You can take a look on  https://avr-asm.tripod.com/div16xx.html for N<24.

grohote wrote:
Each decent divide-program does have a checker to avoid involving of 0 in division.

avrcandies wrote:
We'd have to look whether this program has any hiccups when the numerator is zero, but it should calculate fine like any other value. May be a corner case?

Although I made a mistake, at the beginning, about the valid range of reg_N which is 2 to 255  and not 1 to 255, all numbers of the dividend (0 to 65535) were checked (including zero).

By the way, although this code works for a divisor from 2 to 255, I use it only when, for a specific reg_N, a better alternative is not available (to me).

For example, I surely don't use it for reg_N=2^x (x integer, 1 to 7).

grohote wrote:
You can take a look on  https://avr-asm.tripod.com/div16xx.html for N<24.

Not odd indeed.

Zero divided by some number is zero.

Any number divided by zero is infinite.

Zero divided by zero is any number.

Division by zero (the last 2 cases ) is undefined and should be prevented (or result in an error).  So that is the normal concern.

The fixed denominator is compiled by the user---so they will not divide by zero (or set the build macro to flag such a request & halt the compile).

The top case is handled fine by the program (you will get zero as the result),as expected....so that is not usually a concern at all.

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

KerimF wrote:
‘DIV_ret’ and using 'PC+x' instead of it in the conditional branch instructions
How on earth his that an improvement ?? PC relative addressing is the work of Beelzebub !

How on earth his that an improvement ?? PC relative addressing is the work of Beelzebub !

It's room for lots of whoops!   Maybe the upside is that it gets rid of all the labels, so one can cut'n'paste multiple copies (with different divisors) without generating duplicate label errors.

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

avrcandies wrote:
so one can cut'n'paste multiple copies (with different divisors) without generating duplicate label errors.
Doesn't the assembler support "local labels" then (perhaps inside macros?)

@avrcandies, thanks for great ASM work in this thread.

One question: I do not know the difference between .set and .equ- is it any, please.

clawson wrote:

KerimF wrote:

‘DIV_ret’ and using 'PC+x' instead of it in the conditional branch instructions

How on earth his that an improvement ?? PC relative addressing is the work of Beelzebub !

Perhaps it is an improvement, just for me, in the way this common code could be used after replacing DIV_ret with PC+x.

For example, if my program needs 4 different division subroutines, say divisions by reg_N=10, 60, 90 and 100, I would add:

```.set reg_N = 10
.set reg_Kd=(2*256*256/reg_N+1)/2
.set reg_Kr=(1-reg_N)/2

D16_010:
[The code]

.set reg_N = 60
.set reg_Kd=(2*256*256/reg_N+1)/2
.set reg_Kr=(1-reg_N)/2

D16_060:
[The code]

.set reg_N = 90
.set reg_Kd=(2*256*256/reg_N+1)/2
.set reg_Kr=(1-reg_N)/2

D16_090:
[The code]

.set reg_N = 100
.set reg_Kd=(2*256*256/reg_N+1)/2
.set reg_Kr=(1-reg_N)/2

D16_100:
[The code]

; I have here 4 different subroutines by changing the main label only of each to reflect its reg_N value.
; Without using PC+x, I would need to also change the name of 'DIV_ret' in each of them (but one) to avoid duplication.
; I mean; your remark is very true, if the program needs just one instance of the code with 'DIV_ret'.
```

grohote wrote:
One question: I do not know the difference between .set and .equ- is it any, please.

clawson wrote:

Doesn't the assembler support "local labels" then (perhaps inside macros?)

You may be right here.

I used writing my assembly codes by using the minimum possible of assembler directives.

Did you rerun the checking on the new code---in case there are some troubles created by all the changes?

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

avrcandies wrote:

Did you rerun the checking on the new code---in case there are some troubles created by all the changes?

No, I didn't.

I will do it tomorrow by using the simulator; at least for one value of reg_N.

I believe if it works for one value it will work for all others.

avrcandies wrote:
Did you rerun the checking on the new code---in case there are some troubles created by all the changes?

I re-wrote my checker program to run the code properly with the altered registers of your version.

The following is a copy of the tested code that worked:

```;==============================================================
; *R* Division of 16-bit register {A} by 8-bit constant {N} ***
;==============================================================

; {R}={A}/{N} = r15:r14 / {N} = r15:r14 * {Kd} /256 /256 then rounding

; {A} dividend in r15:r14 [0 to 65535]
; {N} constant divisor [2 to 255]
; {R} result in r25:r24 , {A}/{N} rounded
; {Kd} = round(256*256/{N},0), the division constant
; {Kr} = INT[ (1-{N})/2 ]    , the rounding constant
; used registers: r19,r18,r17,r16,r1,r0
; 29 words , excluding RET and RCALL
; 37, 40, 41, or 42 cycles = 30c|33c|34c|35c code + 4c RET + 3c RCALL

.set reg_N = 100                  ; {N}
.set reg_Kd=(2*256*256/reg_N+1)/2 ; {Kd}
.set reg_Kr=(1-reg_N)/2           ; {Kr}

D16_nnn:
LDI   r16 , low(reg_Kd)     ;1abcd
LDI   r17, high(reg_Kd)     ;1abcd, 2abcd
; r17:r16 = reg_Kd

; multiplicand in r15:r14 = {A}
; multiplier   in r17:r16 = {Kd}
; mul. result  in r25:r24:r19:r18
; valid result in r25:r24
MUL   r15, r17              ;2abcd
MOVW  r25:r24, r1:r0        ;1abcd
MUL   r14, r16              ;2abcd
MOVW  r19:r18, r1:r0        ;1abcd
MUL   r15, r16              ;2abcd
CLR   r16                   ;1abcd
MUL   r17, r14              ;2abcd
ADC   r25, r16              ;1abcd +17abcd= 19abcd
; {B} = r25:r24 = r15:r14 * r17:r16 /256/256 = {A}*{Kd}/256/256
; {R} = {B} or {B}+1

; for rounding
LDI   r16, reg_N            ;1abcd
; r16 = {N}

MUL   r24, r16              ;2abcd
MOVW  r19:r18, r1:r0        ;1abcd
MUL   r25, r16              ;2abcd
ADD   r19, r0               ;1abcd, +7abcd= 26abcd
; {C} = r19:r18 = {B}*{N} = r25:r24 * r16

; the following conditions were deduced empirically
; if( Carry_1=0, {R}={B}, if( Zero_2=1 OR Carry_2=0, {R}={B}, {R}={B}+1 ) )

SUB   r18, r14              ;1abcd
SBC   r19, r15              ;1abcd
; {D1} = r19:r18 = {C} - {A} = r19:r18 - r15:r14

BRCC  PC+6                  ;2a|1bcd +4a=[30a]
; if Carry_1=0, {R}={B}

SUBI  r18,  low(reg_Kr)     ;1bcd
SBCI  r19, high(reg_Kr)     ;1bcd
; {D2} = r19:r18 = {D1} - {Kr} = r19:r18 - {Kr}

BREQ  PC+3                  ;2b|1cd, +7b=[33b]
; if Zero_2=1, {R}={B}

BRCC  PC+2                  ;2c|1d, +8c=[34c]
; if Carry_2=0, {R}={B}

; {R}={B}+1

RET                         ;4
; {R} = r25:r24 = {B} or {B}+1 [ {A}/{N} rounded ]
```

Last Edited: Fri. Jan 14, 2022 - 09:46 AM

BRCC  PC+6                  ;2a|1bcd +4a=[30a]
BREQ  PC+3                  ;2b|1cd, +7b=[33b]
BRCC  PC+2                  ;2c|1d, +8c=[34c]

You have no reason to keep this PC stuff instead of labels, as #28 said- it is very bad.

It is not a serious code, the code quality is discredited.

grohote wrote:

BRCC  PC+6                  ;2a|1bcd +4a=[30a]
BREQ  PC+3                  ;2b|1cd, +7b=[33b]
BRCC  PC+2                  ;2c|1d, +8c=[34c]

You have no reason to keep this PC stuff instead of labels, as #28 said- it is very bad.

It is not a serious code, the code quality is discredited.

First, do you mean you also disliked the reason for which I keep it, as it is explained on posts #29 and #32?

Second, whoever discovers that the presented code is not serious. Doesn't this mean that he also knows how to make it serious?

Third, if someone has worries about the newbies, I hope he can be sure that they are not less intelligent than he or me. They read all posts here, besides many related ones elsewhere, before they decide what is good, or not, for them (not just about programming).

Finally, if using 'PC+x' makes a code discredited, should we see, from now on, all those who made their assembler recognize this expression (PC+x) are discredited too since, as I heard from you, no one should use it in the first place?

If I couldn't get you well, I apologize in advance.

(Anyway, the way of thinking in writing MCU programs is not always the same while writing in assembly and C.)

Last Edited: Fri. Jan 14, 2022 - 09:50 PM

KerimF wrote:
writing in assembly

You touched the core of the problem- the code with PC stuff is not an assembly.

O, yes, such code is produced in assembly process either from C- or real ASM, but you can not call it Assembly, for it is not.

We discussed it last summer on this Forum, use of PC does not belongs to ASM language - although some people does tolerate it.

Please, do believe our main expert, he warned you well.

grohote wrote:
You touched the core of the problem- the code with PC stuff is not an assembly.

Could you please be clearer on this point? It is what then?!

grohote wrote:
O, yes, such code is produced in assembly process either from C- or real ASM, but you can not call it Assembly, for it is not.

In my knowledge, one usually writes a program (firmware) for an MCU by using a C compiler or an assembler.

For instance, I wrote my first program for Z80, about 40 years ago, in machine language (not C or ASM) because I didn't have a computer (PC or microcomputer). I wrote it for a moving text panel which I designed and built with 224 220Vac bulbs (32*7, each driven by a triac)... I didn't have enough money to make it longer . It took me about two months to let my code work (now, it would take me just a couple of hours) in order to fix one bug after another discovered on running it on the controller board. For it, I had to also design and build a rather bulky programmer for EPROMs from which the CPU Z80 can read its instructions. So I had to input the code manually, byte by byte (actually bit by bit, using 8 switches), into a bank of dynamic RAMs then save their contents into an EPROM IC without interruption. Fortunately, I was able to buy a microcomputer a year later then a PC. Since then, I wrote all my programs for Z80, AT89Cxx family and now AVR family in assembly (although I used writing DOS programs, in C language, to run on a PC).

So I am not sure of what you mean about the difference between using an assembler to build a code and writing a code in assembly language.

grohote wrote:
We discussed it last summer on this Forum, use of PC does not belongs to ASM language - although some people does tolerate it.

I am not tolerating it. I just use it because it is useful to me in this special code. And, if necessary, it can be fixed (replaced) rather easily by reading the comments in this thread.

grohote wrote:
Please, do believe our main expert, he warned you well.

With all my respect to every human I met or will meet, I believe always in sharing knowledge not imposing it (in science or else). So when I was, as a student, at school then university, I, unlike some others, never used opposing seriously my teachers even if they were wrong clearly (after all, no one is perfect, including I).

The way I live may be seen wrong in view of some people, but this is how I am.

Let we put it this way:

- keep PC and you can keep the program for yourself

- change PC with labels, and the program may be public (and seems to be a good one)

We are here to help you. I also finished similar school, university, Z80, worked 30 years as software designer.

To me, ASM is the best AVR language, and I am glad to discuss some aspects of.

Last Edited: Sat. Jan 15, 2022 - 06:48 AM

grohote wrote:

Let we put it this way:

- keep PC and you can keep the program for yourself

- change PC with labels, and the program may be public (and seems to be a good one)

We are here to help you. I also finished similar school, university, Z80, worked 30 years as software designer.

To me, ASM is the best AVR language, and I am glad to discuss some aspects of.

Sorry but I wonder really what prevents you (or anyone else) re-posting the code on which the RET label exists.

For example, 'avrcandies' didn't argue with me about what he sees better for the code. He just re-posted it with the changes he suggested. Did I refuse his work? Of course not, it is a public forum.

Although his version is better than mine, in term of speed and words, I still use mine for personal reasons.

After all, since I have had to work as an independent engineer since always (since after my graduation) and my works (in programming and electronics) couldn't be recognized, therefore, by any Elite Group in the world, I had no reason but to be real free at work in deciding whatever is good, or not for me in the least (I run, since then, a very small private business).

I am a realist rational man so I discovered since many decades ago that, in every scientific field, there are always Elite groups and followers in order for the world to progress. But joining an Elite group has its price which I can't afford and being a follower, directly or indirectly, was, always, out of question for me. So you are right, and I am sorry for living in a different world from yours.

Last Edited: Sat. Jan 15, 2022 - 11:42 AM

KerimF wrote:

Sorry but I wonder really what prevents you (or anyone else) re-posting the code on which the RET label exists.

For example, 'avrcandies' didn't argue with me about what he sees better for the code. He just re-posted it with the changes he suggested. Did I refuse his work? Of course not, it is a public forum.

No Good deed goes unpunished...

I'd rather have the code however you decide to post it... even if it's in "C".

Thanks for the code!!

Brawndo's got what plants crave... It's got electrolytes!

There is the code. My last remark: seems that rounding up starts from 127 (and should be 128).

.set reg_N = 100                  ; {N}
.set reg_Kd=(2*256*256/reg_N+1)/2 ; {Kd}
.set reg_Kr=(1-reg_N)/2           ; {Kr}
ldi   r16, low(reg_Kd)
ldi   r17, high(reg_Kd)
ldi   r20, reg_N
ldi   r30, low(reg_Kr)
ldi   r31, high(reg_Kr)
rcall  D16_nnn
; result is in r25:r24

;==============================================================
; *R* Division of 16-bit register {A} by 8-bit constant {N} ***
;==============================================================

; {R}={A}/{N} = r15:r14 / {N} = r15:r14 * {Kd} /256 /256 then rounding

; {A} dividend in r15:r14 [0 to 65535]
; {N} constant divisor [2 to 255]
; {R} result in r25:r24 , {A}/{N} rounded
; {Kd} = round(256*256/{N},0), the division constant
; {Kr} = INT[ (1-{N})/2 ]    , the rounding constant
; used registers: r19,r18,r17,r16,r1,r0
; 29 words , excluding RET and RCALL
; 37, 40, 41, or 42 cycles = 30c|33c|34c|35c code + 4c RET + 3c RCALL

D16_nnn:
; reg_N for rounding in r20
; reg_Kr       in r31:r30
; multiplicand in r15:r14 = {A}
; multiplier   in r17:r16 = {Kd}
; mul. result  in r25:r24:r19:r18
; valid result in r25:r24
MUL   r15, r17
MOVW  r25:r24, r1:r0
MUL   r14, r16
MOVW  r19:r18, r1:r0
MUL   r15, r16
CLR   r16
MUL   r17, r14
; {B} = r25:r24 = r15:r14 * r17:r16 /256/256 = {A}*{Kd}/256/256
; {R} = {B} or {B}+1

MUL   r24, r20
MOVW  r19:r18, r1:r0
MUL   r25, r20
; {C} = r19:r18 = {B}*{N} = r25:r24 * r16

; the following conditions were deduced empirically
; if( Carry_1=0, {R}={B}, if( Zero_2=1 OR Carry_2=0, {R}={B}, {R}={B}+1 ) )

SUB   r18, r14
SBC   r19, r15
; {D1} = r19:r18 = {C} - {A} = r19:r18 - r15:r14

BRCC  jRet
; if Carry_1=0, {R}={B}

SUB   r18,  r30 ;low(reg_Kr)
SBC   r19,  r31 ;high(reg_Kr)
; {D2} = r19:r18 = {D1} - {Kr} = r19:r18 - {Kr}

BREQ  jRet
; if Zero_2=1, {R}={B}

BRCC  jRet ;PC+2
; if Carry_2=0, {R}={B}

; {R}={B}+1
jRet:
RET
; {R} = r25:r24 = {B} or {B}+1 [ {A}/{N} rounded ]

Last Edited: Mon. Jan 17, 2022 - 07:28 PM

grohote wrote:
There is the code.

Your idea is clear now. Thank you.

grohote wrote:
My last remark: seems that rounding up starts from 127 (and should be 128).

?

Edited:

On my side, I checked the code results with their rounded counterparts by using excel; 0.5 --> +1 which is equivalent to 128 --> +256.

This is why I added the remark on the code list that the 3 conditional branches were deduced empirically though I believe they can also be deduced by using pure logic.

Last Edited: Mon. Jan 17, 2022 - 08:37 AM

Regarding rounding, it may be a weak point.

Try this, please:    65499 : 100 =  654.99 (after rounding 655=h'28f)

The code produces h'ffdb : h'64 = h'28e

Edit: see #49

Last Edited: Mon. Jan 17, 2022 - 07:29 PM

grohote wrote:

Regarding rounding, it may be a weak point.

Try this, please:    65499 : 100 =  654.99 (after rounding 655=h'28f)

The code produces h'ffdb : h'64 = h'28e

I will check it.

Edited:

Which code from above are you referring to? Mine gives 655!

Last Edited: Mon. Jan 17, 2022 - 04:25 PM

You are right! Rounding problem occurred on testing method only.

To resume, I checked my last code #45 on selected numbers N= 2, 3, 16, 100, 244, 254, 255

for all dividends. We did it, or better: You did it!

With regards

Milan

Hi Kerim,

First of all, I think that your approach to a fast integer division using inverse multiplication is very sophisticated (although this idea is basically not new).

However, due to the .set method the program memory requirement for many different divisors might be correspondingly large.

Instead of using .set and a separate code for all different divisors, you could calculate the wanted constants reg_Kd and reg_Kr for a given number of divisors (if not too many) once at startup by software and store them in SRAM, e. g.

Then the actual division could be carried out with one and the same inverse multiplication subroutine as quickly as possible.

On the other hand, there are fast integer divisions that, despite using loops, are only about 5 times slower than using your method.

Here is a tiny version (the loop part is not my invention) that I like quite well, written using my structured AVR assembly language (s'AVR):

```; Divide 16 bits by 8 bits with optional rounding
; updated 21-JAN-2022/EH

; Register definitions, lower AVR registers (could be upper AVR registers as well):

.def     Zero       = r12       ; register content is zero during the subroutine
.def     Divisor    = r13       ; Divisor (unmodified)
.def     RemainderL = r14       ; true 8-bit remainder before rounding
.def     RemainderH = r15       ; used for comparing and rounding

; upper AVR register (advantageous, otherwise more registers and instructions required):

.def     DividendL  = r24       ; DividendL at the beginning, QuotientL at the end
.def     DividendH  = r25       ; DividendH at the beginning, QuotientH at the end
.def     Count      = r18       ; loop counter

DIV16_8:                        ; divide 16 bits by 8 bits
clr Zero
clr RemainderL
clr RemainderH

FOR Count := #16
lsl DividendL           ; Dividend << 1, clear LSB
rol DividendH
rol Divisor             ; shift MSB of dividend to calculate the quotient
cp  RemainderL, Divisor ; Dividend (in Remainder) ? Divisor, 9 bits need to be compared
cpc RemainderH, Zero
CONTIF C                ; Divisor > Dividend (in Remainder), step to next bit location
sub RemainderL, Divisor ; Dividend (in Remainder) - Divisor
sbc RemainderH, Zero
inc DividendL           ; set LSB
ENDF                        ; Quotient now in DividendH:DividendL, 8-bit remainder up to 254 is separate

; optionally round up if Remainder/Divisor >= 0.5 or 2*Remainder-Divisor >= 0

clr     RemainderH          ; 9 bits need to be compared as 2*Remainder can be up to 508
lsl     RemainderL          ; 2*Remainder
rol     RemainderH
cp      RemainderL, Divisor ; compare with Divisor
cpc     RemainderH, Zero
IF  NC                      ; round up the result
subi DividendL, low(-1) ; +1
sbc  DividendH, high(-1)
ENDI
; clr     RemainderL        ; clear Remainder (if wanted)
; clr     RemainderH

ret
```

As you can see, besides the label for the subroutine no other labels are required when using s'AVR (in general).

If I understood your original code (using DIV_ret or jRet) correctly, the last few lines would read as follows (not using any DIV_ret or jRet):

```    ...
IF C
subi  r18, low(reg_Kr)
sbci  r19, high(reg_Kr)
EXITIF Z
EXITIF NC
ENDI
RET```

Using the s'AVR language, structures can be nested to any level (just limited by the program memory).
Directive CONTIF is a conditional CONTINUE.

I have used above s'AVR code to confirm that all 2^16 dividends are correctly divided by divisors 3 through 255 when comparing against your D16_nnn code.

The constants reg_Kd and reg_Kr are dynamically calculated by the code.

Edit: Cleaning up my test files I realized that the test originally went only from 3 through 127.
In the meantime I have improved my code and now the complete test for divisors from 3 through 255 takes 8 minutes on ATmega328P @16 MHz.

Other standard integer dividing methods might be about ten times slower than your approach, I guess.

Obviously, speed is an important requirement in your application.

Last Edited: Sat. Jan 22, 2022 - 05:11 PM

you could calculate the wanted constants reg_Kd and reg_Kr for a given number of divisors (if not too many) once at startup

I'd calculate them off line and keep them in an indexed table, that would avoid duplicating the code ..the premise here is you know ahead of time what you will be dividing by, so it can be done fast & efficient.  Otherwise, yes, go for a full-blown arbitrary division routine.

If you have 512 bytes of storage free, you can just store all possible Kd; Kr is trivial to find once you have Kd

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

@S'AVR and @avrcandies

On my side, I used finding a better solution, in term of memory space (as it is the case here), when the ones I have cannot be applied in a new program for a certain MCU.

For example, if I have a relatively large number of free bytes, I simply choose to copy and paste the code as many times as the number of reg_N in use.

This is why I personally welcomed the various good suggestions on how this code could be used because each of them could be useful, if not needed, in certain cases.