Search |
 |
|
 |
| Author |
Message |
|
|
Posted: Mar 31, 2006 - 04:30 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN's DIRTY MATH TRICKS
PURPOSE: On Going Tutorial on Math Tricks in AVR Native Machine Language for Beginners.
PREAMBLE: While trying to help someone in another FORUM with a Division-by-60 routine to converting seconds to minutes, someone made the remark:
Quote:
What RetroDan has is a quick and dirty approximation!
I think it was meant as an insult, but it gave me a great name for the type of binary routines I love: Dirty Math!
Quick & Dirty Routines that are small and super fast.
CAVEAT LECTOR: If you don't like taking chances and experimenting and don't really care about how big your programs are, or how fast they excecute then this is not a place for you. May I recommend the Atmel Appnotes #200, 201 and 204 that are chock full great standard math routines for the AVR microcontrollers. |
Last edited by RetroDan on Mar 31, 2006 - 10:40 AM; edited 4 times in total
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 04:38 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN's MATH TIPS
INTRODUCTION:
Here are a few quick tips before we start:
Code:
DIRTY DAN'S TIP #1: ALWAYS USE A POWER OF TWO
If you are writing any program or routine, even if the savings are not obvious at first, pick numbers that are powers-of-two. Somewhere down the line you'll probably be glad you did. Make it a habit and your microcontroller will thank you too.
For example: if you need to take an average, don't take 10 readings and divide by 10, take 8 or 16. If you're setting up a table, don't make entries 10 bytes long, make them 8 or 16. It will greatly simplify things later. There's nothing magical about powers-of-ten to a microprocessor. They're set up for powers-of-two and the number of fingers we have means absolutely nothing to them.
Code:
DIRTY DAN'S TIP #2: AVOID DIVISION AT ALL COSTS!
While Addition, Subtraction and Multiplication can be done in just a few clock cycles, most division involves looping and uses a lot of processor time. Division is normally done by repeated subtractions, the "standard" 16x16 division routine from the Atmel Appnotes averages over 250 clocks cycles compare that to just 9 for a 16x16 multiply and just 2 for a 16x16 Addition.
One problem with high-level languages is that this dirty little fact is kept hidden from the programer. So while you're happily writing code that does Bilinear Interpolation or Cubic Splines, your processor is swearing at you under his breath everytime you ask for a division. You're wondering why things are grinding to a halt with smoke drifting up from the PCB.
Code:
DIRTY DAN'S TIP #3: IF YOU MUST DIVIDE, DO IT BY POWERS OF TWO!
Division by a power of two can be easily and quick accoplished with just a few right-shifts. Each time you shift your bits to the right you are essentially dividing your number in half. So to divide by four you just shift-right twice.
Code:
DIRTY DAN'S TIP #4: IF YOU MUST DIVIDE, DO IT ONLY ONCE!
If your equation has multiple divisions involved, try and re-arrange your equation so you only divide once.
For example: if you need to divide by 10 and later divide by 12, perhaps you can re-arrange things and just divide once by 120 and be done with it. |
Last edited by RetroDan on Apr 13, 2006 - 09:11 AM; edited 1 time in total
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 05:50 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN
PREAMBLE:
I've been working on my own driver routines for the LCD Display on the Butterfly (same as STK502) that are super small and fast. When it came time to write a little routine that displays a number in decimal, rather than Hex, I ran into my old nemesis DIVISION!
To be more precises, Division-by-Ten. Since we humans are so fond of Division-by-Ten, I thought it would make an excellent starting point for our examination of my Dirty Math Tricks.
For this routine, all we want is to take a single byte value and display it on the LCD Display in Decimal Number Form. So we need to take a number with a maximum value of $FF and break it into three: 2 / 5 / 5.
The way most of us humans would do this is to first divide by it by 100 and then by 10 and then keep the remainder;
Code:
255 / 100 = 2 (remainder of 55)
55 / 10 = 5
remainder = 5
Oh no, that means we need to have two division routines! One for division by 100 and another by ten. Mybe we should just use a general division routine and change the divisor from 100 to 10 rather than have two routines. So I go and check the Atmel Appnote 200 looking for a general 8x8 unsigned division routine and here is what I found:
Code:
;***************************************************************************
;* "div8u" - 8/8 Bit Unsigned Division
;*
;* This subroutine divides the two register variables "dd8u" (dividend) and
;* "dv8u" (divisor). The result is placed in "dres8u" and the remainder in
;* "drem8u".
;*
;* Number of words :14
;* Number of cycles :97
;* Low registers used :1 (drem8u)
;* High registers used :3 (dres8u/dd8u,dv8u,dcnt8u)
;***************************************************************************
.def drem8u =r15 ;remainder
.def dres8u =r16 ;result
.def dd8u =r16 ;dividend
.def dv8u =r17 ;divisor
.def dcnt8u =r18 ;loop counter
div8u: sub drem8u,drem8u ;clear remainder and carry
ldi dcnt8u,9 ;init loop counter
d8u_1: rol dd8u ;shift left dividend
dec dcnt8u ;decrement counter
brne d8u_2 ;if done
ret ; return
d8u_2: rol drem8u ;shift dividend into remainder
sub drem8u,dv8u ;remainder = remainder - divisor
brcc d8u_3 ;if result negative
add drem8u,dv8u ; restore remainder
clc ; clear carry to be shifted into result
rjmp d8u_1 ;else
d8u_3: sec ; set carry to be shifted into result
rjmp d8u_1
Only 14 program steps is not bad. Almost 100 clock cycles per call and we need to call it twice fort a single byte conversion. That's about 200 clock cycles each. I'm sure we can do better than that... |
Last edited by RetroDan on Apr 13, 2006 - 09:13 AM; edited 3 times in total
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 06:24 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued)
Previously we divided our 1 byte number by 100 then by 10 and when it comes time to expand our routine to handle 16 bit numbers we'll need to divide by 10,000, 1,000, 100 and finally by 10. At approx. 100 cycles per call that's 400 clock cycles per number.
Maybe we should look at the problem again from the "other side" the right-hand-side instead of left.
Code:
255 / 10 = 25 Remainder = 5
25 / 10 = 2 Remainder = 5
2 / 10 = 0 Remainder = 2
Well we've managed to extract the 255 again, but now we need Three calls to the division routine and for a 16 byte number we'll need Five. The only thing we have going for us using this method is that if we can find a "special" division routine for Division-by-Ten that is smaller and/or faster than the Atmel routine, we might have something we can work with... |
Last edited by RetroDan on Mar 31, 2006 - 06:50 AM; edited 2 times in total
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 06:46 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued)
I remember someone posting a file of Division-by-a-Constant Routines, so why don't we see if we can get luck there.
Here is what I found, exactly what we're looking for, a specialized routine for Division-by-Ten:
Code:
;***************************************************************************
;* Function "Div16_10"
;* Divides an unsigned 16 bit word (XH:XL) by 10
;* Returns quotient in YH:YL and remainder in XL
;*
;* Author: Andreas Lenze (andreas.lenze@t-online.de)
;* Equations by D: W. Jones:
;*
;* Reciprocal mul w. extra precision:
;* unsigned int A;
;* unsigned int scaled_reciprocal = 0xCCCD;
;* unsigned int Q; /* the quotient */
;*
;* Q = ((A * 0xCCCD) >> 19)
;*
;* Uses: high regs: 7 (r17, r18, r19, X, Y)
;* low regs: 3 (r0, r1, r2)
;*
;* words: 36 (w. push/pop = 8 words)
;* cycles: 46 (w. push/pop = 16 cycles)
;*
;* Note: Hardware multiplier required ("mul" instruction)
;*
;***************************************************************************
Div16_10:
push r2
push r19
push r18
push r17
ldi YH,0xCC ; scaled reciprocal for /10
ldi YL,0xCD
; Q = A * 0xCCCD
; (r19:r18:r17[:rXX] = XH:XL * YH:YL)
clr r2
mul XH, YH ; ah * bh
movw r19:r18, r1:r0
mul XL, YL ; al * bl
mov r17,r1 ; r0 to [rXX] is superfluous
mul XH, YL ; ah * bl
add r17, r0
adc r18, r1
adc r19, r2
mul YH, XL ; bh * al
add r17, r0
adc r18, r1
adc r19, r2
; Q = Q >> 16: use r19:r18 as word
; Q = Q >> 3
lsr r19 ; do the last 3 shifts
ror r18
lsr r19
ror r18
lsr r19
ror r18
; r19:r18 now "Q" (= result >> 19)
; R = A - 10*Q;
ldi r17,10 ; multiply r19:r18 by 10
mul r18, r17 ; al * bl
sub XL,r0
clr XH
movw YL,r18 ; make copy of Q
; XL holds "R"
; YH:YL holds "Q"
pop r17
pop r18
pop r19
pop r2
ret
Wow, I looks huge! Let's see 46 program steps, that's a little on the heavy side, and 10 registers used ouch!. However, it is a 16 bit Divide-by-Ten Routine and it excecutes in about 1/2 the time that the Atmel 8 bit routine did, so we're moving in the right direction. Two or Three calls to this would be 100 to 150 cycles versus the 200 to 300 using the Atmel Routine.
If we can't come up with something on our own, perhaps we could strip this down to an 8 bit routine, or perhaps just bite the bullet because sooner-or-later we're gonna' need to convert 16 bit values anyway... |
Last edited by RetroDan on Apr 13, 2006 - 09:18 AM; edited 2 times in total
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 07:04 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued)
A while back, while literally sitting-on-the-throne, I happened to realized that 255 is evenly divisable by 5 and that it is only one away from 256 which is a Power-of-Two. After a while, it becomes a habit to always be on-the-lookout for Powers-of-Two. Either that or I'm slowly drifting into the world of padded-rooms and nice doctors in white smocks. Of all the things I've lost, I miss my mind the most.
To divide by Ten we might be able to first Divide-by-Five then Right-Shift to Divide-by-Two and the result would be a Division-by-Ten. I stored that fact away and wondered if the 255-256 link could be put to use sometime.
Pehaps today is a good day to drag it out of the bottom drawer and dust it off...
Let's see: 255 divided by 5 is 51 and 255 is awlful close to 256 and furthermore division by 256 is super-super easy and requires ZERO program steps. The "Dirty Trick" to that is to simply ignore the least significant Byte and we've essentially Divided our number by 256.
Since 255 is almost equal to 256, then 255/256 is almost equal to 1
Since multiplication by one does not change a value
Then 1/5 times 255/256 shoud be pretty close to 1/5
Since division by 256 can be accomplished with the Dirty Math Trick of ignoring the lowest byte, we'll leave that to last and our equation becomes:
1/5 time 255 and later we'll truncate the lower byte.
Code:
1/5 x 255 = 255 / 5 = 51
So mathematically speaking if we multiply our number by 51 and later
ignore the lower byte, we should get a pretty good approximation of
Division-by-Five. That's where our first routine will start... |
|
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 07:59 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued)
Let's start by writing a routine that muliplies by 51 and we'll look at the high byte and see if our mathematical assumptions were correct.
Here is the code:
Code:
DIV5: LDI B,51
MUL A,B ;ANSWER LEFT IN R1
RET
Great it's only 3 lines long!!! Let's test it's accuracy.
We're interested in a single byte so values can go from zero to 255
so lets check a few numbers in that range and see how it performs:
Code:
10 => 1 (off by -1)
50 => 9 (off by -1)
100 => 19 (off by -1)
150 => 29 (off by -1)
200 => 39 (off by -1)
250 => 49 (off by -1)
Hmm, we're alway off by minus one, so if we add one, we should be right on the money!
So the new Division-by-Five Routine becomes:
Code:
DIV5: LDI B,51
MUL A,B
INC R1 ; ANSWER IN R1
RET
So we're half-way-home, all we need to do now is divide our answer by two and we should have a "Quick and Dirty" Divide-by-Ten Routine... |
|
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 08:00 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued)
There's two ways we can use our Division-by-Five Routine for a Division-by-Ten Routine.
We can modify the DIV5 routine. Earlier I stated that a "Quick and Dirty" way to Divide-by-Two is to just Shift your bits to the right. This is how we modify our DIV5 Routine to get a DIV10:
Code:
DIV10: LDI B,51
MUL A,B
INC R1 ;R1=A/5
LSR A ;R1=(A/5)/2 = A/10
RET
The other way we could have accomplished the same thing would be to have the DIV10 routine call the DIV5 routine and that way we "kill-two-birds-with-one-stone" because we get two usable routines from it:
Code:
DIV10: RCALL DIV5 ;R1=A / 5
LSR R1 ;R1=(A/5) / 2 = A/10
RET
DIV5: LDI B,51
MUL A,B
INC R1 ;ANSWER IN R1
RET
Just to be on-the-safe-side let's double check the accuracy of these routines:
Code:
10 => 1 Correct!
50 => 5 Correct!
100 => 10 Correct!
150 => 15 Correct!
200 => 20 Correct!
250 => 25 Correct!
So we did it. We have our first "Dirty Math" Routine that is 100% accurate over the range we need.
Stay tuned however, there's still more to this story... |
Last edited by RetroDan on Mar 31, 2006 - 08:26 AM; edited 2 times in total
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 08:18 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued)
PRE-SUMMARY:
Not bad for a days work eh? We took a rather large routine that we needed and reduced it to a "Quick and Dirty" little routine using a few "Dirty Math Tricks".
Let's compare and see how well we did.
First let's compare to the Atmel 8x8 Unsigned Division Routine:
Code:
DIV10a DIV10b ATMEL
REGISTERS: 2 2 4
DIFFERENCE: -2 -2 -
PERCENT: -50% -50% -
PROGRAM WORDS: 5 7 15(incl RET)
DIFFERENCE: -10 -8 -
PERCENT: -66% -53% -
SPEED/CYCLES: 5 8 97
DIFFERENCE: -92 -89
PERCENT: -95% -92%
So we've used 1/2 the register space, about 60% less program space and we're running at about 20 times the speed that the Atmel Routine does.
Let compare to the Lenze/Jones Routine:
Code:
DIV10a DIV10b LENZE
REGISTERS: 2 2 10
DIFFERENCE: -8 -8 -
PERCENT: -80% -80% -
PROGRAM WORDS: 5 7 36
DIFFERENCE: -31 -29 -
PERCENT: -86% -80% -
SPEED/CYCLES: 5 8 46
DIFFERENCE: -41 -38 -
PERCENT: -89% -83% -
Against the Lenze/Jones routine we're using 80% less Register Space, about 85% less Program Space and w'e're running at between 6 to 10 times the speed.
Many might think that this is a great accomplishment and that I should stop at this point. Especially those that probably would have just used the larger routines and not had a second thought about it.
Well, obviously some of you don't know me very well.
I think I can pull a few more "Dirty Tricks" from out of my hat and turn this Dirty Littlle Routine into something totally scandalous. I'm not happy until the routine is so small that it pulls numbers from thin air, like magick... |
Last edited by RetroDan on Apr 13, 2006 - 09:24 AM; edited 1 time in total
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 09:07 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued)
Let's re-examine what it is we're doing mathematically with our Dirty Little DIV10 Routine:
We're multiplying by 51, adding one, then dividing by two, and then truncating to get a final division of 256:
Code:
A / 5 ~= [ [ ( A x 51 ) + 1 ] / 2 ] / 256
Let's ignore that final division by 256 because it's accomplished with ZERO program steps by ignoring the lower byte and there's no way we can improve on that unless there are opcodes that can move us backwards in time.
So we want to look closely at the right side of this equation:
Code:
256 x [A / 5] ~= ( 51A + 1) / 2
256 x [A / 5] ~= 51A/2 + 1/2
Well 1/2 doesn't really mean much to a microprocessor, you either have a bit that is on or its off. We'll ignore it for now and if need be, we'll increment something in the end like we did with our DIV5 routine.
So now we have:
Code:
256 x [A / 5] ~= 51A/2
256 x [A / 5] ~= 25.5 x A
256 x [A / 5] ~= 25A
Since we dropped a 1/2 earlier and now we're stuck with another 0.5 perhaps we should round the 25.5 upto 26 rather than truncate it down to 25.
Code:
256 x [A / 5] ~= 25.5 x A
256 x [A / 5] ~= 26A
So to divide by 10 all we need to do is multiply by 26 and ignore lower byte:
Code:
MUL10: LDI B,26
MUL A,B ;ANSWER IN R1
It sounds incredable, but let's check the results and see what our error ratio is:
Code:
10 => 1 Correct!
50 => 5 Correct!
100 => 10 Correct!
150 => 15 Correct!
200 => 20 Correct!
250 => 25 Correct!
So it's 100% accurate over the range we need and it's only two program lines.
We started with rather large routines requiring 50 to 100 clock cycles per call and have something that runs in just three.
Can we reduce it even more?
Not really, not until Atmel comes out with a MULI command that allows us to multiply a register with an immedaite value rather than loading it into another register first.
Maybe next year...
It would probably look like this:
Code:
DIV10: MULI A,26 ;ANSWER IN R1
|
Last edited by RetroDan on Mar 31, 2006 - 10:14 AM; edited 1 time in total
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 10:08 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Conclusion)
IN CONCLUSION:
By systematically applying a few Dirty Tricks and following our instincts combined with a little high-school math, we took some rather large routines and reduced them down into just a couple program steps. Something that many would say is impossible., but never say never my friend. If I can do it, so can you!
REQUEST FOR FEED BACK:
I hope you found this posting entertaining as well as educational.
If you would like to see more like it, please be sure to let the moderators know.
Thank you for your time and consideration. |
Last edited by RetroDan on Apr 13, 2006 - 09:27 AM; edited 1 time in total
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 03:31 PM |
|

Joined: Mar 28, 2001
Posts: 416
Location: Munich, Germany
|
|
|
Code:
MUL10: LDI B,26
MUL A,B ;ANSWER IN R1
Quote:
So it's 100% accurate over the range we need and it's only two program lines.
This is not true. For a 'div by 10' this is an approximation which will not yield correct results for all integers between 1 and 255 decimal (try for example 169, 159 ... for 'A' to see it fail). The 'scaled reciprocal' for a correct 'division by 10' of an 8 bit integer would have to be at least 9 bits to reach the required precision.
Please check out the mathematical basics for this technique at http://www.cs.uiowa.edu/~jones/bcd/divide.html
The claims in your comparison
Code:
DIV10a DIV10b LENZE
REGISTERS: 2 2 10
DIFFERENCE: -8 -8 -
PERCENT: -80% -80% -
PROGRAM WORDS: 5 7 36
DIFFERENCE: -31 -29 -
PERCENT: -86% -80% -
SPEED/CYCLES: 5 8 46
DIFFERENCE: -41 -38 -
PERCENT: -89% -83% -
are also incorrect: the '46 cycles' for 'Lenze' is including register saves (push/pop), the naked division routine w/o reg saves and getting the remainder is:
Code:
ldi YH,0xCC ; scaled reciprocal for /10
ldi YL,0xCD
; Q = A * 0xCCCD
; (r19:r18:r17[:rXX] = XH:XL * YH:YL)
clr r2
mul XH, YH ; ah * bh
movw r19:r18, r1:r0
mul XL, YL ; al * bl
mov r17,r1 ; r0 to [rXX] is superfluous
mul XH, YL ; ah * bl
add r17, r0
adc r18, r1
adc r19, r2
mul YH, XL ; bh * al
add r17, r0
adc r18, r1
adc r19, r2
; Q = Q >> 16: use r19:r18 as word
; Q = Q >> 3
lsr r19 ; do the last 3 shifts
ror r18
lsr r19
ror r18
lsr r19
ror r18
; r19:r18 now "Q" (= result >> 19)
That's 25 cycles for a 'divide 16 bit number by 10' (result BTW is always correct over the full range 1 - 65535). |
_________________ Andreas
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 04:07 PM |
|

Joined: Dec 08, 2004
Posts: 4719
Location: Nova Scotia, Canada
|
|
Well now, Andreas... The argument could be made that RetroDan's solution is *more correct* than the traditional integer approach.
169/10 = 16 if we follow traditional definitions.
Of course, the "true" result is 16.9. But we cannot express 16.9 in an integer. So we round. The traditional approach for computer integer arithmetic is to always round down.
Mathemeticians would tell us that we should round anything from 0.0 up to 0.49999999... as 0, and anything from 0.500000000...01 up to 0.9999999999999999... as 1. There is a gray area for 0.5 exactly. I was taught a convention that in that case, you round to the even-numbered integer.
From that prespective, RetroDan's solution comes much closer to the "mathematically" correct solution. 169/10 = 17.
As for the amount of pushing and popping required... well, that's relative to the register usage conventions at hand. In the worst case (with all destoryed registers saved and restored), RetroDan's solution is still smaller and faster than the best-case (with no destroyed registers saved and restored) "Lenze" solution. |
Last edited by lfmorrison on Mar 31, 2006 - 04:18 PM; edited 1 time in total
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 04:17 PM |
|


Joined: Jul 18, 2005
Posts: 62281
Location: (using avr-gcc in) Finchingfield, Essex, England
|
|
Yup, actually I thought the *26 then ditch 8 bit thing was really clever. Obviously it works because 26/256 is 0.1015625 which for small numbers is OK as a way of multiplying by roughly 0.1. If you tried a larger value like 800 then I guess it'd be 800 * 26 = 20,800 and divided by 256 this is then 81 - so the extra 0.0015625 has come into play to create the error. But on small numbers it is a clever technique for an "integer divide"
In fact just checked and it's good for all values up to 630 in fact, the first one where it goes wrong is 631
Cliff |
|
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 06:23 PM |
|

Joined: Mar 28, 2001
Posts: 416
Location: Munich, Germany
|
|
|
Quote:
From that prespective, RetroDan's solution comes much closer to the "mathematically" correct solution. 169/10 = 17.
Ah well, Luke - for '169', yes. But even if we assume an 'implicit rounding' scheme, this is not handled correctly, for ex. in the range 0-9 we get '0' as result up to dividend 9, ditto e.g. in the 50-59 range ('5' up to 59).
Mathematically correct - as to the mathematicans - would be the 'increment result' from '5' or '55' upwards. OP claims '100% correct' for the algorithm which it isn't.
It would be correct if it either were returning the remainder together with the result or providing the mathematically correct rounding you described.
Quote:
In the worst case (with all destroyed registers saved and restored), RetroDan's solution is still smaller and faster than the best-case (with no destroyed registers saved and restored) "Lenze" solution.
True, except for the fact that the OP is manipulatively comparing a 'sort of 8 bit integers division' (nice as long as you don't need correct results) with a 16 bit integers division (add another 6 cycles to get the remainder - blows it up to 31 ... geeeze ...) which, thanks to Mr Jones, does return correct results.
BTW: As this is the 'tutorial' forum (contents aimed at beginners) I honestly think that it would be more helpful if an approximation technique were explecitely labeled as such instead of trying to sell it off as real binary mathematics. |
_________________ Andreas
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 06:35 PM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
|
alenze wrote:
BTW: As this is the 'tutorial' forum (contents aimed at beginners) I honestly think that it would be more helpful if an approximation technique were explecitely labeled as such instead of trying to sell it off as real binary mathematics.
I apologoize Alenze, as the comparison was not meant to in any way diminsh the excellent routines you have in your Div_xx Package. It is full of "tried and true" routine, and are backed by the work of Jones. If anyone is looking for "correct" routine I recommend them highly
However, I don't think you read the CAVEAT LECTOR warning in my first post. These postings are to discuss "Dirty Math Tricks" and hopefully have some fun at the same time. If anyone is uneasy about what I'm doing, I suggest they stick to the proven routines like the ones you and Atmel provide and forget my threads even exist. |
|
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 07:16 PM |
|

Joined: Dec 08, 2004
Posts: 4719
Location: Nova Scotia, Canada
|
|
"Closer to mathematically correct". I wasn't claiming that it was a miracle cure-all solution. In fact, it conveniently works out that way by pure coincidence of the range of numbers involved..
Anyway... The "traditional" approach to integer division (ignoring remainders for the moment), and Retro's "radical approach" to integer division are both wrong when compared to result using real numbers.
When compared to the "real number" result, the traditional results will always be less than or equal to the true result, with a maximum absolute error error of -1 LSB, and this maximum error is repeated consistently with a periodicity of 10 for all possible inputs.
RetroDan's results may differ from the "real number" result as well. The result may be greater than or less than the "real number" result. But it, too, will have a maximum absolute error of +/-1 LSB for the range of inputs we're interested in.
Note that the original intent of this thread was only to do 8-bit division by 10. And we actually only approach an absolute error of -1 LSB at the lower extreme with inputs of around 0, and +1 LSB at the upper extreme with inputs of around 630. In the middle of the range, the aboslute error will be limited to a band of +/- 0.5 LSB
In light of these results, I don't think it's fair to label his as "more wrong" than the traditional results.
Of course, if you want to have access to the remainders so that your application can fine-tune its interpretation of the result, then that's a strong argument in favour of some other approach. |
|
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 07:38 PM |
|

Joined: Dec 08, 2004
Posts: 4719
Location: Nova Scotia, Canada
|
|
Finding remainders:
- Multiply the result by 10. Label as 'x'.
- Subtract 'x' from the original input.
- This result is equal the the "Remainder". Using RetroDan's code, the remainder can either "positive" or "negative", but it will always have a magnitude of 10 or less. If negative, then feel free to subtract 1 from the quotient and add 10 to the remainder. And presto! You have a result that is totally identical to the "Lenze" result, for all possible 8-bit inputs.
Code:
; RetroDan's original code
DIV10: LDI B,51
MUL A,B
INC R1 ;R1=A/5
LSR R1 ;R1=(A/5)/2 = A/10
; Luke's insertion
MOV C, R1
LDI B, 10
MUL C, B
SUB A, R0
; 'A' contains the (signed) remainder.
; sign of the remainder indicates whether it is an:
; - under-estimate (positive remainder), or
; - over-esitmate (negative remainder)
; 'C' contains the quotient
; Now a 100% _mathematically_ correct result for all 8-bit inputs.
; (Destroys the original input operand 'A'.
; one could return the remainder in register
; 'B' at the cost of one extra MOV instruction.)
RET
|
|
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 09:47 PM |
|

Joined: Nov 17, 2004
Posts: 13834
Location: Vancouver, BC
|
|
Regarding the division by 5 routine you said:
Quote:
Hmm, we're alway off by minus one, so if we add one, we should be right on the money!
But the only numbers that you checked were ones that were actually divisible by five. In fact, these are the only numbers where this routine returns the correct answer. If someone wants to use this, then some correction would definitely be necessary. What I would suggest is to not add the one, then find the remainder as lfmorrison points out. In this case the remainder will always be in the range of [0, 5]. If the remainder is 5, add one to the quotient and make the remainder 0. Otherwise, the quotient and the remainder are correct.
Applying this to the divide by 10 (the '* 51' version, not the '* 26' version), again if you drop the 'add one', the remainder is always in the range of [0, 10]. So again the only adjustment is for a remainder of 10.
Keep in mind that these remainder ranges are only valid if the original number is 8 bit. I quickly goes awry for larger numbers. Even 261 produces an out of range remainder. You could subtract 10 (or 5 for the div by 5) from the remainder instead of just making it 0 to extend the range. To be more specific, it will be correct as long as the answer fit in 8 bits. |
_________________ Regards,
Steve A.
The Board helps those that help themselves.
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 11:52 PM |
|

Joined: Mar 28, 2001
Posts: 416
Location: Munich, Germany
|
|
One way to do this (sorry, lfmorrison's signed remainder isn't always mathematically correct - "we should round anything from 0.0 up to 0.49999999... as 0, and anything from 0.500000000...01 up to 0.9999999999999999... as 1":
Code:
; Division by 10, useing reciprocal multiplication
;
; Call with:
; 8 bit dividend in r16
;
; Returns:
; Result in r16
; Remainder in r18
mov r18, r16 ; save original dividend
ldi r17,26 ; reciprocal, scaled *256, off a bit
mul r16,r17
; result of multiplication now in r1:r0
; use only r1, thereby effectively divide by 256
dec r1 ; if imprecise scaling value influences result,
; result will be '+1'. Decrement to avoid neg.
; value in later subtraction
ldi r17,10 ; re-use r17 for divisor
mov r16,r1 ; save result/256
mul r16,r17 ; find remainder by multiplication of result by 10d
; result again in r1:r0
; get remainder and correct result of 'by 10'
sub r18,r0
cpi r18,10
brlo done
subi r18,10
inc r16
done:
; result in r16, remainder in r18
;optional rounding:
cpi r18,5
brlo done_r
inc r16
done_r:
(Quick hack, up to two or three cycles could be shaved off) |
_________________ Andreas
|
| |
|
|
|
|
|
Posted: Apr 01, 2006 - 03:34 AM |
|


Joined: Nov 16, 2005
Posts: 341
Location: Calgary, Alberta, Canada
|
|
I'm going to start with an apology here, because I might be thinking differently, and appear to be stiffling a very simulating and valuable discussion. However as a tutorial, I feel this thread is misplaced. It should be in the main forum.
My thoughts of a tutorial are as an introduction, a quickstart guide, a pointer to get going and excited. It is a place for Newbies to look and gain confidence, and make things. Get involved and learn. Good examples to simply use (and with luck considered later), just standard routines.
If my manager came up and said, "Eric, we stock a 4k device, the next part up is $5 and that will take us under", then I would be fantastically interested in this thread. But If I had just found this site, looking for information, I might be scared away by the academic discourse of the moment. I'm looking for gentle introductions to do this, do that. If the device cannot divide by 10, and I cannot comprehend the details that experienced people are debating, then I'm off. Another Newbie scared away.
Engineering is about compromises."Gerald!, if you can do it in a day just get on with it.". Versus, "A new part in our inventory, can't you make it smaller". Such pragmatic stuff is key to success, but we have to be careful about introducing it here (apart from sowing thoughts).
I hope my opinion is understood in a positive sense. It is only one view among so many good people.
Cheers All. |
|
|
| |
|
|
|
|
|
Posted: Apr 01, 2006 - 04:15 AM |
|

Joined: Dec 08, 2004
Posts: 4719
Location: Nova Scotia, Canada
|
|
Frankly, I don't buy that "negative remainder isn't correct" argument.
It's a non-standard representation. So are improper fractions. But just like improper fractions, the result is numerically dead-on.
AFAIC, 25 divided by 4 can be perfectly legally expressed as 6 remainder 1. And that is exactly identical to 5 remainder 5, or 7 remainder -3.
The only possible result from RetroDan's expression would still be guaranteed to be within 1 LSB of the true result. And the "traditional approach" cannot offer any better than getting within 1 LSB either.
But if it really bothers you, then you could always test the sign bit of the remainder... if it is set, then subtract 1 from the quotient and add 10 to the remainder.
Code:
DIV10: LDI B,51
MUL A,B
INC R1 ;R1=A/5
LSR R1 ;R1=(A/5)/2 = A/10
; Luke's insertion
MOV C, R1
LDI B, 10
MUL C, B
SUB A, R0
; 'A' contains the (signed) remainder.
; sign of the remainder indicates whether it is an:
; - under-estimate (positive remainder), or
; - over-esitmate (negative remainder)
; 'C' contains the quotient
; We can return here with a totally valid and numerically correct result.
; But for the sticklers among us, let's convert it
; to a "cannonical form".
SBRS A, 7 ; test the sign of result.
RJMP PositiveRemainder
; We have a negative remainder; subtract 1 from
; the quotient and add 10 to the remainder.
DEC C
ADD A, B
PositiveRemainder
; The result is now in perfect "cannonical" form.
RET
And given the limited range of values we're concerned with here, that result (after a single check of the sign on the remainder, and possibly a single subtraction of 1 / addition of 10) will be in the "cannonical" form. (Ie. a guaranteed underesitmate in the result, and a remainder that is guaranteed to be positive, and bound between 0 and 9.) |
|
|
| |
|
|
|
|
|
Posted: Apr 01, 2006 - 04:49 AM |
|

Joined: Jan 30, 2005
Posts: 852
Location: Junction City, OR USA
|
|
Broxbourne: I agree 100%. The give and take on this subject should be hashed out in another forum. Then and ONLY then should it become a tutorial. After all, what is a beginner to think when seeing the differing opinions. Most beginners are in no position to judge who is right or wrong on the details. I would like to see it here once that is done.
Rick |
|
|
| |
|
|
|
|
|
Posted: Apr 01, 2006 - 05:05 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
|
Broxbourne wrote:
I'm going to start with an apology here, because I might be thinking differently, and appear to be stiffling a very simulating and valuable discussion. However as a tutorial, I feel this thread is misplaced. It should be in the main forum.
<snip>
I hope my opinion is understood in a positive sense. It is only one view among so many good people.
Cheers All.
Thanks for the feed-back.
You may not know this, but actual tutorials are labeled [TUT] by moderator to distiguish them from other threads. The moderator has labeled this thread [DIS] for discussion. Another label is [CODE] for code segments that are not tutorials. I think this information can be found in the "Sticky-the-Bear" threads at the top of the list. |
|
|
| |
|
|
|
|
|
Posted: Apr 01, 2006 - 05:40 AM |
|


Joined: Nov 16, 2005
Posts: 341
Location: Calgary, Alberta, Canada
|
|
Dan, Thank you for the information. I did not realize the moderator was adjusting the titles of the thread according to the intended audience. I just thought this was the place to find tutorials and good examples of code and AVR applications.
Thanks for clarifying that. I'll make sure I only read [TUT] postings from now on to avoid getting confused by the more heady academic stuff, although I am still surprised to see such specialised discussion in a beginner forum.
Practically Yours. |
|
|
| |
|
|
|
|
|
Posted: Apr 01, 2006 - 12:47 PM |
|

Joined: Mar 28, 2001
Posts: 416
Location: Munich, Germany
|
|
Hi Luke,
Quote:
sorry, lfmorrison's signed remainder isn't always mathematically correct - "we should round anything from 0.0 up to 0.49999999... as 0, and anything from 0.500000000...01 up to 0.9999999999999999... as 1"
definitely didn't come through quite as intended (sorry about that):
Quote:
Frankly, I don't buy that "negative remainder isn't correct" argument
You are right and I phrased it badly: of course the signed remainder is correctly calculated (cleaner than in my example, too - 3 or four cycles saved).
It's the 'inbuild rounding' whose threshold is off by one, mathematically speaking ( ):
The quotient 'n.5' (6.5 for example if dviding 65/10) should already be rounded up to n+1 ('7'), here it is still left at '6'.
66/10 (6.6) is correctly rounded to '7' as 64/10 (6.4) becomes a rounded '6'. It's just the '.5' rounding which doesn't quite work (aren't those mathematicians a fascinating breed?).
Sorry again - apologies to Broxbourne, too, guess we got carried away a little here ... |
_________________ Andreas
|
| |
|
|
|
|
|
Posted: Apr 06, 2006 - 05:07 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
Thank you all for the wonderful feed-back. The information was invaluable and probably saved me a lot of head-scratching as you'll soon find out.
I was pleased-as-punch that my "quirky" little division by 10 routine worked. I was even more delighted to find out that it did a pretty good job of rounding to boot. However, when I got around to using it for my original purpose, I realized that rounding was the last thing I wanted.
The original use for the routine was to break-down an unsigned single byte for display on the Butterfly LCD. Well I was getting results like: 74, 75, 76, 77, 80, 80, 80, 81, 82, 83. Obviously the little DIV10 routine was working over-time and rounding, so a few fine adjustments were needed to "UN-ROUND" the results.
I know that you guys have already worked out your own versions, here is mine:
Code:
;-----------------------------------;
; RETRO (SYNTHETIC) DIVISION BY 10 ;
; ANSWER IN R1, R0=REM, A:PRESERVED ;
;-----------------------------------;
DIV10: PUSH B
LDI B,26 ;MUL BY 26
MUL A,B ;R1=A/10
PUSH R1 ;BRUTE-FORCE CALC OF REMAINDER
LDI B,10 ;CALC REM
MUL R1,B ;R0=10xR1(QUOT)
POP R1 ;RESTORE QUOT
SUB R0,A ;SUBTRACT REMx10
NODJST: NEG R0 ;MAKE POSITIVE
BRPL NONEG ;STILL NEG?
ADD R0,B ;OOPS MAKE
DEC R1 ;ADJUSTMENTS
NONEG: RET
If you're wondering why I calculate the remainder "backwards" then negate it, it's to preserve the value of A. If there's no need to preserve A then the remainder could be calculated the other-way-around and eliminate the need for the NEG R0 line. However, you should be forewarned that "letting go" a NEG R0 without very good cause could lead to a charge of racism! Other than removing that one statement I don't know if it can be optimized any further, but it sure would be nice. |
|
|
| |
|
|
|
|
|
Posted: Oct 08, 2006 - 12:58 AM |
|

Joined: Nov 15, 2004
Posts: 11
Location: Netherlands
|
|
this topic got me thinking a while ago.
I really liked the super simple idea of a divide by ten routine, but didn't like the error it has. So i was thinking a little further about the routine with the multiplier.
I was thinking the error came because of the difference between 26 and 25.6, and started thinking about a way to make this error smaller, so i came up with the idea of 65536/10 which would make 6553.6 or 6554 which has less error.
I wrote a test program on the ti83 to verify my idea, and found that it is correct up to 14 bits.
listing:
Code:
:ClrHome
:0->G
:0->F
:6554->B
:For(A,1,20000)
:If (iPart(A/10)=iPart((A*B)/65536))
:Then
:G+1->G
:Else
:F+1->F
:If F<4
:Output(F+2,1,A
:End
:Output(1,1,G
:Output(2,1,F
:End
This told me the first error would occur at 16389.
So up to 14 bit accurate!
Enough for a 8 bit number
I wrote me some asm and came up with:
Code:
div10:
; put your value in temp
; ldi temp, 11 ; example value is in temp
ldi mul_constL, low(6554)
ldi mul_constH, high(6554) ; load multiply value
mul temp, mul_constL ; mul lower
movw result_M2:result_L2, R1:R0 ; store
mul temp, mul_constH ; mul higher
clr result_H2
add result_M2, R0 ; add value
adc result_H2, R1
ret
; result is now in result_H2
The code above is hogging up to much vars. So i optimized it a little and came up with:
Code:
div10:
; input value in temp
ldi tempL, low(6554)
ldi tempH, high(6554) ; load multiply value
mul temp, tempL ; mul lower
mov tempL, R1 ; store
mul temp, tempH ; mul higher
clr tempH
add tempL, R0 ; add value
adc tempH, R1
ret
; result is now in tempH
this is better.  |
|
|
| |
|
|
|
|
|
Posted: Oct 21, 2006 - 04:26 PM |
|

Joined: Oct 12, 2006
Posts: 12
|
|
I tried running your 8 bit division code:
[quote];***************************************************************************
;* "div8u" - 8/8 Bit Unsigned Division
;*
;* This subroutine divides the two register variables "dd8u" (dividend) and
;* "dv8u" (divisor). The result is placed in "dres8u" and the remainder in
;* "drem8u".
;*
;* Number of words :14
;* Number of cycles :97
;* Low registers used :1 (drem8u)
;* High registers used :3 (dres8u/dd8u,dv8u,dcnt8u)
;***************************************************************************
.def drem8u =r15 ;remainder
.def dres8u =r16 ;result
.def dd8u =r16 ;dividend
.def dv8u =r17 ;divisor
.def dcnt8u =r18 ;loop counter
div8u: sub drem8u,drem8u ;clear remainder and carry
ldi dcnt8u,9 ;init loop counter
d8u_1: rol dd8u ;shift left dividend
dec dcnt8u ;decrement counter
brne d8u_2 ;if done
ret ; return
d8u_2: rol drem8u ;shift dividend into remainder
sub drem8u,dv8u ;remainder = remainder - divisor
brcc d8u_3 ;if result negative
add drem8u,dv8u ; restore remainder
clc ; clear carry to be shifted into result
rjmp d8u_1 ;else
d8u_3: sec ; set carry to be shifted into result
rjmp d8u_1
When I build this code, it gives me an error: "Register already defined by the .def directive". The above code refers to R16 with two different labels which should be okay. Is there a reason why it gives me this error?
Thanks,
John |
|
|
| |
|
|
|
|
|
Posted: Oct 21, 2006 - 05:13 PM |
|

Joined: Nov 17, 2004
Posts: 13834
Location: Vancouver, BC
|
|
| I've gotten this error recently as well. I think that it used to work, but it has changed in recent versions of AVR Studio. Not sure if this is a bug or a deliberate change. |
_________________ Regards,
Steve A.
The Board helps those that help themselves.
|
| |
|
|
|
|
|
Posted: Oct 21, 2006 - 11:07 PM |
|

Joined: Oct 20, 2006
Posts: 4
|
|
First, let me say that I have no clue about AVR coding. An Arduino board is in the mail, but I haven't gotten it yet. I love digital math, though. So here's a little different take on the OP's trick.
Dividing is the same thing as multiplying by the inverse, so dividing by 10 is the same as multiplying by 0.1. The problem is that there isn't a neat binary representation of 0.1. But it turns out that you can use a good enough 16-bit representation of 0.1 to get accurate results (meaning the same number as division by 10, rounded down as expected). With some simple fixed-point math you can then do your multiply, discard the mantissa (the bit after the . (or , for Europeans)) and there you go.
What fixed-point format to use? It'll be obvious in a minute. Let's just assume 8 bits of mantissa for now and start finding the binary representation of 0.1d (decimal). Really what you're looking for is which powers of -2 you need to add up to get as close to 0.1 as possible. If you do that, you find:
1/10 =~ 0/2 + 0/4 + 0/8 + 1/16 + 1/32 + 0/64 + 0/128 + 1/256
the binary representation for this is:
00000000.00011001
So that's the 16-bit number you should multiply the original number by to divide by 10. But you'll notice we can use extra mantissa bits, as long as the top byte is 0. (If the top byte isn't 0 we might get overflow when we multiply.) So let's use 11 mantissa bits:
1/10 =~ 0/2 + 0/4 + 0/8 + 1/16 + 1/32 + 0/64 + 0/128 + 1/256 + 1/512 + 0/1024 + 1/2048
or 00000.00011001101
Here it is all nicely coded up in C. I've tested it for 0-255 and it gives the same result as C's /10 operation for every one. gcc did some awful things to the assembly, but I'll leave that part up to you guys because you're much better at it than I am.
Code:
uint8_t div10_fp(uint8_t number)
{
// 1/10th with 11 mantissa bits
// 1/10th = 0/2 + 0/4 + 0/8 + 1/16 + 1/32 + 0/64 + 0/128 + 1/256 + 1/512 +
// 0/1024 + 1/2048
// 0.00011001101
uint16_t tenth = 0xcd;
// result with 11 mantissa bits
uint16_t result = number * tenth;
9a: 99 27 eor r25, r25
9c: 2d ec ldi r18, 0xCD ; 205
9e: 30 e0 ldi r19, 0x00 ; 0
a0: 82 9f mul r24, r18
a2: a0 01 movw r20, r0
a4: 83 9f mul r24, r19
a6: 50 0d add r21, r0
a8: 92 9f mul r25, r18
aa: 50 0d add r21, r0
ac: 11 24 eor r1, r1
ae: ca 01 movw r24, r20
// Drop mantissa to round down.
return result >> 11;
b0: 89 2f mov r24, r25
b2: 99 27 eor r25, r25
b4: 86 95 lsr r24
b6: 86 95 lsr r24
b8: 86 95 lsr r24
}
ba: 99 27 eor r25, r25
bc: 08 95 ret
You can do the same thing for dividing by any constant, and I'm actually kind of surprised gcc doesn't just do this for me.
Tim |
|
|
| |
|
|
|
|
|
Posted: Oct 22, 2006 - 03:35 PM |
|

Joined: Nov 15, 2004
Posts: 11
Location: Netherlands
|
|
That is a great idea froody!
I had a look at your code, and optimized it a bit.
Code:
; the input number must be in register "input"
; value/10 is in register "result"
; register temp1 and R1:R0 are clobbered
div10:
ldi temp1, 205
mul temp1, input
lsr R1
lsr R1
lsr R1
mov result, R1
ret
edit...
i have written a little test program in ti basic.
Code:
:ClrHome
:0->G
:0->F
:For(N,0,1200)
:If (iPart(N/10)=iPart((N*205)/2048))
:Then
:G+1->G
:Else
:F+1->F
:If F<5
:Output(2+F,1,N
:End
:Output(1,1,"G="
:Output(1,3,G
:Output(2,1,"F="
:Output(2,3,F
:End
It shows that it is correct up to 1028 and it fails at 1029, more than enough for 8 bit |
|
|
| |
|
|
|
|
|
Posted: Oct 22, 2006 - 11:13 PM |
|

Joined: Oct 20, 2006
Posts: 4
|
|
brum, wow. gcc really sucks, huh? Nice job on optimizing the code. I'm going to install gcc 4.1 soon. Hopefully it won't suck as much.
Meanwhile, I clarified my original post a bit, and wrote some more, and now it's turned into a little article:
How to divide an unsigned 8-bit integer by a constant.
(c) 2006 by Tim Newsome <nuisance@casualhacker.net>
Thanks to RetroDan and several other on avrfreaks.net for the inspiration.
Dividing is slow, so let's avoid that. Instead, multiply by the inverse. You can write the inverse as a fixed-point binary number. Just like each digit in a normal binary number represents a power of 2, each digit after the decimal point in a binary number represents a power of -2. So:
10.11 = 1 * 2^1 + 0 * 2^0 + 1 * 2^-1 + 1 * 2^-2 = 2 + 1/2 + 1/4 = 2.75
Example: 0.5 = 1/2 = 0.1 (binary, 1-bit mantissa)
Example: 0.25 = 0/2 + 1/4 = 0.01 (binary, 2-bit mantissa)
Unfortunately there isn't a precise binary representation of most fractions, so you have to use an approximation. The same problem happens in decimal numbers, just not as often. For instance, 1/3 can be approximated as 0.3, which has an error of 0.3-1/3=-1/30. A better approximation is 0.33, which has an error of -1/300. The more digits you add, the smaller the error becomes. The same thing is true in binary:
1/3 =~ 1/2 = 0.1, error = 0.167
1/3 =~ 0/2 + 1/4 = 0.01, error = -0.0833
1/3 =~ 0/2 + 1/4 + 1/8 = 0.011, error = 0.0417
1/3 =~ 0/2 + 1/4 + 0/8 + 1/16 = 0.0101, error = -0.0208
Because we can't always find the precise inverse, we need to find one that's close enough, and gives us the right answer by rounding down always. (Rounding down just means ignoring the mantissa bits, which is easy to do.) That means we want to find an approximation that is at least as big as the real inverse. In other words, the error (computed as approximation minus real value) has to be positive. Furthermore, the error multiplied by 255 (our largest input value) must be smaller (less negative) than 1/divisor. For dividing by 3, the error must lie between 0 and 1/(3*255). Let's rework 1/3 keeping the error positive, and seeing how many bits we need before the error is small enough that we can use the approximation:
1/3 =~ 1/2 = 0.1, error = .167
1/3 =~ 1/2 + 0/4 = 0.10, error = .167
1/3 =~ 0/2 + 1/4 + 1/8 = 0.011, error = .0417
1/3 =~ 0/2 + 1/4 + 1/8 + 0/16 = 0.0110, error = .0417
1/3 =~ 0/2 + 1/4 + 0/8 + 1/16 + 1/32 = 0.01011, error = .0104
1/3 =~ 0/2 + 1/4 + 0/8 + 1/16 + 1/32 + 0/64 = 0.010110, error = .0104
1/3 =~ 0/2 + 1/4 + 0/8 + 1/16 + 0/32 + 1/64 + 1/128 = 0.0101011, error = .00260
1/3 =~ 0/2 + 1/4 + 0/8 + 1/16 + 0/32 + 1/64 + 1/128 + 0/256 = 0.01010110, error = .00260
1/3 =~ 0/2 + 1/4 + 0/8 + 1/16 + 0/32 + 1/64 + 0/128 + 1/256 + 1/512 = 0.010101011, error = .000651
So 9 bits of mantissa is enough, and our inverse is 0.010101011. Here is some C code that takes advantage of this:
Code:
uint8_t div3(uint8_t number)
{
// Compute the intermediate value, with 9 mantissa bits.
uint16_t intermediate = (uint16_t) number * 0xab;
// Return the result, chopping off the mantissa bits.
return intermediate >> 9;
}
A good compiler should compile that down to a single multiply instruction, a shift, and maybe a few extra instructions. This is drastically faster than using the generic divide code.
Let's do one more example, for division by 10. First figure out the inverse using the fewest number of mantissa bits:
1/10 =~ 1/2, error = 0.400
1/10 =~ 0/2 + 1/4, error = 0.150
1/10 =~ 0/2 + 0/4 + 1/8, error = 0.025
1/10 =~ 0/2 + 0/4 + 1/8 + 0/16, error = 0.025
1/10 =~ 0/2 + 0/4 + 1/8 + 0/16 + 0/32, error = 0.025
1/10 =~ 0/2 + 0/4 + 0/8 + 1/16 + 1/32 + 1/64, error = .00938
1/10 =~ 0/2 + 0/4 + 0/8 + 1/16 + 1/32 + 0/64 + 1/128, error = .00156
1/10 =~ 0/2 + 0/4 + 0/8 + 1/16 + 1/32 + 0/64 + 1/128 + 0/256, error = .00156
1/10 =~ 0/2 + 0/4 + 0/8 + 1/16 + 1/32 + 0/64 + 1/128 + 0/256 + 0/512, error = .00156
1/10 =~ 0/2 + 0/4 + 0/8 + 1/16 + 1/32 + 0/64 + 0/128 + 1/256 + 1/512 + 1/1024, error = .000586
1/10 =~ 0/2 + 0/4 + 0/8 + 1/16 + 1/32 + 0/64 + 0/128 + 1/256 + 1/512 + 0/1024 + 1/2048, error = .0000977
Finally we've reached the point where 255 * error is less than 1/10. So we need 11 mantissa bits, and our inverse is 0.00011001101. Here's the C code:
Code:
uint8_t div10(uint8_t number)
{
// Compute the intermediate value, with 11 mantissa bits.
uint16_t intermediate = (uint16_t) number * 0xcd;
// Return the result, chopping off the mantissa bits.
return intermediate >> 11;
}
So there you have it. Fast division by a constant of an unsigned 8-bit integer. If you followed what I've done here, you should be able to extend this to 16-bit or any other sized number. |
Last edited by froody on Oct 22, 2006 - 11:50 PM; edited 1 time in total
|
| |
|
|
|
|
|
Posted: Oct 22, 2006 - 11:29 PM |
|


Joined: Jan 23, 2004
Posts: 9825
Location: Trondheim, Norway
|
|
Froody,
Good article. However, in your code examples don't you need an explicit cast to the "number" parameter to force GCC to use 16-bit operations? I would expect GCC to do the multiplication as 8-bit first, then implicit cast to a 16-bit value when storing.
- Dean  |
_________________ Atmel Studio 6.1 is now released, grab it here.
Report AS6/ASF bugs here.
|
| |
|
|
|
|
|
Posted: Oct 22, 2006 - 11:49 PM |
|

Joined: Oct 20, 2006
Posts: 4
|
|
|
abcminiuser wrote:
Good article. However, in your code examples don't you need an explicit cast to the "number" parameter to force GCC to use 16-bit operations? I would expect GCC to do the multiplication as 8-bit first, then implicit cast to a 16-bit value when storing.
Thanks. Good point about the typecast. I think it's OK, because the hex constant is implicitly of type int, which is 16 bits, on AVR (right?). So it all works out. But it certainly is safer to add the typecast, so I'll edit my post and stick it in there.
Tim |
|
|
| |
|
|
|
|
|
Posted: Oct 23, 2006 - 01:54 AM |
|

Joined: Nov 17, 2004
Posts: 13834
Location: Vancouver, BC
|
|
|
Quote:
You can do the same thing for dividing by any constant, and I'm actually kind of surprised gcc doesn't just do this for me.
...
gcc really sucks, huh?
I don't know how you can come to this conclusion from this thread. Up until your post this entire thread has been about assembly language and therefore has absolutely nothing to do with gcc. Also, the C language has absolutely no knowledge about fixed point numbers, so how could you possibly expect it to optimize your code in this manner? |
_________________ Regards,
Steve A.
The Board helps those that help themselves.
|
| |
|
|
|
|
|
Posted: Oct 23, 2006 - 03:51 AM |
|

Joined: Oct 20, 2006
Posts: 4
|
|
|
Koshchi wrote:
Quote:
gcc really sucks, huh?
I don't know how you can come to this conclusion from this thread. Up until your post this entire thread has been about assembly language and therefore has absolutely nothing to do with gcc. Also, the C language has absolutely no knowledge about fixed point numbers, so how could you possibly expect it to optimize your code in this manner?
In the first post you quoted, I showed some C code among the assembly code. You can see that gcc's generated code is really inefficient. For instance it does a mul by a register known to be 0. (Note that I've since installed gcc 4.1.1 and it does a lot better, although there are still some obvious optimizations possible.)
Look at this C code:
Code:
uint8_t function(void) {
volatile uint8_t val = 123;
return val / 10;
}
When I compile it, I get a call to a general divide routine. But the compiler knows I'm dividing by a constant, so it could do the exact optimization I outlined. Effectively it would rewrite the code to look like:
Code:
uint8_t function(void) {
volatile uint8_t val = 123;
return ((uint16_t) val * 0xcd) >> 11;
}
That is actually slightly larger than the original code, but a lot faster. If you really care about code size you could just use the div10() function. Then function() would be 1 instruction shorter (because you don't have to pass 10 to the divide function) but you do take the size penalty of div10() (10 instructions in my gcc 4.1.1 build).
Tim |
|
|
| |
|
|
|
|
|
Posted: Jun 14, 2007 - 09:44 AM |
|


Joined: Dec 15, 2003
Posts: 4402
Location: Slovakia, Bratislava
|
|
| Try removing the "voltatile" keyword. You'll get a far better output. |
_________________ There are pointy haired bald people.
Time flies when you have a bad prescaler selected.
|
| |
|
|
|
|
|
Posted: Jun 14, 2007 - 11:27 AM |
|

Joined: Mar 10, 2006
Posts: 31
Location: Denmark
|
|
daqq it's an old thread, nearly 3/4 of an year old
/Safari |
|
|
| |
|
|
|
|
|
Posted: Jun 14, 2007 - 11:41 AM |
|


Joined: Aug 29, 2006
Posts: 229
Location: Kingston, Jamaica
|
|
Still, I'm glad he brought it back since this will help me.
Edward |
|
|
| |
|
|
|
|
|
Posted: Jun 14, 2007 - 06:17 PM |
|


Joined: Apr 16, 2001
Posts: 3522
Location: Phoenix, Arizona
|
|
Nothing like a nice jolt when I saw who the OP was! Then I realized it's an old thread.  |
|
|
| |
|
|
|
|
|
Posted: Dec 19, 2007 - 05:27 AM |
|

Joined: Jun 25, 2007
Posts: 7
|
|
Wouldn't it be nice to not deal with the rounding error? If we were to add a compensation amount before multiplying by 26, we then have no error:
Code:
.def temp1 = r16
.def temp2 = r17
.def A = r18
// Input: A
// Output: r1
DivBy10:
// calculate the amount needed to add to A to compensate for the rounding error to come
// compensation = (A + 3) / 64
mov temp2, A
ldi temp1, 3
add temp2,temp1
// "divide by 64" shortcut
swap temp2
andi temp2, 15
lsr temp2
lsr temp2
// now use 'A - compensation'
mov temp1, A
sub temp1, temp2
// Multiply by magic number
ldi temp2, 26
mul temp1, temp2
ret
I've checked it out on my computer and seems like this works good, and doesn't eat many more cycles than the original. |
|
|
| |
|
|
|
|
|
Posted: Feb 24, 2009 - 08:47 PM |
|


Joined: Dec 07, 2008
Posts: 347
Location: germany
|
|
|
Quote:
Code:
; the input number must be in register "input"
; value/10 is in register "result"
; register temp1 and R1:R0 are clobbered
div10:
ldi temp1, 205
mul temp1, input
lsr R1
lsr R1
lsr R1
mov result, R1
ret
This one rocks! Saved my day, today  |
_________________ Jan Kießling
74HC Overview
Flowchart CAD
schematic CAD
|
| |
|
|
|
|
|
Posted: Mar 06, 2009 - 03:54 AM |
|

Joined: Jan 18, 2005
Posts: 38
|
|
| i suppose one could allocate a huge look up table ln FLASH on the larger AVR range ... practically instant result which i have done many many years ago... (code lost to time) |
|
|
| |
|
|
|
|
|
Posted: Dec 21, 2009 - 08:04 PM |
|

Joined: Sep 29, 2001
Posts: 53
Location: Superior, Colorado USA
|
|
Thanks to all for alively discussion. I need help on a 16 bit divide:
I have been using the DIV16u straight from apnote 1011 (AVR200) since 1998, and found a possible error.
When you have $7080 (28,800) divided by $3841 (14,401), the correct answer is: $0001 and remainder $FFF6 (1.99986).
However, the Atmel DIV16u returns $0001 remainder $383F, quite an error.
When you have $7080 (28,800) divided by $3840 (14,400), the correct answer is: $0002 and remainder $0000 (2.00000), and the Atmel code returns the correct same answer.
Please help, I need a fix fast, or am I wrong?
Vern Bunch, AVR consultant and AVR designer |
|
|
| |
|
|
|
|
|
Posted: Jan 05, 2010 - 06:41 PM |
|


Joined: May 26, 2004
Posts: 2538
Location: Las Vegas, Nevada
|
|
|
vbunch wrote:
When you have $7080 (28,800) divided by $3841 (14,401), the correct answer is: $0001 and remainder $FFF6 (1.99986).
However, the Atmel DIV16u returns $0001 remainder $383F, quite an error.
Dude, WTF?
Let's do this manually, by long division.
$3841 goes into $7080 once.
Code:
(hex assumed)
____1___
3841 ) 7080
3841 -
----
383F
$3841 won't go into $383F, so the quotient is 1 and the remainder is $383F. The Atmel routine returns the correct answer.
The number $FFF6 might possibly occur in some floating point representation, but can never happen in unsigned integer math, since it is larger than the dividend. |
|
|
| |
|
|
|
|
|
Posted: Jan 05, 2010 - 07:02 PM |
|


Joined: May 26, 2004
Posts: 2538
Location: Las Vegas, Nevada
|
|
Oh, I see where you got it from.
If you want to generate the fractional bits, take the remainder, multiply by 16 (shift left 4) and divide this by the divisor. Repeat until you have enough "(hexa)decimal" places.
Code:
____1FF...
3841 ) 7080
3841 -
----
383F <<4
3841 * F -
----
3821 <<4
3841 * F -
----
3641
... etc
Again, this is just confirming that the original division routine got it right. You didn't carry out enough steps to format the answer the way you expected.
Note that since the first remainder is guaranteed to be less than the divisor, each successive step is guaranteed to produce a quotient less than 16, ie a single hex digit. You have to manually concatenate these single digits to the number of places you want. |
|
|
| |
|
|
|
|
|
Posted: Jan 08, 2010 - 05:53 PM |
|

Joined: Sep 29, 2001
Posts: 53
Location: Superior, Colorado USA
|
|
Thank you my friends for your correct responses. It is my fault that I mentioned "remainder" when I expected a fully usable result, without having to divide again, and one that had 16 bit accuracy. So the Atmel division routine is "correct" but unusable without more computation. Since there was no mention that the result is unusable without further division, I niavely skipped the true math definition of "remainder" and went right to "answer", I should not have done that.
I have since written an assembly language routine that gives me a correct (and directly usable) 16 bit division in one run, and it is less time (156 cycles) then the published 16 bit divide routine. It took 3 weeks at 104 hours a week, and I'm way beyond tired.  |
|
|
| |
|
|
|
|
|
Posted: Jan 08, 2010 - 06:01 PM |
|


Joined: Jul 18, 2005
Posts: 62281
Location: (using avr-gcc in) Finchingfield, Essex, England
|
|
|
Quote:
I have since written an assembly language routine that gives me a correct (and directly usable) 16 bit division in one run, and it is less time (156 cycles) then the published 16 bit divide routine. It took 3 weeks at 104 hours a week, and I'm way beyond tired
I guess it's too late to say that you could have "borrowed" the source for a / routine from an open source C compiler? |
_________________
|
| |
|
|
|
|
|
Posted: Jan 08, 2010 - 06:09 PM |
|


Joined: May 26, 2004
Posts: 2538
Location: Las Vegas, Nevada
|
|
|
Quote:
I guess it's too late to say that you could have "borrowed" the source for a / routine from an open source C compiler?
Actually I suspect in this case he could not, because the answer he wanted was in a fixed point format. He got simple integer division from the appnote. The compiler would have given him full float, which would challenge me to adapt to a fixed format in assembler (and I consider myself rather good at it).
As a matter of interest, if we did look at the source for a / routine in an open source C compiler - let's say, and why not, the GCC compiler - would we be looking at assembler, or C? |
|
|
| |
|
|
|
|
|
Posted: Jan 08, 2010 - 06:16 PM |
|


Joined: Jul 18, 2005
Posts: 62281
Location: (using avr-gcc in) Finchingfield, Essex, England
|
|
|
Quote:
would we be looking at assembler, or C?
Assembler. Here's a snippet from gcc/config/avr/libgcc.s:
Code:
/*******************************************************
Division 16 / 16 => (result + remainder)
*******************************************************/
#define r_remL r26 /* remainder Low */
#define r_remH r27 /* remainder High */
/* return: remainder */
#define r_arg1L r24 /* dividend Low */
#define r_arg1H r25 /* dividend High */
/* return: quotient */
#define r_arg2L r22 /* divisor Low */
#define r_arg2H r23 /* divisor High */
#define r_cnt r21 /* loop count */
#if defined (L_udivmodhi4)
.global __udivmodhi4
.func __udivmodhi4
__udivmodhi4:
sub r_remL,r_remL
sub r_remH,r_remH ; clear remainder and carry
ldi r_cnt,17 ; init loop counter
rjmp __udivmodhi4_ep ; jump to entry point
__udivmodhi4_loop:
rol r_remL ; shift dividend into remainder
rol r_remH
cp r_remL,r_arg2L ; compare remainder & divisor
cpc r_remH,r_arg2H
brcs __udivmodhi4_ep ; remainder < divisor
sub r_remL,r_arg2L ; restore remainder
sbc r_remH,r_arg2H
__udivmodhi4_ep:
rol r_arg1L ; shift dividend (with CARRY)
rol r_arg1H
dec r_cnt ; decrement loop counter
brne __udivmodhi4_loop
com r_arg1L
com r_arg1H
; div/mod results to return registers, as for the div() function
mov_l r_arg2L, r_arg1L ; quotient
mov_h r_arg2H, r_arg1H
mov_l r_arg1L, r_remL ; remainder
mov_h r_arg1H, r_remH
ret
.endfunc
#endif /* defined (L_udivmodhi4) */
#if defined (L_divmodhi4)
.global __divmodhi4
.func __divmodhi4
__divmodhi4:
.global _div
_div:
bst r_arg1H,7 ; store sign of dividend
mov __tmp_reg__,r_arg1H
eor __tmp_reg__,r_arg2H ; r0.7 is sign of result
rcall __divmodhi4_neg1 ; dividend negative : negate
sbrc r_arg2H,7
rcall __divmodhi4_neg2 ; divisor negative : negate
rcall __udivmodhi4 ; do the unsigned div/mod
rcall __divmodhi4_neg1 ; correct remainder sign
tst __tmp_reg__
brpl __divmodhi4_exit
__divmodhi4_neg2:
com r_arg2H
neg r_arg2L ; correct divisor/result sign
sbci r_arg2H,0xff
__divmodhi4_exit:
ret
__divmodhi4_neg1:
brtc __divmodhi4_exit
com r_arg1H
neg r_arg1L ; correct dividend/remainder sign
sbci r_arg1H,0xff
ret
.endfunc
#endif /* defined (L_divmodhi4) */
|
_________________
|
| |
|
|
|
|
|
Posted: Jan 15, 2010 - 03:50 PM |
|

Joined: Nov 02, 2009
Posts: 3239
Location: Zelenograd, Russia
|
|
Just for fun, I have castrated the OP's div_10 code even further:
Code:
;----------------------------------------
; 8-bit division by 10 routine.
;
; Input: R0 = dividend.
; Output: R1 = result = dividend/10.
;
; Registers R0 and R16 are clobbered.
; 5 words/10 clocks including RET.
;
div10:
ldi r16,205
mul r0,r16
ldi r16,32
mul r1,r16
ret
;----------------------------------------
|
|
|
| |
|
|
|
|
|
Posted: Apr 30, 2010 - 11:37 AM |
|

Joined: Jul 18, 2007
Posts: 154
Location: Germany
|
|
The OP's div_10 code was buggy at values like 19, 29, 39, ... or at 21, 31, 41, ... depending on whether the correction (INC) was done or not. Here's a fixed version, with remainder calculation (remainder lines marked, strip them if not needed, saves 5 cycles):
Code:
// input: A, output/value: R1, output/remainder: A
// total cycles: 13
PUSH A //2 Me (for remainder)
INC A //1 Me
CPI A, 0 //1 Me
BRNE x //1,2 Me
DEC A //1 Me
x: //0 Me
LDI B, 51 //1 RetroDan
MUL A, B //2 RetroDan
LSR R1 //1 RetroDan
POP A //2 Me (for remainder)
SUB A, R1 //1 Me (for remainder)
Hope this is useful for someone out there.
My approach:
I made a list with values, RetroDan and RetroDan with INC. It looked somehow like this:
8 - 0 0
9 - 0 1
10 - 0 1
11 - 1 1
So one was too low, the other too high around exact matches (i.e. dividable by 10 with no remainder). So I took the one without INC (saves one cycle), INCreased the value before giving it to the RetroDan stuff, so all results were accurate - but "255" which becomes 0.
A quick check for the overflow after increasing (i.e. value=0), decrease it again (make it 255 again, which gives the correct result) and done.
P.S: I know it's an old thread, but I think it's an important one that's read by a lot of people for finding a good divide-by-10 routine. |
_________________ Markus
|
| |
|
|
|
|
|
Posted: May 03, 2010 - 10:49 AM |
|

Joined: Sep 05, 2001
Posts: 2497
|
|
But I would ask at first, whats the purpose to do a division by 10?
Mostly it was the decimal output conversion.
The fastest way to do so, was the subtraction method, since no divison nor multiplication was needed.
A small example for 16 bit ASCII output can be seen on:
http://www.avrfreaks.net/index.php?name ... 721#453721
Peter |
|
|
| |
|
|
|
|
|
Posted: May 03, 2010 - 08:20 PM |
|

Joined: Nov 17, 2004
Posts: 13834
Location: Vancouver, BC
|
|
|
Quote:
The fastest way to do so, was the subtraction method, since no divison nor multiplication was needed.
But repeated subtraction that that routine uses is division. And the multiplies when done with the hardware multiply are very fast. |
_________________ Regards,
Steve A.
The Board helps those that help themselves.
|
| |
|
|
|
|
|
|
|
|