Search |
 |
|
 |
| Author |
Message |
|
|
Posted: Mar 31, 2006 - 04:30 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN's DIRTY MATH TRICKS
PURPOSE: On Going Tutorial on Math Tricks in AVR Native Machine Language for Beginners.
PREAMBLE: While trying to help someone in another FORUM with a Division-by-60 routine to converting seconds to minutes, someone made the remark:
Quote:
What RetroDan has is a quick and dirty approximation!
I think it was meant as an insult, but it gave me a great name for the type of binary routines I love: Dirty Math!
Quick & Dirty Routines that are small and super fast.
CAVEAT LECTOR: If you don't like taking chances and experimenting and don't really care about how big your programs are, or how fast they excecute then this is not a place for you. May I recommend the Atmel Appnotes #200, 201 and 204 that are chock full great standard math routines for the AVR microcontrollers. |
Last edited by RetroDan on Mar 31, 2006 - 10:40 AM; edited 4 times in total
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 04:38 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN's MATH TIPS
INTRODUCTION:
Here are a few quick tips before we start:
Code:
DIRTY DAN'S TIP #1: ALWAYS USE A POWER OF TWO
If you are writing any program or routine, even if the savings are not obvious at first, pick numbers that are powers-of-two. Somewhere down the line you'll probably be glad you did. Make it a habit and your microcontroller will thank you too.
For example: if you need to take an average, don't take 10 readings and divide by 10, take 8 or 16. If you're setting up a table, don't make entries 10 bytes long, make them 8 or 16. It will greatly simplify things later. There's nothing magical about powers-of-ten to a microprocessor. They're set up for powers-of-two and the number of fingers we have means absolutely nothing to them.
Code:
DIRTY DAN'S TIP #2: AVOID DIVISION AT ALL COSTS!
While Addition, Subtraction and Multiplication can be done in just a few clock cycles, most division involves looping and uses a lot of processor time. Division is normally done by repeated subtractions, the "standard" 16x16 division routine from the Atmel Appnotes averages over 250 clocks cycles compare that to just 9 for a 16x16 multiply and just 2 for a 16x16 Addition.
One problem with high-level languages is that this dirty little fact is kept hidden from the programer. So while you're happily writing code that does Bilinear Interpolation or Cubic Splines, your processor is swearing at you under his breath everytime you ask for a division. You're wondering why things are grinding to a halt with smoke drifting up from the PCB.
Code:
DIRTY DAN'S TIP #3: IF YOU MUST DIVIDE, DO IT BY POWERS OF TWO!
Division by a power of two can be easily and quick accoplished with just a few right-shifts. Each time you shift your bits to the right you are essentially dividing your number in half. So to divide by four you just shift-right twice.
Code:
DIRTY DAN'S TIP #4: IF YOU MUST DIVIDE, DO IT ONLY ONCE!
If your equation has multiple divisions involved, try and re-arrange your equation so you only divide once.
For example: if you need to divide by 10 and later divide by 12, perhaps you can re-arrange things and just divide once by 120 and be done with it. |
Last edited by RetroDan on Apr 13, 2006 - 09:11 AM; edited 1 time in total
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 05:50 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN
PREAMBLE:
I've been working on my own driver routines for the LCD Display on the Butterfly (same as STK502) that are super small and fast. When it came time to write a little routine that displays a number in decimal, rather than Hex, I ran into my old nemesis DIVISION!
To be more precises, Division-by-Ten. Since we humans are so fond of Division-by-Ten, I thought it would make an excellent starting point for our examination of my Dirty Math Tricks.
For this routine, all we want is to take a single byte value and display it on the LCD Display in Decimal Number Form. So we need to take a number with a maximum value of $FF and break it into three: 2 / 5 / 5.
The way most of us humans would do this is to first divide by it by 100 and then by 10 and then keep the remainder;
Code:
255 / 100 = 2 (remainder of 55)
55 / 10 = 5
remainder = 5
Oh no, that means we need to have two division routines! One for division by 100 and another by ten. Mybe we should just use a general division routine and change the divisor from 100 to 10 rather than have two routines. So I go and check the Atmel Appnote 200 looking for a general 8x8 unsigned division routine and here is what I found:
Code:
;***************************************************************************
;* "div8u" - 8/8 Bit Unsigned Division
;*
;* This subroutine divides the two register variables "dd8u" (dividend) and
;* "dv8u" (divisor). The result is placed in "dres8u" and the remainder in
;* "drem8u".
;*
;* Number of words :14
;* Number of cycles :97
;* Low registers used :1 (drem8u)
;* High registers used :3 (dres8u/dd8u,dv8u,dcnt8u)
;***************************************************************************
.def drem8u =r15 ;remainder
.def dres8u =r16 ;result
.def dd8u =r16 ;dividend
.def dv8u =r17 ;divisor
.def dcnt8u =r18 ;loop counter
div8u: sub drem8u,drem8u ;clear remainder and carry
ldi dcnt8u,9 ;init loop counter
d8u_1: rol dd8u ;shift left dividend
dec dcnt8u ;decrement counter
brne d8u_2 ;if done
ret ; return
d8u_2: rol drem8u ;shift dividend into remainder
sub drem8u,dv8u ;remainder = remainder - divisor
brcc d8u_3 ;if result negative
add drem8u,dv8u ; restore remainder
clc ; clear carry to be shifted into result
rjmp d8u_1 ;else
d8u_3: sec ; set carry to be shifted into result
rjmp d8u_1
Only 14 program steps is not bad. Almost 100 clock cycles per call and we need to call it twice fort a single byte conversion. That's about 200 clock cycles each. I'm sure we can do better than that... |
Last edited by RetroDan on Apr 13, 2006 - 09:13 AM; edited 3 times in total
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 06:24 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued)
Previously we divided our 1 byte number by 100 then by 10 and when it comes time to expand our routine to handle 16 bit numbers we'll need to divide by 10,000, 1,000, 100 and finally by 10. At approx. 100 cycles per call that's 400 clock cycles per number.
Maybe we should look at the problem again from the "other side" the right-hand-side instead of left.
Code:
255 / 10 = 25 Remainder = 5
25 / 10 = 2 Remainder = 5
2 / 10 = 0 Remainder = 2
Well we've managed to extract the 255 again, but now we need Three calls to the division routine and for a 16 byte number we'll need Five. The only thing we have going for us using this method is that if we can find a "special" division routine for Division-by-Ten that is smaller and/or faster than the Atmel routine, we might have something we can work with... |
Last edited by RetroDan on Mar 31, 2006 - 06:50 AM; edited 2 times in total
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 06:46 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued)
I remember someone posting a file of Division-by-a-Constant Routines, so why don't we see if we can get luck there.
Here is what I found, exactly what we're looking for, a specialized routine for Division-by-Ten:
Code:
;***************************************************************************
;* Function "Div16_10"
;* Divides an unsigned 16 bit word (XH:XL) by 10
;* Returns quotient in YH:YL and remainder in XL
;*
;* Author: Andreas Lenze (andreas.lenze@t-online.de)
;* Equations by D: W. Jones:
;*
;* Reciprocal mul w. extra precision:
;* unsigned int A;
;* unsigned int scaled_reciprocal = 0xCCCD;
;* unsigned int Q; /* the quotient */
;*
;* Q = ((A * 0xCCCD) >> 19)
;*
;* Uses: high regs: 7 (r17, r18, r19, X, Y)
;* low regs: 3 (r0, r1, r2)
;*
;* words: 36 (w. push/pop = 8 words)
;* cycles: 46 (w. push/pop = 16 cycles)
;*
;* Note: Hardware multiplier required ("mul" instruction)
;*
;***************************************************************************
Div16_10:
push r2
push r19
push r18
push r17
ldi YH,0xCC ; scaled reciprocal for /10
ldi YL,0xCD
; Q = A * 0xCCCD
; (r19:r18:r17[:rXX] = XH:XL * YH:YL)
clr r2
mul XH, YH ; ah * bh
movw r19:r18, r1:r0
mul XL, YL ; al * bl
mov r17,r1 ; r0 to [rXX] is superfluous
mul XH, YL ; ah * bl
add r17, r0
adc r18, r1
adc r19, r2
mul YH, XL ; bh * al
add r17, r0
adc r18, r1
adc r19, r2
; Q = Q >> 16: use r19:r18 as word
; Q = Q >> 3
lsr r19 ; do the last 3 shifts
ror r18
lsr r19
ror r18
lsr r19
ror r18
; r19:r18 now "Q" (= result >> 19)
; R = A - 10*Q;
ldi r17,10 ; multiply r19:r18 by 10
mul r18, r17 ; al * bl
sub XL,r0
clr XH
movw YL,r18 ; make copy of Q
; XL holds "R"
; YH:YL holds "Q"
pop r17
pop r18
pop r19
pop r2
ret
Wow, I looks huge! Let's see 46 program steps, that's a little on the heavy side, and 10 registers used ouch!. However, it is a 16 bit Divide-by-Ten Routine and it excecutes in about 1/2 the time that the Atmel 8 bit routine did, so we're moving in the right direction. Two or Three calls to this would be 100 to 150 cycles versus the 200 to 300 using the Atmel Routine.
If we can't come up with something on our own, perhaps we could strip this down to an 8 bit routine, or perhaps just bite the bullet because sooner-or-later we're gonna' need to convert 16 bit values anyway... |
Last edited by RetroDan on Apr 13, 2006 - 09:18 AM; edited 2 times in total
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 07:04 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued)
A while back, while literally sitting-on-the-throne, I happened to realized that 255 is evenly divisable by 5 and that it is only one away from 256 which is a Power-of-Two. After a while, it becomes a habit to always be on-the-lookout for Powers-of-Two. Either that or I'm slowly drifting into the world of padded-rooms and nice doctors in white smocks. Of all the things I've lost, I miss my mind the most.
To divide by Ten we might be able to first Divide-by-Five then Right-Shift to Divide-by-Two and the result would be a Division-by-Ten. I stored that fact away and wondered if the 255-256 link could be put to use sometime.
Pehaps today is a good day to drag it out of the bottom drawer and dust it off...
Let's see: 255 divided by 5 is 51 and 255 is awlful close to 256 and furthermore division by 256 is super-super easy and requires ZERO program steps. The "Dirty Trick" to that is to simply ignore the least significant Byte and we've essentially Divided our number by 256.
Since 255 is almost equal to 256, then 255/256 is almost equal to 1
Since multiplication by one does not change a value
Then 1/5 times 255/256 shoud be pretty close to 1/5
Since division by 256 can be accomplished with the Dirty Math Trick of ignoring the lowest byte, we'll leave that to last and our equation becomes:
1/5 time 255 and later we'll truncate the lower byte.
Code:
1/5 x 255 = 255 / 5 = 51
So mathematically speaking if we multiply our number by 51 and later
ignore the lower byte, we should get a pretty good approximation of
Division-by-Five. That's where our first routine will start... |
|
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 07:59 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued)
Let's start by writing a routine that muliplies by 51 and we'll look at the high byte and see if our mathematical assumptions were correct.
Here is the code:
Code:
DIV5: LDI B,51
MUL A,B ;ANSWER LEFT IN R1
RET
Great it's only 3 lines long!!! Let's test it's accuracy.
We're interested in a single byte so values can go from zero to 255
so lets check a few numbers in that range and see how it performs:
Code:
10 => 1 (off by -1)
50 => 9 (off by -1)
100 => 19 (off by -1)
150 => 29 (off by -1)
200 => 39 (off by -1)
250 => 49 (off by -1)
Hmm, we're alway off by minus one, so if we add one, we should be right on the money!
So the new Division-by-Five Routine becomes:
Code:
DIV5: LDI B,51
MUL A,B
INC R1 ; ANSWER IN R1
RET
So we're half-way-home, all we need to do now is divide our answer by two and we should have a "Quick and Dirty" Divide-by-Ten Routine... |
|
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 08:00 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued)
There's two ways we can use our Division-by-Five Routine for a Division-by-Ten Routine.
We can modify the DIV5 routine. Earlier I stated that a "Quick and Dirty" way to Divide-by-Two is to just Shift your bits to the right. This is how we modify our DIV5 Routine to get a DIV10:
Code:
DIV10: LDI B,51
MUL A,B
INC R1 ;R1=A/5
LSR A ;R1=(A/5)/2 = A/10
RET
The other way we could have accomplished the same thing would be to have the DIV10 routine call the DIV5 routine and that way we "kill-two-birds-with-one-stone" because we get two usable routines from it:
Code:
DIV10: RCALL DIV5 ;R1=A / 5
LSR R1 ;R1=(A/5) / 2 = A/10
RET
DIV5: LDI B,51
MUL A,B
INC R1 ;ANSWER IN R1
RET
Just to be on-the-safe-side let's double check the accuracy of these routines:
Code:
10 => 1 Correct!
50 => 5 Correct!
100 => 10 Correct!
150 => 15 Correct!
200 => 20 Correct!
250 => 25 Correct!
So we did it. We have our first "Dirty Math" Routine that is 100% accurate over the range we need.
Stay tuned however, there's still more to this story... |
Last edited by RetroDan on Mar 31, 2006 - 08:26 AM; edited 2 times in total
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 08:18 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued)
PRE-SUMMARY:
Not bad for a days work eh? We took a rather large routine that we needed and reduced it to a "Quick and Dirty" little routine using a few "Dirty Math Tricks".
Let's compare and see how well we did.
First let's compare to the Atmel 8x8 Unsigned Division Routine:
Code:
DIV10a DIV10b ATMEL
REGISTERS: 2 2 4
DIFFERENCE: -2 -2 -
PERCENT: -50% -50% -
PROGRAM WORDS: 5 7 15(incl RET)
DIFFERENCE: -10 -8 -
PERCENT: -66% -53% -
SPEED/CYCLES: 5 8 97
DIFFERENCE: -92 -89
PERCENT: -95% -92%
So we've used 1/2 the register space, about 60% less program space and we're running at about 20 times the speed that the Atmel Routine does.
Let compare to the Lenze/Jones Routine:
Code:
DIV10a DIV10b LENZE
REGISTERS: 2 2 10
DIFFERENCE: -8 -8 -
PERCENT: -80% -80% -
PROGRAM WORDS: 5 7 36
DIFFERENCE: -31 -29 -
PERCENT: -86% -80% -
SPEED/CYCLES: 5 8 46
DIFFERENCE: -41 -38 -
PERCENT: -89% -83% -
Against the Lenze/Jones routine we're using 80% less Register Space, about 85% less Program Space and w'e're running at between 6 to 10 times the speed.
Many might think that this is a great accomplishment and that I should stop at this point. Especially those that probably would have just used the larger routines and not had a second thought about it.
Well, obviously some of you don't know me very well.
I think I can pull a few more "Dirty Tricks" from out of my hat and turn this Dirty Littlle Routine into something totally scandalous. I'm not happy until the routine is so small that it pulls numbers from thin air, like magick... |
Last edited by RetroDan on Apr 13, 2006 - 09:24 AM; edited 1 time in total
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 09:07 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued)
Let's re-examine what it is we're doing mathematically with our Dirty Little DIV10 Routine:
We're multiplying by 51, adding one, then dividing by two, and then truncating to get a final division of 256:
Code:
A / 5 ~= [ [ ( A x 51 ) + 1 ] / 2 ] / 256
Let's ignore that final division by 256 because it's accomplished with ZERO program steps by ignoring the lower byte and there's no way we can improve on that unless there are opcodes that can move us backwards in time.
So we want to look closely at the right side of this equation:
Code:
256 x [A / 5] ~= ( 51A + 1) / 2
256 x [A / 5] ~= 51A/2 + 1/2
Well 1/2 doesn't really mean much to a microprocessor, you either have a bit that is on or its off. We'll ignore it for now and if need be, we'll increment something in the end like we did with our DIV5 routine.
So now we have:
Code:
256 x [A / 5] ~= 51A/2
256 x [A / 5] ~= 25.5 x A
256 x [A / 5] ~= 25A
Since we dropped a 1/2 earlier and now we're stuck with another 0.5 perhaps we should round the 25.5 upto 26 rather than truncate it down to 25.
Code:
256 x [A / 5] ~= 25.5 x A
256 x [A / 5] ~= 26A
So to divide by 10 all we need to do is multiply by 26 and ignore lower byte:
Code:
MUL10: LDI B,26
MUL A,B ;ANSWER IN R1
It sounds incredable, but let's check the results and see what our error ratio is:
Code:
10 => 1 Correct!
50 => 5 Correct!
100 => 10 Correct!
150 => 15 Correct!
200 => 20 Correct!
250 => 25 Correct!
So it's 100% accurate over the range we need and it's only two program lines.
We started with rather large routines requiring 50 to 100 clock cycles per call and have something that runs in just three.
Can we reduce it even more?
Not really, not until Atmel comes out with a MULI command that allows us to multiply a register with an immedaite value rather than loading it into another register first.
Maybe next year...
It would probably look like this:
Code:
DIV10: MULI A,26 ;ANSWER IN R1
|
Last edited by RetroDan on Mar 31, 2006 - 10:14 AM; edited 1 time in total
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 10:08 AM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Conclusion)
IN CONCLUSION:
By systematically applying a few Dirty Tricks and following our instincts combined with a little high-school math, we took some rather large routines and reduced them down into just a couple program steps. Something that many would say is impossible., but never say never my friend. If I can do it, so can you!
REQUEST FOR FEED BACK:
I hope you found this posting entertaining as well as educational.
If you would like to see more like it, please be sure to let the moderators know.
Thank you for your time and consideration. |
Last edited by RetroDan on Apr 13, 2006 - 09:27 AM; edited 1 time in total
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 03:31 PM |
|

Joined: Mar 28, 2001
Posts: 416
Location: Munich, Germany
|
|
|
Code:
MUL10: LDI B,26
MUL A,B ;ANSWER IN R1
Quote:
So it's 100% accurate over the range we need and it's only two program lines.
This is not true. For a 'div by 10' this is an approximation which will not yield correct results for all integers between 1 and 255 decimal (try for example 169, 159 ... for 'A' to see it fail). The 'scaled reciprocal' for a correct 'division by 10' of an 8 bit integer would have to be at least 9 bits to reach the required precision.
Please check out the mathematical basics for this technique at http://www.cs.uiowa.edu/~jones/bcd/divide.html
The claims in your comparison
Code:
DIV10a DIV10b LENZE
REGISTERS: 2 2 10
DIFFERENCE: -8 -8 -
PERCENT: -80% -80% -
PROGRAM WORDS: 5 7 36
DIFFERENCE: -31 -29 -
PERCENT: -86% -80% -
SPEED/CYCLES: 5 8 46
DIFFERENCE: -41 -38 -
PERCENT: -89% -83% -
are also incorrect: the '46 cycles' for 'Lenze' is including register saves (push/pop), the naked division routine w/o reg saves and getting the remainder is:
Code:
ldi YH,0xCC ; scaled reciprocal for /10
ldi YL,0xCD
; Q = A * 0xCCCD
; (r19:r18:r17[:rXX] = XH:XL * YH:YL)
clr r2
mul XH, YH ; ah * bh
movw r19:r18, r1:r0
mul XL, YL ; al * bl
mov r17,r1 ; r0 to [rXX] is superfluous
mul XH, YL ; ah * bl
add r17, r0
adc r18, r1
adc r19, r2
mul YH, XL ; bh * al
add r17, r0
adc r18, r1
adc r19, r2
; Q = Q >> 16: use r19:r18 as word
; Q = Q >> 3
lsr r19 ; do the last 3 shifts
ror r18
lsr r19
ror r18
lsr r19
ror r18
; r19:r18 now "Q" (= result >> 19)
That's 25 cycles for a 'divide 16 bit number by 10' (result BTW is always correct over the full range 1 - 65535). |
_________________ Andreas
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 04:07 PM |
|

Joined: Dec 08, 2004
Posts: 4719
Location: Nova Scotia, Canada
|
|
Well now, Andreas... The argument could be made that RetroDan's solution is *more correct* than the traditional integer approach.
169/10 = 16 if we follow traditional definitions.
Of course, the "true" result is 16.9. But we cannot express 16.9 in an integer. So we round. The traditional approach for computer integer arithmetic is to always round down.
Mathemeticians would tell us that we should round anything from 0.0 up to 0.49999999... as 0, and anything from 0.500000000...01 up to 0.9999999999999999... as 1. There is a gray area for 0.5 exactly. I was taught a convention that in that case, you round to the even-numbered integer.
From that prespective, RetroDan's solution comes much closer to the "mathematically" correct solution. 169/10 = 17.
As for the amount of pushing and popping required... well, that's relative to the register usage conventions at hand. In the worst case (with all destoryed registers saved and restored), RetroDan's solution is still smaller and faster than the best-case (with no destroyed registers saved and restored) "Lenze" solution. |
Last edited by lfmorrison on Mar 31, 2006 - 04:18 PM; edited 1 time in total
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 04:17 PM |
|


Joined: Jul 18, 2005
Posts: 62954
Location: (using avr-gcc in) Finchingfield, Essex, England
|
|
Yup, actually I thought the *26 then ditch 8 bit thing was really clever. Obviously it works because 26/256 is 0.1015625 which for small numbers is OK as a way of multiplying by roughly 0.1. If you tried a larger value like 800 then I guess it'd be 800 * 26 = 20,800 and divided by 256 this is then 81 - so the extra 0.0015625 has come into play to create the error. But on small numbers it is a clever technique for an "integer divide"
In fact just checked and it's good for all values up to 630 in fact, the first one where it goes wrong is 631
Cliff |
|
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 06:23 PM |
|

Joined: Mar 28, 2001
Posts: 416
Location: Munich, Germany
|
|
|
Quote:
From that prespective, RetroDan's solution comes much closer to the "mathematically" correct solution. 169/10 = 17.
Ah well, Luke - for '169', yes. But even if we assume an 'implicit rounding' scheme, this is not handled correctly, for ex. in the range 0-9 we get '0' as result up to dividend 9, ditto e.g. in the 50-59 range ('5' up to 59).
Mathematically correct - as to the mathematicans - would be the 'increment result' from '5' or '55' upwards. OP claims '100% correct' for the algorithm which it isn't.
It would be correct if it either were returning the remainder together with the result or providing the mathematically correct rounding you described.
Quote:
In the worst case (with all destroyed registers saved and restored), RetroDan's solution is still smaller and faster than the best-case (with no destroyed registers saved and restored) "Lenze" solution.
True, except for the fact that the OP is manipulatively comparing a 'sort of 8 bit integers division' (nice as long as you don't need correct results) with a 16 bit integers division (add another 6 cycles to get the remainder - blows it up to 31 ... geeeze ...) which, thanks to Mr Jones, does return correct results.
BTW: As this is the 'tutorial' forum (contents aimed at beginners) I honestly think that it would be more helpful if an approximation technique were explecitely labeled as such instead of trying to sell it off as real binary mathematics. |
_________________ Andreas
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 06:35 PM |
|


Joined: Feb 24, 2006
Posts: 794
Location: http://avr.x.am
|
|
|
alenze wrote:
BTW: As this is the 'tutorial' forum (contents aimed at beginners) I honestly think that it would be more helpful if an approximation technique were explecitely labeled as such instead of trying to sell it off as real binary mathematics.
I apologoize Alenze, as the comparison was not meant to in any way diminsh the excellent routines you have in your Div_xx Package. It is full of "tried and true" routine, and are backed by the work of Jones. If anyone is looking for "correct" routine I recommend them highly
However, I don't think you read the CAVEAT LECTOR warning in my first post. These postings are to discuss "Dirty Math Tricks" and hopefully have some fun at the same time. If anyone is uneasy about what I'm doing, I suggest they stick to the proven routines like the ones you and Atmel provide and forget my threads even exist. |
|
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 07:16 PM |
|

Joined: Dec 08, 2004
Posts: 4719
Location: Nova Scotia, Canada
|
|
"Closer to mathematically correct". I wasn't claiming that it was a miracle cure-all solution. In fact, it conveniently works out that way by pure coincidence of the range of numbers involved..
Anyway... The "traditional" approach to integer division (ignoring remainders for the moment), and Retro's "radical approach" to integer division are both wrong when compared to result using real numbers.
When compared to the "real number" result, the traditional results will always be less than or equal to the true result, with a maximum absolute error error of -1 LSB, and this maximum error is repeated consistently with a periodicity of 10 for all possible inputs.
RetroDan's results may differ from the "real number" result as well. The result may be greater than or less than the "real number" result. But it, too, will have a maximum absolute error of +/-1 LSB for the range of inputs we're interested in.
Note that the original intent of this thread was only to do 8-bit division by 10. And we actually only approach an absolute error of -1 LSB at the lower extreme with inputs of around 0, and +1 LSB at the upper extreme with inputs of around 630. In the middle of the range, the aboslute error will be limited to a band of +/- 0.5 LSB
In light of these results, I don't think it's fair to label his as "more wrong" than the traditional results.
Of course, if you want to have access to the remainders so that your application can fine-tune its interpretation of the result, then that's a strong argument in favour of some other approach. |
|
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 07:38 PM |
|

Joined: Dec 08, 2004
Posts: 4719
Location: Nova Scotia, Canada
|
|
Finding remainders:
- Multiply the result by 10. Label as 'x'.
- Subtract 'x' from the original input.
- This result is equal the the "Remainder". Using RetroDan's code, the remainder can either "positive" or "negative", but it will always have a magnitude of 10 or less. If negative, then feel free to subtract 1 from the quotient and add 10 to the remainder. And presto! You have a result that is totally identical to the "Lenze" result, for all possible 8-bit inputs.
Code:
; RetroDan's original code
DIV10: LDI B,51
MUL A,B
INC R1 ;R1=A/5
LSR R1 ;R1=(A/5)/2 = A/10
; Luke's insertion
MOV C, R1
LDI B, 10
MUL C, B
SUB A, R0
; 'A' contains the (signed) remainder.
; sign of the remainder indicates whether it is an:
; - under-estimate (positive remainder), or
; - over-esitmate (negative remainder)
; 'C' contains the quotient
; Now a 100% _mathematically_ correct result for all 8-bit inputs.
; (Destroys the original input operand 'A'.
; one could return the remainder in register
; 'B' at the cost of one extra MOV instruction.)
RET
|
|
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 09:47 PM |
|

Joined: Nov 17, 2004
Posts: 13961
Location: Vancouver, BC
|
|
Regarding the division by 5 routine you said:
Quote:
Hmm, we're alway off by minus one, so if we add one, we should be right on the money!
But the only numbers that you checked were ones that were actually divisible by five. In fact, these are the only numbers where this routine returns the correct answer. If someone wants to use this, then some correction would definitely be necessary. What I would suggest is to not add the one, then find the remainder as lfmorrison points out. In this case the remainder will always be in the range of [0, 5]. If the remainder is 5, add one to the quotient and make the remainder 0. Otherwise, the quotient and the remainder are correct.
Applying this to the divide by 10 (the '* 51' version, not the '* 26' version), again if you drop the 'add one', the remainder is always in the range of [0, 10]. So again the only adjustment is for a remainder of 10.
Keep in mind that these remainder ranges are only valid if the original number is 8 bit. I quickly goes awry for larger numbers. Even 261 produces an out of range remainder. You could subtract 10 (or 5 for the div by 5) from the remainder instead of just making it 0 to extend the range. To be more specific, it will be correct as long as the answer fit in 8 bits. |
_________________ Regards,
Steve A.
The Board helps those that help themselves.
|
| |
|
|
|
|
|
Posted: Mar 31, 2006 - 11:52 PM |
|

Joined: Mar 28, 2001
Posts: 416
Location: Munich, Germany
|
|
One way to do this (sorry, lfmorrison's signed remainder isn't always mathematically correct - "we should round anything from 0.0 up to 0.49999999... as 0, and anything from 0.500000000...01 up to 0.9999999999999999... as 1":
Code:
; Division by 10, useing reciprocal multiplication
;
; Call with:
; 8 bit dividend in r16
;
; Returns:
; Result in r16
; Remainder in r18
mov r18, r16 ; save original dividend
ldi r17,26 ; reciprocal, scaled *256, off a bit
mul r16,r17
; result of multiplication now in r1:r0
; use only r1, thereby effectively divide by 256
dec r1 ; if imprecise scaling value influences result,
; result will be '+1'. Decrement to avoid neg.
; value in later subtraction
ldi r17,10 ; re-use r17 for divisor
mov r16,r1 ; save result/256
mul r16,r17 ; find remainder by multiplication of result by 10d
; result again in r1:r0
; get remainder and correct result of 'by 10'
sub r18,r0
cpi r18,10
brlo done
subi r18,10
inc r16
done:
; result in r16, remainder in r18
;optional rounding:
cpi r18,5
brlo done_r
inc r16
done_r:
(Quick hack, up to two or three cycles could be shaved off) |
_________________ Andreas
|
| |
|
|
|
|
|
|
|
|