Author Message
 RetroDan
 Posted: Mar 31, 2006 - 04:30 AM
 Joined: Feb 24, 2006 Posts: 794 Location: http://avr.x.am
 TITLE: DIRTY DAN's DIRTY MATH TRICKS PURPOSE: On Going Tutorial on Math Tricks in AVR Native Machine Language for Beginners. PREAMBLE: While trying to help someone in another FORUM with a Division-by-60 routine to converting seconds to minutes, someone made the remark: Quote: What RetroDan has is a quick and dirty approximation! I think it was meant as an insult, but it gave me a great name for the type of binary routines I love: Dirty Math! Quick & Dirty Routines that are small and super fast. CAVEAT LECTOR: If you don't like taking chances and experimenting and don't really care about how big your programs are, or how fast they excecute then this is not a place for you. May I recommend the Atmel Appnotes #200, 201 and 204 that are chock full great standard math routines for the AVR microcontrollers. Last edited by RetroDan on Mar 31, 2006 - 10:40 AM; edited 4 times in total

 RetroDan
 Posted: Mar 31, 2006 - 04:38 AM
 Joined: Feb 24, 2006 Posts: 794 Location: http://avr.x.am

 RetroDan
 Posted: Mar 31, 2006 - 05:50 AM
 Joined: Feb 24, 2006 Posts: 794 Location: http://avr.x.am
 TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN PREAMBLE: I've been working on my own driver routines for the LCD Display on the Butterfly (same as STK502) that are super small and fast. When it came time to write a little routine that displays a number in decimal, rather than Hex, I ran into my old nemesis DIVISION! To be more precises, Division-by-Ten. Since we humans are so fond of Division-by-Ten, I thought it would make an excellent starting point for our examination of my Dirty Math Tricks. For this routine, all we want is to take a single byte value and display it on the LCD Display in Decimal Number Form. So we need to take a number with a maximum value of \$FF and break it into three: 2 / 5 / 5. The way most of us humans would do this is to first divide by it by 100 and then by 10 and then keep the remainder; Code: 255 / 100 = 2     (remainder of 55) 55  / 10  = 5 remainder = 5 Oh no, that means we need to have two division routines! One for division by 100 and another by ten. Mybe we should just use a general division routine and change the divisor from 100 to 10 rather than have two routines. So I go and check the Atmel Appnote 200 looking for a general 8x8 unsigned division routine and here is what I found: Code: ;*************************************************************************** ;* "div8u" - 8/8 Bit Unsigned Division ;* ;* This subroutine divides the two register variables "dd8u" (dividend) and ;* "dv8u" (divisor). The result is placed in "dres8u" and the remainder in ;* "drem8u". ;*  ;* Number of words   :14 ;* Number of cycles   :97 ;* Low registers used   :1 (drem8u) ;* High registers used  :3 (dres8u/dd8u,dv8u,dcnt8u) ;*************************************************************************** .def   drem8u   =r15      ;remainder .def   dres8u   =r16      ;result .def   dd8u   =r16      ;dividend .def   dv8u   =r17      ;divisor .def   dcnt8u   =r18      ;loop counter div8u:   sub   drem8u,drem8u   ;clear remainder and carry    ldi   dcnt8u,9   ;init loop counter d8u_1:   rol   dd8u      ;shift left dividend    dec   dcnt8u      ;decrement counter    brne   d8u_2      ;if done    ret         ;    return d8u_2:   rol   drem8u      ;shift dividend into remainder    sub   drem8u,dv8u   ;remainder = remainder - divisor    brcc   d8u_3      ;if result negative    add   drem8u,dv8u   ;    restore remainder    clc         ;    clear carry to be shifted into result    rjmp   d8u_1      ;else d8u_3:   sec         ;    set carry to be shifted into result    rjmp   d8u_1 Only 14 program steps is not bad. Almost 100 clock cycles per call and we need to call it twice fort a single byte conversion. That's about 200 clock cycles each. I'm sure we can do better than that... Last edited by RetroDan on Apr 13, 2006 - 09:13 AM; edited 3 times in total

 RetroDan
 Posted: Mar 31, 2006 - 06:24 AM
 Joined: Feb 24, 2006 Posts: 794 Location: http://avr.x.am
 TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued) Previously we divided our 1 byte number by 100 then by 10 and when it comes time to expand our routine to handle 16 bit numbers we'll need to divide by 10,000, 1,000, 100 and finally by 10. At approx. 100 cycles per call that's 400 clock cycles per number. Maybe we should look at the problem again from the "other side" the right-hand-side instead of left. Code: 255 / 10 = 25   Remainder = 5 25  / 10 = 2    Remainder = 5 2   / 10 = 0    Remainder = 2 Well we've managed to extract the 255 again, but now we need Three calls to the division routine and for a 16 byte number we'll need Five. The only thing we have going for us using this method is that if we can find a "special" division routine for Division-by-Ten that is smaller and/or faster than the Atmel routine, we might have something we can work with... Last edited by RetroDan on Mar 31, 2006 - 06:50 AM; edited 2 times in total

 RetroDan
 Posted: Mar 31, 2006 - 06:46 AM
 Joined: Feb 24, 2006 Posts: 794 Location: http://avr.x.am
 TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued) I remember someone posting a file of Division-by-a-Constant Routines, so why don't we see if we can get luck there. Here is what I found, exactly what we're looking for, a specialized routine for Division-by-Ten: Code: ;*************************************************************************** ;* Function "Div16_10" ;* Divides an unsigned 16 bit word (XH:XL) by 10 ;* Returns quotient in YH:YL and remainder in XL ;* ;* Author: Andreas Lenze (andreas.lenze@t-online.de) ;* Equations by D: W. Jones: ;* ;*   Reciprocal mul w. extra precision: ;*   unsigned int A; ;*   unsigned int scaled_reciprocal = 0xCCCD; ;*   unsigned int Q; /* the quotient */ ;* ;*   Q = ((A * 0xCCCD) >> 19) ;*    ;* Uses: high regs: 7 (r17, r18, r19, X, Y) ;*    low regs:  3 (r0, r1, r2) ;* ;*    words:     36 (w. push/pop = 8 words) ;*    cycles:    46 (w. push/pop = 16 cycles) ;* ;* Note: Hardware multiplier required ("mul" instruction) ;* ;*************************************************************************** Div16_10:    push   r2    push   r19    push   r18    push   r17    ldi   YH,0xCC      ; scaled reciprocal for /10    ldi   YL,0xCD    ; Q = A * 0xCCCD    ; (r19:r18:r17[:rXX] = XH:XL * YH:YL)         clr     r2         mul     XH, YH      ; ah * bh         movw    r19:r18, r1:r0         mul     XL, YL      ; al * bl    mov   r17,r1      ; r0 to [rXX] is superfluous         mul     XH, YL      ; ah * bl         add     r17, r0         adc     r18, r1         adc     r19, r2         mul     YH, XL      ; bh * al         add     r17, r0         adc     r18, r1         adc     r19, r2    ; Q = Q >> 16: use r19:r18 as word    ; Q = Q >> 3    lsr   r19      ; do the last 3 shifts    ror   r18    lsr   r19    ror   r18    lsr   r19    ror   r18    ; r19:r18 now "Q" (= result >> 19)    ; R = A - 10*Q;    ldi   r17,10      ; multiply r19:r18 by 10    mul     r18, r17   ; al * bl    sub   XL,r0    clr   XH    movw   YL,r18      ; make copy of Q    ; XL holds "R"    ; YH:YL holds "Q"    pop   r17    pop   r18    pop   r19    pop   r2    ret Wow, I looks huge! Let's see 46 program steps, that's a little on the heavy side, and 10 registers used ouch!. However, it is a 16 bit Divide-by-Ten Routine and it excecutes in about 1/2 the time that the Atmel 8 bit routine did, so we're moving in the right direction. Two or Three calls to this would be 100 to 150 cycles versus the 200 to 300 using the Atmel Routine. If we can't come up with something on our own, perhaps we could strip this down to an 8 bit routine, or perhaps just bite the bullet because sooner-or-later we're gonna' need to convert 16 bit values anyway... Last edited by RetroDan on Apr 13, 2006 - 09:18 AM; edited 2 times in total

 RetroDan
 Posted: Mar 31, 2006 - 07:04 AM
 Joined: Feb 24, 2006 Posts: 794 Location: http://avr.x.am
 TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued) A while back, while literally sitting-on-the-throne, I happened to realized that 255 is evenly divisable by 5 and that it is only one away from 256 which is a Power-of-Two. After a while, it becomes a habit to always be on-the-lookout for Powers-of-Two. Either that or I'm slowly drifting into the world of padded-rooms and nice doctors in white smocks. Of all the things I've lost, I miss my mind the most. To divide by Ten we might be able to first Divide-by-Five then Right-Shift to Divide-by-Two and the result would be a Division-by-Ten. I stored that fact away and wondered if the 255-256 link could be put to use sometime. Pehaps today is a good day to drag it out of the bottom drawer and dust it off... Let's see: 255 divided by 5 is 51 and 255 is awlful close to 256 and furthermore division by 256 is super-super easy and requires ZERO program steps. The "Dirty Trick" to that is to simply ignore the least significant Byte and we've essentially Divided our number by 256. Since 255 is almost equal to 256, then 255/256 is almost equal to 1 Since multiplication by one does not change a value Then 1/5 times 255/256 shoud be pretty close to 1/5 Since division by 256 can be accomplished with the Dirty Math Trick of ignoring the lowest byte, we'll leave that to last and our equation becomes: 1/5 time 255 and later we'll truncate the lower byte. Code: 1/5 x 255 = 255 / 5 = 51 So mathematically speaking if we multiply our number by 51 and later ignore the lower byte, we should get a pretty good approximation of Division-by-Five. That's where our first routine will start...

 RetroDan
 Posted: Mar 31, 2006 - 07:59 AM
 Joined: Feb 24, 2006 Posts: 794 Location: http://avr.x.am
 TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued) Let's start by writing a routine that muliplies by 51 and we'll look at the high byte and see if our mathematical assumptions were correct. Here is the code: Code: DIV5: LDI B,51       MUL A,B  ;ANSWER LEFT IN R1       RET Great it's only 3 lines long!!! Let's test it's accuracy. We're interested in a single byte so values can go from zero to 255 so lets check a few numbers in that range and see how it performs: Code:  10 =>  1 (off by -1)  50 =>  9 (off by -1) 100 => 19 (off by -1) 150 => 29 (off by -1) 200 => 39 (off by -1) 250 => 49 (off by -1) Hmm, we're alway off by minus one, so if we add one, we should be right on the money! So the new Division-by-Five Routine becomes: Code: DIV5: LDI B,51       MUL A,B       INC R1 ; ANSWER IN R1       RET So we're half-way-home, all we need to do now is divide our answer by two and we should have a "Quick and Dirty" Divide-by-Ten Routine...

 RetroDan
 Posted: Mar 31, 2006 - 08:00 AM
 Joined: Feb 24, 2006 Posts: 794 Location: http://avr.x.am
 TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued) There's two ways we can use our Division-by-Five Routine for a Division-by-Ten Routine. We can modify the DIV5 routine. Earlier I stated that a "Quick and Dirty" way to Divide-by-Two is to just Shift your bits to the right. This is how we modify our DIV5 Routine to get a DIV10: Code: DIV10: LDI B,51        MUL A,B        INC R1   ;R1=A/5        LSR A    ;R1=(A/5)/2 = A/10        RET The other way we could have accomplished the same thing would be to have the DIV10 routine call the DIV5 routine and that way we "kill-two-birds-with-one-stone" because we get two usable routines from it: Code: DIV10: RCALL DIV5  ;R1=A / 5        LSR R1      ;R1=(A/5) / 2 = A/10        RET DIV5:  LDI B,51        MUL A,B        INC R1      ;ANSWER IN R1        RET Just to be on-the-safe-side let's double check the accuracy of these routines: Code:  10 =>  1 Correct!  50 =>  5 Correct! 100 => 10 Correct! 150 => 15 Correct! 200 => 20 Correct! 250 => 25 Correct! So we did it. We have our first "Dirty Math" Routine that is 100% accurate over the range we need. Stay tuned however, there's still more to this story... Last edited by RetroDan on Mar 31, 2006 - 08:26 AM; edited 2 times in total

 RetroDan
 Posted: Mar 31, 2006 - 08:18 AM
 Joined: Feb 24, 2006 Posts: 794 Location: http://avr.x.am
 TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued) PRE-SUMMARY: Not bad for a days work eh? We took a rather large routine that we needed and reduced it to a "Quick and Dirty" little routine using a few "Dirty Math Tricks". Let's compare and see how well we did. First let's compare to the Atmel 8x8 Unsigned Division Routine: Code:                  DIV10a   DIV10b   ATMEL REGISTERS:         2        2        4  DIFFERENCE:      -2       -2        -   PERCENT:       -50%     -50%       - PROGRAM WORDS:    5         7       15(incl RET)  DIFFERENCE:    -10        -8        -   PERCENT:      -66%      -53%       - SPEED/CYCLES:     5         8       97  DIFFERENCE:    -92       -89   PERCENT:      -95%      -92% So we've used 1/2 the register space, about 60% less program space and we're running at about 20 times the speed that the Atmel Routine does. Let compare to the Lenze/Jones Routine: Code:                  DIV10a   DIV10b   LENZE REGISTERS:         2        2       10  DIFFERENCE:      -8       -8        -   PERCENT:       -80%     -80%       - PROGRAM WORDS:    5         7       36  DIFFERENCE:    -31        -29       -   PERCENT:      -86%       -80%      - SPEED/CYCLES:     5         8       46  DIFFERENCE:    -41       -38        -   PERCENT:      -89%      -83%       - Against the Lenze/Jones routine we're using 80% less Register Space, about 85% less Program Space and w'e're running at between 6 to 10 times the speed. Many might think that this is a great accomplishment and that I should stop at this point. Especially those that probably would have just used the larger routines and not had a second thought about it. Well, obviously some of you don't know me very well. I think I can pull a few more "Dirty Tricks" from out of my hat and turn this Dirty Littlle Routine into something totally scandalous. I'm not happy until the routine is so small that it pulls numbers from thin air, like magick... Last edited by RetroDan on Apr 13, 2006 - 09:24 AM; edited 1 time in total

 RetroDan
 Posted: Mar 31, 2006 - 09:07 AM
 Joined: Feb 24, 2006 Posts: 794 Location: http://avr.x.am
 TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued) Let's re-examine what it is we're doing mathematically with our Dirty Little DIV10 Routine: We're multiplying by 51, adding one, then dividing by two, and then truncating to get a final division of 256: Code: A / 5 ~= [ [ ( A x 51 ) + 1 ] / 2 ] / 256 Let's ignore that final division by 256 because it's accomplished with ZERO program steps by ignoring the lower byte and there's no way we can improve on that unless there are opcodes that can move us backwards in time. So we want to look closely at the right side of this equation: Code: 256 x [A / 5] ~= ( 51A + 1) / 2 256 x [A / 5] ~= 51A/2 + 1/2 Well 1/2 doesn't really mean much to a microprocessor, you either have a bit that is on or its off. We'll ignore it for now and if need be, we'll increment something in the end like we did with our DIV5 routine. So now we have: Code: 256 x [A / 5] ~= 51A/2 256 x [A / 5] ~= 25.5 x A 256 x [A / 5] ~= 25A Since we dropped a 1/2 earlier and now we're stuck with another 0.5 perhaps we should round the 25.5 upto 26 rather than truncate it down to 25. Code: 256 x [A / 5] ~= 25.5 x A 256 x [A / 5] ~= 26A So to divide by 10 all we need to do is multiply by 26 and ignore lower byte: Code: MUL10: LDI B,26        MUL A,B   ;ANSWER IN R1 It sounds incredable, but let's check the results and see what our error ratio is: Code:  10 =>  1  Correct!  50 =>  5  Correct! 100 => 10  Correct! 150 => 15  Correct! 200 => 20  Correct! 250 => 25  Correct! So it's 100% accurate over the range we need and it's only two program lines. We started with rather large routines requiring 50 to 100 clock cycles per call and have something that runs in just three. Can we reduce it even more? Not really, not until Atmel comes out with a MULI command that allows us to multiply a register with an immedaite value rather than loading it into another register first. Maybe next year... It would probably look like this: Code: DIV10: MULI A,26  ;ANSWER IN R1 Last edited by RetroDan on Mar 31, 2006 - 10:14 AM; edited 1 time in total

 RetroDan
 Posted: Mar 31, 2006 - 10:08 AM
 Joined: Feb 24, 2006 Posts: 794 Location: http://avr.x.am
 TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Conclusion) IN CONCLUSION: By systematically applying a few Dirty Tricks and following our instincts combined with a little high-school math, we took some rather large routines and reduced them down into just a couple program steps. Something that many would say is impossible., but never say never my friend. If I can do it, so can you! REQUEST FOR FEED BACK: I hope you found this posting entertaining as well as educational. If you would like to see more like it, please be sure to let the moderators know. Thank you for your time and consideration. Last edited by RetroDan on Apr 13, 2006 - 09:27 AM; edited 1 time in total

 alenze
 Posted: Mar 31, 2006 - 03:31 PM
 Joined: Mar 28, 2001 Posts: 416 Location: Munich, Germany
 Code: MUL10: LDI B,26        MUL A,B   ;ANSWER IN R1 Quote: So it's 100% accurate over the range we need and it's only two program lines. This is not true. For a 'div by 10' this is an approximation which will not yield correct results for all integers between 1 and 255 decimal (try for example 169, 159 ... for 'A' to see it fail). The 'scaled reciprocal' for a correct 'division by 10' of an 8 bit integer would have to be at least 9 bits to reach the required precision. Please check out the mathematical basics for this technique at http://www.cs.uiowa.edu/~jones/bcd/divide.html The claims in your comparison Code:                  DIV10a   DIV10b   LENZE REGISTERS:         2        2       10  DIFFERENCE:      -8       -8        -   PERCENT:       -80%     -80%       - PROGRAM WORDS:    5         7       36  DIFFERENCE:    -31        -29       -   PERCENT:      -86%       -80%      - SPEED/CYCLES:     5         8       46  DIFFERENCE:    -41       -38        -   PERCENT:      -89%      -83%       - are also incorrect: the '46 cycles' for 'Lenze' is including register saves (push/pop), the naked division routine w/o reg saves and getting the remainder is: Code:    ldi   YH,0xCC      ; scaled reciprocal for /10    ldi   YL,0xCD    ; Q = A * 0xCCCD    ; (r19:r18:r17[:rXX] = XH:XL * YH:YL)         clr     r2         mul     XH, YH      ; ah * bh         movw    r19:r18, r1:r0         mul     XL, YL      ; al * bl    mov   r17,r1      ; r0 to [rXX] is superfluous         mul     XH, YL      ; ah * bl         add     r17, r0         adc     r18, r1         adc     r19, r2         mul     YH, XL      ; bh * al         add     r17, r0         adc     r18, r1         adc     r19, r2    ; Q = Q >> 16: use r19:r18 as word    ; Q = Q >> 3    lsr   r19      ; do the last 3 shifts    ror   r18    lsr   r19    ror   r18    lsr   r19    ror   r18    ; r19:r18 now "Q" (= result >> 19) That's 25 cycles for a 'divide 16 bit number by 10' (result BTW is always correct over the full range 1 - 65535). _________________Andreas

 lfmorrison
 Posted: Mar 31, 2006 - 04:07 PM
 Joined: Dec 08, 2004 Posts: 4719 Location: Nova Scotia, Canada
 Well now, Andreas... The argument could be made that RetroDan's solution is *more correct* than the traditional integer approach. 169/10 = 16 if we follow traditional definitions. Of course, the "true" result is 16.9. But we cannot express 16.9 in an integer. So we round. The traditional approach for computer integer arithmetic is to always round down. Mathemeticians would tell us that we should round anything from 0.0 up to 0.49999999... as 0, and anything from 0.500000000...01 up to 0.9999999999999999... as 1. There is a gray area for 0.5 exactly. I was taught a convention that in that case, you round to the even-numbered integer. From that prespective, RetroDan's solution comes much closer to the "mathematically" correct solution. 169/10 = 17. As for the amount of pushing and popping required... well, that's relative to the register usage conventions at hand. In the worst case (with all destoryed registers saved and restored), RetroDan's solution is still smaller and faster than the best-case (with no destroyed registers saved and restored) "Lenze" solution. Last edited by lfmorrison on Mar 31, 2006 - 04:18 PM; edited 1 time in total

 clawson
 Posted: Mar 31, 2006 - 04:17 PM
 Joined: Jul 18, 2005 Posts: 62281 Location: (using avr-gcc in) Finchingfield, Essex, England
 Yup, actually I thought the *26 then ditch 8 bit thing was really clever. Obviously it works because 26/256 is 0.1015625 which for small numbers is OK as a way of multiplying by roughly 0.1. If you tried a larger value like 800 then I guess it'd be 800 * 26 = 20,800 and divided by 256 this is then 81 - so the extra 0.0015625 has come into play to create the error. But on small numbers it is a clever technique for an "integer divide" In fact just checked and it's good for all values up to 630 in fact, the first one where it goes wrong is 631 Cliff

 alenze
 Posted: Mar 31, 2006 - 06:23 PM
 Joined: Mar 28, 2001 Posts: 416 Location: Munich, Germany
 Quote: From that prespective, RetroDan's solution comes much closer to the "mathematically" correct solution. 169/10 = 17. Ah well, Luke - for '169', yes. But even if we assume an 'implicit rounding' scheme, this is not handled correctly, for ex. in the range 0-9 we get '0' as result up to dividend 9, ditto e.g. in the 50-59 range ('5' up to 59). Mathematically correct - as to the mathematicans - would be the 'increment result' from '5' or '55' upwards. OP claims '100% correct' for the algorithm which it isn't. It would be correct if it either were returning the remainder together with the result or providing the mathematically correct rounding you described. Quote: In the worst case (with all destroyed registers saved and restored), RetroDan's solution is still smaller and faster than the best-case (with no destroyed registers saved and restored) "Lenze" solution. True, except for the fact that the OP is manipulatively comparing a 'sort of 8 bit integers division' (nice as long as you don't need correct results) with a 16 bit integers division (add another 6 cycles to get the remainder - blows it up to 31 ... geeeze ...) which, thanks to Mr Jones, does return correct results. BTW: As this is the 'tutorial' forum (contents aimed at beginners) I honestly think that it would be more helpful if an approximation technique were explecitely labeled as such instead of trying to sell it off as real binary mathematics. _________________Andreas

 RetroDan
 Posted: Mar 31, 2006 - 06:35 PM
 Joined: Feb 24, 2006 Posts: 794 Location: http://avr.x.am
 alenze wrote: BTW: As this is the 'tutorial' forum (contents aimed at beginners) I honestly think that it would be more helpful if an approximation technique were explecitely labeled as such instead of trying to sell it off as real binary mathematics. I apologoize Alenze, as the comparison was not meant to in any way diminsh the excellent routines you have in your Div_xx Package. It is full of "tried and true" routine, and are backed by the work of Jones. If anyone is looking for "correct" routine I recommend them highly However, I don't think you read the CAVEAT LECTOR warning in my first post. These postings are to discuss "Dirty Math Tricks" and hopefully have some fun at the same time. If anyone is uneasy about what I'm doing, I suggest they stick to the proven routines like the ones you and Atmel provide and forget my threads even exist.

 lfmorrison
 Posted: Mar 31, 2006 - 07:16 PM
 Joined: Dec 08, 2004 Posts: 4719 Location: Nova Scotia, Canada
 "Closer to mathematically correct". I wasn't claiming that it was a miracle cure-all solution. In fact, it conveniently works out that way by pure coincidence of the range of numbers involved.. Anyway... The "traditional" approach to integer division (ignoring remainders for the moment), and Retro's "radical approach" to integer division are both wrong when compared to result using real numbers. When compared to the "real number" result, the traditional results will always be less than or equal to the true result, with a maximum absolute error error of -1 LSB, and this maximum error is repeated consistently with a periodicity of 10 for all possible inputs. RetroDan's results may differ from the "real number" result as well. The result may be greater than or less than the "real number" result. But it, too, will have a maximum absolute error of +/-1 LSB for the range of inputs we're interested in. Note that the original intent of this thread was only to do 8-bit division by 10. And we actually only approach an absolute error of -1 LSB at the lower extreme with inputs of around 0, and +1 LSB at the upper extreme with inputs of around 630. In the middle of the range, the aboslute error will be limited to a band of +/- 0.5 LSB In light of these results, I don't think it's fair to label his as "more wrong" than the traditional results. Of course, if you want to have access to the remainders so that your application can fine-tune its interpretation of the result, then that's a strong argument in favour of some other approach.

 lfmorrison
 Posted: Mar 31, 2006 - 07:38 PM
 Joined: Dec 08, 2004 Posts: 4719 Location: Nova Scotia, Canada
 Finding remainders: - Multiply the result by 10. Label as 'x'. - Subtract 'x' from the original input. - This result is equal the the "Remainder". Using RetroDan's code, the remainder can either "positive" or "negative", but it will always have a magnitude of 10 or less. If negative, then feel free to subtract 1 from the quotient and add 10 to the remainder. And presto! You have a result that is totally identical to the "Lenze" result, for all possible 8-bit inputs. Code: ; RetroDan's original code DIV10: LDI B,51        MUL A,B        INC R1   ;R1=A/5        LSR R1   ;R1=(A/5)/2 = A/10 ; Luke's insertion        MOV C, R1        LDI B, 10        MUL C, B        SUB A, R0        ; 'A' contains the (signed) remainder.        ; sign of the remainder indicates whether it is an:        ; - under-estimate (positive remainder), or        ; - over-esitmate (negative remainder)        ; 'C' contains the quotient        ; Now a 100% _mathematically_ correct result for all 8-bit inputs.        ; (Destroys the original input operand 'A'.        ;  one could return the remainder in register        ;  'B' at the cost of one extra MOV instruction.)        RET

 Koshchi
 Posted: Mar 31, 2006 - 09:47 PM
 Joined: Nov 17, 2004 Posts: 13834 Location: Vancouver, BC
 Regarding the division by 5 routine you said: Quote: Hmm, we're alway off by minus one, so if we add one, we should be right on the money! But the only numbers that you checked were ones that were actually divisible by five. In fact, these are the only numbers where this routine returns the correct answer. If someone wants to use this, then some correction would definitely be necessary. What I would suggest is to not add the one, then find the remainder as lfmorrison points out. In this case the remainder will always be in the range of [0, 5]. If the remainder is 5, add one to the quotient and make the remainder 0. Otherwise, the quotient and the remainder are correct. Applying this to the divide by 10 (the '* 51' version, not the '* 26' version), again if you drop the 'add one', the remainder is always in the range of [0, 10]. So again the only adjustment is for a remainder of 10. Keep in mind that these remainder ranges are only valid if the original number is 8 bit. I quickly goes awry for larger numbers. Even 261 produces an out of range remainder. You could subtract 10 (or 5 for the div by 5) from the remainder instead of just making it 0 to extend the range. To be more specific, it will be correct as long as the answer fit in 8 bits. _________________Regards, Steve A. The Board helps those that help themselves.

 alenze
 Posted: Mar 31, 2006 - 11:52 PM
 Joined: Mar 28, 2001 Posts: 416 Location: Munich, Germany
 One way to do this (sorry, lfmorrison's signed remainder isn't always mathematically correct - "we should round anything from 0.0 up to 0.49999999... as 0, and anything from 0.500000000...01 up to 0.9999999999999999... as 1": Code: ; Division by 10, useing reciprocal multiplication ; ; Call with: ;   8 bit dividend in r16 ; ; Returns: ;   Result in r16 ;   Remainder in r18    mov   r18, r16   ; save original dividend    ldi   r17,26      ; reciprocal, scaled *256, off a bit    mul   r16,r17 ; result of multiplication now in r1:r0 ; use only r1, thereby effectively divide by 256    dec   r1      ; if imprecise scaling value influences result,             ; result will be '+1'. Decrement to avoid neg.             ; value in later subtraction    ldi   r17,10      ; re-use r17 for divisor    mov   r16,r1      ; save result/256    mul   r16,r17      ; find remainder by multiplication of result by 10d ; result again in r1:r0 ; get remainder and correct result of 'by 10'    sub   r18,r0    cpi   r18,10         brlo   done    subi   r18,10    inc   r16 done: ; result in r16, remainder in r18 ;optional rounding:    cpi   r18,5    brlo   done_r    inc   r16 done_r: (Quick hack, up to two or three cycles could be shaved off) _________________Andreas

 Broxbourne
 Posted: Apr 01, 2006 - 03:34 AM
 Joined: Nov 16, 2005 Posts: 341 Location: Calgary, Alberta, Canada
 I'm going to start with an apology here, because I might be thinking differently, and appear to be stiffling a very simulating and valuable discussion. However as a tutorial, I feel this thread is misplaced. It should be in the main forum. My thoughts of a tutorial are as an introduction, a quickstart guide, a pointer to get going and excited. It is a place for Newbies to look and gain confidence, and make things. Get involved and learn. Good examples to simply use (and with luck considered later), just standard routines. If my manager came up and said, "Eric, we stock a 4k device, the next part up is \$5 and that will take us under", then I would be fantastically interested in this thread. But If I had just found this site, looking for information, I might be scared away by the academic discourse of the moment. I'm looking for gentle introductions to do this, do that. If the device cannot divide by 10, and I cannot comprehend the details that experienced people are debating, then I'm off. Another Newbie scared away. Engineering is about compromises."Gerald!, if you can do it in a day just get on with it.". Versus, "A new part in our inventory, can't you make it smaller". Such pragmatic stuff is key to success, but we have to be careful about introducing it here (apart from sowing thoughts). I hope my opinion is understood in a positive sense. It is only one view among so many good people. Cheers All.

 lfmorrison
 Posted: Apr 01, 2006 - 04:15 AM
 Joined: Dec 08, 2004 Posts: 4719 Location: Nova Scotia, Canada
 Frankly, I don't buy that "negative remainder isn't correct" argument. It's a non-standard representation. So are improper fractions. But just like improper fractions, the result is numerically dead-on. AFAIC, 25 divided by 4 can be perfectly legally expressed as 6 remainder 1. And that is exactly identical to 5 remainder 5, or 7 remainder -3. The only possible result from RetroDan's expression would still be guaranteed to be within 1 LSB of the true result. And the "traditional approach" cannot offer any better than getting within 1 LSB either. But if it really bothers you, then you could always test the sign bit of the remainder... if it is set, then subtract 1 from the quotient and add 10 to the remainder. Code: DIV10: LDI B,51    MUL A,B    INC R1   ;R1=A/5    LSR R1   ;R1=(A/5)/2 = A/10 ; Luke's insertion    MOV C, R1    LDI B, 10    MUL C, B    SUB A, R0    ; 'A' contains the (signed) remainder.    ; sign of the remainder indicates whether it is an:    ; - under-estimate (positive remainder), or    ; - over-esitmate (negative remainder)    ; 'C' contains the quotient    ; We can return here with a totally valid and numerically correct result.    ; But for the sticklers among us, let's convert it    ; to a "cannonical form".    SBRS A, 7 ; test the sign of result.    RJMP PositiveRemainder ; We have a negative remainder; subtract 1 from ; the quotient and add 10 to the remainder.    DEC C    ADD A, B PositiveRemainder ; The result is now in perfect "cannonical" form.    RET And given the limited range of values we're concerned with here, that result (after a single check of the sign on the remainder, and possibly a single subtraction of 1 / addition of 10) will be in the "cannonical" form. (Ie. a guaranteed underesitmate in the result, and a remainder that is guaranteed to be positive, and bound between 0 and 9.)

 RickB
 Posted: Apr 01, 2006 - 04:49 AM
 Joined: Jan 30, 2005 Posts: 852 Location: Junction City, OR USA
 Broxbourne: I agree 100%. The give and take on this subject should be hashed out in another forum. Then and ONLY then should it become a tutorial. After all, what is a beginner to think when seeing the differing opinions. Most beginners are in no position to judge who is right or wrong on the details. I would like to see it here once that is done. Rick

 RetroDan
 Posted: Apr 01, 2006 - 05:05 AM
 Joined: Feb 24, 2006 Posts: 794 Location: http://avr.x.am
 Broxbourne wrote: I'm going to start with an apology here, because I might be thinking differently, and appear to be stiffling a very simulating and valuable discussion. However as a tutorial, I feel this thread is misplaced. It should be in the main forum. I hope my opinion is understood in a positive sense. It is only one view among so many good people. Cheers All. Thanks for the feed-back. You may not know this, but actual tutorials are labeled [TUT] by moderator to distiguish them from other threads. The moderator has labeled this thread [DIS] for discussion. Another label is [CODE] for code segments that are not tutorials. I think this information can be found in the "Sticky-the-Bear" threads at the top of the list.

 Broxbourne
 Posted: Apr 01, 2006 - 05:40 AM
 Joined: Nov 16, 2005 Posts: 341 Location: Calgary, Alberta, Canada
 Dan, Thank you for the information. I did not realize the moderator was adjusting the titles of the thread according to the intended audience. I just thought this was the place to find tutorials and good examples of code and AVR applications. Thanks for clarifying that. I'll make sure I only read [TUT] postings from now on to avoid getting confused by the more heady academic stuff, although I am still surprised to see such specialised discussion in a beginner forum. Practically Yours.

 alenze
 Posted: Apr 01, 2006 - 12:47 PM
 Joined: Mar 28, 2001 Posts: 416 Location: Munich, Germany
 Hi Luke, Quote: sorry, lfmorrison's signed remainder isn't always mathematically correct - "we should round anything from 0.0 up to 0.49999999... as 0, and anything from 0.500000000...01 up to 0.9999999999999999... as 1" definitely didn't come through quite as intended (sorry about that): Quote: Frankly, I don't buy that "negative remainder isn't correct" argument You are right and I phrased it badly: of course the signed remainder is correctly calculated (cleaner than in my example, too - 3 or four cycles saved). It's the 'inbuild rounding' whose threshold is off by one, mathematically speaking ( ): The quotient 'n.5' (6.5 for example if dviding 65/10) should already be rounded up to n+1 ('7'), here it is still left at '6'. 66/10 (6.6) is correctly rounded to '7' as 64/10 (6.4) becomes a rounded '6'. It's just the '.5' rounding which doesn't quite work (aren't those mathematicians a fascinating breed?). Sorry again - apologies to Broxbourne, too, guess we got carried away a little here ... _________________Andreas

 RetroDan
 Posted: Apr 06, 2006 - 05:07 AM
 Joined: Feb 24, 2006 Posts: 794 Location: http://avr.x.am
 Thank you all for the wonderful feed-back. The information was invaluable and probably saved me a lot of head-scratching as you'll soon find out. I was pleased-as-punch that my "quirky" little division by 10 routine worked. I was even more delighted to find out that it did a pretty good job of rounding to boot. However, when I got around to using it for my original purpose, I realized that rounding was the last thing I wanted. The original use for the routine was to break-down an unsigned single byte for display on the Butterfly LCD. Well I was getting results like: 74, 75, 76, 77, 80, 80, 80, 81, 82, 83. Obviously the little DIV10 routine was working over-time and rounding, so a few fine adjustments were needed to "UN-ROUND" the results. I know that you guys have already worked out your own versions, here is mine: Code: ;-----------------------------------; ; RETRO (SYNTHETIC) DIVISION BY 10  ; ; ANSWER IN R1, R0=REM, A:PRESERVED ; ;-----------------------------------; DIV10:  PUSH B         LDI  B,26   ;MUL BY 26         MUL  A,B    ;R1=A/10         PUSH R1     ;BRUTE-FORCE CALC OF REMAINDER              LDI  B,10   ;CALC REM         MUL  R1,B   ;R0=10xR1(QUOT)         POP  R1     ;RESTORE QUOT         SUB  R0,A   ;SUBTRACT REMx10 NODJST: NEG  R0     ;MAKE POSITIVE          BRPL NONEG ;STILL NEG?         ADD  R0,B   ;OOPS MAKE         DEC  R1     ;ADJUSTMENTS NONEG:   RET If you're wondering why I calculate the remainder "backwards" then negate it, it's to preserve the value of A. If there's no need to preserve A then the remainder could be calculated the other-way-around and eliminate the need for the NEG R0 line. However, you should be forewarned that "letting go" a NEG R0 without very good cause could lead to a charge of racism! Other than removing that one statement I don't know if it can be optimized any further, but it sure would be nice.

 brum103
 Posted: Oct 08, 2006 - 12:58 AM
 Joined: Nov 15, 2004 Posts: 11 Location: Netherlands

 JohnD
 Posted: Oct 21, 2006 - 04:26 PM
 Joined: Oct 12, 2006 Posts: 12
 I tried running your 8 bit division code: [quote];*************************************************************************** ;* "div8u" - 8/8 Bit Unsigned Division ;* ;* This subroutine divides the two register variables "dd8u" (dividend) and ;* "dv8u" (divisor). The result is placed in "dres8u" and the remainder in ;* "drem8u". ;* ;* Number of words :14 ;* Number of cycles :97 ;* Low registers used :1 (drem8u) ;* High registers used :3 (dres8u/dd8u,dv8u,dcnt8u) ;*************************************************************************** .def drem8u =r15 ;remainder .def dres8u =r16 ;result .def dd8u =r16 ;dividend .def dv8u =r17 ;divisor .def dcnt8u =r18 ;loop counter div8u: sub drem8u,drem8u ;clear remainder and carry ldi dcnt8u,9 ;init loop counter d8u_1: rol dd8u ;shift left dividend dec dcnt8u ;decrement counter brne d8u_2 ;if done ret ; return d8u_2: rol drem8u ;shift dividend into remainder sub drem8u,dv8u ;remainder = remainder - divisor brcc d8u_3 ;if result negative add drem8u,dv8u ; restore remainder clc ; clear carry to be shifted into result rjmp d8u_1 ;else d8u_3: sec ; set carry to be shifted into result rjmp d8u_1 When I build this code, it gives me an error: "Register already defined by the .def directive". The above code refers to R16 with two different labels which should be okay. Is there a reason why it gives me this error? Thanks, John

 Koshchi
 Posted: Oct 21, 2006 - 05:13 PM
 Joined: Nov 17, 2004 Posts: 13834 Location: Vancouver, BC
 I've gotten this error recently as well. I think that it used to work, but it has changed in recent versions of AVR Studio. Not sure if this is a bug or a deliberate change. _________________Regards, Steve A. The Board helps those that help themselves.

 froody
 Posted: Oct 21, 2006 - 11:07 PM
 Joined: Oct 20, 2006 Posts: 4
 First, let me say that I have no clue about AVR coding. An Arduino board is in the mail, but I haven't gotten it yet. I love digital math, though. So here's a little different take on the OP's trick. Dividing is the same thing as multiplying by the inverse, so dividing by 10 is the same as multiplying by 0.1. The problem is that there isn't a neat binary representation of 0.1. But it turns out that you can use a good enough 16-bit representation of 0.1 to get accurate results (meaning the same number as division by 10, rounded down as expected). With some simple fixed-point math you can then do your multiply, discard the mantissa (the bit after the . (or , for Europeans)) and there you go. What fixed-point format to use? It'll be obvious in a minute. Let's just assume 8 bits of mantissa for now and start finding the binary representation of 0.1d (decimal). Really what you're looking for is which powers of -2 you need to add up to get as close to 0.1 as possible. If you do that, you find: 1/10 =~ 0/2 + 0/4 + 0/8 + 1/16 + 1/32 + 0/64 + 0/128 + 1/256 the binary representation for this is: 00000000.00011001 So that's the 16-bit number you should multiply the original number by to divide by 10. But you'll notice we can use extra mantissa bits, as long as the top byte is 0. (If the top byte isn't 0 we might get overflow when we multiply.) So let's use 11 mantissa bits: 1/10 =~ 0/2 + 0/4 + 0/8 + 1/16 + 1/32 + 0/64 + 0/128 + 1/256 + 1/512 + 0/1024 + 1/2048 or 00000.00011001101 Here it is all nicely coded up in C. I've tested it for 0-255 and it gives the same result as C's /10 operation for every one. gcc did some awful things to the assembly, but I'll leave that part up to you guys because you're much better at it than I am. Code: uint8_t div10_fp(uint8_t number) {     // 1/10th with 11 mantissa bits     // 1/10th = 0/2 + 0/4 + 0/8 + 1/16 + 1/32 + 0/64 + 0/128 + 1/256 + 1/512 +     // 0/1024 + 1/2048     // 0.00011001101     uint16_t tenth = 0xcd;     // result with 11 mantissa bits     uint16_t result = number * tenth;   9a:   99 27           eor     r25, r25   9c:   2d ec           ldi     r18, 0xCD       ; 205   9e:   30 e0           ldi     r19, 0x00       ; 0   a0:   82 9f           mul     r24, r18   a2:   a0 01           movw    r20, r0   a4:   83 9f           mul     r24, r19   a6:   50 0d           add     r21, r0   a8:   92 9f           mul     r25, r18   aa:   50 0d           add     r21, r0   ac:   11 24           eor     r1, r1   ae:   ca 01           movw    r24, r20     // Drop mantissa to round down.     return result >> 11;   b0:   89 2f           mov     r24, r25   b2:   99 27           eor     r25, r25   b4:   86 95           lsr     r24   b6:   86 95           lsr     r24   b8:   86 95           lsr     r24 }   ba:   99 27           eor     r25, r25   bc:   08 95           ret You can do the same thing for dividing by any constant, and I'm actually kind of surprised gcc doesn't just do this for me. Tim

 brum103
 Posted: Oct 22, 2006 - 03:35 PM
 Joined: Nov 15, 2004 Posts: 11 Location: Netherlands
 That is a great idea froody! I had a look at your code, and optimized it a bit. Code: ; the input number must be in register "input" ; value/10 is in register "result" ; register temp1 and R1:R0 are clobbered div10:    ldi temp1, 205    mul temp1, input    lsr R1    lsr R1    lsr R1    mov result, R1    ret edit... i have written a little test program in ti basic. Code: :ClrHome :0->G :0->F :For(N,0,1200) :If (iPart(N/10)=iPart((N*205)/2048)) :Then :G+1->G :Else :F+1->F :If F<5 :Output(2+F,1,N :End :Output(1,1,"G=" :Output(1,3,G :Output(2,1,"F=" :Output(2,3,F :End It shows that it is correct up to 1028 and it fails at 1029, more than enough for 8 bit

 froody
 Posted: Oct 22, 2006 - 11:13 PM
 Joined: Oct 20, 2006 Posts: 4
 brum, wow. gcc really sucks, huh? Nice job on optimizing the code. I'm going to install gcc 4.1 soon. Hopefully it won't suck as much. Meanwhile, I clarified my original post a bit, and wrote some more, and now it's turned into a little article: How to divide an unsigned 8-bit integer by a constant. (c) 2006 by Tim Newsome Thanks to RetroDan and several other on avrfreaks.net for the inspiration. Dividing is slow, so let's avoid that. Instead, multiply by the inverse. You can write the inverse as a fixed-point binary number. Just like each digit in a normal binary number represents a power of 2, each digit after the decimal point in a binary number represents a power of -2. So: 10.11 = 1 * 2^1 + 0 * 2^0 + 1 * 2^-1 + 1 * 2^-2 = 2 + 1/2 + 1/4 = 2.75 Example: 0.5 = 1/2 = 0.1 (binary, 1-bit mantissa) Example: 0.25 = 0/2 + 1/4 = 0.01 (binary, 2-bit mantissa) Unfortunately there isn't a precise binary representation of most fractions, so you have to use an approximation. The same problem happens in decimal numbers, just not as often. For instance, 1/3 can be approximated as 0.3, which has an error of 0.3-1/3=-1/30. A better approximation is 0.33, which has an error of -1/300. The more digits you add, the smaller the error becomes. The same thing is true in binary: 1/3 =~ 1/2 = 0.1, error = 0.167 1/3 =~ 0/2 + 1/4 = 0.01, error = -0.0833 1/3 =~ 0/2 + 1/4 + 1/8 = 0.011, error = 0.0417 1/3 =~ 0/2 + 1/4 + 0/8 + 1/16 = 0.0101, error = -0.0208 Because we can't always find the precise inverse, we need to find one that's close enough, and gives us the right answer by rounding down always. (Rounding down just means ignoring the mantissa bits, which is easy to do.) That means we want to find an approximation that is at least as big as the real inverse. In other words, the error (computed as approximation minus real value) has to be positive. Furthermore, the error multiplied by 255 (our largest input value) must be smaller (less negative) than 1/divisor. For dividing by 3, the error must lie between 0 and 1/(3*255). Let's rework 1/3 keeping the error positive, and seeing how many bits we need before the error is small enough that we can use the approximation: 1/3 =~ 1/2 = 0.1, error = .167 1/3 =~ 1/2 + 0/4 = 0.10, error = .167 1/3 =~ 0/2 + 1/4 + 1/8 = 0.011, error = .0417 1/3 =~ 0/2 + 1/4 + 1/8 + 0/16 = 0.0110, error = .0417 1/3 =~ 0/2 + 1/4 + 0/8 + 1/16 + 1/32 = 0.01011, error = .0104 1/3 =~ 0/2 + 1/4 + 0/8 + 1/16 + 1/32 + 0/64 = 0.010110, error = .0104 1/3 =~ 0/2 + 1/4 + 0/8 + 1/16 + 0/32 + 1/64 + 1/128 = 0.0101011, error = .00260 1/3 =~ 0/2 + 1/4 + 0/8 + 1/16 + 0/32 + 1/64 + 1/128 + 0/256 = 0.01010110, error = .00260 1/3 =~ 0/2 + 1/4 + 0/8 + 1/16 + 0/32 + 1/64 + 0/128 + 1/256 + 1/512 = 0.010101011, error = .000651 So 9 bits of mantissa is enough, and our inverse is 0.010101011. Here is some C code that takes advantage of this: Code: uint8_t div3(uint8_t number) {     // Compute the intermediate value, with 9 mantissa bits.     uint16_t intermediate = (uint16_t) number * 0xab;     // Return the result, chopping off the mantissa bits.     return intermediate >> 9; } A good compiler should compile that down to a single multiply instruction, a shift, and maybe a few extra instructions. This is drastically faster than using the generic divide code. Let's do one more example, for division by 10. First figure out the inverse using the fewest number of mantissa bits: 1/10 =~ 1/2, error = 0.400 1/10 =~ 0/2 + 1/4, error = 0.150 1/10 =~ 0/2 + 0/4 + 1/8, error = 0.025 1/10 =~ 0/2 + 0/4 + 1/8 + 0/16, error = 0.025 1/10 =~ 0/2 + 0/4 + 1/8 + 0/16 + 0/32, error = 0.025 1/10 =~ 0/2 + 0/4 + 0/8 + 1/16 + 1/32 + 1/64, error = .00938 1/10 =~ 0/2 + 0/4 + 0/8 + 1/16 + 1/32 + 0/64 + 1/128, error = .00156 1/10 =~ 0/2 + 0/4 + 0/8 + 1/16 + 1/32 + 0/64 + 1/128 + 0/256, error = .00156 1/10 =~ 0/2 + 0/4 + 0/8 + 1/16 + 1/32 + 0/64 + 1/128 + 0/256 + 0/512, error = .00156 1/10 =~ 0/2 + 0/4 + 0/8 + 1/16 + 1/32 + 0/64 + 0/128 + 1/256 + 1/512 + 1/1024, error = .000586 1/10 =~ 0/2 + 0/4 + 0/8 + 1/16 + 1/32 + 0/64 + 0/128 + 1/256 + 1/512 + 0/1024 + 1/2048, error = .0000977 Finally we've reached the point where 255 * error is less than 1/10. So we need 11 mantissa bits, and our inverse is 0.00011001101. Here's the C code: Code: uint8_t div10(uint8_t number) {     // Compute the intermediate value, with 11 mantissa bits.     uint16_t intermediate = (uint16_t) number * 0xcd;     // Return the result, chopping off the mantissa bits.     return intermediate >> 11; } So there you have it. Fast division by a constant of an unsigned 8-bit integer. If you followed what I've done here, you should be able to extend this to 16-bit or any other sized number. Last edited by froody on Oct 22, 2006 - 11:50 PM; edited 1 time in total

 abcminiuser
 Posted: Oct 22, 2006 - 11:29 PM
 Joined: Jan 23, 2004 Posts: 9825 Location: Trondheim, Norway
 Froody, Good article. However, in your code examples don't you need an explicit cast to the "number" parameter to force GCC to use 16-bit operations? I would expect GCC to do the multiplication as 8-bit first, then implicit cast to a 16-bit value when storing. - Dean _________________Atmel Studio 6.1 is now released, grab it here. Report AS6/ASF bugs here.

 froody
 Posted: Oct 22, 2006 - 11:49 PM
 Joined: Oct 20, 2006 Posts: 4
 abcminiuser wrote: Good article. However, in your code examples don't you need an explicit cast to the "number" parameter to force GCC to use 16-bit operations? I would expect GCC to do the multiplication as 8-bit first, then implicit cast to a 16-bit value when storing. Thanks. Good point about the typecast. I think it's OK, because the hex constant is implicitly of type int, which is 16 bits, on AVR (right?). So it all works out. But it certainly is safer to add the typecast, so I'll edit my post and stick it in there. Tim

 Koshchi
 Posted: Oct 23, 2006 - 01:54 AM
 Joined: Nov 17, 2004 Posts: 13834 Location: Vancouver, BC
 Quote: You can do the same thing for dividing by any constant, and I'm actually kind of surprised gcc doesn't just do this for me. ... gcc really sucks, huh? I don't know how you can come to this conclusion from this thread. Up until your post this entire thread has been about assembly language and therefore has absolutely nothing to do with gcc. Also, the C language has absolutely no knowledge about fixed point numbers, so how could you possibly expect it to optimize your code in this manner? _________________Regards, Steve A. The Board helps those that help themselves.

 froody
 Posted: Oct 23, 2006 - 03:51 AM
 Joined: Oct 20, 2006 Posts: 4
 Koshchi wrote: Quote: gcc really sucks, huh? I don't know how you can come to this conclusion from this thread. Up until your post this entire thread has been about assembly language and therefore has absolutely nothing to do with gcc. Also, the C language has absolutely no knowledge about fixed point numbers, so how could you possibly expect it to optimize your code in this manner? In the first post you quoted, I showed some C code among the assembly code. You can see that gcc's generated code is really inefficient. For instance it does a mul by a register known to be 0. (Note that I've since installed gcc 4.1.1 and it does a lot better, although there are still some obvious optimizations possible.) Look at this C code: Code: uint8_t function(void) {     volatile uint8_t val = 123;     return val / 10; } When I compile it, I get a call to a general divide routine. But the compiler knows I'm dividing by a constant, so it could do the exact optimization I outlined. Effectively it would rewrite the code to look like: Code: uint8_t function(void) {     volatile uint8_t val = 123;     return ((uint16_t) val * 0xcd) >> 11; } That is actually slightly larger than the original code, but a lot faster. If you really care about code size you could just use the div10() function. Then function() would be 1 instruction shorter (because you don't have to pass 10 to the divide function) but you do take the size penalty of div10() (10 instructions in my gcc 4.1.1 build). Tim

 daqq
 Posted: Jun 14, 2007 - 09:44 AM
 Joined: Dec 15, 2003 Posts: 4402 Location: Slovakia, Bratislava
 Try removing the "voltatile" keyword. You'll get a far better output. _________________There are pointy haired bald people. Time flies when you have a bad prescaler selected.

 safari
 Posted: Jun 14, 2007 - 11:27 AM
 Joined: Mar 10, 2006 Posts: 31 Location: Denmark
 daqq it's an old thread, nearly 3/4 of an year old /Safari

 futrtrubl
 Posted: Jun 14, 2007 - 11:41 AM
 Joined: Aug 29, 2006 Posts: 229 Location: Kingston, Jamaica
 Still, I'm glad he brought it back since this will help me. Edward

 dksmall
 Posted: Jun 14, 2007 - 06:17 PM
 Joined: Apr 16, 2001 Posts: 3522 Location: Phoenix, Arizona
 Nothing like a nice jolt when I saw who the OP was! Then I realized it's an old thread.

 klini
 Posted: Dec 19, 2007 - 05:27 AM
 Joined: Jun 25, 2007 Posts: 7
 Wouldn't it be nice to not deal with the rounding error? If we were to add a compensation amount before multiplying by 26, we then have no error: Code: .def temp1 = r16 .def temp2 = r17 .def A = r18 // Input: A // Output: r1 DivBy10:    // calculate the amount needed to add to A to compensate for the rounding error to come    // compensation = (A + 3) / 64    mov temp2, A    ldi temp1, 3    add temp2,temp1    // "divide by 64" shortcut    swap temp2    andi temp2, 15    lsr temp2    lsr temp2        // now use 'A - compensation'    mov temp1, A    sub temp1, temp2    // Multiply by magic number    ldi temp2, 26    mul temp1, temp2    ret I've checked it out on my computer and seems like this works good, and doesn't eat many more cycles than the original.

 Nephazz
 Posted: Feb 24, 2009 - 08:47 PM
 Joined: Dec 07, 2008 Posts: 347 Location: germany
 Quote: Code: ; the input number must be in register "input" ; value/10 is in register "result" ; register temp1 and R1:R0 are clobbered div10:    ldi temp1, 205    mul temp1, input    lsr R1    lsr R1    lsr R1    mov result, R1    ret This one rocks! Saved my day, today _________________Jan KieÃŸling 74HC Overview Flowchart CAD schematic CAD

 seveprim
 Posted: Mar 06, 2009 - 03:54 AM
 Joined: Jan 18, 2005 Posts: 38
 i suppose one could allocate a huge look up table ln FLASH on the larger AVR range ... practically instant result which i have done many many years ago... (code lost to time)

 vbunch
 Posted: Dec 21, 2009 - 08:04 PM
 Joined: Sep 29, 2001 Posts: 53 Location: Superior, Colorado USA
 Thanks to all for alively discussion. I need help on a 16 bit divide: I have been using the DIV16u straight from apnote 1011 (AVR200) since 1998, and found a possible error. When you have \$7080 (28,800) divided by \$3841 (14,401), the correct answer is: \$0001 and remainder \$FFF6 (1.99986). However, the Atmel DIV16u returns \$0001 remainder \$383F, quite an error. When you have \$7080 (28,800) divided by \$3840 (14,400), the correct answer is: \$0002 and remainder \$0000 (2.00000), and the Atmel code returns the correct same answer. Please help, I need a fix fast, or am I wrong? Vern Bunch, AVR consultant and AVR designer

 peret
 Posted: Jan 05, 2010 - 06:41 PM
 Joined: May 26, 2004 Posts: 2538 Location: Las Vegas, Nevada
 vbunch wrote: When you have \$7080 (28,800) divided by \$3841 (14,401), the correct answer is: \$0001 and remainder \$FFF6 (1.99986). However, the Atmel DIV16u returns \$0001 remainder \$383F, quite an error. Dude, WTF? Let's do this manually, by long division. \$3841 goes into \$7080 once. Code: (hex assumed)       ____1___ 3841 ) 7080        3841 -        ----        383F \$3841 won't go into \$383F, so the quotient is 1 and the remainder is \$383F. The Atmel routine returns the correct answer. The number \$FFF6 might possibly occur in some floating point representation, but can never happen in unsigned integer math, since it is larger than the dividend.

 peret
 Posted: Jan 05, 2010 - 07:02 PM
 Joined: May 26, 2004 Posts: 2538 Location: Las Vegas, Nevada
 Oh, I see where you got it from. If you want to generate the fractional bits, take the remainder, multiply by 16 (shift left 4) and divide this by the divisor. Repeat until you have enough "(hexa)decimal" places. Code:       ____1FF... 3841 ) 7080        3841 -        ----        383F <<4        3841 * F -        ----         3821 <<4         3841 * F -          ----          3641           ... etc Again, this is just confirming that the original division routine got it right. You didn't carry out enough steps to format the answer the way you expected. Note that since the first remainder is guaranteed to be less than the divisor, each successive step is guaranteed to produce a quotient less than 16, ie a single hex digit. You have to manually concatenate these single digits to the number of places you want.

 vbunch
 Posted: Jan 08, 2010 - 05:53 PM
 Joined: Sep 29, 2001 Posts: 53 Location: Superior, Colorado USA
 Thank you my friends for your correct responses. It is my fault that I mentioned "remainder" when I expected a fully usable result, without having to divide again, and one that had 16 bit accuracy. So the Atmel division routine is "correct" but unusable without more computation. Since there was no mention that the result is unusable without further division, I niavely skipped the true math definition of "remainder" and went right to "answer", I should not have done that. I have since written an assembly language routine that gives me a correct (and directly usable) 16 bit division in one run, and it is less time (156 cycles) then the published 16 bit divide routine. It took 3 weeks at 104 hours a week, and I'm way beyond tired.

 clawson
 Posted: Jan 08, 2010 - 06:01 PM
 Joined: Jul 18, 2005 Posts: 62281 Location: (using avr-gcc in) Finchingfield, Essex, England
 Quote: I have since written an assembly language routine that gives me a correct (and directly usable) 16 bit division in one run, and it is less time (156 cycles) then the published 16 bit divide routine. It took 3 weeks at 104 hours a week, and I'm way beyond tired I guess it's too late to say that you could have "borrowed" the source for a / routine from an open source C compiler? _________________

 peret
 Posted: Jan 08, 2010 - 06:09 PM
 Joined: May 26, 2004 Posts: 2538 Location: Las Vegas, Nevada
 Quote: I guess it's too late to say that you could have "borrowed" the source for a / routine from an open source C compiler? Actually I suspect in this case he could not, because the answer he wanted was in a fixed point format. He got simple integer division from the appnote. The compiler would have given him full float, which would challenge me to adapt to a fixed format in assembler (and I consider myself rather good at it). As a matter of interest, if we did look at the source for a / routine in an open source C compiler - let's say, and why not, the GCC compiler - would we be looking at assembler, or C?

 clawson
 Posted: Jan 08, 2010 - 06:16 PM
 Joined: Jul 18, 2005 Posts: 62281 Location: (using avr-gcc in) Finchingfield, Essex, England
 Quote: would we be looking at assembler, or C? Assembler. Here's a snippet from gcc/config/avr/libgcc.s: Code: /*******************************************************        Division 16 / 16 => (result + remainder) *******************************************************/ #define   r_remL   r26   /* remainder Low */ #define   r_remH   r27   /* remainder High */ /* return: remainder */ #define   r_arg1L   r24   /* dividend Low */ #define   r_arg1H   r25   /* dividend High */ /* return: quotient */ #define   r_arg2L   r22   /* divisor Low */ #define   r_arg2H   r23   /* divisor High */     #define   r_cnt   r21   /* loop count */ #if defined (L_udivmodhi4)    .global   __udivmodhi4    .func   __udivmodhi4 __udivmodhi4:    sub   r_remL,r_remL    sub   r_remH,r_remH   ; clear remainder and carry    ldi   r_cnt,17   ; init loop counter    rjmp   __udivmodhi4_ep   ; jump to entry point __udivmodhi4_loop:         rol   r_remL      ; shift dividend into remainder    rol   r_remH         cp   r_remL,r_arg2L   ; compare remainder & divisor    cpc   r_remH,r_arg2H         brcs   __udivmodhi4_ep   ; remainder < divisor         sub   r_remL,r_arg2L   ; restore remainder         sbc   r_remH,r_arg2H __udivmodhi4_ep:         rol   r_arg1L      ; shift dividend (with CARRY)         rol   r_arg1H         dec   r_cnt      ; decrement loop counter         brne   __udivmodhi4_loop    com   r_arg1L    com   r_arg1H ; div/mod results to return registers, as for the div() function    mov_l   r_arg2L, r_arg1L   ; quotient    mov_h   r_arg2H, r_arg1H    mov_l   r_arg1L, r_remL      ; remainder    mov_h   r_arg1H, r_remH    ret    .endfunc #endif /* defined (L_udivmodhi4) */ #if defined (L_divmodhi4)    .global   __divmodhi4    .func   __divmodhi4 __divmodhi4:    .global   _div _div:         bst     r_arg1H,7   ; store sign of dividend         mov     __tmp_reg__,r_arg1H         eor     __tmp_reg__,r_arg2H   ; r0.7 is sign of result    rcall   __divmodhi4_neg1 ; dividend negative : negate    sbrc   r_arg2H,7    rcall   __divmodhi4_neg2 ; divisor negative : negate    rcall   __udivmodhi4   ; do the unsigned div/mod    rcall   __divmodhi4_neg1 ; correct remainder sign    tst   __tmp_reg__    brpl   __divmodhi4_exit __divmodhi4_neg2:    com   r_arg2H    neg   r_arg2L      ; correct divisor/result sign    sbci   r_arg2H,0xff __divmodhi4_exit:    ret __divmodhi4_neg1:    brtc   __divmodhi4_exit    com   r_arg1H    neg   r_arg1L      ; correct dividend/remainder sign    sbci   r_arg1H,0xff    ret    .endfunc #endif /* defined (L_divmodhi4) */ _________________

 MBedder
 Posted: Jan 15, 2010 - 03:50 PM
 Joined: Nov 02, 2009 Posts: 3239 Location: Zelenograd, Russia
 Just for fun, I have castrated the OP's div_10 code even further: Code: ;---------------------------------------- ; 8-bit division by 10 routine. ; ; Input: R0 = dividend. ; Output: R1 = result = dividend/10. ; ; Registers R0 and R16 are clobbered. ; 5 words/10 clocks including RET. ; div10:    ldi   r16,205    mul   r0,r16    ldi   r16,32    mul   r1,r16    ret ;----------------------------------------

 sparky23882
 Posted: Apr 30, 2010 - 11:37 AM
 Joined: Jul 18, 2007 Posts: 154 Location: Germany
 The OP's div_10 code was buggy at values like 19, 29, 39, ... or at 21, 31, 41, ... depending on whether the correction (INC) was done or not. Here's a fixed version, with remainder calculation (remainder lines marked, strip them if not needed, saves 5 cycles): Code: // input: A, output/value: R1, output/remainder: A // total cycles: 13 PUSH A      //2 Me (for remainder) INC A      //1 Me CPI A, 0   //1 Me BRNE x      //1,2 Me DEC A      //1 Me x:         //0 Me LDI B, 51   //1 RetroDan MUL A, B   //2 RetroDan LSR R1      //1 RetroDan POP A      //2 Me (for remainder) SUB A, R1   //1 Me (for remainder) Hope this is useful for someone out there. My approach: I made a list with values, RetroDan and RetroDan with INC. It looked somehow like this: 8 - 0 0 9 - 0 1 10 - 0 1 11 - 1 1 So one was too low, the other too high around exact matches (i.e. dividable by 10 with no remainder). So I took the one without INC (saves one cycle), INCreased the value before giving it to the RetroDan stuff, so all results were accurate - but "255" which becomes 0. A quick check for the overflow after increasing (i.e. value=0), decrease it again (make it 255 again, which gives the correct result) and done. P.S: I know it's an old thread, but I think it's an important one that's read by a lot of people for finding a good divide-by-10 routine. _________________Markus

 danni
 Posted: May 03, 2010 - 10:49 AM
 Joined: Sep 05, 2001 Posts: 2497
 But I would ask at first, whats the purpose to do a division by 10? Mostly it was the decimal output conversion. The fastest way to do so, was the subtraction method, since no divison nor multiplication was needed. A small example for 16 bit ASCII output can be seen on: http://www.avrfreaks.net/index.php?name ... 721#453721 Peter

 Koshchi
 Posted: May 03, 2010 - 08:20 PM
 Joined: Nov 17, 2004 Posts: 13834 Location: Vancouver, BC
 Quote: The fastest way to do so, was the subtraction method, since no divison nor multiplication was needed. But repeated subtraction that that routine uses is division. And the multiplies when done with the hardware multiply are very fast. _________________Regards, Steve A. The Board helps those that help themselves.

 Display posts from previous:  All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First
 Jump to: Select a forum Forum index|--[AVR (8-bit) Technical Forums]|   |-- AVR forum|   |-- XMEGA forum|   |-- AVR Wireless forum|   |-- AVR GCC forum|   |-- AVR Studio 5 and Atmel Studio 6 forum|   |-- AVR studio 4 forum|   |-- AVRfreaks Academy forum|   |-- AVR Tutorials|--[AVR Software Framework]|   |-- AVR Software Framework|--[AVR32 (32-bit) Technical Forums]|   |-- AVR32 Linux Forum|   |-- AVR32 General (standalone)|   |-- AVR32 Software Tools|   |-- AVR32 Hardware|--[General Electronics Technical Forums]|   |-- General Electronics|   |-- Atmel Security Products|--[Non-technical forums]|   |-- AVRfreaks.net Housekeeping|--[Non-topical forums]|   |-- Off-topic forum|   |-- AVRfreaks Trading Post
All times are GMT + 1 Hour