## [DIRTY-MATH] Filthy Math Tricks

25 posts / 0 new
Author
Message

****** DIRTY MATH PACK for AVR/ASM ******

While trying to help someone in another FORUM with a division-by-60 routine to convert seconds to minutes, someone made the remark:

Quote:
What RetroDan has is a quick and dirty approximation.

I don't know if it was an insult or compliment, but it gave me a great name for the type of binary routines I am interested in: "Dirty Math," Quick & Dirty Routines that are small and fast.
Quote:
DIRTY DAN'S RULE #1: ALWAYS USE A POWER OF TWO

If you are writing any program or routine, even if the savings are not obvious at first, pick numbers that are powers of two. If you need to take a average don't take 10 readings and divide by 10, take 8 or 16. If you're setting up a table, don't make entries 12 bytes long, make them 8 or 16. It will greatly simplify things at the hardware-level.
Quote:
DIRTY DAN'S RULE #2: AVOID DIVISION!

While Addition, Subtraction and Multiplication can be done is a little as 1 clock cycle, most division involves looping and uses a lot of processor time. If you must divide try to do it by powers of two (See Rule #1) which can be accomplished with a few Right-Shifts. If you must divide more than once, try and re-arrange equation so you only divide once, for example if you need to divide by 10 and later divide by 12, perhaps you can just divide once by 120.

Last Edited: Sun. Mar 26, 2006 - 04:43 AM

The first routine I suggested (and the one he used in the end) is an adaption of an optimized general "Division-by-Subtraction" routine we kicked around in another thread. Brian (bcb) had the brilliant idea of combining the two variables that are shifted into one, thereby reducing the size and complexity and excection time of the routine.

[CODE REMOVED - SEE LAST POST]
Last Edited: Sun. Mar 26, 2006 - 11:37 PM

Shifting to the right is a "cheap and dirty" way to divide by powers of 2. Someone mentioned that 60 was "close" to 64 and suggested that perhaps dividing by 64 might work. I thought the idea a poor one since, for starters it would involve 6 left-shift of a 16-bit number, so a total of 12 shifts. And besides, I felt that dividing by 64 was just not accurate enought, however, read on, the plot thickens...

Since the AVRs have a very fast MUL on some of their chips, I am intrigued by the method known as "Division-by-Inversion" which means rather than divide our number by 60 we multiply it by 1/60 instead.

Therfore the "Inversion" of 1/60 is 0.0166666666667 a number I can't represent by a byte.

Since division by 256 is super-easy (you just truncate your answer by ignoring the lower byte) I calculated the "inversion" of 60 time 256 = 4.266666667 so the first alternate "Quick & Dirty" routine was just a multiplication by 4 and drop the lower byte.

[CODE REMOVED, SEE LAST POST]
Last Edited: Mon. Mar 27, 2006 - 01:23 AM

Since the answers were off upto 4 in the range 0-3600 (0-60 minutes) I cheated and added a few "adjustment" factors to the answer.

[CODE REMOVED - SEE LAST POST]

Then someone pointed out that I was mearly dividing by 64 and adjusting the result.
BULL CRAP! I thought to myself. I'm multiplying by 4 and dividing by 256....
Wait a minute..... 256 divided by 4 is 64... well what-do-you-know?
In a round-a-bout way I WAS dividing by 64... Go figure!

Last Edited: Sun. Mar 26, 2006 - 11:33 PM

Before the MUL instruction, a "quick and dirty" way to multiply by 3 was to left-shift and add. The computer version of: 3x = 2x + x

Could there be a dirty way to compute division by 3?
Here's were I start.

[CODE REMOVED - SEE LAST POST]
[CODE REMOVED - SEE LAST POST]
Last Edited: Sun. Mar 26, 2006 - 11:32 PM

My next attempt is more accurate:

[CODE REMOVED - SEE LAST POST]
Last Edited: Sun. Mar 26, 2006 - 11:30 PM
LDI  N,8
LUPIE:   ROL  R16         ;DIVIDE R17:R16 BY 60
ROL  R17
BRCS DOSUB
CPI   R17,60
BRCS SKPSUB
DOSUB:   SUBI  R17,60
CLC
SKPSUB:  DEC  N
BRNE LUPIE
ROL  R16
DVDONE:  COM  R16         ;ANSWER IN R16

Be careful. This code returns an incorrect answer if the quotient is larger than 15359.

Regards,
Steve A.

The Board helps those that help themselves.

Last Edited: Fri. Mar 24, 2006 - 06:39 AM

He was only interested in 0-3600 for time measurements, however, what causes it to fail? Do we lose a bit somewhere?

Last Edited: Sun. Mar 26, 2006 - 11:38 PM

I've corrected the number to 15359. You are assuming that the answer fits in an 8 bit number. The largest number in 8 bits is 255, 255 * 60 = 15300, plus a possible 59 remainder gets you 15359.

Quote:
He was only interested in 0-3600 for time measurements

Then your code can be simplified. The BRCS DOSUB is unnecessary since it would never happen (it would require a number greater than 32767). You would therefore also not need the CLC.

Regards,
Steve A.

The Board helps those that help themselves.

Wow Steve, you're a freakin' genius at this stuff!
So for the range 0-15300 the routine could be simplified to:

[CODE REMOVED - SEE LAST POST]
Last Edited: Sun. Mar 26, 2006 - 11:29 PM

Would this quick modification fix the problem? Expanding the answer to 2-bytes?

[CODE REMOVED - SEE LAST POST]

Nope, I just tried it! I Don't think a quick-fix is possible because the 8 iterations will only collect one byte. So the limit of this routine is, as you said: 0

Last Edited: Sun. Mar 26, 2006 - 11:27 PM

In order to do a general divide, the divisor has to be normalized. That is, it needs to be left shifted until the high bit is 1. Of course you also have to make the adjustment on the dividend to make up for the normalization. In other words you need to adjust the decimal point (or is it binary point?). You might also want to normalize the quotient as well, then you would keep track of the difference in the normalizations. Floating point numbers are always kept in normalized form partly for this reason (as well as the fact that it maximizes accuracy).

Regards,
Steve A.

The Board helps those that help themselves.

If I normalize the 60 upto 240 then I need to multiply the dividend by 4 also and the result could flow the 2-byte dividen into another byte and we'd end-up with a 24X8 divide. The cure is worse than the disease.

If I go the "other way" and divide by 240 then multiply result by 4 the accuracy is off. Maybe I'm missing something, but normalizing the 60 doesn't seem to improve anything. And trying to normalize the 2-byte unknown and keeping track would probably be just as much work as just going ahead with the divide in the first place.

I guess the only real way to fix it for the entire 2-byte range of 0-65,536 is to iterate 16 times (ouch!) and collect answer in 2-bytes (see below). Although there should be a way to do it with just 11 iterations and some bit-adjustments at the end.

[CODE REMOVED - SEE LAST POST]
Last Edited: Sun. Mar 26, 2006 - 11:26 PM

Last Edited: Fri. Mar 24, 2006 - 05:35 PM

Here is another method to divide a 16 bit number by 60.
Its a little more complex, but it does not involve looping 8 or 16 times
and executes in just 1.38 uS at 16Mhz:

[CODE REMOVED - SEE LAST POST]
Last Edited: Sun. Mar 26, 2006 - 11:24 PM

Quote:
If I normalize the 60 upto 240 then I need to multiply the dividend by 4 also and the result could flow the 2-byte dividen into another byte and we'd end-up with a 24X8 divide. The cure is worse than the disease.

If I go the "other way" and divide by 240 then multiply result by 4 the accuracy is off.

Consider this. If you divide a 16 bit number by 1, the result is a 16 bit number. If you divide by 2, the result fits in 15 bits, by 4 - 14 bits, etc. This is just a way of saying that the larger the number you divide by, the smaller the result. The number of shifts in the normalization tells you how many bits there need to be in the answer (the more shifts, the larger the dividend). You can then use this knowledge to determine how many times through the "subtract and shift" loop to do.

You can use the same type of logic for normalizing the quotient to further adjust the number of times through the loop. In this case, the more shifts, the smaller the dividend.

Regards,
Steve A.

The Board helps those that help themselves.

Damn, I'm just learning the maths side of assembly language. I personally love the LSL (x2) of a number < 255. I'm still trying to get my head around timers (eg, create a 1 second delay), I can now do this with both TCNT0 and the manual method by cascading multiple registers and relying on the Carry Bit in the SREG to complete the delay. But I like these dirty math tricks, it helps me understand the math to it a little better. Of course I've always preferred accuracy, but I'm still learning :D

[CODE REMOVED]
Last Edited: Mon. Mar 27, 2006 - 01:25 AM

TheDon wrote:
Damn, I'm just learning the maths side of assembly language. I personally love the LSL (x2) of a number < 255... But I like these dirty math tricks, it helps me understand the math to it a little better. Of course I've always preferred accuracy, but I'm still learning :D

I'm very glad you enjoy our discussion. Not only is LSL multiplying by two but LSR is division by two. Accuracy is always going to be an issue unless you either use bloated floating point, extended integers or just try and stick to powers-of-two.

Someone in another thread was looking for a fast way to calculate the RMS value of an AC voltage. While working out the routine I needed a fast way to divide by 16.

I'm new to the AVR core and I noticed that it has an opcode SWAP, at first I thought it swapped the contents of two registers (which would be handy) but it actually swaps the two Nybbles within a single Byte. I wondered if there was a way to do a "dirty" DIV16 because four left-shifts means moving left one Nybble. This routine is only 5 lines and executes in only 0.5uS, however the maximum you can divide is 4,088.

[CODE REMOVED - SEE LAST POST]
Last Edited: Sun. Mar 26, 2006 - 11:23 PM

Here is the same "Dirty" Nybble-Swap DIV16 routine expanded to cover all the two-byte integers from 0-65,536, however it's 8 lines long, so it would be better to just do 4 left-shift on 2 bytes = 8 program steps, and thereby preserving the remainder.

[CODE REMOVED - SEE LAST POST]

I felt like I was playing pea-under-one-the-cups game (three card monty) while working out this routine.

Last Edited: Sun. Mar 26, 2006 - 11:12 PM

RMS (Root Mean Squared) is a measurement of average used on AC circuits. The formula is RMS=SQRT[SUM(X^2)/n] you square all your readings and total them, then you divide by the the number of reading and take the square root of the result. Here is my routine that accomplishes the same without any squaring, dividing or taking any square roots, there's more than just a couple "dirty" trick to this routine but it works. If there's any interest, let me know and I'll explain why it does work.

[CODE REMOVED - SEE LAST POST]
Last Edited: Sun. Mar 26, 2006 - 11:15 PM

Has ANY of this been posted to the academy as per the posting rules?

*******************************************
* DUE TO THE HOSTILE, UNFRIENDLY NATURE   *
* OF MODERATORS AND USERS OF THIS FORUM,  *
* WHO KEEP DISRUPTING INTERESTING THREADS *
* WITH THEIR OFF-TOPIC COMMENTS, I HAVE   *
* DECIDED TO END THIS THREAD.             *
*                                         *
* MY APOLOGIES TO THOSE WHO MAY HAVE      *
* FOUND IT'S CONTENTS INTERESTING.        *
*******************************************
Last Edited: Mon. Mar 27, 2006 - 01:21 AM