## Division only with bitwise shifts

31 posts / 0 new
Author
Message

Dear freaks,

I am using the ATmega64A controller and I need to make a division with the faster (as possible) method.

======================

The division I need to make is :

x / 93      (x = 0 to 16384).

======================

As an example of a multiplication using bit shifts is

x *10 = (x << 3) + (x << 1)

I need something like this for my division.

Can anyone help me ?

Michael.

User of:
IAR Embedded Workbench C/C++ Compiler
Altium Designer

icarus1 wrote:
the faster (as possible) method

How long is it currently taking?

How long do you need it to take?

Can you sacrifice accuracy for speed?

How much code space are you prepared to throw at it?

x *10 = (x << 3) + (x << 1)

That's clearly relying on the fact that 10 = 8 + 2, and that 8 and 2 are both powers of 2 - so achievable by shifts.

Is the 93 fixed and known at compile time?

Top Tips:

1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
4. Difference between a crystal, and a crystal oscillatorhttps://www.avrfreaks.net/comment...
5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...

Division by 93 is then same as multiplication by 1/93 so you could work with fixed point maths.

#1 Hardware Problem? https://www.avrfreaks.net/forum/...

#3 All grounds are not created equal

#4 Have you proved your chip is running at xxMHz?

#5 "If you think you need floating point to solve the problem then you don't understand the problem. If you really do need floating point then you have a problem you do not understand."

Dear guys,

I found the sollution my own.

x / 93 = (x >> 7) + (x >> 9) + (x >> 10)

Michael.

User of:
IAR Embedded Workbench C/C++ Compiler
Altium Designer

Just to note for the future that another approach is that 1/93 is roughly 705/65536 (for example). So say you want to use x=12345. We can already know that the answer to x/93 is 132(.74). The way to use 705/65536 is to take the 12345 and multiply it by 705. That gives 8699375 which is 0x84BDEF. The "clever" bit is that to divide that by 65536 is effectively just >>16 so it's just 0x84 which is 132. Of course that was throwing away the 0xBDEF "fraction". As that is over 0x8000 we probably want to round up. So the way you could do that is to take 0x84BDEF and add 0x8000 to it to get 0x853DEF and then throw away the 0x3DEF to be left with 0x85 which is 133 (and the real answer is 132.74 which, rounded up is 133 too).

In general you can convert a fraction to something over 65536 or 16,777,216 or 4,294,967,296 and then after multiplying by the fractional part you just use >>16 or >>24 or >>32 which really just means picking out the higher bytes.

icarus1 wrote:

x / 93 = (x >> 7) + (x >> 9) + (x >> 10)

Have you looked at the errors that solution gives?

For example, let x = 5115.

(5115>>7) + (5115>>9) + (5115>>10) = 52

5115/93 = 55

That's an error of over 5%.

In general for any small value of x the error is huge plus the error distribution of each term is cyclic and in some cases all three terms have a maximum error as illustrated above.

#1 Hardware Problem? https://www.avrfreaks.net/forum/...

#3 All grounds are not created equal

#4 Have you proved your chip is running at xxMHz?

#5 "If you think you need floating point to solve the problem then you don't understand the problem. If you really do need floating point then you have a problem you do not understand." #1 Hardware Problem? https://www.avrfreaks.net/forum/...

#3 All grounds are not created equal

#4 Have you proved your chip is running at xxMHz?

#5 "If you think you need floating point to solve the problem then you don't understand the problem. If you really do need floating point then you have a problem you do not understand."

hmmm...to be honest I didn't try this. Thank you Brian.

Michael.

User of:
IAR Embedded Workbench C/C++ Compiler
Altium Designer

clawson wrote:
1/93 is roughly 705/65536 (for example).

Brian Fairchild wrote:
Have you looked at the errors that solution gives?

Hence my original question:

I wrote:
Can you sacrifice accuracy for speed?

Which is another way of saying, "how much error can you tolerate?"

And, as Brian points out, you also need to be aware of the type(s) of error(s) that your solution gives...

Top Tips:

1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
4. Difference between a crystal, and a crystal oscillatorhttps://www.avrfreaks.net/comment...
5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...

Another approach (though this is what is really happening when you use the / operator) is to recognise what 1/93 is as a binary fraction.

If I go here

https://www.h-schmidt.net/FloatC...

and type in 0.01075268817204301075268817204301 (which it rounds to 0.010752688 because it's only using 32-bit floats) it tells me the number is:

2^-7 * 1.01100000011 (and some "small change")

So 2^-7 is 1/128. So it's saying that 1/93 is

1/128 + 1/512 + 1/1024 + 1/131072 + 1/262144

If you add that up you get 0.010753631591796875 which is pretty close to the 0.01075268817204301075268817204301 I started with. The inaccuracy comes from the fact that first it was truncated to a 32bit float and then I threw away the "small change"

But anyway I git a few more places than your:

(x >> 7) + (x >> 9) + (x >> 10)

1/128 is your >>7, 1/512 is you >>9, 1/1024 is your >>10. The two added terms are >>17 and >>18

But the more >>n you add the closer you are getting to the original calculation anyway. Shifts can be expensive.

Also this is not going to help with what Brian showed because if you work in integers then 5115>>17 and 5115>>18 add nothing to the result as /131072 and /262144 are both fractional as the divisor is larger than the 5155 input.

icarus1 wrote:
I didn't try this. Thank you Brian.

Just do it in Excel (or your favourite spreadsheet) - it's easy.

Anything like this you should certainly be testing in Excel (or whatever).

Top Tips:

1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
4. Difference between a crystal, and a crystal oscillatorhttps://www.avrfreaks.net/comment...
5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...

If you use Cliff's approach of multiplying by 705 you get a different error pattern... #1 Hardware Problem? https://www.avrfreaks.net/forum/...

#3 All grounds are not created equal

#4 Have you proved your chip is running at xxMHz?

#5 "If you think you need floating point to solve the problem then you don't understand the problem. If you really do need floating point then you have a problem you do not understand."

Excel, or any other spreadsheet, is a useful tool to model errors and the like. Plotting a graph really help visualise what's going on.

#1 Hardware Problem? https://www.avrfreaks.net/forum/...

#3 All grounds are not created equal

#4 Have you proved your chip is running at xxMHz?

#5 "If you think you need floating point to solve the problem then you don't understand the problem. If you really do need floating point then you have a problem you do not understand."

And the error pattern in post #12 is simply the pattern that results from using integer maths and having quite a large divisor.

#1 Hardware Problem? https://www.avrfreaks.net/forum/...

#3 All grounds are not created equal

#4 Have you proved your chip is running at xxMHz?

#5 "If you think you need floating point to solve the problem then you don't understand the problem. If you really do need floating point then you have a problem you do not understand."

Sounds like you're trying to reinvent floating point, but discarding the remainder of each term.  Cliff used more terms, but still discarded remainders.

If you maintain that remainder at each step, the error is eliminated.  This:

`(x >> 7) + (x >> 9) + (x >> 10) + (x >>17) + (x >>18)`

... becomes this:

```((x << 11) + (x << 9) + (x << 8) + (x << 1) + x) >> 18
```

The integer results should be nearly identical to dividing by 93.  My spreadsheet shows only 54 16 14-bit values that are off by one from the correct answer, and they're all for large operands, such that the error is at most < 1 %.  The remaining 16,330 operand values yield an exactly correct result.

There are limitations (of course).  Since we start by shifting to the left by 11 bits, the operand must be constrained.  If we employ 32-bit arithmetic, the operand would be limited to 21 bits.  Not a problem if you're working with 16-bit operands.  And of course, the need to work with 32-bit intermediate results will have a speed penalty over working with 16 bits.

In effect, you've traded a 16-bit long division for 5 32-bit shift operations.  While some of the shifts can be optimised, it may not be as much of a win as you are hoping for, if it's a win at all.

If you need ultimate speed, consider using a pre-computed lookup table.

 "Experience is what enables you to recognise a mistake the second time you make it." "Good judgement comes from experience.  Experience comes from bad judgement." "Wisdom is always wont to arrive late, and to be a little approximate on first possession." "When you hear hoofbeats, think horses, not unicorns." "Fast.  Cheap.  Good.  Pick two." "We see a lot of arses on handlebars around here." - [J Ekdahl] Last Edited: Sat. Feb 25, 2017 - 03:00 PM

joeymorin wrote:

Sounds like you're trying to reinvent floating point, but discarding the remainder of each term.  Cliff used more terms, but still discarded remainders.

If you maintain that remainder at each step, the error is eliminated.  This:

`(x >> 7) + (x >> 9) + (x >> 10) + (x >>17) + (x >>18)`

... becomes this:

```((x << 11) + (x << 9) + (x << 8) + (x << 1) + x) >> 18
```

(...)

Also, you can round to nearest instead of truncate if you add "half LSB" before the final shift:

`((x << 11) + (x << 9) + (x << 8) + (x << 1) + x + (1 << 17) ) >> 18`

First I will ask OP why? (is this for speed or fun or size)

From a math perspective a simple correct version is to normalize the mul with 1/93 (to fit 16bitx16bit=32bit where only top 16bit is used), and since input is 16 bit the number is 45100 (cliff did not normalize and therefore have the number correct placed but also with errors ).

so let's see the error: 45100/2^22=45100/4194304 = 1/93,00008869

This has a problem that you divide with more that 1/93 so 93 give 0.999999046 and not 1.

But if we use 45101 all numbers up to 16384 a correct

On a AVR with multiplier this could be a doable way.

Speed

`((x << 11) + (x << 9) + (x << 8) + (x << 1) + x) >> 18`

look very simple it's very slow on a AVR especially here where the shifts are 32 bit .

For this particular case, we can also note that:

`(x << 1) + x`

and

`(x << 9) + (x << 8)`

are the same number, just shifted one byte, witch is almost free. I very much doubt the compiler would notice this, but we can optimize by hand, by spoon feeding the optimization to the compiler:

```temp = (x << 1) + x;
result = (x << 11) + (temp << 8) + temp;```

Also, you can round to nearest instead of truncate if you add "half LSB" before the final shift:

Yes you can, but that doesn't accurately represent the results you would normally get from integer division.  Using that approach results in a far greater number of operand values which yield off-by-one errors, 8211 vs 54.

are the same number, just shifted one byte, witch is almost free. I very much doubt the compiler would notice this, but we can optimize by hand, by spoon feeding the optimization to the compiler:

AVR GCC 4.8.1 often makes use of a generic shift subroutine which is >>fairly<< efficient, but not >>optimally<< efficient.  It identifies the number of bytes shifted, and then uses mov and movw, then cleans up any additional shifts with a loop.

Still, it should generally be faster than integer division.

On a AVR with multiplier this could be a doable way.

Yes, I use a similar approach, and have a host-based program which derives the optimal values of M and S for the general equation D / d = (D * M) >> S, but this approach does not satisfy the OP's request for 'Division only with bitwise shifts'.  Like others, I am curious to know why this is a requirement.

 "Experience is what enables you to recognise a mistake the second time you make it." "Good judgement comes from experience.  Experience comes from bad judgement." "Wisdom is always wont to arrive late, and to be a little approximate on first possession." "When you hear hoofbeats, think horses, not unicorns." "Fast.  Cheap.  Good.  Pick two." "We see a lot of arses on handlebars around here." - [J Ekdahl] Last Edited: Sat. Feb 25, 2017 - 02:58 PM

We still guess why the shift but I just made this C program that do the job fast and stupid correct division job, just to see how fast it is:

and because we know that max number is 16384 we can save some compares.

And basically it take 5 clk if a bit is cleared and 8 if not, plus what ever init that are needed

So in this case it's 35-53 clk + init

stupid me y can be uin8_t (max result is 176)

that make it 5 clk a bit cleared and 7 clk if not so it will be

35-47 clk + init

That is good for a tiny, but close to a normal const division on a mega.

downside the code is about 100 byte.

```/*
* GccApplication13.c
*
* Created: 25-02-2017 13:49:32
*/

#include <avr/io.h>
#include <stdlib.h>
volatile uint16_t counter;
volatile uint16_t q;

int main(void){
counter=0;
uint16_t x,a,y;
while(1){
counter++;
y=0;
a=11904;
x=counter;
if (x>=a){
x-=a;
y+=128;
}
a=5952;
if (x>=a){
x-=a;
y+=64;
}
a=2976;
if (x>=a){
x-=a;
y+=32;
}
a=1488;
if (x>=a){
x-=a;
y+=16;
}
a=744;
if (x>=a){
x-=a;
y+=8;
}
a=372;
if (x>=a){
x-=a;
y+=4;
}
a=186;
if (x>=a){
x-=a;
y+=2;
}
a=93;
if (x>=a){
x-=a;
y+=1;
}
q=y;
}//while 1
}```

Last Edited: Sat. Feb 25, 2017 - 09:38 PM

y=counter/93

take about 50 clk on a mega and about 200 on a tiny.

And I checked this code :

`((x << 11) + (x << 9) + (x << 8) + (x << 1) + x) >> 18`

and that was ugly at least for speed about 150 clk (with x as 32 bit) and that for a mega it will be even slower on a tiny.

I guess that there only is ASM left to speed it up.

and that should be about 25 clk for a 100% correct solution and about 15clk for one never more than 1 away. (on a mega).

150 clk was with compiler option -O2, so I tried with -O3 to see if it would unroll but it didn't !!! it made the same code with a loop of 18  32bit shifts!!! stupid.

so I tried this:

`((x << 11) + (x << 9) + (x << 8) + (x << 1) + x) >> 2`

and then >>16

and then it's down to about 40 clk.

Last Edited: Sat. Feb 25, 2017 - 11:04 PM

Dear Icarus,

i have finally completed the function you wanted in the past.

I have made the text compression to 1 byte.

clawson wrote:
another approach is that 1/93 is roughly 705/65536

Clawson,  how did you come up with 705/65536?

TIA

Jim

(Possum Lodge oath) Quando omni flunkus, moritati.

"I thought growing old would take longer"

octave:2> 65536/93
ans =  704.69

Do y'all know this trread is half a year old?

Doing magic with a USD 7 Logic Analyser: https://www.avrfreaks.net/comment/2421756#comment-2421756

Bunch of old projects with AVR's: http://www.hoevendesign.com

ki0bk wrote:
Clawson, how did you come up with 705/65536?

clawson wrote:
So it's saying that 1/93 is 1/128 + 1/512 + 1/1024 + 1/131072 + 1/262144

Whoa -- a head rush.  An update to a semi-vintage thread causes a reader to review the thread and needs clarification on one of the posts, which causes the next reader [me] of the latest comment to dig back into the thread and ...

When I'm doing an AVR8 app, especially a "tight" app on a small model, paper analysis might come up with a needed factor such as the example 0.01075268817204301075268817204301 ;) or thereabouts, I have a PC program that brute-force goes through all the reasonable integer combinations that come "close" to within a defined difference.  One of the options I put into the brute-force thingy was to use only powers of two as divisors in the ratio.

But I don't think that is what is going on here.  Let me guess:

1/128 is 512/65536.

1/512 is 128/65536

1/1024 is 64/65536.

Add those up and you get 704/65536.

The next two fractional digits add up to more than half, so 705 rounded up.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

nevermind

 "Experience is what enables you to recognise a mistake the second time you make it." "Good judgement comes from experience.  Experience comes from bad judgement." "Wisdom is always wont to arrive late, and to be a little approximate on first possession." "When you hear hoofbeats, think horses, not unicorns." "Fast.  Cheap.  Good.  Pick two." "We see a lot of arses on handlebars around here." - [J Ekdahl] Last Edited: Fri. Jul 7, 2017 - 02:33 PM

As Jim says, to find fraction as a binary multiple just divide. So if you wanted 1/117th and you wanted to do then do 65536 / 117 = 560.1 so 560/65536 is roughly 1/117. In fact 1/117 is 0.00854700854700854700854700854701 (my Windows calculator says) and 560/65536 is 0.008544921875 but it's a close approximation, you might round one to be 0.008547 while the other is 0.008545 - so not a million miles away.

Taking 12345 then 12345 * 560 is 6,913,200 which is 0x69 7CB0. If you now drop the "64K part" you are left with 0x69 (well kind of 0x69 and a half) which is 105. Meanwhile 12345 / 117 is 105.5128. So, like I say 0x69 and a half ;-)

So if you wanted 1/117th and you wanted to do then do 65536 / 117 = 560.1 so 560/65536

I will just add that if you want it better with a 16bit multiplication then do (2**22)/117 = 35848,752136752136752136752136752 so mul with 35849.

0,00854700854700854700854700854701   (1/117)

0,0085470676422119140625                       (35849/(2**22))

0,008544921875                                           (560/(2**16))

Same multiplication time, but the result needs to be shifted

Last Edited: Fri. Jul 7, 2017 - 04:10 PM

Do you really mean **22 not **24? I thought this was all about breaking things across 8 bit boundaries ?

There are two boundaries. Byte alignment and then the size of the multiplication. Where both yours and my number make a 16bit multiplication, but with **24 it would need a 24 bit multiplication.

I will need to shift the result two times left (6 shifts for 16 bit and 10 for 32 bit).

but I guess that your number is fine for a 16 bit :

real  65535/117 = 560,12820512820512820512820512821