How do I write a fast divide by 5 and by 10 for uint32_t (4-byte) input data on the ATmega16A in assembler?
Dividing by 4 is a right shift: x div 4 = (x>>2). Multiplying by 2 is a left shift: x*2 = (x<<1).
Interesting. Now, you probably should tell your goals -- just "fast" doesn't mean much.
It also might help if you give the reason for your quest.
That said, have you searched this forum? IIRC there have been extensive thread(s) about this.
https://www.avrfreaks.net/forum/... [Dave Van Horn; Sean Ellis; Jesper; ...]
https://www.avrfreaks.net/forum/...
...
Fast divide-by-10 is commonly discussed, but you need to decide whether you also want to get the remainder...
https://forum.arduino.cc/index.p...
https://forum.arduino.cc/index.p...
As said it has been here many times.
So look around.
Because your AVR has a HW multiplier, there is no doubt that multiplying by 1/5 and 1/10 will be the fastest way.
Do you need it 100% accurate ?
using assembler
Before considering that implementation detail, think about the general requirement ...
I've not even looked at "prior art" but I can't help noticing that /10 is /5 then /2 - the last bit of which must be pretty easy to get a computer to do! So the challenge is presumably just /5 ?
You could use the fact that the series 1/4 - 1/16 + 1/64 - 1/256 + 1/1024 - 1/4096 ... converges to 1/5.
That is what you get when you multiply by 1/5.
8-bit, fast multiply: 256/5 = 51 (51.2)
8-bit, using shift: 205 (204.8)
16-bit, fast: 2**16/5 = 13107 (13107.2)
16-bit, using shift: 52429 (52428.8)
...
...
And remember that the remainder is fast on a chip with HW multiply (multiply the result by 5 and subtract it from the original number).
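A minimal C sketch of the multiply-back idea, assuming an 8-bit input and the 256/5 = 51 constant from the list. The function name `divmod5_u8` is hypothetical, not something posted in this thread. Because 51/256 is slightly less than 1/5, the high byte of the product can be one too low (never too high), so a single correction step is enough for 8-bit inputs.

```c
#include <stdint.h>

/* Quotient estimate from the high byte of x*51, then one correction step.
   For 8-bit x the estimate is at most 1 too low and never too high. */
static uint8_t divmod5_u8(uint8_t x, uint8_t *rem)
{
    uint8_t q = (uint8_t)(((uint16_t)x * 51u) >> 8); /* high byte of the 8x8 product */
    uint8_t r = (uint8_t)(x - q * 5u);               /* multiply back and subtract */
    if (r >= 5u) {                                   /* estimate was one too low */
        r -= 5u;
        q++;
    }
    *rem = r;
    return q;
}
```

On the AVR the `* 51u` would map to a single MUL and the `>> 8` to simply reading the high result register, which is why no shifting is needed for this variant.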
You could use the fact that the series 1/4 - 1/16 + 1/64 - 1/256 + 1/1024 - 1/4096 ... converges to 1/5.
Indeed. In other words,
n/5 ≈ (n>>2) - (n>>4) + (n>>6) - (n>>8) and so on... (the parentheses matter in C, where - binds tighter than >>)
But still, using multiply by the inverse is probably faster, since the ATMega16 has an hw multiplier.
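A hedged C sketch of the series for a 16-bit input. Each shift throws away fraction bits, so the raw sum can land a few counts above or below n/5; the two correction loops below are my own addition to make the result exact, and the name `div5_series_u16` is hypothetical.

```c
#include <stdint.h>

static uint16_t div5_series_u16(uint16_t n)
{
    /* Truncated alternating series 1/4 - 1/16 + 1/64 - ... for 16 bits. */
    int32_t q = (int32_t)(n >> 2) - (n >> 4) + (n >> 6) - (n >> 8)
              + (n >> 10) - (n >> 12) + (n >> 14);

    /* Fix up the estimate using the remainder n - 5*q. */
    int32_t r = (int32_t)n - 5 * q;
    while (r < 0)  { q--; r += 5; }   /* estimate was too high */
    while (r >= 5) { q++; r -= 5; }   /* estimate was too low  */
    return (uint16_t)q;
}
```

The loops only ever run a handful of times, but on an AVR the seven multi-bit shifts are the expensive part, which is the argument for the multiply-based approach.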
But how do you represent the inverse in a uint?
Why, surely by sliding the base-2 "decimal point" and keeping track of that. Say you have an 8-bit number to divide by 5: multiply by 1/5 using the closest 8-bit representation (thus 1/5 as parts in 256), slid over so the MSB is 1 (keep track of the number of shifts), and do the 8x8 multiply. Knowing the number of shifts, you can grab the 16-bit answer and slide it back to get the result. You can improve accuracy by using 16 bits instead of 8, or 24, etc. (assuming the reciprocal of the divisor is not exactly representable in 8 bits).
Yes, let's exemplify the whole process:
1/5 = 0.2, then we can use a calculator like this https://www.exploringbinary.com/binary-converter/
So, 0.2 decimal is 0.00110011001100110011001100110011001100110011001100110011001100... in binary. Now, we shift <<2 to get more significant bits, in other words, 0.8 = 0.110011001100110011001100110011001100110011001100110011001100...
Since this is an infinite fraction, we can round to use just 8 significant bits: 0.8 = 0.11001101 then multiply by 256 (<<8) and we get the 11001101 (decimal 205) "magic number" that we can multiply to eventually get the result of the division.
For example, to divide 20 by 5, we multiply by 205, so 20*205 = 4100 that is 0001_0000 0000_0100 binary. Now we need to "undo" the shifts we did, >>10 and obtain the result 100 binary, which is of course 4 decimal.
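The worked example can be written as a short C sketch, assuming 8-bit inputs only. The function names are hypothetical. The divide-by-10 variant is my extension of the same constant with one extra shift (divide by 5, then by 2); for 8-bit inputs the rounding error of 205/1024 stays small enough that both results are exact.

```c
#include <stdint.h>

/* x/5 for 8-bit x: multiply by 205 (0.8 scaled by 256), then undo 8 + 2 shifts. */
static uint8_t div5_u8(uint8_t x)
{
    return (uint8_t)(((uint16_t)x * 205u) >> 10);
}

/* x/10 for 8-bit x: same magic number, one more shift. */
static uint8_t div10_u8(uint8_t x)
{
    return (uint8_t)(((uint16_t)x * 205u) >> 11);
}
```

For instance, `div5_u8(20)` computes 20*205 = 4100 and shifts right by 10, giving 4 as in the example above.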
Now we need to "undo" the shifts we did, >>10
And depending on how much shifting ("bit grabbing") you need (say it were 5 shifts), it may sometimes be better to use a larger 16-bit/24-bit/etc. multiplier so you can simply grab result bytes with no shifting, and perhaps end up with even more accuracy as a bonus. It's a bit of a sport to find the optimal implementation.
While this has been discussed many times before as others have mentioned, I haven't seen anything conclusive on what is the smallest and what is the fastest.
Here's something I came up with (untested) after thinking about the problem for 20 minutes, with the goal of writing the smallest 16-bit unsigned divide-by-5. 16 AVR instructions (32 bytes):
ldi r23, hi8(52429)
ldi r22, lo8(52429)
movw r18, r24
loop:
lsrw r18
lsrw r22
brcc .+2
addw r18, r24
addiw r22, 0
brne loop
; now divide result by 4
lsrw r18
lsrw r18
It's just a 16-bit * 16-bit multiply with the lower 16 bits of the result discarded. The fixed multiplier is the magic number to divide by 1.25, and then the result is divided by 4 at the end.
Using hardware multiply, you could probably cut the code size in half, and get something that is over an order of magnitude faster. For AVRs without hardware multiply, I'm curious to find out if there is a smaller implementation than what I just whipped up.
with the goal of writing the smallest 16-bit unsigned divide-by-5.
Why (X*(1/1.25))/4 instead of just X*(1/5) ?
Because the top 2 bits of the binary representation of 0.2 are zero, it's better to multiply by 0.8 instead: you gain 2 extra bits of precision for the intermediate calculation, and do the >>2 shift as a final step.
edit: to be precise, the top 2 bits of the fractional part (mantissa).
[CPP]
uint8_t a,b;
uint16_t c,d;
uint32_t e,f;
b = (a>>3) - (a>>5) + (a>>7); // b = a/10
d = (c>>3) - (c>>5) + (c>>7) - (c>>9) + (c>>11) - (c>>13) + (c>>15); // d = c/10
f = (e>>3) - (e>>5) + (e>>7) - (e>>9) + (e>>11) - (e>>13) + (e>>15) - (e>>17) + (e>>19) - (e>>21) + (e>>23) - (e>>25) + (e>>27) - (e>>29) + (e>>31); // f = e/10
d = (c>>3) - (c>>5) + (c>>7) - (c>>9) + (c>>11) - (c>>13) + (c>>15); // d = c/10
d = (c- (c- (c- (c- (c- (c- (c>>2) )>>2 )>>2 )>>2 )>>2 )>>2 )>>3;
[/CPP]
??????
The thread title said "fast dividing". Did you look at the Asm the C compiler churns out for all those shifts? Or did you use the simulator stopwatch to count the cycles in each case? (You would, of course, have to make things volatile or the entire code will be discarded anyway - and hence "very fast" ;-)
http://we.easyelectronics.ru/Soft/preobrazuem-v-stroku-chast-1-celye-chisla.html
PC programs for testing the fast division routine, and subroutines for fast division by 10:
[CPP]
#include <iostream>
#include <stdint.h>
using namespace std;

struct divmod10_t
{
    uint32_t quot;
    uint8_t rem;
};

inline static divmod10_t divmodu10(uint32_t n)
{
    divmod10_t res;
    // mul 0.8
    res.quot = (n >> 1);
    res.quot += res.quot >> 1;
    res.quot += res.quot >> 4;
    res.quot += res.quot >> 8;
    res.quot += res.quot >> 16;
    uint32_t qq = res.quot;
    // div 8
    res.quot >>= 3;
    // rem
    res.rem = uint8_t(n - ((res.quot << 1) + (qq & ~7ul)));
    // corr rem , quot
    if (res.rem > 9)
    {
        res.rem -= 10;
        res.quot++;
    }
    return res;
}

int main()
{
    uint32_t i;
    for (i = 0; i < 0xFFFFFFFF; i++)
    {
        uint32_t delta = divmodu10(i).quot - (i / 10);
        if (delta != 0)
        {
            cout << "\n x= " << hex << i;
            cout << " delta " << hex << delta;
        }
    }
    cout << " end ";
    //cout<<hex<<(~7ul); ->0x ff ff ff f8
}
[/CPP]
[CPP]
struct divmod10_t
{
    uint32_t quot;
    uint8_t rem;
};

inline static divmod10_t divmodu10(uint32_t n)
{
    divmod10_t res;
    // multiply by 0.8
    res.quot = n >> 1;
    res.quot += res.quot >> 1;
    res.quot += res.quot >> 4;
    res.quot += res.quot >> 8;
    res.quot += res.quot >> 16;
    uint32_t qq = res.quot;
    // divide by 8
    res.quot >>= 3;
    // compute the remainder
    res.rem = uint8_t(n - ((res.quot << 1) + (qq & ~7ul)));
    // correct the remainder and the quotient
    if (res.rem > 9)
    {
        res.rem -= 10;
        res.quot++;
    }
    return res;
}
[/CPP]
This needs to be rewritten in assembler.
The program is needed to control a DVDC (divider with variable dividing coefficient) in a PLL design: it derives the divider codes from frequency data (uint32_t, or 6 to 7 BCD digits converted to uint32_t).
Umm... If the data's already in BCD, why not just drop the low digit? /10 right there. Shift the resulting uint32_t left once for /5. S.
Program is need for DVDC control for PLL
What are you talking about, this is not a mystery show...what is DVDC?
Are you saying from your coding, that you already have the answer?
I just noticed a simple speed optimization would be to change "brcc .+2" to "brcc loop". That cuts the loop time from 10 to 6 cycles when there is no add. And if my math is right, that cuts the total time from 176 cycles to 134.
A left shift after dropping the low digit isn't quite the same as dividing by 5: consider what happens if the low digit was 5 or more. For 15, dropping the digit gives 1 and the shift gives 2, but 15/5 is 3.
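A small C sketch of the fix, assuming the dropped digit is still available. Here `n / 10` merely stands in for "drop the low BCD digit", and the function name is hypothetical.

```c
#include <stdint.h>

/* Doubling n/10 loses one count whenever the dropped digit is 5..9,
   so that count is added back explicitly. */
static uint32_t div5_from_div10(uint32_t n)
{
    uint32_t q10 = n / 10u;            /* stands in for dropping the low digit */
    uint32_t low = n - q10 * 10u;      /* the dropped digit, 0..9 */
    return (q10 << 1) + (low >= 5u);   /* 2*(n/10), plus 1 if the digit was >= 5 */
}
```

With BCD input the comparison against 5 is a single nibble compare, so the correction costs very little.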
Anyway, general observation:
This is a well-studied problem. You probably want to do a multiplication-based approach.
Things to consider: First, AVR has no barrel shifter, so a shift or rotate by N bits takes O(N) time. That makes multi-bit shifts slow, the opposite of most desktop CPUs, where shifts are basically free.
Divide by 10: (((uint32_t)A * (uint32_t)0xCCCD) >> 16) >> 3
Divide by 5: (((uint32_t)A * (uint32_t)0xCCCD) >> 16) >> 2
But beware, those are probably only accurate for 16-bit values. Note also that the >>16 can be trivially bypassed with a bit of pointer math.
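A hedged C sketch of both widths. The 16-bit functions use the expressions above unchanged. The 32-bit functions are not from this thread: they assume the widely used constant 0xCCCCCCCD with shifts of 35 and 34, which needs the upper part of a 32x32 to 64-bit product. All function names are hypothetical.

```c
#include <stdint.h>

/* 16-bit inputs: the expressions from the post above. */
static uint16_t div10_u16(uint16_t a)
{
    return (uint16_t)((((uint32_t)a * 0xCCCDu) >> 16) >> 3);
}

static uint16_t div5_u16(uint16_t a)
{
    return (uint16_t)((((uint32_t)a * 0xCCCDu) >> 16) >> 2);
}

/* 32-bit inputs: same idea with a 32-bit magic number and a 64-bit product
   (an assumption added here, not taken from the thread). */
static uint32_t div10_u32(uint32_t a)
{
    return (uint32_t)(((uint64_t)a * 0xCCCCCCCDu) >> 35);
}

static uint32_t div5_u32(uint32_t a)
{
    return (uint32_t)(((uint64_t)a * 0xCCCCCCCDu) >> 34);
}
```

On an ATmega the 64-bit product would have to be assembled from 8x8 MUL partial products, and only the upper bytes need to be kept, so a hand-written assembler version can skip most of the low-order work.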
If you use the series, a nonzero delta (an error of ±2 to 3 counts) is to be expected:
data= (n>>3) - (n>>5) + (n>>7) - (n>>9) + (n>>11) - (n>>13) + (n>>15) - (n>>17) + (n>>19) - (n>>21) + (n>>23) - (n>>25) + (n>>27) - (n>>29) + (n>>31) ;
x= 8 delta 1
x= 9 delta 1
x= 10 delta 1
x= 11 delta 1
x= 12 delta 1
x= 13 delta 1
x= 18 delta 1
x= 19 delta 1
x= 1a delta 1
x= 1b delta 1
x= 1c delta 1
x= 1d delta 1
x= 30 delta 1
x= 31 delta 1
x= 38 delta 1
x= 39 delta 1
x= 3a delta 1
x= 3b delta 1
x= 46 delta ffffffff
x= 47 delta ffffffff
x= 58 delta 1
x= 59 delta 1
x= 64 delta ffffffff
x= 65 delta ffffffff
x= 66 delta ffffffff
x= 67 delta ffffffff
x= 6e delta ffffffff
x= 6f delta ffffffff
x= 80 delta 1
x= 81 delta 1
x= 88 delta 1
x= 89 delta 1
x= 8a delta 1
x= 8b delta 1
x= 90 delta 1
x= 91 delta 1
x= 92 delta 1
x= 93 delta 1
x= 94 delta 1
x= 95 delta 1
x= 98 delta 1
x= 99 delta 1
x= 9a delta 1
x= 9b delta 1
x= 9c delta 1
x= 9d delta 1
x= 9e delta 1
x= 9f delta 1
x= a8 delta 1
x= a9 delta 1
x= b0 delta 1
x= b1 delta 1
x= b2 delta 1
x= b3 delta 1
x= b8 delta 1
x= b9 delta 1
x= ba delta 1
x= bb delta 1
x= bc delta 1
x= bd delta 1
x= d0 delta 1
x= d1 delta 1
x= d8 delta 1
x= d9 delta 1
x= da delta 1
x= db delta 1
x= e6 delta ffffffff
x= e7 delta ffffffff
x= f8 delta 1
x= f9 delta 1
x= 100 delta 1
x= 101 delta 1
x= 102 delta 1
x= 103 delta 1
x= 108 delta 1
x= 109 delta 1
x= 10a delta 1
x= 10b delta 1
x= 10c delta 1
x= 10d delta 1
x= 110 delta 1
x= 111 delta 1
x= 112 delta 1
x= 113 delta 1
x= 114 delta 1
x= 115 delta 1
x= 116 delta 1
x= 117 delta 1
x= 118 delta 1
x= 119 delta 1
x= 11a delta 1
x= 11b delta 1
x= 11c delta 1
x= 11d delta 1
x= 11e delta 1
x= 11f delta 1
x= 120 delta 1
x= 121 delta 1
x= 128 delta 1
x= 129 delta 1
x= 12a delta 1
x= 12b delta 1
x= 130 delta 1
x= 131 delta 1
x= 132 delta 1
x= 133 delta 1
x= 134 delta 1
x= 135 delta 1
x= 138 delta 1
x= 139 delta 1
x= 13a delta 1
x= 13b delta 1
x= 13c delta 1
x= 13d delta 1
x= 13e delta 1
x= 13f delta 1
x= 148 delta 1
x= 149 delta 1
x= 150 delta 1
x= 151 delta 1
x= 152 delta 1
x= 153 delta 1
x= 158 delta 1
x= 159 delta 1
x= 15a delta 1
x= 15b delta 1
x= 15c delta 1
x= 15d delta 1
x= 170 delta 1
x= 171 delta 1
x= 178 delta 1
x= 179 delta 1
x= 17a delta 1
x= 17b delta 1
x= 180 delta 1
x= 181 delta 1
x= 182 delta 1
x= 183 delta 1
x= 184 delta 1
x= 185 delta 1
x= 188 delta 1
x= 189 delta 1
x= 18a delta 1
x= 18b delta 1
x= 18c delta 1
x= 18d delta 1
x= 18e delta 1
x= 18f delta 1
x= 190 delta 1
x= 191 delta 1
x= 192 delta 1
x= 193 delta 1
x= 194 delta 1
x= 195 delta 1
x= 196 delta 1
x= 197 delta 1
x= 198 delta 2
x= 199 delta 2
x= 19a delta 1
x= 19b delta 1
x= 19c delta 1
x= 19d delta 1
x= 19e delta 1
x= 19f delta 1
x= 1a0 delta 1
x= 1a1 delta 1
x= 1a2 delta 1
x= 1a3 delta 1
x= 1a8 delta 1
x= 1a9 delta 1
x= 1aa delta 1
x= 1ab delta 1
x= 1ac delta 1
x= 1ad delta 1
x= 1b0 delta 1
x= 1b1 delta 1
x= 1b2 delta 1
x= 1b3 delta 1
x= 1b4 delta 1
x= 1b5 delta 1
x= 1b6 delta 1
x= 1b7 delta 1
x= 1b8 delta 1
x= 1b9 delta 1
x= 1ba delta 1
x= 1bb delta 1
x= 1bc delta 1
x= 1bd delta 1
x= 1be delta 1
x= 1bf delta 1
x= 1c0 delta 1
x= 1c1 delta 1
x= 1c8 delta 1
x= 1c9 delta 1
x= 1ca delta 1
x= 1cb delta 1
x= 1d0 delta 1
x= 1d1 delta 1
x= 1d2 delta 1
x= 1d3 delta 1
x= 1d4 delta 1
x= 1d5 delta 1
x= 1d8 delta 1
x= 1d9 delta 1
x= 1da delta 1
x= 1db delta 1
x= 1dc delta 1
x= 1dd delta 1
x= 1de delta 1
x= 1df delta 1
x= 1e8 delta 1
x= 1e9 delta 1
x= 1f0 delta 1
x= 1f1 delta 1
x= 1f2 delta 1
x= 1f3 delta 1
x= 1f8 delta 1
x= 1f9 delta 1
x= 1fa delta 1
x= 1fb delta 1
x= 1fc delta 1
x= 1fd delta 1
x= 210 delta 1
x= 211 delta 1
x= 218 delta 1
x= 219 delta 1
x= 21a delta 1
x= 21b delta 1
x= 226 delta ffffffff
x= 227 delta ffffffff
x= 238 delta 1
x= 239 delta 1
x= 244 delta ffffffff
x= 245 delta ffffffff
x= 246 delta ffffffff
x= 247 delta ffffffff
x= 24e delta ffffffff
x= 24f delta ffffffff
x= 262 delta ffffffff
x= 263 delta ffffffff
x= 264 delta ffffffff
x= 265 delta ffffffff
x= 266 delta ffffffff
x= 267 delta ffffffff
x= 26c delta ffffffff
x= 26d delta ffffffff
x= 26e delta ffffffff
x= 26f delta ffffffff
x= 276 delta ffffffff
x= 277 delta ffffffff
x= 288 delta 1
x= 289 delta 1
x= 290 delta 1
x= 291 delta 1
x= 292 delta 1
x= 293 delta 1
x= 298 delta 1
x= 299 delta 1
x= 29a delta 1
x= 29b delta 1
x= 29c delta 1
x= 29d delta 1
x= 2b0 delta 1
x= 2b1 delta 1
x= 2b8 delta 1
x= 2b9 delta 1
x= 2ba delta 1
x= 2bb delta 1
x= 2c6 delta ffffffff
x= 2c7 delta ffffffff
x= 2d8 delta 1
x= 2d9 delta 1
x= 2e4 delta ffffffff
x= 2e5 delta ffffffff
x= 2e6 delta ffffffff
x= 2e7 delta ffffffff
x= 2ee delta ffffffff
x= 2ef delta ffffffff
x= 300 delta 1
x= 301 delta 1
x= 308 delta 1
x= 309 delta 1
x= 30a delta 1
x= 30b delta 1
x= 310 delta 1
x= 311 delta 1
x= 312 delta 1
x= 313 delta 1
x= 314 delta 1
x= 315 delta 1
x= 318 delta 1
x= 319 delta 1
x= 31a delta 1
x= 31b delta 1
x= 31c delta 1
x= 31d delta 1
x= 31e delta 1
x= 31f delta 1
x= 328 delta 1
x= 329 delta 1
x= 330 delta 1
x= 331 delta 1
x= 332 delta 1
x= 333 delta 1
x= 338 delta 1
x= 339 delta 1
x= 33a delta 1
x= 33b delta 1
x= 33c delta 1
x= 33d delta 1
x= 350 delta 1
x= 351 delta 1
...
But there is no such problem with divmod10_t divmodu10(uint32_t n):
#include <iostream>
#include <stdint.h>
using namespace std;
struct divmod10_t
{
uint32_t quot;
uint8_t rem;
};
inline static divmod10_t divmodu10(uint32_t n)
{
divmod10_t res;
// mul 0.8
res.quot = n >> 1;
res.quot += res.quot >> 1;
res.quot += res.quot >> 4;
res.quot += res.quot >> 8;
res.quot += res.quot >> 16;
uint32_t qq = res.quot;
// div 8
res.quot >>= 3;
// rem
//res.rem = uint8_t(n - ((res.quot << 1) + (qq & ~7ul)));
res.rem = uint8_t(n - ((res.quot << 1) + (qq & 0xfffffff8)));
// corr rem , quot
if(res.rem > 9)
{
res.rem -= 10;
res.quot++;
}
return res;
}
uint32_t div10(uint32_t n)
{
//b = (a>>3) - (a>>5) + (a>>7); // b = a/10
//d = (c>>3) - (c>>5) + (c>>7) - (c>>9) + (c>>11) - (c>>13) + (c>>15); // d = c/10
// f = (e>>3) - (e>>5) + (e>>7) - (e>>9) + (e>>11) - (e>>13) + (e>>15) - (e>>17) + (e>>19) - (e>>21) + (e>>23) - (e>>25) + (e>>27) - (e>>29) + (e>>31);
uint32_t ftmp=n;
ftmp=(uint32_t)ftmp>>3;
uint32_t data=0ul;
/*
data+=ftmp;
ftmp=(uint32_t)ftmp>>2; //5
data-=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2; //7
data+=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2; //9
data-=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2; //11
data+=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2; //13
data-=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2; //15
data+=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2; //17
data-=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2; //19
data+=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2; //21
data-=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2; //23
data+=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2; //25
data-=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2; //27
data+=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2; //29
data-=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2; //31
data+=(uint32_t)ftmp;
*/
data= (n>>3) - (n>>5) + (n>>7) - (n>>9) + (n>>11) - (n>>13) + (n>>15) - (n>>17) + (n>>19) - (n>>21) + (n>>23) - (n>>25) + (n>>27) - (n>>29) + (n>>31) ;
return data;
}
int main ()
{
uint32_t i;
//cout<<hex<<(~7ul);
for ( i=0; i<0xFFFFFFFF;i++ )
{
//uint32_t delta= divmodu10(i).quot -(i/10);
uint32_t delta= div10(i) -(i/10);
if (delta!=0)
{
cout<< "\n x= "<<hex<<i;
cout<< " delta "<<hex<<delta;
}
}
cout<< " end ";
}
If you use the series, a nonzero delta (an error of ±2 to 3 counts) is to be expected
What exactly do you mean by this? It is hard to say what your issue is. Are you having a problem or not?
For PC
#include <iostream>
#include <stdint.h>
using namespace std;
//Divide by 10: (((uint32_t)A * (uint32_t)0xCCCD) >> 16) >> 3
//Divide by 5: (((uint32_t)A * (uint32_t)0xCCCD) >> 16) >> 2
/*
struct divmod10_t
{
uint32_t quot;
uint8_t rem;
};
*/
/*
inline static divmod10_t divmodu10(uint32_t n)
{
divmod10_t res;
// mul 0.8
res.quot = n >> 1;
res.quot += res.quot >> 1;
res.quot += res.quot >> 4;
res.quot += res.quot >> 8;
res.quot += res.quot >> 16;
uint32_t qq = res.quot;
// div 8
res.quot >>= 3;
// rem
res.rem = uint8_t(n - ((res.quot << 1) + (qq & ~7ul)));
//res.rem = uint8_t(n - ((res.quot << 1) + (qq & 0xfffffff8)));
// corr rem , quot
if(res.rem > 9)
{
res.rem -= 10;
res.quot++;
}
return res;
}
*/
static uint32_t divmod10(uint32_t in, uint32_t *mod)
{
uint32_t q = (in >> 1) ;
q+= (in >> 2);
q += (q >> 4);
q += (q >> 8);
q += (q >> 16);
q = q >> 3;
uint32_t r = in - ((q << 1) + (q << 3)); // r = in - q*10;
uint32_t div = q + (r > 9); //if r>9 div=q+1, else div=q
if (r > 9) *mod = r - 10; else { *mod = r; }
return div ;
}
int main ()
{
uint32_t i;
//cout<<hex<<(~7ul);
for ( i=0; i<6399999;i++ )
//for ( i=1600000; i<6399990;i++ )
{
// uint32_t delta= divmodu10(i).quot -(i/10);
//uint32_t delta= div10(i) -(i/10);
uint32_t mod;
//uint32_t delta= ( divmodu10((i) ).quot ) -(i/10); //if f<=6399990
// mod=divmodu10((i) ).rem;
//uint32_t delta= ( divmodu10((i ) ).quot ) -(i/10); //if f<=6399990
//uint32_t deltamod=divmodu10((i) ).rem -(i)%10 ;
uint32_t delta= ( divmod10( (i<<1 ),&mod ) ) -(i<<1)/10 ; //if f<=6399990
uint32_t deltamod= mod -(i<<1 ) %10 ;
if( (delta!=0)||(deltamod!=0) )
{
cout<< "\n x= " <<i;
// cout<< " delta x_div10 "<<hex<<delta;
//cout<< " delta x_mod10 "<<deltamod;
cout<< " delta x<<1 _div5 "<<hex<<delta;
cout<< " delta x<<1_mod10 "<<deltamod;
}
}
cout<< " end ";
}
What exactly? The problem is the truncated series: you must apply a correction to the low-order bits of the result for
// f = (e>>3) - (e>>5) + (e>>7) - (e>>9) + (e>>11) - (e>>13) + (e>>15) - (e>>17) + (e>>19) - (e>>21) + (e>>23) - (e>>25) + (e>>27) - (e>>29) + (e>>31);
(compare with C++ for PC ).
Here's something I came up with (untested) after thinking about the problem for 20 minutes, with the goal of writing the smallest 16-bit unsigned divide-by-5. 16 AVR instructions (32 bytes):
which I infer count as two words each.
If you do not mind slow, I can beat that with a 14-word C-callable function:
.global div5
div5:
MOVW R30, R24
LDI R24, 0
LDI R25, 0
; omit if you do not mind grindingly slow
; reduce R31 5 at a time
RJMP 2f
1: SUBI R31, 5
INC R25
2: CPI R31, 5
BRCC 1b
; R31<=4
; reduce R31:30 5 at a time
RJMP 2f
1: ADIW R24, 1
2: SBIW R30, 5
BRCC 1b
RET
As noted, it is slow.
How do I create a fast subroutine for decoding uint32_t data into a uint8_t bcddigits[7] array of BCD digits (using tmp div 10, tmp mod 10, or another generic fast method), and an encoder from uint8_t bcddigits[7] and uint8_t bcddigits[4] digit arrays, for the ATmega16A?
How can this program be made faster using alternative algorithms?
uint8_t * DWORD_TO_BCD_DIGITS(uint32_t indata)
{
    static uint8_t Digits[7];  // static, so the returned pointer stays valid after the function returns
    uint32_t temp = indata;    // worked example in the comments: indata = 1599999
    Digits[6] = (uint8_t)(temp / 1000000); // div32, Digits[6] = 1
    temp = temp % 1000000;                 // mod32, temp = 599999
    //0x000F4240
    Digits[5] = (uint8_t)(temp / 100000);  // div32, Digits[5] = 5
    temp = temp % 100000;                  // mod32, temp = 99999
    //0x000186A0
    Digits[4] = (uint8_t)(temp / 10000);   // div32, Digits[4] = 99999/10000 = 9
    temp = temp % 10000;                   // mod32, temp = 99999 - 90000 = 9999
    //0x00002710
    Digits[3] = (uint8_t)(temp / 1000);    // div32, Digits[3] = 9999/1000 = 9
    temp = temp % 1000;                    // mod32, temp = 9999 - 9000 = 999
    //0x000003e8
    Digits[2] = (uint8_t)(temp / 100);     // div32, Digits[2] = 999/100 = 9
    temp = temp % 100;                     // mod32, temp = 999 - 900 = 99
    //0x00000064
    Digits[1] = (uint8_t)(temp / 10);      // div32, Digits[1] = 99/10 = 9
    temp = temp % 10;                      // mod32, temp = 99 - 90 = 9
    //0x0000000A
    Digits[0] = (uint8_t)(temp & 0x0f);    // Digits[0] = 9
    //printf (" %d %d %d %d %d %d ", (int)Digits[5], (int)Digits[4], (int)Digits[3], (int)Digits[2], (int)Digits[1],(int)Digits[0] );
    return Digits;
}
I'm reminded of attempts to speed up an/or shrink integer to ascii conversion, where the winner usually (?) involves repeated subtractions of (constant) powers of 10. It's nicely customizable to the exact number of digits you need, as well.
You get a 9 digit ascii result (from 32bits) with less than 20 math operations per digit, vs the 10 divisions (that needs to produce remainder as well) that would be required for the conventional approach.
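A minimal C sketch of the repeated-subtraction approach, sized for the 7-digit case asked about earlier in the thread. It assumes the input is below 10,000,000 so that every digit stays in 0..9; the function name and the digit order (least significant digit in `digits[0]`) are my own choices.

```c
#include <stdint.h>

/* Convert n (assumed < 10,000,000) to 7 decimal digits by repeatedly
   subtracting constant powers of 10. digits[0] is the least significant. */
static void u32_to_digits7(uint32_t n, uint8_t digits[7])
{
    static const uint32_t pow10[7] = {
        1ul, 10ul, 100ul, 1000ul, 10000ul, 100000ul, 1000000ul
    };
    for (int8_t i = 6; i >= 0; i--) {
        uint8_t d = 0;
        while (n >= pow10[i]) {   /* at most 9 passes per digit */
            n -= pow10[i];
            d++;
        }
        digits[i] = d;
    }
}
```

No division or multiplication is involved, so the same structure carries over to AVRs without a hardware multiplier; the table can be trimmed or extended to the exact number of digits needed.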
It would help if you'd ask one question clearly instead of asking several somewhat-similar questions less clearly.
Suggestion: there's very well studied code to convert integers to ASCII. Given a conversion of integer to ASCII, if you want to convert the ASCII values to corresponding numbers, that's just &0xF.
Again, for an AVR with a HW multiplier it's faster to use it.
I'm not sure if there is code here for 32 bit, but I have code for 5 digits from 16 bit in fewer than 70 clocks, which I found good, but El Tangas has solved it in fewer than 50 clocks. (The code is in a thread here somewhere.)
Fully unrolling the divide-by-5 code I posted earlier results in 51 cycles. So getting 5 digits from a 16-bit number in < 50 must be referring to code that uses the hardware multiplier, right?
But if your goal in this thread is "fast" why aren't you following the guidance from others here and exploring Asm MUL etc in inline assembler?
Yes, and that is for div 10, not 5 (but because 10 is an even number, I guess it doesn't really make a difference).
add:
And I guess I need to add in worst case.
uint8_t div32bit_mod (uint32_t in, uint32_t div, uint32_t *mod )
{
*mod=(uint32_t) (in % div) ;
return (uint8_t) (in /div);
}
uint8_t divmod( uint32_t inp, uint32_t div, uint32_t *remainder )
{
return (uint8_t) div32bit_mod( inp, div, remainder );
}
uint8_t *GetDigitsFromUint32_t( uint32_t in) //"uint24_t"
{
static uint8_t digits[6]; // six entries are written below; static so the returned pointer stays valid
/*
digits[0]=(uint8_t)(in/100000) ; in=in%100000;
digits[1]=(uint8_t)(in/10000) ; in=in%10000;
digits[2]=(uint8_t)(in/1000) ; in=in%1000;
digits[3]=(uint8_t)(in/100) ; in=in%100;
digits[4]=(uint8_t)(in/10) ; in=in%10;
digits[5]=(uint8_t) (in);
*/
digits[0]=(uint8_t) divmod( in , 0x000186A0 /*100000*/ , &in );
digits[1]=(uint8_t) divmod( in , 0x00002710 /*10000*/, &in );
digits[2]=(uint8_t) divmod( in , 0x000003e8 /*1000*/, &in );
digits[3]=(uint8_t) divmod( in , 0x00000064 /*100*/, &in );
digits[4]=(uint8_t) divmod( in , 0x0000000A /*10*/, &in );
digits[5]=(uint8_t) divmod( in , 0x00000001 /* 1*/ , &in );
return (uint8_t *) digits;
}
#include <cstdio> // printf is used below, not iostream
int main() { //max digit is 1599999 dec
uint32_t in = 1599999; // -> [15] [9] [9] [9] [9] [9] (MSB byte of array must be 0x00...0x0F , other digits are 0x00...0x09)
uint8_t *digits ;
digits = GetDigitsFromUint32_t( in);
printf("%d%d%d%d%d%d ",(int)digits[0],(int)digits[1], (int) digits[2], (int)digits[3],(int)digits[4],(int)digits[5] );
return 0;
}
How do I use the brute-force algorithm, the Double-Dabble (shift and add-3) algorithm, a fast analog of the algorithm based on division by 10, and an algorithm based on division emulated by approximation and reciprocal multiplication (for 2 to 7 (8) digits)?
http://blog.malcom.pl/2017/konwe...
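For reference, a hedged C sketch of the Double-Dabble (shift and add-3) method mentioned in the question, using one digit per byte. The function name and interface are hypothetical, and the caller is assumed to pass enough digits for the value (7 digits for inputs below 10,000,000, 10 digits for any uint32_t).

```c
#include <stdint.h>

/* Double-dabble: feed the input in MSB first; before each shift, add 3 to
   every digit that is 5 or more. digits[0] is the least significant digit. */
static void double_dabble(uint32_t n, uint8_t digits[], uint8_t ndigits)
{
    uint8_t d;
    for (d = 0; d < ndigits; d++) {
        digits[d] = 0;
    }
    for (uint8_t bit = 0; bit < 32; bit++) {
        for (d = 0; d < ndigits; d++) {
            if (digits[d] >= 5) {
                digits[d] += 3;            /* pre-correct so the shift carries at 10 */
            }
        }
        uint8_t carry = (uint8_t)((n >> 31) & 1u); /* next input bit */
        n <<= 1;
        for (d = 0; d < ndigits; d++) {
            uint8_t v = (uint8_t)((digits[d] << 1) | carry);
            carry = (uint8_t)(v >> 4);     /* overflow of the 4-bit digit */
            digits[d] = (uint8_t)(v & 0x0F);
        }
    }
}
```

The method needs no multiply or divide, but it always runs all 32 bit steps across every digit, so on an AVR with a hardware multiplier it is unlikely to beat the reciprocal-multiplication approach.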
How can I create a fast alternative to this prototype of the encoder, using arithmetic operations (magic numbers and shifting, although "switch() case : res+= " is also allowed)?
uint32_t Getuint32fromBCDDigits( uint8_t *digit )
{
uint32_t res=0;
for(uint8_t i=0 ;i<digit[0]; i++ ) { res+=1; }
for(uint8_t i=0 ;i<digit[1]; i++ ) { res+=10; }
for(uint8_t i=0 ;i<digit[2]; i++ ) { res+=100; }
for(uint8_t i=0 ;i<digit[3]; i++ ) { res+=1000; }
for(uint8_t i=0 ;i<digit[4]; i++ ) { res+=10000; }
for(uint8_t i=0 ;i<digit[5]; i++ ) { res+=100000; }
for(uint8_t i=0 ;i<digit[6]; i++ ) { res+=1000000; } //fix size
return res;
}
I think this is the thread: https://www.avrfreaks.net/forum/optimizing-libc-integer-conversion-routines
There is also stuff here from the same era https://www.avrfreaks.net/forum/integer-string
These are from when I first arrived here at AVRFreaks to learn AVR stuff as an hobby. But I already had experience writing similar algorithms for x86, so I kind of converted that code to AVR.
Since I'm at it, here is the x86 code, from another life, basically... https://board.flatassembler.net/topic.php?t=3924
edit: one more: https://www.avrfreaks.net/forum/avr-assembler-extract-each-3-digit-number-each-register-r23-r24-r25
Yeah, we discuss this stuff on a regular basis
OP keeps posting... things, but I can't comprehend them. Note the fascinating new requirement in the latest post of "MSB byte of array must be 0x00...0x0F", which... is not how BCD works, it's also not how anything else here works, and I don't get it.
I feel like there's a translation issue or something here. The "Getuint32fromBCDDigits" code strikes me as bad; I think most of the modern chips have hardware multiply, so it's probably fine to use that, and that certainly makes it easier.
return ((((((uint32_t)digit[6]*10 + digit[5]) * 10 + digit[4]) * 10 + digit[3]) * 10 + digit[2]) * 10 + digit[1]) * 10 + digit[0]; // uint32_t cast: int is only 16 bits on AVR
So far as I know hardware multiply is cheap, so that's probably fine. Assuming there's always 7 digits...
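The same multiply-and-add chain written as a loop, as a minimal sketch assuming exactly 7 digits with the least significant in `digit[0]`. The function name is hypothetical.

```c
#include <stdint.h>

/* Horner-style accumulation: res = (...((d6*10 + d5)*10 + d4)...)*10 + d0.
   The 32-bit accumulator avoids overflow of AVR's 16-bit int. */
static uint32_t digits7_to_u32(const uint8_t digit[7])
{
    uint32_t res = 0;
    for (int8_t i = 6; i >= 0; i--) {
        res = res * 10u + digit[i];
    }
    return res;
}
```

This replaces up to 63 additions in the loop-per-digit prototype with six multiplications by 10, each of which is also expressible as (res << 3) + (res << 1) if the multiplier is to be avoided.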
Sorry, the following code is not of any language I heard of, but I still hope it is clear:
if: x in Y[x] is the byte number of the register Y
and:
N[4] is the original 32-bit number
R[4] is the result of N[4]/5
A[4], B[4] and C[4] are temporary registers

A[4]= N[4]
C[4]= R[4]= 0
loop:
A[4]= A[4]-C[4]
if A[4]<256, break
B[4]= (A[4]/256)*51
R[4]= R[4]+B[4]
C[4]= R[4]*5
goto loop
Edit:
Please note, it is not complete... I tried not to work with more than 32 bits... but the break line as shown is certainly wrong.
/*
* GccApplication2.c
*
* Created: 29.03.2020 10:38:02
* Author : USERPC01
*/
#include <avr/io.h>
#include <math.h>
uint8_t div32bit_mod (uint32_t in, uint32_t div, uint32_t *mod )
{
uint32_t res;
res=(uint32_t)in /div;
*mod=(uint32_t) (in - (res*div)) ;
return (uint8_t) res;
}
uint8_t divmod( uint32_t inp, uint32_t div, uint32_t *remainder )
{
return (uint8_t) div32bit_mod( inp, div, remainder );
}
void GetDigitsFromUint32_t( uint32_t in ) //"uint24_t"
{
uint8_t digits[6]; // digits[0]..digits[5] are written below
/*
digits[0]=(uint8_t)(in/100000) ; in=in%100000;
digits[1]=(uint8_t)(in/10000) ; in=in%10000;
digits[2]=(uint8_t)(in/1000) ; in=in%1000;
digits[3]=(uint8_t)(in/100) ; in=in%100;
digits[4]=(uint8_t)(in/10) ; in=in%10;
digits[5]=(uint8_t) (in);
*/
digits[0]=(uint8_t) divmod( in , 0x000186A0 /*100000*/ , &in );
digits[1]=(uint8_t) divmod( in , 0x00002710 /*10000*/, &in );
digits[2]=(uint8_t) divmod( in , 0x000003e8 /*1000*/, &in );
digits[3]=(uint8_t) divmod( in , 0x00000064 /*100*/, &in );
digits[4]=(uint8_t) divmod( in , 0x0000000A /*10*/, &in );
digits[5]=(uint8_t) divmod( in , 0x00000001 /* 1*/ , &in );
//for example, for output to PORTD
DDRD=0xFF;
PORTD=digits[0];
PORTD=digits[1];
PORTD=digits[2];
PORTD=digits[3];
PORTD=digits[4];
PORTD=digits[5];
return ;
}
int main(void)
{
/* Replace with your application code */
while (1)
{
uint32_t in =0; //= 1599999; // -> 15 9 9 9 9 9
//for example, input from port B
in|=(uint32_t)((uint32_t)PINB<<0);
in|=(uint32_t)((uint32_t)PINB<<8);
in|=(uint32_t)((uint32_t)PINB<<16);
in|=(uint32_t)((uint32_t)PINB<<24);
GetDigitsFromUint32_t( in);
}
}
GccApplication2.elf: file format elf32-avr
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000232 00000000 00000000 00000054 2**1
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .data 00000000 00800060 00800060 00000286 2**0
CONTENTS, ALLOC, LOAD, DATA
2 .comment 00000030 00000000 00000000 00000286 2**0
CONTENTS, READONLY
3 .note.gnu.avr.deviceinfo 0000003c 00000000 00000000 000002b8 2**2
CONTENTS, READONLY
4 .debug_aranges 00000038 00000000 00000000 000002f4 2**0
CONTENTS, READONLY, DEBUGGING
5 .debug_info 000009cb 00000000 00000000 0000032c 2**0
CONTENTS, READONLY, DEBUGGING
6 .debug_abbrev 0000063f 00000000 00000000 00000cf7 2**0
CONTENTS, READONLY, DEBUGGING
7 .debug_line 000002ac 00000000 00000000 00001336 2**0
CONTENTS, READONLY, DEBUGGING
8 .debug_frame 000000e4 00000000 00000000 000015e4 2**2
CONTENTS, READONLY, DEBUGGING
9 .debug_str 00000342 00000000 00000000 000016c8 2**0
CONTENTS, READONLY, DEBUGGING
10 .debug_loc 000006e5 00000000 00000000 00001a0a 2**0
CONTENTS, READONLY, DEBUGGING
11 .debug_ranges 00000028 00000000 00000000 000020ef 2**0
CONTENTS, READONLY, DEBUGGING
Disassembly of section .text:
00000000 <__vectors>:
0: 0c 94 2a 00 jmp 0x54 ; 0x54 <__ctors_end>
4: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
8: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
c: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
10: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
14: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
18: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
1c: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
20: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
24: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
28: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
2c: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
30: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
34: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
38: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
3c: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
40: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
44: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
48: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
4c: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
50: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
00000054 <__ctors_end>:
54: 11 24 eor r1, r1
56: 1f be out 0x3f, r1 ; 63
58: cf e5 ldi r28, 0x5F ; 95
5a: d4 e0 ldi r29, 0x04 ; 4
5c: de bf out 0x3e, r29 ; 62
5e: cd bf out 0x3d, r28 ; 61
60: 0e 94 b7 00 call 0x16e ; 0x16e <main>
64: 0c 94 17 01 jmp 0x22e ; 0x22e <_exit>
00000068 <__bad_interrupt>:
68: 0c 94 00 00 jmp 0 ; 0x0 <__vectors>
0000006c <GetDigitsFromUint32_t>:
}
void GetDigitsFromUint32_t( uint32_t in ) //"uint24_t"
{
6c: cf 92 push r12
6e: df 92 push r13
70: ef 92 push r14
72: ff 92 push r15
74: 0f 93 push r16
76: 1f 93 push r17
78: cf 93 push r28
7a: df 93 push r29
7c: 6b 01 movw r12, r22
7e: 7c 01 movw r14, r24
#include <math.h>
uint8_t div32bit_mod (uint32_t in, uint32_t div, uint32_t *mod )
{
uint32_t res;
res=(uint32_t)in /div;
80: 20 ea ldi r18, 0xA0 ; 160
82: 36 e8 ldi r19, 0x86 ; 134
84: 41 e0 ldi r20, 0x01 ; 1
86: 50 e0 ldi r21, 0x00 ; 0
88: 0e 94 db 00 call 0x1b6 ; 0x1b6 <__udivmodsi4>
8c: d2 2f mov r29, r18
*mod=(uint32_t) (in - (res*div)) ;
8e: 60 ea ldi r22, 0xA0 ; 160
90: 76 e8 ldi r23, 0x86 ; 134
92: 81 e0 ldi r24, 0x01 ; 1
94: 90 e0 ldi r25, 0x00 ; 0
96: 0e 94 cb 00 call 0x196 ; 0x196 <__mulsi3>
9a: c6 1a sub r12, r22
9c: d7 0a sbc r13, r23
9e: e8 0a sbc r14, r24
a0: f9 0a sbc r15, r25
#include <math.h>
uint8_t div32bit_mod (uint32_t in, uint32_t div, uint32_t *mod )
{
uint32_t res;
res=(uint32_t)in /div;
a2: c7 01 movw r24, r14
a4: b6 01 movw r22, r12
a6: 20 e1 ldi r18, 0x10 ; 16
a8: 37 e2 ldi r19, 0x27 ; 39
aa: 40 e0 ldi r20, 0x00 ; 0
ac: 50 e0 ldi r21, 0x00 ; 0
ae: 0e 94 db 00 call 0x1b6 ; 0x1b6 <__udivmodsi4>
b2: c2 2f mov r28, r18
*mod=(uint32_t) (in - (res*div)) ;
b4: a0 e1 ldi r26, 0x10 ; 16
b6: b7 e2 ldi r27, 0x27 ; 39
b8: 0e 94 fd 00 call 0x1fa ; 0x1fa <__muluhisi3>
bc: c6 1a sub r12, r22
be: d7 0a sbc r13, r23
c0: e8 0a sbc r14, r24
c2: f9 0a sbc r15, r25
#include <math.h>
uint8_t div32bit_mod (uint32_t in, uint32_t div, uint32_t *mod )
{
uint32_t res;
res=(uint32_t)in /div;
c4: c7 01 movw r24, r14
c6: b6 01 movw r22, r12
c8: 28 ee ldi r18, 0xE8 ; 232
ca: 33 e0 ldi r19, 0x03 ; 3
cc: 40 e0 ldi r20, 0x00 ; 0
ce: 50 e0 ldi r21, 0x00 ; 0
d0: 0e 94 db 00 call 0x1b6 ; 0x1b6 <__udivmodsi4>
d4: 12 2f mov r17, r18
*mod=(uint32_t) (in - (res*div)) ;
d6: a8 ee ldi r26, 0xE8 ; 232
d8: b3 e0 ldi r27, 0x03 ; 3
da: 0e 94 fd 00 call 0x1fa ; 0x1fa <__muluhisi3>
de: c6 1a sub r12, r22
e0: d7 0a sbc r13, r23
e2: e8 0a sbc r14, r24
e4: f9 0a sbc r15, r25
#include <math.h>
uint8_t div32bit_mod (uint32_t in, uint32_t div, uint32_t *mod )
{
uint32_t res;
res=(uint32_t)in /div;
e6: c7 01 movw r24, r14
e8: b6 01 movw r22, r12
ea: 24 e6 ldi r18, 0x64 ; 100
ec: 30 e0 ldi r19, 0x00 ; 0
ee: 40 e0 ldi r20, 0x00 ; 0
f0: 50 e0 ldi r21, 0x00 ; 0
f2: 0e 94 db 00 call 0x1b6 ; 0x1b6 <__udivmodsi4>
f6: 02 2f mov r16, r18
*mod=(uint32_t) (in - (res*div)) ;
f8: a4 e6 ldi r26, 0x64 ; 100
fa: b0 e0 ldi r27, 0x00 ; 0
fc: 0e 94 fd 00 call 0x1fa ; 0x1fa <__muluhisi3>
100: c6 1a sub r12, r22
102: d7 0a sbc r13, r23
104: e8 0a sbc r14, r24
106: f9 0a sbc r15, r25
#include <math.h>
uint8_t div32bit_mod (uint32_t in, uint32_t div, uint32_t *mod )
{
uint32_t res;
res=(uint32_t)in /div;
108: c7 01 movw r24, r14
10a: b6 01 movw r22, r12
10c: 2a e0 ldi r18, 0x0A ; 10
10e: 30 e0 ldi r19, 0x00 ; 0
110: 40 e0 ldi r20, 0x00 ; 0
112: 50 e0 ldi r21, 0x00 ; 0
114: 0e 94 db 00 call 0x1b6 ; 0x1b6 <__udivmodsi4>
*mod=(uint32_t) (in - (res*div)) ;
return (uint8_t) res;
118: 82 2f mov r24, r18
11a: 93 2f mov r25, r19
11c: a4 2f mov r26, r20
11e: b5 2f mov r27, r21
120: 88 0f add r24, r24
122: 99 1f adc r25, r25
124: aa 1f adc r26, r26
126: bb 1f adc r27, r27
128: ac 01 movw r20, r24
12a: bd 01 movw r22, r26
12c: 44 0f add r20, r20
12e: 55 1f adc r21, r21
130: 66 1f adc r22, r22
132: 77 1f adc r23, r23
134: 44 0f add r20, r20
136: 55 1f adc r21, r21
138: 66 1f adc r22, r22
13a: 77 1f adc r23, r23
13c: 84 0f add r24, r20
13e: 95 1f adc r25, r21
140: a6 1f adc r26, r22
142: b7 1f adc r27, r23
144: c8 1a sub r12, r24
146: d9 0a sbc r13, r25
148: ea 0a sbc r14, r26
14a: fb 0a sbc r15, r27
digits[4]=(uint8_t) divmod( in , 0x0000000A /*10*/, &in );
digits[5]=(uint8_t) divmod( in , 0x00000001 /* 1*/ , &in );
//for example, for output to PORTD
DDRD=0xFF;
14c: 8f ef ldi r24, 0xFF ; 255
14e: 81 bb out 0x11, r24 ; 17
PORTD=digits[0];
150: d2 bb out 0x12, r29 ; 18
PORTD=digits[1];
152: c2 bb out 0x12, r28 ; 18
PORTD=digits[2];
154: 12 bb out 0x12, r17 ; 18
PORTD=digits[3];
156: 02 bb out 0x12, r16 ; 18
PORTD=digits[4];
158: 22 bb out 0x12, r18 ; 18
PORTD=digits[5];
15a: c2 ba out 0x12, r12 ; 18
return ;
}
15c: df 91 pop r29
15e: cf 91 pop r28
160: 1f 91 pop r17
162: 0f 91 pop r16
164: ff 90 pop r15
166: ef 90 pop r14
168: df 90 pop r13
16a: cf 90 pop r12
16c: 08 95 ret
0000016e <main>:
/* Replace with your application code */
while (1)
{
uint32_t in =0; //= 1599999; // -> 15 9 9 9 9 9
//for example, input from port B
DDRB=0x00;
16e: 17 ba out 0x17, r1 ; 23
in|=(uint32_t)((uint32_t)PINB<<0);
170: 26 b3 in r18, 0x16 ; 22
in|=(uint32_t)((uint32_t)PINB<<8);
172: 36 b3 in r19, 0x16 ; 22
in|=(uint32_t)((uint32_t)PINB<<16);
174: 66 b3 in r22, 0x16 ; 22
176: 86 2f mov r24, r22
178: 90 e0 ldi r25, 0x00 ; 0
17a: a0 e0 ldi r26, 0x00 ; 0
17c: b0 e0 ldi r27, 0x00 ; 0
17e: dc 01 movw r26, r24
180: 99 27 eor r25, r25
182: 88 27 eor r24, r24
184: 93 2b or r25, r19
186: 82 2b or r24, r18
in|=(uint32_t)((uint32_t)PINB<<24);
188: 26 b3 in r18, 0x16 ; 22
GetDigitsFromUint32_t( in);
18a: bc 01 movw r22, r24
18c: cd 01 movw r24, r26
18e: 92 2b or r25, r18
190: 0e 94 36 00 call 0x6c ; 0x6c <GetDigitsFromUint32_t>
194: ec cf rjmp .-40 ; 0x16e <main>
00000196 <__mulsi3>:
196: db 01 movw r26, r22
198: 8f 93 push r24
19a: 9f 93 push r25
19c: 0e 94 fd 00 call 0x1fa ; 0x1fa <__muluhisi3>
1a0: bf 91 pop r27
1a2: af 91 pop r26
1a4: a2 9f mul r26, r18
1a6: 80 0d add r24, r0
1a8: 91 1d adc r25, r1
1aa: a3 9f mul r26, r19
1ac: 90 0d add r25, r0
1ae: b2 9f mul r27, r18
1b0: 90 0d add r25, r0
1b2: 11 24 eor r1, r1
1b4: 08 95 ret
000001b6 <__udivmodsi4>:
1b6: a1 e2 ldi r26, 0x21 ; 33
1b8: 1a 2e mov r1, r26
1ba: aa 1b sub r26, r26
1bc: bb 1b sub r27, r27
1be: fd 01 movw r30, r26
1c0: 0d c0 rjmp .+26 ; 0x1dc <__udivmodsi4_ep>
000001c2 <__udivmodsi4_loop>:
1c2: aa 1f adc r26, r26
1c4: bb 1f adc r27, r27
1c6: ee 1f adc r30, r30
1c8: ff 1f adc r31, r31
1ca: a2 17 cp r26, r18
1cc: b3 07 cpc r27, r19
1ce: e4 07 cpc r30, r20
1d0: f5 07 cpc r31, r21
1d2: 20 f0 brcs .+8 ; 0x1dc <__udivmodsi4_ep>
1d4: a2 1b sub r26, r18
1d6: b3 0b sbc r27, r19
1d8: e4 0b sbc r30, r20
1da: f5 0b sbc r31, r21
000001dc <__udivmodsi4_ep>:
1dc: 66 1f adc r22, r22
1de: 77 1f adc r23, r23
1e0: 88 1f adc r24, r24
1e2: 99 1f adc r25, r25
1e4: 1a 94 dec r1
1e6: 69 f7 brne .-38 ; 0x1c2 <__udivmodsi4_loop>
1e8: 60 95 com r22
1ea: 70 95 com r23
1ec: 80 95 com r24
1ee: 90 95 com r25
1f0: 9b 01 movw r18, r22
1f2: ac 01 movw r20, r24
1f4: bd 01 movw r22, r26
1f6: cf 01 movw r24, r30
1f8: 08 95 ret
000001fa <__muluhisi3>:
1fa: 0e 94 08 01 call 0x210 ; 0x210 <__umulhisi3>
1fe: a5 9f mul r26, r21
200: 90 0d add r25, r0
202: b4 9f mul r27, r20
204: 90 0d add r25, r0
206: a4 9f mul r26, r20
208: 80 0d add r24, r0
20a: 91 1d adc r25, r1
20c: 11 24 eor r1, r1
20e: 08 95 ret
00000210 <__umulhisi3>:
210: a2 9f mul r26, r18
212: b0 01 movw r22, r0
214: b3 9f mul r27, r19
216: c0 01 movw r24, r0
218: a3 9f mul r26, r19
21a: 70 0d add r23, r0
21c: 81 1d adc r24, r1
21e: 11 24 eor r1, r1
220: 91 1d adc r25, r1
222: b2 9f mul r27, r18
224: 70 0d add r23, r0
226: 81 1d adc r24, r1
228: 11 24 eor r1, r1
22a: 91 1d adc r25, r1
22c: 08 95 ret
0000022e <_exit>:
22e: f8 94 cli
00000230 <__stop_program>:
230: ff cf rjmp .-2 ; 0x230 <__stop_program>
#include <avr/io.h>
#include <math.h>
/*
uint8_t div32bit_mod (uint32_t in, uint32_t div, uint32_t *mod )
{
uint32_t res;
res=(uint32_t)in /div;
*mod=(uint32_t) (in - (res*div)) ;
*mod=(uint32_t)(in%div); // (in - (res*div)) ;
return (uint8_t)(in/div) ;// res;
}*/
uint8_t divmod( uint32_t inp, uint32_t div, uint32_t *mod )
{
//return (uint8_t) div32bit_mod( inp, div, remainder );
*mod=(uint32_t)(inp%div); // (in - (res*div)) ;
return (uint8_t)(inp/div) ;// res;
}
void GetDigitsFromUint32_t( uint32_t in ) //"uint24_t"
{
uint8_t digits[6]; // six entries: digits[0]..digits[5] are written below
/*
digits[0]=(uint8_t)(in/100000) ; in=in%100000;
digits[1]=(uint8_t)(in/10000) ; in=in%10000;
digits[2]=(uint8_t)(in/1000) ; in=in%1000;
digits[3]=(uint8_t)(in/100) ; in=in%100;
digits[4]=(uint8_t)(in/10) ; in=in%10;
digits[5]=(uint8_t) (in);
*/
digits[0]=(uint8_t) divmod( in , 0x000186A0 /*100000*/ , &in );
digits[1]=(uint8_t) divmod( in , 0x00002710 /*10000*/, &in );
digits[2]=(uint8_t) divmod( in , 0x000003e8 /*1000*/, &in );
digits[3]=(uint8_t) divmod( in , 0x00000064 /*100*/, &in );
digits[4]=(uint8_t) divmod( in , 0x0000000A /*10*/, &in );
digits[5]=(uint8_t) divmod( in , 0x00000001 /* 1*/ , &in );
//for example, for output to PORTD
DDRD=0xFF;
PORTD=digits[0];
PORTD=digits[1];
PORTD=digits[2];
PORTD=digits[3];
PORTD=digits[4];
PORTD=digits[5];
return ;
}
int main(void)
{
/* Replace with your application code */
while (1)
{
uint32_t in =0; //= 1599999; // -> 15 9 9 9 9 9
//for example, input from port B
DDRB=0x00;
in|=(uint32_t)((uint32_t)PINB<<0);
in|=(uint32_t)((uint32_t)PINB<<8);
in|=(uint32_t)((uint32_t)PINB<<16);
in|=(uint32_t)((uint32_t)PINB<<24);
GetDigitsFromUint32_t( in);
}
}
GccApplication2.elf: file format elf32-avr
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000142 00000000 00000000 00000054 2**1
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .data 00000000 00800060 00800060 00000196 2**0
CONTENTS, ALLOC, LOAD, DATA
2 .comment 00000030 00000000 00000000 00000196 2**0
CONTENTS, READONLY
3 .note.gnu.avr.deviceinfo 0000003c 00000000 00000000 000001c8 2**2
CONTENTS, READONLY
4 .debug_aranges 00000030 00000000 00000000 00000204 2**0
CONTENTS, READONLY, DEBUGGING
5 .debug_info 00000781 00000000 00000000 00000234 2**0
CONTENTS, READONLY, DEBUGGING
6 .debug_abbrev 0000060e 00000000 00000000 000009b5 2**0
CONTENTS, READONLY, DEBUGGING
7 .debug_line 00000254 00000000 00000000 00000fc3 2**0
CONTENTS, READONLY, DEBUGGING
8 .debug_frame 00000064 00000000 00000000 00001218 2**2
CONTENTS, READONLY, DEBUGGING
9 .debug_str 0000032c 00000000 00000000 0000127c 2**0
CONTENTS, READONLY, DEBUGGING
10 .debug_loc 00000401 00000000 00000000 000015a8 2**0
CONTENTS, READONLY, DEBUGGING
11 .debug_ranges 00000020 00000000 00000000 000019a9 2**0
CONTENTS, READONLY, DEBUGGING
Disassembly of section .text:
00000000 <__vectors>:
0: 0c 94 2a 00 jmp 0x54 ; 0x54 <__ctors_end>
4: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
8: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
c: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
10: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
14: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
18: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
1c: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
20: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
24: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
28: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
2c: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
30: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
34: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
38: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
3c: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
40: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
44: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
48: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
4c: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
50: 0c 94 34 00 jmp 0x68 ; 0x68 <__bad_interrupt>
00000054 <__ctors_end>:
54: 11 24 eor r1, r1
56: 1f be out 0x3f, r1 ; 63
58: cf e5 ldi r28, 0x5F ; 95
5a: d4 e0 ldi r29, 0x04 ; 4
5c: de bf out 0x3e, r29 ; 62
5e: cd bf out 0x3d, r28 ; 61
60: 0e 94 69 00 call 0xd2 ; 0xd2 <main>
64: 0c 94 9f 00 jmp 0x13e ; 0x13e <_exit>
00000068 <__bad_interrupt>:
68: 0c 94 00 00 jmp 0 ; 0x0 <__vectors>
0000006c <GetDigitsFromUint32_t>:
return (uint8_t)(inp/div) ;// res;
}
void GetDigitsFromUint32_t( uint32_t in ) //"uint24_t"
{
6c: 0f 93 push r16
6e: 1f 93 push r17
70: cf 93 push r28
72: df 93 push r29
}*/
uint8_t divmod( uint32_t inp, uint32_t div, uint32_t *mod )
{
//return (uint8_t) div32bit_mod( inp, div, remainder );
*mod=(uint32_t)(inp%div); // (in - (res*div)) ;
74: 20 ea ldi r18, 0xA0 ; 160
76: 36 e8 ldi r19, 0x86 ; 134
78: 41 e0 ldi r20, 0x01 ; 1
7a: 50 e0 ldi r21, 0x00 ; 0
7c: 0e 94 7d 00 call 0xfa ; 0xfa <__udivmodsi4>
80: 02 2f mov r16, r18
82: 20 e1 ldi r18, 0x10 ; 16
84: 37 e2 ldi r19, 0x27 ; 39
86: 40 e0 ldi r20, 0x00 ; 0
88: 50 e0 ldi r21, 0x00 ; 0
8a: 0e 94 7d 00 call 0xfa ; 0xfa <__udivmodsi4>
8e: 12 2f mov r17, r18
90: 28 ee ldi r18, 0xE8 ; 232
92: 33 e0 ldi r19, 0x03 ; 3
94: 40 e0 ldi r20, 0x00 ; 0
96: 50 e0 ldi r21, 0x00 ; 0
98: 0e 94 7d 00 call 0xfa ; 0xfa <__udivmodsi4>
9c: d2 2f mov r29, r18
9e: 24 e6 ldi r18, 0x64 ; 100
a0: 30 e0 ldi r19, 0x00 ; 0
a2: 40 e0 ldi r20, 0x00 ; 0
a4: 50 e0 ldi r21, 0x00 ; 0
a6: 0e 94 7d 00 call 0xfa ; 0xfa <__udivmodsi4>
aa: c2 2f mov r28, r18
return (uint8_t)(inp/div) ;// res;
ac: 2a e0 ldi r18, 0x0A ; 10
ae: 30 e0 ldi r19, 0x00 ; 0
b0: 40 e0 ldi r20, 0x00 ; 0
b2: 50 e0 ldi r21, 0x00 ; 0
b4: 0e 94 7d 00 call 0xfa ; 0xfa <__udivmodsi4>
digits[4]=(uint8_t) divmod( in , 0x0000000A /*10*/, &in );
digits[5]=(uint8_t) divmod( in , 0x00000001 /* 1*/ , &in );
//for example, for output to PORTD
DDRD=0xFF;
b8: 8f ef ldi r24, 0xFF ; 255
ba: 81 bb out 0x11, r24 ; 17
PORTD=digits[0];
bc: 02 bb out 0x12, r16 ; 18
PORTD=digits[1];
be: 12 bb out 0x12, r17 ; 18
PORTD=digits[2];
c0: d2 bb out 0x12, r29 ; 18
PORTD=digits[3];
c2: c2 bb out 0x12, r28 ; 18
PORTD=digits[4];
c4: 22 bb out 0x12, r18 ; 18
PORTD=digits[5];
c6: 62 bb out 0x12, r22 ; 18
return ;
}
c8: df 91 pop r29
ca: cf 91 pop r28
cc: 1f 91 pop r17
ce: 0f 91 pop r16
d0: 08 95 ret
000000d2 <main>:
/* Replace with your application code */
while (1)
{
uint32_t in =0; //= 1599999; // -> 15 9 9 9 9 9
//for example, input from port B
DDRB=0x00;
d2: 17 ba out 0x17, r1 ; 23
in|=(uint32_t)((uint32_t)PINB<<0);
d4: 26 b3 in r18, 0x16 ; 22
in|=(uint32_t)((uint32_t)PINB<<8);
d6: 36 b3 in r19, 0x16 ; 22
in|=(uint32_t)((uint32_t)PINB<<16);
d8: 66 b3 in r22, 0x16 ; 22
da: 86 2f mov r24, r22
dc: 90 e0 ldi r25, 0x00 ; 0
de: a0 e0 ldi r26, 0x00 ; 0
e0: b0 e0 ldi r27, 0x00 ; 0
e2: dc 01 movw r26, r24
e4: 99 27 eor r25, r25
e6: 88 27 eor r24, r24
e8: 93 2b or r25, r19
ea: 82 2b or r24, r18
in|=(uint32_t)((uint32_t)PINB<<24);
ec: 26 b3 in r18, 0x16 ; 22
GetDigitsFromUint32_t( in);
ee: bc 01 movw r22, r24
f0: cd 01 movw r24, r26
f2: 92 2b or r25, r18
f4: 0e 94 36 00 call 0x6c ; 0x6c <GetDigitsFromUint32_t>
f8: ec cf rjmp .-40 ; 0xd2 <main>
000000fa <__udivmodsi4>:
fa: a1 e2 ldi r26, 0x21 ; 33
fc: 1a 2e mov r1, r26
fe: aa 1b sub r26, r26
100: bb 1b sub r27, r27
102: fd 01 movw r30, r26
104: 0d c0 rjmp .+26 ; 0x120 <__udivmodsi4_ep>
00000106 <__udivmodsi4_loop>:
106: aa 1f adc r26, r26
108: bb 1f adc r27, r27
10a: ee 1f adc r30, r30
10c: ff 1f adc r31, r31
10e: a2 17 cp r26, r18
110: b3 07 cpc r27, r19
112: e4 07 cpc r30, r20
114: f5 07 cpc r31, r21
116: 20 f0 brcs .+8 ; 0x120 <__udivmodsi4_ep>
118: a2 1b sub r26, r18
11a: b3 0b sbc r27, r19
11c: e4 0b sbc r30, r20
11e: f5 0b sbc r31, r21
00000120 <__udivmodsi4_ep>:
120: 66 1f adc r22, r22
122: 77 1f adc r23, r23
124: 88 1f adc r24, r24
126: 99 1f adc r25, r25
128: 1a 94 dec r1
12a: 69 f7 brne .-38 ; 0x106 <__udivmodsi4_loop>
12c: 60 95 com r22
12e: 70 95 com r23
130: 80 95 com r24
132: 90 95 com r25
134: 9b 01 movw r18, r22
136: ac 01 movw r20, r24
138: bd 01 movw r22, r26
13a: cf 01 movw r24, r30
13c: 08 95 ret
0000013e <_exit>:
13e: f8 94 cli
00000140 <__stop_program>:
140: ff cf rjmp .-2 ; 0x140 <__stop_program>
Cycle counter: 2953
Frequency: 12.000 MHz
Stop watch: 246.08 us (read after executing GetDigitsFromUint32_t( in);)
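For what it is worth, almost all of those cycles appear to be spent in __udivmodsi4. In the listing it loads 0x21 (33) into r1 as a loop counter and then shifts, compares and conditionally subtracts one bit per pass, so a call costs about the same whether the divisor is 10 or 100000. A rough estimate is several hundred cycles per call, which fits the 2953 cycles measured for five calls. The following is only a C model of that kind of bit-serial division, not a translation of the libgcc routine; the names udivmod32_t and udivmod32_model are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical result type: quotient and remainder together. */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} udivmod32_t;

/* Bit-serial (shift and subtract) unsigned division.
   One quotient bit is produced per pass, 32 passes in total.
   den must be nonzero. */
static udivmod32_t udivmod32_model(uint32_t num, uint32_t den)
{
    udivmod32_t r = { 0, 0 };
    for (uint8_t i = 0; i < 32; i++) {
        /* Shift the next numerator bit (MSB first) into the remainder. */
        r.rem = (r.rem << 1) | (num >> 31);
        num <<= 1;
        r.quot <<= 1;
        /* If the divisor fits, subtract it and set the quotient bit. */
        if (r.rem >= den) {
            r.rem -= den;
            r.quot |= 1;
        }
    }
    return r;
}
```

Because the pass count is fixed, the only ways to get faster are to avoid the general divide altogether or to replace it with multiplications, which the ATmega16A can do in hardware.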
How can I make this decoding algorithm faster? (The input and output bytes in this program are only placeholders for simulation.)
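One division-free possibility, sketched here under the assumption that the input stays in the "uint24_t" range hinted at in the code comments, is to count how many times each power of ten can be subtracted. This is a different technique from the divmod() approach in the posted code, and the names kPow10 and get_digits_by_subtraction are invented for this sketch. For inputs below 1000000 it needs at most 9 compare-and-subtract passes per digit. As in the posted code, digits[0] can exceed 9 (1599999 gives 15), and for very large inputs the count wraps modulo 256 just like the (uint8_t) cast of the quotient does.

```c
#include <stdint.h>

/* Powers of ten from 100000 down to 10.
   What is left after the last one is the units digit. */
static const uint32_t kPow10[5] = { 100000UL, 10000UL, 1000UL, 100UL, 10UL };

/* Fills digits[0..5] with the decimal digits of in, most significant first.
   No division or multiplication is used, only 32-bit compare and subtract. */
static void get_digits_by_subtraction(uint32_t in, uint8_t digits[6])
{
    for (uint8_t i = 0; i < 5; i++) {
        uint8_t count = 0;
        while (in >= kPow10[i]) {
            in -= kPow10[i];
            count++;
        }
        digits[i] = count;
    }
    digits[5] = (uint8_t)in; /* remainder is now 0..9 */
}
```

Whether this actually beats the 2953 cycles measured above would have to be checked in the simulator with the same input. The running time depends on the digit values, so the worst case (all nines) is the one to measure.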
/*
* GccApplication2.c
*
* Created: 29.03.2020 10:38:02
* Author : USERPC01
*/
#include <avr/io.h>
#include <math.h>
/*
uint8_t div32bit_mod (uint32_t in, uint32_t div, uint32_t *mod )
{
uint32_t res;
res=(uint32_t)in /div;
*mod=(uint32_t) (in - (res*div)) ;
*mod=(uint32_t)(in%div); // (in - (res*div)) ;
return (uint8_t)(in/div) ;// res;
}*/
uint8_t divmod( uint32_t inp, uint32_t div, uint32_t *mod )
{
//return (uint8_t) div32bit_mod( inp, div, remainder );
*mod=(uint32_t)(inp%div); // (in - (res*div)) ;
return (uint8_t)(inp/div) ;// res;
}
void GetDigitsFromUint32_t( uint32_t in ) //"uint24_t"
{
uint8_t digits[6]; // six entries: digits[0]..digits[5] are written below
/*
digits[0]=(uint8_t)(in/100000) ; in=in%100000;
digits[1]=(uint8_t)(in/10000) ; in=in%10000;
digits[2]=(uint8_t)(in/1000) ; in=in%1000;
digits[3]=(uint8_t)(in/100) ; in=in%100;
digits[4]=(uint8_t)(in/10) ; in=in%10;
digits[5]=(uint8_t) (in);
*/
digits[0]=(uint8_t) divmod( in , 0x000186A0 /*100000*/ , &in );
digits[1]=(uint8_t) divmod( in , 0x00002710 /*10000*/, &in );
digits[2]=(uint8_t) divmod( in , 0x000003e8 /*1000*/, &in );
digits[3]=(uint8_t) divmod( in , 0x00000064 /*100*/, &in );
digits[4]=(uint8_t) divmod( in , 0x0000000A /*10*/, &in );
digits[5]=(uint8_t) divmod( in , 0x00000001 /* 1*/ , &in );
//for example, for output to PORTD
DDRD=0xFF;
PORTD=digits[0];
PORTD=digits[1];
PORTD=digits[2];
PORTD=digits[3];
PORTD=digits[4];
PORTD=digits[5];
return ;
}
uint32_t DivideBy10(uint32_t inp)
{
return (uint32_t) (inp/10);
}
//for debugger test only
int main(void)
{
/* Replace with your application code */
while (1)
{
uint32_t in =0; //= 1599999; // -> 15 9 9 9 9 9
//for example, input from port B
DDRB=0x00;
in|=(uint32_t)((uint32_t)PINB<<0);
in|=(uint32_t)((uint32_t)PINB<<8);
in|=(uint32_t)((uint32_t)PINB<<16);
in|=(uint32_t)((uint32_t)PINB<<24);
// GetDigitsFromUint32_t( in);
uint32_t out;
out= DivideBy10(in);
//for example, out bytes to port D
DDRD=0xff;
PORTD=(uint8_t)((out&0x000000ff)>>0);
PORTD=(uint8_t)((out&0x0000ff00)>>8);
PORTD=(uint8_t)((out&0x00ff0000)>>16);
PORTD=(uint8_t)((out&0xff000000)>>24);
}
}
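As suggested earlier in the thread, inp/10 can also be computed by multiplying with a scaled reciprocal instead of calling the general divide routine. The sketch below uses the constant 0xCCCCCCCD, which is 2^35/10 rounded up, with a 64-bit intermediate product; with that constant the result is exact for every 32-bit input. The function names are made up here, and this is plain C rather than hand-written assembler. How avr-gcc compiles the 64-bit multiply, and whether it ends up faster than __udivmodsi4 on the ATmega16A, is an assumption that should be checked in the .lss file and with the cycle counter.

```c
#include <stdint.h>

/* Exact n / 10 for any 32-bit n: multiply by ceil(2^35 / 10), shift right by 35. */
static uint32_t div10_by_reciprocal(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDULL) >> 35);
}

/* Exact n / 5: same constant, one bit less of shift. */
static uint32_t div5_by_reciprocal(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDULL) >> 34);
}

/* Remainder of n / 10, obtained from the quotient with one small multiply. */
static uint8_t mod10_from_quotient(uint32_t n)
{
    return (uint8_t)(n - div10_by_reciprocal(n) * 10UL);
}
```

In the test program above, DivideBy10(in) could be replaced by div10_by_reciprocal(in) to compare the two in the debugger.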