Fast dividing by 5 and by 10 for uint32_t input data for ATMEGA16A


How can I write a fast divide by 5 and by 10 for uint32_t (4-byte) input data on the ATmega16A in assembler?

Dividing by 4 is a right shift: x div 4 = (x>>2). Multiplying by 2 is a left shift: x*2 = (x<<1).


rpz3598 wrote:
How can I write a fast divide by 5 and by 10 for uint32_t (4-byte) input data on the ATmega16A in assembler?

Interesting.  Now, you probably should tell your goals -- just "fast" doesn't mean much.

 

It also might help if you give the reason for your quest.

 

That said, have you searched this forum?  IIRC there have been extensive thread(s) about this.

https://www.avrfreaks.net/forum/... [Dave Van Horn; Sean Ellis; Jesper; ...]

https://www.avrfreaks.net/forum/...

...

 

 

 

 

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Last Edited: Sun. Mar 22, 2020 - 08:43 PM

Fast divide-by-10 is commonly discussed, but you need to decide whether you also want to get the remainder...

https://forum.arduino.cc/index.p...

https://forum.arduino.cc/index.p...

 


As said, this has been discussed here many times, so look around.

Because (your) AVR has a HW multiplier, there is no doubt that multiplying by 1/5 and 1/10 will be the fastest way.

 

Do you need it 100% accurate ?

 

  


rpz3598 wrote:
How can I write a fast divide by 5 and by 10 for uint32_t (4-byte) input data on the ATmega16A in assembler?
What range of values?

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 


rpz3598 wrote:
using assembler

Before considering that implementation detail, think about the general requirement ...

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...

I've not even looked at "prior art" but I can't help noticing that /10 is /5 then /2 - the last bit of which must be pretty easy to get a computer to do! So the challenge is presumably just /5 ?


You could use the fact that the series 1/4 - 1/16 + 1/64 - 1/256 + 1/1024 - 1/4096 ... converges to 1/5.

 

 

/Jakob Selbing


That is what you get when you multiply by 1/5.

 

8-bit mul, fast: 2**8/5 = 51 (51.2)

8-bit mul with a final shift: 2**10/5 = 205 (204.8)

16-bit mul, fast: 2**16/5 = 13107 (13107.2)

16-bit mul with a final shift: 2**18/5 = 52429 (52428.8)

...

...

 

And remember that the remainder is fast on a chip with HW mul (multiply the result by 5 and subtract it from the original number).
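
As a rough illustration of the remainder trick (a sketch, not anyone's posted code): once the quotient comes from a reciprocal multiply, the remainder costs one more multiply and a subtract. The struct and function names are made up for this example, and it assumes the 16-bit "with shift" constant 52429 together with an 18-bit right shift.

```cpp
#include <cstdint>

// Hypothetical result type for this sketch.
struct QuotRem5 {
    uint16_t quot;
    uint8_t  rem;
};

// Quotient via multiply by 52429 (about 2^18 / 5), remainder via n - 5*q.
QuotRem5 divmod5_u16(uint16_t n)
{
    QuotRem5 r;
    r.quot = (uint16_t)(((uint32_t)n * 52429UL) >> 18);
    r.rem  = (uint8_t)(n - (uint16_t)(r.quot * 5u));
    return r;
}
```

On an AVR the 16x16 product would be built from MUL instructions; the C++ above only models the arithmetic.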


jaksel wrote:
You could use the fact that the series 1/4 - 1/16 + 1/64 - 1/256 + 1/1024 - 1/4096 ... converges to 1/5.

 

Indeed. In other words,

 

n/5 = (n>>2) - (n>>4) + (n>>6) - (n>>8) and so on...

 

But still, using multiply by the inverse is probably faster, since the ATMega16 has an hw multiplier.
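
A minimal sketch of the truncated series in C++ for a 16-bit input, assuming seven terms are enough to cover 16 bits (the function name is made up). Each shift throws away low bits, so the result is only close to n/5, not exact: n = 4 gives 1, for instance.

```cpp
#include <cstdint>

// Truncated alternating series 1/4 - 1/16 + 1/64 - ... applied with shifts.
// Approximate only: every shift discards low bits of n.
uint16_t div5_series_approx(uint16_t n)
{
    return (uint16_t)((n >> 2) - (n >> 4) + (n >> 6) - (n >> 8)
                    + (n >> 10) - (n >> 12) + (n >> 14));
}
```

If an exact quotient is needed, the estimate has to be corrected afterwards, for example by checking the remainder.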

Last Edited: Mon. Mar 23, 2020 - 11:54 AM

El Tangas wrote:

jaksel wrote:
You could use the fact that the series 1/4 - 1/16 + 1/64 - 1/256 + 1/1024 - 1/4096 ... converges to 1/5.

 

Indeed. In other words,

 

n/5 = n>>2 - n>>4 + n>>6 - n>>8 and so on...

 

But still, using multiply by the inverse is probably faster, since the ATMega16 has an hw multiplier.

 

But how do you represent the inverse in a uint?

 

 

If you don't know my whole story, keep your mouth shut.

If you know my whole story, you're an accomplice. Keep your mouth shut. 


But how do you represent the inverse in a uint?

Why, surely by sliding the base-2 "decimal point" and keeping track of that. Say you have an 8-bit number to divide by 5: multiply by 1/5 using the closest 8-bit representation (1/5 expressed in 256ths), slid over so the MSB is 1 (keep track of the number of shifts), and do the 8x8 multiply. Knowing the number of shifts, you can take the 16-bit product and slide it back to grab the result. You can improve the accuracy by using 16 bits instead of 8, or 24, etc. (assuming the divisor is not exactly represented by 8 bits).

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Last Edited: Mon. Mar 23, 2020 - 03:51 PM

Yes, let's exemplify the whole process:

1/5 = 0.2, then we can use a calculator like this  https://www.exploringbinary.com/binary-converter/

 

So, 0.2 decimal is 0.00110011001100110011001100110011001100110011001100110011001100... in binary. Now, we shift <<2 to get more significant bits, in other words, 0.8 = 0.110011001100110011001100110011001100110011001100110011001100...

Since this is an infinite fraction, we can round to use just 8 significant bits: 0.8 = 0.11001101 then multiply by 256 (<<8) and we get the 11001101 (decimal 205) "magic number" that we can multiply to eventually get the result of the division.

 

For example, to divide 20 by 5, we multiply by 205, so 20*205 = 4100 that is 0001_0000 0000_0100 binary. Now we need to "undo" the shifts we did, >>10 and obtain the result 100 binary, which is of course 4 decimal.
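
The worked example can be written as a tiny C++ function; this is only a sketch of the arithmetic, with a made-up function name. For 8-bit inputs the product fits in 16 bits, and the magic number 205 with a shift of 10 gives the exact quotient for every value from 0 to 255.

```cpp
#include <cstdint>

// n / 5 for 8-bit n: multiply by 205 (0.8 scaled by 256), then undo the shifts (>> 10).
uint8_t div5_u8(uint8_t n)
{
    return (uint8_t)(((uint16_t)n * 205u) >> 10);
}
```

On an AVR with a hardware multiplier this maps to one MUL followed by two right shifts of the high result byte.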


 Now we need to "undo" the shifts we did, >>10

And depending on how much shifting ("bit grabbing") you need (say it were 5 shifts), it may sometimes be better to use a larger 16-bit/24-bit/etc. multiplier so you can simply grab result bytes with no shifting, and perhaps end up with even more accuracy as a bonus.  It's a bit of a sport to find the optimal implementation.


While this has been discussed many times before as others have mentioned, I haven't seen anything conclusive on what is the smallest and what is the fastest.

Here's something I came up with (untested) after thinking about the problem for 20 minutes, with the goal of writing the smallest 16-bit unsigned divide-by-5.  16 AVR instructions (32 bytes):

ldi r23, hi8(52429)
ldi r22, lo8(52429)
movw r18, r24
loop:
lsrw r18
lsrw r22
brcc .+2
addw r18, r24
addiw r22, 0
brne loop
; now divide result by 4
lsrw r18
lsrw r18

It's just a 16-bit * 16-bit multiply with the lower 16 bits of the result discarded.  The fixed multiplier is the magic number to divide by 1.25, and then the result is divided by 4 at the end.

Using hardware multiply, you could probably cut the code size in half, and get something that is over an order of magnitude faster.  For AVRs without hardware multiply, I'm curious to find out if there is a smaller implementation than what I just whipped up.
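
For readers who want to check the arithmetic on a PC, here is a C++ model of the idea described in this post: a bit-by-bit 16x16 multiply that keeps only the upper 16 bits, followed by a divide by 4. It is a sketch under my own assumptions, not a transcription of the assembly (which uses 16-bit pseudo-instructions), and the function names are made up.

```cpp
#include <cstdint>

// High 16 bits of a 16x16 product, computed one multiplier bit at a time
// (add first, then shift right), so only a 17-bit accumulator is needed.
uint16_t mulhi16_shift_add(uint16_t a, uint16_t m)
{
    uint32_t acc = 0;
    for (uint8_t i = 0; i < 16; ++i) {
        if (m & 1u) {
            acc += a;      // current multiplier bit is set: add the multiplicand
        }
        acc >>= 1;         // the bit shifted out is a final low bit of the product
        m >>= 1;
    }
    return (uint16_t)acc;
}

// n / 5 without a hardware multiplier: n * 52429 / 65536 is about n / 1.25, then / 4.
uint16_t div5_u16_nomul(uint16_t n)
{
    return (uint16_t)(mulhi16_shift_add(n, 52429u) >> 2);
}
```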

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 

Last Edited: Tue. Mar 24, 2020 - 11:45 PM

ralphd wrote:
with the goal of writing the smallest 16-bit unsigned divide-by-5. 
Cool.  Simple extension, then, for the OP's needed 32 bit?


Why (X*(1/1.25))/4 instead of just X*(1/5) ?

 


westfw wrote:

Why (X*(1/1.25))/4 instead of just X*(1/5) ?

 

 

Because the higher 2 bits of the binary representation of 0.2 are zero, so it's better to multiply by 0.8 instead, you gain 2 extra bits of precision for intermediate calculations, and do the >>2 shift as a final step.

 

edit: to be precise, the higher 2 bit of the fractional part (mantissa).
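
The effect of those two extra bits can be checked on a PC with a small search; this is a sketch with a made-up helper name. It looks for the first 16-bit input where a given multiplier and shift stop matching n/5. With 0.2 scaled by 2^16, truncated to 13107 or rounded up to 13108, the result goes wrong well inside the 16-bit range, while 0.8 scaled by 2^16 (52429) with the extra >>2 stays exact throughout.

```cpp
#include <cstdint>

// Smallest 16-bit n for which ((n * mult) >> shift) != n / 5,
// or 65536 if the approximation is exact for every 16-bit input.
uint32_t first_mismatch_div5(uint32_t mult, unsigned shift)
{
    for (uint32_t n = 0; n <= 0xFFFFu; ++n) {
        uint32_t approx = (uint32_t)(((uint64_t)n * mult) >> shift);
        if (approx != n / 5u) {
            return n;
        }
    }
    return 0x10000u;
}
```

For example, `first_mismatch_div5(13107, 16)` returns 5, `first_mismatch_div5(13108, 16)` returns 16384, and `first_mismatch_div5(52429, 18)` returns 65536.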

Last Edited: Wed. Mar 25, 2020 - 02:26 AM

[CPP]

uint8_t  a,b;
uint16_t c,d;
uint32_t e,f;

// Truncated series: each result is only approximately x/10 and can be off by a few counts.
b = (a>>3) - (a>>5) + (a>>7);  // b ~ a/10
d = (c>>3) - (c>>5) + (c>>7) - (c>>9) + (c>>11) - (c>>13) + (c>>15);   // d ~ c/10
f =  (e>>3) - (e>>5) + (e>>7) - (e>>9) + (e>>11) - (e>>13) + (e>>15)  - (e>>17) + (e>>19) - (e>>21) + (e>>23) - (e>>25) + (e>>27)  - (e>>29) + (e>>31);   // f ~ e/10

d = (c>>3) - (c>>5) + (c>>7) - (c>>9) + (c>>11) - (c>>13) + (c>>15);   // d ~ c/10

// The same series in nested form:
d = (c- (c- (c- (c- (c- (c- (c>>2) )>>2  )>>2  )>>2  )>>2  )>>2  )>>3; 

[/CPP]

b = (a>>3) - (a>>5) + (a>>7);  // b = a/10
d = (c>>3) - (c>>5) + (c>>7) - (c>>9) + (c>>11) - (c>>13) + (c>>15);   // d = c/10
f =  (e>>3) - (e>>5) + (e>>7) - (e>>9) + (e>>11) - (e>>13) + (e>>15)  - (e>>17) + (e>>19) - (e>>21) + (e>>23) - (e>>25) + (e>>27)  - (e>>29) + (e>>31);   // f = e/10

d = (c>>3) - (c>>5) + (c>>7) - (c>>9) + (c>>11) - (c>>13) + (c>>15);   // d = c/10

d = (c- (c- (c- (c- (c- (c- (c>>2) )>>2  )>>2  )>>2  )>>2  )>>2  )>>3; 

??????

 

The thread title said "fast dividing".  Did you look at the Asm the C compiler churns out for all those shifts? Or did you use the simulator stopwatch to count the cycles in each case? (You would, of course, have to make things volatile or the entire code will be discarded anyway - and hence "very fast" ;-)


http://we.easyelectronics.ru/Soft/preobrazuem-v-stroku-chast-1-celye-chisla.html

A PC program for testing fast division by 10, and the subroutine for fast division by 10:

 

[CPP]

#include <iostream>
#include <stdint.h>
using namespace std; 
struct divmod10_t
{
    uint32_t quot;
    uint8_t rem;
};
 
 
inline static divmod10_t divmodu10(uint32_t n)
{
    divmod10_t res;
// mul 0.8
    res.quot = (n >> 1);
    res.quot += res.quot >> 1;
    res.quot += res.quot >> 4;
    res.quot += res.quot >> 8;
    res.quot += res.quot >> 16;
    uint32_t qq = res.quot;
// div 8
    res.quot >>= 3;
// rem
    res.rem = uint8_t(n - ((res.quot << 1) + (qq & ~7ul)));
// corr rem , quot
    if(res.rem > 9)
    {
        res.rem -= 10;
        res.quot++;
    }
    
    
    return res;
}
 
int main ()
{
    uint32_t i;
    for (i = 0; i < 0xFFFFFFFF; i++)
    {
        uint32_t delta = divmodu10(i).quot - (i / 10);
        if (delta != 0)
        {
            cout << "\n x= " << hex << i;
            cout << " delta " << hex << delta;
        }
    }

    cout << " end ";

    //cout<<hex<<(~7ul);   ->0x ff ff ff f8
}

 

[/CPP]

[CPP]

 

struct divmod10_t
{
    uint32_t quot;
    uint8_t rem;
};
inline static divmod10_t divmodu10(uint32_t n)
{
    divmod10_t res;
// multiply by 0.8
    res.quot = n >> 1;
    res.quot += res.quot >> 1;
    res.quot += res.quot >> 4;
    res.quot += res.quot >> 8;
    res.quot += res.quot >> 16;
    uint32_t qq = res.quot;
// divide by 8
    res.quot >>= 3;
// compute the remainder
    res.rem = uint8_t(n - ((res.quot << 1) + (qq & ~7ul)));
// correct the remainder and the quotient
    if(res.rem > 9)
    {
        res.rem -= 10;
        res.quot++;
    }
    return res;
}

[/CPP]


This needs to be rewritten in assembler.


The program is needed for DVDC control in a PLL design, that is, for obtaining the codes for a divider with a variable dividing coefficient from frequency data (uint32_t, or 6 to 7 BCD digits converted to uint32_t).


Umm...  If the data's already in BCD, why not just drop the low digit?  /10 right there.  Shift the resulting uint32_t left once for /5.  S.


The program is needed for DVDC control in a PLL design

What are you talking about, this is not a mystery show...what is DVDC?

Are you saying  from your coding, that you already have the answer? 


ralphd wrote:

While this has been discussed many times before as others have mentioned, I haven't seen anything conclusive on what is the smallest and what is the fastest.

Here's something I came up with (untested) after thinking about the problem for 20 minutes, with the goal of writing the smallest 16-bit unsigned divide-by-5.  16 AVR instructions (32 bytes):

ldi r23, hi8(52429)
ldi r22, lo8(52429)
movw r18, r24
loop:
lsrw r18
lsrw r22
brcc .+2
addw r18, r24
addiw r22, 0
brne loop
; now divide result by 4
lsrw r18
lsrw r18

It's just a 16-bit * 16-bit multiply with the lower 16 bits of the result discarded.  The fixed multiplier is the magic number to divide by 1.25, and then the result is divided by 4 at the end.

Using hardware multiply, you could probably cut the code size in half, and get something that is over an order of magnitude faster.  For AVRs without hardware multiply, I'm curious to find out if there is a smaller implementation than what I just whipped up.

 

 

I just noticed a simple speed optimization would be to change "brcc .+2" to "brcc loop".  That cuts the loop time from 10 to 6 cycles when there is no add.  And if my math is right, that cuts the total time from 176 cycles to 134.

 


Scroungre wrote:

Umm...  If the data's already in BCD, why not just drop the low digit?  /10 right there.  Shift the resulting uint32_t left once for /5.  S.

 

A left shift after dropping the low digit isn't quite the same as dividing by 5: consider what happens if the low digit was 5 (15 / 5 = 3, but dropping the 5 and doubling gives 2).

 

Anyway, general observation:

 

This is a well-studied problem. You probably want to do a multiplication-based approach.

 

Things to consider: first, AVR has no barrel shifter, meaning that a shift is O(N) in the number of bit positions, so it's slow. That is the opposite of most desktop CPUs, where shifts are basically free.

 

Divide by 10: (((uint32_t)A * (uint32_t)0xCCCD) >> 16) >> 3
Divide by 5: (((uint32_t)A * (uint32_t)0xCCCD) >> 16) >> 2

But beware, those are probably only accurate for 16-bit values. Note also that the >>16 can be trivially bypassed with a bit of pointer math.
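
To make the 16-bit caveat concrete, here is a hedged C++ sketch (function names are made up). The first two functions are the expressions above with the two shifts merged; they are exact for every 16-bit input. The last two extend the same idea to uint32_t using the widely used constant 0xCCCCCCCD and a 64-bit product; that is my addition, not something from the post, and on an AVR the wide multiply would have to be assembled from 8x8 MUL steps.

```cpp
#include <cstdint>

// Exact for all 16-bit inputs: 0xCCCD is 2^18 / 5 rounded up.
uint16_t div5_u16(uint16_t a)  { return (uint16_t)(((uint32_t)a * 0xCCCDul) >> 18); }
uint16_t div10_u16(uint16_t a) { return (uint16_t)(((uint32_t)a * 0xCCCDul) >> 19); }

// Assumed 32-bit extension: 0xCCCCCCCD is 2^34 / 5 rounded up.
uint32_t div5_u32(uint32_t a)  { return (uint32_t)(((uint64_t)a * 0xCCCCCCCDull) >> 34); }
uint32_t div10_u32(uint32_t a) { return (uint32_t)(((uint64_t)a * 0xCCCCCCCDull) >> 35); }
```

Taking the high bytes of the product directly, instead of shifting by 16 or 32, is the "pointer math" shortcut mentioned above.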


If you use the series, a nonzero delta (off by ±2 to 3 counts) is to be expected:

 

data=    (n>>3) - (n>>5) + (n>>7)  - (n>>9) + (n>>11) - (n>>13) + (n>>15)   - (n>>17) + (n>>19) - (n>>21) + (n>>23) - (n>>25) + (n>>27)  - (n>>29) + (n>>31) ; 
 

 x= 8 delta 1
 x= 9 delta 1
 x= 10 delta 1
 x= 11 delta 1
 x= 12 delta 1
 x= 13 delta 1
 x= 18 delta 1
 x= 19 delta 1
 x= 1a delta 1
 x= 1b delta 1
 x= 1c delta 1
 x= 1d delta 1
 x= 30 delta 1
 x= 31 delta 1
 x= 38 delta 1
 x= 39 delta 1
 x= 3a delta 1
 x= 3b delta 1
 x= 46 delta ffffffff
 x= 47 delta ffffffff
 x= 58 delta 1
 x= 59 delta 1
 x= 64 delta ffffffff
 x= 65 delta ffffffff
 x= 66 delta ffffffff
 x= 67 delta ffffffff
 x= 6e delta ffffffff
 x= 6f delta ffffffff
 x= 80 delta 1
 x= 81 delta 1
 x= 88 delta 1
 x= 89 delta 1
 x= 8a delta 1
 x= 8b delta 1
 x= 90 delta 1
 x= 91 delta 1
 x= 92 delta 1
 x= 93 delta 1
 x= 94 delta 1
 x= 95 delta 1
 x= 98 delta 1
 x= 99 delta 1
 x= 9a delta 1
 x= 9b delta 1
 x= 9c delta 1
 x= 9d delta 1
 x= 9e delta 1
 x= 9f delta 1
 x= a8 delta 1
 x= a9 delta 1
 x= b0 delta 1
 x= b1 delta 1
 x= b2 delta 1
 x= b3 delta 1
 x= b8 delta 1
 x= b9 delta 1
 x= ba delta 1
 x= bb delta 1
 x= bc delta 1
 x= bd delta 1
 x= d0 delta 1
 x= d1 delta 1
 x= d8 delta 1
 x= d9 delta 1
 x= da delta 1
 x= db delta 1
 x= e6 delta ffffffff
 x= e7 delta ffffffff
 x= f8 delta 1
 x= f9 delta 1
 x= 100 delta 1
 x= 101 delta 1
 x= 102 delta 1
 x= 103 delta 1
 x= 108 delta 1
 x= 109 delta 1
 x= 10a delta 1
 x= 10b delta 1
 x= 10c delta 1
 x= 10d delta 1
 x= 110 delta 1
 x= 111 delta 1
 x= 112 delta 1
 x= 113 delta 1
 x= 114 delta 1
 x= 115 delta 1
 x= 116 delta 1
 x= 117 delta 1
 x= 118 delta 1
 x= 119 delta 1
 x= 11a delta 1
 x= 11b delta 1
 x= 11c delta 1
 x= 11d delta 1
 x= 11e delta 1
 x= 11f delta 1
 x= 120 delta 1
 x= 121 delta 1
 x= 128 delta 1
 x= 129 delta 1
 x= 12a delta 1
 x= 12b delta 1
 x= 130 delta 1
 x= 131 delta 1
 x= 132 delta 1
 x= 133 delta 1
 x= 134 delta 1
 x= 135 delta 1
 x= 138 delta 1
 x= 139 delta 1
 x= 13a delta 1
 x= 13b delta 1
 x= 13c delta 1
 x= 13d delta 1
 x= 13e delta 1
 x= 13f delta 1
 x= 148 delta 1
 x= 149 delta 1
 x= 150 delta 1
 x= 151 delta 1
 x= 152 delta 1
 x= 153 delta 1
 x= 158 delta 1
 x= 159 delta 1
 x= 15a delta 1
 x= 15b delta 1
 x= 15c delta 1
 x= 15d delta 1
 x= 170 delta 1
 x= 171 delta 1
 x= 178 delta 1
 x= 179 delta 1
 x= 17a delta 1
 x= 17b delta 1
 x= 180 delta 1
 x= 181 delta 1
 x= 182 delta 1
 x= 183 delta 1
 x= 184 delta 1
 x= 185 delta 1
 x= 188 delta 1
 x= 189 delta 1
 x= 18a delta 1
 x= 18b delta 1
 x= 18c delta 1
 x= 18d delta 1
 x= 18e delta 1
 x= 18f delta 1
 x= 190 delta 1
 x= 191 delta 1
 x= 192 delta 1
 x= 193 delta 1
 x= 194 delta 1
 x= 195 delta 1
 x= 196 delta 1
 x= 197 delta 1
 x= 198 delta 2
 x= 199 delta 2
 x= 19a delta 1
 x= 19b delta 1
 x= 19c delta 1
 x= 19d delta 1
 x= 19e delta 1
 x= 19f delta 1
 x= 1a0 delta 1
 x= 1a1 delta 1
 x= 1a2 delta 1
 x= 1a3 delta 1
 x= 1a8 delta 1
 x= 1a9 delta 1
 x= 1aa delta 1
 x= 1ab delta 1
 x= 1ac delta 1
 x= 1ad delta 1
 x= 1b0 delta 1
 x= 1b1 delta 1
 x= 1b2 delta 1
 x= 1b3 delta 1
 x= 1b4 delta 1
 x= 1b5 delta 1
 x= 1b6 delta 1
 x= 1b7 delta 1
 x= 1b8 delta 1
 x= 1b9 delta 1
 x= 1ba delta 1
 x= 1bb delta 1
 x= 1bc delta 1
 x= 1bd delta 1
 x= 1be delta 1
 x= 1bf delta 1
 x= 1c0 delta 1
 x= 1c1 delta 1
 x= 1c8 delta 1
 x= 1c9 delta 1
 x= 1ca delta 1
 x= 1cb delta 1
 x= 1d0 delta 1
 x= 1d1 delta 1
 x= 1d2 delta 1
 x= 1d3 delta 1
 x= 1d4 delta 1
 x= 1d5 delta 1
 x= 1d8 delta 1
 x= 1d9 delta 1
 x= 1da delta 1
 x= 1db delta 1
 x= 1dc delta 1
 x= 1dd delta 1
 x= 1de delta 1
 x= 1df delta 1
 x= 1e8 delta 1
 x= 1e9 delta 1
 x= 1f0 delta 1
 x= 1f1 delta 1
 x= 1f2 delta 1
 x= 1f3 delta 1
 x= 1f8 delta 1
 x= 1f9 delta 1
 x= 1fa delta 1
 x= 1fb delta 1
 x= 1fc delta 1
 x= 1fd delta 1
 x= 210 delta 1
 x= 211 delta 1
 x= 218 delta 1
 x= 219 delta 1
 x= 21a delta 1
 x= 21b delta 1
 x= 226 delta ffffffff
 x= 227 delta ffffffff
 x= 238 delta 1
 x= 239 delta 1
 x= 244 delta ffffffff
 x= 245 delta ffffffff
 x= 246 delta ffffffff
 x= 247 delta ffffffff
 x= 24e delta ffffffff
 x= 24f delta ffffffff
 x= 262 delta ffffffff
 x= 263 delta ffffffff
 x= 264 delta ffffffff
 x= 265 delta ffffffff
 x= 266 delta ffffffff
 x= 267 delta ffffffff
 x= 26c delta ffffffff
 x= 26d delta ffffffff
 x= 26e delta ffffffff
 x= 26f delta ffffffff
 x= 276 delta ffffffff
 x= 277 delta ffffffff
 x= 288 delta 1
 x= 289 delta 1
 x= 290 delta 1
 x= 291 delta 1
 x= 292 delta 1
 x= 293 delta 1
 x= 298 delta 1
 x= 299 delta 1
 x= 29a delta 1
 x= 29b delta 1
 x= 29c delta 1
 x= 29d delta 1
 x= 2b0 delta 1
 x= 2b1 delta 1
 x= 2b8 delta 1
 x= 2b9 delta 1
 x= 2ba delta 1
 x= 2bb delta 1
 x= 2c6 delta ffffffff
 x= 2c7 delta ffffffff
 x= 2d8 delta 1
 x= 2d9 delta 1
 x= 2e4 delta ffffffff
 x= 2e5 delta ffffffff
 x= 2e6 delta ffffffff
 x= 2e7 delta ffffffff
 x= 2ee delta ffffffff
 x= 2ef delta ffffffff
 x= 300 delta 1
 x= 301 delta 1
 x= 308 delta 1
 x= 309 delta 1
 x= 30a delta 1
 x= 30b delta 1
 x= 310 delta 1
 x= 311 delta 1
 x= 312 delta 1
 x= 313 delta 1
 x= 314 delta 1
 x= 315 delta 1
 x= 318 delta 1
 x= 319 delta 1
 x= 31a delta 1
 x= 31b delta 1
 x= 31c delta 1
 x= 31d delta 1
 x= 31e delta 1
 x= 31f delta 1
 x= 328 delta 1
 x= 329 delta 1
 x= 330 delta 1
 x= 331 delta 1
 x= 332 delta 1
 x= 333 delta 1
 x= 338 delta 1
 x= 339 delta 1
 x= 33a delta 1
 x= 33b delta 1
 x= 33c delta 1
 x= 33d delta 1
 x= 350 delta 1
 x= 351 delta 1

...

 

But there is no such problem with divmod10_t divmodu10(uint32_t n).

Last Edited: Fri. Mar 27, 2020 - 12:14 PM

#include <iostream>
#include <stdint.h>
using namespace std; 
struct divmod10_t
{
    uint32_t quot;
    uint8_t rem;
};

inline static divmod10_t divmodu10(uint32_t n)
{
    divmod10_t res;
// mul 0.8
    res.quot = n >> 1;
    res.quot += res.quot >> 1;
    res.quot += res.quot >> 4;
    res.quot += res.quot >> 8;
    res.quot += res.quot >> 16;
    uint32_t qq = res.quot;
// div 8
    res.quot >>= 3;
// rem
    //res.rem = uint8_t(n - ((res.quot << 1) + (qq & ~7ul)));
    res.rem = uint8_t(n - ((res.quot << 1) + (qq & 0xfffffff8))); 
// corr rem , quot
    if(res.rem > 9)
    {
        res.rem -= 10;
        res.quot++;
    }
    
    
    return res;
}

uint32_t   div10(uint32_t n)
{

//b = (a>>3) - (a>>5) + (a>>7);  // b = a/10
//d = (c>>3) - (c>>5) + (c>>7) - (c>>9) + (c>>11) - (c>>13) + (c>>15);   // d = c/10    
// f =  (e>>3) - (e>>5) + (e>>7) - (e>>9) + (e>>11) - (e>>13) + (e>>15)  - (e>>17) + (e>>19) - (e>>21) + (e>>23) - (e>>25) + (e>>27)  - (e>>29) + (e>>31);     
     
    
uint32_t ftmp=n;
ftmp=(uint32_t)ftmp>>3;
uint32_t data=0ul;
/*
 data+=ftmp;

ftmp=(uint32_t)ftmp>>2;  //5
data-=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2;  //7
data+=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2;  //9
data-=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2;  //11
data+=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2;  //13
data-=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2;  //15
data+=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2;  //17 
data-=(uint32_t)ftmp;  
ftmp=(uint32_t)ftmp>>2;  //19 
data+=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2;  //21 
data-=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2;  //23 
data+=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2;  //25 
data-=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2;  //27 
data+=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2;  //29 
data-=(uint32_t)ftmp;
ftmp=(uint32_t)ftmp>>2;  //31 
data+=(uint32_t)ftmp;

*/
data=    (n>>3) - (n>>5) + (n>>7)  - (n>>9) + (n>>11) - (n>>13) + (n>>15)   - (n>>17) + (n>>19) - (n>>21) + (n>>23) - (n>>25) + (n>>27)  - (n>>29) + (n>>31) ; 

return data;

}

int main ()
{
 
uint32_t i;    

//cout<<hex<<(~7ul);

for ( i=0; i<0xFFFFFFFF;i++    )
{
//uint32_t delta=  divmodu10(i).quot -(i/10);
 uint32_t delta=  div10(i)  -(i/10);
if (delta!=0) 
{
    
  cout<< "\n x= "<<hex<<i;
   cout<< " delta "<<hex<<delta;
    
}

}

 cout<< " end ";

}


If uou  are use  series, problem with delta <>0  (+/-2 ...3 digits ) was expected 

What exactly do you mean by this?   It is hard to say what your issue is.  Are you having a problem or not?


For PC

 

#include <iostream>
#include <stdint.h>
using namespace std; 
 

//Divide by 10: (((uint32_t)A * (uint32_t)0xCCCD) >> 16) >> 3
//Divide by 5: (((uint32_t)A * (uint32_t)0xCCCD) >> 16) >> 2
/*
struct divmod10_t
{
    uint32_t quot;
    uint8_t rem;
};
 */
 
 /*
inline static divmod10_t divmodu10(uint32_t n)
{
    divmod10_t res;
// mul 0.8
    res.quot = n >> 1;
    res.quot += res.quot >> 1;
    res.quot += res.quot >> 4;
    res.quot += res.quot >> 8;
    res.quot += res.quot >> 16;
    uint32_t qq = res.quot;
// div 8
    res.quot >>= 3;
// rem
     res.rem = uint8_t(n - ((res.quot << 1) + (qq & ~7ul)));
    //res.rem = uint8_t(n - ((res.quot << 1) + (qq & 0xfffffff8)));  
// corr rem , quot
    if(res.rem > 9)
    {
        res.rem -= 10;
        res.quot++;
    }
    
    
    return res;
}

*/

static  uint32_t divmod10(uint32_t in,  uint32_t *mod)
{
    
 uint32_t q = (in >> 1) ;
 q+= (in >> 2);
 q +=  (q >> 4);
 q +=  (q >> 8);
 q +=  (q >> 16);
 q = q >> 3;
   
uint32_t r = in - ((q << 1) + (q << 3)); // r = in - q*10;
 uint32_t div = q + (r > 9); //if  r>9 div=q+1, else div=q
 if (r > 9) *mod  = r - 10;     else {  *mod = r; }
 
return div ;
}

int main ()
{
 
uint32_t i;    

//cout<<hex<<(~7ul);

for ( i=0; i<6399999;i++    )
//for ( i=1600000; i<6399990;i++    )
{
// uint32_t delta=  divmodu10(i).quot -(i/10);
 //uint32_t delta=  div10(i)  -(i/10);
uint32_t mod;
//uint32_t delta= ( divmodu10((i) ).quot )  -(i/10);  //if f<=6399990
//  mod=divmodu10((i) ).rem;
   
//uint32_t delta= ( divmodu10((i ) ).quot )  -(i/10);  //if f<=6399990
//uint32_t deltamod=divmodu10((i) ).rem  -(i)%10 ;

    uint32_t delta= ( divmod10(   (i<<1   ),&mod )  )  -(i<<1)/10 ;  //if f<=6399990
    
    uint32_t deltamod= mod    -(i<<1 )  %10 ;

  if(  (delta!=0)||(deltamod!=0) )
{
    
  cout<< "\n x= " <<i;
  // cout<< " delta x_div10 "<<hex<<delta;
   //cout<< " delta x_mod10 "<<deltamod;
   
     cout<< " delta x<<1 _div5 "<<hex<<delta;
      cout<< " delta x<<1_mod10 "<<deltamod;
   
   
   
   
   
    
}

}

 cout<< " end ";

}


What exactly? The problem is the truncated series: you must apply a correction to the low-order bits of the result of

 

// f =  (e>>3) - (e>>5) + (e>>7) - (e>>9) + (e>>11) - (e>>13) + (e>>15)  - (e>>17) + (e>>19) - (e>>21) + (e>>23) - (e>>25) + (e>>27)  - (e>>29) + (e>>31);  

 

(compare with the C++ test program for PC).
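
One possible form of such a correction, as a sketch only: take the series value as an estimate, compute the remainder n - 10*q, and nudge the quotient until the remainder lies in 0...9. The fix-up loop and the function names are my assumptions, not code from the thread; the series itself is the one quoted above.

```cpp
#include <cstdint>

// Truncated alternating series for n / 10; approximate only.
uint32_t div10_series(uint32_t n)
{
    return (n >> 3) - (n >> 5) + (n >> 7) - (n >> 9) + (n >> 11) - (n >> 13)
         + (n >> 15) - (n >> 17) + (n >> 19) - (n >> 21) + (n >> 23) - (n >> 25)
         + (n >> 27) - (n >> 29) + (n >> 31);
}

// Hypothetical fix-up: adjust the estimate until 0 <= n - 10*q <= 9.
uint32_t div10_series_corrected(uint32_t n)
{
    uint32_t q = div10_series(n);
    int64_t  r = (int64_t)n - (int64_t)q * 10;   // signed: the estimate may be too high
    while (r < 0) { --q; r += 10; }
    while (r > 9) { ++q; r -= 10; }
    return q;
}
```

The 64-bit remainder is for clarity on a PC; a tuned AVR version would want a narrower fix-up.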

Last Edited: Fri. Mar 27, 2020 - 06:15 PM

ralphd wrote:
Here's something I came up with (untested) after thinking about the problem for 20 minutes, with the goal of writing the smallest 16-bit unsigned divide-by-5.  16 AVR instructions (32 bytes):
I count 11 instructions: 5 single-word instructions, 1 addw and 4 lsrw, which I infer count as two words each.

If you do not mind slow, I can beat that with a 14-word C-callable function:

  .global div5
  div5:
  MOVW R30, R24
  LDI R24, 0
  LDI R25, 0

  ; omit if you do not mind grindingly slow
  ; reduce R31 5 at a time
  RJMP 2f
    1:
    SUBI R31, 5
    INC R25
    2:
    CPI R31, 5
    BRCC 1b
    ; R31<=4

  ; reduce R31:30 5 at a time
  RJMP 2f
    1:
    ADIW R24, 1
    2:
    SBIW R30, 5
    BRCC 1b

  RET

As noted, it is slow.

 

Iluvatar is the better part of Valar.

Last Edited: Fri. Mar 27, 2020 - 09:23 PM

How can I create a fast subroutine that decodes uint32_t data into a uint8_t bcddigits[7] array of BCD digits (using tmp div 10, tmp mod 10, or another generic fast method), and an encoder from the uint8_t bcddigits[7] and uint8_t bcddigits[4] digit arrays, for the ATMEGA16A?


How can this program be made fast using alternative algorithms?

// Splits indata (0...9999999) into 7 decimal digits; Digits[6] is the most significant.
// The caller supplies the array: returning a pointer to a local array is undefined behaviour.
void DWORD_TO_BCD_DIGITS(uint32_t indata, uint8_t Digits[7])
{
    uint32_t temp = indata;                  // example: indata = 1599999

    Digits[6] = (uint8_t)(temp / 1000000);   // 1000000 = 0x000F4240; 1599999 / 1000000 = 1
    temp      = temp % 1000000;              // temp = 599999

    Digits[5] = (uint8_t)(temp / 100000);    // 100000 = 0x000186A0; 599999 / 100000 = 5
    temp      = temp % 100000;               // temp = 99999

    Digits[4] = (uint8_t)(temp / 10000);     // 10000 = 0x00002710; 99999 / 10000 = 9
    temp      = temp % 10000;                // temp = 9999

    Digits[3] = (uint8_t)(temp / 1000);      // 1000 = 0x000003E8; 9999 / 1000 = 9
    temp      = temp % 1000;                 // temp = 999

    Digits[2] = (uint8_t)(temp / 100);       // 100 = 0x00000064; 999 / 100 = 9
    temp      = temp % 100;                  // temp = 99

    Digits[1] = (uint8_t)(temp / 10);        // 10 = 0x0000000A; 99 / 10 = 9
    temp      = temp % 10;                   // temp = 9

    Digits[0] = (uint8_t)(temp & 0x0f);      // temp = 9 = 0x09

    //printf (" %d %d %d %d %d %d ",  (int)Digits[5], (int)Digits[4], (int)Digits[3], (int)Digits[2], (int)Digits[1],(int)Digits[0]   );
}


I'm reminded of attempts to speed up and/or shrink integer to ascii conversion, where the winner usually (?) involves repeated subtractions of (constant) powers of 10.  It's nicely customizable to the exact number of digits you need, as well.

You get a 9 digit ascii result (from 32bits) with less than 20 math operations per digit, vs the 10 divisions (that needs to produce remainder as well) that would be required for the conventional approach.
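
A minimal sketch of the repeated-subtraction idea, sized for the 7-digit case asked about earlier in the thread. The function name, the digit order (index 6 is the most significant digit) and the input limit are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Converts n (0...9999999) to 7 decimal digits by subtracting constant
// powers of ten; digits[6] is the most significant digit.
void u32_to_digits7(uint32_t n, uint8_t digits[7])
{
    static const uint32_t pow10[6] = {1000000ul, 100000ul, 10000ul, 1000ul, 100ul, 10ul};
    assert(n <= 9999999ul);            // larger values would not fit in 7 digits
    for (size_t i = 0; i < 6; ++i) {
        uint8_t d = 0;
        while (n >= pow10[i]) {        // at most 9 subtractions per digit
            n -= pow10[i];
            ++d;
        }
        digits[6 - i] = d;
    }
    digits[0] = (uint8_t)n;            // what is left is the units digit
}
```

No division or multiplication is involved, which is why this approach also suits AVRs without a hardware multiplier.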


It would help if you'd ask one question clearly instead of asking several somewhat-similar questions less clearly.

 

Suggestion: there's very well studied code to convert integers to ASCII. Given a conversion of integer to ASCII, if you want to convert the ASCII values to corresponding numbers, that's just &0xF.


Again for an AVR with HW multiplier it's faster to use it.

I'm not sure if there is code here for 32 bit, but I have code for 5 digits from 16 bit in less than 70 clk, which I found good, but El Tangas has solved it in less than 50 clk. (The code is in a thread here somewhere.)

  


sparrow2 wrote:

Again for an AVR with HW multiplier it's faster to use it.

I'm not sure if there is code here for 32 bit, but I have code for 5 digits from 16 bit in less that 70 clk which I found good, but El Tangas have solved it in less that 50 clk. (the code is in a thread here somewhere)  

  

 

Fully unrolling the divide-by-5 code I posted earlier results in 51 cycles.  So getting 5 digits from a 16-bit number in < 50 must be referring to code that uses the hardware multiplier, right?

 


rpz3598 wrote:

Digits[6] =(uint8_t) (temp/1000000 );   //        (uint8_t) temp/1000000;    
    temp  =   (temp%1000000 );                         //   ‭0x000F4240‬

  Digits[5] =(uint8_t) (temp/100000 );   //        (uint8_t) temp/100000;   div32  , Dig_5=(uint8_t) 159999/100000  = 15  =0x0f
    temp  =   (temp%100000 );                         //  temp - (Dig_5 * 100000);    temp%100000, mod32 , temp=159999-1 500 000‬=99 999
                                                                     //0x000186A0

This is madness. It's full of / and %. These are not "fast", and if you really are going to do both a divide and a remainder, why are you doing them separately? Both are a divide, and you can get both quotient and remainder in a single call using div()/ldiv().

 

But if your goal in this thread is "fast" why aren't you following the guidance from others here and exploring Asm MUL etc in inline assembler?
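A small sketch of the single-call approach, assuming the standard C `ldiv()` from `<stdlib.h>`. The wrapper name `divmod_once` is hypothetical. Note that `ldiv()` takes signed `long` arguments, so this only holds for values that fit in a `long`; the stated maximum of 1599999 does:

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns n / divisor and stores n % divisor in *rem, using one division. */
uint8_t divmod_once(uint32_t n, uint32_t divisor, uint32_t *rem)
{
    ldiv_t qr = ldiv((long)n, (long)divisor);  /* quotient and remainder together */

    *rem = (uint32_t)qr.rem;
    return (uint8_t)qr.quot;
}
```

This is still a full software division, so it halves the division count rather than removing it.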

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Yes, and that is for the div 10, not 5 (but because 10 is an even number I guess it doesn't really make a difference).

 

add:

And I guess I need to add: that is the worst case.

Last Edited: Sat. Mar 28, 2020 - 01:32 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

#include <stdint.h>
#include <stdio.h>

uint8_t div32bit_mod(uint32_t in, uint32_t div, uint32_t *mod)
{
    *mod = (uint32_t)(in % div);
    return (uint8_t)(in / div);
}

uint8_t divmod(uint32_t inp, uint32_t div, uint32_t *reminder)
{
    return (uint8_t)div32bit_mod(inp, div, reminder);
}

uint8_t *GetDigitsFromUint32_t(uint32_t in)  //"uint24_t"
{
    // Six entries are written below; static so the returned pointer stays valid.
    static uint8_t digits[6];
    /*
    digits[0]=(uint8_t)(in/100000) ;  in=in%100000;
    digits[1]=(uint8_t)(in/10000) ;  in=in%10000;
    digits[2]=(uint8_t)(in/1000) ;  in=in%1000;
    digits[3]=(uint8_t)(in/100) ;  in=in%100;
    digits[4]=(uint8_t)(in/10) ;  in=in%10;
    digits[5]=(uint8_t) (in);
    */
    digits[0] = (uint8_t)divmod(in, 0x000186A0 /*100000*/, &in);
    digits[1] = (uint8_t)divmod(in, 0x00002710 /*10000*/, &in);
    digits[2] = (uint8_t)divmod(in, 0x000003e8 /*1000*/, &in);
    digits[3] = (uint8_t)divmod(in, 0x00000064 /*100*/, &in);
    digits[4] = (uint8_t)divmod(in, 0x0000000A /*10*/, &in);
    digits[5] = (uint8_t)divmod(in, 0x00000001 /* 1*/, &in);

    return digits;
}

int main() {              //max digit is  1599999 dec
    uint32_t in = 1599999;  // -> [15] [9] [9] [9] [9] [9]    (MSB byte of array must be 0x00...0x0F, other digits are 0x00...0x09)
    uint8_t *digits;
    digits = GetDigitsFromUint32_t(in);
    printf("%d%d%d%d%d%d ", (int)digits[0], (int)digits[1], (int)digits[2], (int)digits[3], (int)digits[4], (int)digits[5]);
    return 0;
}

 

 

How can I use the brute-force algorithm, the Double-Dabble (shift-and-add-3) algorithm, a fast variant of that algorithm, the algorithm based on dividing by 10, or the algorithm based on division emulated by approximation and reciprocal multiplication (for 2 to 7 (8) digits)?

http://blog.malcom.pl/2017/konwe...
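As an illustration of one of the algorithms named in the question, here is a minimal Double-Dabble (shift-and-add-3) sketch in C. It is not taken from the linked article. The names, the one-digit-per-byte layout, and the limits `NDIGITS` and `NBITS` are assumptions for this example, and the input is assumed to be at most 9999999 so that it fits in seven digits:

```c
#include <stdint.h>

#define NDIGITS 7   /* assumed output width: 0..9999999 */
#define NBITS   24  /* input bits processed; 2^24 > 9999999 */

/* Double-Dabble: on return bcd[0] is the ones digit, bcd[6] the millions. */
void double_dabble(uint32_t n, uint8_t bcd[NDIGITS])
{
    for (uint8_t i = 0; i < NDIGITS; i++) {
        bcd[i] = 0;
    }

    for (uint8_t bit = 0; bit < NBITS; bit++) {
        /* Any digit of 5 or more gets 3 added, so doubling carries correctly */
        for (uint8_t i = 0; i < NDIGITS; i++) {
            if (bcd[i] >= 5) {
                bcd[i] += 3;
            }
        }

        /* Shift all digits left by one bit, feeding in the next input bit (MSB first) */
        uint8_t carry = (uint8_t)((n >> (NBITS - 1 - bit)) & 1u);
        for (uint8_t i = 0; i < NDIGITS; i++) {
            uint8_t v = (uint8_t)((bcd[i] << 1) | carry);
            carry = (uint8_t)(v >> 4);
            bcd[i] = (uint8_t)(v & 0x0F);
        }
    }
}
```

The method needs no multiply or divide at all, which is why it is popular in hardware. Whether it beats a multiplier-based routine on an AVR would have to be measured.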

 

How can I create a fast alternative to this prototype of the encoder using arithmetic operations (magic numbers and shifting, but "switch() case: res+=" is also allowed)?

uint32_t  Getuint32fromBCDDigits(   uint8_t  *digit  )
{
    uint32_t res=0;

    // Each digit is added by repeated addition of its place value.
    for(uint8_t i=0 ;i<digit[0]; i++ ) { res+=1;  }
    for(uint8_t i=0 ;i<digit[1]; i++ ) { res+=10;  }
    for(uint8_t i=0 ;i<digit[2]; i++ ) { res+=100;  }
    for(uint8_t i=0 ;i<digit[3]; i++ ) { res+=1000;  }
    for(uint8_t i=0 ;i<digit[4]; i++ ) { res+=10000;  }
    for(uint8_t i=0 ;i<digit[5]; i++ ) { res+=100000;  }
    for(uint8_t i=0 ;i<digit[6]; i++ ) { res+=1000000; } //fix size

    return res;
}

 

 

 

 

 


  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

sparrow2 wrote:

Again for an AVR with HW multiplier it's faster to use it.

I'm not sure if there is code here for 32 bit, but I have code for 5 digits from 16 bit in less than 70 clk, which I found good, but El Tangas has solved it in less than 50 clk. (The code is in a thread here somewhere.)

  

 

I think this is the thread:  https://www.avrfreaks.net/forum/optimizing-libc-integer-conversion-routines

There is also stuff here from the same era https://www.avrfreaks.net/forum/integer-string

 

These are from when I first arrived here at AVRFreaks to learn AVR stuff as a hobby. But I already had experience writing similar algorithms for x86, so I kind of converted that code to AVR.

Since I'm at it, here is the x86 code, from another life, basically...  https://board.flatassembler.net/topic.php?t=3924

 

edit: one more:  https://www.avrfreaks.net/forum/avr-assembler-extract-each-3-digit-number-each-register-r23-r24-r25

Yeah, we discuss this stuff on a regular basis :)

Last Edited: Sat. Mar 28, 2020 - 02:07 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

OP keeps posting... things, but I can't comprehend them. Note the fascinating new requirement in the latest post of "MSB byte of array must be 0x00...0x0F", which... is not how BCD works, it's also not how anything else here works, and I don't get it.

 

I feel like there's a translation issue or something here. The "Getuint32fromBCDDigits" code strikes me as bad; I think most of the modern chips have hardware multiply, so it's probably fine to use that, and that certainly makes it easier.

return ((((((uint32_t)digit[6]*10 + digit[5]) * 10 + digit[4]) * 10 + digit[3]) * 10 + digit[2]) * 10 + digit[1]) * 10 + digit[0];

So far as I know hardware multiply is cheap, so that's probably fine. Assuming there's always 7 digits...
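The same multiply-and-add chain can be written as a loop for any digit count. This is a sketch with a made-up function name, assuming `digit[0]` is the ones digit as in the prototype above. The accumulator is a `uint32_t` because `int` is only 16 bits on AVR, so the intermediate products would overflow otherwise:

```c
#include <stdint.h>

/* Horner-style evaluation: digit[0] is the ones digit,
 * digit[count - 1] the most significant digit. */
uint32_t digits_to_u32(const uint8_t *digit, uint8_t count)
{
    uint32_t res = 0;

    for (uint8_t i = count; i > 0; i--) {
        res = res * 10 + digit[i - 1];  /* shift one decimal place, add next digit */
    }
    return res;
}
```

For seven digits this costs six useful multiplications by 10, each of which the compiler may also turn into shifts and adds.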

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Sorry, the following code is not in any language I've heard of, but I still hope it is clear:

 

if:
x in Y[x] is the byte number of the register Y

and:
N[4] is the original 32-bit number
R[4] is the result of N[4]/5
A[4], B[4] and C[4] are temporary registers	

    A[4]= N[4]
    C[4]= R[4]= 0	

loop:
    A[4]= A[4]-C[4]

  if A[4]<256, break

    B[4]= (A[4]/256)*51
    R[4]= R[4]+B[4]
    C[4]= R[4]*5
    goto loop

 

Edit:

Please note, it is not complete... I tried not to work with more than 32 bits... but the break line as shown is certainly wrong. 
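One possible way to complete the idea in C, as a sketch only. The variable names follow the pseudocode, while the function name and the final step are assumptions. Since 51 * 5 = 255 < 256, each pass underestimates the quotient, so the loop never overshoots. Once the leftover part drops below 256, it is finished with the small-range approximation (a * 205) >> 10, which is exact for values below 1024:

```c
#include <stdint.h>

/* n / 5 by repeatedly applying the 51/256 approximation to what is left. */
uint32_t div5_approx(uint32_t n)
{
    uint32_t r = 0;  /* running quotient, R in the pseudocode */
    uint32_t a = n;  /* part not yet accounted for, A in the pseudocode */

    while (a >= 256) {
        uint32_t b = (a >> 8) * 51;  /* B = (A/256)*51, never more than a/5 */
        r += b;
        a -= b * 5;                  /* equivalent to A = N - R*5 */
    }

    /* a < 256 here: exact small-range divide by 5 */
    r += (a * 205u) >> 10;
    return r;
}
```

Each pass shrinks the leftover part by roughly a factor of 256, so a 32-bit input needs only a handful of iterations.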

Last Edited: Sun. Mar 29, 2020 - 06:40 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

/*
 * GccApplication2.c
 *
 * Created: 29.03.2020 10:38:02
 * Author : USERPC01
 */ 

#include <avr/io.h>
#include <math.h>

uint8_t  div32bit_mod (uint32_t  in, uint32_t  div,    uint32_t *mod    )
{
 uint32_t res;
     res=(uint32_t)in /div;
    
*mod=(uint32_t)   (in - (res*div)) ;

return (uint8_t)  res;
}

uint8_t divmod( uint32_t inp, uint32_t  div,  uint32_t *reminder )
{
 return (uint8_t) div32bit_mod( inp,  div, reminder  );
     
}

void  GetDigitsFromUint32_t( uint32_t in )  //"uint24_t" 
{
 uint8_t digits[6];  // six entries are written below (digits[0]..digits[5])
    /*
    digits[0]=(uint8_t)(in/100000) ;  in=in%100000;     
    digits[1]=(uint8_t)(in/10000) ;  in=in%10000;  
    digits[2]=(uint8_t)(in/1000) ;  in=in%1000;
    digits[3]=(uint8_t)(in/100) ;  in=in%100;
    digits[4]=(uint8_t)(in/10) ;  in=in%10;
    digits[5]=(uint8_t) (in);
        */    
   digits[0]=(uint8_t) divmod(  in , 0x000186A0 /*100000*/ ,  &in );
   digits[1]=(uint8_t) divmod(  in , 0x00002710 /*10000*/,  &in );
   digits[2]=(uint8_t) divmod(  in , 0x000003e8 /*1000*/,  &in );
   digits[3]=(uint8_t) divmod(  in , 0x00000064 /*100*/,  &in );
   digits[4]=(uint8_t) divmod(  in , 0x0000000A /*10*/,  &in ); 
   digits[5]=(uint8_t) divmod(  in , 0x00000001 /* 1*/ ,  &in );   
     
     
     //for example, for output to PORTD 

         DDRD=0xFF;
         PORTD=digits[0];
         PORTD=digits[1];
         PORTD=digits[2];
         PORTD=digits[3];
         PORTD=digits[4];
         PORTD=digits[5];
     
return    ;    
}
int main(void)
{
    /* Replace with your application code */
    while (1) 
    {
     uint32_t in =0; //=  1599999;  // -> 15 9 9 9 9 9    
     //for example, input from port B
     in|=(uint32_t)((uint32_t)PINB<<0);
     in|=(uint32_t)((uint32_t)PINB<<8);
     in|=(uint32_t)((uint32_t)PINB<<16);
     in|=(uint32_t)((uint32_t)PINB<<24);
     
     GetDigitsFromUint32_t(  in);    

 
    }
}

Last Edited: Sun. Mar 29, 2020 - 08:29 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

GccApplication2.elf:     file format elf32-avr

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000232  00000000  00000000  00000054  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000000  00800060  00800060  00000286  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  2 .comment      00000030  00000000  00000000  00000286  2**0
                  CONTENTS, READONLY
  3 .note.gnu.avr.deviceinfo 0000003c  00000000  00000000  000002b8  2**2
                  CONTENTS, READONLY
  4 .debug_aranges 00000038  00000000  00000000  000002f4  2**0
                  CONTENTS, READONLY, DEBUGGING
  5 .debug_info   000009cb  00000000  00000000  0000032c  2**0
                  CONTENTS, READONLY, DEBUGGING
  6 .debug_abbrev 0000063f  00000000  00000000  00000cf7  2**0
                  CONTENTS, READONLY, DEBUGGING
  7 .debug_line   000002ac  00000000  00000000  00001336  2**0
                  CONTENTS, READONLY, DEBUGGING
  8 .debug_frame  000000e4  00000000  00000000  000015e4  2**2
                  CONTENTS, READONLY, DEBUGGING
  9 .debug_str    00000342  00000000  00000000  000016c8  2**0
                  CONTENTS, READONLY, DEBUGGING
 10 .debug_loc    000006e5  00000000  00000000  00001a0a  2**0
                  CONTENTS, READONLY, DEBUGGING
 11 .debug_ranges 00000028  00000000  00000000  000020ef  2**0
                  CONTENTS, READONLY, DEBUGGING

Disassembly of section .text:

00000000 <__vectors>:
   0:    0c 94 2a 00     jmp    0x54    ; 0x54 <__ctors_end>
   4:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
   8:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
   c:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  10:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  14:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  18:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  1c:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  20:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  24:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  28:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  2c:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  30:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  34:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  38:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  3c:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  40:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  44:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  48:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  4c:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  50:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>

00000054 <__ctors_end>:
  54:    11 24           eor    r1, r1
  56:    1f be           out    0x3f, r1    ; 63
  58:    cf e5           ldi    r28, 0x5F    ; 95
  5a:    d4 e0           ldi    r29, 0x04    ; 4
  5c:    de bf           out    0x3e, r29    ; 62
  5e:    cd bf           out    0x3d, r28    ; 61
  60:    0e 94 b7 00     call    0x16e    ; 0x16e <main>
  64:    0c 94 17 01     jmp    0x22e    ; 0x22e <_exit>

00000068 <__bad_interrupt>:
  68:    0c 94 00 00     jmp    0    ; 0x0 <__vectors>

0000006c <GetDigitsFromUint32_t>:
     
}

void  GetDigitsFromUint32_t( uint32_t in )  //"uint24_t" 
{
  6c:    cf 92           push    r12
  6e:    df 92           push    r13
  70:    ef 92           push    r14
  72:    ff 92           push    r15
  74:    0f 93           push    r16
  76:    1f 93           push    r17
  78:    cf 93           push    r28
  7a:    df 93           push    r29
  7c:    6b 01           movw    r12, r22
  7e:    7c 01           movw    r14, r24
#include <math.h>

uint8_t  div32bit_mod (uint32_t  in, uint32_t  div,    uint32_t *mod    )
{
 uint32_t res;
     res=(uint32_t)in /div;
  80:    20 ea           ldi    r18, 0xA0    ; 160
  82:    36 e8           ldi    r19, 0x86    ; 134
  84:    41 e0           ldi    r20, 0x01    ; 1
  86:    50 e0           ldi    r21, 0x00    ; 0
  88:    0e 94 db 00     call    0x1b6    ; 0x1b6 <__udivmodsi4>
  8c:    d2 2f           mov    r29, r18
    
*mod=(uint32_t)   (in - (res*div)) ;
  8e:    60 ea           ldi    r22, 0xA0    ; 160
  90:    76 e8           ldi    r23, 0x86    ; 134
  92:    81 e0           ldi    r24, 0x01    ; 1
  94:    90 e0           ldi    r25, 0x00    ; 0
  96:    0e 94 cb 00     call    0x196    ; 0x196 <__mulsi3>
  9a:    c6 1a           sub    r12, r22
  9c:    d7 0a           sbc    r13, r23
  9e:    e8 0a           sbc    r14, r24
  a0:    f9 0a           sbc    r15, r25
#include <math.h>

uint8_t  div32bit_mod (uint32_t  in, uint32_t  div,    uint32_t *mod    )
{
 uint32_t res;
     res=(uint32_t)in /div;
  a2:    c7 01           movw    r24, r14
  a4:    b6 01           movw    r22, r12
  a6:    20 e1           ldi    r18, 0x10    ; 16
  a8:    37 e2           ldi    r19, 0x27    ; 39
  aa:    40 e0           ldi    r20, 0x00    ; 0
  ac:    50 e0           ldi    r21, 0x00    ; 0
  ae:    0e 94 db 00     call    0x1b6    ; 0x1b6 <__udivmodsi4>
  b2:    c2 2f           mov    r28, r18
    
*mod=(uint32_t)   (in - (res*div)) ;
  b4:    a0 e1           ldi    r26, 0x10    ; 16
  b6:    b7 e2           ldi    r27, 0x27    ; 39
  b8:    0e 94 fd 00     call    0x1fa    ; 0x1fa <__muluhisi3>
  bc:    c6 1a           sub    r12, r22
  be:    d7 0a           sbc    r13, r23
  c0:    e8 0a           sbc    r14, r24
  c2:    f9 0a           sbc    r15, r25
#include <math.h>

uint8_t  div32bit_mod (uint32_t  in, uint32_t  div,    uint32_t *mod    )
{
 uint32_t res;
     res=(uint32_t)in /div;
  c4:    c7 01           movw    r24, r14
  c6:    b6 01           movw    r22, r12
  c8:    28 ee           ldi    r18, 0xE8    ; 232
  ca:    33 e0           ldi    r19, 0x03    ; 3
  cc:    40 e0           ldi    r20, 0x00    ; 0
  ce:    50 e0           ldi    r21, 0x00    ; 0
  d0:    0e 94 db 00     call    0x1b6    ; 0x1b6 <__udivmodsi4>
  d4:    12 2f           mov    r17, r18
    
*mod=(uint32_t)   (in - (res*div)) ;
  d6:    a8 ee           ldi    r26, 0xE8    ; 232
  d8:    b3 e0           ldi    r27, 0x03    ; 3
  da:    0e 94 fd 00     call    0x1fa    ; 0x1fa <__muluhisi3>
  de:    c6 1a           sub    r12, r22
  e0:    d7 0a           sbc    r13, r23
  e2:    e8 0a           sbc    r14, r24
  e4:    f9 0a           sbc    r15, r25
#include <math.h>

uint8_t  div32bit_mod (uint32_t  in, uint32_t  div,    uint32_t *mod    )
{
 uint32_t res;
     res=(uint32_t)in /div;
  e6:    c7 01           movw    r24, r14
  e8:    b6 01           movw    r22, r12
  ea:    24 e6           ldi    r18, 0x64    ; 100
  ec:    30 e0           ldi    r19, 0x00    ; 0
  ee:    40 e0           ldi    r20, 0x00    ; 0
  f0:    50 e0           ldi    r21, 0x00    ; 0
  f2:    0e 94 db 00     call    0x1b6    ; 0x1b6 <__udivmodsi4>
  f6:    02 2f           mov    r16, r18
    
*mod=(uint32_t)   (in - (res*div)) ;
  f8:    a4 e6           ldi    r26, 0x64    ; 100
  fa:    b0 e0           ldi    r27, 0x00    ; 0
  fc:    0e 94 fd 00     call    0x1fa    ; 0x1fa <__muluhisi3>
 100:    c6 1a           sub    r12, r22
 102:    d7 0a           sbc    r13, r23
 104:    e8 0a           sbc    r14, r24
 106:    f9 0a           sbc    r15, r25
#include <math.h>

uint8_t  div32bit_mod (uint32_t  in, uint32_t  div,    uint32_t *mod    )
{
 uint32_t res;
     res=(uint32_t)in /div;
 108:    c7 01           movw    r24, r14
 10a:    b6 01           movw    r22, r12
 10c:    2a e0           ldi    r18, 0x0A    ; 10
 10e:    30 e0           ldi    r19, 0x00    ; 0
 110:    40 e0           ldi    r20, 0x00    ; 0
 112:    50 e0           ldi    r21, 0x00    ; 0
 114:    0e 94 db 00     call    0x1b6    ; 0x1b6 <__udivmodsi4>
    
*mod=(uint32_t)   (in - (res*div)) ;

return (uint8_t)  res;
 118:    82 2f           mov    r24, r18
 11a:    93 2f           mov    r25, r19
 11c:    a4 2f           mov    r26, r20
 11e:    b5 2f           mov    r27, r21
 120:    88 0f           add    r24, r24
 122:    99 1f           adc    r25, r25
 124:    aa 1f           adc    r26, r26
 126:    bb 1f           adc    r27, r27
 128:    ac 01           movw    r20, r24
 12a:    bd 01           movw    r22, r26
 12c:    44 0f           add    r20, r20
 12e:    55 1f           adc    r21, r21
 130:    66 1f           adc    r22, r22
 132:    77 1f           adc    r23, r23
 134:    44 0f           add    r20, r20
 136:    55 1f           adc    r21, r21
 138:    66 1f           adc    r22, r22
 13a:    77 1f           adc    r23, r23
 13c:    84 0f           add    r24, r20
 13e:    95 1f           adc    r25, r21
 140:    a6 1f           adc    r26, r22
 142:    b7 1f           adc    r27, r23
 144:    c8 1a           sub    r12, r24
 146:    d9 0a           sbc    r13, r25
 148:    ea 0a           sbc    r14, r26
 14a:    fb 0a           sbc    r15, r27
   digits[4]=(uint8_t) divmod(  in , 0x0000000A /*10*/,  &in ); 
   digits[5]=(uint8_t) divmod(  in , 0x00000001 /* 1*/ ,  &in );   
     
     
     //for example, for output to PORTD 
          DDRD=0xFF;
 14c:    8f ef           ldi    r24, 0xFF    ; 255
 14e:    81 bb           out    0x11, r24    ; 17
         PORTD=digits[0];
 150:    d2 bb           out    0x12, r29    ; 18
         PORTD=digits[1];
 152:    c2 bb           out    0x12, r28    ; 18
         PORTD=digits[2];
 154:    12 bb           out    0x12, r17    ; 18
         PORTD=digits[3];
 156:    02 bb           out    0x12, r16    ; 18
         PORTD=digits[4];
 158:    22 bb           out    0x12, r18    ; 18
         PORTD=digits[5];
 15a:    c2 ba           out    0x12, r12    ; 18
     
return    ;    
}
 15c:    df 91           pop    r29
 15e:    cf 91           pop    r28
 160:    1f 91           pop    r17
 162:    0f 91           pop    r16
 164:    ff 90           pop    r15
 166:    ef 90           pop    r14
 168:    df 90           pop    r13
 16a:    cf 90           pop    r12
 16c:    08 95           ret

0000016e <main>:
    /* Replace with your application code */
    while (1) 
    {
     uint32_t in =0; //=  1599999;  // -> 15 9 9 9 9 9    
     //for example, input from port B
     DDRB=0x00;
 16e:    17 ba           out    0x17, r1    ; 23
     in|=(uint32_t)((uint32_t)PINB<<0);
 170:    26 b3           in    r18, 0x16    ; 22
     in|=(uint32_t)((uint32_t)PINB<<8);
 172:    36 b3           in    r19, 0x16    ; 22
     in|=(uint32_t)((uint32_t)PINB<<16);
 174:    66 b3           in    r22, 0x16    ; 22
 176:    86 2f           mov    r24, r22
 178:    90 e0           ldi    r25, 0x00    ; 0
 17a:    a0 e0           ldi    r26, 0x00    ; 0
 17c:    b0 e0           ldi    r27, 0x00    ; 0
 17e:    dc 01           movw    r26, r24
 180:    99 27           eor    r25, r25
 182:    88 27           eor    r24, r24
 184:    93 2b           or    r25, r19
 186:    82 2b           or    r24, r18
     in|=(uint32_t)((uint32_t)PINB<<24);
 188:    26 b3           in    r18, 0x16    ; 22
     
     GetDigitsFromUint32_t(  in);    
 18a:    bc 01           movw    r22, r24
 18c:    cd 01           movw    r24, r26
 18e:    92 2b           or    r25, r18
 190:    0e 94 36 00     call    0x6c    ; 0x6c <GetDigitsFromUint32_t>
 194:    ec cf           rjmp    .-40         ; 0x16e <main>

00000196 <__mulsi3>:
 196:    db 01           movw    r26, r22
 198:    8f 93           push    r24
 19a:    9f 93           push    r25
 19c:    0e 94 fd 00     call    0x1fa    ; 0x1fa <__muluhisi3>
 1a0:    bf 91           pop    r27
 1a2:    af 91           pop    r26
 1a4:    a2 9f           mul    r26, r18
 1a6:    80 0d           add    r24, r0
 1a8:    91 1d           adc    r25, r1
 1aa:    a3 9f           mul    r26, r19
 1ac:    90 0d           add    r25, r0
 1ae:    b2 9f           mul    r27, r18
 1b0:    90 0d           add    r25, r0
 1b2:    11 24           eor    r1, r1
 1b4:    08 95           ret

000001b6 <__udivmodsi4>:
 1b6:    a1 e2           ldi    r26, 0x21    ; 33
 1b8:    1a 2e           mov    r1, r26
 1ba:    aa 1b           sub    r26, r26
 1bc:    bb 1b           sub    r27, r27
 1be:    fd 01           movw    r30, r26
 1c0:    0d c0           rjmp    .+26         ; 0x1dc <__udivmodsi4_ep>

000001c2 <__udivmodsi4_loop>:
 1c2:    aa 1f           adc    r26, r26
 1c4:    bb 1f           adc    r27, r27
 1c6:    ee 1f           adc    r30, r30
 1c8:    ff 1f           adc    r31, r31
 1ca:    a2 17           cp    r26, r18
 1cc:    b3 07           cpc    r27, r19
 1ce:    e4 07           cpc    r30, r20
 1d0:    f5 07           cpc    r31, r21
 1d2:    20 f0           brcs    .+8          ; 0x1dc <__udivmodsi4_ep>
 1d4:    a2 1b           sub    r26, r18
 1d6:    b3 0b           sbc    r27, r19
 1d8:    e4 0b           sbc    r30, r20
 1da:    f5 0b           sbc    r31, r21

000001dc <__udivmodsi4_ep>:
 1dc:    66 1f           adc    r22, r22
 1de:    77 1f           adc    r23, r23
 1e0:    88 1f           adc    r24, r24
 1e2:    99 1f           adc    r25, r25
 1e4:    1a 94           dec    r1
 1e6:    69 f7           brne    .-38         ; 0x1c2 <__udivmodsi4_loop>
 1e8:    60 95           com    r22
 1ea:    70 95           com    r23
 1ec:    80 95           com    r24
 1ee:    90 95           com    r25
 1f0:    9b 01           movw    r18, r22
 1f2:    ac 01           movw    r20, r24
 1f4:    bd 01           movw    r22, r26
 1f6:    cf 01           movw    r24, r30
 1f8:    08 95           ret

000001fa <__muluhisi3>:
 1fa:    0e 94 08 01     call    0x210    ; 0x210 <__umulhisi3>
 1fe:    a5 9f           mul    r26, r21
 200:    90 0d           add    r25, r0
 202:    b4 9f           mul    r27, r20
 204:    90 0d           add    r25, r0
 206:    a4 9f           mul    r26, r20
 208:    80 0d           add    r24, r0
 20a:    91 1d           adc    r25, r1
 20c:    11 24           eor    r1, r1
 20e:    08 95           ret

00000210 <__umulhisi3>:
 210:    a2 9f           mul    r26, r18
 212:    b0 01           movw    r22, r0
 214:    b3 9f           mul    r27, r19
 216:    c0 01           movw    r24, r0
 218:    a3 9f           mul    r26, r19
 21a:    70 0d           add    r23, r0
 21c:    81 1d           adc    r24, r1
 21e:    11 24           eor    r1, r1
 220:    91 1d           adc    r25, r1
 222:    b2 9f           mul    r27, r18
 224:    70 0d           add    r23, r0
 226:    81 1d           adc    r24, r1
 228:    11 24           eor    r1, r1
 22a:    91 1d           adc    r25, r1
 22c:    08 95           ret

0000022e <_exit>:
 22e:    f8 94           cli

00000230 <__stop_program>:
 230:    ff cf           rjmp    .-2          ; 0x230 <__stop_program>
 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

#include <avr/io.h>
#include <math.h>
/*
uint8_t  div32bit_mod (uint32_t  in, uint32_t  div,    uint32_t *mod    )
{
  uint32_t res;
      res=(uint32_t)in /div;
    
 *mod=(uint32_t)   (in - (res*div)) ;
*mod=(uint32_t)(in%div); //  (in - (res*div)) ;

return (uint8_t)(in/div) ;// res;
}*/

uint8_t divmod( uint32_t inp, uint32_t  div,  uint32_t *mod )
{
 //return (uint8_t) div32bit_mod( inp,  div, reminder  );
     *mod=(uint32_t)(inp%div); //  (in - (res*div)) ;

     return (uint8_t)(inp/div) ;// res;
}

void  GetDigitsFromUint32_t( uint32_t in )  //"uint24_t" 
{
 uint8_t digits[6];  // six entries are written below (digits[0]..digits[5])
    /*
    digits[0]=(uint8_t)(in/100000) ;  in=in%100000;     
    digits[1]=(uint8_t)(in/10000) ;  in=in%10000;  
    digits[2]=(uint8_t)(in/1000) ;  in=in%1000;
    digits[3]=(uint8_t)(in/100) ;  in=in%100;
    digits[4]=(uint8_t)(in/10) ;  in=in%10;
    digits[5]=(uint8_t) (in);
        */    
   digits[0]=(uint8_t) divmod(  in , 0x000186A0 /*100000*/ ,  &in );
   digits[1]=(uint8_t) divmod(  in , 0x00002710 /*10000*/,  &in );
   digits[2]=(uint8_t) divmod(  in , 0x000003e8 /*1000*/,  &in );
   digits[3]=(uint8_t) divmod(  in , 0x00000064 /*100*/,  &in );
   digits[4]=(uint8_t) divmod(  in , 0x0000000A /*10*/,  &in ); 
   digits[5]=(uint8_t) divmod(  in , 0x00000001 /* 1*/ ,  &in );   
     
     
     //for example, for output to PORTD 
          DDRD=0xFF;
         PORTD=digits[0];
         PORTD=digits[1];
         PORTD=digits[2];
         PORTD=digits[3];
         PORTD=digits[4];
         PORTD=digits[5];
     
return    ;    
}
int main(void)
{
    /* Replace with your application code */
    while (1) 
    {
     uint32_t in =0; //=  1599999;  // -> 15 9 9 9 9 9    
     //for example, input from port B
     DDRB=0x00;
     in|=(uint32_t)((uint32_t)PINB<<0);
     in|=(uint32_t)((uint32_t)PINB<<8);
     in|=(uint32_t)((uint32_t)PINB<<16);
     in|=(uint32_t)((uint32_t)PINB<<24);
     
     GetDigitsFromUint32_t(  in);    

 
    }
}

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

GccApplication2.elf:     file format elf32-avr

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000142  00000000  00000000  00000054  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000000  00800060  00800060  00000196  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  2 .comment      00000030  00000000  00000000  00000196  2**0
                  CONTENTS, READONLY
  3 .note.gnu.avr.deviceinfo 0000003c  00000000  00000000  000001c8  2**2
                  CONTENTS, READONLY
  4 .debug_aranges 00000030  00000000  00000000  00000204  2**0
                  CONTENTS, READONLY, DEBUGGING
  5 .debug_info   00000781  00000000  00000000  00000234  2**0
                  CONTENTS, READONLY, DEBUGGING
  6 .debug_abbrev 0000060e  00000000  00000000  000009b5  2**0
                  CONTENTS, READONLY, DEBUGGING
  7 .debug_line   00000254  00000000  00000000  00000fc3  2**0
                  CONTENTS, READONLY, DEBUGGING
  8 .debug_frame  00000064  00000000  00000000  00001218  2**2
                  CONTENTS, READONLY, DEBUGGING
  9 .debug_str    0000032c  00000000  00000000  0000127c  2**0
                  CONTENTS, READONLY, DEBUGGING
 10 .debug_loc    00000401  00000000  00000000  000015a8  2**0
                  CONTENTS, READONLY, DEBUGGING
 11 .debug_ranges 00000020  00000000  00000000  000019a9  2**0
                  CONTENTS, READONLY, DEBUGGING

Disassembly of section .text:

00000000 <__vectors>:
   0:    0c 94 2a 00     jmp    0x54    ; 0x54 <__ctors_end>
   4:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
   8:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
   c:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  10:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  14:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  18:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  1c:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  20:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  24:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  28:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  2c:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  30:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  34:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  38:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  3c:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  40:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  44:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  48:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  4c:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  50:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>

00000054 <__ctors_end>:
  54:    11 24           eor    r1, r1
  56:    1f be           out    0x3f, r1    ; 63
  58:    cf e5           ldi    r28, 0x5F    ; 95
  5a:    d4 e0           ldi    r29, 0x04    ; 4
  5c:    de bf           out    0x3e, r29    ; 62
  5e:    cd bf           out    0x3d, r28    ; 61
  60:    0e 94 69 00     call    0xd2    ; 0xd2 <main>
  64:    0c 94 9f 00     jmp    0x13e    ; 0x13e <_exit>

00000068 <__bad_interrupt>:
  68:    0c 94 00 00     jmp    0    ; 0x0 <__vectors>

0000006c <GetDigitsFromUint32_t>:
     return (uint8_t)(inp/div) ;// res;
}

void  GetDigitsFromUint32_t( uint32_t in )  //"uint24_t" 
{
  6c:    0f 93           push    r16
  6e:    1f 93           push    r17
  70:    cf 93           push    r28
  72:    df 93           push    r29
}*/

uint8_t divmod( uint32_t inp, uint32_t  div,  uint32_t *mod )
{
 //return (uint8_t) div32bit_mod( inp,  div, reminder  );
     *mod=(uint32_t)(inp%div); //  (in - (res*div)) ;
  74:    20 ea           ldi    r18, 0xA0    ; 160
  76:    36 e8           ldi    r19, 0x86    ; 134
  78:    41 e0           ldi    r20, 0x01    ; 1
  7a:    50 e0           ldi    r21, 0x00    ; 0
  7c:    0e 94 7d 00     call    0xfa    ; 0xfa <__udivmodsi4>
  80:    02 2f           mov    r16, r18
  82:    20 e1           ldi    r18, 0x10    ; 16
  84:    37 e2           ldi    r19, 0x27    ; 39
  86:    40 e0           ldi    r20, 0x00    ; 0
  88:    50 e0           ldi    r21, 0x00    ; 0
  8a:    0e 94 7d 00     call    0xfa    ; 0xfa <__udivmodsi4>
  8e:    12 2f           mov    r17, r18
  90:    28 ee           ldi    r18, 0xE8    ; 232
  92:    33 e0           ldi    r19, 0x03    ; 3
  94:    40 e0           ldi    r20, 0x00    ; 0
  96:    50 e0           ldi    r21, 0x00    ; 0
  98:    0e 94 7d 00     call    0xfa    ; 0xfa <__udivmodsi4>
  9c:    d2 2f           mov    r29, r18
  9e:    24 e6           ldi    r18, 0x64    ; 100
  a0:    30 e0           ldi    r19, 0x00    ; 0
  a2:    40 e0           ldi    r20, 0x00    ; 0
  a4:    50 e0           ldi    r21, 0x00    ; 0
  a6:    0e 94 7d 00     call    0xfa    ; 0xfa <__udivmodsi4>
  aa:    c2 2f           mov    r28, r18

     return (uint8_t)(inp/div) ;// res;
  ac:    2a e0           ldi    r18, 0x0A    ; 10
  ae:    30 e0           ldi    r19, 0x00    ; 0
  b0:    40 e0           ldi    r20, 0x00    ; 0
  b2:    50 e0           ldi    r21, 0x00    ; 0
  b4:    0e 94 7d 00     call    0xfa    ; 0xfa <__udivmodsi4>
   digits[4]=(uint8_t) divmod(  in , 0x0000000A /*10*/,  &in ); 
   digits[5]=(uint8_t) divmod(  in , 0x00000001 /* 1*/ ,  &in );   
     
     
     //for example, for output to PORTD 
          DDRD=0xFF;
  b8:    8f ef           ldi    r24, 0xFF    ; 255
  ba:    81 bb           out    0x11, r24    ; 17
         PORTD=digits[0];
  bc:    02 bb           out    0x12, r16    ; 18
         PORTD=digits[1];
  be:    12 bb           out    0x12, r17    ; 18
         PORTD=digits[2];
  c0:    d2 bb           out    0x12, r29    ; 18
         PORTD=digits[3];
  c2:    c2 bb           out    0x12, r28    ; 18
         PORTD=digits[4];
  c4:    22 bb           out    0x12, r18    ; 18
         PORTD=digits[5];
  c6:    62 bb           out    0x12, r22    ; 18
     
return    ;    
}
  c8:    df 91           pop    r29
  ca:    cf 91           pop    r28
  cc:    1f 91           pop    r17
  ce:    0f 91           pop    r16
  d0:    08 95           ret

000000d2 <main>:
    /* Replace with your application code */
    while (1) 
    {
     uint32_t in =0; //=  1599999;  // -> 15 9 9 9 9 9    
     //for example, input from port B
     DDRB=0x00;
  d2:    17 ba           out    0x17, r1    ; 23
     in|=(uint32_t)((uint32_t)PINB<<0);
  d4:    26 b3           in    r18, 0x16    ; 22
     in|=(uint32_t)((uint32_t)PINB<<8);
  d6:    36 b3           in    r19, 0x16    ; 22
     in|=(uint32_t)((uint32_t)PINB<<16);
  d8:    66 b3           in    r22, 0x16    ; 22
  da:    86 2f           mov    r24, r22
  dc:    90 e0           ldi    r25, 0x00    ; 0
  de:    a0 e0           ldi    r26, 0x00    ; 0
  e0:    b0 e0           ldi    r27, 0x00    ; 0
  e2:    dc 01           movw    r26, r24
  e4:    99 27           eor    r25, r25
  e6:    88 27           eor    r24, r24
  e8:    93 2b           or    r25, r19
  ea:    82 2b           or    r24, r18
     in|=(uint32_t)((uint32_t)PINB<<24);
  ec:    26 b3           in    r18, 0x16    ; 22
     
     GetDigitsFromUint32_t(  in);    
  ee:    bc 01           movw    r22, r24
  f0:    cd 01           movw    r24, r26
  f2:    92 2b           or    r25, r18
  f4:    0e 94 36 00     call    0x6c    ; 0x6c <GetDigitsFromUint32_t>
  f8:    ec cf           rjmp    .-40         ; 0xd2 <main>

000000fa <__udivmodsi4>:
  fa:    a1 e2           ldi    r26, 0x21    ; 33
  fc:    1a 2e           mov    r1, r26
  fe:    aa 1b           sub    r26, r26
 100:    bb 1b           sub    r27, r27
 102:    fd 01           movw    r30, r26
 104:    0d c0           rjmp    .+26         ; 0x120 <__udivmodsi4_ep>

00000106 <__udivmodsi4_loop>:
 106:    aa 1f           adc    r26, r26
 108:    bb 1f           adc    r27, r27
 10a:    ee 1f           adc    r30, r30
 10c:    ff 1f           adc    r31, r31
 10e:    a2 17           cp    r26, r18
 110:    b3 07           cpc    r27, r19
 112:    e4 07           cpc    r30, r20
 114:    f5 07           cpc    r31, r21
 116:    20 f0           brcs    .+8          ; 0x120 <__udivmodsi4_ep>
 118:    a2 1b           sub    r26, r18
 11a:    b3 0b           sbc    r27, r19
 11c:    e4 0b           sbc    r30, r20
 11e:    f5 0b           sbc    r31, r21

00000120 <__udivmodsi4_ep>:
 120:    66 1f           adc    r22, r22
 122:    77 1f           adc    r23, r23
 124:    88 1f           adc    r24, r24
 126:    99 1f           adc    r25, r25
 128:    1a 94           dec    r1
 12a:    69 f7           brne    .-38         ; 0x106 <__udivmodsi4_loop>
 12c:    60 95           com    r22
 12e:    70 95           com    r23
 130:    80 95           com    r24
 132:    90 95           com    r25
 134:    9b 01           movw    r18, r22
 136:    ac 01           movw    r20, r24
 138:    bd 01           movw    r22, r26
 13a:    cf 01           movw    r24, r30
 13c:    08 95           ret

0000013e <_exit>:
 13e:    f8 94           cli

00000140 <__stop_program>:
 140:    ff cf           rjmp    .-2          ; 0x140 <__stop_program>
 


Cycle counter: 2953

Frequency: 12.000 MHz

Stopwatch: 246.08 µs (measured after executing GetDigitsFromUint32_t(in);)

How can I make the decoding algorithm faster? (The input and output bytes in this program are only examples for simulation.)
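As a quick sanity check on the figures above, the stopwatch value follows directly from the cycle count and the clock frequency. A minimal sketch in C (the helper name `cycles_to_us` is made up for this illustration, it is not from the project):

```c
#include <stdint.h>

/* Hypothetical helper: convert a cycle count to microseconds
   for a CPU clock given in Hz. */
static double cycles_to_us(uint32_t cycles, uint32_t f_cpu_hz)
{
    return (double)cycles * 1e6 / (double)f_cpu_hz;
}
```

Calling `cycles_to_us(2953, 12000000UL)` gives roughly 246.08, which agrees with the simulator's stopwatch reading.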


/*
 * GccApplication2.c
 *
 * Created: 29.03.2020 10:38:02
 * Author : USERPC01
 */ 

#include <avr/io.h>
#include <math.h>
/*
uint8_t  div32bit_mod (uint32_t  in, uint32_t  div,    uint32_t *mod    )
{
  uint32_t res;
      res=(uint32_t)in /div;
    
 *mod=(uint32_t)   (in - (res*div)) ;
*mod=(uint32_t)(in%div); //  (in - (res*div)) ;

return (uint8_t)(in/div) ;// res;
}*/

uint8_t divmod( uint32_t inp, uint32_t  div,  uint32_t *mod )
{
 //return (uint8_t) div32bit_mod( inp,  div, reminder  );
     *mod=(uint32_t)(inp%div); //  (in - (res*div)) ;

     return (uint8_t)(inp/div) ;// res;
}

void  GetDigitsFromUint32_t( uint32_t in )  //"uint24_t" 
{
 uint8_t digits[6];   // six entries are written below: digits[0]..digits[5]
    /*
    digits[0]=(uint8_t)(in/100000) ;  in=in%100000;     
    digits[1]=(uint8_t)(in/10000) ;  in=in%10000;  
    digits[2]=(uint8_t)(in/1000) ;  in=in%1000;
    digits[3]=(uint8_t)(in/100) ;  in=in%100;
    digits[4]=(uint8_t)(in/10) ;  in=in%10;
    digits[5]=(uint8_t) (in);
        */    
   digits[0]=(uint8_t) divmod(  in , 0x000186A0 /*100000*/ ,  &in );
   digits[1]=(uint8_t) divmod(  in , 0x00002710 /*10000*/,  &in );
   digits[2]=(uint8_t) divmod(  in , 0x000003e8 /*1000*/,  &in );
   digits[3]=(uint8_t) divmod(  in , 0x00000064 /*100*/,  &in );
   digits[4]=(uint8_t) divmod(  in , 0x0000000A /*10*/,  &in ); 
   digits[5]=(uint8_t) divmod(  in , 0x00000001 /* 1*/ ,  &in );   
     
     
     //for example, for output to PORTD 
          DDRD=0xFF;
         PORTD=digits[0];
         PORTD=digits[1];
         PORTD=digits[2];
         PORTD=digits[3];
         PORTD=digits[4];
         PORTD=digits[5];
     
return    ;    
}

uint32_t DivideBy10(uint32_t inp)
{
return (uint32_t) (inp/10);     
    
}
//for debugger test only 

int main(void)
{
    /* Replace with your application code */
    while (1) 
    {
     uint32_t in =0; //=  1599999;  // -> 15 9 9 9 9 9    
     //for example, input from port B
     DDRB=0;
     DDRB=0x00;
     in|=(uint32_t)((uint32_t)PINB<<0);
     in|=(uint32_t)((uint32_t)PINB<<8);
     in|=(uint32_t)((uint32_t)PINB<<16);
     in|=(uint32_t)((uint32_t)PINB<<24);
     
    // GetDigitsFromUint32_t(  in);
    uint32_t out;    
       out=   DivideBy10(in);
       //for example, out bytes to port D  
       DDRD=0xff;
       PORTD=(uint8_t)((out&0x000000ff)>>0);
       PORTD=(uint8_t)((out&0x0000ff00)>>8);
       PORTD=(uint8_t)((out&0x00ff0000)>>16);
       PORTD=(uint8_t)((out&0xff000000)>>24);
       
    }
}


GccApplication2.elf:     file format elf32-avr

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000104  00000000  00000000  00000054  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000000  00800060  00800060  00000158  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  2 .comment      00000030  00000000  00000000  00000158  2**0
                  CONTENTS, READONLY
  3 .note.gnu.avr.deviceinfo 0000003c  00000000  00000000  00000188  2**2
                  CONTENTS, READONLY
  4 .debug_aranges 00000038  00000000  00000000  000001c4  2**0
                  CONTENTS, READONLY, DEBUGGING
  5 .debug_info   000007da  00000000  00000000  000001fc  2**0
                  CONTENTS, READONLY, DEBUGGING
  6 .debug_abbrev 00000631  00000000  00000000  000009d6  2**0
                  CONTENTS, READONLY, DEBUGGING
  7 .debug_line   000002b9  00000000  00000000  00001007  2**0
                  CONTENTS, READONLY, DEBUGGING
  8 .debug_frame  00000074  00000000  00000000  000012c0  2**2
                  CONTENTS, READONLY, DEBUGGING
  9 .debug_str    00000337  00000000  00000000  00001334  2**0
                  CONTENTS, READONLY, DEBUGGING
 10 .debug_loc    0000044a  00000000  00000000  0000166b  2**0
                  CONTENTS, READONLY, DEBUGGING
 11 .debug_ranges 00000040  00000000  00000000  00001ab5  2**0
                  CONTENTS, READONLY, DEBUGGING

Disassembly of section .text:

00000000 <__vectors>:
   0:    0c 94 2a 00     jmp    0x54    ; 0x54 <__ctors_end>
   4:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
   8:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
   c:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  10:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  14:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  18:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  1c:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  20:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  24:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  28:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  2c:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  30:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  34:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  38:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  3c:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  40:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  44:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  48:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  4c:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  50:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>

00000054 <__ctors_end>:
  54:    11 24           eor    r1, r1
  56:    1f be           out    0x3f, r1    ; 63
  58:    cf e5           ldi    r28, 0x5F    ; 95
  5a:    d4 e0           ldi    r29, 0x04    ; 4
  5c:    de bf           out    0x3e, r29    ; 62
  5e:    cd bf           out    0x3d, r28    ; 61
  60:    0e 94 36 00     call    0x6c    ; 0x6c <main>
  64:    0c 94 80 00     jmp    0x100    ; 0x100 <_exit>

00000068 <__bad_interrupt>:
  68:    0c 94 00 00     jmp    0    ; 0x0 <__vectors>

0000006c <main>:
return    ;    
}

uint32_t DivideBy10(uint32_t inp)
{
return (uint32_t) (inp/10);     
  6c:    0f 2e           mov    r0, r31
  6e:    fa e0           ldi    r31, 0x0A    ; 10
  70:    cf 2e           mov    r12, r31
  72:    d1 2c           mov    r13, r1
  74:    e1 2c           mov    r14, r1
  76:    f1 2c           mov    r15, r1
  78:    f0 2d           mov    r31, r0
     
    // GetDigitsFromUint32_t(  in);
    uint32_t out;    
       out=   DivideBy10(in);
       //for example, out bytes to port D  
       DDRD=0xff;
  7a:    cf ef           ldi    r28, 0xFF    ; 255
    /* Replace with your application code */
    while (1) 
    {
     uint32_t in =0; //=  1599999;  // -> 15 9 9 9 9 9    
     //for example, input from port B
     DDRB=0;
  7c:    17 ba           out    0x17, r1    ; 23
     DDRB=0x00;
  7e:    17 ba           out    0x17, r1    ; 23
     in|=(uint32_t)((uint32_t)PINB<<0);
  80:    26 b3           in    r18, 0x16    ; 22
     in|=(uint32_t)((uint32_t)PINB<<8);
  82:    36 b3           in    r19, 0x16    ; 22
     in|=(uint32_t)((uint32_t)PINB<<16);
  84:    66 b3           in    r22, 0x16    ; 22
  86:    86 2f           mov    r24, r22
  88:    90 e0           ldi    r25, 0x00    ; 0
  8a:    a0 e0           ldi    r26, 0x00    ; 0
  8c:    b0 e0           ldi    r27, 0x00    ; 0
  8e:    dc 01           movw    r26, r24
  90:    99 27           eor    r25, r25
  92:    88 27           eor    r24, r24
  94:    93 2b           or    r25, r19
  96:    82 2b           or    r24, r18
     in|=(uint32_t)((uint32_t)PINB<<24);
  98:    26 b3           in    r18, 0x16    ; 22
return    ;    
}

uint32_t DivideBy10(uint32_t inp)
{
return (uint32_t) (inp/10);     
  9a:    bc 01           movw    r22, r24
  9c:    cd 01           movw    r24, r26
  9e:    92 2b           or    r25, r18
  a0:    a7 01           movw    r20, r14
  a2:    96 01           movw    r18, r12
  a4:    0e 94 5e 00     call    0xbc    ; 0xbc <__udivmodsi4>
     
    // GetDigitsFromUint32_t(  in);
    uint32_t out;    
       out=   DivideBy10(in);
       //for example, out bytes to port D  
       DDRD=0xff;
  a8:    c1 bb           out    0x11, r28    ; 17
       PORTD=(uint8_t)((out&0x000000ff)>>0);
  aa:    22 bb           out    0x12, r18    ; 18
       PORTD=(uint8_t)((out&0x0000ff00)>>8);
  ac:    32 bb           out    0x12, r19    ; 18
       PORTD=(uint8_t)((out&0x00ff0000)>>16);
  ae:    42 bb           out    0x12, r20    ; 18
       PORTD=(uint8_t)((out&0xff000000)>>24);
  b0:    85 2f           mov    r24, r21
  b2:    99 27           eor    r25, r25
  b4:    aa 27           eor    r26, r26
  b6:    bb 27           eor    r27, r27
  b8:    82 bb           out    0x12, r24    ; 18
  ba:    e0 cf           rjmp    .-64         ; 0x7c <main+0x10>

000000bc <__udivmodsi4>:
  bc:    a1 e2           ldi    r26, 0x21    ; 33
  be:    1a 2e           mov    r1, r26
  c0:    aa 1b           sub    r26, r26
  c2:    bb 1b           sub    r27, r27
  c4:    fd 01           movw    r30, r26
  c6:    0d c0           rjmp    .+26         ; 0xe2 <__udivmodsi4_ep>

000000c8 <__udivmodsi4_loop>:
  c8:    aa 1f           adc    r26, r26
  ca:    bb 1f           adc    r27, r27
  cc:    ee 1f           adc    r30, r30
  ce:    ff 1f           adc    r31, r31
  d0:    a2 17           cp    r26, r18
  d2:    b3 07           cpc    r27, r19
  d4:    e4 07           cpc    r30, r20
  d6:    f5 07           cpc    r31, r21
  d8:    20 f0           brcs    .+8          ; 0xe2 <__udivmodsi4_ep>
  da:    a2 1b           sub    r26, r18
  dc:    b3 0b           sbc    r27, r19
  de:    e4 0b           sbc    r30, r20
  e0:    f5 0b           sbc    r31, r21

000000e2 <__udivmodsi4_ep>:
  e2:    66 1f           adc    r22, r22
  e4:    77 1f           adc    r23, r23
  e6:    88 1f           adc    r24, r24
  e8:    99 1f           adc    r25, r25
  ea:    1a 94           dec    r1
  ec:    69 f7           brne    .-38         ; 0xc8 <__udivmodsi4_loop>
  ee:    60 95           com    r22
  f0:    70 95           com    r23
  f2:    80 95           com    r24
  f4:    90 95           com    r25
  f6:    9b 01           movw    r18, r22
  f8:    ac 01           movw    r20, r24
  fa:    bd 01           movw    r22, r26
  fc:    cf 01           movw    r24, r30
  fe:    08 95           ret

00000100 <_exit>:
 100:    f8 94           cli

00000102 <__stop_program>:
 102:    ff cf           rjmp    .-2          ; 0x102 <__stop_program>
 


/*
 * GccApplication2.c
 *
 * Created: 29.03.2020 10:38:02
 * Author : USERPC01
 */ 

#include <avr/io.h>
#include <math.h>
/*
uint8_t  div32bit_mod (uint32_t  in, uint32_t  div,    uint32_t *mod    )
{
  uint32_t res;
      res=(uint32_t)in /div;
    
 *mod=(uint32_t)   (in - (res*div)) ;
*mod=(uint32_t)(in%div); //  (in - (res*div)) ;

return (uint8_t)(in/div) ;// res;
}*/

uint8_t divmod( uint32_t inp, uint32_t  div,  uint32_t *mod )
{
 //return (uint8_t) div32bit_mod( inp,  div, reminder  );
     *mod=(uint32_t)(inp%div); //  (in - (res*div)) ;

     return (uint8_t)(inp/div) ;// res;
}

void  GetDigitsFromUint32_t( uint32_t in )  //"uint24_t" 
{
 uint8_t digits[6];   // six entries are written below: digits[0]..digits[5]
    /*
    digits[0]=(uint8_t)(in/100000) ;  in=in%100000;     
    digits[1]=(uint8_t)(in/10000) ;  in=in%10000;  
    digits[2]=(uint8_t)(in/1000) ;  in=in%1000;
    digits[3]=(uint8_t)(in/100) ;  in=in%100;
    digits[4]=(uint8_t)(in/10) ;  in=in%10;
    digits[5]=(uint8_t) (in);
        */    
   digits[0]=(uint8_t) divmod(  in , 0x000186A0 /*100000*/ ,  &in );
   digits[1]=(uint8_t) divmod(  in , 0x00002710 /*10000*/,  &in );
   digits[2]=(uint8_t) divmod(  in , 0x000003e8 /*1000*/,  &in );
   digits[3]=(uint8_t) divmod(  in , 0x00000064 /*100*/,  &in );
   digits[4]=(uint8_t) divmod(  in , 0x0000000A /*10*/,  &in ); 
   digits[5]=(uint8_t) divmod(  in , 0x00000001 /* 1*/ ,  &in );   
     
     
     //for example, for output to PORTD 
          DDRD=0xFF;
         PORTD=digits[0];
         PORTD=digits[1];
         PORTD=digits[2];
         PORTD=digits[3];
         PORTD=digits[4];
         PORTD=digits[5];
     
return    ;    
}

uint32_t DivideBy5(uint32_t inp)
{
return (uint32_t) (inp/5);     
    
}
//for debugger test only 

int main(void)
{
    /* Replace with your application code */
    while (1) 
    {
     uint32_t in =0; //=  1599999;  // -> 15 9 9 9 9 9    
     //for example, input from port B
     DDRB=0;
     DDRB=0x00;
     in|=(uint32_t)((uint32_t)PINB<<0);
     in|=(uint32_t)((uint32_t)PINB<<8);
     in|=(uint32_t)((uint32_t)PINB<<16);
     in|=(uint32_t)((uint32_t)PINB<<24);
     
    // GetDigitsFromUint32_t(  in);
    uint32_t out;    
       out=   DivideBy5(in);
       //for example, out bytes to port D  
       DDRD=0xff;
       PORTD=(uint8_t)((out&0x000000ff)>>0);
       PORTD=(uint8_t)((out&0x0000ff00)>>8);
       PORTD=(uint8_t)((out&0x00ff0000)>>16);
       PORTD=(uint8_t)((out&0xff000000)>>24);
       
    }
}


GccApplication2.elf:     file format elf32-avr

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000196  00000000  00000000  00000054  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000000  00800060  00800060  000001ea  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  2 .comment      00000030  00000000  00000000  000001ea  2**0
                  CONTENTS, READONLY
  3 .note.gnu.avr.deviceinfo 0000003c  00000000  00000000  0000021c  2**2
                  CONTENTS, READONLY
  4 .debug_aranges 00000038  00000000  00000000  00000258  2**0
                  CONTENTS, READONLY, DEBUGGING
  5 .debug_info   000007da  00000000  00000000  00000290  2**0
                  CONTENTS, READONLY, DEBUGGING
  6 .debug_abbrev 00000610  00000000  00000000  00000a6a  2**0
                  CONTENTS, READONLY, DEBUGGING
  7 .debug_line   000002b3  00000000  00000000  0000107a  2**0
                  CONTENTS, READONLY, DEBUGGING
  8 .debug_frame  000000a4  00000000  00000000  00001330  2**2
                  CONTENTS, READONLY, DEBUGGING
  9 .debug_str    00000336  00000000  00000000  000013d4  2**0
                  CONTENTS, READONLY, DEBUGGING
 10 .debug_loc    000004d4  00000000  00000000  0000170a  2**0
                  CONTENTS, READONLY, DEBUGGING
 11 .debug_ranges 00000028  00000000  00000000  00001bde  2**0
                  CONTENTS, READONLY, DEBUGGING

Disassembly of section .text:

00000000 <__vectors>:
   0:    0c 94 2a 00     jmp    0x54    ; 0x54 <__ctors_end>
   4:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
   8:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
   c:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  10:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  14:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  18:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  1c:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  20:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  24:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  28:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  2c:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  30:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  34:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  38:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  3c:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  40:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  44:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  48:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  4c:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>
  50:    0c 94 34 00     jmp    0x68    ; 0x68 <__bad_interrupt>

00000054 <__ctors_end>:
  54:    11 24           eor    r1, r1
  56:    1f be           out    0x3f, r1    ; 63
  58:    cf e5           ldi    r28, 0x5F    ; 95
  5a:    d4 e0           ldi    r29, 0x04    ; 4
  5c:    de bf           out    0x3e, r29    ; 62
  5e:    cd bf           out    0x3d, r28    ; 61
  60:    0e 94 36 00     call    0x6c    ; 0x6c <main>
  64:    0c 94 c9 00     jmp    0x192    ; 0x192 <_exit>

00000068 <__bad_interrupt>:
  68:    0c 94 00 00     jmp    0    ; 0x0 <__vectors>

0000006c <main>:
     
    // GetDigitsFromUint32_t(  in);
    uint32_t out;    
       out=   DivideBy5(in);
       //for example, out bytes to port D  
       DDRD=0xff;
  6c:    cf ef           ldi    r28, 0xFF    ; 255
    /* Replace with your application code */
    while (1) 
    {
     uint32_t in =0; //=  1599999;  // -> 15 9 9 9 9 9    
     //for example, input from port B
     DDRB=0;
  6e:    17 ba           out    0x17, r1    ; 23
     DDRB=0x00;
  70:    17 ba           out    0x17, r1    ; 23
     in|=(uint32_t)((uint32_t)PINB<<0);
  72:    26 b3           in    r18, 0x16    ; 22
     in|=(uint32_t)((uint32_t)PINB<<8);
  74:    36 b3           in    r19, 0x16    ; 22
     in|=(uint32_t)((uint32_t)PINB<<16);
  76:    66 b3           in    r22, 0x16    ; 22
  78:    86 2f           mov    r24, r22
  7a:    90 e0           ldi    r25, 0x00    ; 0
  7c:    a0 e0           ldi    r26, 0x00    ; 0
  7e:    b0 e0           ldi    r27, 0x00    ; 0
  80:    dc 01           movw    r26, r24
  82:    99 27           eor    r25, r25
  84:    88 27           eor    r24, r24
  86:    93 2b           or    r25, r19
  88:    82 2b           or    r24, r18
     in|=(uint32_t)((uint32_t)PINB<<24);
  8a:    26 b3           in    r18, 0x16    ; 22
return    ;    
}

uint32_t DivideBy5(uint32_t inp)
{
return (uint32_t) (inp/5);     
  8c:    bc 01           movw    r22, r24
  8e:    cd 01           movw    r24, r26
  90:    92 2b           or    r25, r18
  92:    2d ec           ldi    r18, 0xCD    ; 205
  94:    3c ec           ldi    r19, 0xCC    ; 204
  96:    4c ec           ldi    r20, 0xCC    ; 204
  98:    5c ec           ldi    r21, 0xCC    ; 204
  9a:    0e 94 68 00     call    0xd0    ; 0xd0 <__umulsidi3>
  9e:    00 e2           ldi    r16, 0x20    ; 32
  a0:    0e 94 95 00     call    0x12a    ; 0x12a <__lshrdi3>
  a4:    82 2e           mov    r8, r18
  a6:    93 2e           mov    r9, r19
  a8:    a4 2e           mov    r10, r20
  aa:    b5 2e           mov    r11, r21
  ac:    b6 94           lsr    r11
  ae:    a7 94           ror    r10
  b0:    97 94           ror    r9
  b2:    87 94           ror    r8
  b4:    b6 94           lsr    r11
  b6:    a7 94           ror    r10
  b8:    97 94           ror    r9
  ba:    87 94           ror    r8
     
    // GetDigitsFromUint32_t(  in);
    uint32_t out;    
       out=   DivideBy5(in);
       //for example, out bytes to port D  
       DDRD=0xff;
  bc:    c1 bb           out    0x11, r28    ; 17
       PORTD=(uint8_t)((out&0x000000ff)>>0);
  be:    82 ba           out    0x12, r8    ; 18
       PORTD=(uint8_t)((out&0x0000ff00)>>8);
  c0:    92 ba           out    0x12, r9    ; 18
       PORTD=(uint8_t)((out&0x00ff0000)>>16);
  c2:    a2 ba           out    0x12, r10    ; 18
       PORTD=(uint8_t)((out&0xff000000)>>24);
  c4:    8b 2c           mov    r8, r11
  c6:    99 24           eor    r9, r9
  c8:    aa 24           eor    r10, r10
  ca:    bb 24           eor    r11, r11
  cc:    82 ba           out    0x12, r8    ; 18
  ce:    cf cf           rjmp    .-98         ; 0x6e <main+0x2>

000000d0 <__umulsidi3>:
  d0:    e8 94           clt

000000d2 <__umulsidi3_helper>:
  d2:    df 93           push    r29
  d4:    cf 93           push    r28
  d6:    fc 01           movw    r30, r24
  d8:    db 01           movw    r26, r22
  da:    0e 94 b1 00     call    0x162    ; 0x162 <__umulhisi3>
  de:    7f 93           push    r23
  e0:    6f 93           push    r22
  e2:    e9 01           movw    r28, r18
  e4:    9a 01           movw    r18, r20
  e6:    ac 01           movw    r20, r24
  e8:    bf 93           push    r27
  ea:    af 93           push    r26
  ec:    3f 93           push    r19
  ee:    2f 93           push    r18
  f0:    df 01           movw    r26, r30
  f2:    0e 94 b1 00     call    0x162    ; 0x162 <__umulhisi3>
  f6:    26 f4           brtc    .+8          ; 0x100 <__umulsidi3_helper+0x2e>
  f8:    6c 1b           sub    r22, r28
  fa:    7d 0b           sbc    r23, r29
  fc:    82 0b           sbc    r24, r18
  fe:    93 0b           sbc    r25, r19
 100:    9e 01           movw    r18, r28
 102:    eb 01           movw    r28, r22
 104:    fc 01           movw    r30, r24
 106:    0e 94 c0 00     call    0x180    ; 0x180 <__muldi3_6>
 10a:    af 91           pop    r26
 10c:    bf 91           pop    r27
 10e:    2f 91           pop    r18
 110:    3f 91           pop    r19
 112:    0e 94 c0 00     call    0x180    ; 0x180 <__muldi3_6>
 116:    be 01           movw    r22, r28
 118:    cf 01           movw    r24, r30
 11a:    f9 01           movw    r30, r18
 11c:    2f 91           pop    r18
 11e:    3f 91           pop    r19
 120:    cf 91           pop    r28
 122:    df 91           pop    r29
 124:    08 95           ret

00000126 <__ashrdi3>:
 126:    97 fb           bst    r25, 7
 128:    10 f8           bld    r1, 0

0000012a <__lshrdi3>:
 12a:    16 94           lsr    r1
 12c:    00 08           sbc    r0, r0
 12e:    0f 93           push    r16
 130:    08 30           cpi    r16, 0x08    ; 8
 132:    98 f0           brcs    .+38         ; 0x15a <__lshrdi3+0x30>
 134:    08 50           subi    r16, 0x08    ; 8
 136:    23 2f           mov    r18, r19
 138:    34 2f           mov    r19, r20
 13a:    45 2f           mov    r20, r21
 13c:    56 2f           mov    r21, r22
 13e:    67 2f           mov    r22, r23
 140:    78 2f           mov    r23, r24
 142:    89 2f           mov    r24, r25
 144:    90 2d           mov    r25, r0
 146:    f4 cf           rjmp    .-24         ; 0x130 <__lshrdi3+0x6>
 148:    05 94           asr    r0
 14a:    97 95           ror    r25
 14c:    87 95           ror    r24
 14e:    77 95           ror    r23
 150:    67 95           ror    r22
 152:    57 95           ror    r21
 154:    47 95           ror    r20
 156:    37 95           ror    r19
 158:    27 95           ror    r18
 15a:    0a 95           dec    r16
 15c:    aa f7           brpl    .-22         ; 0x148 <__lshrdi3+0x1e>
 15e:    0f 91           pop    r16
 160:    08 95           ret

00000162 <__umulhisi3>:
 162:    a2 9f           mul    r26, r18
 164:    b0 01           movw    r22, r0
 166:    b3 9f           mul    r27, r19
 168:    c0 01           movw    r24, r0
 16a:    a3 9f           mul    r26, r19
 16c:    70 0d           add    r23, r0
 16e:    81 1d           adc    r24, r1
 170:    11 24           eor    r1, r1
 172:    91 1d           adc    r25, r1
 174:    b2 9f           mul    r27, r18
 176:    70 0d           add    r23, r0
 178:    81 1d           adc    r24, r1
 17a:    11 24           eor    r1, r1
 17c:    91 1d           adc    r25, r1
 17e:    08 95           ret

00000180 <__muldi3_6>:
 180:    0e 94 b1 00     call    0x162    ; 0x162 <__umulhisi3>
 184:    46 0f           add    r20, r22
 186:    57 1f           adc    r21, r23
 188:    c8 1f           adc    r28, r24
 18a:    d9 1f           adc    r29, r25
 18c:    08 f4           brcc    .+2          ; 0x190 <__muldi3_6+0x10>
 18e:    31 96           adiw    r30, 0x01    ; 1
 190:    08 95           ret

00000192 <_exit>:
 192:    f8 94           cli

00000194 <__stop_program>:
 194:    ff cf           rjmp    .-2          ; 0x194 <__stop_program>
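
A note on the listing above: the call to `__umulsidi3` with the constant 0xCCCCCCCD, followed by a shift of 32 bits (`__lshrdi3` with r16 = 0x20) and two more single-bit shifts, is consistent with the usual reciprocal-multiplication trick, n/5 = (n * 0xCCCCCCCD) >> 34 for unsigned 32-bit n. A minimal C sketch of that identity (the function names are made up for this illustration; this is not the compiler's actual source):

```c
#include <stdint.h>

/* Unsigned divide by 5: multiply by ceil(2^34 / 5) = 0xCCCCCCCD,
   then drop the low 34 bits of the 64-bit product. */
static uint32_t div5_recip(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDULL) >> 34);
}

/* Same constant with one more shift gives a divide by 10. */
static uint32_t div10_recip(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDULL) >> 35);
}
```

The 64-bit product is what makes this expensive on an 8-bit AVR; a hand-written routine would only need the upper bytes of the product.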
 


What is the point of posting all of this oddball code?  At least explain exactly what you are doing (methods, how, why); this is a discussion forum. Otherwise it will likely all go unused (even if it is great) and just be swept into the dust pan and put in the dumpster.

What are the advantages of all of this code?

Why have you completely forgotten to at least comment any of your code?  

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Last Edited: Sun. Mar 29, 2020 - 09:18 AM

I have a pretty strong feeling we're being trolled. At the very least OP seems incapable of taking onboard any advice being given. #1 talked about an assembler solution, many excellent ideas were proposed, all ignored. When OP showed code using / and % I pointed out that there was no point in doing TWO divides when it could be done with one. Also ignored. Don't see the point in trying to advise any more.
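
To illustrate the "one divide instead of two" point: a minimal sketch in C, assuming the same signature as the `divmod()` posted earlier (the name `divmod_once` is made up here). The remainder is obtained from the quotient with a multiply and a subtract instead of a second division. Whether a given compiler already merges `/` and `%` into a single `__udivmodsi4` call depends on the compiler version and optimisation level, so check the listing.

```c
#include <stdint.h>

/* Hypothetical replacement for divmod(): one division only. */
static uint8_t divmod_once(uint32_t inp, uint32_t div, uint32_t *mod)
{
    uint32_t q = inp / div;   /* the only division */
    *mod = inp - q * div;     /* remainder via multiply and subtract */
    return (uint8_t)q;        /* caller guarantees the quotient fits in 8 bits */
}
```

For example, `divmod_once(1599999, 100000, &rem)` returns 15 and leaves 99999 in `rem`.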


The OP does appear to have a history of asking questions and not replying.

#1 This forum helps those that help themselves

#2 All grounds are not created equal

#3 How have you proved that your chip is running at xxMHz?

#4 "If you think you need floating point to solve the problem then you don't understand the problem. If you really do need floating point then you have a problem you do not understand." - Heater's ex-boss


Anyway, I wrote and tested the code below for a fast division by 5. Sorry if it was done already.

 

Symbolic code:

In Y[x], x is the size of register Y in bytes.

N[4]: the original 32-bit number
R[4]: the result of N[4]/5
A[4], B[4], C[4] and K[1]: temporary registers

div32_5:
; PUSH registers
    A[4]= N[4]
    C[4]= R[4]= 0

loop1:
    A[4]= A[4]-C[4]
    if A[4]<256, goto break1
    B[4]= (A[4]/256)*51
    R[4]= R[4]+B[4]
    C[4]= B[4]*5
    goto loop1

break1:
    K[1]=0

loop2:
    A[1]=A[1]-5
    if A[1]<0, goto break2
    K[1]=K[1]+1
    goto loop2

break2:
    R[4]= R[4]+K[1]

; POP registers
    RET
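
For checking the symbolic routine on a PC, here is a minimal C model of it. This is a sketch, not the AVR code itself; the function name `div32_5_model` is made up, and the variables a, b and r stand for the registers A, B and R. It relies on 51*5 = 255: each pass adds (A/256)*51 to the result and removes (A/256)*255 from A, which can never exceed A.

```c
#include <stdint.h>

/* C model of the symbolic div32_5 routine: returns n / 5. */
static uint32_t div32_5_model(uint32_t n)
{
    uint32_t a = n;   /* A: what is still left to divide */
    uint32_t r = 0;   /* R: accumulated quotient */

    while (a >= 256) {
        uint32_t b = (a >> 8) * 51;  /* B = (A/256)*51, an underestimate of A/5 */
        r += b;                      /* R = R + B */
        a -= b * 5;                  /* A = A - C, with C = B*5 */
    }

    /* A now fits in one byte: count how many times 5 can be subtracted. */
    while (a >= 5) {
        a -= 5;
        r++;
    }
    return r;
}
```

Comparing the result against the compiler's own `n / 5` for a range of inputs, including 0xFFFFFFFF, is a cheap way to confirm the method before counting cycles.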

 

 

 

Last Edited: Sun. Mar 29, 2020 - 04:33 PM
.def Kst = r24

.def N_3 = r23
.def N_2 = r22
.def N_1 = r21
.def N_0 = r20

.def A_3 = r19
.def A_2 = r18
.def A_1 = r17
.def A_0 = r16

.def R_3 = r15
.def R_2 = r14
.def R_1 = r13
.def R_0 = r12

.def B_3 = r11
.def B_2 = r10
.def B_1 = r9
.def B_0 = r8

.def C_3 = r7
.def C_2 = r6
.def C_1 = r5
.def C_0 = r4

; dividend
.equ dd_Q = 0xFF
.equ dd_U = 0xFF
.equ dd_H = 0xFF
.equ dd_L = 0xF0

REPEAT:
    LDI  N_3, dd_Q
    LDI  N_2, dd_U
    LDI  N_1, dd_H
    LDI  N_0, dd_L

    RCALL div32_5

    RJMP  REPEAT


div32_5:
; PUSH registers
; A[4]= N[4]
    MOV   A_0, N_0              ;1
    MOV   A_1, N_1              ;1
    MOV   A_2, N_2              ;1
    MOV   A_3, N_3              ;1

; C[4]= R[4]= 0
    CLR   C_0                   ;1
    CLR   C_1                   ;1
    CLR   C_2                   ;1
    CLR   C_3                   ;1
    CLR   R_0                   ;1
    CLR   R_1                   ;1
    CLR   R_2                   ;1
    CLR   R_3                   ;1

loop1:
; A[4]= A[4]-C[4]
    SUB   A_0, C_0              ;1
    SBC   A_1, C_1              ;1
    SBC   A_2, C_2              ;1
    SBC   A_3, C_3              ;1

; if A[4]<256, break1
    TST   A_3                   ;1
    BRNE  cnt                   ;2

    TST   A_2                   ;1
    BRNE  cnt                   ;2

    TST   A_1                   ;1
    BREQ  break1                ;1,2

cnt:
; B[4]= (A[4]/256)*51
    LDI   Kst, 51               ;1
    MUL   A_1, Kst              ;2
    MOV   B_0, r0               ;1
    MOV   B_1, r1               ;1

    MUL   A_2, Kst              ;2
    CLR   B_2                   ;1
    ADD   B_1, r0               ;1
    ADC   B_2, r1               ;1

    MUL   A_3, Kst              ;2
    CLR   B_3                   ;1
    ADD   B_2, r0               ;1
    ADC   B_3, r1               ;1

; R[4]= R[4]+B[4]
    ADD   R_0, B_0              ;1
    ADC   R_1, B_1              ;1
    ADC   R_2, B_2              ;1
    ADC   R_3, B_3              ;1

; C[4]= B[4]*5
    LDI   Kst, 5                ;1
    MUL   B_0, Kst              ;2
    MOV   C_0, r0               ;1
    MOV   C_1, r1               ;1

    MUL   B_1, Kst              ;2
    CLR   C_2                   ;1
    ADD   C_1, r0               ;1
    ADC   C_2, r1               ;1

    MUL   B_2, Kst              ;2
    CLR   C_3                   ;1
    ADD   C_2, r0               ;1
    ADC   C_3, r1               ;1

    MUL   B_3, Kst              ;2
    ADD   C_3, r0               ;1

; goto loop1
    RJMP  loop1                 ;2

break1:
; K[1]=0
    CLR   Kst                   ;1

loop2:
; A[1]=A[1]-5
    SUBI  A_0, 5                ;1

; if A[1]<0, goto break2 
    BRLO  break2                ;1,2

; K[1]=K[1]+1
    INC   Kst                   ;1

; goto loop2
    RJMP  loop2                 ;2


break2:
; R[4]= R[4]+K[1]
    ADD   R_0, Kst              ;1
    CLR   Kst                   ;1
    ADC   R_1, Kst              ;1
    ADC   R_2, Kst              ;1
    ADC   R_3, Kst              ;1

; POP registers

    RET                         ;4

Sorry, after I shortened the register names for clarity, some of them clashed with names already defined by AS.

I will edit it very soon.

 

Updated:

 

Last Edited: Sun. Mar 29, 2020 - 04:11 PM

Anyway, I wrote and tested the code below for a fast division by 5. Sorry if it was done already.

Do you usually post code with no explanation?  What is K, What is B?  Nobody will use it in the future if you don't state how to use it 

 

 

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!


You are right, it seems I erased them by mistake while editing :(

Symbolic code:

In Y[x], x is the size of register Y in bytes.

N[4]: the original 32-bit number
R[4]: the result of N[4]/5
A[4], B[4], C[4] and K[1]: temporary registers

(#58)

The slowest part is the last division, after "break1:", where I used the simplest method (repeated subtraction).

I couldn't find a routine for dividing 8 bits by 8 bits (or by 4 bits, since the divisor here is 5).
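
One possible way to replace the subtraction loop after "break1:" is a single multiply. For an 8-bit value a, a/5 equals (a*205) >> 10, because 205/1024 overshoots 1/5 by too little to change the result for a up to 255. A minimal sketch in C (the function name is made up; on the AVR this would be one MUL by 205 followed by shifting the high byte of the product right by two):

```c
#include <stdint.h>

/* a / 5 for an 8-bit a: multiply by 205 and drop the low 10 bits.
   The product is at most 255 * 205 = 52275, so it fits in 16 bits. */
static uint8_t div5_u8(uint8_t a)
{
    return (uint8_t)(((uint16_t)a * 205u) >> 10);
}
```

Since there are only 256 possible inputs, the identity is easy to verify exhaustively against `a / 5`.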

 

Last Edited: Sun. Mar 29, 2020 - 04:36 PM
;===========================================
; Dividing 32-bit register by the constant 5
;===========================================
;
; By using many registers, its maximum cycle count is only 289.
; Perhaps faster code already exists (or will be written).
;

.def Kst = r24                  ; for constants

.def N_3 = r23                  ; the original 32-bit number
.def N_2 = r22
.def N_1 = r21
.def N_0 = r20

.def A_3 = r19                  ; temporary
.def A_2 = r18
.def A_1 = r17
.def A_0 = r16

.def R_3 = r15                  ; result of N_3:N_0 / 5
.def R_2 = r14
.def R_1 = r13
.def R_0 = r12

.def B_3 = r11                  ; temporary
.def B_2 = r10
.def B_1 = r9
.def B_0 = r8

.def C_3 = r7                   ; temporary
.def C_2 = r6
.def C_1 = r5
.def C_0 = r4

; dividend, max cycles 289
.equ dd_Q = 0xFF
.equ dd_U = 0xFF
.equ dd_H = 0xFF
.equ dd_L = 0xFF

REPEAT:
    LDI  N_3, dd_Q
    LDI  N_2, dd_U
    LDI  N_1, dd_H
    LDI  N_0, dd_L

    RCALL div32_5

    RJMP  REPEAT

div32_5:
; PUSH registers
; A[4]= N[4]
    MOV   A_0, N_0              ;1
    MOV   A_1, N_1              ;1
    MOV   A_2, N_2              ;1
    MOV   A_3, N_3              ;1

; C[4]= R[4]= 0
    CLR   C_0                   ;1
    CLR   C_1                   ;1
    CLR   C_2                   ;1
    CLR   C_3                   ;1
    CLR   R_0                   ;1
    CLR   R_1                   ;1
    CLR   R_2                   ;1
    CLR   R_3                   ;1

loop1:
; A[4]= A[4]-C[4]
    SUB   A_0, C_0              ;1
    SBC   A_1, C_1              ;1
    SBC   A_2, C_2              ;1
    SBC   A_3, C_3              ;1

; if A[4]<256, break1
    TST   A_3                   ;1
    BRNE  cnt                   ;2

    TST   A_2                   ;1
    BRNE  cnt                   ;2

    TST   A_1                   ;1
    BREQ  break1                ;1,2

cnt:
; B[4]= (A[4]/256)*51
    LDI   Kst, 51               ;1
    MUL   A_1, Kst              ;2
    MOV   B_0, r0               ;1
    MOV   B_1, r1               ;1

    MUL   A_2, Kst              ;2
    CLR   B_2                   ;1
    ADD   B_1, r0               ;1
    ADC   B_2, r1               ;1

    MUL   A_3, Kst              ;2
    CLR   B_3                   ;1
    ADD   B_2, r0               ;1
    ADC   B_3, r1               ;1

; R[4]= R[4]+B[4]
    ADD   R_0, B_0              ;1
    ADC   R_1, B_1              ;1
    ADC   R_2, B_2              ;1
    ADC   R_3, B_3              ;1

; C[4]= B[4]*5
    LDI   Kst, 5                ;1
    MUL   B_0, Kst              ;2
    MOV   C_0, r0               ;1
    MOV   C_1, r1               ;1

    MUL   B_1, Kst              ;2
    CLR   C_2                   ;1
    ADD   C_1, r0               ;1
    ADC   C_2, r1               ;1

    MUL   B_2, Kst              ;2
    CLR   C_3                   ;1
    ADD   C_2, r0               ;1
    ADC   C_3, r1               ;1

    MUL   B_3, Kst              ;2
    ADD   C_3, r0               ;1

; goto loop1
    RJMP  loop1                 ;2

break1:
; A1=A0
    MOV   A_1, A_0
; C2=B2=0
    CLR   C_2
    CLR   B_2

loop2:
; A1= A1-C2
    SUB   A_1, C_2

; if A1<16, break2
    CPI   A_1, 16
    BRLO  break2

; B1= (A1/16)*3
    MOV   A_2, A_1
    SWAP  A_2
    ANDI  A_2, 0x0F
    LDI   Kst, 3
    MUL   A_2, Kst
    MOV   B_1, r0

; B2= B2+B1
    ADD   B_2, B_1

; C2= B1*5
    LDI   Kst, 5
    MUL   B_1, Kst
    MOV   C_2, r0

; goto loop2
    RJMP  loop2

break2:
; K[1]=0
    CLR   Kst                   ;1

loop3:
; A1=A1-5
    SUBI  A_1, 5                ;1

; if A1<0, goto break3
    BRLO  break3                ;2

; K[1]=K[1]+1
    INC   Kst                   ;1

; goto loop3
    RJMP  loop3

break3:
; K[1]= K[1]+B2
    ADD   Kst, B_2

; R[4]= R[4]+K[1]  (ADD, not ADC: the carry from the previous add must not be included)
    ADD   R_0, Kst              ;1
    CLR   Kst                   ;1
    ADC   R_1, Kst              ;1
    ADC   R_2, Kst              ;1
    ADC   R_3, Kst              ;1

; POP registers

    RET                         ;4
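
To make the structure of the routine above easier to follow, here is a rough C model of the same three stages. It is a sketch for checking the arithmetic on a PC, not a translation of the register allocation; the function name div5_model is made up, and stage 2 adds straight into the result instead of using a separate accumulator.

```c
#include <stdint.h>

/* Hypothetical C model of the three-stage divide-by-5 idea.
 * Invariant throughout: n == 5 * r + a. */
uint32_t div5_model(uint32_t n)
{
    uint32_t a = n;   /* remaining value (A in the assembly) */
    uint32_t r = 0;   /* running quotient (R in the assembly) */

    /* Stage 1: 5 * 51 = 255, so each pass replaces a by
     * (a >> 8) + (a & 0xFF) and adds 51 * (a >> 8) to r. */
    while (a >= 256) {
        uint32_t b = (a >> 8) * 51;
        r += b;
        a -= b * 5;
    }

    /* Stage 2: same trick with nibbles, 5 * 3 = 15. */
    while (a >= 16) {
        uint32_t b = (a >> 4) * 3;
        r += b;
        a -= b * 5;
    }

    /* Stage 3: a < 16 here, so at most three subtractions. */
    while (a >= 5) {
        a -= 5;
        r++;
    }
    return r;
}
```

Each stage shrinks the remaining value while keeping n = 5*r + a, which is why the final small loop only has to handle values below 16.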

 

Last Edited: Sun. Mar 29, 2020 - 07:49 PM

https://gmplib.org/~tege/divcnst...

 

So, for instance, consider:

 

#define BUFSIZE 50
char *decimal (unsigned int x)
{
    static char buf[BUFSIZE];
    char *bp = buf + BUFSIZE - 1;
    *bp = 0;
    do {
        *--bp = '0' + x % 10;
        x /= 10;
    } while (x != 0);
    return bp; /* Return pointer to first digit */
}

With this research, first published quite some time ago, the actual implementation can be... quite a bit different.

 

The paper gives samples in assembly for common targets, and whaddya know, it's very similar to the code you already listed, only with the constant 0xcccccccd because it's for 32-bit values.
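
For reference, the multiply-by-reciprocal form with that constant looks roughly like this in C. The function names are invented for the example, and the 64-bit intermediate is an assumption that suits a PC; on an 8-bit AVR the 32x32 product would have to be assembled from 8x8 MULs, so treat this as a model of the arithmetic only.

```c
#include <stdint.h>

/* 0xCCCCCCCD == (2^34 + 1) / 5, so the high bits of the 64-bit
 * product hold the truncated quotient for every 32-bit n. */
uint32_t div5_recip(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * UINT64_C(0xCCCCCCCD)) >> 34);
}

/* Dividing by 10 uses the same constant with one more shift. */
uint32_t div10_recip(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * UINT64_C(0xCCCCCCCD)) >> 35);
}
```

Both functions return the same value as the C `/` operator on `uint32_t`, so no correction step is needed afterwards.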

 

 

That said: I don't think this is gonna work out. Looking at your most recent code, you have four consecutive reads from PINB, and then in another function, four consecutive writes to PORTD. That is not how anything works. I don't think detailed discussions of the multiplication algorithm are going to do you any good if you don't know how to produce working code, and can't articulate what you want to do with it.


Suggestion: I'd guess you could compute X/5, where X is an 8-bit value, using a lookup table. Progmem, obviously.

static const uint8_t over5[256] = {
    0, 0, 0, 0, 0,
    1, 1, 1, 1, 1,
    /* ... */
    50, 50, 50, 50, 50,
    51,
};

this is almost certainly faster than the loop, and it does impose any code size at all, but it seems pretty cheap.


I'd guess you could compute X/5, where X is an 8-bit value,

We want 32 bits, not 8 bits.  That makes a lookup table impractical.

An interesting question is whether you can take advantage of the CCCCCCCC... bit pattern to implement the multiplication faster or smaller than you could do a generic 32x32 multiply with an 8-bit multiplier (and, I guess: do you need all 32 bits of the constant to get an accurate divisor, for these specific cases?).


???

does impose any code size at all

Before you even start to code, you have spent 256 bytes of code space.


westfw wrote:

I'd guess you could compute X/5, where X is an 8-bit value,

We want 32 bits, not 8 bits.  That makes a lookup table impractical.

An interesting question is whether you can take advantage of the CCCCCCCC... bit pattern to implement the multiplication faster or smaller than you could do a generic 32x32 multiply with an 8-bit multiplier (and, I guess: do you need all 32 bits of the constant to get an accurate divisor, for these specific cases?).

 

That was directed specifically at KerimF's thing, which has an inner loop to compute X/5 on an 8-bit value.

 

So at a smallish overhead in progmem, that inner loop could go away.
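
A minimal sketch of that combination, assuming the byte-wise reduction from KerimF's routine is kept and only the final 8-bit step is replaced by the table. The names over5_init and div5_table are hypothetical, and the table is filled at run time in RAM so the example runs on a PC; on the AVR it would presumably be a constant array in flash (for instance PROGMEM read with pgm_read_byte in avr-libc).

```c
#include <stdint.h>

/* Hypothetical RAM copy of the table; entry i holds i / 5. */
static uint8_t over5[256];

/* Fill the table once before the first call to div5_table(). */
void over5_init(void)
{
    for (unsigned i = 0; i < 256; i++)
        over5[i] = (uint8_t)(i / 5);
}

uint32_t div5_table(uint32_t n)
{
    uint32_t a = n;
    uint32_t r = 0;

    /* Reduce to a byte: 5 * 51 = 255, so a - 255 * (a >> 8)
     * equals (a >> 8) + (a & 0xFF), and n == 5 * r + a holds. */
    while (a >= 256) {
        uint32_t b = (a >> 8) * 51;
        r += b;
        a -= b * 5;
    }

    /* a < 256: one lookup replaces the subtract-5 loop. */
    return r + over5[a];
}
```

The trade is the one discussed in the posts above: 256 bytes of table against the cycles spent in the inner loops.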


westfw wrote:
An interesting question is whether you can take advantage of the CCCCCCCC... bit pattern to implement the multiplication faster or smaller than you could do a generic 32x32 multiply with an 8-bit multiplier (and, I guess: do you need all 32 bits of the constant to get an accurate divisor, for these specific cases?).

We can: we only need to do a 32*8 mul; the rest is just shifts and sums.

This is a test program:

#include <stdio.h>
#include <stdint.h>


int main(int argc, char *argv[])
{
	uint32_t number = 12345;
	uint64_t tmp;

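	/* Widen first so the product cannot wrap at 32 bits. */
	tmp = (uint64_t)number * 0xcc;
	tmp += tmp >> 8;	/* roughly number * 0xcc.cc */
	tmp += tmp >> 16;	/* roughly number * 0xcc.cccccc */
	tmp += 512;		/* half of 1024: rounds the final shift */
	tmp >>= 10;		/* 0xcc.cc... / 1024 is about 1/5 */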

	return tmp;
}
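
The same steps wrapped in a function make it easier to try other inputs. This is a sketch under the assumption that a 64-bit intermediate is acceptable; the name div5_approx is made up, and the product is widened before multiplying so large inputs do not wrap.

```c
#include <stdint.h>

/* Hypothetical wrapper around the shift-and-add steps above. */
uint32_t div5_approx(uint32_t number)
{
    uint64_t tmp = (uint64_t)number * 0xcc; /* the only real multiply */

    tmp += tmp >> 8;    /* extend the 0xcc pattern by one byte */
    tmp += tmp >> 16;   /* extend it by two more bytes */
    tmp += 512;         /* half of 1024, rounds the final shift */
    tmp >>= 10;         /* scale back down */

    return (uint32_t)tmp;
}
```

Because of the +512, the result is rounded rather than truncated: div5_approx(12345) gives 2469, but div5_approx(4) gives 1 where 4/5 truncates to 0. Whether that matters depends on the "Do you need it 100% accurate?" question raised earlier in the thread.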