## Fast conversion of Integer to BCD; assembly atmega328p.

141 posts / 0 new

## Pages

I am reopening this post for a small, but important issue.

I am using the word "Remainder" as in a result of Modulo(Integer,1000).

You are using remainder as the decimal part of an integer division.

When I look at https://en.wikipedia.org/wiki/Re..., I might

think that I haven't incorrectly used the word.

Am I wrong?

Last Edited: Fri. Feb 5, 2021 - 06:00 PM

Nope remainder is where you for example divide 13 by 3 and get the dividend 4 with a remainder of 1. 3 goes into 13 four times with 1 left over.

It's true that 13/3 can also be seen as 4.33 where the 4 is the integer result and the .333 (which represent the "remainder 1") is the fractional result but it would be wrong to refer to the .33 part of this as the "remainder".

So, I just raised the problem to El Tangas:

"so the next digits pop up from the remainder just by mul with 10"

I think that's the decimal part of the result of dividing the

"Two Bytes" Number by 1000.

He multiplies the number by constant 256^3/1000;

that is multiplied by 256^3 and then dividing by 1000; he is making an unambiguous Approximation.

Thus, you have to round the Decimal part on very, very many divisions. It's normal.

The Problem is Remainder is an Integer, not a Fractional Number.

So, I thing here he was unintentionally misleading.

When you multiply by 10 the decimal part of a division it's obvious that the "next digits pop up".

This is not True about Remainder.

Last Edited: Sat. Feb 6, 2021 - 01:00 AM

I was trying to understand El Tangas's algorithm, and I encounterd difficulties.

Than it poped my mind that maybe he's not meaning Remainder.

I think that many people are using Remainder as "the decimal part of a fractional number",

but it is not absolutely correct.

Thank you.

Last Edited: Fri. Feb 5, 2021 - 06:06 PM

clawson wrote:
it would be wrong to refer to the .33 part of this as the "remainder".

Catalin Ioan Stanciu wrote:

I think that many people are using Remainder as "the decimal part of a fractional number",

but it is not absolutely correct.

That's right, I mean the fractional part, not "remainder" in a strict mathematical sense. Sorry if it caused confusion.

IIRC I used some kind of fixed point format.

Last Edited: Fri. Feb 5, 2021 - 06:18 PM

I studied your algo. Though uses aproximation it's avg. just few fractions of Cycle faster than mine.

I tried hard to get one instruction faster on my algo, but in vain.

I guess aproximation works harder on many Bytes. I'll try there. :-)

I'll update all my sources, from time to time, on first post. (#1)

Thanks a lot.

Are you now talking about a 16 or 24 bit version , packed / unpacked and what is the number of clocks ?

Then I will see if I can solve at better way :)

For others yes I do this for fun, like a crossword, since we know it can be less than  than 1% of the AVR's time there is no need!

I'm hopeless. I'm still at contest; can't abstain.

Still the Two Bytes; were I might have found something; though Not Sure Yet!

If you discount the BCD packing, on my 16 bit, I'm at 49,53 clk AVG. <---WRONG , it is 51.53

That's one instruction, and AVG will fall below 49.

Yes, I now I'll be up on MAX; but as I said: "I can't abstain".

Last Edited: Sat. Feb 6, 2021 - 02:19 PM

try this on debug, AS7:

ldi r16,low(01234)

ldi r17,high(01234)

it gives:

and without first zero:

ldi r16,low(1234)

ldi r17,high(1234)

gives:

01234 means Octal(1234) which equals Hex (29C). It seems that I got screwd up by this!

These are measurements without counting INPUT and without BCD packing.

I am wrong now, and I was wrong also the first time I did the measurement:

my original version (V2) is         AVG=50.532 Clks   on 0-65535

my actual version (see bellow) AVG=51.98 Clks   on 0-65535

Actual version has this code bellow:

;------

breq PC+11
; r1 can be 1 or 3
dec r1
breq PC+3        ; it means it was 1, but now is 0
inc r17
mov r1,SIX
add r1,SIX    ; now r1 is SIX or TWELVE
inc r17        ; increment Quotient
brcc PC+3    ; possible one do-it-again on Carry resulted from last instruction
inc r17
; previous 13 instructions: min. 4 and max. 14 clk

;---------------

and this is worse than V2 version:

;----------

lsr r1            ; r1=0 and possible 1 shifted to C
brcc PC+4        ; branch if zero (evident)
inc R17        ; increment Quotient(R1 was one)
brcs PC-2    ; possible one do-it-again on Carry resulted from last instruction
; On previous 5 instructions
; if High(INPUT)>\$A8 5 or 9 clk, depending on RB256
; if High(INPUT)<\$A8 3 or 5 clk, depending on RB256

brcc PC+4        ; but if we have CARRY?
inc r17        ; increment Quotient
brcs PC-2    ; possible one repeat on Carry resulted from last instruction
; if there were 6 or  9 on previous paragraph we have only 3 clk  here
; if there were 3 or  5 on previous paragraph we have 3 to max. 9 here
;    so, this is somehow compensated (on both) to 6 to max. 15 clk

;---------

Were I can see that "min. 4 and max. 14 clk" is worse than "6 to max. 15 clk"

Last Edited: Sat. Feb 6, 2021 - 11:13 PM

Do you know the record on 24 and 32 bits?

Nope.

And it will start to get crowded in the registers so perhaps it should include store result in RAM.

(FYI)

My "posted here" Three Bytes(24bit) version is AVG=145.76 from 00:00:00 to FF:FF:FF

My "posted here" Three Bytes(24bit) version is AVG=148.07 from FF:00:00 to FF:FF:FF

(it'll take hours to count from 00:00:00 to FF:FF:FF; it should thus be under 150 clk)

Last Edited: Sat. Feb 6, 2021 - 11:08 PM

49.86 clk

My First Version on Two Bytes AVG under 50 Cycles.

## Attachment(s):

nice job

yes the 99 split works fine without correction for 65, the only errors are at 69 79 89 and 99.(I write 64 because the check is for that bit)

See #41 there are all the numbers.

One thing could actually be to make a version that don't use the HW mul. That will be slower (about 70-80 clk), but by far the fastest on on chips like tiny85.

Last Edited: Mon. Feb 8, 2021 - 10:37 AM

You made me do that nice job; and I'm very glad you did it!

You are right about 65; it was in my mind to change something in comment there, too.

But, I'm still searching on why can't I get one instr. faster. Can't move along until that hepens.

HW mul is very powerfull; I can make "a touch" on the subject, as a pause.

Yes, it would be a good thing to make a change in my thoughts; at least for a while.

Guess first "MUL by 6" can be relaced with something like:

mov Rn,R16

brcc      ; check Carry

The rest can be done with with "my 12.5clk version"; see. #25 .

I'll try.

AVG 73.18 CLK

This is a DRAFT, still untested, but seems to work; I'll try to fasten and update.

Att. Removed. It is WRONG. Has 26200/65536 errors on output. I'll be back.

Replaced .DEF SIX=R18 with .DEF SIX=R19; because I was using R18!

Now works 0-65535.

 Counter with program 5.18902e+06 Counter run empty 393215 Diffrence 4.7958e+06 Diffrence/65536 73.18

## Attachment(s):

Last Edited: Mon. Feb 8, 2021 - 02:35 PM

On an AVR you normally don't check for carry, but have a register that is zero, so you just ADC with zero (I always use R15, don't use R1 as GCC)

Yes, this I'll have to learn.

It's just a rapid transformation; copy and paste ... Draft, No Deep Thinking. Please excuse.

Thanks.

See Att. tested Atiny85 BCD Packed version.

Result BCD packed in R18:R17:R16
Used Registers: R0,R2,R3,R16,R17,R18

Max 83 & Min 65 Cycles and measured AVG=73.18
Used Code Segment memory 84 Words

I believe The Unpacked version will be Max<80 Cycles, also.

## Attachment(s):

but have a register that is zero, so you just ADC with zero

I'm doing the same but, Here I have both to:

- Increment Quotient

And I know i spent few days optimizing this, 'cause the "below 50" uses this.

Last Edited: Mon. Feb 8, 2021 - 04:17 PM

See Att. retested Atiny85 BCD UnPacked version.

Result in:   R20:R19:R18:R17:R16   ( R0 is also used )

Max 74 & Min 56 Cycles and measured AVG=65.18

Used Code Segment memory 76 Words

The Unpacked version is less or equal than 74 Cycles.

Above modifications in Red, due to:

mov r0,r17    ;
add r0,r0        ; These are replacement for MUL ( +3 clk )
add r0,r17    ;  as suggested by
sbc r19,r19    ; r19 = minus carry

## Attachment(s):

If this code

```  clr r19		;
mov r0,r17	;
add r0,r17	; These are replacement for MUL ( +4 clk )

is replaced by

```  mov r0,r17	;
add r0,r17	; These are replacement for MUL ( +4 clk )
sbc r19,r19	; r19 = minus carry

and nothing gets broken (I didn't check) a cycle may be saved.

I checked. This is finesse!

47.86 clk

My Second Version on Two Bytes AVG under 50 Cycles. (AVRe)

Tested.

; 3529736    Counter with program
;  393215    Counter run empty
; 3136520    Diffrence
47.86    Diffrence/65536       My Second Under 50 Cyles Average!
; uses 7 MUL's so it should be         43 Words length  Min/Max 47/50

It was indeed a welcome pause to do the:

Atiny85 BCD UnPacked version.

Result in:   R20:R19:R18:R17:R16   ( R0 is also used )

Max 74 & Min 56 Cycles and measured AVG=65.18

Used Code Segment memory 76 Words

The Unpacked version is less or equal than 74 Cycles.

## Attachment(s):

Presumably, things like itoa() and printf() must be doing basically this - so has anyone looked at their implementations ... ?

Top Tips:

1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
4. Difference between a crystal, and a crystal oscillatorhttps://www.avrfreaks.net/comment...
5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...

No.

We are speaking here about 65 ns (at 16MHz)!  <-- There, in red, It was mistakenly written "S".

This is,  I have to do it "because it is there" (was first uttered by Edmund Hillary when he and Tenzing Norgay conquered Mount Everest in 1953).

Added later: May be, some should see my  "/1000" algo.

Last Edited: Wed. Feb 10, 2021 - 11:52 AM

Catalin Ioan Stanciu wrote:
65 nS

lowercase 's' for seconds.

(uppercase 'S' = siemens)

https://www.avrfreaks.net/commen...

Top Tips:

1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
4. Difference between a crystal, and a crystal oscillatorhttps://www.avrfreaks.net/comment...
5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...
Last Edited: Wed. Feb 10, 2021 - 11:32 AM

s= second. I do apologize for any inconvenience.

Time is one of the seven fundamental physical quantities in both the International System of Units (SI) and International System of Quantities.

Seven for the Dwarven Lords, in their halls of stone.

I deleted the original post.

It was inappropriate.

Last Edited: Wed. Feb 10, 2021 - 01:24 PM

Seven Units for Seven Quantities ... ?

Top Tips:

1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
4. Difference between a crystal, and a crystal oscillatorhttps://www.avrfreaks.net/comment...
5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...

here was the answer back in 2016

https://www.avrfreaks.net/commen...

So itoa()  take about 875clk  for 12345

(And since AVR's with and without HW take about the same time I guess it use a normal div routine.)

But perhaps the libs have changed.

if it's a signed or unsigned version don't really make a difference, (neg16 take 3 clk)

Last Edited: Wed. Feb 10, 2021 - 03:01 PM

sparrow2 wrote:
So itoa()  take about 875clk  for 12345

So just the binary-to-BCD part should be a little less?

Top Tips:

1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
4. Difference between a crystal, and a crystal oscillatorhttps://www.avrfreaks.net/comment...
5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...

Sweet Dreams (Are Made Of This).

Who am i to disagree.

cited from "Eurythmics, Annie Lennox, Dave Stewart - Sweet Dreams (Are Made Of This) (Official Video)".

sparrow2 wrote:
But perhaps the libs have changed.
I just tried a couple and got broadly similar answers for cycles in the simulator. I was a bit surprised that megas (MUL) actually return almost identical numbers to tiny.

The core code (itoa leads to ultoa_ncheck) is here:

http://svn.savannah.gnu.org/view...

but that code use 32bit , there should be a 16bit version !?

For the sake of the argument,

let us say that the Assembly Language is like a motorcycle

and other programming languages are like a car.

Now, if you are on a motorcycle you know that drivers may fail to see motorcycles in plain sight.

You may feel you want to ride a motorcycle or not,

but for me being on a motorcycle there is no point in arguing with someone who may fail to see me.

I have a 4-byte(32bit unsigned) version of about 170clk; hope I'll finish bugs and testing in the next days.

Is there realy a need for numbers larger than 256^4-1 = 4_294_967_295 ?

This post is about speed and anyone may question the need for speed.