Fast conversion of Integer to BCD; assembly atmega328p.

Go To Last Post
141 posts / 0 new

Pages

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I am reopening this post for a small, but important issue.

 

I am using the word "Remainder" as in a result of Modulo(Integer,1000).

 

In https://www.avrfreaks.net/commen...

You are using remainder as the decimal part of an integer division.

 

When I look at https://en.wikipedia.org/wiki/Re..., I might

think that I haven't incorrectly used the word.

 

Am I wrong?

Last Edited: Fri. Feb 5, 2021 - 06:00 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Nope remainder is where you for example divide 13 by 3 and get the dividend 4 with a remainder of 1. 3 goes into 13 four times with 1 left over.

 

It's true that 13/3 can also be seen as 4.33 where the 4 is the integer result and the .333 (which represent the "remainder 1") is the fractional result but it would be wrong to refer to the .33 part of this as the "remainder".

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

So, I just raised the problem to El Tangas:

"so the next digits pop up from the remainder just by mul with 10"

I think that's the decimal part of the result of dividing the

"Two Bytes" Number by 1000.

 

He multiplies the number by constant 256^3/1000;

that is multiplied by 256^3 and then dividing by 1000; he is making an unambiguous Approximation.

 

Thus, you have to round the Decimal part on very, very many divisions. It's normal.

 

The Problem is Remainder is an Integer, not a Fractional Number.

So, I thing here he was unintentionally misleading.

 

When you multiply by 10 the decimal part of a division it's obvious that the "next digits pop up".

This is not True about Remainder.

Last Edited: Sat. Feb 6, 2021 - 01:00 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I was trying to understand El Tangas's algorithm, and I encounterd difficulties.

Than it poped my mind that maybe he's not meaning Remainder.

 

I think that many people are using Remainder as "the decimal part of a fractional number",

but it is not absolutely correct.

 

Thank you.

Last Edited: Fri. Feb 5, 2021 - 06:06 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

clawson wrote:
it would be wrong to refer to the .33 part of this as the "remainder".

 

Catalin Ioan Stanciu wrote:

I think that many people are using Remainder as "the decimal part of a fractional number",

but it is not absolutely correct.

 

That's right, I mean the fractional part, not "remainder" in a strict mathematical sense. Sorry if it caused confusion.

IIRC I used some kind of fixed point format.

 

 

Last Edited: Fri. Feb 5, 2021 - 06:18 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I studied your algo. Though uses aproximation it's avg. just few fractions of Cycle faster than mine.

I tried hard to get one instruction faster on my algo, but in vain.

 

I guess aproximation works harder on many Bytes. I'll try there. :-)

 

I'll update all my sources, from time to time, on first post. (#1)

Thanks a lot.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Are you now talking about a 16 or 24 bit version , packed / unpacked and what is the number of clocks ?

 

Then I will see if I can solve at better way :)

 

 

For others yes I do this for fun, like a crossword, since we know it can be less than  than 1% of the AVR's time there is no need!

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'm hopeless. I'm still at contest; can't abstain.

 

Still the Two Bytes; were I might have found something; though Not Sure Yet! wink

 

If you discount the BCD packing, on my 16 bit, I'm at 49,53 clk AVG. <---WRONG , it is 51.53

That's one instruction, and AVG will fall below 49.

Yes, I now I'll be up on MAX; but as I said: "I can't abstain".

Last Edited: Sat. Feb 6, 2021 - 02:19 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0


try this on debug, AS7:

ldi r16,low(01234)

ldi r17,high(01234)

it gives:

 

and without first zero:

ldi r16,low(1234)

ldi r17,high(1234)

gives:

01234 means Octal(1234) which equals Hex (29C). It seems that I got screwd up by this! crying

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

These are measurements without counting INPUT and without BCD packing.

 

I am wrong now, and I was wrong also the first time I did the measurement:

 

my original version (V2) is         AVG=50.532 Clks   on 0-65535

my actual version (see bellow) AVG=51.98 Clks   on 0-65535

 

Actual version has this code bellow:

;------

add r16,r0
adc r1,r1
breq PC+11
  ; r1 can be 1 or 3
  dec r1
  breq PC+3        ; it means it was 1, but now is 0
    inc r17
    mov r1,SIX
  add r1,SIX    ; now r1 is SIX or TWELVE
  inc r17        ; increment Quotient
  add r16,r1    ; add SIX or TWELVE
  brcc PC+3    ; possible one do-it-again on Carry resulted from last instruction
   inc r17
   add r16,SIX
; previous 13 instructions: min. 4 and max. 14 clk

;---------------

 

and this is worse than V2 version:

;----------

 lsr r1            ; r1=0 and possible 1 shifted to C
 brcc PC+4        ; branch if zero (evident)
  inc R17        ; increment Quotient(R1 was one)
  add r16,SIX   ; add SIX; because we INC Quotient
  brcs PC-2    ; possible one do-it-again on Carry resulted from last instruction
; On previous 5 instructions
; if High(INPUT)>$A8 5 or 9 clk, depending on RB256
; if High(INPUT)<$A8 3 or 5 clk, depending on RB256

 add r16,r0      ; we add Remainder on the old one
 brcc PC+4        ; but if we have CARRY?
  inc r17        ; increment Quotient
  add r16,SIX    ; add SIX; because we INC Quotient
  brcs PC-2    ; possible one repeat on Carry resulted from last instruction
; if there were 6 or  9 on previous paragraph we have only 3 clk  here
; if there were 3 or  5 on previous paragraph we have 3 to max. 9 here
;    so, this is somehow compensated (on both) to 6 to max. 15 clk

;---------

 

Were I can see that "min. 4 and max. 14 clk" is worse than "6 to max. 15 clk"sad

"much ado about nothing"smiley

Last Edited: Sat. Feb 6, 2021 - 11:13 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Do you know the record on 24 and 32 bits?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Nope.

 

And it will start to get crowded in the registers so perhaps it should include store result in RAM. 

 

 

 

 

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

(FYI)

My "posted here" Three Bytes(24bit) version is AVG=145.76 from 00:00:00 to FF:FF:FF

My "posted here" Three Bytes(24bit) version is AVG=148.07 from FF:00:00 to FF:FF:FF

  (it'll take hours to count from 00:00:00 to FF:FF:FF; it should thus be under 150 clk)

 

Last Edited: Sat. Feb 6, 2021 - 11:08 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

49.86 clk

 

My First Version on Two Bytes AVG under 50 Cycles.

Attachment(s): 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

nice job

 

yes the 99 split works fine without correction for 65, the only errors are at 69 79 89 and 99.(I write 64 because the check is for that bit) 

See #41 there are all the numbers.

 

Add:

One thing could actually be to make a version that don't use the HW mul. That will be slower (about 70-80 clk), but by far the fastest on on chips like tiny85.

Last Edited: Mon. Feb 8, 2021 - 10:37 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

You made me do that nice job; and I'm very glad you did it!

 

You are right about 65; it was in my mind to change something in comment there, too.

But, I'm still searching on why can't I get one instr. faster. Can't move along until that hepens.

 

HW mul is very powerfull; I can make "a touch" on the subject, as a pause.

Yes, it would be a good thing to make a change in my thoughts; at least for a while.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Guess first "MUL by 6" can be relaced with something like:

add R16,R16

mov Rn,R16

add Rn,Rn

add Rn,R16

brcc      ; check Carry

The rest can be done with with "my 12.5clk version"; see. #25 .

I'll try.smiley

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

AVG 73.18 CLK

This is a DRAFT, still untested, but seems to work; I'll try to fasten and update.  

Att. Removed. It is WRONG. Has 26200/65536 errors on output. I'll be back.

Replaced .DEF SIX=R18 with .DEF SIX=R19; because I was using R18!

Now works 0-65535.

 

Counter with program 5189017
Counter run empty 393215
Diffrence 4795802
Diffrence/65536 73.18

Attachment(s): 

Last Edited: Mon. Feb 8, 2021 - 02:35 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

On an AVR you normally don't check for carry, but have a register that is zero, so you just ADC with zero (I always use R15, don't use R1 as GCC)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Yes, this I'll have to learn.smiley

 

It's just a rapid transformation; copy and paste ... Draft, No Deep Thinking. Please excuse.

Thanks.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

See Att. tested Atiny85 BCD Packed version.

 

Result BCD packed in R18:R17:R16
Used Registers: R0,R2,R3,R16,R17,R18

Max 83 & Min 65 Cycles and measured AVG=73.18
Used Code Segment memory 84 Words

I believe The Unpacked version will be Max<80 Cycles, also.

Attachment(s): 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

but have a register that is zero, so you just ADC with zero

I'm doing the same but, Here I have both to:

  - Increment Quotient

  - Add Six to Remainder

 

And I know i spent few days optimizing this, 'cause the "below 50" uses this.

Last Edited: Mon. Feb 8, 2021 - 04:17 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

See Att. retested Atiny85 BCD UnPacked version.

Result in:   R20:R19:R18:R17:R16   ( R0 is also used )

Max 74 & Min 56 Cycles and measured AVG=65.18

Used Code Segment memory 76 Words

The Unpacked version is less or equal than 74 Cycles.

 

Above modifications in Red, due to:

  mov r0,r17    ;
  add r0,r0        ; These are replacement for MUL ( +3 clk )
  add r0,r17    ;  as suggested by
  add r0,r0        ;  https://www.avrfreaks.net/users/...
  sbc r19,r19    ; r19 = minus carry

 sub  R17,r19   ; add carry

Attachment(s): 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

If this code

 

  clr r19		;
  mov r0,r17	;
  add r0,r0		;
  add r0,r17	; These are replacement for MUL ( +4 clk )
  add r0,r0		;  
  adc r19,r19	;

 add  R17,r19    ; No Carry

is replaced by

 

  mov r0,r17	;
  add r0,r0		;
  add r0,r17	; These are replacement for MUL ( +4 clk )
  add r0,r0		;  
  sbc r19,r19	; r19 = minus carry

 sub  R17,r19    ; add carry

and nothing gets broken (I didn't check) a cycle may be saved.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I checked. This is finesse!

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

47.86 clk

 

My Second Version on Two Bytes AVG under 50 Cycles. (AVRe)

Tested.

 

; 3529736    Counter with program
;  393215    Counter run empty
; 3136520    Diffrence
47.86    Diffrence/65536       My Second Under 50 Cyles Average!
; uses 7 MUL's so it should be         43 Words length  Min/Max 47/50

 

It was indeed a welcome pause to do the:

Atiny85 BCD UnPacked version.

Result in:   R20:R19:R18:R17:R16   ( R0 is also used )

Max 74 & Min 56 Cycles and measured AVG=65.18

Used Code Segment memory 76 Words

The Unpacked version is less or equal than 74 Cycles.

Attachment(s): 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Presumably, things like itoa() and printf() must be doing basically this - so has anyone looked at their implementations ... ?

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillatorhttps://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

No.

We are speaking here about 65 ns (at 16MHz)!  <-- There, in red, It was mistakenly written "S".

 

This is,  I have to do it "because it is there" (was first uttered by Edmund Hillary when he and Tenzing Norgay conquered Mount Everest in 1953).

 

Added later: May be, some should see my  "/1000" algo. wink

Last Edited: Wed. Feb 10, 2021 - 11:52 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Catalin Ioan Stanciu wrote:
65 nS

lowercase 's' for seconds.

 

(uppercase 'S' = siemens)

 

https://www.avrfreaks.net/commen...

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillatorhttps://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...
Last Edited: Wed. Feb 10, 2021 - 11:32 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

s= second. I do apologize for any inconvenience.

Time is one of the seven fundamental physical quantities in both the International System of Units (SI) and International System of Quantities.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Seven for the Dwarven Lords, in their halls of stone.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I deleted the original post.

It was inappropriate.

Last Edited: Wed. Feb 10, 2021 - 01:24 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Seven Units for Seven Quantities ... ?

 

laugh

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillatorhttps://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

here was the answer back in 2016

 

https://www.avrfreaks.net/commen...

 

So itoa()  take about 875clk  for 12345

(And since AVR's with and without HW take about the same time I guess it use a normal div routine.)

 

 

But perhaps the libs have changed.

 

Add:

if it's a signed or unsigned version don't really make a difference, (neg16 take 3 clk)

Last Edited: Wed. Feb 10, 2021 - 03:01 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

sparrow2 wrote:
So itoa()  take about 875clk  for 12345

So just the binary-to-BCD part should be a little less?

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillatorhttps://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Sweet Dreams (Are Made Of This).

Who am i to disagree.

 cited from "Eurythmics, Annie Lennox, Dave Stewart - Sweet Dreams (Are Made Of This) (Official Video)". smiley

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

sparrow2 wrote:
But perhaps the libs have changed.
I just tried a couple and got broadly similar answers for cycles in the simulator. I was a bit surprised that megas (MUL) actually return almost identical numbers to tiny.

 

The core code (itoa leads to ultoa_ncheck) is here:

 

http://svn.savannah.gnu.org/view...

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

but that code use 32bit , there should be a 16bit version !?

 

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

For the sake of the argument,

let us say that the Assembly Language is like a motorcycle

and other programming languages are like a car.

Now, if you are on a motorcycle you know that drivers may fail to see motorcycles in plain sight.

 

You may feel you want to ride a motorcycle or not,

but for me being on a motorcycle there is no point in arguing with someone who may fail to see me.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I have a 4-byte(32bit unsigned) version of about 170clk; hope I'll finish bugs and testing in the next days.

Is there realy a need for numbers larger than 256^4-1 = 4_294_967_295 ?

 

This post is about speed and anyone may question the need for speed.

 

As I've already already said,

nobody's compelled to do caving discovery or mind entertainment exercises.

Pages