## Fast conversion of Integer to BCD; assembly atmega328p.


Latest posts:

AVG  164.08  Cycles and 146 words length

AVG  102.45  Cycles and   92 words length

AVG    47.86  Cycles and   43 words length

Divide by 10 is time consuming.

I'm using divide by 1000. That is a divide by 4,

followed by a divide by 250. (a)

Divide by 250 starts with a divide by 256.

The Quotient on divide by 256 is multiplied by 6; (256-250)=6.

Next, we add the division remainder (a), and divide again; same way.
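A minimal sketch of the divide-by-1000 idea described above, written in Python only to model the arithmetic (it is not the attached assembly; the function names `divmod250` and `divmod1000` are made up for illustration). It assumes the divide by 4 is a two-bit shift whose shifted-out bits are merged back into the remainder at the end.

```python
def divmod250(n):
    """Divide by 250 using only byte splits (divide by 256) and multiply by 6."""
    q = 0
    while n >= 256:
        hi, lo = n >> 8, n & 0xFF   # quotient and remainder of n / 256
        q += hi                     # keep score of the quotient
        n = hi * 6 + lo             # 256 = 250 + 6, so every 256 leaves 6 extra
    if n >= 250:                    # final correction so the remainder is < 250
        q, n = q + 1, n - 250
    return q, n


def divmod1000(n):
    """Divide by 1000 as: divide by 4 (shift), then divide by 250."""
    low2 = n & 3                    # the two bits dropped by the divide by 4
    q, r = divmod250(n >> 2)
    return q, r * 4 + low2          # rebuild the 0..999 remainder
```

The remainder in 0..999 still has to be split into three digits afterwards; the sketch only covers the division step.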

The main.asm attachment is a  AVG=146 Clks Three Bytes conversion to BCD.  <-- this was first posted

Long program, but may be shortened (with loss of speed).

Four bytes(or many) can be  similarly converted, but I will not upload untested cumbersome work.

Added Two_Bytes_BCD.asm, better comments, and hope a better way to understand the algorithm; AVG=72.3  Cycles

Added Two_Bytes_BCD_v2.asm: 52 to 66 clks as MACRO, INPUT not counted; AVG from 0 to 65535 = 54.53 Cycles

Added Two_Bytes_BCD_v3.asm: tested, min 49 to max 52 clks; AVG=49.86 Cycles

Added Two_Bytes_BCD_v4.asm: tested, min 47 to max 50 clks; AVG=48 Cycles

Added Three_Bytes_To_Unpacked-BCD_V2.asm: tested, min 110 to max 135 clks; AVG=119 Cycles; 135 Words(cseg)

Added Three_Bytes_To_Unpacked-BCD_V4.asm: tested, min 121 to max 127 clks; AVG=122.5 Cycles; 110 Words(cseg)

Added Three_Bytes_To_Unpacked-BCD_V5.asm: tested, min 102 to max 105 clks; AVG=102.5 Cycles; 92 Words(cseg) <-- last modified on 02/24/21

Added Four_Bytes_To_Unpacked-BCD_V1.asm: tested, min 164 to max 167 clks; AVG=164.08 Cycles; 146 Words(cseg)

## Attachment(s):

This topic has a solution.
Last Edited: Thu. Feb 25, 2021 - 07:33 AM
Total votes: 0

We have been there before (but not with 3 byte).

Do you know how many clk it takes (fastest, slowest, AVG)?

On AVRs with a HW mul, the fastest way is some form of mul with 1/x.

Somewhere in the forum there is code with worst case at something like 45clk (for 16bit numbers).

I also made a 16 bit version, but it's 68 clk worst case.  (it's also here somewhere)

And that worked with :

hack to find 1. digit (so 0..9999 is left, which often is all you need for 4 digit display)

div with 100

split the result and remainder to digits.

And make sure that you can compete with the simple code that does this:

sub with 10000000 until you can't do it any more

then do the same with 1000000

then 100000

10000

...

...

With 24 bit I guess you can avoid first loop and just compare

It can be done a tad faster if you sub until number negative

for next digit you ADD until it becomes positive

If worst case matters then this is better:

sub with 4000 until negative (and add 4 )

then ADD 1000 until positive

the AVG is slower but worst case is better.
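The alternating subtract/add idea above can be sketched as follows. This is a Python model for a 16-bit value only, under the assumption that the loops alternate per digit (subtract for 10000, add for 1000, subtract for 100, add for 10); the function name is hypothetical and the 4000 variant is not shown.

```python
def to_digits_sub_add(n):
    """16-bit value -> five decimal digits by alternating subtract/add loops."""
    digits = []
    for i, p in enumerate((10000, 1000, 100, 10)):
        if i % 2 == 0:          # subtract until the value goes negative
            d = -1
            while n >= 0:
                n -= p
                d += 1
        else:                   # add back until the value is positive again
            d = 10
            while n < 0:
                n += p
                d -= 1
        digits.append(d)
    digits.append(n)            # what is left is the units digit
    return digits
```

The saving comes from never having to undo the last subtraction: the overshoot of one digit becomes the starting point of the next.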

Last Edited: Sat. Jan 30, 2021 - 12:42 PM
Total votes: 0

I don't know how to measure worst case.

Think I'm under 160clk.

Is there a way to find the worst case with Atmel Studio?

Also, are you interested on average for numbers above two bytes?

You can convert a two bytes with another program, like your version.

So, fastest above 65535?

My atmega328p works on 16MHz.

With three bytes you can display your real frequency, in real time.

Three bytes maximum is 16.777215 M .

I'm also displaying the temperature above the Quartz, using the same uC.

The measuring is done using an UBLOX GPS signal.

Total votes: 0

With three bytes you can display your real frequency, in real time.

Well that goes without saying...if you run at 20MHz, and assume real time is 10 updates/sec...you could use roughly 20e6/10=  2 million cycles per display update (for everything)...don't think 160 clk or even 1600 clk  is gonna be too bad!

here is a source for a lot of good general ideas:

http://www.piclist.com/techref/m...

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Last Edited: Sat. Jan 30, 2021 - 03:48 PM
Total votes: 0

I am counting my own uC clocks using Timer/Counter1 Capture Event,

250 times per second, on an Event raised by a GPS receiver.

In the same time I have a TWI connection with a thermometer,

and a 2Mbit full duplex UART connection ... and this is just the beginning.

I have indeed a "too fast" conversion for hex to ASCII,

but the relative large length of my "routine" is not a problem;

and that is because I'm not using libraries.

Total votes: 0

and this is just the beginning.

Soon it will also play three concurrent games of chess, while controlling a pizza oven.

and that is because I'm not using libraries.

Doing it yourself can be customized to your needs, and without fear of library fines.


Total votes: 0

I'm thinking on storing  frequencies on a temperature base,

that is on TWI also. Some kind of auto-calibration.

Thank You.

Total votes: 0

?

5 numbers for a polynomial should cover all you will ever need, so 10 (perhaps 15) bytes of EEPROM should cover it. (Remember that the chip ages, so the internal clk will change over time; a 0.1% calibration on a new chip is overkill.)

Add: a crystal also ages; it just has a much smaller drift.

Last Edited: Sat. Jan 30, 2021 - 06:12 PM
Total votes: 0

I meant only on crystal. As far as I've seen the large drift is on temperature.

I guess there's no problem to "link" your device on UBLOX only once a year,

to correct your freq. on temp. table, due to the aging of the Quartz.

adt7422 is sensitive enough for a 0.1 degree step, so I'll have 40 bytes/degree;

meaning temperature + frequency. I would prefer to store this outside uC.

Total votes: 0

Buy an external osc; a $2 one would be better than any crystal with the AVR oscillator (the two caps also contribute to the error).

(On some external oscillators there is a pin that lets you control the speed with a DAC, only +-2kHz or so.)

And how accurate do you expect the clk to be? Do you just need to know the error, or do you actively want to change the speed?

On some old VHF radios the crystal was in a box heated up to 70deg controlled by a ntc/ptc resistor.

Last Edited: Sat. Jan 30, 2021 - 07:29 PM
Total votes: 0

The code is a bit hard to follow.  The only register names I see being used are ZERO and SIX.  Naming more registers like REMAINDER would make the code easier to follow.

It's also hard to follow because it is not simplified.  When I see code that takes more instructions to do something than the simple way, I then have to figure out if there is a reason for doing it the complicated way, or if the writer just didn't know the simple way.

For example, your code takes 6 instructions to convert a byte in the range of 0 to 16 from binary to BCD.  It only takes 3 instructions to do that:

cpi r18, 10

brlo 1f

subi r18, lo8(-6)

1:
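A small Python model of what those three instructions compute, as a sketch only (the function name is hypothetical). Subtracting `lo8(-6)` is the same as adding 6 modulo 256, which turns a binary value of 10 or more into packed BCD; the model suggests the trick holds for inputs up to 19, slightly more than the 0 to 16 range mentioned.

```python
def small_to_packed_bcd(x):
    """Model of: cpi r18,10 / brlo 1f / subi r18,lo8(-6)."""
    if x >= 10:                 # brlo skips the adjustment for 0..9
        x = (x + 6) & 0xFF      # subi with -6 adds 6 in 8-bit arithmetic
    return x
```

For example, 16 becomes 0x16 and 9 stays 0x09.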

I have no special talents.  I am only passionately curious. - Albert Einstein

Total votes: 0

Naming more registers like REMAINDER would make the code easier to follow.

It also GREATLY reduces the risk of an error ...much too easy to mix up  r12, r13,r2, r23, r21, at least just glancing around the code.   On the other hand, maybe no names forces you to work very extra carefully to not make a mistake.

When we used pencil cards, we had to work hours to submit 30 lines of code....so there was a LOT of care used to make sure it was correct the first time...no second chance.


Total votes: 0

My code:

clr r18        ; max. 16 byte conversion to BCD
cpi r16,10
brlo PC+3
subi r16,10
ldi r18,$10
or r16,r18

First instruction should be erased; my mistake.

Your code is faster. I was just copying and paste (from my code above); not thinking.

Thank you.

I know my code is hard to follow, because I had to debug it. As I've already said, it is cumbersome. I'm sorry.

There are very many remainders, so I've tried to give a general idea of what the algorithm does.

If you'll let me know what you did understand, it might be a good start.

Total votes: 0

I don't think the Quartz should be accurate; and I do have an external oscillator (not in use for the moment).

Very good advice. Thank You.

I change the Counter value in the time of counting, after the reading, but not on this project.

So, as long as I know my Fclk is 15999432 on 22.2 degrees, this should be "reasonably fine".

Few years ago I've made a clock. Very accurate; just 1-2 seconds on a three month basis.

But it was working so fine just because the temperature in my kitchen was very stable. :-)

Total votes: 0

1. I've tested my code from 0(one) to 256^3-1, and it is working.

2. As you probably suspect, I'm working on interrupts, I do know what confusing a register might do to you.

3. I've worked with perforated cards and perforated tapes;

I've seen pencil cards but they didn't let us run the program;

they (the teachers) were doing only a visual inspection of the program.

4. If you want to see a program REALLY hard to understand, please see att.

Again, I'm sorry for the inconvenience.

## Attachment(s):

Total votes: 0

Your code really needs a comment block header explaining what is going on. I'm still struggling to read through and understand it after several passes.

Anyway this topic has; since the dawn of time; been a rite-of-passage that every programmer must undertake. You might find this "reference thread" interesting. https://www.avrfreaks.net/forum/binary-decimal-conversion-reference-all

Total votes: 0

I'm working on a two byte version, hoping that it would be easier to understand the algorithm.

I do believe that is the algorithm that causes you difficulties.

It's similar to making the division by hand, with a pencil.

int(32735/256)=127    mod(32735,256)=223  ; you don't have to compute this, because it's just BYTE1 and BYTE2

On the other hand, we are looking for:

int(32735/250)=130    mod(32735,250)=235

Based on 256=250+6, this means that 127*6+223=985

now int(985/256)=3  mod(985,256)=217   ; we are keeping score of the quotient!

and again 3*6+217=235, BUT THIS TIME the remainder is smaller than 250.

So the result will be 127+3=130 and the remainder is 235

Does this make sense?
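The hand calculation for 32735 can be replayed with a short Python model (a sketch, not the posted assembly; `divmod250_trace` is a made-up name). Each recorded step holds the quotient of the divide by 256, its remainder, and the new value `quotient*6 + remainder`.

```python
def divmod250_trace(n):
    """Divide by 250 via repeated divide by 256, recording every step."""
    q, steps = 0, []
    while n >= 256:
        hi, lo = n >> 8, n & 0xFF   # int(n/256) and mod(n,256)
        q += hi                     # keep score of the quotient
        n = hi * 6 + lo             # 256 = 250 + 6
        steps.append((hi, lo, n))
    if n >= 250:                    # last correction if the remainder is 250..255
        q, n = q + 1, n - 250
    return q, n, steps


q, r, steps = divmod250_trace(32735)
# steps == [(127, 223, 985), (3, 217, 235)]; q == 130, r == 235
```

The final `if` covers the case where the loop ends with a value between 250 and 255, which the example above does not hit.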

Total votes: 0

I totally forgot this way to do it :

https://www.avrfreaks.net/forum/...

and the last version can be 24bit without code changes. (the code use that registers are memory mapped on org AVR's, so with more digits the start needs to be lower than 20)

the 16 bit take 400clk and the general should be about 1200 clk.

But the last code is 27 instructions regardless of the size even if the number are 32 bit.

Total votes: 0

I was trying to avoid the usage of registers above R16. I need those for other purposes.

Also my intention was to use as few registers as possible.

PUSH and POP  on interrupts costs 4 cycles, but you have to do LDS and STS also.

So 8clk. is a price hard to bear when you are dealing with more than 250 ints./second.

Also, I was trying to find something ready to fit on my purpose, and I didn't.

When I have no luck, I'm getting to work on my own.

It was my intention to share; maybe others will have more luck than me. ;-)

Last Edited: Sat. Jan 30, 2021 - 11:59 PM
Total votes: 0

I have not checked your code but from this comment

`; max. 99 byte conversion to BCD`

and the next 17 instructions do that.

this can be done faster and smaller

if you need to do this more than once, then have a register that holds 10 all the time and one that holds 26

the algorithm uses the fact that 26/256 is close to 1/10

if the number is less than 64 then the high byte of the mul holds the correct 10th

else dec the number before the mul.

then the last digit is: number - 10*the 10th number

(this code is just fast from my head, but about correct ;) )

```
; number in r16, only 0..99 are legal
; 10 in r21
; 1  in r20

ldi r17,26
mov r20,r16
sbrc r18,6
dec r20
mul r20,r17
mov r21,r1
ldi r17,10
mul r21,r17
sub r20,r0
```

This is wrong; the correct code is in #43.

Last Edited: Sun. Jan 31, 2021 - 08:49 PM
Total votes: 0

I was trying to avoid the usage of registers above R16. I need those for other purposes.

the link showed comparing immediate to 5 & 7 (high registers)---you can just load those into some registers before the entire code & have them ready when needed (low registers)

ldi temp, 5

mov myfive, temp

------------------------------------

so instead of: cpi r18, 5

brlo L3

you do:   cp r7,  myfive

brlo L3


Last Edited: Sun. Jan 31, 2021 - 02:38 AM
Total votes: 0

Catalin Ioan Stanciu wrote:

If you'll let me know what you did understand, it might be a good start.

I understood the divide by 4 (2 shift right), and the code to convert the last byte to BCD.  I read the macro that I think is supposed to convert 1 byte to 2 BCD digits, but could not make sense of it.

When I write asm, I use GNU as format.  Most of my asm code is used by C/C++ code, so GNU as format is the only practical option in that case.

I'd like to second the suggestion of repeated subtraction.  The code will work on AVRs without a mul, it's still relatively fast, and it will give you the smallest code for converting a 16-bit or 8-bit value.

Here's a version I wrote last year:

https://github.com/nerdralph/deb...


Total votes: 0

I've promised to upload a two byte conversion, for the purpose of simplicity on understanding the algorithm; see att. .

This time I've been commenting a lot, hope not too much.

This version is also tested(0 to 65535).

I'm just guessing that dividing by 1000 is less efficient on two bytes than on three bytes.

Anyway on my testings I never saw more than 72clks used; so I'm assuming less than 80.

Dividing by 4 first, and with 250 later is advantageous, because the 250 division will be done on a smaller number.

This time there is no MACRO, and the code is less obfuscated - I hope.

On the contrary, I think MUL is a very powerful tool, even if I can't argue that for the moment, because of the algorithm I use.

I will end this post, and look at the link afterwards; hope you don't mind.

Thank you.

## Attachment(s):

Total votes: 0

That is correct.

Sometimes I'm comparing (for example) with 150, 75 and 25, and later on with 60, 30 and 25.

So, many rereads on the written code will reveal gains you can make; or errors.

Also, the more people are looking to your code, the better.

Thank you.

Total votes: 0

So what's the minimum for converting $63 to two decimal digits, on atmega328p?

This code is 12.5clk AVG, min. 9clk and max. 15clk. Not bad at all, on first writing. :-)

| Number between | Cycles |
|---|---|
| 0 and 9 | 9 |
| 10 and 19 | 13 |
| 20 and 29 | 14 |
| 30 and 39 | 10 |
| 40 and 49 | 14 |
| 50 and 59 | 15 |
| 60 and 69 | 10 |
| 70 and 79 | 14 |
| 80 and 89 | 15 |
| 90 and 99 | 11 |

Total votes: 0

The code I wrote in #20 always take 11 clk as it is.

With constant registers for 10 and 26 then it takes 9 clk (and the 2 for init)

Total votes: 0

Mul takes two Cycles:

"MUL Rd, Rr   Multiply Unsigned   R1:R0 ← Rd × Rr   Z,C   2"

from 7810D–AVR–01/15  ATmega328P [DATASHEET] pag. 281

so total is 9+2=11

If I do the initialization it will be 13, and I will lose the benefit of not using two more registers.

I am very, very sorry; in this case I'll choose AVG 12.5 as better.

Don't know if you looked at the conversion of the two Bytes to BCD, att. at #23.

I've measured AVG=72.3 , from 256 to 65535; that is except the One Byte conversion.

Total votes: 0

Given how cheap on-board flash is these days, don't you just do it with a LUT? 1 Byte to 3 BCD is only 512 bytes, or 384 with a small amount of smarts.
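As a rough illustration of the 512-byte figure, here is a Python sketch that builds such a table. The layout is an assumption (two bytes per entry: hundreds digit, then tens and ones packed into one byte) and the names `build_bcd_lut` and `lookup` are made up. The 384-byte variant presumably packs the three digits into 12 bits per entry; it is not shown.

```python
def build_bcd_lut():
    """256 entries x 2 bytes: [hundreds, tens<<4 | ones] for each byte value."""
    lut = bytearray()
    for n in range(256):
        lut.append(n // 100)                        # hundreds digit, 0..2
        lut.append(((n // 10) % 10) << 4 | n % 10)  # tens and ones, packed BCD
    return bytes(lut)


LUT = build_bcd_lut()


def lookup(n):
    """Return (hundreds, packed tens/ones) for a byte value n."""
    return LUT[2 * n], LUT[2 * n + 1]
```

On the AVR the table would sit in flash and be read with two `lpm` instructions, trading code space for a fixed, short conversion time.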

#1 Hardware Problem? https://www.avrfreaks.net/forum/...

#2 Hardware Problem? Read AVR042.

#3 All grounds are not created equal

#4 Have you proved your chip is running at xxMHz?

#5 "If you think you need floating point to solve the problem then you don't understand the problem. If you really do need floating point then you have a problem you do not understand."

Total votes: 0

"Just saw I was wrong, and you are right. Please excuse.

Your code is indeed 11 cycles and I have to reconsider my code.

Thank you."  - my original comment.

Now, that I've tested the code in Question,

I can say that It Is Wrong Code.

It just seems to be good.

So this code:

```
; number in r16, only 0..99 are legal
; 10 in r21
; 1  in r20

ldi r17,26
mov r20,r16
sbrc r18,6
dec r20
mul r20,r17
mov r21,r1
ldi r17,10
mul r21,r17
sub r20,r0
```

Gives this conversion:

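| Hex. | Dec. | Conversion | Hex. | Dec. | Conversion | Hex. | Dec. | Conversion | Hex. | Dec. | Conversion |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 00 | 0 | 0000 | 19 | 25 | 0205 | 32 | 50 | 0500 | 4B | 75 | 0705 |
| 01 | 1 | 0001 | 1A | 26 | 0206 | 33 | 51 | 0501 | 4C | 76 | 0706 |
| 02 | 2 | 0002 | 1B | 27 | 0207 | 34 | 52 | 0502 | 4D | 77 | 0707 |
| 03 | 3 | 0003 | 1C | 28 | 0208 | 35 | 53 | 0503 | 4E | 78 | 0708 |
| 04 | 4 | 0004 | 1D | 29 | 0209 | 36 | 54 | 0504 | 4F | 79 | 08FF |
| 05 | 5 | 0005 | 1E | 30 | 0300 | 37 | 55 | 0505 | 50 | 80 | 0800 |
| 06 | 6 | 0006 | 1F | 31 | 0301 | 38 | 56 | 0506 | 51 | 81 | 0801 |
| 07 | 7 | 0007 | 20 | 32 | 0302 | 39 | 57 | 0507 | 52 | 82 | 0802 |
| 08 | 8 | 0008 | 21 | 33 | 0303 | 3A | 58 | 0508 | 53 | 83 | 0803 |
| 09 | 9 | 0009 | 22 | 34 | 0304 | 3B | 59 | 0509 | 54 | 84 | 0804 |
| 0A | 10 | 0100 | 23 | 35 | 0305 | 3C | 60 | 0600 | 55 | 85 | 0805 |
| 0B | 11 | 0101 | 24 | 36 | 0306 | 3D | 61 | 0601 | 56 | 86 | 0806 |
| 0C | 12 | 0102 | 25 | 37 | 0307 | 3E | 62 | 0602 | 57 | 87 | 0807 |
| 0D | 13 | 0103 | 26 | 38 | 0308 | 3F | 63 | 0603 | 58 | 88 | 0808 |
| 0E | 14 | 0104 | 27 | 39 | 0309 | 40 | 64 | 0604 | 59 | 89 | 09FF |
| 0F | 15 | 0105 | 28 | 40 | 0400 | 41 | 65 | 0605 | 5A | 90 | 0900 |
| 10 | 16 | 0106 | 29 | 41 | 0401 | 42 | 66 | 0606 | 5B | 91 | 0901 |
| 11 | 17 | 0107 | 2A | 42 | 0402 | 43 | 67 | 0607 | 5C | 92 | 0902 |
| 12 | 18 | 0108 | 2B | 43 | 0403 | 44 | 68 | 0608 | 5D | 93 | 0903 |
| 13 | 19 | 0109 | 2C | 44 | 0404 | 45 | 69 | 07FF | 5E | 94 | 0904 |
| 14 | 20 | 0200 | 2D | 45 | 0405 | 46 | 70 | 0700 | 5F | 95 | 0905 |
| 15 | 21 | 0201 | 2E | 46 | 0406 | 47 | 71 | 0701 | 60 | 96 | 0906 |
| 16 | 22 | 0202 | 2F | 47 | 0407 | 48 | 72 | 0702 | 61 | 97 | 0907 |
| 17 | 23 | 0203 | 30 | 48 | 0408 | 49 | 73 | 0703 | 62 | 98 | 0908 |
| 18 | 24 | 0204 | 31 | 49 | 0409 | 4A | 74 | 0704 | 63 | 99 | 0AFF |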
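The four FF entries in the results above are what a plain multiply by 26/256 gives when the decrement is never applied. That reading is an assumption on my part (the posted code tests bit 6 of r18, not of the input register), but a small Python model of the undecremented arithmetic reproduces exactly those entries. The function name is hypothetical.

```python
def split_no_decrement(n):
    """Tens via (n*26)>>8 with no correction; ones wrap like an 8-bit sub."""
    tens = (n * 26) >> 8            # high byte of the 8x8 multiply
    ones = (n - 10 * tens) & 0xFF   # 8-bit subtraction, wraps to 0xFF on -1
    return tens, ones


wrong = {n: "%02X%02X" % split_no_decrement(n)
         for n in range(100)
         if split_no_decrement(n)[0] != n // 10}
# wrong == {69: '07FF', 79: '08FF', 89: '09FF', 99: '0AFF'}
```

All four failures are at 64 or above, which is why a correction tied to bit 6 of the input can fix them.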
Last Edited: Sun. Jan 31, 2021 - 10:15 AM
Total votes: 0

Is this just a "fun" exercise to see how fast you can do it?  What if it took 123 clocks to convert?--if run at 20MHz, that is only 6 microseconds ...doubt you will have time to complain it is too slow.  If you don't have 6us to spare every 100 ms or so to update your display, there is other trouble brewing.


Total votes: 0

It's your code I just give input.

I found my old code, but it's 12 years old and today I could do better, and as you see I use a my old code to split 0..99

https://www.avrfreaks.net/commen...

But as I said there are some code here in the forum that do the 16 bit conversion in less than 50 clk (worst case).

It is based on a more precise mul with 1/x so the next digits pop up from the remainder just by mul with 10

(I do it a fast way to get the correct result, but my remainder is useless, so I have to sub 10*result)

In general, don't optimize too much at the cost of how you use the routine; you can easily waste what you saved by having to store registers, do remapping etc.

(like my code leaves a number 0..9 in each register, not $30..$39 as your print routine wants, and if the left side digits are 0 they should be $20).

Total votes: 0

Thank You.

Total votes: 0

I just reread the thread and in #5 I found :

250 times per second, on an Event raised by a GPS receiver.

Doesn't your GPS give you the time? (so you don't need a precise crystal)

(else get a GPS with a 1 pps clk output)

(or check how precise the time for first char in a position is, there could be jitter but no drift)

And, as said in #30, why can't it take a bit longer? If you run at 16MHz and do this 250 times a sec (about 25 times more than needed!) and the routine takes 200 clk (with print), it's still only 0.3% of the time.

Total votes: 0

Given how cheap on-board flash is these days, don't you just do it with a LUT? 1 Byte to 3 BCD is only 512 bytes, or 384 with a small amount of smarts.

If the most extreme speed is needed, and your AVR has spare flash space going to waste, you may include the BCD to 7 segment conversion in the LUT...otherwise that will take as many (or more) cycles to figure out which segments to light from the BCD.   What is the end goal, is this some sort of contest?


Last Edited: Sun. Jan 31, 2021 - 10:57 AM
Total votes: 0

I am using 0.5s or 1 second update, and I'm not using the "oscilloscope" type display; so I don't know.

I've done some programs with displaying on Hitachi HD44780; this uses 4 lines of data, and 2 lines for control.

Now I'm using only UART or TWI. I haven't got issues with these, but I'm using short Interrupt Routines (~60 Cycles).

I have two adapters TWI to Hitachi HD44780, not used.

I'm using "USART Rx Complete" Interrupt, but on the transmit side I use:

.MACRO SChar2UART
lds        @1,UCSR0A
sbrs    @1,UDRE0    ; Wait for empty transmit buffer
rjmp    PC-3
sts        UDR0,@0        ; Send CHAR
.ENDMACRO

.MACRO SendUART2ASC
mov r16,@0
swap r16
ASC r16
SChar2UART r16,r17
mov r16,@0
ASC r16
SChar2UART r16,r17
.ENDMACRO

Total votes: 0

I am using 0.5s or 1 second update, and I'm not using the "oscilloscope" type display; so I don't know.

So why worry whether it takes 10us or 10ms to do the conversion?   The conversion should have no effect at all on your interrupts...the IRQs will just interrupt your conversion whenever they need to, even if the conversion took 5000 cycles.

As you convert each digit, it will be sent into the TX buffer along with whatever other messages and tidbits you are sending out to be displayed.


Total votes: 1

I. GPS

II. I'm using Counter1 To count the pulses on the signal received from UBLOX.

can't count more than 65535 on a two bytes counter (or it's difficult)

so I've chosen 250x64000; that is, on every 64000 pulses from U-BLOX I have an INT

and based on the GPS precision, I have a very good approximation of what I've said:

250 INTs/second. I'm using only 60 Cycles on every INT, generated by the UBLOX,

but I also have the USART Rx Complete INT, and the TWI transmission.

III. I've said "too fast":

I am counting my own uC clocks using Timer/Counter1 Capture Event,

250 times per second, on an Event raised by a GPS receiver.

In the same time I have a TWI connection with a thermometer,

and a 2Mbit full duplex UART connection ... and this is just the beginning.

I have indeed a "too fast" conversion for hex to ASCII,

but the relative large length of my "routine" is not a problem;

and that is because I'm not using libraries.

Added on 03/03/2021:

Someone found something rather useful here, so attached are my present-day source files.

## Attachment(s):

Total votes: 0

I'm not  worrying " So why worry whether it takes 10us or 10ms to do the conversion?"

I've said "Fast conversion of Integer to BCD; assembly atmega328p",

and I am open to making it faster, if suggested.

on #19 I said " It was my intention to share", that is all.

Total votes: 0

This was wrong; the correct code is in #43.

Last Edited: Sun. Jan 31, 2021 - 08:48 PM
Total votes: 0

wrong:

The conversion table is the same one posted above; the faulty entries are 69 → 07FF, 79 → 08FF, 89 → 09FF and 99 → 0AFF.
Total votes: 0

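OK, I don't have time to look at it now, and I can't get to the code I have tested (and I remember it was 1 clk faster).

This is what I did yesterday in Excel: the first result column is with x (used for 0..63) and the other is with x-1 (used for 64..99)

| x | with x | with x-1 | x | with x | with x-1 | x | with x | with x-1 | x | with x | with x-1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 25 | 2 | 2 | 50 | 5 | 4 | 75 | 7 | 7 |
| 1 | 0 | 0 | 26 | 2 | 2 | 51 | 5 | 5 | 76 | 7 | 7 |
| 2 | 0 | 0 | 27 | 2 | 2 | 52 | 5 | 5 | 77 | 7 | 7 |
| 3 | 0 | 0 | 28 | 2 | 2 | 53 | 5 | 5 | 78 | 7 | 7 |
| 4 | 0 | 0 | 29 | 2 | 2 | 54 | 5 | 5 | 79 | 8 | 7 |
| 5 | 0 | 0 | 30 | 3 | 2 | 55 | 5 | 5 | 80 | 8 | 8 |
| 6 | 0 | 0 | 31 | 3 | 3 | 56 | 5 | 5 | 81 | 8 | 8 |
| 7 | 0 | 0 | 32 | 3 | 3 | 57 | 5 | 5 | 82 | 8 | 8 |
| 8 | 0 | 0 | 33 | 3 | 3 | 58 | 5 | 5 | 83 | 8 | 8 |
| 9 | 0 | 0 | 34 | 3 | 3 | 59 | 5 | 5 | 84 | 8 | 8 |
| 10 | 1 | 0 | 35 | 3 | 3 | 60 | 6 | 5 | 85 | 8 | 8 |
| 11 | 1 | 1 | 36 | 3 | 3 | 61 | 6 | 6 | 86 | 8 | 8 |
| 12 | 1 | 1 | 37 | 3 | 3 | 62 | 6 | 6 | 87 | 8 | 8 |
| 13 | 1 | 1 | 38 | 3 | 3 | 63 | 6 | 6 | 88 | 8 | 8 |
| 14 | 1 | 1 | 39 | 3 | 3 | 64 | 6 | 6 | 89 | 9 | 8 |
| 15 | 1 | 1 | 40 | 4 | 3 | 65 | 6 | 6 | 90 | 9 | 9 |
| 16 | 1 | 1 | 41 | 4 | 4 | 66 | 6 | 6 | 91 | 9 | 9 |
| 17 | 1 | 1 | 42 | 4 | 4 | 67 | 6 | 6 | 92 | 9 | 9 |
| 18 | 1 | 1 | 43 | 4 | 4 | 68 | 6 | 6 | 93 | 9 | 9 |
| 19 | 1 | 1 | 44 | 4 | 4 | 69 | 7 | 6 | 94 | 9 | 9 |
| 20 | 2 | 1 | 45 | 4 | 4 | 70 | 7 | 7 | 95 | 9 | 9 |
| 21 | 2 | 2 | 46 | 4 | 4 | 71 | 7 | 7 | 96 | 9 | 9 |
| 22 | 2 | 2 | 47 | 4 | 4 | 72 | 7 | 7 | 97 | 9 | 9 |
| 23 | 2 | 2 | 48 | 4 | 4 | 73 | 7 | 7 | 98 | 9 | 9 |
| 24 | 2 | 2 | 49 | 4 | 4 | 74 | 7 | 7 | 99 | 10 | 9 |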

Total votes: 0

Oops, also wrong

Add

The error is that the last sub is from the number that has had 1 subtracted (to save a move); there is a way around it but I can't see it now (I have to run now but will be back)

Last Edited: Sun. Jan 31, 2021 - 12:39 PM
Total votes: 0

Ok now it should work I hope:

; number in r16  [0..99]
; 10 in r21
; 1  in r20

; if used more than once, load 10 and 26 into two registers

ldi r17,26      ; load value for mul with 1/10 (26/256)
mov r20,r16  ;make a copy to the remainder place
sbrc r16,6      ;if 64 or more (bit 6 set)
dec r16          ; use one less
mul r16,r17    ; do the 1/10 mul
mov r21,r1     ; high result in high byte of mul
ldi r17,10       ; load value for mul with 10
mul r1,r17
sub r20,r0      ;low digit is the remainder
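A Python model of this routine, as a sketch for checking the arithmetic only (the function name is made up; the variable names mirror the registers). It tests bit 6 of the input, decrements before the multiply when it is set, and takes the ones digit from the untouched copy.

```python
def split_0_99(n):
    """Model of the corrected routine: returns (tens, ones) for n in 0..99."""
    r20 = n                              # mov r20,r16 : copy for the remainder
    r16 = n - 1 if n & 0x40 else n       # sbrc r16,6 / dec r16 : 64 or more
    r21 = (r16 * 26) >> 8                # mul r16,r17 ; mov r21,r1 : tens digit
    r20 = (r20 - 10 * r21) & 0xFF        # mul r1,r17 ; sub r20,r0 : ones digit
    return r21, r20
```

Because `sbrc` takes two cycles when it skips the one-cycle `dec`, both paths cost the same, which matches the constant 11 clk quoted earlier in the thread.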

Total votes: 0

sparrow2 wrote:

Ok now it should work I hope:

; number in r16  [0..99]
; 10 in r21
; 1  in r20

; if used more than once, load 10 and 26 into two registers

ldi r17,26      ; load value for mul with 1/10 (26/256)
mov r20,r16  ;make a copy to the remainder place
sbrc r16,6      ;if 64 or more (bit 6 set)
dec r16          ; use one less
mul r16,r17    ; do the 1/10 mul
mov r21,r1     ; high result in high byte of mul
ldi r17,10       ; load value for mul with 10
mul r1,r17
sub r20,r0      ;low digit is the remainder

You clobber r20 with the "mov r20, r16" instruction, so why does r20 need to be initialized with the value 1?


Total votes: 0

?

1 is low digit of result

10 is high digit of result

Total votes: 0

maybe say:

r21:r20 will contain the 10's:1's  digit conversion result


Total votes: 0

Yes, this is a freak piece of code. It works.

This:

; number in r16  [0..99]
; 10 in r21
; 1  in r20

; if used more than once, load 10 and 26 into two registers

ldi r17,26      ; load value for mul with 1/10 (26/256)
mov r20,r16  ;make a copy to the remainder place
sbrc r16,6      ;if 64 or more (bit 6 set)
dec r16          ; use one less
mul r16,r17    ; do the 1/10 mul
mov r21,r1     ; high result in high byte of mul
ldi r17,10       ; load value for mul with 10
mul r1,r17
sub r20,r0      ;low digit is the remainder

gives these results:

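| Hex. | Dec. | Conversion | Hex. | Dec. | Conversion | Hex. | Dec. | Conversion | Hex. | Dec. | Conversion |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 00 | 0 | 00:00 | 19 | 25 | 02:05 | 32 | 50 | 05:00 | 4B | 75 | 07:05 |
| 01 | 1 | 00:01 | 1A | 26 | 02:06 | 33 | 51 | 05:01 | 4C | 76 | 07:06 |
| 02 | 2 | 00:02 | 1B | 27 | 02:07 | 34 | 52 | 05:02 | 4D | 77 | 07:07 |
| 03 | 3 | 00:03 | 1C | 28 | 02:08 | 35 | 53 | 05:03 | 4E | 78 | 07:08 |
| 04 | 4 | 00:04 | 1D | 29 | 02:09 | 36 | 54 | 05:04 | 4F | 79 | 07:09 |
| 05 | 5 | 00:05 | 1E | 30 | 03:00 | 37 | 55 | 05:05 | 50 | 80 | 08:00 |
| 06 | 6 | 00:06 | 1F | 31 | 03:01 | 38 | 56 | 05:06 | 51 | 81 | 08:01 |
| 07 | 7 | 00:07 | 20 | 32 | 03:02 | 39 | 57 | 05:07 | 52 | 82 | 08:02 |
| 08 | 8 | 00:08 | 21 | 33 | 03:03 | 3A | 58 | 05:08 | 53 | 83 | 08:03 |
| 09 | 9 | 00:09 | 22 | 34 | 03:04 | 3B | 59 | 05:09 | 54 | 84 | 08:04 |
| 0A | 10 | 01:00 | 23 | 35 | 03:05 | 3C | 60 | 06:00 | 55 | 85 | 08:05 |
| 0B | 11 | 01:01 | 24 | 36 | 03:06 | 3D | 61 | 06:01 | 56 | 86 | 08:06 |
| 0C | 12 | 01:02 | 25 | 37 | 03:07 | 3E | 62 | 06:02 | 57 | 87 | 08:07 |
| 0D | 13 | 01:03 | 26 | 38 | 03:08 | 3F | 63 | 06:03 | 58 | 88 | 08:08 |
| 0E | 14 | 01:04 | 27 | 39 | 03:09 | 40 | 64 | 06:04 | 59 | 89 | 08:09 |
| 0F | 15 | 01:05 | 28 | 40 | 04:00 | 41 | 65 | 06:05 | 5A | 90 | 09:00 |
| 10 | 16 | 01:06 | 29 | 41 | 04:01 | 42 | 66 | 06:06 | 5B | 91 | 09:01 |
| 11 | 17 | 01:07 | 2A | 42 | 04:02 | 43 | 67 | 06:07 | 5C | 92 | 09:02 |
| 12 | 18 | 01:08 | 2B | 43 | 04:03 | 44 | 68 | 06:08 | 5D | 93 | 09:03 |
| 13 | 19 | 01:09 | 2C | 44 | 04:04 | 45 | 69 | 06:09 | 5E | 94 | 09:04 |
| 14 | 20 | 02:00 | 2D | 45 | 04:05 | 46 | 70 | 07:00 | 5F | 95 | 09:05 |
| 15 | 21 | 02:01 | 2E | 46 | 04:06 | 47 | 71 | 07:01 | 60 | 96 | 09:06 |
| 16 | 22 | 02:02 | 2F | 47 | 04:07 | 48 | 72 | 07:02 | 61 | 97 | 09:07 |
| 17 | 23 | 02:03 | 30 | 48 | 04:08 | 49 | 73 | 07:03 | 62 | 98 | 09:08 |
| 18 | 24 | 02:04 | 31 | 49 | 04:09 | 4A | 74 | 07:04 | 63 | 99 | 09:09 |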
Last Edited: Mon. Feb 1, 2021 - 06:33 AM
Total votes: 0

That is pretty wicked!!  Note for clarity

 4B 75 0705

note--should be shown as:  4B   75  07:05   , since it is 2 results, much different than the value seven hundred five.

Now can you make one to do 16 bits?   result from 0 to 65535


Last Edited: Sun. Jan 31, 2021 - 05:55 PM
Total votes: 0

Slim chances but, some things can't get out of my mind.

Total votes: 0

Slim chances but, some things can't get out of my mind.

A 16 bit to 5 digit conversion would prove that you can do it greatly, like it's never been done before---are you going to take the challenge? Maybe it requires a lot of coffee


Total votes: 0

The way OP does it, he will never get under 50 clk. He will get close to my code, around 70clk.

(I can never find the code that does it in less than 50 clk; someone help!)

And I think to get under 40 clk you will need one or more LUTs.

Total votes: 0

Can the double dabble method be implemented faster?   How many cycles for 16bits-->5 digits?

https://en.wikipedia.org/wiki/Do...
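For reference, a minimal Python sketch of the textbook double dabble (shift-and-add-3) method for 16 bits to 5 digits. It says nothing about AVR cycle counts, and the function name is made up; it only shows the shape of the work: 16 shifts, each preceded by a check of all five BCD digits.

```python
def double_dabble16(n):
    """16-bit binary -> five packed BCD digits (returned as a 20-bit integer)."""
    bcd = 0
    for bit in range(15, -1, -1):
        for shift in range(0, 20, 4):        # add 3 to every digit that is >= 5
            if (bcd >> shift) & 0xF >= 5:
                bcd += 3 << shift
        bcd = (bcd << 1) | ((n >> bit) & 1)  # shift in the next binary bit
    return bcd
```

Printed in hex, the result reads as the decimal number, so `double_dabble16(65535)` gives `0x65535`. With 16 iterations of per-digit tests, it seems unlikely to compete with the multiply-based routines discussed in this thread, but that is a guess, not a measurement.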


Total votes: 0


Recursivity. Indeed, requires coffee.

Total votes: 0

read #18

Total votes: 0

When you add 6 to     9 you get   15.

When you add 6 to 249 you get 255.

249 is "F" for base 250', and 255 is "F" for 256, 9 is "F" for 10.

If there is a "Double dabble" between 16 and 10,

is there one between 16 and 250?

You can shift 8 bits very easy, and from 250 to 1000,

that's my point.

10^1-1 =   9
16^1-1 =  15 (F)

250^1-1 = 249
256^1-1 = 255

15-6 =     9
255-6 = 249

Last Edited: Wed. Feb 3, 2021 - 02:18 PM
Total votes: 0

Just info, I have never found a good way to use it but:

since 10 is even, you can shift once down and just remember if the number is odd or even.

Then do everything with 5 instead of 10.

if number was odd add one to 1's (will never overflow).
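The halving trick can be written down as a short Python sketch (the name `split_via_5` is hypothetical, and this is my reading of the description): shift the number right once, divide the half by 5 instead of dividing the number by 10, then double the remainder and add back the bit that was shifted out.

```python
def split_via_5(n):
    """Tens and ones of n using a divide by 5 on n/2."""
    odd, half = n & 1, n >> 1       # remember odd/even, then halve
    tens, r = divmod(half, 5)       # half // 5 equals n // 10
    return tens, 2 * r + odd        # 2*r is at most 8, so adding 1 never overflows
```

The ones digit stays in 0..9 because the doubled remainder is always even and at most 8.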

Total votes: 0

I'm already dividing first by 4, for these 4 reasons:

- 4*250=1000

- you can get the hundreds digit from one byte:

2^10=1024 so, you need 10 bits for 999,

but the last two bits have no influence on the hundreds

Shifting back 2 bits you get tens and units

- the number to convert is 2bits shorter so,

you don't have to worry about carry

- the number to convert is lower

As you said 10 divides by 2; 1000 divides by 8,

but 8 doesn't help 'couse of the 8 bit register.
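
The second point can be checked with a short Python model (a sketch only; the function name is hypothetical). Because 100 is divisible by 4, the two low bits never touch the hundreds digit, so the hundreds come from a one-byte value divided by 25, and the two saved bits are shifted back in afterwards.

```python
def split_three_digits(t):
    """t is 0..999; return (hundreds, tens_and_units)."""
    low2 = t & 3                           # the two bits dropped by the divide by 4
    quarter = t >> 2                       # 0..249, fits in one byte
    hundreds, rest = divmod(quarter, 25)   # 100 / 4 = 25
    two_digits = (rest << 2) | low2        # shift back 2 bits: 0..99
    return hundreds, two_digits

print(split_three_digits(735))             # (7, 35)
```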

Last Edited: Mon. Feb 1, 2021 - 04:18 PM
Total votes: 0

sparrow2 wrote:

The way the OP does it, he will never get under 50 clk. He will get close to my code, at around 70 clk.

(I can never find the code that does it in less than 50 clk; can someone help?)

And I think that to get under 40 clk you will need one or more LUTs.

Possibly:

1. Divide by 100, and use the above code to convert the 2-digit remainder to BCD.

2. Repeat step 1.

3. convert 0-16 to BCD in 3 cycles

I have no special talents.  I am only passionately curious. - Albert Einstein

Total votes: 0

how do you do 3. in 3 clk? (easy to do in 5 clk)

add:

And why [0..16] and not [0..6]

Last Edited: Tue. Feb 2, 2021 - 08:32 AM
Total votes: 0

If I take your code, use it (as it is, and as a model of thinking) and combine it with mine (4x250),

it can be around 60 clk for 16 bits, with the result 0 to 06:55:35 as packed BCD.

I take this as a first target.

Total votes: 0

It's easier, for me, to divide by 1000, than by 100;

and I have purified my code.

Already have a first "draft", and seems to work;

still have to do the testing, and the comments, but I feel I have to sleep. Now.

Total votes: 0

Hi Catalin,

I am impressed with your somewhat unusual approach.

On an ATmega328P I had all 2^16 numbers converted (without errors). The timing envelope per scope is as follows:

At 16 MHz, the average conversion time is 5 µsec, which means that 80 CPU cycles average per conversion can be concluded, including RCALL, RET and I/O instructions.

While all numbers ​​are being run through, a strong jitter between approx. 4.5 µsec and 5.7 µsec can be observed, which is caused by the many queries.

I recently implemented the well-known Douglas W. Jones method using my structured AVR assembler s’AVR for calculating large factorials, see http://www.led-treiber.de/html/f...

Converting all 2^16 numbers using this Bin2Dec algorithm results in the 2nd timing envelope (also @ 16MHz):

The conversion time for almost all numbers is constantly 6 µsec, which means 96 CPU cycles (including RCALL, RET and I/O instructions).

There are only very few numbers (namely 185, all >= 49146) that require up to 6.8 µsec pure conversion time.

In case you don't mind: I've rewritten your assembler program for my s’AVR precompiler (and kept the replaced instructions as comments):

```
;
; Two_Bytes_To_BCD.asm
;
; Created: 1/30/2021 6:43:29 PM
; Author : cata
;
; structured AVR assembler by s'AVR

.cseg
.org 0x0000
jmp reset

.dseg
.org SRAM_START		; 0x0100  (m328pdef.inc)

.cseg
.org INT_VECTORS_SIZE ; 0x34 (m328pdef.inc)

reset:
.DEF SIX=R2

ldi	r16,6
mov	SIX,r16		; SIX=R2=6

ldi r16,low(12735)
ldi r17,high(12735)

; General idea: divide R17:R16 by 1000 (remainder max. 999)
; I. Divide by 4
lsr r17
ror r16
ror r3
lsr r17
ror r16
ror r3	; div. by 4 (the two shifted-out bits are kept in r3)

; II. Divide by 250 (max. $3fff)

; R17 is Quotient and R16 is Remainder
;	initially for the 256 division
;	and finally for the 250 division

; How many times we had to ADD SIX,
; based on the 256 Quotient? That is Quotient*SIX
mul r17,SIX  ; R1=Quotient, R0=Remainder (new ones)

; We have new Q and R but, we have to do this here.
; the 4 instr. below do two things
; first  - if Remainder>249, INC Quotient and
;							 add SIX on Remainder
; second - assures No Carry on next "add SIX" on Remainder
; cpi r16,250
; brlo PC+3
IF r16 >= #250
add r16,SIX
inc R17
ENDI
; we can't do it before the multiply; 'cause we multiply R17

; the new Quotient can be only 0 or 1; max. 3f*6= 017A
; or r1,r1		; set ZERO flag if r1=0; r1 is new Q
; breq PC+3		; branch if zero
IF r1 != #0
inc R17		; increment Quotient(R1 was one)
add r16,SIX   ; add SIX; because we INC Quotient
ENDI
add r16,r0	  ; we add Remainder on the old one
; brcc PC+7		; but if we have CARRY?
IF C
inc r17		; increment Quotient
; cpi r16,250	; can't simply add SIX    (I)
; brlo PC+3		; and that is because of the overflow
IF r16 >= #250
add r16,SIX
inc r17
ENDI
add r16,SIX	; now we can add SIX; see (I)
ENDI
; and, again Remainder should be lower than 250
; cpi r16,250
; brlo PC+3
IF r16 >= #250
add r16,SIX
inc r17
ENDI
mov r4,r17		; but we need R17(on SUBI) !
; R4 will hold the Quotient
; now  R4 = Quotient   on 250 division
; and R16 = Remainder  on 250 division

; we have a 10 bit Remainder (see I. Divide by 4)
; never mind we can manage; see below
clr r17	; hundreds digit for the Remainder
; cpi r16,150
; brlo PC+3
IF r16 >= #150
subi r16,150
mov r17,SIX
ENDI
; cpi r16,75
; brlo PC+3
IF r16 >= #75
subi r16,75
subi r17,-3
ENDI
; cpi r16,25
; brlo PC+7
IF r16 >= #25
subi r16,25
inc r17
; cpi r16,25
; brlo PC+3
EXITIF r16 < #25
subi r16,25
inc r17	; hundreds digit for the Remainder
ENDI
lsl r3		; and ReStore "the lost two bits"
rol r16
lsl r3
rol r16

clr r18		; max. 99 byte conversion to BCD
; cpi r16,60		; if greater than sixty
; brlo PC+3
IF r16 >= #60
mov r18,SIX	; the tens would be at least SIX
subi r16,60   ; and subtract 60
ENDI
; cpi r16,30		; if initial number or previous SUB
; brlo PC+3
IF r16 >= #30
subi r16,30
subi r18,-3
ENDI
; cpi r16,10		;  ... and so on
; brlo PC+7
IF r16 >= #10
subi r16,10
inc r18
; cpi r16,10
; brlo PC+3
EXITIF r16 < #10
subi r16,10
inc r18
ENDI
; now you can review the previous
; "hundreds digit for the Remainder"
; because it is the same thing

swap r18		; these are for BCD packing
or r16,r18		; and freeing R18
mov r0,r16		;    and R16

mov r16,r4		; remember we left the Quotient on R4
clr r18		; max. 69, byte conversion to BCD
; cpi r16,30		; this is smaller, because 69<99
; brlo PC+3
IF r16 >= #30
subi r16,30
ldi r18,3
ENDI
; cpi r16,20
; brlo PC+3
IF r16 >= #20
subi r16,20
subi r18,-2
ENDI
; cpi r16,10
; brlo PC+3
IF r16 >= #10
subi r16,10
inc r18
ENDI
swap r16		; again BCD packing
or R17,r16		; Result in R18:R17:R16
mov r16,r0		; used registers: R0,R1,R2,R4,R16,R17,R18
; s'AVR: R3 as well

.UNDEF SIX
; less than 80 cycles on my estimation
LOOP
nop
; rjmp PC-1
ENDL
```
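
The digit extraction in the listing (the 60/30/10/10 steps for the tens, and 150/75/25/25 for the hundreds on the quarter-scale remainder) is the same "6, 3, 1, 1" ladder both times. Below is a rough Python model of just that ladder, assuming I read the IF blocks correctly; the function name and the `unit` parameter are made up for illustration.

```python
def ladder(x, unit):
    """Return (digit, rest) with digit = x // unit, valid for 0 <= x < 10 * unit."""
    digit = 0
    if x >= 6 * unit:      # IF r16 >= #60  (or #150)
        x -= 6 * unit
        digit = 6
    if x >= 3 * unit:      # IF r16 >= #30  (or #75)
        x -= 3 * unit
        digit += 3
    if x >= unit:          # IF r16 >= #10  (or #25)
        x -= unit
        digit += 1
        if x >= unit:      # the part after EXITIF
            x -= unit
            digit += 1
    return digit, x

print(ladder(99, 10))      # (9, 9)   tens and units
print(ladder(249, 25))     # (9, 24)  hundreds from the quarter-scale remainder
```

At most four compares are needed per digit, instead of up to nine repeated subtractions.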

After compiling, of course, the result is the same assembler code as the original one (if I haven't made any mistakes), see attached.

I'm curious what your next version will look like.

Edit: Due to the test loop, the timings include RCALL, RET and I/O (1x each), which is 9 CPU cycles.

## Attachment(s):

Last Edited: Tue. Feb 2, 2021 - 06:14 PM
Total votes: 0

That's very good feedback; I saw that it is 80*62.5 ns = 5 µs,

but this idea of measuring with the oscilloscope helps me a lot.

Thank You.

On two bytes, as I said, it seems to be ~60*62.5 ns = 3.75 µs at 16 MHz, this time.

I feel it will be better on 3 bytes, too.

Now I think I should stop my mind,

and get to Atmel Studio.

Total votes: 0

Ok I found the fast code (less than 50 clk for 16bit):

https://www.avrfreaks.net/commen...

And since that is faster than my code (less than 70 clk for 16 bit), I haven't looked at it since. I guess that it can be optimized to around 60 clk, but not lower.

Add:

So perhaps a good combination of the different ideas makes a better solution, but if it takes more than 50 clk, it needs to be smarter in some other way (smaller, or using fewer registers).

And in some situations it matters which digit comes out first (so you can start sending the first digit before the whole number is found).

Last Edited: Tue. Feb 2, 2021 - 12:45 PM
Total votes: 0

I'll try my best and, after testing, post the self-imposed "mixed version".

That is to promote the idea of "divide by 1000" as a conversion method for more than 2 bytes.

For two bytes, "divide by 1000" would otherwise not be my first approach. Your method (Excel) is better.

Thank You.

Total votes: 0

Att. tested 0-65535.

; if I counted well it is between 55 to 69 clk with INPUT
;  or 53 to 67 if we do not count INPUT on R17:R16 (AVG 60clk)
; if you don't use MACRO and make a CALL it is 63 to MAX 77 clk
;  I will say it is:    63 clk as MACRO       70 clk as CALL
;
; Average on not counted input, from 0 to 65535    = 53.53 clk
;        ( measured&calculated based on AtmelStudio Counter )
; ( and this supports my "hand counting" -min.53 without INPUT)

Thank You.

PS.

Saw one mistake in comments: it is previous 5 instr.; that is without MUL; counting still good

" mul r17,SIX    ; that is 6 times QB256 + possible RB250 in RB256
; R1 can be only 0 or 1; max. 3f*6= 017A; $ff shifted right 2 times
lsr r1            ; r1=0 and possible 1 shifted to C
brcc PC+4        ; branch if zero (evident)
inc R17        ; increment Quotient(R1 was one)
add r16,SIX   ; add SIX; because we INC Quotient
brcs PC-2    ; possible one do-it-again on Carry resulted from last instruction
; On previous 6 instructions"

PS2. last version(minor modif. on comments)  only in Att. at first Post ( #1 )

## Attachment(s):

Last Edited: Tue. Feb 2, 2021 - 04:58 PM
Total votes: 0

Hope Wikipedia will mention "divide by 1000" someday, as well.

Total votes: 0

OK, here's the envelope for your # 2:

The fastest conversion was 3.88 µsec @16 MHz or 62 cycles in total, equal to 53 cycles for the conversion, since 9 cycles can be subtracted for RCALL, RET and 1x I/O due to the test loop.

I also ran El Tangas' algorithm for all 2^16 numbers:

Constant 3.63 µsec @16 MHz result in 58 cycles in total or 49 cycles for the conversion, as indicated in the source code.

Edit: Updated correct screenshot.

Last Edited: Tue. Feb 2, 2021 - 05:56 PM
Total votes: 0

Can you give me a more precise direction. Think I'm lost. I'm looking for:

Search found 1289 items

• El Tangas
Total votes: 0

see link given at #65

Total votes: 0

Thanks a lot.

I'll look. I just can't get past the fact that this is unpacked, and I packed the BCD. This is a "first look" only and I might be really wrong.

`unpacked BCD in R24:R23:R22:R21:R20`
Total votes: 0

Yep, it is 5 digits unpacked, like the quite old algorithm by Douglas W. Jones, which I have used for AVR (see above).

Total votes: 0

Is it the rjmp from the start of the Code Segment? The code itself is 49 clk; counted "by hand" and by Atmel Studio.

.cseg
.org 0x0000
rjmp reset

Also seems like a very good approach, as in "Merge Sort".

It is shorter and much more "elegant" than mine.

But, I didn't sign to enter a competition.

Simply, I wanted just to share an Idea.

Last Edited: Tue. Feb 2, 2021 - 06:19 PM
Total votes: 0

Packing cost few +steps/cycles; and we seem to be "On the Edge".

Nothing more than the evidence.

Total votes: 0

Yes, it is about the idea "Your method (Excel) is better." formulated in #66.

I never knew who the author is, and it may not have originated with sparrow2. Please excuse me.

Excel can't be used (or I can't imagine it, at this point) on really large numbers; three bytes and up.

And when I say Excel, I mean VISICALC. This is not me being rude; it is just a way of speaking.

I really do not want to offend anybody; it might be that I'm simply too tired.

Below is how I found the "magic":

ldi r20,164    ; found "164" with Excel, as suggested by:

mul r20,r16    ; https://www.avrfreaks.net/users/...

Last Edited: Tue. Feb 2, 2021 - 06:59 PM
Total votes: 0

sparrow2 wrote:

how do you do 3. in 3 clk? (easy to do in 5 clk)

add:

And why [0..16] and not [0..6]

I got mixed up with the 3-instruction packed BCD 0-16 code I posted earlier in this thread.

So 0-6 is what it should be after dividing by 100 2x.
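
In arithmetic terms the "divide by 100 twice" plan looks like this. It is a Python sketch only, with a made-up function name; the two 0..99 values would still need a 2-digit BCD step each.

```python
def pairs_of_digits(n):
    """Split a 16-bit value into (top, middle, low): top is 0..6, the others 0..99."""
    rest, low = divmod(n, 100)        # first divide by 100
    top, middle = divmod(rest, 100)   # second divide by 100
    return top, middle, low

print(pairs_of_digits(65535))         # (6, 55, 35)
```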

Total votes: 0

sparrow2 wrote:

Ok I found the fast code (less than 50 clk for 16bit):

https://www.avrfreaks.net/commen...

I count exactly 40 instructions and 50 cycles, excluding the ret instruction.

Total votes: 0

It was on the three bytes conversion.

Your code is good:

cpi r18, 10

brlo 1f

subi r18, lo8(-6)

1:
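
What those three instructions do, written as a Python one-liner model (sketch only, function name hypothetical): for a value below 20, adding 6 when it is 10 or more gives the packed BCD directly.

```python
def small_to_packed_bcd(x):
    """Packed BCD of x for 0 <= x <= 19: add 6 when the value is 10 or more."""
    if x >= 10:        # cpi r18, 10 / brlo 1f
        x += 6         # subi r18, lo8(-6)
    return x

print(hex(small_to_packed_bcd(16)))   # 0x16
```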

max 256^3-1=16 777 215 ; here we talk about 16

I'm now at 256^2 (only two bytes), so max 65535/1000 gives us max 65.

I'm dividing first by four and next by 250. So I have 65 (I think I got it wrong somewhere when speaking about QB250 on two bytes); it is only one.

This added later: found the "wrong" posted on #1, and corrected (fyi)

The remainder is 535; 999<1024 and fits in 10 bits. Q in base 250 is max. 65 and fits in one byte.

Yes, BCD unpacked needs two. This is not the case on my final result 06:55:35; this is only not to be confused with 655360.

Did I understand the problem?

Like this:

see also #13  Posted : Sat. Jan 30, 2021 - 09:55 PM (Reply to #11)

Last Edited: Wed. Feb 3, 2021 - 11:31 AM
Total votes: 0

This is posted two times. I thought I could just delete it.

Last Edited: Tue. Feb 2, 2021 - 10:12 PM
Total votes: 0

I replied to that code starting at #74.

I counted 49 clk. So did Atmel Studio. Maybe it is just a slight difference in the code.

The code has no start rjmp, so it might be measured as 51 if run the way s'AVR did in #63,

but it is not 51. It's plain and clear 49 clk.

Recovered from my email box, here's the text I replied to:

"Constant 3.76 µsec @16 MHz result in 60 cycles in total or 51 cycles

for the conversion, although it should actually be 49 according to the source

code.

I wonder where those 2 cycles are hidden ... "

Last version, of my code,  only in #1.

Last Edited: Tue. Feb 2, 2021 - 10:20 PM
Total votes: 0

Of course, if you really wanted to convert hex to decimal fast, get external hardware.  Given that the point of converting to decimal is for display, and going straight through to seven segment displays, it would not be difficult to wire up two 22V10 GALs to turn 16 bits of hex into four digits decimal on a four-digit seven-segment; and even display  'Err' if it overflows 9999.

You might be able to do it with 16V8s - Think you'll run low on pins, though.  And you have to ping-pong digits - each chip will only display one digit at a time.  A CPLD like the now-gone (and much lamented!) Xilinx XC9572 wouldn't even have that problem.

So you need seventeen pins (two eight-bit ports and a latch signal (or ten - one port and two latches, but that's slower)) and you've got it out of the AVR in four(ish) cycles.  Digit ping-ponging is left as an exercise for your external circuitry.    S.

Total votes: 0

Given that the point of converting to decimal is for display, and going straight through to seven segment displays, it would not be difficult to wire up two 22V10 GALs

I was actually wondering (insanely) how it would look to use a logic simplification for each segment (maybe combining some common terms), rather than a "calculation" approach

like: digit 2, seg E=  b15*b12*b3 + b7*b3 *(b2 + b5)  ...etc

...would be "interesting" to see the logic minimization.....get out your Quine-McCluskey or other methods

Total votes: 0

In fact I was pretty good with Hexa printed on my Tera Term.

Simply my mother visited and she saw what I was looking at.

She never asked me to do something, after I said "it's Hexa".

She left and probably my mind got screwed somehow.

Total votes: 0

avrcandies wrote:

get out your Quine-McCluskey or other methods

I have programs that do that for me.

But a preliminary think says it should be doable.  Have not yet built and tested a solution - if you don't like, your money back.  S.

Total votes: 0

I'll simply make a table. The real challenge will be sorting the table;

a permutation of BRLO's, in these large GB ram we have today,

works very nicely with multicore technology. Don't you think?

Total votes: 0

Catalin Ioan Stanciu wrote:

"it's Hexa".

For some of my very early AVR designs, I didn't even bother to convert the output into proper ASCII hex.  I just added 0x30 to each nibble and learned to interpret the punctuation marks that came out when it overran the 0-9 range.   S.

? = F, among others.  S.
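
For anyone who wants to see the whole "alphabet", here is a quick Python check of what adding 0x30 to each nibble prints (the function name is just for illustration):

```python
def lazy_hex_digit(nibble):
    """Add 0x30 to a 4-bit value without correcting for A..F."""
    return chr(0x30 + (nibble & 0x0F))

print("".join(lazy_hex_digit(n) for n in range(16)))   # 0123456789:;<=>?
```

So A..F come out as `:` `;` `<` `=` `>` `?`.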

Total votes: 0

I've replied in romanian(to my mother), and translated only the HIGH part to you.

The LOW got out twice, unintentionally untranslated and undetected.

Last Edited: Tue. Feb 2, 2021 - 11:37 PM
Total votes: 0

Lost in Translation ...

I have a strange brain which works funny on me.

Now that hex is gone, I felt like I lost something and

I thought "it surely must be speed".

Total votes: 0

Just a month into the year and we are already discussing the classic binary to decimal conversion. This time I was busy and arrived late to the party

I guess I'll just post some links to our previous discussions, even some that have already been mentioned. This way, next time I can just refer to this thread

https://www.avrfreaks.net/forum/optimizing-libc-integer-conversion-routines

https://www.avrfreaks.net/forum/integer-string

https://www.avrfreaks.net/forum/fractional-division

https://www.avrfreaks.net/forum/division-only-bitwise-shifts

https://www.avrfreaks.net/forum/fast-dividing-5-and-10-uint32t-input-data-atmega16a

https://www.avrfreaks.net/forum/avr-assembler-extract-each-3-digit-number-each-register-r23-r24-r25

Total votes: 0

You're most welcome!

It is very good to have both you and the links here.

And now to the point.

A computer should do what I want, and not me to do
the way someone else likes it.
Multiplication is the way we (humans) do division.
On large numbers we do division by 1000; thousands millions billions trillions etc.
This is one method to do /1000, applied by chance
on conversion from binary Int to BCD.

I hoped I can pass this to others.

PS1. Last version, of my codes,  only in #1.  Please excuse inherent and inadvertent errors.

Sources are tested, but not the comments. Suggested comments are welcome.

PS2. On my first post I intended to go four bytes and up, but I had to go down to two, for explanations.

Last Edited: Wed. Feb 3, 2021 - 11:46 AM
Total votes: 0

Catalin Ioan Stanciu wrote:

This is posted two times. I thought I could just delete it.

I replied to that code starting at #74.

I counted 49 clk. So did Atmel Studio. Maybe it is just a slight difference in the code.

I think you are looking at the incomplete version that's missing the clr r1.  If you count the instructions excluding ret in the final version, it's 40 instructions, 10 of which are mul that take 2 cycles, so the total is 50 cycles.

Total votes: 0

It's 39+10, in the post you reposted from sparrow2's link, at #78

"sparrow2 wrote:

Ok I found the fast code (less than 50 clk for 16bit):

https://www.avrfreaks.net/commen...

I count exactly 40 instructions and 50 cycles, excluding the ret instruction."

Last Edited: Wed. Feb 3, 2021 - 12:46 PM
Total votes: 0

Catalin Ioan Stanciu wrote:
I count exactly 40 instructions
Just an idea but isn't it a whole heap easier to get the AS7 simulator to count the cycles for you? (no chance of error that way).

Total votes: 0

Sorry, I counted 39+10, on a post(reposted) by ralphd , and he said it's 50.

added later: #78 Posted : Tue. Feb 2, 2021 - 11:19 PM, by ralphd

I already counted with AtmelStudio, and emphasize it, in previous posts.

BTW. Nice to have you here, Sir.

Last Edited: Wed. Feb 3, 2021 - 01:13 PM
Total votes: 0

the clr r1 is only to make the C compiler happy, nothing to do with the algorithm!

Total votes: 0

Thank You.

Total votes: 0

Now, when we talk about packed and unpacked (and sometimes ASCII), it depends on how the result is used!

It doesn't make sense to make a fast algorithm if you need to do a mapping later.

In most cases you don't want packed BCD (yes, I know it only takes 3 clk to unpack a byte).

The most common way is to place the result (16-bit int version) in 5 registers in order, so you can read the result via a pointer.

Note: this doesn't work on AVR0, AVR1 etc., since they are not compatible with "real" AVRs.

Last Edited: Wed. Feb 3, 2021 - 02:29 PM
This reply has been marked as the solution.
Total votes: 0

Catalin Ioan Stanciu wrote:
A computer should do what I want, and not me to do
the way someone else likes it.

Indeed. The beauty of this classic problem is that it can be solved in many interesting ways, that's why I like it. You can see I've been interested in this for almost 20 years:  https://board.flatassembler.net/topic.php?t=3924

For example, your approach:

Catalin Ioan Stanciu wrote:

I'm working on a two byte version, hoping that it would be easier to understand the algorithm.

I do believe that is the algorithm that causes you difficulties.

It's similar to making the division by hand, with a pencil.

int(32735/256)=127    mod(32735,256)=223  ; you don't have to do this because it's just BYTE1 and BYTE2

on the other hand we are looking for:

int(32735/250)=130    mod(32735,250)=235

Based on 256=250+6: this means that 127*6+223=985

now int(985/256)=3  mod(985,256)=217   ; we are keeping score on the quotient!

and again 3*6+217=235,   BUT THIS TIME the remainder is smaller than 250.

So the result will be 127+3=130 and the remainder is 235

Does this make sense?

Took me a while to figure out, and that is a good thing

32735 = 127*256 + 223 = 127*(250 + 6) + 223 = 127*250 + 127*6 + 223 = 127*250 + 985

and then we divide 985 by 250.

I like it.
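
The worked example can be written as a small Python model. This is only a sketch of the arithmetic (the real code works on registers and handles the carries one by one), and the function name is made up:

```python
def div_250(n):
    """Return (n // 250, n % 250) using only byte splits and a multiply by 6."""
    quotient = 0
    while n >= 256:
        high, low = n >> 8, n & 0xFF   # divide by 256: just the two bytes
        quotient += high               # keep score on the quotient
        n = high * 6 + low             # 256 = 250 + 6, so each 256 leaves 6 over
    if n >= 250:                       # final correction: remainder must be < 250
        quotient += 1
        n -= 250
    return quotient, n

print(div_250(32735))   # (130, 235)
```

For 32735 the loop runs twice (127 and 223, then 3 and 217), exactly as in the pencil-and-paper steps.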

Last Edited: Wed. Feb 3, 2021 - 03:46 PM
Total votes: 0

You are the first; you noticed the algorithm. Great!

cited below from my Two Bytes:

";  depending on INPUT. And 19 to 33 clk from start."

that's after /1000

So, I made the Two Bytes /1000 in less than 33clk.

Of course it'll take longer on many bytes; but it is a beginning.

That's the idea.

Thank you.

Last Edited: Wed. Feb 3, 2021 - 04:12 PM
Total votes: 0

The title of this post is:

"Fast conversion of Integer to BCD; assembly atmega328p"

But,  it is just to make an example of a /1000 usage.

I dislike competition but, I joined it to make my point.

You are completely right about packed and unpacked,

and you helped me a lot.

Thank You.

Total votes: 0

I am reopening this post for a small, but important issue.

I am using the word "Remainder" as in a result of Modulo(Integer,1000).

You are using remainder as the decimal part of an integer division.

When I look at https://en.wikipedia.org/wiki/Re..., I might

think that I haven't incorrectly used the word.

Am I wrong?

Last Edited: Fri. Feb 5, 2021 - 06:00 PM
Total votes: 0

Nope, remainder is where you for example divide 13 by 3 and get the quotient 4 with a remainder of 1. 3 goes into 13 four times with 1 left over.

It's true that 13/3 can also be seen as 4.33, where the 4 is the integer result and the .33 (which represents the "remainder 1") is the fractional result, but it would be wrong to refer to the .33 part of this as the "remainder".

Total votes: 0

So, I just raised the problem to El Tangas:

"so the next digits pop up from the remainder just by mul with 10"

I think that's the decimal part of the result of dividing the

"Two Bytes" Number by 1000.

He multiplies the number by the constant 256^3/1000;

that is, multiplying by 256^3 and then dividing by 1000; he is making an unambiguous approximation.

Thus, you have to round the decimal part on very, very many divisions. It's normal.

The problem is that the Remainder is an Integer, not a Fractional Number.

So, I think here he was unintentionally misleading.

When you multiply by 10 the decimal part of a division it's obvious that the "next digits pop up".

This is not True about Remainder.
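
To make the difference concrete, here is a Python sketch of the "multiply by a scaled 1/1000, then pull digits out of the fractional part" idea. It is not El Tangas's code: the 32 fractional bits and the rounded-up constant are my assumptions, chosen so that the approximation error stays small enough over 0..65535 that no correction step is needed.

```python
FRACTION_BITS = 32                            # assumption: wide enough to need no correction
MASK = (1 << FRACTION_BITS) - 1
K = (1 << FRACTION_BITS) // 1000 + 1          # 1/1000 in fixed point, rounded up

def digits_by_fraction(n):
    """Decimal digits of a 16-bit n, taken from the fractional part of n/1000."""
    x = n * K                                 # n/1000 with 32 fractional bits
    thousands = x >> FRACTION_BITS            # integer part: 0..65
    frac = x & MASK                           # fractional part, NOT the remainder
    out = list(divmod(thousands, 10))
    for _ in range(3):
        frac *= 10                            # the next digit pops up as the integer part
        out.append(frac >> FRACTION_BITS)
        frac &= MASK
    return out

print(digits_by_fraction(65535))              # [6, 5, 5, 3, 5]
print(65535 % 1000)                           # 535, the integer remainder
```

The remainder 535 is an integer; the value that gets multiplied by 10 is roughly 0.535 in fixed point.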

Last Edited: Sat. Feb 6, 2021 - 01:00 AM
Total votes: 0

I was trying to understand El Tangas's algorithm, and I encountered difficulties.

Then it popped into my mind that maybe he doesn't mean Remainder.

I think that many people are using Remainder as "the decimal part of a fractional number",

but it is not absolutely correct.

Thank you.

Last Edited: Fri. Feb 5, 2021 - 06:06 PM
Total votes: 0

clawson wrote:
it would be wrong to refer to the .33 part of this as the "remainder".

Catalin Ioan Stanciu wrote:

I think that many people are using Remainder as "the decimal part of a fractional number",

but it is not absolutely correct.

That's right, I mean the fractional part, not "remainder" in a strict mathematical sense. Sorry if it caused confusion.

IIRC I used some kind of fixed point format.

Last Edited: Fri. Feb 5, 2021 - 06:18 PM
Total votes: 0

I studied your algo. Though it uses approximation, it's on average just a few fractions of a cycle faster than mine.

I tried hard to get one instruction faster on my algo, but in vain.

I guess approximation works harder on many bytes. I'll try there. :-)

I'll update all my sources, from time to time, on first post. (#1)

Thanks a lot.

Total votes: 0

Are you now talking about a 16 or 24 bit version, packed or unpacked, and what is the number of clocks?

Then I will see if I can solve it in a better way :)

For others: yes, I do this for fun, like a crossword; since we know it can take less than 1% of the AVR's time, there is no need!

Total votes: 0

I'm hopeless. I'm still at contest; can't abstain.

Still the Two Bytes, where I might have found something; though Not Sure Yet!

If you discount the BCD packing, on my 16 bit, I'm at 49.53 clk AVG. <---WRONG, it is 51.53

That's one instruction, and AVG will fall below 49.

Yes, I know I'll be up on MAX; but as I said: "I can't abstain".

Last Edited: Sat. Feb 6, 2021 - 02:19 PM
Total votes: 0

try this on debug, AS7:

ldi r16,low(01234)

ldi r17,high(01234)

it gives:

and without first zero:

ldi r16,low(1234)

ldi r17,high(1234)

gives:

01234 means Octal(1234), which equals Hex(29C). It seems that I got screwed up by this!
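
The numbers are easy to check in Python (which itself refuses a literal like 01234, so the string form is used here):

```python
octal_value = int("1234", 8)      # how 01234 is read, per the observation above
decimal_value = int("1234", 10)   # what was intended

print(octal_value, hex(octal_value))   # 668 0x29c
print(decimal_value)                   # 1234
```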

Total votes: 0

These are measurements without counting INPUT and without BCD packing.

I am wrong now, and I was wrong also the first time I did the measurement:

my original version (V2) is         AVG=50.532 Clks   on 0-65535

my current version (see below) AVG=51.98 Clks   on 0-65535

The current version has this code below:

;------

add r16,r0
adc r1,r1
breq PC+11
; r1 can be 1 or 3
dec r1
breq PC+3        ; it means it was 1, but now is 0
inc r17
mov r1,SIX
add r1,SIX    ; now r1 is SIX or TWELVE
inc r17        ; increment Quotient
add r16,r1    ; add SIX or TWELVE
brcc PC+3    ; possible one do-it-again on Carry resulted from last instruction
inc r17
add r16,SIX
; previous 13 instructions: min. 4 and max. 14 clk

;---------------

and this is worse than V2 version:

;----------

lsr r1            ; r1=0 and possible 1 shifted to C
brcc PC+4        ; branch if zero (evident)
inc R17        ; increment Quotient(R1 was one)
add r16,SIX   ; add SIX; because we INC Quotient
brcs PC-2    ; possible one do-it-again on Carry resulted from last instruction
; On previous 5 instructions
; if High(INPUT)>$A8 5 or 9 clk, depending on RB256
; if High(INPUT)<$A8 3 or 5 clk, depending on RB256

add r16,r0      ; we add Remainder on the old one
brcc PC+4        ; but if we have CARRY?
inc r17        ; increment Quotient
add r16,SIX    ; add SIX; because we INC Quotient
brcs PC-2    ; possible one repeat on Carry resulted from last instruction
; if there were 6 or  9 on previous paragraph we have only 3 clk  here
; if there were 3 or  5 on previous paragraph we have 3 to max. 9 here
;    so, this is somehow compensated (on both) to 6 to max. 15 clk

;---------

Where I can see that "min. 4 and max. 14 clk" is worse than "6 to max. 15 clk"

"much ado about nothing"

Last Edited: Sat. Feb 6, 2021 - 11:13 PM
Total votes: 0

Do you know the record on 24 and 32 bits?

Total votes: 0

Nope.

And it will start to get crowded in the registers so perhaps it should include store result in RAM.

Total votes: 0

(FYI)

My "posted here" Three Bytes(24bit) version is AVG=145.76 from 00:00:00 to FF:FF:FF

My "posted here" Three Bytes(24bit) version is AVG=148.07 from FF:00:00 to FF:FF:FF

(it'll take hours to count from 00:00:00 to FF:FF:FF; it should thus be under 150 clk)

Last Edited: Sat. Feb 6, 2021 - 11:08 PM
Total votes: 0

49.86 clk

My First Version on Two Bytes AVG under 50 Cycles.

## Attachment(s):

Total votes: 0

nice job

yes the 99 split works fine without correction for 65, the only errors are at 69 79 89 and 99.(I write 64 because the check is for that bit)

See #41 there are all the numbers.

Add:

One thing could actually be to make a version that doesn't use the HW mul. That will be slower (about 70-80 clk), but by far the fastest one on chips like the tiny85.

Last Edited: Mon. Feb 8, 2021 - 10:37 AM
Total votes: 0

You made me do that nice job; and I'm very glad you did it!

You are right about 65; it was in my mind to change something in comment there, too.

But, I'm still searching for why I can't get one instr. faster. Can't move along until that happens.

HW mul is very powerful; I can make "a touch" on the subject, as a pause.

Yes, it would be a good thing to make a change in my thoughts; at least for a while.

Total votes: 0

Guess the first "MUL by 6" can be replaced with something like:

add R16,R16

mov Rn,R16

add Rn,Rn

add Rn,R16

brcc      ; check Carry

The rest can be done with "my 12.5clk version"; see #25.

I'll try.

Total votes: 0

AVG 73.18 CLK

This is a DRAFT, still untested, but seems to work; I'll try to make it faster and update.

Att. Removed. It is WRONG. Has 26200/65536 errors on output. I'll be back.

Replaced .DEF SIX=R18 with .DEF SIX=R19; because I was using R18!

Now works 0-65535.

; 5.18902e+06    Counter with program
; 393215         Counter run empty
; 4.7958e+06     Difference
; 73.18          Difference/65536

## Attachment(s):

Last Edited: Mon. Feb 8, 2021 - 02:35 PM
Total votes: 0

On an AVR you normally don't check for carry, but have a register that is zero, so you just ADC with zero (I always use R15, don't use R1 as GCC)

Total votes: 0

Yes, this I'll have to learn.

It's just a rapid transformation; copy and paste ... Draft, No Deep Thinking. Please excuse.

Thanks.

Total votes: 0

See Att. tested ATtiny85 BCD Packed version.

Result BCD packed in R18:R17:R16
Used Registers: R0,R2,R3,R16,R17,R18

Max 83 & Min 65 Cycles and measured AVG=73.18
Used Code Segment memory 84 Words

I believe The Unpacked version will be Max<80 Cycles, also.

## Attachment(s):

Total votes: 0

but have a register that is zero, so you just ADC with zero

I'm doing the same but, Here I have both to:

- Increment Quotient

- Add Six to Remainder

And I know I spent a few days optimizing this, because the "below 50" version uses this.

Last Edited: Mon. Feb 8, 2021 - 04:17 PM
Total votes: 0

See Att. retested ATtiny85 BCD UnPacked version.

Result in:   R20:R19:R18:R17:R16   ( R0 is also used )

Max 74 & Min 56 Cycles and measured AVG=65.18

Used Code Segment memory 76 Words

The Unpacked version is less or equal than 74 Cycles.

Above modifications in Red, due to:

mov r0,r17    ;
add r0,r0        ; These are replacement for MUL ( +3 clk )
add r0,r17    ;  as suggested by
add r0,r0        ;  https://www.avrfreaks.net/users/...
sbc r19,r19    ; r19 = minus carry

sub  R17,r19   ; add carry

## Attachment(s):

Total votes: 0

If this code

```  clr r19		;
mov r0,r17	;
add r0,r0		;
add r0,r17	; These are replacement for MUL ( +4 clk )
add r0,r0		;
adc r19,r19	;

add  R17,r19    ; No Carry```

is replaced by

```  mov r0,r17	;
add r0,r0		;
add r0,r17	; These are replacement for MUL ( +4 clk )
add r0,r0		;
sbc r19,r19	; r19 = minus carry

sub  R17,r19    ; add carry```

and nothing gets broken (I didn't check) a cycle may be saved.
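
A quick way to convince yourself that the two sequences agree is to model the 8-bit registers and the carry flag in Python. This is a sketch under the assumption that r17 is at most $3F here (as after the divide by 4), so only the last add can produce a carry; the helper names are made up.

```python
def add8(a, b):
    """8-bit ADD: return (result, carry)."""
    s = a + b
    return s & 0xFF, s >> 8

def times6_adc(r17):
    r19 = 0                          # clr r19
    r0 = r17                         # mov r0,r17
    r0, c = add8(r0, r0)             # add r0,r0    (2x)
    r0, c = add8(r0, r17)            # add r0,r17   (3x)
    r0, c = add8(r0, r0)             # add r0,r0    (6x), carry = bit 8
    r19 = (r19 + r19 + c) & 0xFF     # adc r19,r19  -> r19 = carry
    r17 = (r17 + r19) & 0xFF         # add r17,r19
    return r0, r17

def times6_sbc(r17):
    r19 = 0xA5                       # any old value: no clr needed
    r0 = r17                         # mov r0,r17
    r0, c = add8(r0, r0)             # add r0,r0
    r0, c = add8(r0, r17)            # add r0,r17
    r0, c = add8(r0, r0)             # add r0,r0
    r19 = (r19 - r19 - c) & 0xFF     # sbc r19,r19  -> 0x00 or 0xFF (minus carry)
    r17 = (r17 - r19) & 0xFF         # sub r17,r19  -> r17 + carry
    return r0, r17

print(times6_adc(0x3F), times6_sbc(0x3F))   # (122, 64) (122, 64)
```

Both return the low byte of 6*r17 in r0 and r17 plus the carry, so the `clr` is the cycle that can be dropped.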

Total votes: 0

I checked. This is finesse!

Total votes: 0

47.86 clk

My Second Version on Two Bytes AVG under 50 Cycles. (AVRe)

Tested.

; 3529736    Counter with program
;  393215    Counter run empty
; 3136520    Difference
47.86    Difference/65536       My Second Under 50 Cycles Average!
; uses 7 MUL's so it should be         43 Words length  Min/Max 47/50
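
For clarity, this is how the average is derived from the two cycle-counter readings (a Python sketch; the variable names are mine):

```python
with_program = 3529736   # cycle counter, test loop with the conversion
run_empty = 393215       # cycle counter, same loop without the conversion
conversions = 65536      # every input 0..65535 once

average = (with_program - run_empty) / conversions
print(round(average, 2))   # 47.86
```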

It was indeed a welcome pause to do the:

ATtiny85 BCD UnPacked version.

Result in:   R20:R19:R18:R17:R16   ( R0 is also used )

Max 74 & Min 56 Cycles and measured AVG=65.18

Used Code Segment memory 76 Words

The unpacked version takes at most 74 cycles.

## Attachment(s):

Total votes: 0

Presumably, things like itoa() and printf() must be doing basically this - so has anyone looked at their implementations ... ?

Top Tips:

1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...
Total votes: 0

No.

We are speaking here about 65 ns (at 16 MHz)!  <-- There, in red, it was mistakenly written with an "S".

This is a case of "I have to do it because it is there" (a phrase usually attributed to George Mallory, asked in 1923 why he wanted to climb Mount Everest; Edmund Hillary and Tenzing Norgay first reached the summit in 1953).

Added later: Maybe some should look at my "/1000" algorithm.

Last Edited: Wed. Feb 10, 2021 - 11:52 AM
Total votes: 0

Catalin Ioan Stanciu wrote:
65 nS

lowercase 's' for seconds.

(uppercase 'S' = siemens)

https://www.avrfreaks.net/commen...

Last Edited: Wed. Feb 10, 2021 - 11:32 AM
Total votes: 0

s = second. I do apologize for any inconvenience.

Time is one of the seven fundamental physical quantities in both the International System of Units (SI) and International System of Quantities.

Total votes: 0

Seven for the Dwarven Lords, in their halls of stone.

Total votes: 0

I deleted the original post.

It was inappropriate.

Last Edited: Wed. Feb 10, 2021 - 01:24 PM
Total votes: 0

Seven Units for Seven Quantities ... ?

Total votes: 0

Here was the answer back in 2016:

https://www.avrfreaks.net/commen...

So itoa() takes about 875 clk for 12345.

(And since AVRs with and without HW MUL take about the same time, I guess it uses a normal division routine.)

But perhaps the libs have changed.

Add:

Whether it's a signed or unsigned version doesn't really make a difference (neg16 takes 3 clk).
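As a baseline for comparison, the "normal division" way of getting the digits looks roughly like the C sketch below. This is not avr-libc's actual code, only an illustration of the repeated divide-by-10 approach; the function name `u16_to_unpacked_bcd` and the digit order are assumptions.

```c
#include <stdint.h>

/* Baseline for comparison only; not avr-libc's actual code.
 * digits[0] is the ones digit, digits[4] the ten-thousands digit. */
void u16_to_unpacked_bcd(uint16_t n, uint8_t digits[5])
{
    for (int i = 0; i < 5; i++) {
        digits[i] = (uint8_t)(n % 10u);   /* remainder is the next digit */
        n = (uint16_t)(n / 10u);          /* quotient carries on */
    }
}
```

Each of the five steps needs a full 16-bit divide by 10, which on an AVR without a hardware divider is a software loop; that is the cost the routines in this thread are trying to avoid.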

Last Edited: Wed. Feb 10, 2021 - 03:01 PM
Total votes: 0

sparrow2 wrote:
So itoa() takes about 875 clk for 12345.

So just the binary-to-BCD part should be a little less?

Total votes: 0

Sweet Dreams (Are Made Of This).

Who am I to disagree?

cited from "Eurythmics, Annie Lennox, Dave Stewart - Sweet Dreams (Are Made Of This) (Official Video)".

Total votes: 0

sparrow2 wrote:
But perhaps the libs have changed.
I just tried a couple and got broadly similar answers for cycles in the simulator. I was a bit surprised that megas (MUL) actually return almost identical numbers to tiny.

The core code (itoa leads to ultoa_ncheck) is here:

http://svn.savannah.gnu.org/view...

Total votes: 0

But that code uses 32-bit; there should be a 16-bit version!?

Total votes: 0

For the sake of the argument,

let us say that the Assembly Language is like a motorcycle

and other programming languages are like a car.

Now, if you are on a motorcycle you know that drivers may fail to see motorcycles in plain sight.

You may feel you want to ride a motorcycle or not,

but for me being on a motorcycle there is no point in arguing with someone who may fail to see me.

Total votes: 0

I have a 4-byte (32-bit unsigned) version of about 170 clk; I hope to finish debugging and testing it in the next few days.

Is there really a need for numbers larger than 256^4-1 = 4_294_967_295?

This post is about speed and anyone may question the need for speed.

As I've already said,

nobody's compelled to do caving discovery or mind entertainment exercises.