## AVR Assembler: Extract Each Digit of a 3-Digit Number into Its Own Register (R23, R24, R25)


I have written 3-digit 7-segment multiplexing code in assembler, and it works very well. The input is a number in the range 000-999, so I need an ASM function that takes this number and puts each digit into its own register. My display routine then uses each digit as an index into an array and reads the 7-segment hex code that is sent to the display to light the correct segments.

Here is an example:

The input number is 432, and after the function is done I need these values in the registers:

R23 - 04

R24 - 03

R25 - 02

So I looked at Atmel's avr200.asm and tested this code:

```
.include "m328pdef.inc"

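.CSEG
.ORG	0x0000
rjmp RESET					; the reset vector lives at address 0

; 7-segment patterns for the digits 0..9 (no interrupts are used here,
; so the table can sit right after the reset vector)
digit: .DB 0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F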

RESET:
; INIT - Stack Pointer
ldi		R16, HIGH(RAMEND)
out		SPH, R16
ldi		R16, LOW(RAMEND)
out		SPL, R16

;***** Subroutine Register Variables

.def	drem16uL=r14
.def	drem16uH=r15
.def	dres16uL=r16
.def	dres16uH=r17
.def	dd16uL	=r16
.def	dd16uH	=r17
.def	dv16uL	=r18
.def	dv16uH	=r19
.def	dcnt16u	=r20

; Load dividend (dd16u) and divisor (dv16u): 432 (0x01B0) / 100 = 4, remainder 32
; Dividend
ldi		dd16uH, HIGH(0x01B0)
ldi		dd16uL, LOW(0x01B0)

; Divisor 1
ldi		dv16uH, HIGH(100)
ldi		dv16uL, LOW(100)
call	div16u

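ldi		ZL, LOW(2*digit)			; Z = byte address of the table in flash
ldi		ZH, HIGH(2*digit)
add		ZL, dres16uL				; index the table with the quotient (4)
adc		ZH, dres16uH				; quotient high byte is 0 here, so only the carry is added
lpm		R23, Z						; Read the pattern for digit 1 from flash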

loop:
rjmp loop

;***** Code

div16u:
clr	drem16uL			; clear remainder Low byte
sub	drem16uH, drem16uH	; clear remainder High byte and carry
ldi	dcnt16u,17			;init loop counter
d16u_1:
rol	dd16uL				;shift left dividend
rol	dd16uH
dec	dcnt16u				; decrement counter
brne d16u_2			    ; if done
ret						;    return
d16u_2:
rol	drem16uL		    ;shift dividend into remainder
rol	drem16uH
sub	drem16uL,dv16uL		;remainder = remainder - divisor
sbc	drem16uH,dv16uH		;
brcc	d16u_3			;if result negative
add	drem16uL,dv16uL		;    restore remainder
adc	drem16uH,dv16uH
clc						;    clear carry to be shifted into result
rjmp	d16u_1			;else
d16u_3:	sec					;    set carry to be shifted into result
rjmp	d16u_1
```

I added code that uses the quotient of the first division (04) as an index into the array and reads the value 0x66, which shows the digit 4 when sent to the 7-segment display.

In the debugger I see that drem16uH:drem16uL holds 32 in my example, so I need to get 3 and 2 and put them in the registers listed above. As you know, 432 / 100 = 4.32; the AVR routine keeps the whole number 4 as the quotient and puts 32 in the remainder registers. How can I efficiently get 3 and 2 from that?

Thanks.

In the code above I divide 432 by 100 and put 4 into register R23, and 32 is left in the drem16uL and drem16uH registers.

So I now have the idea of extending the program like this:

432 / 100 = 4, remainder 32 => R23 = 04
32 / 10 = 3, remainder 2 => R24 = 03
R25 = 02

I divide 432 by 100 and get 4 with remainder 32; the 4 goes into R23. Then I take the remainder 32, divide it by 10 and get 3 with remainder 2; the 3 goes into R24 and the remainder 2 goes into R25.

This is just an idea, so if someone has another one please write. I know the program above is not very efficient in cycle count: it takes 13 µs to execute, and a second division by 10 will add at least another 4-5 µs, so the whole thing will take about 20 µs. That seems very long to me, because my temperature measurement reads a value every 500 ms and this function then has to extract each digit into the corresponding registers.

Ideas?
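The plan above can be sketched as two calls to the division routine. This is only a sketch under assumptions: it reuses the `.def` names from the listing above and expects a complete avr200-style `div16u` that returns the quotient in dres16uH:dres16uL and the remainder in drem16uH:drem16uL.

```asm
; Sketch only: split 432 into hundreds, tens and ones with two divisions.
; Assumes the .def names and a working div16u as described above.
ldi		dd16uL, LOW(432)		; dividend = 432
ldi		dd16uH, HIGH(432)
ldi		dv16uL, LOW(100)		; divisor = 100
ldi		dv16uH, HIGH(100)
rcall	div16u					; quotient -> dres16uH:L, remainder -> drem16uH:L
mov		r23, dres16uL			; hundreds digit

mov		dd16uL, drem16uL		; the remainder becomes the next dividend
mov		dd16uH, drem16uH
ldi		dv16uL, LOW(10)			; divisor = 10
ldi		dv16uH, HIGH(10)
rcall	div16u
mov		r24, dres16uL			; tens digit
mov		r25, drem16uL			; what is left is the ones digit
```

The hundreds digit is copied out before the dividend registers are reloaded, because dres16u and dd16u are the same physical registers (r16/r17) in that listing.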

Forget dividing; just use a binary-to-BCD conversion. Here is one for 16 bits. It can easily be reduced to 8 bits (0-255) by starting with the hundreds code.

It comes from here, go take a look: http://www.avr-asm-tutorial.net/avr_en/calc/CONVERT.html#bin2bcd

```
; Bin2ToBcd5
; ==========
; converts a 16-bit-binary to a 5-digit-BCD
; In: 16-bit-binary in rBin1H:L, Z points to first digit
;   where the result goes to
; Out: 5-digit-BCD, Z points to first BCD-digit
; Used registers: rBin1H:L (unchanged), rBin2H:L (changed),
;   rmp
; Called subroutines: Bin2ToDigit
;
Bin2ToBcd5:
push rBin1H ; Save number
push rBin1L
ldi rmp,HIGH(10000) ; Start with ten-thousands
mov rBin2H,rmp
ldi rmp,LOW(10000)
mov rBin2L,rmp
rcall Bin2ToDigit ; Calculate digit
ldi rmp,HIGH(1000) ; Next with thousands
mov rBin2H,rmp
ldi rmp,LOW(1000)
mov rBin2L,rmp
rcall Bin2ToDigit ; Calculate digit
ldi rmp,HIGH(100) ; Next with hundreds
mov rBin2H,rmp
ldi rmp,LOW(100)
mov rBin2L,rmp
rcall Bin2ToDigit ; Calculate digit
ldi rmp,HIGH(10) ; Next with tens
mov rBin2H,rmp
ldi rmp,LOW(10)
mov rBin2L,rmp
rcall Bin2ToDigit ; Calculate digit
st z,rBin1L ; Remainder are ones
sbiw ZL,4 ; Put pointer to first BCD
pop rBin1L ; Restore original binary
pop rBin1H
ret ; and return
;
; Bin2ToDigit
; ===========
; converts one decimal digit by continued subtraction of a
;   binary coded decimal
; Used by: Bin2ToBcd5, Bin2ToAsc5, Bin2ToAsc
; In: 16-bit-binary in rBin1H:L, binary coded decimal in
;   rBin2H:L, Z points to current BCD digit
; Out: Result in Z, Z incremented
; Used registers: rBin1H:L (holds remainder of the binary),
;   rBin2H:L (unchanged), rmp
; Called subroutines: -
;
Bin2ToDigit:
clr rmp ; digit count is zero
Bin2ToDigita:
cp rBin1H,rBin2H ; Number bigger than decimal?
brcs Bin2ToDigitc ; MSB smaller than decimal
brne Bin2ToDigitb ; MSB bigger than decimal
cp rBin1L,rBin2L ; LSB bigger or equal decimal
brcs Bin2ToDigitc ; LSB smaller than decimal
Bin2ToDigitb:
sub rBin1L,rBin2L ; Subtract LSB decimal
sbc rBin1H,rBin2H ; Subtract MSB decimal
inc rmp ; Increment digit count
rjmp Bin2ToDigita ; Next loop
Bin2ToDigitc:
st z+,rmp ; Save digit and increment
ret ; done
```

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

That looks very interesting. Before I study the code to see how it is done, can you please tell me if this will work for me? I read the temperature in hexadecimal format, so my value is for example 0x1B0 (432). Does that need to be converted to binary before I can call bin2bcd? If yes, do you have example code for converting hex to binary in ASM?

Thanks.

Well, the above routine IS effectively doing division (by repeated subtraction of 10000, 1000, 100 and then 10), so yes, it should work.

Personally I can't think of a way of extracting decimal digits that does not involve division/repeated subtraction.

EDIT: obviously if it's just 0..999 then you only need the 100 and 10 parts.

Last Edited: Sun. Apr 15, 2018 - 05:10 PM

I googled and found that for BCD I need a BCD driver between the AVR and the 7-segment display that translates BCD to the 7-segment format, so one more IC. Hmm, I would rather not use one more IC because I love simplicity, so multiplexing with 3 NPN transistors with 10k resistors on their bases will be fine. So are there no other suggestions for extracting the digits the way I wrote above? Divide by 100 and take the result, then divide the remainder by 10 and take the result and the remainder, and you have all three digits. The BCD conversion looks very nice in the simulator, only 6 µs, but I can't send BCD to a 7-segment display directly from the AVR, if I understand correctly.

You said in first post you have the display working fine

I have written in assembler 3-Digit 7Segment Multiplexing Code that works very well...

Is the display working or not? That is pretty trivial, just one transistor per digit (search for display multiplexing). Each digit shares the common segment lines. Of course, sharing means dimmer (for example, with 5 digits each one is on 20% of the time and off 80% of the time!).

for 3 to 5 digits, assign 3 to 5 registers to hold bcd values (0-9)  & keep them on the display.

AFTER you have the display showing the register values properly (digits!), THEN worry about doing a conversion (put the conversion BCD results in those registers)


OK, thanks. But just to ask: for this BCD conversion, do I need a special chip that converts BCD from the AVR registers to 7 segments or not?

If I don't need a special decoder chip (e.g. 74LS47 or similar), I will go ahead and write BCD conversion code to study. This is unfamiliar to me, and I am a beginner, which is why I ask.

You can do the BCD conversion with a function or a macro.

Some older CPUs had hardware support for this: the 8080 has the DAA instruction and the 6502 has a decimal mode. Since the AVR is RISC, we can emulate the 8080 DAA instruction as:

```
DAA:
; The 8-bit number in the accumulator (r24) is adjusted to form two
; four-bit Binary Coded Decimal digits by the following process:
;
; 1. If the value of the least significant 4 bits
; of the accumulator is greater than 9 or if the
; AC (half carry) flag is set, 6 is added to the accumulator
; 2. If the value of the most significant 4 bits
; of the accumulator is greater than 9 or if the CY
; flag is set, 6 is added to the most significant
; four bits of the accumulator
; Note: All flags are affected

in r25,SREG			; capture C and H from the preceding add
clr r23				; r23 collects the correction value
sbrc r25,SREG_H		; half carry set?
rjmp DAA_adjlo
mov r22,r24
andi r22,0x0F
cpi r22,10
brlo DAA_hi			; low digit is 0..9, no low correction

DAA_adjlo:
ldi r23,0x06

DAA_hi:
sbrc r25,SREG_C		; carry set?
rjmp DAA_adjhi
cpi r24,0x9A
brlo DAA_end		; value is 0x99 or less, no high correction

DAA_adjhi:
ori r23,0x60

DAA_end:
add r24,r23			; apply the correction
clc
sbrc r23,6			; was the high digit corrected?
sec					;   then report a decimal carry
ret
```

This will convert the r24 value (which I usually call ACC) to or from BCD using the cY and hCY flags.  A few temp registers are used. Be sure to protect these in the caller if they should not be trashed.

Complement arithmetic can also be used if the values are subtracted from BCD 99.

I used this function to port the 4 function math package from an 8080 based basic interpreter.   Many calculators use BCD so the interpreter simulated a calculator.  Advantage is this can extend the digit precision.  One keeps the values in SRAM and indexes into this 2 BCD digits at a time.  by chaining the cY and the hCY flags long sequences of digits can be added, subtracted multiplied or divided.  The full code is on my git.  https://github.com/sheepdoll/PTExtendedBasicArduino.git

I also wrote a Vacuum Fluorescent Display (VFD) driver that uses the HD44780 protocol to emulate a 7-segment display, with a clock as the test application; this is https://github.com/sheepdoll/AVRVFDCLOCK.git.  I wrote this before I found out about the DAA instruction.  The object of that project was to create a DOS-style file time stamp for fatFS.  Hindsight says it would have been better to emulate a DS1307.

Edit: found a register name in the code example (ARGL) that was not converted into an actual register (r22) for simplification.

Last Edited: Mon. Apr 16, 2018 - 06:24 PM

do i need special chip that converts BCD from AVR registers to 7 Segments or not?

Why would you think that?  Aren't you just lighting up segments on a simple (non-graphics) display? Isn't this what you are writing your software to do?  You need to form a clear picture and thoughts of what you want the code to do!

For example, you could find some 7 segment drivers/controller chips to do the multiplexing for you (then you just send the values).   But to keep it simple, why not do the multiplexing in software?


If your AVR has a hardware MUL, the fastest routine would be one that uses it: dividing is the same as multiplying by 1/x.

A long time ago I posted a routine that takes a 16-bit int and produces 5 digit bytes in less than 70 clocks, but it has since been beaten by a routine that takes less than 50 clocks.

Both routines are here; look for something like "int to BCD" or "bin to BCD".

So if you just want 000 to 999 and you really need speed, my guess is that it can be done in about 25-30 clocks.
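As an illustration of the multiply-by-reciprocal idea (this is not the routine referred to above), here is a minimal sketch of a single division by 10. It assumes an AVR with the MUL instruction and an input of 0..99 in r16, and it uses 205/2048 as an approximation of 1/10; the label and the register choice are arbitrary.

```asm
; Sketch only: divide by 10 with the hardware multiplier.
; In:  r16 = value 0..99 (for example what is left after the hundreds digit is removed)
; Out: r24 = tens, r25 = ones; r0, r1 and r18 are clobbered
div10_mul:
        ldi     r18, 205        ; 205/2048 is slightly more than 1/10
        mul     r16, r18        ; r1:r0 = value * 205
        mov     r24, r1         ; taking the high byte is a shift right by 8
        lsr     r24
        lsr     r24
        lsr     r24             ; shifted right by 11 in total: r24 = value / 10
        ldi     r18, 10
        mul     r24, r18        ; r0 = tens * 10 (always fits in one byte)
        mov     r25, r16
        sub     r25, r0         ; ones = value - tens * 10
        ret
```

The approximation error of 205/2048 stays below one tenth for inputs in this range, so the truncated quotient matches value / 10. The hundreds digit would still need its own step, for example a short subtract-100 loop.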

I would do (have done!) the BCD to 7-segment display conversion with a lookup table inside the AVR.

e.g.:

```
SEG7_LOOKUP:
.db 0b01110111, 0b00010001       ; 0, 1  (the assembler pads every .db line to a whole word, so keep an even number of bytes per line)
.db 0b00111110                   ; 2
; (and so on)

; Of course, the various bit patterns depend upon your displays and how they're wired up.  To access, just use:

ldi ZL, low(2*SEG7_LOOKUP)
ldi ZH, high(2*SEG7_LOOKUP)
add ZL, bcd_digit                ; Where bcd_digit is a register containing the value (in BCD) you would like to display
adc ZH, zero                     ; and 'zero' is a register containing the value $00
lpm temp, Z                      ; and 'temp' is a high-side (16-31) register that you don't mind frying
out PORTB, temp                  ; and PORTB is where your LED display(s) are wired up to.
```

&c.

S.

PS - Yes, I know there's a code window.  Yes, I know how to click on the relevant icon.  I also know that it does not work here, so kwitcher bitchin' until the site admins fix it.  S.

Edited to remove spurious \$, make it an lpm temp, and thank JS.  S,

I also know that it does not work here

JS

Last Edited: Mon. Apr 16, 2018 - 01:52 AM

Thanks, that is a nice idea: convert hex to BCD to get each digit in Binary Coded Decimal, and then look up in a flash table the value that needs to be sent to the 7-segment display. Very smart. I will study the code and post here when I get it working.

I have written in assembler 3-Digit 7Segment Multiplexing Code that works very well..

Wasn't your display already working?   Now, you just need to perform the conversions


Yes, my display is working: if I manually load the number 3 from the DSEG array it is shown correctly. Now I need to get each digit of the number 432 so that I can look up the digit position in the DSEG array and get the hex value of the 7-segment data to be sent.

I will do the division math like this:

432 / 100 = 4, remainder 32

Take 4 and place it in R23 => 04

Take the remainder 32 and divide it by 10 = 3, remainder 2

Take 3 and place it in R24 => 03

Take the remainder 2 and place it in R25 => 02

So after the division the registers need to look like this:

R23 => 04

R24 => 03

R25 => 02

Can someone write this in assembler? I am learning and need a program that does this in fewer cycles. I am reading AVR

https://github.com/sheepdoll/AVR...

"div16u" - 16/16 Bit Unsigned Division

I tried it and got 4 and 32, so quotient and remainder. The code needs to be extended so that, once the remainder is set, it divides it by 10 and puts the new quotient and remainder into the corresponding result registers. Could someone please write this? I have finished the project, but I need to add this so that I can look up each digit in the DSEG array and get the hex value that turns on the digit on the display.

Thanks

robydream wrote:

The freaks are volunteers who will help you get YOUR code correct, but as professionals we get paid to write code, are you offering to pay someone?

Jim

FF = PI > S.E.T

```
        ldi     r18, -1 + '0'
_bcd3:  inc     r18
subi    r16, low(100)           ;-100
sbci    r17, high(100)
brcc    _bcd3

ldi     r17, 10 + '0'
_bcd4:  dec     r17
subi    r16, -10                ;+10
brcs    _bcd4

subi    r16, -'0'
```

This is cut and pasted from here.

Keep in mind that it doesn't use the same registers as you do, that the source number is changed, and that the output is ASCII digits, so 4 is 0x34 and not 0x04 as you want. But it shows how simply it can be done.

It's faster than division, but slower than using the HW multiplier.

Which AVR do you use?
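To get plain digit values instead of ASCII, the same subtract-and-count idea can be written without the '0' offsets. The sketch below is an adaptation, not the original routine; it assumes the number 0..999 arrives in r17:r16 and that r23, r24 and r25 are the target registers from the first post. The label names are made up for this example.

```asm
; Sketch only: subtract-and-count without the ASCII offset.
; In:  r17:r16 = 0..999 (destroyed)
; Out: r23 = hundreds, r24 = tens, r25 = ones, each as a plain value 0..9
split3:
        ldi     r23, -1
s3_h:   inc     r23
        subi    r16, low(100)   ; subtract 100 ...
        sbci    r17, high(100)
        brcc    s3_h            ; ... until the subtraction underflows
        ldi     r24, 10
s3_t:   dec     r24
        subi    r16, -10        ; add 10 back to the (negative) low byte
        brcs    s3_t            ; carry stays set while the value is still negative
        mov     r25, r16        ; what is left is the ones digit
        ret
```

The hundreds loop deliberately overshoots by one subtraction, and the tens loop then counts down from 10 while adding 10 back, so no separate restore step is needed.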

OK, fair enough. I will study it and ask questions about how to do what I need.

This code works excellently. For now I use an 8-bit value (254 / 100) to get 2, 5 and 4; I will add 16-bit values later because that only adds the MSB and LSB handling. Using this code I get 2 and 54:

```
.include	"m328pdef.inc"

.ORG	0x0000

_Reset:
ldi		yl,byte1(RAMEND)
out		SPL,yl
ldi		yh,byte2(RAMEND)
out		SPL+1,yh

.DEF  ANS = R0            ;To hold answer
.DEF  REM = R2            ;To hold remainder
.DEF    A = R16           ;To hold dividend
.DEF    B = R18           ;To hold divisor
.DEF    C = R20           ;Bit Counter

LDI A,254         ;Load dividend into A
LDI B,100         ;Load divisor into B
DIV88:
LDI C,9           ;Load bit counter (8 bits + 1)
Sub REM,REM       ;Clear Remainder And Carry
MOV ANS,A         ;Copy Dividend To Answer
Loop:   ROL ANS           ;Shift the answer To the Left
DEC C             ;Decrement Counter
BREQ DONE        ;Exit If eight bits done
ROL REM           ;Shift the remainder To the Left
Sub REM,B         ;Try To Subtract divisor from remainder
BRCC SKIP        ;If the result was negative Then
ADD REM,B         ;reverse the subtraction To try again
CLC               ;Clear Carry Flag so zero shifted into A
RJMP Loop        ;Loop Back
SKIP:   SEC               ;Set Carry Flag To be shifted into A
RJMP Loop
DONE:
```

The idea is that when DONE: is reached I put the ANS value into R23, load the remainder into A, set the divisor to 10 and run the loop again. The second time DONE: is reached I need to leave R23 alone (not overwrite its value) and put ANS into R24 and the remainder into R25.

The problem is that I don't know how to test at DONE: whether register R23 is empty (value 00). If it is empty, copy the value from ANS to R23; if it is not empty, skip that and put ANS into R24 and REM into R25.

So this is in short:

```
DONE:
; CHECK - if R23 is 00, then:
MOV R23, ANS
MOV A, REM        ;Load the remainder into A as the new dividend
LDI B, 10         ;Load divisor into B
RCALL DIV88       ;Call again but divide the remainder by 10

; if R23 is NOT 00, then:

MOV R24, ANS

MOV R25, REM
```

But I just don't know how to test whether register R23 is 00 so that I can use the code above. Can someone tell me what the ASM mnemonic is called?

OK, I got it. It was so simple:

```
.include	"C:\FastAVR\inc\m8def.inc"

.ORG	0x0000

_Reset:
ldi		yl,byte1(RAMEND)
out		SPL,yl
ldi		yh,byte2(RAMEND)
out		SPL+1,yh

.DEF  ANS = R0            ;To hold answer
.DEF  REM = R2            ;To hold remainder
.DEF    A = R16           ;To hold dividend
.DEF    B = R18           ;To hold divisor
.DEF    C = R20           ;Bit Counter

LDI A,254         ;Load dividend into A
LDI B,100         ;Load divisor into B
DIV88:
LDI C,9           ;Load bit counter (8 bits + 1)
Sub REM,REM       ;Clear Remainder And Carry
MOV ANS,A         ;Copy Dividend To Answer
Loop:   ROL ANS           ;Shift the answer To the Left
DEC C             ;Decrement Counter
BREQ DONE        ;Exit If eight bits done
ROL REM           ;Shift the remainder To the Left
Sub REM,B         ;Try To Subtract divisor from remainder
BRCC SKIP        ;If the result was negative Then
ADD REM,B         ;reverse the subtraction To try again
CLC               ;Clear Carry Flag so zero shifted into A
RJMP Loop        ;Loop Back
SKIP:   SEC               ;Set Carry Flag To be shifted into A
RJMP Loop
DONE:
TST R23			  ;Check If R23(first digit) is Set
BREQ Exit		  ;Branch If R23=00
MOV R24, ANS	  ;Second Digit
MOV R25, REM      ;Third Digit
RET
Exit:
MOV R23, ANS	  ;First Digit
MOV A, REM		  ;Load the remainder into A as the new dividend
LDI B, 10		  ;Load divisor into B
RCALL DIV88		  ;Call again, this time dividing the remainder by 10
Stop:
RJMP Stop		  ;Digits are now in R23, R24, R25
```

One thing that worries me now is the 199 clock cycles, which is too much for this operation. How can I lower the cycle count?

The code works; after executing it I see this:

R23 => 02

R24 => 05

R25 => 04

I then tested it with the value 84 and it failed. Any 1- or 2-digit number fails, because the hundreds digit is 0: R23 is still 00 when DONE: is reached the second time, so the test takes the first-digit branch again and I end up with:

R23 => 08

R24 => 00

R25 => 04

So it is only OK for 3-digit numbers, not for 1 or 2 digits.

How can I fix this so that it works for 1, 2 and 3 digits alike, for example to get correct values for 0, 1, 84, 125 and 255?

OK, I updated the code and now all numbers from 000-255 work:

```
.include	"C:\FastAVR\inc\m8def.inc"

.ORG	0x0000

_Reset:
ldi		yl,byte1(RAMEND)
out		SPL,yl
ldi		yh,byte2(RAMEND)
out		SPL+1,yh

.DEF  ANS = R0            ;To hold answer
.DEF  REM = R2            ;To hold remainder
.DEF    A = R16           ;To hold dividend
.DEF    B = R18           ;To hold divisor
.DEF    C = R20           ;Bit Counter

LDI A,253         ;Load dividend into A
LDI B,100         ;Load divisor into B
DIV88:
LDI C,9           ;Load bit counter (8 bits + 1)
Sub REM,REM       ;Clear Remainder And Carry
MOV ANS,A         ;Copy Dividend To Answer
Loop:   ROL ANS           ;Shift the answer To the Left
DEC C             ;Decrement Counter
BREQ DONE        ;Exit If eight bits done
ROL REM           ;Shift the remainder To the Left
Sub REM,B         ;Try To Subtract divisor from remainder
BRCC SKIP        ;If the result was negative Then
ADD REM,B         ;reverse the subtraction To try again
CLC               ;Clear Carry Flag so zero shifted into A
RJMP Loop        ;Loop Back
SKIP:   SEC               ;Set Carry Flag To be shifted into A
RJMP Loop
DONE:
TST R1			  ;Check If R1(first digit) is Set
BREQ Exit		  ;Branch If R1=00
MOV R24, ANS	  ;Second Digit
MOV R25, REM      ;Third Digit
RET
Exit:
MOV R23, ANS	  ;First Digit
LDI A, 0x01
MOV R1, A         ;First Digit Set
MOV A, REM		  ;Load the remainder into A as the new dividend
LDI B, 10		  ;Load divisor into B
RCALL DIV88		  ;Call again, this time dividing the remainder by 10
Stop:
RJMP Stop		  ;Digits are now in R23, R24, R25
```

Now the challenge is to add high and low bytes so that a 16-bit number can be loaded and split into three digits.

robydream wrote:

Can someone write this in assembler? I am learning and need a program that does this in fewer cycles. I am reading AVR

https://github.com/sheepdoll/AVR...

"div16u" - 16/16 Bit Unsigned Division

I tried it and got 4 and 32, so quotient and remainder. The code needs to be extended so that, once the remainder is set, it divides it by 10 and puts the new quotient and remainder into the corresponding result registers. Could someone please write this? I have finished the project, but I need to add this so that I can look up each digit in the DSEG array and get the hex value that turns on the digit on the display.

Thanks

This is a bit confusing.   That link is to an included tech note that was in the folder.    I see I forgot to change one of the register labels in the DAA code, which could lead to some other confusion.  I edited my post for future reference.

I have not taken the time to look at the OP's code in detail.   This should be some fairly straightforward shifts and multiplies in the nybbles.   Three digits of BCD take 2 bytes; 2 digits of BCD take 1 byte.   If there are more than 2 digits then the 3rd digit carries into the next nybble.   So there will always be a nybble that is zero unless there is an overflow.  I would think that could be the end-of-sequence flag.

On the other hand, if leading or trailing zero suppression is required, then one is basically writing a printf function, which then starts detailing the itoa function.   This can be a simple table lookup that divides the digits by 1, 10, 100 as was done in the code of the late 1970s.  The advantage is that this can work in any base.    Here is one that I converted from 68K assembly to AVR (the 68k opcodes are commented out).
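To illustrate the packing described here, three digit values can be combined into two bytes with a nibble swap. This is a minimal sketch, assuming the digits are already in r23 (hundreds), r24 (tens) and r25 (ones) as in the first post; the output registers r19:r18 and the label are arbitrary choices.

```asm
; Sketch only: pack three digit values into two BCD bytes.
; In:  r23 = hundreds, r24 = tens, r25 = ones (each 0..9)
; Out: r19:r18 = packed BCD, e.g. digits 4,3,2 give r19 = 0x04, r18 = 0x32
pack_bcd3:
        mov     r19, r23        ; high byte: upper nybble stays 0, hundreds in the lower nybble
        mov     r18, r24
        swap    r18             ; move the tens digit to the upper nybble
        or      r18, r25        ; ones digit goes in the lower nybble
        ret
```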

```
;********************
;*	 DCVT	    *
;********************
;Decimal and octal conversion subroutines
;D1 contains the input value
;A2 indexes the output result area
;D6 contains the control word as follows:
;
;      Bits 0-7 contains the fixed length number of characters
;      Bit  8 - set to suppress leading zeros if fixed length
;      Bit  9 - set for terminal output (else uses A2)
;      Bit 10 -
;      Bit 11 - set for byte size in D1
;      Bit 12 - set for word size in D1
;      Bit 13 -
;      Bit 14 - set for octal output
;      Bit 15 - set for hex output
;
;Registers modified are none unless A2 is used

DCVT:
;	SAVE	D0,D1,D3,D4,A1		; save registers
;	MOV	D6,D0			; control word to D0
mov ARGL,D6_TMPL
mov ARGH,D6_TMPH

;Set table index A1 for decimal, octal or hex output
;	LEA	A1,HCTBL		; set hex table index
ldi ZL,low(HCTBL*2)
ldi ZH,high(HCTBL*2)

;	TSTW	D0			; is it hex?
tst ARGH
;	BMI	10\$			;   yes
brmi DCVT_10

;	LEA	A1,OCTBL		; set octal table index
ldi ZL,low(OCTBL*2)
ldi ZH,high(OCTBL*2)

;	BTST	#14.,D0			; is it octal?
;	BNE	10\$			;   yes
sbrc ARGH,6
rjmp DCVT_10

;	LEA	A1,DCTBL		; set decimal table index
ldi ZL,low(DCTBL*2)
ldi ZH,high(DCTBL*2)

;Put the table size into D4
DCVT_10:
;	MOV	(A1)+,D4		; pick up table size
mov A1_TMPL,ZL
mov A1_TMPH,ZH
lpm
mov idx,r0

;Set the input data into D3 and strip to word or byte as required
;	MOV	D1,D3			; get binary input
; avr D1 = ACC,BCC,XL,XH is parameter
;	BTST	#11.,D0			; byte size data?
;	BEQ	15\$			;   no
sbrs ARGH,3
rjmp DCVT_15
;	AND	#377,D3			; strip to byte data
clr BCC
clr XL
clr XH

DCVT_15:
;	BTST	#12.,D0			; word size data?
;	BEQ	DCND			;   no
sbrs ARGH,4
rjmp DCND
;	AND	#177777,D3		; strip to word data
clr XL
clr XH

;Calculate value of next digit
DCND:
;CLR	D1			; preclear result
clr D7_TMPL			; avr shadow D1 in r12-15

lpm
mov A6_TMPL,r0		; avr shadow internal to A6
lpm
mov A6_TMPH,r0
lpm
mov A6_Page,r0
lpm
mov A4_Page,r0
DCND_10:
;	CMP	D3,@A1			; compare to table value
cp ACC,A6_TMPL
cpc BCC,A6_TMPH
cpc XL,A6_Page
cpc XH,A4_Page

;	BLO	DCCZ
brlo DCCZ

;	INCW	D1
inc D7_TMPL

;	SUB	@A1,D3
sub ACC,A6_TMPL
sbc BCC,A6_TMPH
sbc XL,A6_Page
sbc XH,A4_Page

;	BR	10\$
rjmp DCND_10

;Check for digit of zero
DCCZ:
;	TSTW	D1			; zero?
;	BNE	DCNZ			; nope
cp D7_TMPL,zero
brne DCNZ

;Digit is zero - check for zero suppress unless units digit
;Bypass if variable length in progress
;	CMP	D4,#1			; units digit?
;	BEQ	DCNZ			; yes - no suppress
cpi idx,1
breq DCNZ

;	TSTB	D0			; check for variable length
;	BEQ	DCDD
tst ARGL
breq DCDD

;	BTST	#8.,D0			; test control bit 0
;	BEQ	DCNZ			; no suppress if off
sbrs ARGH,0
rjmp DCNZ

;	MOV	#40,D1			; set space
ldi c_tmp,32
mov D7_TMPL,c_tmp
;	BR	DCRO
rjmp DCRO

;Digit is not zero or it is zero in units position
;Make it ASCII and reset zero suppress bit
DCNZ:
;	CMPW	D1,#9.			; decimal digit?
ldi c_tmp,9
cp D7_TMPL,c_tmp
;	BLOS	10\$			;   yes
breq DCNZ_10
brlt DCNZ_10

ldi c_tmp,7

DCNZ_10:
;ADDW	#60,D1			; make ASCII for output
ldi c_tmp,48

;	ANDW	#177377,D0		; reset zero suppress bit
andi ARGH,0xFE

;Check for variable length output
;A digit has been detected so reset length if variable
;	TSTB	D0			; variable output length?
;	BNE	DCRO			; nope
tst ARGL
brne DCRO
;	OR	D4,D0			; reset for fixed length
mov idx,ARGL

;Ready for output - bypass if fixed length is less
DCRO:
;	CMPB	D0,D4			; check fixed length
;	BLO	DCDD			; bypass output if less
cp ARGL,idx
brlo DCDD

;	BTST	#9.,D0			; terminal output?
;	BEQ	10\$			;   no - use A2
SBRC ARGH,1
rjmp DCRO_10
mov D6_TMPL,D7_TMPL
rcall TOTCHR	;D1			; output to terminal
;BR	DCDD
rjmp DCDD

DCRO_10:
;	MOVB	D1,(A2)+		; output the digit
st Y+,D7_TMPL

;Digit done - check for all digits processed
DCDD:
;	ADD	#4,A1			; bump table index
;	ldi c_tmp,4

;	DEC	D4			; decrement digit count
dec idx
;	BNE	DCND			; loop if not last digit
brne DCND

;End of processing - restore registers and exit
;	REST	D0,D1,D3,D4,A1	 	; restore registers
;	RTE
ret

;Table for decimal conversion
DCTBL:	.dw	10
.dd	1000000000
.dd	100000000
.dd	10000000
.dd	1000000
.dd	100000
.dd	10000
.dd	1000
.dd	100
.dd	10
.dd	1

;Table for octal conversion
OCTBL:	.dw	11
.dd	1073741824
.dd	134217728
.dd	16777216
.dd	2097152
.dd	262144
.dd	32768
.dd	4096
.dd	512
.dd	64
.dd	8
.dd	1

;Table for hex conversion
HCTBL:	.dw	8
.dd		0x10000000
.dd		0x1000000
.dd		0x100000
.dd		0x10000
.dd		0x1000
.dd		0x100
.dd		0x10
.dd		0x1
```

I added 16-bit number support for extracting three digits, so numbers from 000-999 are now extracted successfully. With an 18.432 MHz crystal the software takes 26.69 µs, or 492 clock cycles, so the approach of dividing by 100 and then dividing the remainder by 10 costs a lot of cycles. If I fit a 20 MHz crystal, which is the maximum for the ATmega328P, I will get a slightly lower execution time. I will study the code above, and I think that if I write it all out by hand without rjmp and other mnemonics that take 2 cycles, I can get the time down from 492 cycles to about 350 cycles (18.98 µs), but I need to study it first. I am wondering if someone has an idea or a clue how to extract the digits of 000-999 using multiplication. I see in the Atmel datasheet that the AVR has a hardware MUL instruction that takes 2 cycles, so I tried to multiply, for example:

432 x 0.01 = 4.32 => take 4, put it into R23 => 04

Take the remainder 32 and:

32 x 0.1 = 3.2 => take 3, put it into R24 => 03

And take the remainder and put it into R25 => 02

Is it allowed to multiply by a decimal fraction with the HW MUL instruction? It takes 2 cycles per multiplication, so this would come to around 30-40 cycles if it can be done this way, while software division needs 492 cycles. That would be a huge speed-up.

Here is the code that works without errors, using division to extract the digits of numbers from 000-999:

```
.include	"C:\FastAVR\inc\m8def.inc"

.DSEG
Digit1: .Byte 1
Digit2: .Byte 1
Digit3: .Byte 1
.CSEG
.ORG  0x0000
RJMP Reset

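Reset:
LDI R16,Low(RAMEND)  ;Init stack pointer (RCALL/RET need it)
OUT SPL,R16
LDI R16,HIGH(RAMEND)
OUT SPH,R16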
.DEF ANSL = R0            ;To hold Low-Byte of answer
.DEF ANSH = R1            ;To hold high-Byte of answer
.DEF REML = R2            ;To hold Low-Byte of remainder
.DEF REMH = R3            ;To hold high-Byte of remainder
.DEF   AL = R16           ;To hold Low-Byte of dividend
.DEF   AH = R17           ;To hold high-Byte of dividend
.DEF   BL = R18           ;To hold Low-Byte of divisor
.DEF   BH = R19           ;To hold high-Byte of divisor
.DEF    C = R20           ;Bit Counter

LDI AL,Low(0x363) ;Load Low-Byte of dividend into AL
LDI AH,HIGH(0x363);Load HIGH-Byte of dividend into AH
LDI BL,Low(100)   ;Load Low-Byte of divisor into BL
LDI BH,HIGH(100)  ;Load high-Byte of divisor into BH
DIV1616:
LDI C,17          ;Load bit counter (16 bits + 1)
MOVW ANSH:ANSL,AH:AL ;Copy dividend into answer
Sub REML,REML     ;Clear Remainder And Carry
CLR REMH          ;
Loop:   ROL ANSL          ;Shift the answer To the Left
ROL ANSH          ;
DEC C             ;Decrement Counter
BREQ DONE        ;Exit If sixteen bits done
ROL REML          ;Shift remainder To the Left
ROL REMH          ;
Sub REML,BL       ;Try To subtract divisor from remainder
SBC REMH,BH
BRCC SKIP        ;If the result was negative Then
ADD REML,BL       ;reverse the subtraction To try again
ADC REMH,BH       ;
CLC               ;Clear Carry Flag so zero shifted into A
RJMP Loop        ;Loop Back
SKIP:   SEC               ;Set Carry Flag To be shifted into A
RJMP Loop
DONE:
TST R4			  ;Check If R4(First Digit) Is Set
BREQ Exit           ;Branch If R4=00
MOV R24, ANSL 	  ;Second Digit Low Byte
MOV R25, REML       ;Third Digit Low Byte
RET
Exit:
MOV R23, ANSL	      ;First Digit
LDI AL, 0x01
MOV R4, AL		  ;First Digit Set
MOV AL, REML        ;Load remainder Low-Byte as the new dividend into AL
MOV AH, REMH        ;Load remainder HIGH-Byte as the new dividend into AH
LDI BL,Low(10)      ;Load Low-Byte of divisor into BL
LDI BH,HIGH(10)     ;Load high-Byte of divisor into BH
RCALL DIV1616       ;Call again, this time dividing the remainder by 10
Stop:
RJMP Stop           ;Digits are now in R23, R24, R25
```

Good and bad comments are welcome on how I can improve this, either with multiplication or division.

Why are you worried about execution time? The function will be called less than 100 times per second, so any potential savings won’t amount to much. Rule #1 get the code working first. Then think about optimising.

OK, you are right. I have some doubts, and they are these:

I am building a soldering station controller using a 3-digit common-cathode 7-segment display. It is refreshed using Timer1 configured in CTC mode with the interrupt period set to 5 ms. Every 5 ms, when the Timer1 interrupt fires, it reads the digit numbers from registers R23, R24 and R25, loads the corresponding hex value for the digits 0...9 from the array in flash and, using multiplexing, sends that hex value to PORTD, which lights up that decimal digit.

I use Timer0, which fires every 500 ms, to read the thermocouple temperature (MAX6675). It shifts the bytes to get the temperature reading, divides the result by 4 to get degrees Celsius, and then calls the function above, which extracts the Celsius temperature into valid decimal digits in registers R23, R24 and R25. These are read by the Timer1 routine that refreshes the display.

My question is: are these 492 cycles OK for this operation, so that I get fast temperature readings, or will the many cycles spent dividing and extracting each digit cause high CPU usage and slow readings? I know this is a small amount of time, but I think it can be improved, so ideas are welcome. And yes, I know that 1 s = 1e+6 µs, so an execution time under 30 µs should mean no noticeable delay and low CPU usage.

The code I gave in #17 will do the job in less than 100 clk.

The code at the end of my link will do a full 16-bit -> 5-digit conversion in max 68 clk, and if you remove the code for finding the first digit it's about 40 clk.

And as I said in #11, if it is only 3 digits it can be done in about 25-30 clk with an optimal routine.

All this is without formatting!

How do you want to show 8?

008  no formatting

__8 (_ indicate off)

_8_ centered

8__ looks best if at the end of a text

I will study your code in the simulator; I need to get clear in my head what each line is doing. This is the way to learn asm and to know how things work. I don't want to just copy and paste code and say "hey, I finished the project, it is working", and then in the next project not know how a divider works in asm. So I will spend more time to get the point of it.

As for the formatting you asked about, I have done this with a formatting table in flash and it works perfectly. 3 cycles are needed to fetch the pattern from flash (2 cycles are needed for SRAM, but this is not critical because I am not building a space shuttle where 1 cycle is very important).

```.CSEG
digit: .DB 0x3f, 0x06, 0x5b, 0x4f, 0x66, 0x6d, 0x7d, 0x07, 0x7f, 0x6f

; This is the code from Timer1 that reads the registers
ldi		ZL, Low(digit)

lpm		R23, Z						; Read Digit1 From Flash

lpm		R24, Z						; Read Digit2 From Flash

lpm		R25, Z						; Read Digit3 From Flash```

So this is done and it works perfectly with very few cycles. Now I must study your code and get the division below 100 cycles, which will be a huge improvement. When I get things clearer in the simulator I will post here the version of your code that works for what I need. And thanks for the code.

So that means you will show 8 as 008 (most people prefer not to show the two zeros).

Yes, you are right. Look at this picture: when I power on the solder station it will show the temperature rising to the set temperature, e.g. 000 038 067 098 127 148 189 200 if I set 200C, and it will stay there. With a 3-digit 7-segment display, to me it looks beautiful to see 008 rather than just 8. It is a matter of personal style.

I see that you use MUL in your code to get the first digit, and the rest is very similar to my idea but with much smarter code, so the cycle count is smaller. I am now studying your code and adapting it to my needs, and then I will see how many cycles it needs.

robydream wrote:

...but need to study. I am wondering if someone has an idea or clue how to extract digits from 000-999 using multiplication? I see in the Atmel datasheet that the AVR has a hardware MUL instruction that takes 2 cycles, and I tried to multiply, for example:

... God And Bad Comments Are Welcome: how can I improve this, either with multiplication or division?

I think Nietzsche is dead so God probably does not care what the comments are.

In post #22 I gave code that does base conversion which only uses subtractions from a table.   Does not get much simpler than that.  Sure the code looks a bit long, because it is for two different ASM mnemonics and has comments that explain it.   This is a general purpose function.  You can distill out only the parts needed: the table and the subtraction.  I could have pulled the function from the basic interpreter, which is basically the same thing.

If you really are concerned about cycles and not code space you can do an unrolled loop.  This has no calls or returns or indexes.   You count each instruction and branch.  Then you need to figure out if skip or branch instructions are better for the compares.   In an unrolled loop, if you are only interested in 3 digits, you have 10 subtracts by 100, then 10 subtracts by 10 and 10 subtracts by 1.  Count the subtracts; exit is on underflow past zero (negative) to the next digit.  The worst case is the number 9 9 9, the best case is 0 0 0.  The resulting numbers will be in the index (counter) registers for each digit.  There is a lot of old process control code for 8080, Z80, 8051 that uses this method.
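The counting idea described above can be sketched in C. This is only an illustration, assuming a 0..999 input; the names `digits3_t` and `split_by_subtraction` are made up for the example, and it tests before subtracting instead of exiting on underflow, which gives the same counts.

```c
#include <stdint.h>

/* Hypothetical result type: one counter per decimal digit. */
typedef struct { uint8_t hundreds, tens, ones; } digits3_t;

/* Split n (assumed 0..999) into digits by counting subtractions. */
static digits3_t split_by_subtraction(uint16_t n)
{
    digits3_t d = {0, 0, 0};
    while (n >= 100) { n -= 100; d.hundreds++; }  /* at most 9 passes */
    while (n >= 10)  { n -= 10;  d.tens++; }      /* at most 9 passes */
    d.ones = (uint8_t)n;                          /* what is left is the ones digit */
    return d;
}
```

Calling `split_by_subtraction(432)` leaves 4, 3 and 2 in the three counters. An unrolled version would simply repeat the compare-and-subtract step instead of looping.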

Software divide is better for library systems and CISC processors that have hardware division when implementing printf(). This uses a lot of abstraction layers to keep the code universal across platforms and languages.  There might be ways of working with fixed point divisions, which are academic: 0x1/0xA is a repeating fraction, otherwise this would be a more popular method.  This leaves subtraction loops (like the AVR200 tech note) or subdivision such as the DAA function, which subdivides into a compare to 10 and an addition (the inverse of subtraction) of 6, since 10+6 = 16.

robydream wrote:
...the interrupt period set to 5 ms. So every 5 ms, when Timer1 fires...
What speed is this 328 being run at? Let's assume that it's only just a conservative 1MHz. If it executes 1,000,000 cycles in 1 second then in 5ms it executes 5,000 cycles. So whether your display update takes 100 cycles, 500 cycles, 1,000 cycles, 2,000 cycles or whatever what does it really matter? Or is the CPU doing so many other CPU intensive operations that it is actually already using 4,900 cycles out of the 5,000 available in each 5ms that you HAVE to get the display updating done in just 100 or something?

(oh and this was 1MHz - if you run at 16MHz (say) then you have 80,000(!) cycles each time to get stuff done)
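The cycle budget quoted here is just clock frequency times interrupt period. A minimal sketch of that arithmetic in C (the helper name `cycles_per_tick` is made up for illustration):

```c
#include <stdint.h>

/* CPU cycles available between two timer interrupts.
   f_cpu_hz: clock frequency in Hz, period_us: interrupt period in microseconds. */
static uint32_t cycles_per_tick(uint32_t f_cpu_hz, uint32_t period_us)
{
    return (uint32_t)(((uint64_t)f_cpu_hz * period_us) / 1000000u);
}
```

With these inputs, `cycles_per_tick(1000000, 5000)` gives 5000 and `cycles_per_tick(16000000, 5000)` gives 80000, the two figures used in the post.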

I am using an ATmega328P with a 16 MHz crystal. I must use this high refresh rate because I am multiplexing the 7-segment displays, and I want to see the digits nicely, without flickering, ghosting and so on. If I lower the rate to, for example, 500 ms or 1 s, then I see each digit shown while the others are off. So if there is a better way of driving a 7-segment display, please let me know.

And using the code from sparrow2 I get very nice results in the simulator: only 56 cycles are needed to get each digit of a 3-digit number into its register. I would like to ask if sparrow2 can give me a clue or formula for how to optimize things to get, for example, 25-30 cycles for 3 digits? I just removed the code that finds the first digit and got a very nice 56 cycles, so I think it can be improved further.

I would like to get the idea or the code so that I can use it and save it for the next project that I will do in assembly. For example, when I finish this project and have learned asm, my next idea is a CNC controller where timing, cycles and precision using the UART are very important. So I need to learn how to minimize the cycles to get 3 digits. And I was very impressed that, using the HW MUL instruction, division can be done by multiplication. Wonderful thing.

Just for the record: using sparrow2's HW MUL code I got 56 cycles, and when I do it with bit shifting I get 492 cycles for the same result! So sparrow2 improved the cycle count by around 9x, which is a very nice improvement.

So a question for the math people: what is the formula, in theory, for using multiplication to get each digit of a number? Then I can try to write the code and maybe get fewer cycles. My expectation is that 3-digit extraction realistically needs 35-40 cycles; below that, in a software routine using HW MUL, I think it is impossible. And another question: why did the designers of the AVR not include a HW DIV instruction? They included HW MUL but not DIV.

Even a calculator that has a division function has a multiply function, so this is what I can't figure out.

Last thing first: an 8-bit HW MUL can easily be expanded to 16 or 32 bit. You can't do that with an 8-bit DIV.
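To illustrate the point about widening a multiply, here is a small C sketch that builds a 16 x 8 bit product from two 8 x 8 partial products, the way two MUL instructions would be combined. The function names are assumptions for the example; `mul8x8` only stands in for the hardware instruction.

```c
#include <stdint.h>

/* Stand-in for the AVR MUL instruction: 8 x 8 -> 16 bit, unsigned. */
static uint16_t mul8x8(uint8_t a, uint8_t b)
{
    return (uint16_t)((uint16_t)a * b);
}

/* 16 x 8 -> 24 bit product built from two 8 x 8 partial products. */
static uint32_t mul16x8(uint16_t a, uint8_t b)
{
    uint16_t lo = mul8x8((uint8_t)(a & 0xFF), b);   /* low byte  * b */
    uint16_t hi = mul8x8((uint8_t)(a >> 8), b);     /* high byte * b */
    return (uint32_t)lo + ((uint32_t)hi << 8);      /* high part is worth 256x */
}
```

The same pattern extends to wider operands by adding more partial products at the right byte offsets. There is no equally simple way to assemble a wide division out of narrow ones.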

But first a question: what else do you need to do, since you think you need 16 MHz? It's not because of the multiplexing; something like a 5 ms update rate would be fine, and what we have seen here can be done with a 128 kHz clk!

I will look into the 25-30 clk later; I'm busy now.

And there should be plenty about mul with 1/X for div on here. But a short div by 256, 65536, ... is free; it's just moving the offset by a byte or a word or ...

So mul with 26 and moving a byte is close to div by 10 (26/256=0.10156).

Then the best for an 8-bit mul is to mul with the biggest number that is 2**n bigger but still fits in a byte, that is 8*1/10*256=204.8, so use 205.

That gives a result of 205/(256*8)=0.100097, so very close, but now you need to shift the result.
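The two constants above can be tried out in C. This is a sketch with made-up function names; the ranges in the comments come from checking the arithmetic by hand, not from the original posts.

```c
#include <stdint.h>

/* n / 10 as (n * 26) >> 8: one MUL, then take the high byte.
   26/256 is slightly too large, so this is only exact for small n
   (0..68 by my check; 69 already gives 7). */
static uint8_t div10_mul26(uint8_t n)
{
    return (uint8_t)(((uint16_t)n * 26u) >> 8);
}

/* n / 10 as (n * 205) >> 11: high byte of the product, then three more
   right shifts. Exact for the whole 8-bit input range. */
static uint8_t div10_mul205(uint8_t n)
{
    return (uint8_t)(((uint16_t)n * 205u) >> 11);
}
```

The trade-off is visible here: 26 needs no extra shifting but drifts sooner, while 205 is more accurate at the cost of three shifts on the high byte.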

error in a number

Last Edited: Tue. Apr 17, 2018 - 11:10 AM

robydream wrote:
I am using an ATmega328P with a 16 MHz crystal...
robydream wrote:
if sparrow2 can give me a clue or formula for how to optimize things to get, for example, 25-30 cycles for 3 digits?
I guess I must be missing something. As I just told you, in 5ms at 16MHz you have EIGHTY THOUSAND cycles. So why are you so focussed on getting this already optimized code further optimized to reduce it still further. Even if it takes 100 cycles that is 0.125% of the execution time you have available. What are you planning to do with the other 99.875% of execution time that is left ??

Clawson, great question. Here is what I am planning to do with the AVR MCU. But I will wait for sparrow2's optimized code so that I can squeeze every bit of juice out of the AVR's speed. This is what I would like to learn at the beginning of learning AVR, and later I will have optimized code that I can use and understand in my upcoming projects.

So I am planning to build a solder station controller that has the following features:

- when the solder station is powered on, the 7-seg display shows S-E (Sensor Error) => no thermocouple iron attached

- when the iron is attached, the display shows 000 and every 500 ms the 7-seg display is refreshed to show the temperature (000, 013, 025, 048...)

- show the iron temperature until it reaches the temperature set with the rotary encoder (for example, the set temp on the display is 200C)

- when I rotate the rotary encoder left or right, lower or raise the set temperature digits on the 7-seg display

- after 5 s with no rotary encoder activity, go back to showing the iron temperature

- when the iron temp is lower than the set temp, turn ON the red HEATER LED; when it is equal or higher, turn the HEATER LED OFF

- add a sensor that detects whether the iron is on its stand; if it has not been picked up for 10 min, the BUZZER gives 10 beeps and the station switches OFF automatically (to prevent the house burning down if the iron is on the stand and I am away)

- on a long press of about 1 s, the 7-seg digits start flashing P01, which means Preset 01; after another 1 s long press on the rotary encoder the set temperature is stored in preset 1 and the display shows SUC, meaning it was successfully stored in EEPROM position 1

- in the P01 menu, one 5 s long press shows P--, which indicates that the current position has been erased from EEPROM

- there will be 9 preset positions (P01 to P09)

And so on. But first I need to get the digit extraction working using sparrow2's excellent code, and go step by step. If I leave unoptimized code at every step of planning the soldering station, I will get into trouble with bugs and everything will be slow. Later I will add a heat gun, so the ideas are in my head and this will be a very functional soldering station with a heat gun for SMD reflow.

But I must go step by step. I wrote code that reads the MAX6675 temperature and got 0x1B0 (432) in hex, and I have a function that reads the 7-seg digit values from flash, but I need to decode 432 into 4 3 2 using sparrow2's ultra-fast low-cycle code. So this is where I need to study and learn. I hope you get my idea.

Sorry but where in that list are the tasks you think require the other 79,900 (or even 79,950) cycles left out of every 80,000 in 5ms ?

I don't see anything listed there that is "processor intensive".

As so often stated there: the first rule of optimization is "don't".

It seems to me that you just aren't grasping exactly how powerful a 16MHz AVR actually is - you could be running an alarm clock a GPS receiver and still have time to let the user play Space Invaders on an attached graphic LCD as well as controlling the solder station in the cycles you have available. Or, to look at it another way, you could wind the clock back from 16MHz to 100kHz or even less and you would STILL have sufficient cycles for everything you say though the point you made earlier about multiplexing LEDs is the one thing that may introduce a requirement to run a little faster than that.

The key thing about writing code (usually) is to keep it simple and easy to read so it is easy to maintain - only worry about optimization when you hit some task that really is CPU bound. Often "optimization" will go hand in hand with "obfuscation" as there's a tendency to employ "clever tricks" and things like that which may not be immediately clear to the maintainer when they come back to fix some critical bug in a couple of years time (BTW that maintainer could be you when you have long forgotten how you implemented the original code).

I know, and you are totally right. But it is in my nature to want to optimize as much as possible, and I think that is a good way. I know that the AVR is a very powerful CPU, and even though it is 8-bit you can do 32-bit operations with it, using more registers and more cycles. For that Atmel produces 32-bit MCUs, and ARM has nice CPUs too. To me the AVR is very interesting, so as I build the soldering station I am in no hurry and would like to learn how code can be optimized and made faster. As the CPU runs at 16 MHz it has so many cycles available that I will not notice whether a macro executes in 3 µs or 27 µs, because it is so fast. But my learning curve is: get the code to work, then optimize it as much as you can, then save the optimized macro to use in the next project, so that I don't need to think again about how it works.

I previously started programming in C in Atmel Studio, and after compiling the same code that I have working in ASM I got a very disappointing result: in ASM I got 105 words, and in C, after compiling C to asm and to hex, I got 225 words. So the way to go is asm, because asm is pretty close to the AVR HW. That is why I prefer optimizing and working in asm on the AVR.

Thanks for clearing things up for me. I know that you are right, but it is in my nature to want the code above optimized to run in 25-30 cycles. If the AVR had a DIV instruction that executed in 2 cycles, then dividing the number 432 and getting each digit would need 10 cycles. So if the code above can be done in 25 cycles, it will be 2.5x slower than HW division (which the AVR doesn't have), but 19x faster than software division. If you think a 19x improvement for the same result (division) is not worth it, then I think you are wrong. 492 cycles is not the same as 25 cycles, and why bother the CPU bus and ALU with 492 cycles if better code can do it in 25? This is my opinion.

But 432 is bigger than a byte! So there is no easy DIV way on an 8-bitter! (first line of #33)

robydream wrote:
I previously started programming in C in Atmel Studio, and after compiling the same code that I have working in ASM I got a very disappointing result: in ASM I got 105 words, and in C, after compiling C to asm and to hex, I got 225 words. So the way to go is asm, because asm is pretty close to the AVR HW. That is why I prefer optimizing and working in asm on the AVR.
Yes, but C is more easily readable and maintainable. If you go on to do software engineering professionally you will rapidly find that maintainability is one of the key design goals. You can write the fanciest solution you like but if you can't later fix the code (or re-use it as part of new / better designs) you will have a real headache on your hands. In fact even C is generally supplanted by C++ now because the design goal of that language was more easy maintenance and reuse possibilities. Apart from the most cost constrained of projects (typically extremely high volume where every $0.01 counts) you won't find many commercial micro designs using Asm these days as the maintenance / readability / reuse requirements are over-riding.

You are right, a byte goes from 0-255, so as I need 000-999 for my 7-seg display this needs to be .BYTE 2, if I'm right, with LOW and HIGH. Here is your code, which works wonderfully in 56 cycles. I just modified it slightly for my needs: in this example I put in the number 432 and I got 02 03 04 in the registers, which is what I need. So when you have time, optimize it or write me the formula here so that I can try (remember I am a beginner, and trying is the way I can learn).

So if you have time, please write me an example with the MUL instruction of how to get 04 03 02 from the decimal number 432, with the math theory, and I will try to write the asm.

```.include	"C:\FastAVR\inc\m8def.inc"

.ORG  0x0000								; 1 Cycle
RJMP Reset								; 2 Cycle

Reset:
; TEST - 0x1B0 (432)
ldi	R16, Low(0x1B0)						; 1 Cycle
ldi R17, HIGH(0x1B0)					; 1 Cycle
movw R10, R16							; 1 Cycle

Top:
; MUL by 41
ldi		r18,41							; 1 Cycle
ldi		r22,0							; 1 Cycle
mul		r16,r18							; 2 Cycle
movw		r20,r0						; 1 Cycle
mul		r17,r18							; 2 Cycle
add		r21,r0							; 1 Cycle
adc		r22,r1							; 1 Cycle

;>>10 is the same As highbyte >>2
lsr		r17								; 1 Cycle
lsr		r17								; 1 Cycle
;Do mul And Sub result
mul		r17,r18							; 2 Cycle
Sub		r20,r0							; 1 Cycle
sbc		r21,r1							; 1 Cycle
sbci		r22,0						; 1 Cycle

;>>12 is the same As <<4 we know that result
; only is 1 Byte
Swap		r22
Swap		r21
eor		r22,r21
andi		r22,240
eor		r22,r21							; 1st digit R22
;find remainder by number-result*100
ldi		r20,100
mul		r20,r22
movw		r20,r16
Sub		r20,r0

;split the value in R22 into 2 digits.
;formula y=(number*51+20)>>9
ldi		r16,51
mul		r22,r16
movw		r18,r0
subi		r18,Low(-20)
sbci		r19,high(-20)
lsr		r19
;calc the remainder
ldi		r17,10
mul		r19,r17
Sub		r22,r0
mov		r21,r19

;this is a repeat On the other 2 digits
mul		r20,r16
movw		r18,r0
subi		r18,Low(-20)
sbci		r19,high(-20)
lsr		r19
mul		r19,r17
Sub		r20,r0
;this is just move so everything stays in order
mov		r23,r21
mov		r21,r19```
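The formula from the listing's comment, y=(number*51+20)>>9, can be tried on its own in C. This is a sketch for inputs 0..99 only, with made-up function names; it is not a translation of the whole routine.

```c
#include <stdint.h>

/* Tens digit of n (assumed 0..99) using y = (n*51 + 20) >> 9.
   51/512 is slightly below 1/10; the +20 offset makes up for that. */
static uint8_t tens_of(uint8_t n)
{
    return (uint8_t)(((uint16_t)n * 51u + 20u) >> 9);
}

/* Ones digit: subtract ten times the tens digit, as the listing does
   with its "calc the remainder" step. */
static uint8_t ones_of(uint8_t n)
{
    return (uint8_t)(n - 10u * tens_of(n));
}
```

For example, for 32 the product is 1632, plus 20 is 1652, and 1652 >> 9 is 3, leaving 2 as the ones digit.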

...I must use this high refresh rate because I am multiplexing the 7-segment displays, and I want to see the digits nicely, without flickering, ghosting and so on. If I lower the rate to, for example, 500 ms or 1 s, then I see each digit shown while the others are off. So if there is a better way of driving a 7-segment display, please let me know.

Your worries border on the ridiculous. The calculations have nothing to do with the situation at all.  Once the digit values have been calculated (say once a second), they can be multiplexed hundreds or thousands of times a second onto the display with no problem, or flicker.   Look up the segment mappings & multiplex those at high speed.    Only when new readings are needed should the reading values be recalculated, which is much slower than the display multiplexing. Try using once a second.  Faster than that and you will get a blurry display (368, 369, 367, 368, 369, 368, 370, 369,368....).

You can also average the values going to the calculation (before digit extraction) to "smooth" them out   Xdisp=(Xdisp+Xsample)/2, say 5 times a second
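The averaging step Xdisp=(Xdisp+Xsample)/2 is a one-liner; a minimal C sketch follows, with the function name `smooth_step` made up for the example. The sum is formed in 32 bits so it cannot overflow.

```c
#include <stdint.h>

/* One smoothing step: the new display value is the mean of the previous
   display value and the latest sample (integer division truncates). */
static uint16_t smooth_step(uint16_t xdisp, uint16_t xsample)
{
    return (uint16_t)(((uint32_t)xdisp + xsample) / 2u);
}
```

Calling it a few times per second, e.g. `xdisp = smooth_step(xdisp, reading);`, halves the distance to each new reading, so a single noisy sample moves the display by at most half its error.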

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Last Edited: Tue. Apr 17, 2018 - 03:32 PM

robydream wrote:
I know, and you are totally right. But it is in my nature to want to optimize as much as possible, and I think that is a good way. I know that the AVR is a very powerful CPU...
You might want to reconsider your criteria for optimal.

Also, AVRs are not all that powerful, but your display does not take much.

Moderation in all things. -- ancient proverb

clawson wrote:
Even if it takes 100 cycles that is 0.125% of the execution time you have available. What are you planning to do with the other 99.875% of execution time that is left ??

...especially for a display update, which only needs to be refreshed a few times per second.  If that, for a solder station.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

OK, it was raining so I had some time to look at it.

First some code based on the code in #17

```;max 92 clk
movw    r24,r16 ; only needed if to save org.
ldi     r20, -1
L0:     inc     r20
subi    r24, low(100)           ;-100
sbci    r25, high(100)
brcc    L0
ldi     r19, 10
L1:     dec     r19
subi    r24, -10                ;+10
brcs    L1
mov     r18,r24 ; only needed if number in order
```

an optimized version

```;max 53 clk
movw    r24,r16 ; only needed if to save org.
ldi     r20, 0
L0:     subi    r20, -4
subi    r24, low(400)           ;-400
sbci    r25, high(400)
brcc    L0
L0a:	dec	r20
subi    r24, low(-100)           ;+100
sbci    r25, high(-100)
brcs	L0a
ldi     r19, 0
L1:     subi    r19, -4
subi    r24, 40                ;-40
brcc    L1
L1a:	dec     r19
subi	r24,-10			;+10
brcs	L1a

mov     r18,r24 ; only needed if number in order

```

And then a faster one using HW mul

```;max 31 clk
;input 0-999 in r17:r16 out r22:r21:r20
movw		r24,r16		;copy number
add		r24,r24		;shift left one bit so r25 = number >> 7
adc		r25,r25
sbrc		r25,2		;if number >=512 guess one higher
inc		r25
mov		r22,r25		;first 100 digit guess
inc		r25		;add one so guess can be too high
ldi		r18,100
mul		r18,r25		;r1:r0 == (guess+1)*100
movw	        r24,r16		;get a new copy
sub		r24,r0		;sub (guess+1)*100 from number
sbc		r25,r1
brcs	        L0  		;if negative done with 100
sub		r24,r18		;else sub 100 and inc guess
inc		r22
brcs	        L0	    	;if negative done with 100
sub		r24,r18		;else sub 100 and inc guess
inc		r22
;now correct 100 digit
L0:	add		r24,r18		;make remainder back to 0-99
ldi		r18,26		;divide with 10 by mul with 26/256
mul		r24,r18
movw	        r20,r0
sbrs	        r24,6		;if >=64 sub 20 (mul 26 is a tad too big)
rjmp	        L1
subi	        r20,20
sbci	        r21,0
;now correct 10 digit
L1:	ldi		r18,10		;mul remainder with 10
mul		r20,r18
mov		r20,r1		;now last digit correct
```

I'm sure it can be done faster but I think it's hard to cut off 5 clk (with code; not a problem with a LUT).

And yes this is like a crossword for me. I know it's not needed for OP

Thanks, this is awesome: only 31 clk using the HW multiplier. It is more than enough, and I don't need a more optimized version than that. Thanks, I will add it to the 7-seg code, and when the 7-seg displays arrive I will publish a video. Thanks.

Here is an algorithm I use, I'll illustrate in C so it's easier to understand. In this case it's packed BCD because I'm having trouble wrestling the compiler to make it work with 32 bit ints.

But I just want to show the algorithm.

```#include <stdio.h>
#include <stdint.h>
uint16_t bin2bcd (uint16_t);

int main (void) {
for (uint16_t i = 0, bcd; i <= 999; i++) {
bcd = bin2bcd(i);
printf("0x%03x \n", bcd);
}
}

#define BASE 10
#define MAGIC 41				/* 41 = BASE*4096/1000 rounded up; 4096 = 3 nibbles; 1000 = numbers in the 0-999 range */
uint16_t bin2bcd (uint16_t bin) {
uint16_t bcd;
bin *= MAGIC;
bcd = (bin & 0xF000) >> 4;		/* store 100s digit */
bin &= 0x0FFF;				/* mask only remainder nibbles */
bin *= BASE;				/* get 10s digit in most significant nibble */
bcd |= (bin & 0xF000) >> 8;		/* store 10s digit */
bin &= 0x0FFF;				/* mask only remainder nibbles */
bin *= BASE;		        	/* get 1s digit in most significant nibble */
bcd |= (bin & 0xF000) >> 12;		/* store 1s digit */
return bcd;
}```

You can paste this in an online compiler like https://www.onlinegdb.com/online... to see the output.
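As a worked illustration of the same algorithm with unpacked digits: for 432, 432*41 = 17712 = 0x4530, so the top nibble is 4; the remaining 0x530 times 10 is 0x33E0, top nibble 3; the remaining 0x3E0 times 10 is 0x26C0, top nibble 2. A sketch in C follows; the helper names are assumptions for the example, and the input is assumed to be 0..999.

```c
#include <stdint.h>

/* Take the digit from the top nibble and keep the 12-bit fraction in *frac. */
static uint8_t take_digit(uint16_t *frac)
{
    uint8_t digit = (uint8_t)(*frac >> 12);
    *frac &= 0x0FFF;
    return digit;
}

/* Split n (assumed 0..999) into out[0]=hundreds, out[1]=tens, out[2]=ones. */
static void split_fixed_point(uint16_t n, uint8_t out[3])
{
    uint16_t frac = (uint16_t)(n * 41u);   /* n/100 with 12 fraction bits */
    out[0] = take_digit(&frac);
    frac = (uint16_t)(frac * 10u);         /* next decimal place */
    out[1] = take_digit(&frac);
    frac = (uint16_t)(frac * 10u);
    out[2] = take_digit(&frac);
}
```

Every intermediate product stays below 65536 for this input range, which is why 16-bit arithmetic is enough.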

Last Edited: Thu. Apr 26, 2018 - 12:35 PM

I tried it your way as well but could never get it as fast; that was why I didn't post it. It's the 9 clk to deal with the high bytes that kills it, but if you want to look at it, here is a direct way to solve it:

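```	ldi		r18,41
mul		r18,r16
movw	        r24,r0
mul		r18,r17
add		r25,r0		;r25:r24 = number*41
mov		r20,r25		;100 digit in high nibble
ldi		r18,10
andi	        r25,0x0f
mov		r26,r25
mul		r18,r24
movw	        r24,r0
mul		r26,r18
add		r25,r0		;r25:r24 = fraction*10
mov		r21,r25		;10 digit in high nibble
andi	        r25,0x0f
mov		r26,r25
mul		r18,r24
movw	        r24,r0
mul		r26,r18
add		r25,r0		;r25:r24 = fraction*10
mov		r22,r25		;1 digit in high nibble
swap	        r20
swap	        r21
swap	        r22
andi	        r20,0x0f
andi	        r21,0x0f
andi	        r22,0x0f
```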

Last Edited: Fri. Apr 27, 2018 - 08:53 AM

Yeah, it's somewhat slower than your method. The advantage is that it executes in constant time, that's useful sometimes.

ok I looked at it again and if I use your way for the first digit and mine for the last two it will take max 26 clk ;)

```;in r17:r16  out r22:r21:r20
ldi		r18,41
mul		r18,r16
mov		r22,r1
mul		r18,r17
add		r22,r0		;r22 = high byte of number*41
swap	        r22
andi	        r22,0x0f	;100 digit
ldi		r18,100
movw	        r24,r16
mul		r18,r22
sub		r24,r0		;remainder 0-99
ldi		r18,26
mul		r24,r18
movw	        r20,r0
sbrs	        r24,6
rjmp	        L0
subi            r20,20
sbci	        r21,0
L0:	ldi		r18,10
mul		r20,r18
mov		r20,r1
```

If the source is allowed to change, one clk is saved.

The code is also correct up to 1023 (fail at 1100), with the high digit having the value of 10, so it's easy to extend to 10 bit for the ADC.
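For anyone who wants to check the arithmetic behind the 26 clk routine without a simulator, here is a C model of it. This is a sketch of the arithmetic only, not cycle-accurate, and the function name `split3_model` is made up for the example.

```c
#include <stdint.h>

/* C model of the routine above: h = hundreds, t = tens, u = ones.
   Intended for n in 0..999; for 1000..1023 the hundreds value becomes 10. */
static void split3_model(uint16_t n, uint8_t *h, uint8_t *t, uint8_t *u)
{
    /* mul by 41, take the top nibble of the 16-bit product */
    uint8_t hundreds = (uint8_t)(((uint16_t)(n * 41u)) >> 12);
    /* remainder 0..99 fits in the low byte */
    uint8_t rem = (uint8_t)(n - 100u * hundreds);
    /* mul by 26 is roughly *256/10; it runs a bit high, so correct from 64 up */
    uint16_t q = (uint16_t)(rem * 26u);
    if (rem & 0x40) {                 /* models "sbrs r24,6" */
        q = (uint16_t)(q - 20u);
    }
    *h = hundreds;
    *t = (uint8_t)(q >> 8);                        /* high byte = tens digit  */
    *u = (uint8_t)(((q & 0xFFu) * 10u) >> 8);      /* fraction * 10 = ones    */
}
```

Running the model over all inputs 0..999 and comparing against `/` and `%` is a quick way to gain confidence before counting cycles on the real thing.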

Last Edited: Fri. Apr 27, 2018 - 06:52 PM

Nice

Ok Sparrow...a challenge

Can you give a routine that will round a 16 bit number to the nearest 10 (give 12367, get 12370, give 123 get 120)?

First, which format is the output you want? (bin or BCD)

But I guess just take El Tangas's solution that is somewhere here, which makes 16 bit to BCD in 45-48 clk (I can't remember), and use the formula, and then make the adjustment.

Or shift right three times (/8) subtract the result twice more (/10) and multiply by ten again.  S.

PS - No, that won't work in BCD.  Do all the math in hex and only convert for display.  S.

BCD rounding is too easy (check LSD for >=5)...wonder if it can be done in binary efficiently (maybe not)...useful for further math operations on the binary value (such as set motor RPM to nearest 10 rpm).

Previously I converted to BCD, rounded the digits & converted back to binary...wonder if there is a more direct way.

Last Edited: Sat. Apr 28, 2018 - 07:37 AM

If it's 16-bit unsigned int to 16-bit unsigned int you want, I would find the way El Tangas does it (I can't find it); it's 16-bit unsigned int to BCD in less than 50 clk, and my guess is that if it's only the remainder you need, it can be done in about 40 clk. The correction will then be to sub the remainder (and add 10 if remainder >=5; best to adjust the value of the remainder).

so all in all about 50 clk

I think you mean this thread: https://www.avrfreaks.net/commen...

No I don't want BCD result at all.

Given an unsigned 16 bit binary unsigned integer, can it be converted (using some rather fancy tricks, without using BCD)  into a rounded-to-tens binary number?    You give it 12345 it returns 12350, you give it 223 it returns 220.  Would even settle for 8 bit solution for starters. It is a rather difficult challenge.   Was looking for something slick, like swap bits 3 &4, add to original, number shift right, swap bits 1&2, add to previous result, shift right...gives answer.

For an example of an unrelated slick method, here's a binary-friendly example to find the greatest common divisor of two numbers.   https://en.wikipedia.org/wiki/Binary_GCD_algorithm#Efficiency   I've used this a few times.

```unsigned int gcd(unsigned int u, unsigned int v)
{
// simple cases (termination)
if (u == v)
return u;

if (u == 0)
return v;

if (v == 0)
return u;

// look for factors of 2
if (~u & 1) // u is even
{
if (v & 1) // v is odd
return gcd(u >> 1, v);
else // both u and v are even
return gcd(u >> 1, v >> 1) << 1;
}

if (~v & 1) // u is odd, v is even
return gcd(u, v >> 1);

// reduce larger argument
if (u > v)
return gcd((u - v) >> 1, v);

return gcd((v - u) >> 1, u);
}```

Last Edited: Sat. Apr 28, 2018 - 03:52 PM

If you happen to have $F0 lying around in a register, 'and' and 'swap' is faster than four right-shifts.  S.

What's wrong with the suggestion in #53?

(first add 5 to round to nearest)

then, as suggested in #53, Div by 10, truncate to integer, mul by 10.

edit: here is an implementation, in C, for the 8 bit case (note, not really 8 bit, max input is 250). The algorithm may not be immediately obvious, but it's the one stated above.

```#include <stdio.h>
#include <stdint.h>

uint8_t round10 (uint8_t);

int main()
{
for (uint8_t i = 0; i <= 250; i++) {
printf("%d %d \n", i, round10(i + 5));
}
}

#define MAGIC 205							/* 205 = 256/10*8 */
uint8_t round10 (uint8_t bin) {
bin = (MAGIC * bin) >> 8;
bin &= 0xF8;
bin += bin >> 2;
return bin;
}```

Conversion to AVR asm is straightforward, left as an exercise to the interested reader

Last Edited: Sat. Apr 28, 2018 - 08:00 PM

That is most excellent...nice magic!!!   The shift by 8 is nice, since you can just grab the high byte.

note, not really 8 bit, max input is 250

well, actually 254...it works!

avrcandies wrote:
well, actually 254...it works!

No, because when 5 is added (because of the rounding), it will overflow, 254+5=259. But here is a corrected version, that actually works to 254:

```#include <stdio.h>
#include <stdint.h>

uint8_t round10 (uint8_t);

int main()
{
for (uint8_t i = 0; i <= 254; i++) {
printf("%d %d \n", i, round10(i));
}
}

#define MAGIC 205							/* 205 = 256/10*8 */
uint8_t round10 (uint8_t bin) {
bin = (MAGIC * bin + MAGIC * 5) >> 8;
bin &= 0xF8;
bin += bin >> 2;
return bin;
}```

In this case I add 5 at a different stage of the calculation, where overflow will not happen. A nice piece of code, if I do say so myself.

No, because when 5 is added

NO it does work as was given...all the way up to 254 (changed for loop from 250 to 254):

255 rounds up to 260...so 254 is the limit (give it 8 bits, get 8 bit answer)

here is some output from your  #59 posting...I suppose what do you mean by overflow? 16bits?   Pelles on PC, was happy (though I made everything int).

......

....

Last Edited: Sun. Apr 29, 2018 - 02:17 AM

I see what you mean...in 8 bits you get the overflow, but on the PC it is happy.

I made a version with your update...works good from 0 to 254...THIS is quick & slick!

```ldi ZL, 205		 ;magic constant
mul ZL, mynumber  ;number to convert
movw ZH:ZL, r1:r0
subi ZL, low(-1025)   ;5*205
sbci ZH, high(-1025)
andi ZH, 0xF8         ;high byte = product >> 8, keep (n/10)*8
mov ZL, ZH
lsr ZH
lsr ZH
add ZL, ZH            ;(n/10)*8 + (n/10)*2, result in ZL
```

Last Edited: Sun. Apr 29, 2018 - 03:12 AM

but you asked for 16 bit ?!

Avrcandies wants a 16 bit algo, but said 8 bits would suffice as proof of concept.

However, I have a feeling there must be a better algorithm, some pattern present in the multiples of 10 in binary, that is repeated and can be used for the rounding.

edit: I think I can prove that if you have a 16 bit number HL (H - high byte, L - low byte), and H is even, then, in decimal, H+L has the same units digit as HL, that is, (H+L)%10 = HL%10. An interesting result, I believe.

The fundamental problem that needs to be solved is:

```n += 5;
n -= n%10;```

So playing with modular arithmetic will eventually yield a good method, I think.

Last Edited: Sun. Apr 29, 2018 - 01:16 PM

and H is even, then, in decimal, H+L has the same units digit as HL, that is, (H+L)%10 = HL%10. An interesting result, I believe.

As a constrained subset let L=0 , then we are saying 256*H has the same last digit as H, when H is even.  If that is true, then adding L, from 0 to 9, would keep it true.

Also, if N is odd, then 256*N ends in either N+5 or N-5 (or perhaps there is a more cohesive way of stating this).

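| N ends in | 256*N last digit | note |
|---|---|---|
| 0 | 0 | match |
| 2 | 2 | match |
| 4 | 4 | match |
| 6 | 6 | match |
| 8 | 8 | match |
| 1 | 6 | adds 5 |
| 3 | 8 | adds 5 |
| 5 | 0 | subtract 5 |
| 7 | 2 | subtract 5 |
| 9 | 4 | subtract 5 |

| got last digit | N ends in either |
|---|---|
| 0 | 0, 5 |
| 2 | 2, 7 |
| 4 | 4, 9 |
| 6 | 1, 6 |
| 8 | 3, 8 |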
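One way to see why this holds, as a sketch: 256 = 250 + 6, so 256*N always ends like 6*N, and 6*N - N = 5*N is a multiple of 10 exactly when N is even (for odd N it leaves a 5 over). A brute-force check for a PC follows; the helper names are my own, not from the thread.

```c
#include <stdint.h>

/* Units digit of a 16-bit number computed from H + L only,
   adding 5 when the high byte H is odd. */
uint8_t units_from_bytes(uint16_t n) {
    uint16_t h = n >> 8;          /* high byte */
    uint16_t l = n & 0xFF;        /* low byte  */
    uint16_t s = h + l;           /* at most 510 */
    if (h & 1) {
        s += 5;                   /* odd H: 256*H ends like H + 5 */
    }
    return (uint8_t)(s % 10);
}

/* Number of 16-bit inputs for which the shortcut disagrees with n % 10. */
uint32_t count_mismatches(void) {
    uint32_t bad = 0;
    for (uint32_t n = 0; n <= 65535; n++) {
        if (units_from_bytes((uint16_t)n) != n % 10) {
            bad++;
        }
    }
    return bad;
}
```

`count_mismatches()` should return 0, which covers both the even rows and the "adds 5 / subtract 5" rows of the table.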


Last Edited: Mon. Apr 30, 2018 - 12:53 AM

So, my plan was to reduce the 16 bit number into something that fits in 8 bits with the same decimal units digit. Then get this digit and subtract it from the original number.

I used the method of adding 5 when H is odd.

This is what I have:

```#include <stdio.h>
#include <stdint.h>

uint8_t reduce (uint16_t);

int main()
{
for (uint16_t i = 10000; i <= 12000; i++) {
printf("%d %d \n", i, reduce(i));
}
}

uint8_t reduce (uint16_t bin) {
uint8_t h_sign = (bin >> 8) & 0x01;		/* save parity of high byte (odd/even) for later */
bin = (bin >> 8) + (bin & 0xFF);		/* reduction round 1 */
bin = (bin & 0x1F) + ((bin >> 4) & 0x1E);	/* reduction round 2 */
bin += h_sign ? 5 : 0;				/* if h was odd, add 5 to correct */
return bin;
}```

Round 1 reduces 16 bit to 9 bit.

Round 2 reduces 9 bit to roughly a 5 bit number.
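Round 2 relies on 32 = 30 + 2: a number written as 32*q + r ends in the same digit as 2*q + r. Below is a sketch of just that step with a check over every 9-bit value round 1 can produce (0 to 510). The names `fold32`, `fold32_keeps_units` and `fold32_max` are hypothetical.

```c
#include <stdint.h>

/* Reduction round 2 in isolation: x = 32*q + r becomes 2*q + r. */
uint8_t fold32(uint16_t x) {
    return (uint8_t)((x & 0x1F) + ((x >> 4) & 0x1E));
}

/* Returns 1 if fold32 keeps the units digit for all x in 0..510. */
int fold32_keeps_units(void) {
    for (uint16_t x = 0; x <= 510; x++) {
        if (fold32(x) % 10 != x % 10) {
            return 0;
        }
    }
    return 1;
}

/* Largest value fold32 returns for x in 0..510. */
uint8_t fold32_max(void) {
    uint8_t max = 0;
    for (uint16_t x = 0; x <= 510; x++) {
        if (fold32(x) > max) {
            max = fold32(x);
        }
    }
    return max;
}
```

By this check the result of round 2 can reach 60 (65 once the +5 correction is added), so strictly it needs 6 bits rather than 5, which still leaves plenty of room in a byte.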

"reduction round 2" will need quite a few assembly instructions to implement, this will never be as pretty as the 8 bit version

this will never be as pretty as the 8 bit version

Thanks for the try!  Wouldn't there be a magic number that would work for 16bits, just as in the 8 bit method?


avrcandies wrote:
Wouldn't there be a magic number that would work for 16bits, just as in the 8 bit method?

Sure. The "magic numbers" are just representations of the 1/10 fraction in binary, that is: 0.00011001100 (1100)

For 8 bits, you grab the bits starting at the first 1, as many as fit in 8 bits, and round up: 11001101, which is 205. When you multiply by this number, you are in a way dividing by 10.

So, for 16 bits, you would use 1100110011001101 which is 52429.

The problem is you would need to do a 16x16 multiply, and that is quite a few AVR instructions. This is why I'm trying to find a different algo for larger numbers.
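To make the "in a way dividing by 10" concrete: 205 is 1/10 scaled by 2^11 and 52429 is 1/10 scaled by 2^19 (both rounded up), so the multiply is followed by a right shift of 11 or 19 bits. A sketch in C, with function names that are mine and not from the thread:

```c
#include <stdint.h>

/* 8-bit divide by 10: 205 = ceil(2^11 / 10). */
uint8_t div10_u8(uint8_t n) {
    return (uint8_t)(((uint16_t)n * 205u) >> 11);
}

/* 16-bit divide by 10: 52429 = ceil(2^19 / 10).
   The product needs 32 bits, which is the 16x16 multiply mentioned above. */
uint16_t div10_u16(uint16_t n) {
    return (uint16_t)(((uint32_t)n * 52429u) >> 19);
}
```

Both should match plain `n / 10` over their whole input range; the rounding-up of the constant is small enough that it never pushes the quotient over the next integer within 8 or 16 bits.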

Meanwhile, I translated the "reduction" routine in #67 to assembly:

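```start:
;input: r25 = high byte (H), r24 = low byte (L)
bst		r25, 0		;T = bit 0 of H (set if H is odd)
add		r25, r24	;round 1: r25 = low 8 bits of H+L, carry = bit 8
sbc		r24, r24	;r24 = 0xFF if carry was set, else 0x00
andi	r24, 0x10	;bit 8 of the sum becomes bit 4
ldi		r18, 0x1F
and		r18, r25	;r18 = low 5 bits of the sum
swap	r25
andi	r25, 0x0E	;bits 7..5 of the sum moved to bits 3..1
add		r24, r25    ;or r24, r25 will also do, maybe it's clearer
add		r24, r18	;round 2 complete
brtc	end
subi	r24, -5		;H was odd: add 5 to correct
end:                      ;output in r24

forever:
rjmp	forever```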

Last Edited: Mon. Apr 30, 2018 - 06:18 PM

you would need to do a 16x16 multiply, and that is quite a few AVR instructions

I wonder, since part of the result will be shifted into oblivion, if perhaps the full 16x16 result might not need to be calculated (or more likely, saved), especially if the +5 is kept in the original argument.

You are a master of the numerical domain!!


avrcandies wrote:

You are a master of the numerical domain!!

Lol, I wish. True, I always had some affinity for discrete math, but these are little more than parlour tricks compared to some number theory and crypto stuff...

And thanks for the challenge, it really gave me food for thought.

Last Edited: Mon. Apr 30, 2018 - 07:53 PM

First, with the 16x16 you only "save" the high byte, because 52429 is 8 times bigger than wanted, so the remainder is spread over 3 bytes (but you save).

But if you MUL the 3 bytes by 20 (not 10), the result will have a shift of 4 bits, and then the remainder should be placed correctly (perhaps some of the top bits need to be and'ed off).

And perhaps the value in the lowest byte is so small that it isn't needed.

But there is another "real" problem: what about 65535? A correct rounding gives 65540, which does not fit in 16 bits!!!

And it could be faster with a sloppier value of 1/10 (perhaps the remainder has to be found as x - (x/10)*10).

17 cycles will do for 16-bit mod 10:

```  ; select SH:SL to be a pair of upper registers (LDI range)
; EDIT: sou:rce is the pair of input registers
; note: AVR has no ADDI; read ADDI x, k as SUBI x, -k
  MOVW SH:SL, sou:rce
SBRC SH, 7
SUBI SH, 100 ; no carry
SBRC SH, 0
ADDI SH, 5 ; no carry
ADD SL, SH ; fold the adjusted high byte into the low byte
BRCC 1f
ADDI SL, 6  ; no carry
1:
LDI SH, 26
MUL SL, SH
LDI SH, 10
MUL R1, SH
SUB SL, R0
BRCC 1f
ADDI SL, 10
1: ; SL=sou:rce % 10  R1=0```
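For readers who prefer C, here is my reading of the routine as a host-side model. It assumes the adjusted high byte is added to the low byte before the first BRCC (that is what the carry test implies), and the name `mod10_u16` is made up for the sketch.

```c
#include <stdint.h>

/* C model of the 16-bit mod 10 routine (a sketch, not the original source). */
uint8_t mod10_u16(uint16_t n) {
    uint8_t sh = (uint8_t)(n >> 8);     /* SH: high byte */
    uint8_t sl = (uint8_t)(n & 0xFF);   /* SL: low byte  */

    if (sh & 0x80) {
        sh -= 100;                      /* 100 * 256 is a multiple of 10 */
    }
    if (sh & 0x01) {
        sh += 5;                        /* odd H: 256*H ends like H + 5 */
    }

    uint16_t sum = (uint16_t)sl + sh;   /* ADD SL, SH */
    sl = (uint8_t)sum;
    if (sum > 0xFF) {
        sl += 6;                        /* the lost carry is 256, which ends in 6 */
    }

    /* sl * 26 / 256 approximates sl / 10 and may be one too high. */
    uint8_t q = (uint8_t)(((uint16_t)sl * 26u) >> 8);
    uint8_t t = (uint8_t)(q * 10u);
    if (sl < t) {
        sl = (uint8_t)(sl - t + 10);    /* quotient was one too high */
    } else {
        sl = (uint8_t)(sl - t);
    }
    return sl;
}
```

Looping over all 65536 inputs and comparing with `n % 10` is a cheap way to convince yourself before committing the assembly version to a project.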

Moderation in all things. -- ancient proverb

Last Edited: Tue. May 1, 2018 - 03:27 PM

17 cycles will do for 16-bit mod 10:

That's amazing--haven't tried it yet, but will.

Wondering why not use ZH:ZL, ZH, XL, etc   SH, SL sound mysterious like they are needed for something else.

can you repost ...what is 1: ???  should be lf:??  also, 1: is in two places

but there is an other "real" problem what about 65535! a correct rounding will give 0 it 16bit !!!

We have to live with it... just like 255 rounds up to 260, which doesn't fit well into a byte...you could do some tricks:

a) If 260, return with the overflow set, otherwise return with the overflow cleared (to alert the caller).

b) If 260, return with the value 251, which could be trapped out by the caller....but likely to be forgotten about & cause "strange" issues to surface later!

c) Simply don't try rounding anything above 254 (or 65534)


Last Edited: Tue. May 1, 2018 - 03:19 AM

what is 1: ???  should be lf:??  also, 1: is in two places

https://stackoverflow.com/questions/27353096/1b-and-1f-in-gnu-assembly

 "Experience is what enables you to recognise a mistake the second time you make it." "Good judgement comes from experience.  Experience comes from bad judgement." "Wisdom is always wont to arrive late, and to be a little approximate on first possession." "When you hear hoofbeats, think horses, not unicorns." "Fast.  Cheap.  Good.  Pick two." "We see a lot of arses on handlebars around here." - [J Ekdahl]

I don't think AVR Studio will let you do that.  Even if it did, it would do horrible things to the label listing(s).  Yeah, the alternative isn't much fun either, labels with suffixes of 'aaa' and 'bbb' get to be par for the course, but at least they're unique.  S.

Scroungre wrote:
I don't think AVR Studio will let you do that.
That depends which of the assemblers you use. As long as you are using one of the GNU ones then Nf or Nb are standard practice and always have been.

User manual: https://sourceware.org/binutils/...

Example from that:

Here is an example:

```1:        branch 1f
2:        branch 1b
1:        branch 2f
2:        branch 1b
```

Which is the equivalent of:

```label_1:  branch label_3
label_2:  branch label_1
label_3:  branch label_4
label_4:  branch label_3
```

Last Edited: Tue. May 1, 2018 - 08:48 AM

nice work a thing to remember

I checked it with this code: (for all combinations )

```	MOVW  r24, r16
SBRC  r25, 7
SUBI  r25, 100 ; no carry
SBRC  r25, 0
SUBI  r25, -5 ; no carry
ADD   r24, r25 ; fold the adjusted high byte into the low byte
BRCC  L0
SUBI  r24, -6  ; no carry
L0:
LDI   r25, 26
MUL   r24, r25
LDI   r25, 10
MUL   R1, r25
SUB   r24, R0
BRCC  L1
SUBI  r24, -10
L1: ; SL=sou:rce % 10  R1=0
```

Do you have an assembler that "eats" 1f?

I always wanted to make one where + was 1f and ++ 2f etc.

Oops, I see I'm late with this.

And perhaps this is one for the compiler: it makes a mul with 0xCCCD from a lib and then some shifts (a lot of shifts, because it also solves the /).

Last Edited: Tue. May 1, 2018 - 09:31 AM

clawson wrote:

Scroungre wrote:
I don't think AVR Studio will let you do that.
That depends which of the assemblers you use. As long as you are using one of the GNU ones then Nf or Nb are standard practice and always have been.

Typically I use the default one.  Of course, this is from Studio 4 running on Win2k, but I recall it incessantly complaining about 'duplicate labels'.  I didn't think it would treat a number label in any exceptional manner, and again - what does that do to the label list?  Granted I've not tried this - Perhaps I should.

S.

PS - Given functions with a dozen branches or more, I think the '++' format would start to come unhinged at about '+++++++++' and '+++++++++++++'.  ;-)  S.

AS4 only really gave access to the Atmel Assembler (though it's true avr-as was there too if you added WinAVR) but in AS7 it comes as standard with the Atmel Assembler and three different GCC assemblers (avr-as, avr32-as and arm-as) so you start with a wider choice. For AVR8 the "plain" assembler is probably still the best choice for stand alone Asm projects as there's so much prior knowledge about its use. But for anything that involves inter-working between C/C++ and Asm then the GNU as assembler is the more obvious choice.

Really, many "local" labels like those in this routine just clutter up a reference list.

What I would like is just to type +. I just don't want to type long labels like the routine name and then a number, but that would be OK if that's the end product.

Likewise, it would be nice to have a function where the values of the flags were shown (known and unknown, to see their lifetime).

That's very clever code. I had already tried to use 26 as "magic number", since it aligns the result to 8 bits (shift not needed). But some numbers were generating rounding errors:

```#include <stdio.h>
#include <stdint.h>

uint8_t reduce (uint16_t);
uint8_t units (uint8_t);

int main()
{
for (uint16_t i = 0; i < 65535; i++) {
if ((i%10) != units(reduce(i))){
printf("%d %d \n", i, units(reduce(i)));
}
}
}

uint8_t units (uint8_t reduced){
reduced = (reduced * 26) & 0xFF;
reduced = (reduced * 10) >> 8;
return reduced;
}

uint8_t reduce (uint16_t bin) {
uint8_t h_sign = (bin >> 8) & 0x01;		/* save parity of high byte (odd/even) for later */
bin = (bin >> 8) + (bin & 0xFF);		/* reduction round 1 */
bin = (bin & 0x1F) + ((bin >> 4) & 0x1E);	/* reduction round 2 */
bin += h_sign ? 5 : 0;			/* if h was odd, add 5 to correct */
return bin;
}```

Output of errors is:

```57854 5
58364 5
58874 5
59384 5
59894 5
60404 5
60914 5
61424 5
61934 5
62444 5
62954 5
63464 5
63974 5
64484 5
64994 5
65504 5
65534 5```

These are all large numbers. So by subtracting 100 from numbers with the high bit set, this problem is solved (I would probably subtract 120, which is the largest multiple of 10 smaller than 128).
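To pin down where the 26 trick breaks, here is a small sketch around the `units()` function from the test program. Multiplying by 26 instead of 25.6 overestimates x/10 by x/640, so the extracted digit comes out one too high once x reaches 64. The helper `first_bad_units` is my own addition.

```c
#include <stdint.h>

/* units() as posted: fractional part of x * 26 / 256, times 10. */
uint8_t units(uint8_t reduced) {
    reduced = (reduced * 26) & 0xFF;
    reduced = (reduced * 10) >> 8;
    return reduced;
}

/* Smallest x in 0..255 with units(x) != x % 10, or -1 if there is none. */
int first_bad_units(void) {
    for (int x = 0; x <= 255; x++) {
        if (units((uint8_t)x) != x % 10) {
            return x;
        }
    }
    return -1;
}
```

`first_bad_units()` should give 64, which fits the error list: every listed input reduces to 64 and reports 5 instead of 4.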

Very nice code indeed.

If you look at the tail end of my code in #49 that converts the last two digits (and uses 26 as magic number), I make an adjustment if the number is bigger than 64 (because 26 is too big!)

Ah, I see it, in this part:

```	sbrs	        r24,6
rjmp	        L0
subi            r20,20
sbci	        r21,0```

Yeah, it's the same thing

I also used the "adjustment technique" many years ago for a x86 algo: https://board.flatassembler.net/...

It allowed me to save one "mul" compared to the base code. But mul is usually more expensive in x86 CPUs than in AVR (naturally, since it's 32/64 bit), so it is usually worthwhile to get rid of them, in AVR not so much.

Last Edited: Tue. May 1, 2018 - 12:06 PM

In my old code I used to use 51, which is way better, but then there is a shift involved :(
and because it's smaller than the correct value an offset is needed (510<512 where 260>256).

But it bugs me a bit that the code doesn't always take the same time ;)

avrcandies wrote:
Wondering why not use ZH:ZL, ZH, XL, etc SH, SL sound mysterious like they are needed for something else.
Not mysterious, generic:

```#define SH ...
#define SL ...
#define sou ...
#define rce ...
#include "mod1016.S"```


This doesn't seem to be working.  if I give 2345, I get 2565 (should get 2350)...if I give 5678, I get  2568 (should get 5680)

did I mess something up?

```			MOVW ZH:ZL, myval_hi:myval_low
SBRC ZH, 7
SUBI ZH, 100 ; no carry
SBRC ZH, 0
SUBI ZH, -5 ; no carry
ADD ZL, ZH ; fold the adjusted high byte into the low byte
BRCC abc

SUBI ZL, -6  ; no carry

abc:		LDI ZH, 26
MUL ZL, ZH
LDI ZH, 10
MUL R1, ZH
SUB ZL, R0
BRCC bbb

SUBI ZL, -10
bbb:		rcall bin2ascii  ;report ZH:ZL as ascii number
ret```


what does the code state !

17 cycles will do for 16-bit mod 10:

This function just gets the units digit. To solve the rounding problem, it needs a few tweaks.

Change line 3 to:

`			SUBI ZH, 105 ; no carry`

Let's call r the result of the modified function. Now, add (5-r) to the original number (edit: or subtract r-5), and it should give the result you need.

edit: wait, this line is not always executed. You need to add 5 somewhere in the function before the multiplies.

Last Edited: Tue. May 1, 2018 - 05:42 PM

17 cycles will do for 16-bit mod 10:

Oh I see, I thought this was just a self-note.  I didn't realize this was for something else (though very related) & was more concerned about some other factors I asked about.

This solution, is also quite marvelous!

Another way to get rounding is to use this to get the last digit, and always subtract it from the original number (always giving a last digit of 0). If the last digit was >4, add 10 to this.

That might not be the shortest path, but usable.
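As a sketch of that idea in C (with `n % 10` standing in for the assembly routine, and function names that are only illustrative):

```c
#include <stdint.h>

/* Stand-in for the assembly routine: units digit of n. */
static uint8_t units_digit(uint16_t n) {
    return (uint8_t)(n % 10);
}

/* Round to the nearest multiple of 10 using only the units digit. */
uint16_t round10_via_digit(uint16_t n) {
    uint8_t d = units_digit(n);
    n -= d;                 /* last digit is now 0 */
    if (d > 4) {
        n += 10;            /* round up; wraps for inputs above 65534 */
    }
    return n;
}
```

For the two values from the earlier test this gives 2350 for 2345 and 5680 for 5678.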


One point that nobody touched yet... does it REALLY need to have three digits for temperature?

I would use only two, "42" means 420 degrees, and that is it.

Much easier to convert to decimal, he could even use a simple 99 bytes table in flash with the conversion already done, a simple LPM and badabim, packed bcd done.
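A rough sketch of the table idea in C. On the AVR the table would be a constant array in flash read with LPM; here it is filled at run time just to keep the listing short, and covering 00 to 99 takes 100 entries. The names `bcd_table`, `bcd_table_init` and `to_packed_bcd` are hypothetical.

```c
#include <stdint.h>

/* One packed-BCD byte per value 0..99 (tens in the high nibble). */
static uint8_t bcd_table[100];

/* Fill the table once; on the AVR this would be precomputed flash data. */
void bcd_table_init(void) {
    for (uint8_t i = 0; i < 100; i++) {
        bcd_table[i] = (uint8_t)(((i / 10) << 4) | (i % 10));
    }
}

/* Table lookup; out-of-range input returns 0xFF (an arbitrary choice). */
uint8_t to_packed_bcd(uint8_t value) {
    return value < 100 ? bcd_table[value] : 0xFF;
}
```

After `bcd_table_init()`, `to_packed_bcd(42)` yields 0x42, so the two nibbles can index the 7-segment pattern table directly.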

The OP can even use three displays and even shows the zero on the third one to show "420"...

On a soldering iron, the temperature changes in such a way that the 10°C resolution will not interfere much.

Also, the OP is fantastically worried about clock cycles, when the uC will be more than 95% of the time in idle.

I understand the OP is missing a lot of knowledge about everything, maybe he is young or new in the area, he will learn in time.

We need to support all new, young and novice people, one day we will die anyway and they will inherit the planet.

Well, yeah, for sure that will happen.  Luckily for us, we will not be here to see the consequences.

Wagner Lipnharski
Orlando Florida USA