I have written 3-digit 7-segment multiplexing code in assembler, and it works very well. My input is a number in the range 000-999. To display it I need an ASM function that takes this number and puts each digit into its own register. My display routine then uses each digit as an index into an array and reads the 7-segment hex code that is sent to the segments to light the LEDs, so that the correct digit is displayed.

So here is an example:

The input number is 432, and after the function is done I need these values in the registers:

R23 - 04

R24 - 03

R25 - 02

So I looked at Atmel's avr200.asm and tested this code:

.include "m328pdef.inc"

.CSEG
digit: .DB 0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F
.ORG	0x0005
rjmp RESET

RESET:
; INIT - Stack Pointer
ldi		R16, HIGH(RAMEND)
out		SPH, R16
ldi		R16, LOW(RAMEND)
out		SPL, R16

;***** Subroutine Register Variables

.def	drem16uL=r14
.def	drem16uH=r15
.def	dres16uL=r16
.def	dres16uH=r17
.def	dd16uL	=r16
.def	dd16uH	=r17
.def	dv16uL	=r18
.def	dv16uH	=r19
.def	dcnt16u	=r20

; LOAD - Dividend (dd16u) And Divisor (dv16u) => 432 (0x1B0) / 100 = 4 remainder 32
; Dividend 1
ldi		dd16uH, HIGH(0x01B0)
ldi		dd16uL, LOW(0x01B0)

; Divisor 1
ldi		dv16uH, HIGH(100)
ldi		dv16uL, LOW(100)
call	div16u

ldi		ZH, HIGH(digit*2)			; Load Start Z-Address Of Digit Array (Flash byte address = word address * 2)
ldi		ZL, LOW(digit*2)
add		ZL, dres16uL				; Add The Digit1 Index To The Low Byte
adc		ZH, dres16uH				; Add The High Byte (0) To Propagate The Carry
lpm		R23, Z						; Read Digit1 From Flash

loop:
rjmp loop

;***** Code

div16u:
clr	drem16uL			; clear remainder Low byte
sub	drem16uH, drem16uH	; clear remainder High byte and carry
ldi	dcnt16u,17			;init loop counter
d16u_1:
rol	dd16uL				;shift left dividend
rol	dd16uH
dec	dcnt16u				; decrement counter
brne d16u_2			    ; if done
ret						;    return
d16u_2:
rol	drem16uL		    ;shift dividend into remainder
rol	drem16uH
sub	drem16uL,dv16uL		;remainder = remainder - divisor
sbc	drem16uH,dv16uH		;
brcc	d16u_3			;if result negative
add	drem16uL,dv16uL		;    restore remainder
adc	drem16uH,dv16uH		;    (the high byte has to be restored too)
clc						;    clear carry to be shifted into result
rjmp	d16u_1			;else
d16u_3:	sec					;    set carry to be shifted into result
rjmp	d16u_1

I added code that takes the quotient (the first digit, here 04) as an index into the array and reads the value 0x66, which shows the number 4 when it is sent to the 7-segment display.

In the debugger I see that drem16uH:drem16uL holds 32 in my example, so I need to get 3 and 2 and put them in the registers listed above. 432 / 100 is 4 remainder 32: the AVR routine returns the whole number 4 as the result and puts 32 in the remainder registers. How do I get 3 and 2 out of that efficiently?

Thanks.

In the code above I divide 432 by 100, which gives 4 remainder 32. The 4 goes into register R23, and the 32 is in the drem16uL and drem16uH registers.

So now I have the idea of adding this to the program:

432 / 100 = 4 remainder 32 => R23 = 04
32 / 10 = 3 remainder 2 => R24 = 03
remainder 2 => R25 = 02

So I divide 432 by 100 and get 4 remainder 32; I take the 4 and put it into R23 so that its value is 04. Then I take the remainder 32 and divide it by 10 to get 3 remainder 2; I put the 3 into R24 (value 03) and the remainder 2 into R25 (value 02).

This is just an idea, so if someone has another one, please write. Counting cycles, I know the program above is not very efficient: it takes 13 uS to execute. If I add the divide by 10 it will surely take another 4-5 uS, so about 20 uS in total, and that seems very long. I have a temperature measurement that reads a value every 500 ms, and this function will then extract each digit into the corresponding registers.

Ideas?

Forget dividing, just use a binary to BCD conversion. Here is one for 16 bits; it can easily be reduced to 8 bits (0..255) by starting with the hundreds code:

It is from here, go take a look: http://www.avr-asm-tutorial.net/avr_en/calc/CONVERT.html#bin2bcd

; Bin2ToBcd5
; ==========
; converts a 16-bit-binary to a 5-digit-BCD
; In: 16-bit-binary in rBin1H:L, Z points to first digit
;   where the result goes to
; Out: 5-digit-BCD, Z points to first BCD-digit
; Used registers: rBin1H:L (unchanged), rBin2H:L (changed),
;   rmp
; Called subroutines: Bin2ToDigit
;
Bin2ToBcd5:
push rBin1H ; Save number
push rBin1L
ldi rmp,HIGH(10000) ; Start with ten thousands
mov rBin2H,rmp
ldi rmp,LOW(10000)
mov rBin2L,rmp
rcall Bin2ToDigit ; Calculate digit
ldi rmp,HIGH(1000) ; Next with thousands
mov rBin2H,rmp
ldi rmp,LOW(1000)
mov rBin2L,rmp
rcall Bin2ToDigit ; Calculate digit
ldi rmp,HIGH(100) ; Next with hundreds
mov rBin2H,rmp
ldi rmp,LOW(100)
mov rBin2L,rmp
rcall Bin2ToDigit ; Calculate digit
ldi rmp,HIGH(10) ; Next with tens
mov rBin2H,rmp
ldi rmp,LOW(10)
mov rBin2L,rmp
rcall Bin2ToDigit ; Calculate digit
st z,rBin1L ; Remainder are ones
sbiw ZL,4 ; Put pointer to first BCD
pop rBin1L ; Restore original binary
pop rBin1H
ret ; and return
;
; Bin2ToDigit
; ===========
; converts one decimal digit by continued subtraction of a
;   binary coded decimal
; Used by: Bin2ToBcd5, Bin2ToAsc5, Bin2ToAsc
; In: 16-bit-binary in rBin1H:L, binary coded decimal in
;   rBin2H:L, Z points to current BCD digit
; Out: Result in Z, Z incremented
; Used registers: rBin1H:L (holds remainder of the binary),
;   rBin2H:L (unchanged), rmp
; Called subroutines: -
;
Bin2ToDigit:
clr rmp ; digit count is zero
Bin2ToDigita:
cp rBin1H,rBin2H ; Number bigger than decimal?
brcs Bin2ToDigitc ; MSB smaller than decimal
brne Bin2ToDigitb ; MSB bigger than decimal
cp rBin1L,rBin2L ; LSB bigger or equal decimal
brcs Bin2ToDigitc ; LSB smaller than decimal
Bin2ToDigitb:
sub rBin1L,rBin2L ; Subtract LSB decimal
sbc rBin1H,rBin2H ; Subtract MSB decimal
inc rmp ; Increment digit count
rjmp Bin2ToDigita ; Next loop
Bin2ToDigitc:
st z+,rmp ; Save digit and increment
ret ; done

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

That looks very interesting. Before I study the code to see how it is done, can you please tell me whether it will work for me? I read the temperature in hexadecimal format, so my value is for example 0x1B0 (432). Do I need to convert it to binary before I can call bin2bcd? If yes, do you have example code for converting hex to bin in asm?

Thanks.

Well the above routine IS effectively doing division (by repeated subtraction of 10000, 1000, 100 and then 10) so, yeah it should work.
.
Personally I can't think of a way of extracting decimal digits that does not involve division/repeated subtraction.
.
EDIT obviously if it's just 0.. 999 then you only need the 100 and 10 bits.
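
On the hex-versus-binary question: a register only ever holds bits, and hex, decimal and binary are just different ways of writing the same constant in the source file, so no conversion step is needed before calling a bin-to-BCD routine. A minimal illustration (the register pair r17:r16 is an arbitrary choice for this sketch, not something taken from the posts above):

```asm
; All three pairs load exactly the same bytes: r17 = 0x01, r16 = 0xB0
ldi   r17, high(432)
ldi   r16, low(432)

ldi   r17, high(0x01B0)
ldi   r16, low(0x01B0)

ldi   r17, high(0b0000000110110000)
ldi   r16, low(0b0000000110110000)
```

The same holds for a value read from a sensor at run time: the debugger may show it as 0x1B0, but it is already the binary number 432.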

Last Edited: Sun. Apr 15, 2018 - 05:10 PM

I googled and found information that for BCD I need a BCD driver between the AVR and the 7-segment display, one that translates BCD to the 7-segment format. So one more IC... hmm, I would not like to use one more IC, because I love simplicity. Multiplexing with 3 NPN transistors and 10k resistors on their bases will be fine. So are there no other suggestions for extracting the digits, other than what I wrote above? Divide by 100 and take the result, then divide the remainder by 10 and take the result and the remainder, and you have all three digits. The BCD conversion looks very nice in the simulator, only 6 uS, but if I understand correctly I can't send BCD to the 7-segment display directly from the AVR.

You said in the first post that you have the display working fine:

I have written in assembler 3-Digit 7Segment Multiplexing Code that works very well...

Is the display working or not?  That is pretty trivial, just one transistor per digit (search for display multiplexing). Each digit shares the common segment lines.  Of course, sharing means dimmer (for example, with 5 digits each one is on 20% of the time and off 80% of the time!)

for 3 to 5 digits, assign 3 to 5 registers to hold bcd values (0-9)  & keep them on the display.

AFTER you have the display showing the register values properly (digits!), THEN worry about doing a conversion (put the conversion BCD results in those registers)

OK, thanks. But just to ask: for this BCD conversion, do I need a special chip that converts BCD from the AVR registers to 7 segments, or not?

If I don't need a special decoder chip, e.g. a 74LS47 or similar, then I will go ahead and write BCD conversion code to study. This is unfamiliar to me. I am a beginner, which is why I ask.

robydream wrote:

OK, thanks. But just to ask: for this BCD conversion, do I need a special chip that converts BCD from the AVR registers to 7 segments, or not?

If I don't need a special decoder chip, e.g. a 74LS47 or similar, then I will go ahead and write BCD conversion code to study. This is unfamiliar to me. I am a beginner, which is why I ask.

You can do the BCD conversion with a function or a macro.

Some older CPUs like the 8080 or 6502 had this as an instruction.  Since the AVR is RISC we can emulate the 8080 DAA instruction as:

DAA:
; the 8 bit number in the accumulator is adjusted to form two
; four bit Binary Coded Decimal digits by the following process:
;
; 1. If the value of the least significant 4 bits
; of the accumulator is greater than 9 or if the
; AC flag is set, 6 is added to the accumulator
; 2. If the value of the most significant 4 bits
; of the accumulator is greater than 9 or if the CY
; flag is set, 6 is added to the most significant
; four bits of the accumulator
; Note: All flags are affected

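mov r22,r24
mov r23,r24			; keep the unadjusted value for the high digit test
in r25,SREG
andi r25, (1<<SREG_C)		; keep only the incoming carry (H is not changed by andi)

clc
brhs DAA_adjlo		; Half carry set
andi r22,0x0F
cpi r22,10
brlo DAA_hi
DAA_adjlo:
ldi r22, 6
add r24,r22			; adjust the low digit

DAA_hi:
tst r25
brne DAA_adjhi		; incoming carry set
mov r22,r23
cpi r22,0x9A
brlo DAA_noadj		; high digit is fine
DAA_adjhi:
ldi r22,0x60
add r24,r22			; adjust the high digit
sec
rjmp DAA_end
DAA_noadj:
clc

DAA_end:

ret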

This will convert the r24 value (which I usually call ACC) to or from BCD using the cY and hCY flags.  A few temp registers are used. Be sure to protect these in the caller if they should not be trashed.

Complement arithmetic can also be used if the values are subtracted from BCD 99.

I used this function to port the 4 function math package from an 8080 based basic interpreter.   Many calculators use BCD so the interpreter simulated a calculator.  Advantage is this can extend the digit precision.  One keeps the values in SRAM and indexes into this 2 BCD digits at a time.  by chaining the cY and the hCY flags long sequences of digits can be added, subtracted multiplied or divided.  The full code is on my git.  https://github.com/sheepdoll/PTExtendedBasicArduino.git

I also wrote a Vacuum Fluorescent Display (VFD) driver that uses the HD44780 protocol to emulate a 7 segment display, with a clock as the test application. This is https://github.com/sheepdoll/AVRVFDCLOCK.git.  I wrote this before I found out about the DAA instruction.  The object of that project was to create a DOS style file time stamp for fatFS.    Hindsight says it would have been better to emulate a DS1307.

Edit: found a register name in the code example (ARGL) that was not converted in the an actual register (r22) for simplification.

Last Edited: Mon. Apr 16, 2018 - 06:24 PM

do i need special chip that converts BCD from AVR registers to 7 Segments or not?

Why would you think that?  Aren't you just lighting up segments on a simple (non-graphics) display? Isn't this what you are writing your software to do?  You need to form a clear picture and thoughts of what you want the code to do!

For example, you could find some 7 segment drivers/controller chips to do the multiplexing for you (then you just send the values).   But to keep it simple, why not do the multiplexing in software?

If your AVR has a HW MUL, the fastest routine would be one that uses it: a div is the same as a mul by 1/x.

A long time ago I posted a routine that takes a 16 bit int and gets 5 bytes out in less than 70 clk, but it has since been beaten by a routine that takes less than 50 clk.

Both codes are here; look for something like int to BCD or bin to BCD.

So if you just want 000 to 999 and you really need speed, my guess is that it can be done in about 25-30 clk.

I would do (have done!) the BCD to 7-segment display conversion with a lookup table inside the AVR.

e.g.:

SEG7_LOOKUP:

.db 0b01110111   ; 0

.db 0b00010001   ; 1

.db 0b00111110   ; 2

(and so on)

Of course, the various bit patterns depend upon your displays and how they're wired up.  (In a real table, put an even number of bytes on each .db line: the Atmel assembler pads a line with an odd byte count with a zero byte, which throws the indexing off.)  To access, just use:

ldi ZL, low(2*SEG7_LOOKUP)

ldi ZH, high(2*SEG7_LOOKUP)

add ZL, bcd_digit                           ; Where bcd_digit is a register containing the value (in BCD) you would like to display

adc ZH, zero                                  ; and 'zero' is a register containing the value $00

lpm temp, Z                                     ; and 'temp' is a high-side (16-31) register that you don't mind frying

out PORTB, temp                           ; and PORTB is where your LED display(s) are wired up to.

&c.

S.

PS - Yes, I know there's a code window.  Yes, I know how to click on the relevant icon.  I also know that it does not work here, so kwitcher bitchin' until the site admins fix it.  S.

Edited to remove spurious \$, make it an lpm temp, and thank JS.  S,

I also know that it does not work here

JS

Last Edited: Mon. Apr 16, 2018 - 01:52 AM

Thanks, that is a nice idea: convert HexToBCD to get each digit in binary coded decimal, and then look up in a FLASH table the value that needs to be sent to the 7-segment display. A very smart way. I will study the code and post here when I get it working.

I have written in assembler 3-Digit 7Segment Multiplexing Code that works very well..

Wasn't your display already working?   Now, you just need to perform the conversions

Yes, my display is working: if I manually specify to load number 3 from the digit array, it is shown correctly. Now I need to get each digit from the number 432, so that I can then use it as a position in the digit array and get the hex value of the 7-segment data to be sent.

I will do the division math like this:

432 / 100 = 4 remainder 32

Take 4 and place it in R23 => 04

Take the remainder 32 and divide it by 10 = 3 remainder 2

Take 3 and place it in R24 => 03

Take the remainder 2 and place it in R25 => 02

So after the division the registers need to look like this:

R23 => 04

R24 => 03

R25 => 02

Can someone write this in assembler? I am learning and need a program with fewer cycles to do that. I am reading AVR

https://github.com/sheepdoll/AVR...

"div16u" - 16/16 Bit Unsigned Division

I tried it and got 4 remainder 32, so result and remainder. The code needs to be improved so that, after the remainder is set, it divides by 10 and puts the new result and remainder into the corresponding result registers. Could someone please write this? I have finished the project, but I need to add this so that I can use each digit as an index into the digit array and get the hex value to send to light up the digit on the segment.

Thanks

robydream wrote:
.could please write someone this

The freaks are volunteers who will help you get YOUR code correct, but as professionals we get paid to write code, are you offering to pay someone?

Jim

Keys to wealth:

Invest for cash flow, not capital gains!

Wealth is attracted, not chased!

Income is proportional to how many you serve!

If you want something you've never had...

...you must be willing to do something you've never done!

Lets go Brandon!

ldi     r18, -1 + '0'
_bcd3:  inc     r18
subi    r16, low(100)           ;-100
sbci    r17, high(100)
brcc    _bcd3

ldi     r17, 10 + '0'
_bcd4:  dec     r17
subi    r16, -10                ;+10
brcs    _bcd4

subi    r16, -'0'

This is cut and pasted from here.

Keep in mind that it doesn't use the same registers as you, the source number gets changed, and the output is ASCII digits, so 4 is 0x34 and not 0x04 as you want. But it shows how simply it can be done.

It's faster than div, but slower than using the HW multiplier.
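
For readers who want plain 0..9 values in R23, R24 and R25 rather than ASCII, the same repeated-subtraction trick can be adapted roughly as follows. This is only a sketch under assumptions: the input register pair r17:r16 and the label names are my choice, the value must be in the range 0..999, and the input registers are destroyed.

```asm
; Sketch: split a value 0..999 into three binary digits by repeated subtraction.
; In : r17:r16 = value (0..999), destroyed
; Out: r23 = hundreds, r24 = tens, r25 = ones (each 0..9)
digits_sub:
    ldi   r23, -1             ; start at -1 because the loop increments first
ds_hundreds:
    inc   r23
    subi  r16, low(100)       ; value = value - 100
    sbci  r17, high(100)
    brcc  ds_hundreds         ; repeat until the subtraction underflows
                              ; now r16 = (remainder - 100) as a negative byte
    ldi   r24, 10             ; count the tens downwards from 10
ds_tens:
    dec   r24
    subi  r16, -10            ; add 10 back; carry stays set while r16 is still negative
    brcs  ds_tens
    mov   r25, r16            ; what is left is the ones digit
    ret
```

For 432 the first loop runs five times and leaves r23 = 4, the second loop runs seven times and leaves r24 = 3, and r25 ends up as 2. The run time depends on the digits, since each loop pass is one count.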

Which AVR do you use?

OK, fair enough. I will study it then, and ask questions about how to do what I need.

This code works excellently. I use an 8-bit value to divide (254/100) to get 2, 5 and 4. I will add the 16-bit value later, because it is only the MSB and LSB. So with this code I get 2 and 54:

.include	"m328pdef.inc"

.ORG	0x0000

_Reset:
ldi		yl,byte1(RAMEND)
out		SPL,yl
ldi		yh,byte2(RAMEND)
out		SPL+1,yh

.DEF  ANS = R0            ;To hold answer
.DEF  REM = R2            ;To hold remainder
.DEF    A = R16           ;To hold dividend
.DEF    B = R18           ;To hold divisor
.DEF    C = R20           ;Bit Counter

LDI A,254         ;Load dividend into A
LDI B,100         ;Load divisor into B
DIV88:
LDI C,9           ;Load Bit Counter
Sub REM,REM       ;Clear Remainder And Carry
MOV ANS,A         ;Copy Dividend To Answer
Loop:   ROL ANS           ;Shift the answer To the Left
DEC C             ;Decrement Counter
BREQ DONE        ;Exit If eight bits done
ROL REM           ;Shift the remainder To the Left
Sub REM,B         ;Try To Subtract divisor from remainder
BRCC SKIP        ;If the result was negative Then
ADD REM,B         ;reverse the subtraction To try again
CLC               ;Clear Carry Flag so zero shifted into A
RJMP Loop        ;Loop Back
SKIP:   SEC               ;Set Carry Flag To be shifted into A
RJMP Loop
DONE:

But the idea is that when I reach the DONE: label the first time, I put the ANS value into R23, load the remainder into A, set the divisor to 10 and loop again. The second time I reach DONE: I need to leave R23 alone (not overwrite its value) and put ANS into R24 and the remainder into R25.

The problem is that I don't know how to test at DONE: whether register R23 is empty (value 00) or not. If it is empty, copy the value from ANS to R23; if it is not empty, skip that and put ANS into R24 and REM into R25.

So in short:

DONE:
; CHECK - if R23 is 00, then
MOV R23, ANS
MOV A, REM        ;Load remainder as the new dividend into A
LDI B, 10         ;Load divisor into B
RCALL DIV88           ;Call again but divide the remainder by 10

; if R23 is NOT 00, then

MOV R24, ANS

MOV R25, REM

But I just don't know how to test whether register R23 is 00 so that I can use the code above. Can someone help with the name of the asm mnemonic?

OK, I got it. It was so simple:

.include	"C:\FastAVR\inc\m8def.inc"

.ORG	0x0000

_Reset:
ldi		yl,byte1(RAMEND)
out		SPL,yl
ldi		yh,byte2(RAMEND)
out		SPL+1,yh

.DEF  ANS = R0            ;To hold answer
.DEF  REM = R2            ;To hold remainder
.DEF    A = R16           ;To hold dividend
.DEF    B = R18           ;To hold divisor
.DEF    C = R20           ;Bit Counter

LDI A,254         ;Load dividend into A
LDI B,100         ;Load divisor into B
DIV88:
LDI C,9           ;Load Bit Counter
Sub REM,REM       ;Clear Remainder And Carry
MOV ANS,A         ;Copy Dividend To Answer
Loop:   ROL ANS           ;Shift the answer To the Left
DEC C             ;Decrement Counter
BREQ DONE        ;Exit If eight bits done
ROL REM           ;Shift the remainder To the Left
Sub REM,B         ;Try To Subtract divisor from remainder
BRCC SKIP        ;If the result was negative Then
ADD REM,B         ;reverse the subtraction To try again
CLC               ;Clear Carry Flag so zero shifted into A
RJMP Loop        ;Loop Back
SKIP:   SEC               ;Set Carry Flag To be shifted into A
RJMP Loop
DONE:
TST R23			  ;Check If R23(first digit) is Set
BREQ Exit		  ;Branch If R23=00
MOV R24, ANS	  ;Second Digit
MOV R25, REM      ;Third Digit
RET
Exit:
MOV R23, ANS	  ;First Digit
MOV A, REM		  ;Load reminder divident into A
LDI B, 10		  ;Load divisor into B
RCALL DIV88		  ;Call again but divide divident by 10

Now one thing that worries me is the 199 clock cycles, which is too much for this operation. How can I lower the cycle count?

The code works. After executing it I see this:

R23 => 02

R24 => 05

R25 => 04

I tested it with the value 84 and it failed. A 2-digit number fails because its hundreds digit is 0, so R23 is still 00 when DONE: is reached the second time. The compare then treats the second pass as the first digit again, and I end up with:

R23 => 08

R24 => 00

R25 => 04

So it is not OK for 1 or 2 digits, only for 3.

How can I fix this so that it works for 1, 2 and 3 digits alike, for example to get correct values for 0, 1, 84, 125 and 255?
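
One way around the "is R23 still empty?" test is to avoid it altogether: make the divide a real subroutine that ends in ret, and call it twice in a fixed order. The sketch below assumes an 8-bit value in r16 and uses the same shift-and-subtract loop as the code above; the register assignments and label names are illustrative choices, not taken from the thread.

```asm
; Sketch: split a value 0..255 into three digits with two fixed divide calls.
; In : r16 = value
; Out: r23 = hundreds, r24 = tens, r25 = ones
; Uses r15, r17, r18
split3:
    ldi   r18, 100
    rcall div8u               ; r16 = value / 100, r15 = value mod 100
    mov   r23, r16            ; hundreds (0 for values below 100)
    mov   r16, r15            ; the remainder becomes the new dividend
    ldi   r18, 10
    rcall div8u               ; r16 = tens, r15 = ones
    mov   r24, r16
    mov   r25, r15
    ret

; 8-bit unsigned divide: r16 / r18 -> quotient in r16, remainder in r15
div8u:
    sub   r15, r15            ; clear remainder and carry
    ldi   r17, 9              ; loop counter: 8 bits plus the final shift
d8u_1:
    rol   r16                 ; shift dividend left, result bit comes in from carry
    dec   r17                 ; dec does not touch the carry flag
    brne  d8u_2
    ret
d8u_2:
    rol   r15                 ; shift the next dividend bit into the remainder
    sub   r15, r18            ; try to subtract the divisor
    brcc  d8u_3               ; it fitted: result bit is 1
    add   r15, r18            ; it did not fit: restore the remainder
    clc                       ; result bit is 0
    rjmp  d8u_1
d8u_3:
    sec
    rjmp  d8u_1
```

Because the order of the two calls is fixed, a hundreds digit of 0 is just another result: 84 gives r23 = 0, r24 = 8, r25 = 4, and no flag register has to be cleared between conversions.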

OK, I updated the code and now all values from 000-255 work:

.include	"C:\FastAVR\inc\m8def.inc"

.ORG	0x0000

_Reset:
ldi		yl,byte1(RAMEND)
out		SPL,yl
ldi		yh,byte2(RAMEND)
out		SPL+1,yh

.DEF  ANS = R0            ;To hold answer
.DEF  REM = R2            ;To hold remainder
.DEF    A = R16           ;To hold dividend
.DEF    B = R18           ;To hold divisor
.DEF    C = R20           ;Bit Counter

LDI A,253         ;Load dividend into A
LDI B,100         ;Load divisor into B
DIV88:
LDI C,9           ;Load Bit Counter
Sub REM,REM       ;Clear Remainder And Carry
MOV ANS,A         ;Copy Dividend To Answer
Loop:   ROL ANS           ;Shift the answer To the Left
DEC C             ;Decrement Counter
BREQ DONE        ;Exit If eight bits done
ROL REM           ;Shift the remainder To the Left
Sub REM,B         ;Try To Subtract divisor from remainder
BRCC SKIP        ;If the result was negative Then
ADD REM,B         ;reverse the subtraction To try again
CLC               ;Clear Carry Flag so zero shifted into A
RJMP Loop        ;Loop Back
SKIP:   SEC               ;Set Carry Flag To be shifted into A
RJMP Loop
DONE:
TST R1			  ;Check If R1(first digit) is Set
BREQ Exit		  ;Branch If R1=00
MOV R24, ANS	  ;Second Digit
MOV R25, REM      ;Third Digit
RET
Exit:
MOV R23, ANS	  ;First Digit
LDI A, 0x01
MOV R1, A         ;First Digit Set
MOV A, REM		  ;Load reminder divident into A
LDI B, 10		  ;Load divisor into B
RCALL DIV88		  ;Call again but divide divident by 10

Now the challenge is to add the HIGH and LOW bytes to load a 16-bit number and get three digits.

robydream wrote:

Can someone write this in assembler? I am learning and need a program with fewer cycles to do that. I am reading AVR

https://github.com/sheepdoll/AVR...

"div16u" - 16/16 Bit Unsigned Division

I tried it and got 4 remainder 32, so result and remainder. The code needs to be improved so that, after the remainder is set, it divides by 10 and puts the new result and remainder into the corresponding result registers. Could someone please write this? I have finished the project, but I need to add this so that I can use each digit as an index into the digit array and get the hex value to send to light up the digit on the segment.

Thanks

This is a bit confusing.   That link is to an included tech note that was in the folder.    I see I forgot to change one of the register labels into the DAA code, which could lead to some other confusion.  I edited my post for future reference.

I have not taken the time to look at the OP's code in detail.   This should be some fairly straightforward shifts and multiplies in the nybbles.   Three digits of BCD will take 2 bytes; 2 digits of BCD take 1 byte.   If there are more than 2 digits then the 3rd digit carries into the next nybble.   So there will always be a nybble that is zero unless there is an overflow.  I would think that would be the end of sequence flag.

On the other hand if leading or trailing zero suppression is required, then one is basically writing a printf function,  Which then starts detailing the itoa function.   This can be a simple table lookup that divides the digits by 1,10,100 as was done in the code of the late 1970s.  Advantage is this can work in any base.    Here is one that I converted from 68K assembly to AVR.  (the 68k opcodes are commented out)

;********************
;*	 DCVT	    *
;********************
;Decimal and octal conversion subroutines
;D1 contains the input value
;A2 indexes the output result area
;D6 contains the control word as follows:
;
;      Bits 0-7 contains the fixed length number of characters
;      Bit  8 - set to suppress leading zeros if fixed length
;      Bit  9 - set for terminal output (else uses A2)
;      Bit 10 -
;      Bit 11 - set for byte size in D1
;      Bit 12 - set for word size in D1
;      Bit 13 -
;      Bit 14 - set for octal output
;      Bit 15 - set for hex output
;
;Registers modified are none unless A2 is used

DCVT:
;	SAVE	D0,D1,D3,D4,A1		; save registers
;	MOV	D6,D0			; control word to D0
mov ARGL,D6_TMPL
mov ARGH,D6_TMPH

;Set table index A1 for decimal, octal or hex output
;	LEA	A1,HCTBL		; set hex table index
ldi ZL,low(HCTBL*2)
ldi ZH,high(HCTBL*2)

;	TSTW	D0			; is it hex?
tst ARGH
;	BMI	10$			;   yes
brmi DCVT_10

;	LEA	A1,OCTBL		; set octal table index
ldi ZL,low(OCTBL*2)
ldi ZH,high(OCTBL*2)

;	BTST	#14.,D0			; is it octal?
;	BNE	10$			;   yes
sbrc ARGH,6
rjmp DCVT_10

;	LEA	A1,DCTBL		; set decimal table index
ldi ZL,low(DCTBL*2)
ldi ZH,high(DCTBL*2)

;Put the table size into D4
DCVT_10:
;	MOV	(A1)+,D4		; pick up table size
mov A1_TMPL,ZL
mov A1_TMPH,ZH
lpm
mov idx,r0

;Set the input data into D3 and strip to word or byte as required
;	MOV	D1,D3			; get binary input
; avr D1 = ACC,BCC,XL,XH is parameter
;	BTST	#11.,D0			; byte size data?
;	BEQ	15$			;   no
sbrs ARGH,3
rjmp DCVT_15
;	AND	#377,D3			; strip to byte data
clr BCC
clr XL
clr XH

DCVT_15:
;	BTST	#12.,D0			; word size data?
;	BEQ	DCND			;   no
sbrs ARGH,4
rjmp DCND
;	AND	#177777,D3		; strip to word data
clr XL
clr XH

;Calculate value of next digit
DCND:
;CLR	D1			; preclear result
clr D7_TMPL			; avr shadow D1 in r12-15

lpm
mov A6_TMPL,r0		; avr shadow internal to A6
lpm
mov A6_TMPH,r0
lpm
mov A6_Page,r0
lpm
mov A4_Page,r0
DCND_10:
;	CMP	D3,@A1			; compare to table value
cp ACC,A6_TMPL
cpc BCC,A6_TMPH
cpc XL,A6_Page
cpc XH,A4_Page

;	BLO	DCCZ
brlo DCCZ

;	INCW	D1
inc D7_TMPL

;	SUB	@A1,D3
sub ACC,A6_TMPL
sbc BCC,A6_TMPH
sbc XL,A6_Page
sbc XH,A4_Page

;	BR	10$
rjmp DCND_10

;Check for digit of zero
DCCZ:
;	TSTW	D1			; zero?
;	BNE	DCNZ			; nope
cp D7_TMPL,zero
brne DCNZ

;Digit is zero - check for zero suppress unless units digit
;Bypass if variable length in progress
;	CMP	D4,#1			; units digit?
;	BEQ	DCNZ			; yes - no suppress
cpi idx,1
breq DCNZ

;	TSTB	D0			; check for variable length
;	BEQ	DCDD
tst ARGL
breq DCDD

;	BTST	#8.,D0			; test control bit 0
;	BEQ	DCNZ			; no suppress if off
sbrs ARGH,0
rjmp DCNZ

;	MOV	#40,D1			; set space
ldi c_tmp,32
mov D7_TMPL,c_tmp
;	BR	DCRO
rjmp DCRO

;Digit is not zero or it is zero in units position
;Make it ASCII and reset zero suppress bit
DCNZ:
;	CMPW	D1,#9.			; decimal digit?
ldi c_tmp,9
cp D7_TMPL,c_tmp
;	BLOS	10$			;   yes
breq DCNZ_10
brlt DCNZ_10

;	ADDW	#7,D1			; adjust for hex A-F
ldi c_tmp,7

DCNZ_10:
;ADDW	#60,D1			; make ASCII for output
ldi c_tmp,48

;	ANDW	#177377,D0		; reset zero suppress bit
andi ARGH,0xFE

;Check for variable length output
;A digit has been detected so reset length if variable
;	TSTB	D0			; variable output length?
;	BNE	DCRO			; nope
tst ARGL
brne DCRO
;	OR	D4,D0			; reset for fixed length
mov idx,ARGL

;Ready for output - bypass if fixed length is less
DCRO:
;	CMPB	D0,D4			; check fixed length
;	BLO	DCDD			; bypass output if less
cp ARGL,idx
brlo DCDD

;	BTST	#9.,D0			; terminal output?
;	BEQ	10$			;   no - use A2
SBRC ARGH,1
rjmp DCRO_10
mov D6_TMPL,D7_TMPL
rcall TOTCHR	;D1			; output to terminal
;BR	DCDD
rjmp DCDD

DCRO_10:
;	MOVB	D1,(A2)+		; output the digit
st Y+,D7_TMPL

;Digit done - check for all digits processed
DCDD:
;	ADD	#4,A1			; bump table index
;	ldi c_tmp,4

;	DEC	D4			; decrement digit count
dec idx
;	BNE	DCND			; loop if not last digit
brne DCND

;End of processing - restore registers and exit
;	REST	D0,D1,D3,D4,A1	 	; restore registers
;	RTE
ret

;Table for decimal conversion
DCTBL:	.dw	10
.dd	1000000000
.dd	100000000
.dd	10000000
.dd	1000000
.dd	100000
.dd	10000
.dd	1000
.dd	100
.dd	10
.dd	1

;Table for octal conversion
OCTBL:	.dw	11
.dd	1073741824
.dd	134217728
.dd	16777216
.dd	2097152
.dd	262144
.dd	32768
.dd	4096
.dd	512
.dd	64
.dd	8
.dd	1

;Table for hex conversion
HCTBL:	.dw	8
.dd		0x10000000
.dd		0x1000000
.dd		0x100000
.dd		0x10000
.dd		0x1000
.dd		0x100
.dd		0x10
.dd		0x1

I added 16-bit number support to extract three digits, so numbers from 000-999 are extracted successfully. With an 18.432 MHz crystal the software takes 26.69 uS, or 492 clock cycles, so the approach of dividing by 100, taking the remainder and dividing by 10 costs that many cycles. Maybe with a 20 MHz crystal, the maximum for the ATmega328P, I will get a slightly lower execution time. I will study the code above, and I think that if I write this all out manually, without rjmp and the other mnemonics that take 2 cycles to execute, I can decrease the time from 492 cycles to about 350 cycles (18.98 uS), but I need to study it a bit. I am wondering if someone has an idea or a clue for extracting the digits of numbers from 000-999 using multiplication? I see in the Atmel datasheet that the AVR has a hardware MUL instruction that takes 2 cycles, so I tried to multiply, for example:

432 x 0.01 = 4.32 => take 4, put it into R23 => 04

Take the remainder 32 and:

32 x 0.1 = 3.2 => take 3, put it into R24 => 03

And take the remainder and put it into R25 => 02

So is it allowed to multiply by a decimal value with the HW MUL instruction? It runs at 2 cycles per multiplication, so this would be around 30-40 cycles if it can be done this way, while the software division needs 492 cycles, so this would be a huge speed-up.
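
MUL only takes 8-bit integer operands, so 0.01 and 0.1 cannot be loaded directly. The usual workaround is to multiply by a scaled integer reciprocal and then shift the product right. The sketch below is one possible way to do that, not the routine referred to elsewhere in this thread: the constants 41 (about 4096/100) and 205 (about 2048/10) and all register assignments are my own choices. The first constant gives exact results only for inputs 0..999, the second for the remainder range 0..99 used here.

```asm
; Sketch: split a value 0..999 into three digits using the hardware multiplier.
; In : r17:r16 = value (0..999), destroyed
; Out: r23 = hundreds, r24 = tens, r25 = ones
; Uses r0, r1, r18, r20, r21
digits_mul:
    ldi   r18, 41
    mul   r16, r18            ; r1:r0 = low byte * 41
    movw  r21:r20, r1:r0      ; keep the partial product
    mul   r17, r18            ; r0 = high byte * 41 (high byte is at most 3)
    add   r21, r0             ; r21:r20 = value * 41, still fits in 16 bits
    mov   r23, r21
    swap  r23
    andi  r23, 0x0F           ; r23 = (value * 41) >> 12 = hundreds

    ldi   r18, 100
    mul   r23, r18            ; r1:r0 = hundreds * 100
    sub   r16, r0
    sbc   r17, r1             ; r16 = value - hundreds * 100 (0..99), r17 = 0

    ldi   r18, 205
    mul   r16, r18            ; r1:r0 = remainder * 205
    mov   r24, r1
    lsr   r24
    lsr   r24
    lsr   r24                 ; r24 = (remainder * 205) >> 11 = tens

    ldi   r18, 10
    mul   r24, r18            ; r0 = tens * 10
    mov   r25, r16
    sub   r25, r0             ; r25 = remainder - tens * 10 = ones
    ret
```

For 432: 432 * 41 = 17712, and 17712 >> 12 = 4; the remainder is 32, 32 * 205 = 6560, and 6560 >> 11 = 3; that leaves 2 for the ones. There are no loops, so the run time is the same for every input, a few dozen cycles including the ret. Note that MUL overwrites r1:r0, which the division code above uses for ANSL and ANSH.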

Here is the code that works without errors, using division to extract the digits of numbers from 000-999:

.include	"C:\FastAVR\inc\m8def.inc"

.DSEG
Digit1: .Byte 1
Digit2: .Byte 1
Digit3: .Byte 1
.CSEG
.ORG  0x0000
RJMP Reset

Reset:
.DEF ANSL = R0            ;To hold Low-Byte of answer
.DEF ANSH = R1            ;To hold high-Byte of answer
.DEF REML = R2            ;To hold Low-Byte of remainder
.DEF REMH = R3            ;To hold high-Byte of remainder
.DEF   AL = R16           ;To hold Low-Byte of dividend
.DEF   AH = R17           ;To hold high-Byte of dividend
.DEF   BL = R18           ;To hold Low-Byte of divisor
.DEF   BH = R19           ;To hold high-Byte of divisor
.DEF    C = R20           ;Bit Counter

LDI AL,Low(0x363) ;Load Low-Byte of dividend into AL
LDI AH,HIGH(0x363);Load HIGH-Byte of dividend into AH
LDI BL,Low(100)   ;Load Low-Byte of divisor into BL
LDI BH,HIGH(100)  ;Load high-Byte of divisor into BH
DIV1616:
MOVW ANSH:ANSL,AH:AL ;Copy dividend into answer
LDI C,17          ;Load Bit Counter
Sub REML,REML     ;Clear Remainder And Carry
CLR REMH          ;
Loop:   ROL ANSL          ;Shift the answer To the Left
ROL ANSH          ;
DEC C             ;Decrement Counter
BREQ DONE        ;Exit If sixteen bits done
ROL REML          ;Shift remainder To the Left
ROL REMH          ;
Sub REML,BL       ;Try To subtract divisor from remainder
SBC REMH,BH
BRCC SKIP        ;If the result was negative Then
ADD REML,BL       ;reverse the subtraction To try again
ADC REMH,BH       ;(the high byte has to be restored too)
CLC               ;Clear Carry Flag so zero shifted into A
RJMP Loop        ;Loop Back
SKIP:   SEC               ;Set Carry Flag To be shifted into A
RJMP Loop
DONE:
TST R4			  ;Check If R4(First Digit) Is Set
BREQ Exit           ;Branch If R4=00
MOV R24, ANSL 	  ;Second Digit Low Byte
MOV R25, REML       ;Third Digit Low Byte
RET
Exit:
MOV R23, ANSL	      ;First Digit
LDI AL, 0x01
MOV R4, AL		  ;First Digit Set
MOV AL, REML        ;Load Reminder Low-Byte of dividend into AL
MOV AH, REMH        ;Load Reminder HIGH-Byte of dividend into AH
LDI BL,Low(10)      ;Load Divisor Low-Byte of divisor into BL
LDI BH,HIGH(10)     ;Load Divisor high-Byte of divisor into BH
RCALL DIV1616       ;Call again but divide divident by 10

God And Bad Comments Are Welcome: how can I improve this, either with multiplication or with division?
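For readers who find the register-level listing hard to follow, here is a minimal C sketch of the same idea: a restoring shift-and-subtract division, run once with divisor 100 and once more on the remainder with divisor 10. The names `div16u`, `divres16`, `digits3` and `split_by_division` are made up for this illustration and are not from the post; the sketch models the arithmetic only, not the cycle count.

```c
#include <stdint.h>

/* Quotient and remainder of an unsigned 16-bit division. */
typedef struct {
    uint16_t quot;
    uint16_t rem;
} divres16;

/* Hypothetical helper: the three decimal digits of a value 0..999. */
typedef struct {
    uint8_t hundreds;
    uint8_t tens;
    uint8_t ones;
} digits3;

/* Restoring shift-and-subtract division, one quotient bit per pass. */
divres16 div16u(uint16_t dividend, uint16_t divisor)
{
    uint16_t quot = dividend;  /* dividend bits leave at the top, quotient bits enter at the bottom */
    uint32_t rem = 0;          /* wider than 16 bits so the shift cannot overflow */

    for (int bit = 0; bit < 16; bit++) {
        rem = (rem << 1) | (uint32_t)(quot >> 15);  /* shift the next dividend bit into the remainder */
        quot = (uint16_t)(quot << 1);
        if (rem >= divisor) {                       /* the subtraction would not go negative */
            rem -= divisor;
            quot |= 1;                              /* record a 1 in the quotient */
        }
    }
    return (divres16){ quot, (uint16_t)rem };
}

/* Divide by 100 for the first digit, then divide the remainder by 10. */
digits3 split_by_division(uint16_t n)
{
    divres16 by100 = div16u(n, 100);
    divres16 by10 = div16u(by100.rem, 10);
    return (digits3){ (uint8_t)by100.quot, (uint8_t)by10.quot, (uint8_t)by10.rem };
}
```

With the input 432 from the first post, `split_by_division(432)` yields 4, 3 and 2.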

Why are you worried about execution time? The function will be called less than 100 times per second, so any potential savings won’t amount to much. Rule #1 get the code working first. Then think about optimising.

OK, you are right, but I still have some doubts:

I am building a soldering station controller with a 3-digit, common-cathode 7-segment display. The display is refreshed from Timer1, configured in CTC mode to interrupt every 5 ms. On each interrupt it reads the digit values from R23, R24 and R25, loads the corresponding segment code for 0...9 from the array in flash, and multiplexes that code onto PORTD to light the digit.

Timer0 fires every 500 ms and reads the thermocouple temperature (MAX6675). It shifts the bytes to get the temperature reading, divides the result by 4 to get Celsius, and then calls the function above, which extracts the decimal digits into R23, R24 and R25 for Timer1 to read when it refreshes the display.

My question: are these 492 cycles OK for this operation, so that I get a fast temperature reading, or will all the cycles spent dividing and extracting digits cause high CPU usage and slow readings? I know this is a small amount of time, but I think it can be improved, so ideas are welcome. And yes, I know that 1 s = 1e6 µs, so this is roughly 30 µs of execution time, which should mean no noticeable delay and low CPU usage.

The code I gave in #17 will do the job in less than 100 clk.

The code at the end of my link will do a full 16-bit -> 5-digit conversion in max 68 clk, and if you remove the code for finding the first digit it's about 40 clk.

And as I said in #11, if it is only 3 digits it can be done in about 25-30 clk with an optimal routine.

All this is without formatting!

How do you want to show 8?

008  no formatting

__8 (_ indicate off)

_8_ centered

8__ looks best if at the end of a text
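The "__8" option above (leading zeros switched off) can be sketched in C as follows. This is only an illustration: `BLANK`, `display3` and `format_right_aligned` are hypothetical names, and the marker value 0xFF for "digit off" is an assumption that the display code would have to translate into an all-segments-off pattern.

```c
#include <stdint.h>

#define BLANK 0xFF  /* hypothetical marker meaning "this digit is off" */

/* Digit values (0..9) or BLANK for the three display positions. */
typedef struct {
    uint8_t left;
    uint8_t middle;
    uint8_t right;
} display3;

/* Right-aligned with leading zeros blanked: 8 -> "__8", 40 -> "_40". */
display3 format_right_aligned(uint16_t n)  /* n is expected to be 0..999 */
{
    display3 d;
    d.left = (uint8_t)(n / 100);
    d.middle = (uint8_t)((n / 10) % 10);
    d.right = (uint8_t)(n % 10);

    if (d.left == 0) {
        d.left = BLANK;
        if (d.middle == 0)
            d.middle = BLANK;  /* the rightmost digit always stays lit, so 0 shows as "__0" */
    }
    return d;
}
```

The division operators are used here only for readability; on the AVR the digits would come from whichever extraction routine is chosen, and the blanking step stays the same.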

I will study your code in the simulator; I need to get clear in my head what each line is doing. That is the way to learn asm and to know how things work. I don't want to just copy and paste code and say "hey, I finished the project, it is working", and then in the next project not know how a divider works in asm. So I will spend as much time as it takes to get the point of it.

As for the formatting you are asking about: I have done this with a table in flash and it works perfectly. Fetching from flash takes 3 cycles (SRAM would take 2, but that is not critical; I am not building a space shuttle where every single cycle is very important).

.CSEG
digit: .DB 0x3f, 0x06, 0x5b, 0x4f, 0x66, 0x6d, 0x7d, 0x07, 0x7f, 0x6f

THIS IS CODE FROM TIMER1 THAT READ REGISTERS
ldi		ZH, HIGH(digit<<1)			; Load Start Z-Address Of Digit Array (Flash Byte Address)
ldi		ZL, LOW(digit<<1)
add		ZL, R23						; Add The Digit1 Index To The Low Byte
adc		ZH, R17						; Add 0 (R17 Must Hold 0) To Propagate The Carry
lpm		R23, Z						; Read Digit1 From Flash

ldi		ZH, HIGH(digit<<1)			; Reload Table Start Before Each Lookup
ldi		ZL, LOW(digit<<1)
add		ZL, R24						; Add The Digit2 Index
adc		ZH, R17						; Add 0 To Propagate The Carry
lpm		R24, Z						; Read Digit2 From Flash

ldi		ZH, HIGH(digit<<1)
ldi		ZL, LOW(digit<<1)
add		ZL, R25						; Add The Digit3 Index
adc		ZH, R17						; Add 0 To Propagate The Carry
lpm		R25, Z						; Read Digit3 From Flash

So this is done and it works with very few cycles. Now I must study your code and get the division below 100 cycles, which will be a huge improvement. When I have things clearer in the simulator I will post the version of your code that does what I need. Thanks for the code.
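In C the lookup described in this post is a plain array index. The sketch below reuses the ten segment codes from the `digit` table; the names `digit_codes` and `segment_code` are invented for the example, and returning 0x00 (all segments off) for an out-of-range value is an assumption, not something the post specifies.

```c
#include <stdint.h>

/* Segment patterns for 0..9, same values as the `digit` table in the post. */
static const uint8_t digit_codes[10] = {
    0x3f, 0x06, 0x5b, 0x4f, 0x66, 0x6d, 0x7d, 0x07, 0x7f, 0x6f
};

/* Map a digit value to its 7-segment pattern. */
uint8_t segment_code(uint8_t value)
{
    /* Out-of-range input returns all segments off (an assumption for this sketch). */
    return (value < 10) ? digit_codes[value] : 0x00;
}
```

Note that each lookup indexes from the start of the table. The assembly equivalent has to start from the table's base address for every digit and, because `lpm` works with byte addresses, use the doubled flash word address.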

So that means that you will show 8 as 008 (most people prefer not to show the two zeros).

Yes, you are right...look at this picture...when I power on the solder station it will show the temperature rising toward the set temperature: 000 038 067 098 127 148 189 200 if I set 200C, and it will stay there. With a 3-digit 7-segment display I find 008 nicer to look at than a lone 8; it is a matter of personal style.

I see that you use MUL in your code to get the first digit, and the rest is very similar to my idea but with much smarter code, so it needs fewer cycles. I am now studying your code and adapting it to my needs, and then I will see how many cycles it needs.

robydream wrote:

...bit need to study...i im wondering if some have a idea or clue how to extract numbers from 000-999 using multiplication? I see in atmel datasheet that AVR have Hardware MUL instruction that takes 2cycles...and i try to multiplicate for example:

... God And Bad Comments Are Welcome how can i improve this eather way with multiplication or division....

I think Nietzsche is dead so God probably does not care what the comments are.

In post #22 I gave code that does base conversion which only uses subtractions from a table.   Does not get much simpler than that.  Sure the code looks a bit long, because it is for two different ASM mnemonics and has comments that explain it.   This is a general purpose function.  You can distill out only the parts needed.   The table and the subtraction.  I could have pulled the function from the basic interpreter which is basically the same thing.

If you really are concerned about cycles and not code space you can do an unrolled loop.  This has no calls or returns or indexes.   You count each instruction and branch.  Then you need to figure out whether skip or branch instructions are better for the compares.   In an unrolled loop, if you are only interested in 3 digits, you have up to 10 subtracts by 100, then 10 subtracts by 10 and 10 subtracts by 1.  Count the subtracts; exit is on underflow past zero (negative) to the next digit.  The worst case is the number 9 9 9, the best case is 0 0 0.  The resulting numbers will be in the index (counter) registers for each digit.  There is a lot of old process control code for 8080, Z80, 8051, that uses this method.
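A minimal C sketch of the counting-subtraction method described in this post, written as loops rather than unrolled. The names `digits3` and `split_by_subtraction` are made up for the example. The ones digit is taken directly from what is left after the tens loop, which saves the third counting pass.

```c
#include <stdint.h>

/* Hypothetical helper: the three decimal digits of a value 0..999. */
typedef struct {
    uint8_t hundreds;
    uint8_t tens;
    uint8_t ones;
} digits3;

/* Count how many times 100 and then 10 can be subtracted. */
digits3 split_by_subtraction(uint16_t n)  /* n is expected to be 0..999 */
{
    digits3 d = { 0, 0, 0 };

    while (n >= 100) {   /* at most 9 passes for n <= 999 */
        n -= 100;
        d.hundreds++;
    }
    while (n >= 10) {    /* at most 9 passes */
        n -= 10;
        d.tens++;
    }
    d.ones = (uint8_t)n; /* what is left is already the ones digit */
    return d;
}
```

As the post says, the run time depends on the digits: 999 takes the most passes and 000 the fewest.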

Software divide is better for library systems and CISC processors that have hardware division when implementing printf(). This uses a lot of abstraction layers, to keep the code universal across platforms and languages.  There might be ways of working with fixed point divisions, which are academic; 0x1/0xA is a repeating fraction, otherwise this would be a more popular method.  This leaves subtraction loops (like the AVR200 app note) or subdivision such as the DAA function, which subdivides into a compare to 10 and an addition of 6 (the inverse of the subtraction), since 10 + 6 = 16.

robydream wrote:
time to get interrupt ist set to 5ms...so every 5ms timer1 is fired
What speed is this 328 being run at? Let's assume that it's only just a conservative 1MHz. if it executes 1,000,000 cycles in 1 second then in 5ms it executes 5,000 cycles. So whether your display update takes 100 cycles, 500 cycles, 1,000 cycles, 2,000 cycles or whatever what does it really matter? Or is the CPU doing so many other CPU intensive operations that it is actually already using 4,900 cycles out of the 5,000 available in each 5ms that you HAVE to get the display updating done in just 100 or something?

(oh and this was 1MHz - if you run at 16MHz (say) then you have 80,000(!) cycles each time to get stuff done)
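The cycle budget arithmetic in this post can be written down as a small C sketch. The function names `cycles_per_interval` and `load_ppm` are invented for the illustration; the figures in the comments are the ones quoted in the post.

```c
#include <stdint.h>

/* CPU cycles available in one interval, e.g. between two timer interrupts. */
uint32_t cycles_per_interval(uint32_t f_cpu_hz, uint32_t interval_us)
{
    /* 64-bit intermediate so that f_cpu_hz * interval_us cannot overflow. */
    return (uint32_t)(((uint64_t)f_cpu_hz * interval_us) / 1000000u);
}

/* Share of the budget used by a routine, in parts per million. */
uint32_t load_ppm(uint32_t routine_cycles, uint32_t budget_cycles)
{
    return (uint32_t)(((uint64_t)routine_cycles * 1000000u) / budget_cycles);
}

/* 1 MHz, 5 ms:  cycles_per_interval(1000000, 5000)  gives 5000 cycles.  */
/* 16 MHz, 5 ms: cycles_per_interval(16000000, 5000) gives 80000 cycles. */
```

A 100-cycle routine inside an 80,000-cycle budget gives `load_ppm(100, 80000)` = 1250 ppm, which is 0.125%.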

I am using an ATmega328P with a 16 MHz crystal. I must use this high refresh rate because I am multiplexing the 7-segment displays, and to see the digits without flickering, ghosting and so on I need this high rate. If I lower the rate to, for example, 500 ms or 1 s, then I see each digit shown while the others are off. If there is a better way of driving a 7-segment display please let me know.

Using the code from sparrow2 I get very nice results in the simulator: only 56 cycles are needed to get each digit of a 3-digit number into its register. I would like to ask whether sparrow2 can give me a clue or a formula for optimizing this to, for example, 25-30 cycles for 3 digits. I just removed the code that finds the first digit and got a very nice 56 cycles, so I think it can be improved further.

I would like to get the idea or the code so that I can use it and keep it for the next project I do in assembly. For example, when I finish this project and have learned asm, my next idea is a CNC controller where timing, cycles and precision over the UART are very important, so I need to learn how to minimize the cycles needed to get 3 digits. I was very impressed that division can be done by multiplication with the HW MUL instruction. Wonderful thing.

Just for the record: using sparrow2's HW MUL code I got 56 cycles, and with bit shifting I got 492 cycles for the same result! So sparrow2's code improves the cycle count by about 9x, which is a very nice improvement.

So a question for the math users: what is the formula, in theory, for using multiplication to get each digit from a number? Then I can try to write the code and maybe get even fewer cycles. My expectation is that a 3-digit extraction realistically needs 35-40 cycles; below that, in a software routine using HW MUL, I think it is impossible. And another question: why did the designers of the AVR include a HW MUL instruction but not a HW DIV instruction?

Even a calculator that has a division function has a multiply function, so this is what I can't figure out.

Last thing first: an 8-bit HW MUL can easily be expanded to 16 or 32 bits. You can't do that with an 8-bit DIV.

But first a question: what else do you need to do that requires 16 MHz? It's not because of the multiplexing; something like a 5 ms update rate would be fine, and what we have seen here can be done with a 128 kHz clk!

I will look into the 25-30 clk later, I'm busy now.

There should be plenty here about mul with 1/X for div, but a short div by 256, 65536 ... is free: it's just moving the offset by a byte or word or ...

So mul with 26 and moving a byte is close to div by 10 (26/256 = 0.10156).

Then the best for an 8-bit mul is to multiply with the biggest number that is 2**n bigger but still fits a byte; that is 8*1/10*256 = 204.8, so use 205.

That gives a result of 205/(256*8) = 0.100098, so very close, but now you need to shift the result.
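The two constants in this post can be compared with a short C sketch. The function names `div10_mul26` and `div10_mul205` are made up for the example. Both take an 8-bit input, so the product fits in 16 bits, which is what a single AVR `mul` produces.

```c
#include <stdint.h>

/* x/10 approximated as (x*26)>>8: just take the high byte of the product.
   Exact only up to x = 68; at 69 it already returns 7 instead of 6. */
uint8_t div10_mul26(uint8_t x)
{
    return (uint8_t)(((uint16_t)x * 26u) >> 8);
}

/* x/10 approximated as (x*205)>>11: high byte, then three more shifts.
   Exact for every 8-bit input (0..255). */
uint8_t div10_mul205(uint8_t x)
{
    return (uint8_t)(((uint16_t)x * 205u) >> 11);
}
```

The trade-off is the one named in the post: 26/256 needs no shifting but is less accurate, while 205/2048 is accurate over the whole byte range at the cost of the extra shifts.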

error in a number

Last Edited: Tue. Apr 17, 2018 - 11:10 AM

robydream wrote:
I have using ATMega328P, and 16MHz Quarz...
robydream wrote:
if sparrow2 can give me clue or formula how do make thinga optimized to get for example 25-30cycles for 3 digit?
I guess I must be missing something. As I just told you, in 5ms at 16MHz you have EIGHTY THOUSAND cycles. So why are you so focussed on getting this already optimized code further optimized to reduce it still further. Even if it takes 100 cycles that is 0.125% of the execution time you have available. What are you planning to do with the other 99.875% of execution time that is left ??

Clawson, great question. Here is what I am planning to do with the AVR MCU, but I will wait for sparrow2's optimized code so that I can squeeze every bit of speed out of the AVR. This is what I would like to learn now, at the beginning of learning AVR, and later I will have optimized code that I understand and can use in my next project.

So I am planning to build a solder station controller with the following features:

- when the solder station is powered on, the 7-seg display shows S-E (Sensor-Error) => no thermocouple iron attached

- when the iron is attached, the display shows 000 and refreshes every 500 ms with the temperature (000, 013, 025, 048...)

- show the iron temperature until it reaches the temperature set with the rotary encoder (for example a set temperature of 200C)

- rotating the rotary encoder left or right lowers or raises the set temperature shown on the 7-seg display

- after 5 s with no rotary encoder activity, go back to showing the iron temperature

- when the iron temperature is lower than the set temperature turn ON the red HEATER LED; when it is equal or higher turn the HEATER LED OFF

- add a sensor that detects whether the iron is on its stand; if it has not been picked up for 10 min, the BUZZER gives 10 beeps and the station switches OFF automatically (to prevent the house burning down if the iron is on the stand and I am away)

- a long press of about 1 s makes the 7-seg digits flash P01, meaning Preset 01, to store the set temperature in preset 1; after another 1 s long press on the rotary encoder it shows SUC, meaning it was SUCCESSfully stored in EEPROM position 1

- in the P01 menu, one long 5 s press shows P--, meaning the current position has been erased from EEPROM

- there will be 9 save positions (P01 to P09)

And so on. But first I need the digit extraction using sparrow2's excellent code, and then I will go step by step. If I leave unoptimized code at every step of the soldering station, I will get into trouble with bugs and everything will be slow. Later I will add a hot-air gun for SMD reflow, so the ideas are in my head and this will be a very functional soldering station.

But I must go step by step. I wrote code that reads the MAX6675 temperature and got 0x1B0 (432) in hex, and I have a function that reads the 7-seg digit values from flash, but I need to decode 432 into 4 3 2 using sparrow2's ultra fast, low cycle code. This is what I need to study and learn. I hope you get my idea.

Sorry but where in that list are the tasks you think require the other 79,900 (or even 79,950) cycles left out of every 80,000 in 5ms ?

I don't see anything listed there that is "processor intensive".

As so often stated there: the first rule of optimization is "don't".

It seems to me that you just aren't grasping exactly how powerful a 16MHz AVR actually is - you could be running an alarm clock a GPS receiver and still have time to let the user play Space Invaders on an attached graphic LCD as well as controlling the solder station in the cycles you have available. Or, to look at it another way, you could wind the clock back from 16MHz to 100kHz or even less and you would STILL have sufficient cycles for everything you say though the point you made earlier about multiplexing LEDs is the one thing that may introduce a requirement to run a little faster than that.

The key thing about writing code (usually) is to keep it simple and easy to read so it is easy to maintain - only worry about optimization when you hit some task that really is CPU bound. Often "optimization" will go hand in hand with "obfuscation" as there's a tendency to employ "clever tricks" and things like that which may not be immediately clear to the maintainer when they come back to fix some critical bug in a couple of years time (BTW that maintainer could be you when you have long forgotten how you implemented the original code).

I know, and you are totally right, but it is in my nature to optimize as far as I can, and I think that is a good way to work. I know the AVR is a very powerful CPU, and even though it is 8-bit you can do 32-bit operations with it using more registers and more cycles; for that, Atmel produces 32-bit MCUs and ARM has nice CPUs too. To me the AVR is very interesting, so while building the soldering station I am not in a hurry and would like to learn how code can be optimized and made faster. At 16 MHz the CPU has so many cycles available that I will not notice whether a macro executes in 3 µs or 27 µs. But my learning approach is: get the code working, then optimize it as far as you can, then save the optimized macro for the next project so that I don't need to work out again how it is done.

I previously started programming in C in Atmel Studio, and after compiling code that does the same as my ASM I got a very disappointing result: 105 words in ASM versus 225 words in C. So I prefer to work in asm, because asm is pretty close to the AVR hardware, and that is how I prefer to optimize and work on the AVR.

Thanks for clearing things up for me. I know that you are right, but I would like to get the code above down to 25-30 cycles. If the AVR had a DIV instruction that executed in 2 cycles, dividing 432 and getting each digit would need about 10 cycles, so code that runs in 25 cycles would be 2.5x slower than hardware division (which the AVR doesn't have) but still about 19x faster than software division. If you think a 19x improvement for the same result is not worth it, then I think you are wrong: 492 cycles is not the same as 25 cycles, and why load the CPU bus and ALU with 492 cycles if better code can do it in 25? That is my opinion.

But 432 is bigger than a byte! So there is no easy DIV way on an 8-bitter! (first line of #33)

robydream wrote:
I previously started programming in C in Atmel Studio, and after compiling code that does the same as my ASM I got a very disappointing result: 105 words in ASM versus 225 words in C. So I prefer to work in asm, because asm is pretty close to the AVR hardware, and that is how I prefer to optimize and work on the AVR.
Yes, but C is more easily readable and maintainable. If you go on to do software engineering professionally you will rapidly find that maintainability is one of the key design goals. You can write the fanciest solution you like but if you can't later fix the code (or re-use it as part of new / better designs) you will have a real headache on your hands. In fact even C is generally supplanted by C++ now because the design goal of that language was easier maintenance and reuse. Apart from the most cost constrained of projects (typically extremely high volume where every $0.01 counts) you won't find many commercial micro designs using Asm these days as the maintenance / readability / reuse requirements are over-riding.

You are right, a byte goes from 0-255, so since I need 000-999 for my 7-seg display this needs .BYTE 2, if I'm right, with LOW and HIGH bytes. Here is your code, which works wonderfully in 56 cycles. I modified it slightly to suit my needs: in this example I put in the number 432 and I got 02 03 04 in the registers, which is what I need. So when you have time to optimize it, or to write the formula here so that I can try it myself, please do (remember I am a beginner and this is the way I can learn).

So if you have time, please write me an example of the math theory behind using the MUL instruction to turn decimal 432 into 04 03 02, and I will try to write the asm.

.include	"C:\FastAVR\inc\m8def.inc"

.ORG  0x0000								; Assembler directive, takes no cycles
RJMP Reset								; 2 Cycle

Reset:
; TEST - 0x1B0 (432)
ldi	R16, Low(0x1B0)						; 1 Cycle
ldi R17, HIGH(0x1B0)					; 1 Cycle
movw R10, R16							; 1 Cycle

Top:
; MUL by 41
ldi		r18,41							; 1 Cycle
ldi		r22,0							; 1 Cycle
mul		r16,r18							; 2 Cycle
movw		r20,r0						; 1 Cycle
mul		r17,r18							; 2 Cycle
add		r21,r0							; 1 Cycle
Adc		r22,r1							; 1 Cycle

;>>10 is the same As highbyte >>2
lsr		r17								; 1 Cycle
lsr		r17								; 1 Cycle
;Do mul And Sub result
mul		r17,r18							; 2 Cycle
Sub		r20,r0							; 1 Cycle
sbc		r21,r1							; 1 Cycle
sbci		r22,0						; 1 Cycle

;>>12 is the same As <<4 we know that result
; only is 1 Byte
Swap		r22
Swap		r21
eor		r22,r21
andi		r22,240
eor		r22,r21							; 1st digit R22
;find reminder by number-result*100
ldi		r20,100
mul		r20,r22
movw		r20,r16
Sub		r20,r0

;split the value in R22 into 2 digits.
;formular y=(number*51+20)>>9
ldi		r16,51
mul		r22,r16
movw		r18,r0
subi		r18,Low(-20)
sbci		r19,high(-20)
lsr		r19
;calc the reminder
ldi		r17,10
mul		r19,r17
Sub		r22,r0
mov		r21,r19

;this is a repeat On the other 2 digits
mul		r20,r16
movw		r18,r0
subi		r18,Low(-20)
sbci		r19,high(-20)
lsr		r19
mul		r19,r17
Sub		r20,r0
;this is just move so everything stays in order
mov		r23,r21
mov		r21,r19

...I must use this high refresh rate because I am multiplexing the 7-segment displays, and to see the digits without flickering, ghosting and so on I need this high rate. If I lower the rate to, for example, 500 ms or 1 s, then I see each digit shown while the others are off. If there is a better way of driving a 7-segment display please let me know.

Your worries border on the ridiculous. The calculations have nothing to do with the situation at all.  Once the digit values have been calculated (say once a second), they can be multiplexed hundreds or thousands of times a second onto the display with no problem, or flicker.   Look up the segment mappings & multiplex those at high speed.    Only when new readings are needed should the reading values be recalculated, which is much slower than the display multiplexing. Try using once a second.  Faster than that and you will get a blurry display (368, 369, 367, 368, 369, 368, 370, 369,368....).

You can also average the values going to the calculation (before digit extraction) to "smooth" them out   Xdisp=(Xdisp+Xsample)/2, say 5 times a second
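The smoothing formula Xdisp=(Xdisp+Xsample)/2 from this post, as a minimal C sketch. The function name `smooth` and its parameter names are made up for the example; the displayed value would be fed back in on every new sample.

```c
#include <stdint.h>

/* One smoothing step: move the displayed value halfway toward the new sample. */
uint16_t smooth(uint16_t displayed, uint16_t sample)
{
    /* Widen before adding so that two large 16-bit values cannot overflow. */
    return (uint16_t)(((uint32_t)displayed + sample) / 2u);
}
```

With the jittery readings quoted above, `smooth(368, 370)` gives 369 and `smooth(368, 369)` stays at 368 because the division rounds down.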

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Last Edited: Tue. Apr 17, 2018 - 03:32 PM

robydream wrote:
I know and you are totally right...but in my nature i would like to optimize it maximum as it can be ...so i think that is good way...i know that AVR is very powerful CPU..
You might want to reconsider your criteria for optimal.

Also, AVRs are not all that powerful, but your display does not take much.

Moderation in all things. -- ancient proverb

clawson wrote:
Even if it takes 100 cycles that is 0.125% of the execution time you have available. What are you planning to do with the other 99.875% of execution time that is left ??

...especially for a display update, which only needs to be refreshed a few times per second.  If that, for a solder station.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

OK, it was raining so I had some time to look at it.

First some code based on the code in #17

;max 92 clk
movw    r24,r16 ; only needed if to save org.
ldi     r20, -1
L0:     inc     r20
subi    r24, low(100)           ;-100
sbci    r25, high(100)
brcc    L0
ldi     r19, 10
L1:     dec     r19
subi    r24, -10                ;+10
brcs    L1
mov     r18,r24 ; only needed if number in order

an optimized version

;max 53 clk
movw    r24,r16 ; only needed if to save org.
ldi     r20, 0
L0:     subi    r20, -4
subi    r24, low(400)           ;-400
sbci    r25, high(400)
brcc    L0
L0a:	dec	r20
subi    r24, low(-100)           ;+100
sbci    r25, high(-100)
brcs	L0a
ldi     r19, 0
L1:     subi    r19, -4
subi    r24, 40                ;-40
brcc    L1
L1a:	dec     r19
subi	r24,-10			;+10
brcs	L1a

mov     r18,r24 ; only needed if number in order
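To make the coarse-then-fine trick in the optimized listing easier to follow, here is a C model of it. It is a sketch of the arithmetic only, and the names `digits3` and `split_coarse_fine` are invented for the example. Each digit is first overshot in steps of four (400, then 40) and then walked back one step at a time (100, then 10) until the remainder is non-negative again.

```c
#include <stdint.h>

/* Hypothetical helper: the three decimal digits of a value 0..999. */
typedef struct {
    uint8_t hundreds;
    uint8_t tens;
    uint8_t ones;
} digits3;

digits3 split_coarse_fine(uint16_t n)  /* n is expected to be 0..999 */
{
    int16_t rem = (int16_t)n;  /* signed, because the coarse steps overshoot below zero */
    uint8_t hundreds = 0;
    uint8_t tens = 0;

    do { hundreds += 4; rem -= 400; } while (rem >= 0);  /* overshoot in steps of 400 */
    do { hundreds--;    rem += 100; } while (rem < 0);   /* walk back by 100 */

    do { tens += 4;     rem -= 40;  } while (rem >= 0);  /* overshoot in steps of 40 */
    do { tens--;        rem += 10;  } while (rem < 0);   /* walk back by 10 */

    return (digits3){ hundreds, tens, (uint8_t)rem };    /* what is left is the ones digit */
}
```

For 432 the first loop stops at hundreds = 8 with a remainder of -368, and the second loop walks back to hundreds = 4 with a remainder of 32. Fewer loop passes are needed than with plain subtraction by 100 and 10, which is where the saving from 92 to 53 clk comes from.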

And then a faster one using HW mul

;max 31 clk
;input 0-999 in r17:r16 out r22:r21:r20
movw		r24,r16		;copy number
lsl		r24		;r25 = number >> 7 (0-7), roughly number/128
rol		r25
sbrc		r25,2		;if number >=512 guess one higher
inc		r25
mov		r22,r25		;first 100 digit guess (0 1 2 3 5 6 7 or 8)
inc		r25		;add one so guess can be too high
ldi		r18,100
mul		r18,r25		;r1:r0 == (guess+1)*100
movw	        r24,r16		;get a new copy
sub		r24,r0		;sub (guess+1)*100 from number
sbc		r25,r1
brcs	        L0  		;if negativ done with 100
sub		r24,r18		;else sub 100 and inc guess
inc		r22
brcs	        L0	    	;if negativ done with 100
sub		r24,r18		;else sub 100 and inc guess
inc		r22
;now correct 100 digit
L0:	add		r24,r18		;make reminder back to 0-99
ldi		r18,26		;divide with 10 by mul with 26/256
mul		r24,r18
movw	        r20,r0
sbrs	        r24,6		;if >=64 sub 20 (mul 26 is a tad to big)
rjmp	        L1
subi	        r20,20
sbci	        r21,0
;now correct 10 digit
L1:	ldi		r18,10		;mul reminder with 10
mul		r20,r18
mov		r20,r1		;now last digit correct

I'm sure it can be done faster but I think it's hard to cut off 5 clk (with code; not a problem with a LUT).

And yes this is like a crossword for me. I know it's not needed for OP

Thanks, this is awesome: only 31 clk using the HW multiplier. It is more than enough and I don't need a more optimized version than that. I will add it to the 7-seg code, and when the 7-seg displays arrive I will publish a video. Thanks.

Here is an algorithm I use, I'll illustrate in C so it's easier to understand. In this case it's packed BCD because I'm having trouble wrestling the compiler to make it work with 32 bit ints.

But I just want to show the algorithm.

#include <stdio.h>
#include <stdint.h>
uint16_t bin2bcd (uint16_t);

int main (void) {
    for (uint16_t i = 0, bcd; i <= 999; i++) {
        bcd = bin2bcd(i);
        printf("0x%03x \n", bcd);
    }
}

#define BASE 10
#define MAGIC 41    /* 41 = BASE*4096/1000 rounded up; 4096 = 3 nibbles; 1000 = numbers in the 0-999 range */
uint16_t bin2bcd (uint16_t bin) {
    uint16_t bcd;
    bin *= MAGIC;
    bcd = (bin & 0xF000) >> 4;      /* store 100s digit */
    bin &= 0x0FFF;                  /* mask only remainder nibbles */
    bin *= BASE;                    /* get 10s digit in most significant nibble */
    bcd |= (bin & 0xF000) >> 8;     /* store 10s digit */
    bin &= 0x0FFF;                  /* mask only remainder nibbles */
    bin *= BASE;                    /* get 1s digit in most significant nibble */
    bcd |= (bin & 0xF000) >> 12;    /* store 1s digit */
    return bcd;
}

You can paste this in an online compiler like https://www.onlinegdb.com/online... to see the output.

Last Edited: Thu. Apr 26, 2018 - 12:35 PM

I tried it your way as well but could never get it as fast; that was why I didn't post it. It's the 9 clk to deal with the high bytes that kills it, but if you want to look at it here is a direct way to solve it:

ldi		r18,41
mul		r18,r16
movw	        r24,r0
mul		r18,r17
add		r25,r0		;add high byte part of the product
mov		r20,r25
ldi		r18,10
andi	        r25,0x0f
mov		r26,r25
mul		r18,r24
movw	        r24,r0
mul		r26,r18
add		r25,r0		;add high byte part of the product
mov		r21,r25
andi	        r25,0x0f
mov		r26,r25
mul		r18,r24
movw	        r24,r0
mul		r26,r18
add		r25,r0		;add high byte part of the product
mov		r22,r25
swap	        r20
swap	        r21
swap	        r22
andi	        r20,0x0f
andi	        r21,0x0f
andi	        r22,0x0f

Last Edited: Fri. Apr 27, 2018 - 08:53 AM

Yeah, it's somewhat slower than your method. The advantage is that it executes in constant time, that's useful sometimes.

ok I looked at it again and if I use your way for the first digit and mine for the last two it will take max 26 clk ;)

;in r17:r16  out r22:r21:r20
ldi		r18,41
mul		r18,r16
mov		r22,r1
mul		r18,r17
add		r22,r0		;add low byte of high*41 to get the full high byte of number*41
swap	        r22
andi	        r22,0x0f
ldi		r18,100
movw	        r24,r16
mul		r18,r22
sub		r24,r0
ldi		r18,26
mul		r24,r18
movw	        r20,r0
sbrs	        r24,6
rjmp	        L0
subi            r20,20
sbci	        r21,0
L0:	ldi		r18,10
mul		r20,r18
mov		r20,r1

If the source is allowed to change, one clk is saved.

The code is also correct up to 1023 (it fails at 1100), with the high digit having the value 10, so it's easy to extend to 10 bits for the ADC.
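For anyone who wants to check the arithmetic of the 26 clk routine outside the simulator, here is a C model of it. It is a sketch of the math only, not of the register usage, and the names `digits3` and `split_by_multiply` are invented for the example. The hundreds digit comes from number*41 >> 12, the tens digit from remainder*26 >> 8 with the correction of 20 for remainders of 64 and above, and the ones digit from the leftover low byte times 10.

```c
#include <stdint.h>

/* Hypothetical helper: the three decimal digits of a value. */
typedef struct {
    uint8_t hundreds;
    uint8_t tens;
    uint8_t ones;
} digits3;

digits3 split_by_multiply(uint16_t n)  /* intended for n in 0..999 */
{
    digits3 d;

    /* n/100 approximated as n*41/4096; the product fits in 16 bits for this range. */
    d.hundreds = (uint8_t)(((uint32_t)n * 41u) >> 12);
    uint8_t rem = (uint8_t)(n - 100u * d.hundreds);      /* 0..99 */

    /* rem/10 approximated as rem*26/256; *26 runs slightly high, so correct from 64 up. */
    uint16_t scaled = (uint16_t)(rem * 26u);
    if (rem >= 64)
        scaled = (uint16_t)(scaled - 20u);
    d.tens = (uint8_t)(scaled >> 8);

    /* The low byte is the fractional part of rem/10; times 10 gives the ones digit. */
    d.ones = (uint8_t)(((scaled & 0xFFu) * 10u) >> 8);
    return d;
}
```

`split_by_multiply(432)` yields 4, 3, 2, and for 1023 the hundreds field holds 10 with tens 2 and ones 3, which matches the remark about the 10-bit ADC range.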

Last Edited: Fri. Apr 27, 2018 - 06:52 PM

Nice