OK, if speed really matters, then it looks like I can keep my word about an ASM version that uses 80 clk.
I just solved the biggest problem: dividing 0..9999 by 100 and calculating the remainder. That can be done in 28 clk (24 instructions). There are no jumps, so it is always 28 clk.
Splitting 0..99 into two digits takes 13 clk (11 instructions).
I still have to add some move instructions before I put it together, but now it's bedtime here.
So I think that even with the overhead of C's mul function, this approach can be the fastest in C as well.
Jens
Are there any C library functions for 8-bit * 8-bit = 16-bit and
8 * 16 = 24 in any of the compilers used here?
I remember that in the past I asked for an 8*8=16 for CodeVision, and 2 hours later he showed how to write the function.
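A minimal sketch of how the two widening multiplies asked about above are usually written in plain C, by casting one operand up before multiplying. The function names `mul8x8_16` and `mul8x16_24` are made up for this illustration and are not library functions of any of the compilers mentioned. Whether a given compiler turns them into bare hardware MUL instructions depends on its code generator and is not guaranteed.

```c
#include <stdint.h>

/* 8-bit * 8-bit -> 16-bit product (hypothetical helper name).
 * Casting one operand to uint16_t makes the multiply happen in 16 bits,
 * so the full product (max 255 * 255 = 65025) is kept. */
static inline uint16_t mul8x8_16(uint8_t a, uint8_t b)
{
    return (uint16_t)((uint16_t)a * b);
}

/* 8-bit * 16-bit -> 24-bit product (hypothetical helper name).
 * C has no 24-bit type, so the result sits in the low 24 bits of a
 * uint32_t (max 255 * 65535 = 16711425, which fits in 24 bits). */
static inline uint32_t mul8x16_24(uint8_t a, uint16_t b)
{
    return (uint32_t)a * b;
}
```

The 8*16 case is the one the divide-by-100 step needs (a 16-bit number times the constant 41); checking the generated listing is the only way to know how many clocks it really costs in a particular compiler.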
There is also "good enough". If you are going to agonize about 10 cycles in displaying the value, then you'd better dig through EVERY compiler primitive (e.g., mul & div routines; EEPROM read; EEPROM write; flash read, shift) and agonize about a possible cycle here and two there. And you had better know your compiler's code generation model intimately so that your C code translates into as-near-to-optimal sequences as you can get. [That actually is a good idea. But the way I might write the "best" function in CodeVision may not be the same as the "best" way in ImageCraft or GCC, as the code generation models are different.]
Meanwhile, your project never gets "done". Once you get your average current draw down below the typical battery-leakage value--the shelf life-- it doesn't matter anyway as you are going to get variations from battery to battery and brand to brand.
Now, if you are tuning the ISR for a high-speed operation such as encoder or industrial counter with "trip" comparison or pixel output or similar, then close examination and tweaking and cycle-shaving is certainly justified.
Many of my production apps have no C library includes at all. (Well, I almost always have CV's delay.h that I use for my startup delay, but that is trivial.) My "bigger" apps are Mega32/Mega64 class and might have a multi-line display and a menu system and different mode displays. Those might have a few memcpy/strcpy uses, and some might have sprintf(). Perhaps a couple from math.h here and there. But I suspect a lot less than many apps of other types.
Oh, yeah, back to your agonizing: When your app is "done", then you have to carefully evaluate, FOR THAT PARTICULAR APP, whether you are better off going fast as heck to go back to sleep earlier or plodding along very sedately towards your bed, conserving energy in the process. That answer isn't always obvious, and much will have to do with the peripheral subsystems enabled when awake.
Lee
You can put lipstick on a pig, but it is still a pig.
I've never met a pig I didn't like, as long as you have some salt and pepper.
We designers must also think about the environment. One less clock cycle and one less byte of flash mean we use slightly less wafer and battery. Just imagine how much of a nuclear power plant and how big a pile of garbage one byte and one clock cycle in every computer in the world would add up to.
Don't take this too seriously; remember that this started as a Christmas hobby project.
My favorites:
1. My oscilloscope, Yokogawa DLM2024.
2. My soldering iron, Weller WD2M, WMRP+WMRT.
3. JTAGICE3 debugger.
Posted by Kleinstein: Sat. Jan 10, 2009 - 05:22 PM
Code length is only important if you are short on flash and every byte counts. I would guess that on average thousands of bytes of flash go unused in the controllers.
A special case may be bootloaders, but there speed is of no importance.
OK, now I have a result for the uint16 to 5 BCD bytes conversion.
This is ASM code, but it can be put into a C function.
The code is optimized for worst-case speed.
It is not optimal, but at least it shows that a DIV done as a MUL by 1/X can be fast.
The code can run on any AVR with a HW multiplier.
Just by moving the possible error correction on the first digit down into the div-by-100 code, 2 clk could be saved, but I don't want to make it too messy.
Worst case the conversion takes 69 clk; best case 67.
The code solves the problem this way:
Find the first digit and subtract it out.
Then divide by 100 and calculate the remainder.
Take those 2 numbers and split each into 2 digits.
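The three steps above can be modeled in C to check the arithmetic on a PC before counting clocks on the AVR. This is only a sketch under assumptions: the helper names `div100`, `div10` and `u16_to_bcd5` are invented for this illustration, and the constants (6, 41, 51, 20 and the shifts) are taken from the comments in the assembly listing. It models what the registers should contain, not the instruction sequence or its timing.

```c
#include <stdint.h>

/* n / 100 for 0 <= n <= 9999, using the formula from the listing:
 * (n*41 - (n>>10)*41) >> 12. The subtracted term corrects the small
 * overshoot of 41/4096 compared with 1/100. */
static uint8_t div100(uint16_t n)
{
    uint32_t t = (uint32_t)n * 41u - (uint32_t)(n >> 10) * 41u;
    return (uint8_t)(t >> 12);
}

/* n / 10 for 0 <= n <= 99, using (n*51 + 20) >> 9 from the listing. */
static uint8_t div10(uint8_t n)
{
    return (uint8_t)(((uint16_t)n * 51u + 20u) >> 9);
}

/* digits[0] = ten-thousands ... digits[4] = ones */
static void u16_to_bcd5(uint16_t value, uint8_t digits[5])
{
    /* Step 1: estimate the first digit from the high byte; the estimate
     * can be 1 too small, so correct it once if needed. */
    uint8_t d4 = (uint8_t)(((uint16_t)(value >> 8) * 6u) >> 8);
    uint16_t rem = (uint16_t)(value - (uint16_t)(d4 * 10000u));
    if (rem >= 10000u) {
        d4++;
        rem = (uint16_t)(rem - 10000u);
    }

    /* Step 2: split 0..9999 into two numbers 0..99. */
    uint8_t hi = div100(rem);
    uint8_t lo = (uint8_t)(rem - hi * 100u);

    /* Step 3: split each 0..99 number into two digits. */
    uint8_t hi_tens = div10(hi);
    uint8_t lo_tens = div10(lo);

    digits[0] = d4;
    digits[1] = hi_tens;
    digits[2] = (uint8_t)(hi - hi_tens * 10u);
    digits[3] = lo_tens;
    digits[4] = (uint8_t)(lo - lo_tens * 10u);
}
```

Because the input range is only 16 bits, comparing every value from 0 to 65535 against ordinary `/` and `%` is a cheap way to confirm that none of the reciprocal constants is off by one anywhere.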
Here is the assembly code. It can be pasted into AVR Studio; I kept my test loop in.
It looks like the forum has a problem with tabs, so the indentation may be off.
.include "m88def.inc"
;
;********************************
; unsigned 16 bit to 5 byte BCD
; Author Jens Norgaard-Larsen
;********************************
; code size without loop
; 58 *2 = 116 byte
; speed without loop update
; best case : 67 clk
; worst case : 69 clk
// just a part of the test code
ldi r16,low(10000)
ldi r17,high(10000)
movw r10,r16
// in: 16 bit UINT in R17:R16
// out: R24:R23:R22:R21:R20
// clobbers R16,R17,R18,R19,R0,R1
// R10,R11 are changed only by the test loop
top:
// find the first digit: multiply the high byte by 6; the digit is
// then the high byte of the product (it can be 1 too small)
ldi r18,6
mul r17,r18
mov r24,r1 ;r24 now calculated
;find the remainder by subtracting 10000*result from R17:R16
ldi r18,low(10000)
mul r24,r18
movw r20,r0
ldi r18,high(10000)
mul r24,r18
add r21,r0
sub r16,r20
sbc r17,r21
; if remainder >= 10000 then increment the result and
; subtract 10000 from R17:R16
cpi r16,low(10000)
cpc r17,r18
brmi L000
inc r24
subi r16,low(10000)
sbci r17,high(10000)
L000:
; DIV by 100, result in R22
; remainder in R20
; function used: result = (N*41 - (N>>10)*41)>>12, with N = R17:R16
; MUL by 41
ldi r18,41
ldi r22,0
mul r16,r18
movw r20,r0
mul r17,r18
add r21,r0
adc r22,r1
;>>10 is the same as highbyte >>2
lsr r17
lsr r17
;multiply by 41 and subtract from the 24-bit product
mul r17,r18
sub r20,r0
sbc r21,r1
sbci r22,0
;>>12 of the 24-bit value is the same as <<4 and keeping the top
; byte; we know that the result is only 1 byte
swap r22
swap r21
eor r22,r21
andi r22,240
eor r22,r21
;find the remainder as number - result*100 (low byte is enough)
ldi r20,100
mul r20,r22
movw r20,r16
sub r20,r0
;split the value in R22 into 2 digits.
;formula: y = (number*51+20)>>9
ldi r16,51
mul r22,r16
movw r18,r0
subi r18,low(-20)
sbci r19,high(-20)
lsr r19
;calc the remainder
ldi r17,10
mul r19,r17
sub r22,r0
mov r21,r19
;this is a repeat on the other 2 digits
mul r20,r16
movw r18,r0
subi r18,low(-20)
sbci r19,high(-20)
lsr r19
mul r19,r17
sub r20,r0
;these are just moves so the digits end up in order
mov r23,r21
mov r21,r19
; this is just a part of my test code
nop
movw r16,r10
subi r16,1
sbci r17,0
movw r10,r16
rjmp top
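The least obvious part of the listing is probably the swap/eor/andi/eor sequence that performs the >>12. A small C model may make it easier to follow; the function names `swap_nibbles` and `shift_right_12` are invented for this sketch. The idea is that `((a ^ b) & mask) ^ b` takes the bits selected by `mask` from `a` and all other bits from `b`, so after swapping the nibbles of both bytes it glues the low nibble of the top byte onto the high nibble of the middle byte.

```c
#include <stdint.h>

/* Swap the two nibbles of a byte, as the AVR SWAP instruction does. */
static uint8_t swap_nibbles(uint8_t x)
{
    return (uint8_t)((x << 4) | (x >> 4));
}

/* Model of:  swap r22 / swap r21 / eor r22,r21 / andi r22,240 / eor r22,r21
 * top = R22 (bits 16..23), mid = R21 (bits 8..15) of the 24-bit value.
 * Returns bits 12..19, i.e. the 24-bit value shifted right by 12,
 * assuming the result fits in one byte. */
static uint8_t shift_right_12(uint8_t top, uint8_t mid)
{
    uint8_t a = swap_nibbles(top);  /* low nibble of top is now in bits 4..7  */
    uint8_t b = swap_nibbles(mid);  /* high nibble of mid is now in bits 0..3 */
    a ^= b;                         /* eor  r22,r21 */
    a &= 0xF0;                      /* andi r22,240 */
    a ^= b;                         /* eor  r22,r21: high nibble from a, low from b */
    return a;
}
```

For example, with the 24-bit value 0x0123xx (top = 0x01, mid = 0x23) the result is 0x12, which is the value shifted right by 12 bits. The low byte never takes part, which is why the listing does not touch R20 here.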