57 posts / 0 new

## Pages

Ok if speed realy matter then I can tell that it looks like I can keep my word about a ASM version that use 80 clk.
I Just solved the biggest problem , div 0..9999 with 100, and calc the reminder, that can be done in 28 clk (24 instructions).(no jumps so it's allways 28 clk)

Split the 0..99 up to two digits take 13 clk (11 instructions)

But I have to add some move instructions before I put it together. But now it's bed time here.

So I think that even with the overhead in C's mul function this way can be fastest in C aswell.

Jens
Are there any C lib functions for 8bit*8bit=16bit and
8*16=24 in any of the used compilers.
I remember that in the past I asked for a 8*8=16 for the codevision and 2 hours later he showed how to write the function.

```	RSEG CODE:CODE:NOROOT(1)
PUBLIC _i2a
//   13 __z void i2a(char *s, UINT16 v)
//R16:R17 - v
//R30:R31 - *s
//
_i2a:
//   14 {
//   15   UINT8 m0; //R16
//   16   UINT8 m1; //R17
//R18-R20 - 24bit fmul result
//R21 - c,b,a ->06 8D B9
//R22 - zero reg
CLR	R22
LDI	R21,0x06
//  v=__multiply_unsigned(m0,0x06)+3;
MUL	R16,R21
MOVW	R19:R18,R1:R0
SUBI	R18,0xFD
SBCI	R19,0xFF
//  v+=__multiply_unsigned(m1,0x06)<<8;
MUL	R17,R21
MOV	R20,R1
//  v+=__multiply_unsigned(m1,0x8D);
LDI     R21, 0x8D
MUL     R17, R21
//  v+=__multiply_unsigned(m0,0x8D)>>8;
MUL     R16, R21
//  v+=__multiply_unsigned(m1,0xB9)>>8;
LDI	R16,0x10		; Counter & flags
LDI	R21,0xB9
MUL     R17, R21
LDI     R21, 10			; Next multiplicand
BREQ	??i2a_0
SUBI	R20,208
ST	Z+,R20
INC	R16
??i2a_0:
//   39     UINT16 hv;
//   40     UINT8 bv;
//   41     bv=v>>8;
MOV     R17, R19
//   42     v=__multiply_unsigned(v,10);
MUL     R18, R21
MOVW    R19:R18, R1:R0
//   43     hv=__multiply_unsigned(bv,10);
MUL     R17, R21
//   44     v+=(hv&0xFF)<<8;
//   45     if (SREG_Bit0) hv+=0x100;
//   46     bv=hv>>8;
MOV     R17, R1
//   47     if ((i|bv)&0x8F)
MOV     R20, R1
OR      R20, R16
ANDI    R20, 0x8F
BREQ    ??i2a_1
//   48     {
//   49       *s++=bv+'0';
SUBI	R17,208
ST	Z+,R17
//   50       i|=1;
//        ORI     R18, 0x01
??i2a_1:
//   51     }
//   52     i<<=1;
ROL	R16
//   54   while(!SREG_Bit0);
BRBC    0, ??i2a_0
//   55   *s=0;
ST      Z, R22
//   56 }
RET
```

Quote:

Lee, speed is absolutely important.

There is also "good enough". If you are going to agonize about 10 cycles in displaying the value, then you'd better dig through EVERY compiler primitive (e.g., mul & div routines; EEPROM read; EEPROM write; flash read, shift) and agonize about a possible cycle here and two there. And you had better know your compiler's code generation model intimately so that your C code translates int as-near-to-optimal sequences as you can get. [That actually is a good idea. But the way I might write the "best" function in Codevision may not be the same as the "best" way in ImageCraft or GCC as the code generation models are different.]

Meanwhile, your project never gets "done". Once you get your average current draw down below the typical battery-leakage value--the shelf life-- it doesn't matter anyway as you are going to get variations from battery to battery and brand to brand.

Now, if you are tuning the ISR for a high-speed operation such as encoder or industrial counter with "trip" comparison or pixel output or similar, then close examination and tweaking and cycle-shaving is certainly justified.

Many of my production apps have no C library includes at all. (Well, I almost have CV's delay.h that I use for my startup delay but that is trivial.) My "bigger" apps are Mega32/Mega64 class and might have a multi-line display and a menu system and different mode displays. those might have a few memcpy/strcpy uses, and some might have sprintf(). PErhaps a couple from math.h here-and-there. But I suspect a lot less than many apps that are of different types.

Oh, yeah, back to your agonizing: When you app is "done", then you have to carefully evaluate, FOR THAT PARTICULAR APP, whether you are better off going fast as heck to go back to sleep earlier or to plod along very sedately towards your bed conserving energy in the process. That answer isn't always obvious, and much will have to do with peripheral subsystems enabled when awake.

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

We designer must also think about the evironment. One less clock cycle and one byte less flash means we will use slightly less wafer and battery. Just image how much of a Nuclear Power Plant and size of garbage one byte and one clock cycle in every computer in the world would mean?

Don't take this to seriously, remember that this started as a Christmas hobby project.

My favorites:
1. My oscilloscope, Yokogawa DLM2024.
2. My soldering iron, Weller WD2M, WMRP+WMRT.
3. JTAGICE3 debugger.

Code length is only important if you are short on flash and every byte count. I would guess that on evarage 1000s of bytes of flash are unused with the controllers.

A special case may be bootloaders, but there speed is of no importance.

OK now I have a result of the uint16 to 5 BCD bytes.
This is ASM code, but can be put into a C function.
This code is optimized for worst case speed.
The code is not optimal but at least it show that the DIV by MUL by 1/X can be done fast.
The code can run on any AVR with a HW multiplier.
Just by moving the possible error correction on first digit down to the div by 100 code there can be saved 2 clk, but I don’t want to make it to messy.

Worst case the conversion take 69 CLK. Best case 67

The code is solving the problem this way.

Find the first digit.

Then div with 100, and calc the reminder

Take those 2 numbers and split them up to 2 digits.

This is the code. It can be pasted into AVR studio, I kept my test loop
It looks like there is a problem with the use of tabs.

```
.include "m88def.inc"
;
;********************************
; unsigned 16 bit to 5 byte BCD
; Author   Jens Norgaard-Larsen
;********************************
; code size without loop
; 58 *2 = 116 byte
; speed without loop update
; best case : 67 clk
; worst case : 69 clk
// just a part of the test code
ldi		r16,low(10000)
ldi		r17,high(10000)
movw	r10,r16
// in:	16 bit UINT in R17:R16
// out: R24:R23:R22:R21:R20
// change R16,R17,R18,R19,R0,R1
// change R10,R11 only in test loop
top:
// find first digit by mul high byte with 6 this give
// the result in the high byte (can be 1 to small))
ldi		r18,6
mul		r17,r18
mov		r24,r1 ;r24 now calcluated
;find reminder by sub 10000*result from R17:R16
ldi		r18,low(10000)
mul		r24,r18
movw		r20,r0
ldi		r18,high(10000)
mul		r24,r18
sub		r16,r20
sbc		r17,r21
; if reminder >=10000 then inc result and
; sub 10000 from R17:R16
cpi		r16,low(10000)
cpc		r17,r18
brmi		L000
inc		r24
subi		r16,low(10000)
sbci		r17,high(10000)
L000:
; DIV with 100 result in R22
; reminder in R20
; function used is result = (R17:R16*41-R17:R16>>10*41)>>12
; MUL by 41
ldi		r18,41
ldi		r22,0
mul		r16,r18
movw		r20,r0
mul		r17,r18
;>>10 is the same as highbyte >>2
lsr		r17
lsr		r17
;do mul and sub result
mul		r17,r18
sub		r20,r0
sbc		r21,r1
sbci		r22,0
;>>12 is the same as <<4 we know that result
; only is 1 byte
swap		r22
swap		r21
eor		r22,r21
andi		r22,240
eor		r22,r21
;find reminder by number-result*100
ldi		r20,100
mul		r20,r22
movw		r20,r16
sub		r20,r0

;split the value in R22 into 2 digits.
;formular y=(number*51+20)>>9
ldi		r16,51
mul		r22,r16
movw		r18,r0
subi		r18,low(-20)
sbci		r19,high(-20)
lsr		r19
;calc the reminder
ldi		r17,10
mul		r19,r17
sub		r22,r0
mov		r21,r19

;this is a repeat on the other 2 digits
mul		r20,r16
movw		r18,r0
subi		r18,low(-20)
sbci		r19,high(-20)
lsr		r19
mul		r19,r17
sub		r20,r0
;this is just move so everything stays in order
mov		r23,r21
mov		r21,r19

; this is just a part of my test code
nop
movw		r16,r10
subi		r16,1
sbci		r17,0
movw		r10,r16
rjmp		top```