Smallest and fastest binary to bcd conversion

Go To Last Post
57 posts / 0 new


  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Ok if speed realy matter then I can tell that it looks like I can keep my word about a ASM version that use 80 clk.
I Just solved the biggest problem , div 0..9999 with 100, and calc the reminder, that can be done in 28 clk (24 instructions).(no jumps so it's allways 28 clk)

Split the 0..99 up to two digits take 13 clk (11 instructions)

But I have to add some move instructions before I put it together. But now it's bed time here.

So I think that even with the overhead in C's mul function this way can be fastest in C aswell.

Are there any C lib functions for 8bit*8bit=16bit and
8*16=24 in any of the used compilers.
I remember that in the past I asked for a 8*8=16 for the codevision and 2 hours later he showed how to write the function.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

What about this code?

	PUBLIC _i2a
//   13 __z void i2a(char *s, UINT16 v)
//R16:R17 - v
//R30:R31 - *s
//   14 {
//   15   UINT8 m0; //R16
//   16   UINT8 m1; //R17
//R18-R20 - 24bit fmul result
//R21 - c,b,a ->06 8D B9
//R22 - zero reg
	CLR	R22
	LDI	R21,0x06
//  v=__multiply_unsigned(m0,0x06)+3;
	MUL	R16,R21
	MOVW	R19:R18,R1:R0
	SUBI	R18,0xFD
	SBCI	R19,0xFF
//  v+=__multiply_unsigned(m1,0x06)<<8;
	MUL	R17,R21
	MOV	R20,R1
	ADD	R19,R0
	ADC	R20,R22
//  v+=__multiply_unsigned(m1,0x8D);
        LDI     R21, 0x8D
        MUL     R17, R21
        ADD     R18, R0
        ADC     R19, R1
	ADC	R20, R22
//  v+=__multiply_unsigned(m0,0x8D)>>8;
        MUL     R16, R21
        ADD     R18, R1
        ADC     R19, R22
	ADC	R20, R22
//  v+=__multiply_unsigned(m1,0xB9)>>8;
	LDI	R16,0x10		; Counter & flags
	LDI	R21,0xB9
        MUL     R17, R21
        LDI     R21, 10			; Next multiplicand
        ADD     R18, R1
        ADC     R19, R22
	ADC	R20, R22
	BREQ	??i2a_0
	SUBI	R20,208
	ST	Z+,R20
	INC	R16
//   39     UINT16 hv;
//   40     UINT8 bv;
//   41     bv=v>>8;
        MOV     R17, R19
//   42     v=__multiply_unsigned(v,10);
        MUL     R18, R21
        MOVW    R19:R18, R1:R0
//   43     hv=__multiply_unsigned(bv,10);
        MUL     R17, R21
//   44     v+=(hv&0xFF)<<8;
        ADD     R19, R0
//   45     if (SREG_Bit0) hv+=0x100;
	ADC	R1, R22
//   46     bv=hv>>8;
        MOV     R17, R1
//   47     if ((i|bv)&0x8F)
        MOV     R20, R1
        OR      R20, R16
        ANDI    R20, 0x8F
        BREQ    ??i2a_1
//   48     {
//   49       *s++=bv+'0';
	SUBI	R17,208
	ST	Z+,R17
//   50       i|=1;
//        ORI     R18, 0x01
//   51     }
//   52     i<<=1;
	ROL	R16
//   54   while(!SREG_Bit0);
        BRBC    0, ??i2a_0
//   55   *s=0;
        ST      Z, R22
//   56 }
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0


Lee, speed is absolutely important.

There is also "good enough". If you are going to agonize about 10 cycles in displaying the value, then you'd better dig through EVERY compiler primitive (e.g., mul & div routines; EEPROM read; EEPROM write; flash read, shift) and agonize about a possible cycle here and two there. And you had better know your compiler's code generation model intimately so that your C code translates int as-near-to-optimal sequences as you can get. [That actually is a good idea. But the way I might write the "best" function in Codevision may not be the same as the "best" way in ImageCraft or GCC as the code generation models are different.]

Meanwhile, your project never gets "done". Once you get your average current draw down below the typical battery-leakage value--the shelf life-- it doesn't matter anyway as you are going to get variations from battery to battery and brand to brand.

Now, if you are tuning the ISR for a high-speed operation such as encoder or industrial counter with "trip" comparison or pixel output or similar, then close examination and tweaking and cycle-shaving is certainly justified.

Many of my production apps have no C library includes at all. (Well, I almost have CV's delay.h that I use for my startup delay but that is trivial.) My "bigger" apps are Mega32/Mega64 class and might have a multi-line display and a menu system and different mode displays. those might have a few memcpy/strcpy uses, and some might have sprintf(). PErhaps a couple from math.h here-and-there. But I suspect a lot less than many apps that are of different types.

Oh, yeah, back to your agonizing: When you app is "done", then you have to carefully evaluate, FOR THAT PARTICULAR APP, whether you are better off going fast as heck to go back to sleep earlier or to plod along very sedately towards your bed conserving energy in the process. That answer isn't always obvious, and much will have to do with peripheral subsystems enabled when awake.


You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

We designer must also think about the evironment. One less clock cycle and one byte less flash means we will use slightly less wafer and battery. Just image how much of a Nuclear Power Plant and size of garbage one byte and one clock cycle in every computer in the world would mean?

Don't take this to seriously, remember that this started as a Christmas hobby project.

My favorites:
1. My oscilloscope, Yokogawa DLM2024.
2. My soldering iron, Weller WD2M, WMRP+WMRT.
3. JTAGICE3 debugger.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Code length is only important if you are short on flash and every byte count. I would guess that on evarage 1000s of bytes of flash are unused with the controllers.

A special case may be bootloaders, but there speed is of no importance.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

OK now I have a result of the uint16 to 5 BCD bytes.
This is ASM code, but can be put into a C function.
This code is optimized for worst case speed.
The code is not optimal but at least it show that the DIV by MUL by 1/X can be done fast.
The code can run on any AVR with a HW multiplier.
Just by moving the possible error correction on first digit down to the div by 100 code there can be saved 2 clk, but I don’t want to make it to messy.

Worst case the conversion take 69 CLK. Best case 67

The code is solving the problem this way.

Find the first digit.

Then div with 100, and calc the reminder

Take those 2 numbers and split them up to 2 digits.

This is the code. It can be pasted into AVR studio, I kept my test loop
It looks like there is a problem with the use of tabs.

.include ""
; unsigned 16 bit to 5 byte BCD 
; Author   Jens Norgaard-Larsen
; code size without loop
; 58 *2 = 116 byte 
; speed without loop update
; best case : 67 clk
; worst case : 69 clk
// just a part of the test code
	ldi		r16,low(10000)
	ldi		r17,high(10000)
	movw	r10,r16
// in:	16 bit UINT in R17:R16
// out: R24:R23:R22:R21:R20
// change R16,R17,R18,R19,R0,R1
// change R10,R11 only in test loop
// find first digit by mul high byte with 6 this give
// the result in the high byte (can be 1 to small))
	ldi		r18,6
	mul		r17,r18
	mov		r24,r1 ;r24 now calcluated
;find reminder by sub 10000*result from R17:R16
	ldi		r18,low(10000)
	mul		r24,r18 
	movw		r20,r0
	ldi		r18,high(10000)
	mul		r24,r18
	add		r21,r0
	sub		r16,r20
	sbc		r17,r21
; if reminder >=10000 then inc result and 
; sub 10000 from R17:R16
	cpi		r16,low(10000)
	cpc		r17,r18
	brmi		L000
	inc		r24
	subi		r16,low(10000)
	sbci		r17,high(10000)
; DIV with 100 result in R22
; reminder in R20
; function used is result = (R17:R16*41-R17:R16>>10*41)>>12
; MUL by 41
	ldi		r18,41
	ldi		r22,0
	mul		r16,r18
	movw		r20,r0
	mul		r17,r18
	add		r21,r0
	adc		r22,r1
;>>10 is the same as highbyte >>2
	lsr		r17
	lsr		r17
;do mul and sub result
	mul		r17,r18
	sub		r20,r0
	sbc		r21,r1
	sbci		r22,0
;>>12 is the same as <<4 we know that result
; only is 1 byte
	swap		r22
	swap		r21
	eor		r22,r21
	andi		r22,240
	eor		r22,r21
;find reminder by number-result*100 
	ldi		r20,100
	mul		r20,r22
	movw		r20,r16
	sub		r20,r0

;split the value in R22 into 2 digits.
;formular y=(number*51+20)>>9
	ldi		r16,51
	mul		r22,r16
	movw		r18,r0
	subi		r18,low(-20)
	sbci		r19,high(-20)
	lsr		r19
;calc the reminder
	ldi		r17,10
	mul		r19,r17
	sub		r22,r0
	mov		r21,r19

;this is a repeat on the other 2 digits
	mul		r20,r16
	movw		r18,r0
	subi		r18,low(-20)
	sbci		r19,high(-20)
	lsr		r19
	mul		r19,r17
	sub		r20,r0
;this is just move so everything stays in order
	mov		r23,r21
	mov		r21,r19

; this is just a part of my test code
	movw		r16,r10
	subi		r16,1
	sbci		r17,0
	movw		r10,r16
	rjmp		top