## Binary-to-Decimal Conversion--a reference for all

There have been many, many threads on the Forum on binary-to-decimal conversion, usually for purposes of creating ASCII character representations of numbers for display.

Some of the common themes include the surprise at the size of programs containing printf(); complaints about size and/or speed of alternatives such as itoa(); and just lack of basic knowledge on the subject.

With different types of displays, and differing display requirements (sometimes I want to inject an implied decimal point; sometimes I need utoa() for full 16 bits instead of 15; etc.) I'm always searching for the "ultimate method".

Well, I came across a reference that is detailed enough for the purists and also straightforward enough for the beginners:
http://www.cs.uiowa.edu/~jones/b...

Quote:

Binary to Decimal Conversion in Limited Precision
Part of the Arithmetic Tutorial Collection
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Copyright © 1999, Douglas W. Jones, with major revisions made in 2002. This work may be transmitted or stored in electronic form on any computer attached to the Internet or World Wide Web so long as this notice is included in the copy. Individuals may make single copies for their own use. All other rights are reserved.

I >>knew<< there had to be better methods than the straightforward subtraction of powers of 10, and faster methods than the elegant recursive solution (which HAS to be the smallest flash consumer). This article will give all of us something to think about. I'm looking forward to exploring more items in the "collection" now that I've found it.
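
For readers who have not seen the recursive approach, here is a minimal C sketch of the idea. It is only an illustration, not code from the article; the name `put_dec_recursive` and the caller-supplied buffer are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the decimal digits of n into buf, most
 * significant digit first, and returns the number of characters written.
 * buf must hold at least 5 characters; no terminating NUL is added. */
size_t put_dec_recursive(uint16_t n, char *buf)
{
    size_t len = 0;

    if (n >= 10)
        len = put_dec_recursive(n / 10, buf);  /* higher digits first */
    buf[len] = (char)('0' + n % 10);           /* then this digit */
    return len + 1;
}
```

Each level costs one division and one modulo by 10, which is where the time goes on a part without a hardware divide instruction.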

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Quote:
...exploring more items in the "collection"...

Just on the off-chance to save someone *some* work while further exploring this collection - my "DIV16_XX" library (#131 in the academy) implements division by constants based on Mr. Jones' nice examples (though I've to admit that it is implemented in assembly - I just couldn't resist "36 cycles for a div. by 23" ...).

:wink:

Andreas

Interesting article! I'm working on the asm version :twisted:

```
ldi r16, (1<<MB_BEER)
out SREG, r16
```

see you in an hour.

Christoph

I tend to post off-topic replies when I've noticed some interesting detail.
Feel free to stop me.

Great! 8)

Here is my code to convert a 1-4 byte binary value to packed BCD (conv_bin2bcd) and packed BCD back to binary (conv_bcd2bin).
The first argument is the input data, the second is the number of output bytes (size=1 for values 0-99, size=2 for 0-9999, size=3 for 0-999999, size=4 for 0-99999999).

```
unsigned long conv_bin2bcd(unsigned long data,unsigned char size)
{register unsigned long result asm("r16");
asm ("mov __tmp_reg__,%A2 \n"
"conv_bin2bcd00: \n"
"mov r2,%A1 \n"
"mov %A1,%B1 \n"
"mov %B1,%C1 \n"
"mov %C1,%D1 \n"
"mov %D1,r2 \n"
"dec __tmp_reg__ \n"
"brne conv_bin2bcd00 \n"

"eor %A0,%A0 \n"  /*clear result*/
"eor %B0,%B0 \n"
"eor %C0,%C0 \n"
"eor %D0,%D0 \n"
"mov __tmp_reg__,%A2 \n"
"lsl __tmp_reg__\nlsl __tmp_reg__\nlsl __tmp_reg__\n"  /*__tmp_reg__=size*8*/

"conv_bin2bcd01: \n"           /*shift loop*/
"sbrs %A0, 3 \n"            /*if carry to bit 3,*/
"subi %A0, 3 \n"            /*subtract 3*/
"sbrs %A0, 7 \n"            /*if carry to bit 7,*/
"subi %A0, 0x30\n"          /*subtract 0x30*/
"sbrs %B0, 3 \n"            /*if carry to bit 3,*/
"subi %B0, 3 \n"            /*subtract 3*/
"sbrs %B0, 7 \n"            /*if carry to bit 7,*/
"subi %B0, 0x30\n"          /*subtract 0x30*/
"sbrs %C0, 3 \n"            /*if carry to bit 3,*/
"subi %C0, 3 \n"            /*subtract 3*/
"sbrs %C0, 7 \n"            /*if carry to bit 7,*/
"subi %C0, 0x30\n"          /*subtract 0x30*/
"sbrs %D0, 3 \n"            /*if carry to bit 3,*/
"subi %D0, 3 \n"            /*subtract 3*/
"sbrs %D0, 7 \n"            /*if carry to bit 7,*/
"subi %D0, 0x30\n"          /*subtract 0x30*/
"lsl %A0\nrol %B0\nrol %C0\nrol %D0\n" /*shift out buffer*/

"sbrc %D1, 7 \n"            /*skip if msbit of input =0*/
"sbr %A0,1 \n"
"lsl %A1\nrol %B1\nrol %C1\nrol %D1\n" /*shift in buffer*/

"dec __tmp_reg__ \n"        /*repeat for all bits*/
"brne conv_bin2bcd01 \n"

: "=r" (result) :"r" (data), "r" (size) : "r2"
);
return(result);
}
```
```
unsigned long conv_bcd2bin(unsigned long data,unsigned char size)
{register unsigned long result asm("r16");
asm ("eor %A0,%A0 \n"  /*clear result*/
"eor %B0,%B0 \n"
"eor %C0,%C0 \n"
"eor %D0,%D0 \n"
"mov __tmp_reg__,%A2 \n"
"lsl __tmp_reg__\nlsl __tmp_reg__\nlsl __tmp_reg__\n"  /*__tmp_reg__=size*8*/

"conv_bcd2bin00: \n"          /*shift loop*/

"lsr %D0\nror %C0\nror %B0\nror %A0\n" /*shift out buffer*/

"sbrc %A1,0 \n"
"sbr %D0,0x80 \n"

"lsr %D1\nror %C1\nror %B1\nror %A1\n"

"sbrc %D1, 7 \n"            /*if carry to bit 7,*/
"subi %D1, 0x30 \n"         /*subtract 0x30*/
"sbrc %D1, 3 \n"            /*if carry to bit 3,*/
"subi %D1, 3\n"             /*subtract 3*/
"sbrc %C1, 7 \n"            /*if carry to bit 7,*/
"subi %C1, 0x30 \n"         /*subtract 0x30*/
"sbrc %C1, 3 \n"            /*if carry to bit 3,*/
"subi %C1, 3\n"             /*subtract 0x30*/
"sbrc %B1, 7 \n"            /*if carry to bit 7,*/
"subi %B1, 0x30 \n"         /*subtract 0x30*/
"sbrc %B1, 3 \n"            /*if carry to bit 3,*/
"subi %B1, 3\n"             /*subtract 3*/
"sbrc %A1, 7 \n"            /*if carry to bit 7,*/
"subi %A1, 0x30 \n"         /*subtract 0x30*/
"sbrc %A1, 3 \n"            /*if carry to bit 3,*/
"subi %A1, 3\n"             /*subtract 3*/

"dec __tmp_reg__ \n"        /*repeat for all bits*/
"brne conv_bcd2bin00 \n"

"conv_bcd2bin01: \n"
"mov __tmp_reg__,%D0 \n"
"mov %D0,%C0 \n"
"mov %C0,%B0 \n"
"mov %B0,%A0 \n"
"mov %A0,__tmp_reg__ \n"
"dec %A2 \n"
"brne conv_bcd2bin01 \n"

: "=r" (result) :"r" (data), "r" (size) : "r2"
);
return(result);
}
```
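
The shift-and-correct technique used by both routines (often called "double dabble") may be easier to follow in plain C. The sketch below is a host-side illustration under my own assumptions, not a translation of the inline assembly: the function names are hypothetical and it always processes all 32 bits instead of taking a `size` argument.

```c
#include <stdint.h>

/* Binary -> packed BCD. Valid for inputs up to 99,999,999
 * (eight BCD digits fit in 32 bits). */
uint32_t bin_to_packed_bcd(uint32_t bin)
{
    uint32_t bcd = 0;

    for (int bit = 0; bit < 32; bit++) {
        /* Before each shift, add 3 to every nibble that is 5 or more,
         * so that doubling it carries into the next decimal digit. */
        for (int shift = 0; shift < 32; shift += 4) {
            if (((bcd >> shift) & 0xF) >= 5)
                bcd += (uint32_t)3 << shift;
        }
        bcd = (bcd << 1) | (bin >> 31);  /* shift the next input bit in */
        bin <<= 1;
    }
    return bcd;
}

/* Packed BCD -> binary: the same steps run backwards. */
uint32_t packed_bcd_to_bin(uint32_t bcd)
{
    uint32_t bin = 0;

    for (int bit = 0; bit < 32; bit++) {
        bin = (bin >> 1) | ((bcd & 1u) << 31);  /* lowest BCD bit -> result */
        bcd >>= 1;
        /* After each shift, subtract 3 from every nibble that is 8 or more. */
        for (int shift = 0; shift < 32; shift += 4) {
            if (((bcd >> shift) & 0xF) >= 8)
                bcd -= (uint32_t)3 << shift;
        }
    }
    return bin;
}
```

For example, `bin_to_packed_bcd(1234)` yields `0x1234`, and `packed_bcd_to_bin(0x1234)` yields 1234 again.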

I've started coding it in asm and ran into some weird problem:

```
putdec:
mov	d1, number_L	;d1 = (n>>4) & 0xF
swap	d1
andi	d1, 0x0F

mov	d2, number_H	;d2 = (n>>8) & 0xF
andi	d2, 0x0F

mov	d3, number_H	;d3 = (n>>12) & 0xF
swap	d3
andi	d3, 0x0F

mov	d0, d1			;d0 = 6*(d3 + d2 + d1) + (n & 0xF)
ldi	r16, 6
mul	d0, r16
mov	d0, r0
mov	r16, number_L
andi	r16, 0x0F

ldi	r16, 0x9A	;q = (d0 * 0x19A) >> 12
mul	d0, r16
swap	r1
mov	q, r1
andi	q, 0x0F

ldi	r16, 10		;d0 = d0 - 10 * q
mul	q, r16
sub	d0, r0

ret
```

I tested it with number_H:L = 0xFFFF and d0 is now 0x09. But as 0xFFFF = 65535 I expected d0 to be 0x05 now :shock:
All registers are in the high block (d0..d4, q, Number_L:H)
I've checked the multiply d0 * 0x19A but it gives the same results in the sim as in the calculator.

Christoph

I admit this one is a little tricky.

`I don't have the intelligence or experience to figure out how this code works `

yes of course you do :D
Assuming A is a 16 bit number then A = AL + 256 * AH.
A * B = (AH*256 + AL)*(BH*256 + BL) = AH*BH*256*256 + AH*256*BL + AL*BH*256 + AL*BL.
In this case: A = d0 and B = 0x19A, so AH = 0 and BH = 1 which will make our above term look like this:
A*B = AL*0x01*256 + AL * 0x9A
The second part of the sum is done first: d0 * 0x9A.
Then d0 is added to the high register of the result. That's the first part of the sum and should be the same as the original code. The higher order result regs are not taken care of as

`(d0 * 0x19A) >> 12 `

only uses bits 15...12.
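
The decomposition can be checked on a PC with a few lines of C. This is a sketch with hypothetical function names, assuming d0 fits in 8 bits as in the derivation above. It also shows that the full product needs 17 bits once d0 reaches 160 (160 * 0x19A = 65600), so keeping only 16 bits loses the top bit of the quotient.

```c
#include <stdint.h>

/* q = (d0 * 0x19A) >> 12, built from a single 8x8 multiply. */
uint8_t div10_split(uint8_t d0)
{
    uint16_t low_part = (uint16_t)d0 * 0x9Au;       /* AL * BL, what MUL leaves in r1:r0 */
    uint32_t product  = (uint32_t)low_part
                      + ((uint32_t)d0 << 8);        /* + AL * BH * 256, with BH = 1 */

    return (uint8_t)(product >> 12);
}

/* Same sum, but truncated to 16 bits: bit 16 of the product is dropped. */
uint8_t div10_split_16bit_only(uint8_t d0)
{
    uint16_t product = (uint16_t)((uint16_t)d0 * 0x9Au + ((uint16_t)d0 << 8));

    return (uint8_t)(product >> 12);
}
```

`div10_split()` agrees with `d0 / 10` for every 8-bit d0, while the truncated version goes wrong from d0 = 160 upwards.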

Christoph

I think I found it. With n = 0xFFFF (and similarly high numbers) q is greater than 0xF, and my code doesn't take care of the carry, which it seems is important in this case.

I tend to post off-topic replies when I've noticed some interesting detail.
Feel free to stop me.

SteveN wrote:
Of course the C code presented is pure ______ (any foreign language to an English speaking American inserted here...don't want to offend anyone 8) ) to me.

I think the word "gibberish" is sufficiently non-locale-specific to be inoffensive and can be used in this context. Unless of course you live in Gibber. :)

Interesting approach, but I fear the needed divisions and multiplications decrease the speed dramatically.

In my view the subtraction method should be the fastest way on the AVRs.
Here is my optimized version. It swings around zero, so subtraction and comparison are done simultaneously:

```
;*************************************************************************
;*                                                                       *
;*                      Convert unsigned 32bit to ASCII                  *
;*                                                                       *
;*              Author: Peter Dannegger                                  *
;*                      danni@specs.de                                   *
;*                                                                       *
;*************************************************************************
;
;input: R31, R30, R29, R28 = 32 bit value 0 ... 4294967295
;output: R25, R24, R23, R22, R21, R20, R19, R18, R17, R16 = 10 digits (ASCII)
;
bin32_ascii:
ldi     r25, -1 + '0'
_bcd1:  inc     r25
subi    r29, byte2(1000000000)  ;-1000,000,000 until overflow
sbci    r30, byte3(1000000000)
sbci    r31, byte4(1000000000)
brcc    _bcd1

ldi     r24, 10 + '0'
_bcd2:  dec     r24
subi    r29, byte2(-100000000)  ;+100,000,000 until no overflow
sbci    r30, byte3(-100000000)
sbci    r31, byte4(-100000000)
brcs    _bcd2

ldi     r23, -1 + '0'
_bcd3:  inc     r23
subi    r28, byte1(10000000)    ;-10,000,000
sbci    r29, byte2(10000000)
sbci    r30, byte3(10000000)
sbci    r31, 0
brcc    _bcd3

ldi     r22, 10 + '0'
_bcd4:  dec     r22
subi    r28, byte1(-1000000)    ;+1,000,000
sbci    r29, byte2(-1000000)
sbci    r30, byte3(-1000000)
brcs    _bcd4

ldi     r21, -1 + '0'
_bcd5:  inc     r21
subi    r28, byte1(100000)      ;-100,000
sbci    r29, byte2(100000)
sbci    r30, byte3(100000)
brcc    _bcd5

ldi     r20, 10 + '0'
_bcd6:  dec     r20
subi    r28, byte1(-10000)        ;+10,000
sbci    r29, byte2(-10000)
sbci    r30, byte3(-10000)
brcs    _bcd6

ldi     r19, -1 + '0'           ;remaining value < 10000 is in r29:r28
_bcd7:  inc     r19
subi    r28, byte1(1000)          ;-1000
sbci    r29, byte2(1000)
brcc    _bcd7

ldi     r18, 10 + '0'
_bcd8:  dec     r18
subi    r28, byte1(-100)          ;+100
sbci    r29, byte2(-100)
brcs    _bcd8

ldi     r17, -1 + '0'
_bcd9:  inc     r17
subi    r28, 10                 ;-10
brcc    _bcd9

subi    r28, -10 - '0'
mov     r16, r28
ret
;-------------------------------------------------------------------------
```

And now the 16 bit version and in C:

```
char digit;

void bin2bcd( unsigned int val )
{
char i;

i = '0' - 1;
do
i++;
while( !((val -= 10000) & 0x8000) );
digit = i;

i = '0' + 10;
do
i--;
while( (val += 1000) & 0x8000 );
digit = i;

i = '0' - 1;
do
i++;
while( !((val -= 100) & 0x8000) );
digit = i;

i = '0' + 10;
do
i--;
while( (val += 10) & 0x8000 );
digit = i;

digit = val | '0';
}
```

Peter

Another 16-bit version in asm:

```
;**************************************************************************
;*
;* "Convert16" - 16-bit unsigned Binary to ASCII conversion
;*
;* This subroutine converts an unsigned 16-bit number (XH:XL)
;* to a 5-digit BCD number represented by 5 bytes (r7:r6:r5:r4:r3).
;*
;* MSD of the 5-digit number is placed in r7,
;*
;* The ASCII-coded digits are stored in "ASCII_digits" array
;*
;* Note: array structure (offsets from start address):
;* .DSEG
;* .org [last_equ]+1
;* ASCII_digs:   .BYTE 11     ; reserve 11d bytes for ASCII digits:
;*				; offset 0 for sign character ("+" or "-")
;*				; offset 1 for msc (ASCII code, digit 9)
;*				; offset 2 for character (ASCII code, digit 8)
;*				; 	"
;*				; 	"
;*				; 	"
;*				; 	"
;*				; 	"
;*				; 	"
;*				; 	"
;*				; offset 10 for lsc (ASCII code, digit 0)
;*
;* Register usage:
;*	r3             BCD value digit 0 (ones)
;*	r4             BCD value digit 1 (tens)
;*	r5             BCD value digit 2 (hundreds)
;*	r6             BCD value digit 3 (thousands)
;*	r7             BCD value digit 4 (tenthousands)
;*	XL             binary value LSB (16bit: Low byte)
;*	XH             binary value (16bit: High byte)
;*	r24            temporary value & output char.
;*
;* Number of words      :83
;* Number of cycles     :242-272 (incl. push/pop etc.)
;* Low registers used   :7 (r3,r4,r5,r6,r7)
;* High registers used  :5 (X,Z,r24)
;* Pointers used        :Z,X
;*
;* All registers saved
;*
;* Optimized conversion code by John Payson, port to AVR by A.L
;*
;* Note: the basic algorithm computes the BCD digits from the "binary digits"
;* (input) and represents them as negative numbers to allow a very
;* efficient "conversion by subtraction" method (~180 cycles total for
;* the 16-bit-binary to 5-digit-bcd conversion).
;*
;**************************************************************************

Convert16:
push	r24
push	ZH
push	ZL
push	XH
push	XL
push	r3
push	r4
push	r5
push	r6
push	r7

; implement equations, make BCD values negative
mov	r24,XH
swap	r24
andi	r24,$0F
subi	r24,-$F0
mov	r6,r24
subi	r24,-$E2
mov	r5,r24
subi	r24,-$32
mov	r3,r24

mov	r24,XH
andi	r24,$0F
subi	r24,-$E9
mov	r4,r24

mov	r24,XL
swap	r24
andi	r24,$0F

rol	r4
rol	r3
com	r3
clc			; compensate unwanted "+1" (A.L.)
rol	r3

mov	r24,XL
andi	r24,$0F
rol	r6

ldi	r24,$07
mov	r7,r24

; BCD digits are in 2's complement form now and made
; negative numbers (except for the "10K" digit in
; r7, which is regarded as a positive number)

ldi	r24,$0A		; load "10" for "normalizing"

Lb1:	; "normalize" BCD digits - "/10" & "mod 10" simultaneously
dec	r4
brcs	Lb2
rjmp	Lb1

Lb2:
dec	r5
brcs	Lb3
rjmp	Lb2

Lb3:
dec	r6
brcs	Lb4
rjmp	Lb3

Lb4:
dec	r7
brcs	Lb5
rjmp	Lb4

Lb5:				; convert and store BCD digits to array
ldi XH,high(ASCII_digs)
adiw	XL,6		; set it to ASCII array offset 6 (digit 4)

clr     ZH		; (5 unpacked BCD digits = 5 regs)
ldi     ZL,8 		; 16 bit: address+1 of last BCD data register (r7)
Lb6:
ld      r24,-Z		; pre-decrement
subi    r24,-'0'        ; convert to ASCII
st	X+,r24		; store ASCII digits to SRAM array
cpi     ZL,4		; address +1 of first BCD data register (r3)
brsh    Lb6		; loop until all 5 digits are stored

pop	r7		; EXIT module
pop	r6
pop	r5
pop	r4
pop	r3
pop	XL
pop	XH
pop	ZL
pop	ZH
pop	r24

ret

;**** End of Convert16 Function ---------------------------------------****

```

Andreas

Hmmm...why didn't all these algorithms pop up in the uncounted discussions about itoa before? :o

Christoph

Hi.

If you are looking for speed and have enough program memory to spend, here's a routine from me.
Loops make your code slower and the cycles aren't always the same.
In this routine, there's not a single loop and the cycles are always the same.

If you are interested in other fast routines in assembly, feel free to contact me

dpagrafio@the.forthnet.gr

## Attachment(s): ASM code

oops! I just noticed that the cycles aren't exactly the same but anyway, it's still faster than using loops :D

bye

Conversion of binary numbers of any size. Uses shifts and bcd correction. Numbers stored in memory from lsb to msb.

```
;Convert binary number in memory  to packed BCD
;input : XH:XL=address of lowest byte of input number;
;           YH:YL=address of lowest byte of output number;
;           R24-input number size ,bytes;R25-output number size,bytes.
;uses  : R16,R17,R18,R19,R30,R31

conv_bin2bcd:
movw r30,r28      ;Z->output buffer
mov r16,r25         ;r16=output buffer size
eor r17,r17
conv_bin2bcd05:
st Z+,r17               ;Clear output buffer
dec r16
brne conv_bin2bcd05

mov r16,r24
lsl r16
lsl r16
lsl r16                     ;R16=input bits counter

conv_bin2bcd10: ;input bits shift loop
mov r17,r24          ;R17=input buffer size
movw r30,r26      ;Z->input buffer

conv_bin2bcd20:  ;Shift input buffer loop
ld r18,Z
rol r18
st Z+,r18
dec r17
brne conv_bin2bcd20

mov r17,r25           ;r17=output buffer size
movw r30,r28        ;Z->output buffer

conv_bin2bcd30:  ;Shift output buffer loop
in r19,SREG         ;remember carry flag
subi r18,-3           ;BCD correction
sbrs r18,3
subi r18,3
subi r18,-0x30
sbrs r18,7
subi r18,0x30
out SREG,r19      ;restore carry flag
rol r18                 ;Shift bit from input buffer to output buffer
st Z+,r18             ;store byte back
dec r17
brne conv_bin2bcd30;Repeat for all output buffer

dec r16
brne conv_bin2bcd10  ;Repeat for every input bit

ret

```
Last Edited: Wed. Jul 21, 2004 - 06:17 AM

Wow, my month-old post has gotten a lot of play in the past few days.

My powers-of-10-with-subtract routine has served me fairly well, and it is small enough and fast enough for 16-bit work. I've modified it from a "standard" itoa()-type routine for display purposes: typically right-justify; leading-0 suppression or not; handle full 16-bit unsigned; insert an implied decimal point where needed; etc.

As luck would have it, a new app needs similar features for 32-bit--kind of an ltoa() mod. I hope to be able to take many of these postings & do some testing on size & speed.

Lee

Hi all,

here's one for the real freaks to chew on :twisted:
I've tried to implement that algorithm and tried to write very readable code, so you should be able to understand it. I've tried it with 0xFFFF as the number to be displayed. The display routine just writes the resulting digits to sram.

First problem: I used Studio 4, as I currently don't have version 3.5 installed. This might be a source for errors - maybe my code works, but the sim doesn't know that :?

The code for the very last digit seems to work, but the rest gives rubbish. Maybe some fresh eyes can spot an error; I couldn't. All C lines are in there as comments, so what the code *should* do can be seen in the first line of every code block. At the side I've added comments as well, mainly for the multiply/add operations.

It seems that the algorithm as it is written down in the final version (in the document by Mr. Jones) doesn't only need 8-bit variables, but 16 bits from time to time. d0 and q for example need 16 bits each to give correct results.

The code is VERY register hungry, but I didn't optimize it at all (a working version could be optimized, but as it doesn't work, well there's no need for optimization).

Christoph

## Attachment(s): avrfreaks_putdec.asm

Hello buffi,

I just tried your code and noticed that some of C code comments do not match your assembly code. Even with my corrections, the answer comes out to be 63125. This is my first attempt at trying to implement this neat algorithm so I do not know where the other problems are.

The first one is at ;d1 = q + 9*d3 + 5*d2 + d1. Your first line of assembly has add d1,ql instead of add d1,d1

The next one has to do with the ; d2 = q + 2 * d2. I believe the add d2,r0 should be mov d2,r0. This also happens in the section for ; d3 = q + 4 * d3.

Ok, I am an idiot. The line add d1,ql that I thought :oops: was a mistake and should have been replaced by add d1,d1 needs to be removed altogether. The register d1 already contains the value, so adding to it is a mistake. I just made the corrections (this one and the two others stated earlier) and the answer is 65535.
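
The C comments quoted in this post can be put together into a complete routine for checking against the assembly. The following is a sketch under stated assumptions rather than the attached code: the function name is hypothetical, the first two divisions use the 0x19A / 4096 factor mentioned earlier in the thread, and the 0x1A / 256 factor for the upper two digits is my own choice (those intermediate values stay below 69, where that factor is still exact).

```c
#include <stdint.h>

/* digits[0] is the ten-thousands digit, digits[4] the ones digit. */
void u16_to_digits(uint16_t n, uint8_t digits[5])
{
    uint16_t d0 = n & 0xF;
    uint16_t d1 = (n >> 4) & 0xF;
    uint16_t d2 = (n >> 8) & 0xF;
    uint16_t d3 = (n >> 12) & 0xF;
    uint16_t q;

    d0 = 6 * (d3 + d2 + d1) + d0;                  /* at most 285 */
    q  = (uint16_t)(((uint32_t)d0 * 0x19A) >> 12); /* q = d0 / 10 */
    d0 = d0 - 10 * q;

    d1 = q + 9 * d3 + 5 * d2 + d1;                 /* at most 253 */
    q  = (uint16_t)(((uint32_t)d1 * 0x19A) >> 12);
    d1 = d1 - 10 * q;

    d2 = q + 2 * d2;                               /* at most 55 */
    q  = (d2 * 0x1A) >> 8;
    d2 = d2 - 10 * q;

    d3 = q + 4 * d3;                               /* at most 65 */
    q  = (d3 * 0x1A) >> 8;                         /* ten-thousands digit */
    d3 = d3 - 10 * q;

    digits[0] = (uint8_t)q;
    digits[1] = (uint8_t)d3;
    digits[2] = (uint8_t)d2;
    digits[3] = (uint8_t)d1;
    digits[4] = (uint8_t)d0;
}
```

Note that the first two intermediate sums do not fit in 8 bits and their products with 0x19A do not fit in 16 bits, which matches the observation that d0 and q need more than one byte.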

OK, my code now works as well. The errors I made were a bit dumb, I must admit that :shock:
I am though quite happy about the multiply operations, they caused the greatest headache, but thinking a bit more about them was worth it as it seems.

I've attached the new (now working version of the code) for others to download. It's 124 words including init code.
Now we can start optimizing it :D

Christoph

EDIT:
Conversion timing: 0xABCD needs 187 cycles, 0xFFFF needs 197 cycles. I don't know which kind of values needs the longest time, but the variations are due to the d2 = d2 % 10 operation which is done in a loop.

EDITEDIT:
Replaced the file by new one with prettier formatting! NO code changes.

## Attachment(s): avrfreaks_putdec.asm

buffi wrote:

It's 124 words including init code.
Conversion timing: 0xABCD needs 187 cycles, 0xFFFF needs 197 cycles.

Only for comparison:

subtraction method: 20 words, 20...170 cycles (without call, return)

Peter

That's about 8 cycles per digit increment - that's ok for a subtraction algorithm. My question is: Why does this "new" (I don't know how old it is) algorithm perform so badly on AVRs with my version of the code? Has anyone seen big performance brakes in there?

Christoph

I tried to optimize the algorithm and got down to 52 bytes of code, 8 registers and 70 clocks (including call and return) but there is a problem with the value of 159. What I did was to simplify all five of the "divide by 10" with a multiply by a factor of 26 / 256. I guess there is a rounding error so the result I am getting is 1, 5, 255. I do not know if I can simply say if the remainder is 255 then use 9 instead since I have only checked 0 - 160, 12345, 32768 and 65535.
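
The 159 case has a simple explanation if the ones digit is formed as 6*(d3 + d2 + d1) + d0, as in the earlier posts: for 159 = 0x9F that sum is 6*9 + 15 = 69, and (69 * 26) >> 8 gives 7 instead of 6, so 69 - 70 wraps around to 255. A short host-side search shows how far each factor can be trusted; `first_div10_error` is a hypothetical helper written for this illustration.

```c
#include <stdint.h>

/* Returns the smallest x below limit for which (x * mul) >> shift
 * differs from x / 10, or -1 if there is none. */
int32_t first_div10_error(uint32_t mul, unsigned shift, uint32_t limit)
{
    for (uint32_t x = 0; x < limit; x++) {
        if (((x * mul) >> shift) != x / 10)
            return (int32_t)x;
    }
    return -1;
}
```

`first_div10_error(26, 8, 65536)` returns 69 and `first_div10_error(410, 12, 65536)` returns 1029, so 26/256 is only safe for intermediate values up to 68, while 410/4096 covers everything up to 1028.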

:shock: WOW!
Was that optimization done on my code or did you optimize the original algorithm? Can I have a look at it? I'd like to see if we can find the error together as the usage statistics you gave are quite promising.

Christoph

This method for Bin2Bcd is not new, I saw it in an electronics mag some time ago written for the PIC micro.

This multiply and shift method of approximating a divide by 10 is really only of much use on those advanced processors which have a barrel shifter (I think it's called that). They can apply a shift of a specified number of bits to a source register in one instruction. Therefore your "q = (d0 * 0x19A) >> 12" can be done in one instruction.

Here is my GCC version of this method:
It is not scalable to 32-bits however.

```
//Bin2Bcd2 16-Bit binary to unpacked BCD output
//This uses a method I saw in an electronics mag.
//It is highly sneaky; converting hex digits to decimals by assuming that 4096 is really (4100-4)
//and 256 is (260-4) and 16 is (20-4). This common -4 appears in the maths below.
//Despite the weirdness it is the fastest method by a huge margin. (Ave=167cy) (96 Bytes)
//---------------------------------------------------------------
void Bin2Bcd2 (unsigned int c, char *dest)
{
char tmp, tenThou, thou, hund, tens, units;

//Using hex digits: c=0xABCD
//
//      tenThou thou hund tens units
//Initial:      4096  256  16    1
//               A     B   C     D

units=(char)c; hund=(char)(c>>8);	//low byte and high byte of c
tens=units; thou=hund;
units&=0x0F;  hund&=0x0F;
__asm__ ( " swap %0 \n" : "=r" (tens) : "0" (tens) ); tens&=0x0F;
__asm__ ( " swap %0 \n" : "=r" (thou) : "0" (thou) ); thou&=0x0F;

//Init done. Now perform the maths.
tmp=(thou+hund+tens)<<2; tmp+=20;
units-=tmp;							//units= D-4(A+B+C)-20
hund*=2; tens*=2;
tens+=hund; tens+=hund; tens+=hund;
tens-=138;							//tens = 6B+2C-138
hund+=thou-46;						//hund = A+2B-46
thou = 4*thou-64;					//thou = 4A-64
tenThou=7;							//tenThou = 7 (constant init)

//Maths done, all -VE except tenThou. Now Normalise each digit by adding 10 until +ve

__asm__ (
"1:	dec %1		\n"
"	subi %0,-10	\n"
"	brcs 1b		\n"
: "=r" (units),  "=r" (tens) : "0"  (units),  "1"  (tens)
);

__asm__ (
"1:	dec %1		\n"
"	subi %0,-10	\n"
"	brcs 1b		\n"
: "=r" (tens),  "=r" (hund) : "0"  (tens),  "1"  (hund)
);

__asm__ (
"1:	dec %1		\n"
"	subi %0,-10	\n"
"	brcs 1b		\n"
: "=r" (hund),  "=r" (thou) : "0"  (hund),  "1"  (thou)
);

__asm__ (
"1:	dec %1		\n"
"	subi %0,-10	\n"
"	brcs 1b		\n"
: "=r" (thou),  "=r" (tenThou) : "0"  (thou),  "1"  (tenThou)
);

*dest++=tenThou;
*dest++=thou;
*dest++=hund;
*dest++=tens;
*dest=units;
}
```
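
For testing on a PC, the same arithmetic can be written without inline assembly. This is a sketch only; the function name and the use of plain `int` temporaries are assumptions for the example, and no attempt is made to match the cycle counts above. The constants -20, -138, -46, -64 and +7 cancel out (70000 - 64000 - 4600 - 1380 - 20 = 0), and they guarantee that every digit except the top one starts out negative.

```c
#include <stdint.h>

/* dest[0] = ten-thousands ... dest[4] = units, each 0..9 (unpacked BCD). */
void bin2bcd_portable(uint16_t c, uint8_t dest[5])
{
    int a = (c >> 12) & 0xF;   /* hex digit with weight 4096 */
    int b = (c >> 8) & 0xF;    /* weight 256 */
    int h = (c >> 4) & 0xF;    /* weight 16 */
    int d = c & 0xF;           /* weight 1 */

    int units   = d - 4 * (a + b + h) - 20;
    int tens    = 6 * b + 2 * h - 138;
    int hund    = a + 2 * b - 46;
    int thou    = 4 * a - 64;
    int tenThou = 7;

    /* Normalise: add 10 to a digit and borrow 1 from the next higher one
     * until the digit is no longer negative. */
    while (units < 0) { units += 10; tens--; }
    while (tens  < 0) { tens  += 10; hund--; }
    while (hund  < 0) { hund  += 10; thou--; }
    while (thou  < 0) { thou  += 10; tenThou--; }

    dest[0] = (uint8_t)tenThou;
    dest[1] = (uint8_t)thou;
    dest[2] = (uint8_t)hund;
    dest[3] = (uint8_t)tens;
    dest[4] = (uint8_t)units;
}
```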

I would like to see danni's 20 word, 20 cycle subtraction code

Nigel

Last Edited: Sun. Apr 3, 2016 - 03:03 PM

I worked on it last night and I think it is working. I had to replace the first two divide by 10s with the 410/4096 factor due to horrible rounding errors. I wrote some Visual C++ code to test the algorithm and there are around 20,000 errors when using 26/256 for both divisions and 10,000 when using 410/4096 for d0 and 26/256 for d1.

I think it added approximately 15 words of code, another register and the cycle count was somewhere around 90. After going to bed, I think I can remove another 10 cycles from it.

This code uses the MUL instruction and will only work for 16 bit unsigned numbers. It is the caller's responsibility to do the two's complement for negative numbers.

I started with Buffi's file and optimized the math. I removed the loop for the mod 10 and replaced it with the d1 (or d2, I can't remember) = d1 - q*10 code. For the multiplication, I multiplied LOW(d0) by LOW(410) and kept the total in R1:R0. Then, instead of multiplying HIGH(410) * LOW(d0), I added LOW(d0) to R1, since HIGH(410) is always 1. I set the T flag if a carry happened. The maximum value for d0 = 6*(d1 + d2 + d3) + d0 is 285, so HIGH(d0) will be either 0 or 1. If HIGH(d0) is non-zero, I added LOW(410) to R1. Since HIGH(410) is always 1, 256 also needs to be added to R1 if HIGH(d0) is non-zero; that means always setting the T flag in that case. The multiplication result is T:R1:R0, which needs to be divided by 4096, i.e. right shifted by 12. That means keeping only the upper nibble of R1, swapping it to the lower nibble and loading bit 4 with T.
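
The register-level steps described here can be modelled in C to check them over the whole input range. This is a hypothetical host-side model, not the attached PutDec.asm; the variable names `r1r0`, `r1` and `t` simply mirror the registers and flag named in the text, and d0 is assumed to be at most 285.

```c
#include <stdint.h>

/* Returns (d0 * 410) >> 12 using byte-sized pieces, for d0 <= 285. */
uint8_t div10_mul_t_flag(uint16_t d0)
{
    uint8_t  d0_lo = (uint8_t)d0;
    uint8_t  d0_hi = (uint8_t)(d0 >> 8);         /* 0 or 1 for d0 <= 285 */
    uint16_t r1r0  = (uint16_t)d0_lo * 0x9Au;    /* MUL LOW(d0), LOW(410) */
    uint16_t r1    = (r1r0 >> 8) + d0_lo;        /* + LOW(d0), as HIGH(410) == 1 */
    uint8_t  t     = (uint8_t)(r1 >> 8);         /* carry out of R1 goes to T */

    r1 &= 0xFF;
    if (d0_hi) {
        r1 = (r1 + 0x9A) & 0xFF;                 /* LOW(410) * HIGH(d0) lands in R1 */
        t  = 1;                                  /* HIGH(410) * HIGH(d0) is bit 16 */
    }
    /* Divide T:R1:R0 by 4096: upper nibble of R1, with T as bit 4. */
    return (uint8_t)((r1 >> 4) | (t << 4));
}
```

For d0 = 285 this gives R1 = 200 and T = 1, so the result is 12 + 16 = 28, which is 285 / 10.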

I did the same thing for d1 but I believe the maximum value for d1 = q + 9*d3 +5*d2 +d1 is 253 so the HIGH(d1) will always be 0. This is where the ten cycles can be removed.

I will post the code later today. I am also going to try to skip the d1, d2, d3, d4 code if their digits are zero. This will make the cycle count vary. I believe it will take more cycles for testing than actually doing it but that remains to be determined.

This version of the binary to BCD takes 88 cycles, 71 words of code, and 10 registers. I think some more cycles and maybe a register could be removed but I am tired of looking at it. This passes my POGE test. There is test code included that calls the routine with every possible value (0 .. 65535) and outputs the result in ASCII out UART0 at 38400 bps assuming a 12.288 MHz crystal is being used.

This routine uses the MUL instruction so it will not work on AVRs that do not have that instruction. I do not think I have any other ATmega specific code though. It only works for a 16 bit unsigned value. If you want a signed value then do a two's complement before calling it.

I am planning on doing a 32 bit version but I do not know when I will be able to work on it.

## Attachment(s): PutDec.asm

1/6 incorrect values sounds like a lot, but in many apps the low digit could be dropped anyway--like calculating percentages to the hundredth but only displaying to the tenth. Are all the errors only +/-1? Or even better, all +1 or -1?

Lee

No, the errors were not all by -1, the one's digit would be off by -10. Those could be corrected by adding ten to the one's digit if bit 7 was set. But the error would cascade into the ten's digit and hundred's digit so that is why I had to use the 410/4096 for the one's digit divide by ten operation.

The code I posted does not suffer from these problems.

For example when using 26 / 256 for all divide by ten operations, the number to convert is in the left column followed by each of the BCD digits.
7587, 0, 7, 5, 8, 7
7588, 0, 7, 6, 255, 254
7589, 0, 7, 6, 255, 255
7590, 0, 7, 5, 9, 0

10588, 1, 0, 5, 8, 8
10589, 1, 0, 6, 255, 255
10590, 1, 0, 6, 255, 0
10591, 1, 0, 5, 9, 1

Using 410/4096 for the one's digit divide by ten and using 26/256 for the ten's digit resulted in the ten's digit being 255 sometimes. It looked like when that happened, the hundred's digit was off by +1. That probably could have been corrected but I decided to just go ahead and use 410/4096 also.

2289, 0, 2, 2, 8, 9
2290, 0, 2, 3, 255, 0
2291, 0, 2, 3, 255, 1

N.Winterbottom wrote:
I would like to see danni's 20 word, 20 cycle subtraction code

It's not 20 cycles, it's 20...170 cycles.

It's my C example above, written in assembler.

Peter

I like the C version (easy for me to incorporate), but for me it had problems with unsigned values larger than 32768 (which look negative to the bit-15 test). I modified the routine to check the Carry flag in SREG instead, and this seems to work.

```
char digit;

void bin2bcd( unsigned int val )
{
char i;

i = '0' - 1;
do
{
i++;
val -= 10000;
}
while(!(SREG & (_BV(0))));
digit = i;

i = '0' + 10;
do
{
i--;
val += 1000;
}
while((SREG & (_BV(0))) );
digit = i;

i = '0' - 1;
do
{
i++;
val -= 100;
}
while(!(SREG & (_BV(0))));
digit = i;

i = '0' + 10;
do
{
i--;
val += 10;
}
while(!(SREG & (_BV(0))));
digit = i;

digit = val | '0';
}

```
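
Reading SREG from C only works as long as the compiler happens to emit the add or subtract directly before the test, and with the carry polarity each loop expects; neither is guaranteed. A portable alternative is to detect the wrap-around by comparing against the previous value. The sketch below is one way to do that, assuming a caller-supplied 6-byte buffer instead of the single `digit` variable; the function name is hypothetical.

```c
#include <stdint.h>

/* Writes five ASCII digits plus a terminating NUL into out.
 * Works for the full range 0..65535. */
void u16_to_ascii(uint16_t val, char out[6])
{
    uint16_t prev;
    char c;

    c = '0' - 1;
    do { c++; prev = val; val -= 10000; } while (prev >= 10000);  /* stop after borrow */
    out[0] = c;

    c = '0' + 10;
    do { c--; prev = val; val += 1000; } while (val > prev);      /* stop after wrap past 0 */
    out[1] = c;

    c = '0' - 1;
    do { c++; prev = val; val -= 100; } while (prev >= 100);
    out[2] = c;

    c = '0' + 10;
    do { c--; prev = val; val += 10; } while (val > prev);
    out[3] = c;

    out[4] = (char)('0' + val);   /* 0..9 is left over */
    out[5] = '\0';
}
```

The comparisons cost a few extra instructions per iteration compared with testing the carry flag directly, so this trades some of the speed of the assembler version for predictable behaviour.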

My version of another algorithm

```
//HEXTOBCD-HEXTOBCD-HEXTOBCD-HEXTOBCD-HEXTOBCD-HEXTOBCD-HEX

void hex_to_bcd(unsigned long information)
{
res0=0,res1=0,res2=0,res3=0;
res4=0,res5=0;
//	test value of tbfr_h
//	tbfr_h = 0x2710;	//10000
//	tbfr_h = 0x03e8;	//1000
//	tbfr_h = 0xea60;	//60000

//information=tbfr_h;

unsigned int s=0;
//	s counts the shifts (one per input bit)

while(s != 32)   // 32 for long type 16 -int
{
// Cyclic shift of the information
res5 = (res5<<1)+((res4>>3)&1);
res4 = (res4<<1)+((res3>>3)&1);
res3 = (res3<<1)+((res2>>3)&1);
res2 = (res2<<1)+((res1>>3)&1);
res1 = (res1<<1)+((res0>>3)&1);
res0 = (res0<<1)+(information>>31);  //15 for int
//// Multiply by 2
information = (information<<1);
// keep only the low nibble of each digit
res0 = res0&0x0f;
res1 = res1&0x0f;
res2 = res2&0x0f;
res3 = res3&0x0f;
res4 = res4&0x0f;
res5 = res5&0x0f;
if(s != 31)
{//Decimal correction
if(res5 > 4)
res5 = res5+3;
if(res4 > 4)
res4 = res4+3;
if(res3 > 4)
res3 = res3 + 3;
if(res2 > 4)
res2 = res2+3;
if(res1 > 4)
res1 = res1+3;
if(res0 > 4)
res0 = res0+3;
}
s++;
}
buf1 = digits[res0];//least significant
buf2 = digits[res1];
buf3 = digits[res2];
buf4 = digits[res3];//most significant
}
```

I assume packed BCD is outdated today.

It was only useful in former days, when computers were extremely short on RAM.

Today a whole byte per digit (ASCII or 7-segment code) is much more convenient.

Especially since packed BCD needs more words and cycles on the AVR.

Peter

danni wrote:
I assume packed BCD is outdated today.

Maybe "today", but not necessarily "yesterday".

"Yesterday" we had several production designs based on the AT90S4433. There weren't any Mega8s. When Mega8s did appear, they were US$1+ more costly than the '4433.

With 128 bytes of SRAM yet a sizable amount of flash, large '4433 apps can get real tight on SRAM. Also, packed BCD allowed me to use registers for building a 6x 7-segment output; I wouldn't have had enough working registers for unpacked; the unpacking only was needed in the display routine itself.

"Today", with a Mega8/88 having 8x the SRAM of the '4433 (actually since about 2 years ago, when the Mega8 price dropped to ~US$2 at 100 qty), I might agree with you. :) But, as always, it all depends on the particular app.

Lee

Here's the most efficient binary-to-BCD routine I've seen, except that it's in PIC assembler:

```;********************************************************************
;                Binary To BCD Conversion Routine (16 Bit)
;                       (LOOPED Version)
;
;      This routine converts a 16 Bit binary Number to a 5 Digit
; BCD Number.
;
;       The 16 bit binary number is input in locations Hbyte and
; Lbyte with the high byte in Hbyte.
;       The 5 digit BCD number is returned in R0, R1 and R2 with R0
; containing the MSD in its right most nibble.
;
;   Performance :
;               Program Memory  :  32
;               Clock Cycles    :  750
;
;*******************************************************************;
;
B2_BCD_Looped
bsf      ALUSTA,FS0
bsf      ALUSTA,FS1            ; set FSR0 for no auto increment
;
bcf      ALUSTA,C
clrf     count, F
bsf      count,4         ; set count = 16
clrf     R0, F
clrf     R1, F
clrf     R2, F
loop16a
rlcf     Lbyte, F
rlcf     Hbyte, F
rlcf     R2, F
rlcf     R1, F
rlcf     R0, F
;
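dcfsnz   count, F
return
adjDEC
movlw    R2              ; load R2 as indirect address pointer
movwf    FSR0
call     adjBCD
;
incf     FSR0, F
call     adjBCD
;
incf     FSR0, F
call     adjBCD
;
goto     loop16a
;
adjBCD
movfp    INDF0,WREG
addlw    0x03
btfsc    WREG,3          ; test if result > 7
movwf    INDF0
movfp    INDF0,WREG
addlw    0x30
btfsc    WREG,7          ; test if result > 7
movwf    INDF0           ; save as MSD
return
```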

Quote:

Here's the most efficient ...

Efficient for what? Processor cycles? Code space?

I saw the "efficient", and then I saw not only a loop but calls & returns!?!

Lee

It's efficient in that it uses algebra instead of division by powers, and also in that it converts directly to packed BCD. Wow, I forgot how annoying the FSR is.
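For readers who have not met it, the shift-and-add-3 technique (often called "double dabble") that yields packed BCD without any division can be sketched in C as follows. This is an illustration only, not a translation of the PIC listing; the function name and the choice to return all five digits in one 32-bit word are assumptions made for the example.

```c
#include <stdint.h>

/* Shift-and-add-3 ("double dabble"): returns five packed BCD digits
   in the low 20 bits of the result. */
static uint32_t u16_to_packed_bcd(uint16_t bin)
{
    uint32_t bcd = 0;

    for (uint8_t i = 0; i < 16; i++) {
        /* Before each shift, add 3 to every BCD nibble that is 5 or more,
           so that doubling it carries into the next decimal digit. */
        for (uint8_t sh = 0; sh < 20; sh += 4) {
            if (((bcd >> sh) & 0xF) >= 5)
                bcd += (uint32_t)3 << sh;
        }
        /* Shift the next input bit (MSB first) into the BCD register. */
        bcd = (bcd << 1) | (uint32_t)(bin >> 15);
        bin = (uint16_t)(bin << 1);
    }
    return bcd;
}
```

For example, `u16_to_packed_bcd(65535)` returns `0x65535`. The cost is fixed at 16 passes regardless of the value, which is why the cycle count of such routines is constant but comparatively high.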

outer_space wrote:
It's efficient in that it uses algebra instead of division by powers, and also in that it converts directly to packed BCD.

But packed BCD was usually not the goal, so you need further conversion steps to get ASCII or 7-segment code.

Thus methods with direct ASCII output are much more efficient; e.g. look at my example code above (32 bit to ASCII). It works with the optimized subtraction-of-powers-of-10 method.

Peter

Packed BCD is great if you're working with a segment display and want to store all the digits in low registers. Also, if you're using a PIC you only get 64? bytes of data memory.

Hi Lee,
I'll let you and the 'gang' decide how efficient this code is, but it is still found in the tools section. It performs 8-, 16-, 24- and 32-bit, signed and unsigned ASCII conversion. Let us know how it rates (> 2000 downloads).
"ASCII printing routines"
https://www.avrfreaks.net/index.p...

Kind Regards,
Jack Tidwell

## Attachment(s): ASCII_printing_routines.zip

outer_space wrote:
Packed BCD is great if you're working with a segment display

???

Sorry, but I'm totally confused now.

You need a 7-segment code to drive 7-segment displays.

In my applications I use e.g. 5 bytes of SRAM for 5 digits and put the 7-segment patterns into them (with leading zeros blanked), and then the multiplex timer interrupt drives digit after digit.

Thus the binary to 7-segment conversion need not be done inside the interrupt handler, and so the smallest code size is the most efficient.
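A minimal C sketch of that arrangement, under stated assumptions: the segment wiring (bit 0 to bit 6 = segments a to g, active high), the buffer name `disp` and the function name `show_u16` are all invented for the example, and the multiplex interrupt that reads the buffer is not shown. It uses `/` and `%` only for brevity; any of the conversion methods in this thread could replace them.

```c
#include <stdint.h>

/* Assumed wiring: bit0..bit6 = segments a..g, active high. */
static const uint8_t seg_tab[10] = {
    0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F
};
#define SEG_BLANK 0x00

/* Written by the main loop, read by the (hypothetical) multiplex ISR.
   disp[0] is the leftmost digit. */
static volatile uint8_t disp[5];

static void show_u16(uint16_t val)
{
    uint8_t digit[5];

    for (int8_t k = 4; k >= 0; k--) {   /* digit[0] = most significant */
        digit[k] = (uint8_t)(val % 10);
        val /= 10;
    }

    uint8_t blank = 1;                  /* still inside the leading zeros? */
    for (uint8_t k = 0; k < 5; k++) {
        if (digit[k] != 0 || k == 4)
            blank = 0;                  /* the last digit is always shown */
        disp[k] = blank ? SEG_BLANK : seg_tab[digit[k]];
    }
}
```

Because the interrupt only copies one ready-made pattern per tick, the conversion speed matters little and code size becomes the deciding factor.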

Peter

Following is example code for 7-segment output; it is extremely efficient (only 36 words).

```;*************************************************************************
;*                                                                       *
;*                      Convert unsigned 16bit to 7-Segment              *
;*                                                                       *
;*              Author: Peter Dannegger                                  *
;*                      danni@specs.de                                   *
;*                                                                       *
;*************************************************************************
;
.nolist
.include "2313def.inc"
.list
;
.equ    _0A     = 0x02                          ;segment order
.equ    _0B     = 0x04
.equ    _0C     = 0x40
.equ    _0D     = 0x10
.equ    _0E     = 0x08
.equ    _0F     = 0x01
.equ    _0G     = 0x20
.equ    _0DP    = 0x80                          ;decimal point

.equ    _00     = ~( _0A+_0B+_0C+_0D+_0E+_0F     )      ;number pattern, low active
.equ    _01     = ~(     _0B+_0C                 )
.equ    _02     = ~( _0A+_0B+    _0D+_0E+    _0G )
.equ    _03     = ~( _0A+_0B+_0C+_0D+        _0G )
.equ    _04     = ~(     _0B+_0C+        _0F+_0G )
.equ    _05     = ~( _0A+    _0C+_0D+    _0F+_0G )
.equ    _06     = ~( _0A+    _0C+_0D+_0E+_0F+_0G )
.equ    _07     = ~( _0A+_0B+_0C                 )
.equ    _08     = ~( _0A+_0B+_0C+_0D+_0E+_0F+_0G )
.equ    _09     = ~( _0A+_0B+_0C+_0D    +_0F+_0G )
;
.dseg
.org	0x60
digits:
.byte	5			;digit data for multiplex interrupt
.cseg
;-------------------------------------------------------------------------
;input: R17, R16= 16 bit value 0 ... 65535
;output: digits = 5 digits (7-segment code)
;
;words: 36 (40)
;
bin16_ascii:
ldi	yl, digits
ldi	zh, high(2 * segment_tab)
ldi	zl, low( -1 + 2 * segment_tab)
_bcd1:
inc	zl
subi	r16, low(10000)
sbci	r17, high(10000)
brcc	_bcd1
rcall	_bcd5
ldi	zl, low(10 + 2 * segment_tab)
_bcd2:
dec	zl
subi	r16, low(-1000)
sbci	r17, high(-1000)
brcs	_bcd2
rcall	_bcd5
ldi	zl, low(-1 + 2 * segment_tab)
_bcd3:
inc	zl
subi	r16, low(100)
sbci	r17, 0
brcc	_bcd3
rcall	_bcd5
ldi	zl, low(10 + 2 * segment_tab)
_bcd4:
dec	zl
subi	r16, -10
brcs	_bcd4
rcall	_bcd5
ldi	zl, low(2 * segment_tab)
add	zl, r16			;index the table by the ones digit
_bcd5:
lpm				;number to 7-segment
st	y+, r0			;store in multiplex SRAM
ret
;-------------------------------------------------------------------------
.if	((pc + 4) ^ pc) & 0x80	;table inside the same 256 byte ?
.org	(pc & 0xFF80) + 0x80	;otherwise next 256 byte
.endif
segment_tab:
.db	_00, _01, _02, _03, _04, _05, _06, _07, _08, _09
;-------------------------------------------------------------------------

```

Peter

Hi Guys!

I DO REALIZE this thread is over 12 years old.....but I really needed a 32-bit hex to BCD ASM routine and to be frank I just wasn't up to the task of writing it, so I decided to do some searching....

In POST#12 danni (Peter Dannegger) posted what looked like a perfect fit for my needs.....but it didn't work.... While the very cool tricks he used are way beyond my little wheelhouse, I really needed it to work, so I set to it in the debugger.....the problem starts @ _bcd7: and follows through to the end .... here is that section as posted:

```
ldi     r19, -1 + '0'
_bcd7:  inc     r19
subi    r30, byte1(1000)          ;-1000
sbci    r31, byte2(1000)
brcc    _bcd7

ldi     r18, 10 + '0'
_bcd8:  dec     r18
subi    r30, byte1(-100)          ;+100
sbci    r31, byte2(-100)
brcs    _bcd8

ldi     r17, -1 + '0'
_bcd9:  inc     r17
subi    r30, 10                 ;-10
brcc    _bcd9

subi    r30, -10 - '0'
mov     r16, r30
ret
```

It is a very simple slip-up.....in the section above, to fix it simply replace r31 with r29 and replace r30 with r28..... after the replacements are made the code works perfectly.  What amazes me is that no one has pointed out this typo in the intervening 12 years....

@danni,  THANKS for the great code!

Fish

Fish4Fun wrote:
@danni, THANKS for the great code!

Indeed, sentiments echoed...

And thanks for finding the bug.

Even though it's an old thread, it's still giving.

I'd just found that routine, and am now using it.

I would probably have tried one of the others if it didn't work, so lucky I saw your patch.

Works great now.

I wanted to limit the use of the precious high set registers, so I made some simple changes:-

* Changed the output registers to R06-R15

* Added a prologue that stores the two constants in memory (-1+'0' & 10+'0')

* Changed the LDI's for the output registers to LDS's from the two constants

Cheers,

Rob

intabits wrote:

I wanted to limit the use of the precious high set registers, so I made some simple changes:-

* Changed the output registers to R06-R15

* Added a prologue that stores the two constants in memory (-1+'0' & 10+'0')

* Changed the LDI's for the output registers to LDS's from the two constants

Just realized I should have pushed two high registers and kept the constants in those, making the LDS's into MOV's.

I think the main reason it was not found is that not many people use 32-bit values in ASM (most people use C and the libs that come with the compiler).

Also, it can be rather slow for unlucky input values, but if speed doesn't matter it's an easy way (and then it can be done with a LUT and one big loop).

To make the worst case a bit faster, you can subtract a larger number such as 4000 and, once the result goes negative, add 1000 back repeatedly (by subtracting -1000).

That way you avoid the long loop for a digit of 9 (and for 0 in the other direction). As I remember, 3000 and 1000 work as well.
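The coarse-then-fine idea can be sketched in C like this. It is an illustration under assumptions: a signed 32-bit remainder is used so that "negative" is easy to test (an assembler version would branch on the carry flag instead), and the names `digit_coarse_fine` and `u16_to_ascii_cf` are made up for the example.

```c
#include <stdint.h>

/* Extract the decimal digit of *rem at the given power of ten:
   coarse steps of 4 * power down, then single steps of power back up. */
static uint8_t digit_coarse_fine(int32_t *rem, int32_t power)
{
    int8_t d = 0;

    do {                    /* overshoot in steps worth 4 digits */
        *rem -= 4 * power;
        d += 4;
    } while (*rem >= 0);

    do {                    /* walk back one digit at a time */
        *rem += power;
        d -= 1;
    } while (*rem < 0);

    return (uint8_t)d;
}

/* out[0] = ten-thousands ... out[4] = ones, as ASCII; no terminator. */
static void u16_to_ascii_cf(uint16_t val, char out[5])
{
    int32_t rem = val;

    out[0] = (char)('0' + digit_coarse_fine(&rem, 10000));
    out[1] = (char)('0' + digit_coarse_fine(&rem, 1000));
    out[2] = (char)('0' + digit_coarse_fine(&rem, 100));
    out[3] = (char)('0' + digit_coarse_fine(&rem, 10));
    out[4] = (char)('0' + rem);
}
```

With a step of 4, no digit needs more than 7 loop passes (3 down and 4 up for a digit of 8), compared with up to 10 passes when stepping by one power at a time.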

If you use an AVR with MUL, there is a faster way: divide by 10000 and deal with the result as two 16-bit numbers (plus a loop for the first digit).

Then it can be done in less than 200 clocks for all numbers.
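One way to read that suggestion, sketched in C: split the 32-bit value into chunks below 10000 and convert each chunk on its own. This is an interpretation rather than the poster's code. The names `put4` and `u32_to_ascii` are invented, and plain `/` and `%` stand in for the multiply-based division by a constant that a MUL-equipped AVR routine would use.

```c
#include <stdint.h>

/* Write exactly four ASCII digits for n in 0..9999. */
static void put4(uint16_t n, char *out)
{
    for (int8_t k = 3; k >= 0; k--) {
        out[k] = (char)('0' + n % 10);
        n /= 10;
    }
}

/* out[0..9] = ten ASCII digits, most significant first; no terminator. */
static void u32_to_ascii(uint32_t v, char out[10])
{
    uint32_t q   = v / 10000;               /* 0..429496 */
    uint16_t lo  = (uint16_t)(v % 10000);   /* lowest four digits */
    uint16_t mid = (uint16_t)(q % 10000);   /* middle four digits */
    uint8_t  top = (uint8_t)(q / 10000);    /* 0..42: first two digits */

    out[0] = (char)('0' + top / 10);
    out[1] = (char)('0' + top % 10);
    put4(mid, out + 2);
    put4(lo,  out + 6);
}
```

Once the value is split, every remaining step works on 16-bit or 8-bit quantities, which is where the speed-up on an 8-bit core comes from.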

Some may find these changes useful...

Eliminated the need for output registers by making it call a procedure to display/store/whatever for each digit.

With a little extra code, this output procedure becomes a helper routine to control the output width, and provide leading-zero suppression.

;*******************************************************************************
;    Convert unsigned 32bit to ASCII
;    Author: Peter Dannegger danni@specs.de
;    Viciously hacked about by Rob Storey
;-------------------------------------------------------------------------------
;    Input: 32 bit value 0 ... 4294967295
;       Passed in 4 symbolic registers B4:B3:B2:B1 (from the high register set)
;    Output: 10 ASCII digits passed in TL via successive calls (MSD to LSD)
;       to procedure "PutDecChar", which can do as it pleases with them...
;       EG: Write to output, Store in memory, Zero suppress before sending, etc.
;*******************************************************************************
M1P0        EQU    -1+'0'
TNP0        EQU    10+'0'
MTP0        EQU    -10-'0'

Bin32ToASC  PUSHM    TL,TH,B1,B2,B3,B4     ;Or whatever....
SUBI    TH,10                  ;Convert width in TH to output char counter
NEG    TH
;
LDI    TL,M1P0
Bin2Asc1:   INC   TL
SUBI     B2,Byte2(1000000000)  ;-1000,000,000 until overflow
SBCI  B3,Byte3(1000000000)
SBCI  B4,Byte4(1000000000)
BRCC  Bin2Asc1
RCALL    PutDecChar
;
LDI    TL,TNP0
Bin2Asc2:   DEC   TL
SUBI  B2,Byte2(-100000000)  ;+100,000,000 until no overflow
SBCI  B3,Byte3(-100000000)
SBCI  B4,Byte4(-100000000)
BRCS  Bin2Asc2
RCALL    PutDecChar
;
LDI    TL,M1P0
Bin2Asc3:   INC   TL
SUBI  B1,Byte1(10000000)    ;-10,000,000
SBCI  B2,Byte2(10000000)
SBCI  B3,Byte3(10000000)
SBCI  B4,0
BRCC  Bin2Asc3
RCALL    PutDecChar
;
LDI    TL,TNP0
Bin2Asc4:   DEC   TL
SUBI  B1,Byte1(-1000000)    ;+1,000,000
SBCI  B2,Byte2(-1000000)
SBCI  B3,Byte3(-1000000)
BRCS  Bin2Asc4
RCALL    PutDecChar
;
LDI    TL,M1P0
Bin2Asc5:   INC   TL
SUBI  B1,Byte1(100000)      ;-100,000
SBCI  B2,Byte2(100000)
SBCI  B3,Byte3(100000)
BRCC  Bin2Asc5
RCALL    PutDecChar
;
LDI    TL,TNP0
Bin2Asc6:   DEC   TL
SUBI  B1,Byte1(-10000)        ;+10,000
SBCI  B2,Byte2(-10000)
SBCI  B3,Byte3(-10000)
BRCS  Bin2Asc6
RCALL    PutDecChar
;
LDI    TL,M1P0
Bin2Asc7:   INC   TL
SUBI  B1,Byte1(1000)          ;-1000
SBCI  B2,Byte2(1000)
BRCC  Bin2Asc7
RCALL    PutDecChar
;
LDI    TL,TNP0
Bin2Asc8:   DEC   TL
SUBI  B1,Byte1(-100)          ;+100
SBCI  B2,Byte2(-100)
BRCS  Bin2Asc8
RCALL    PutDecChar
;
LDI    TL,M1P0
Bin2Asc9:   INC   TL
SUBI  B1,10                 ;-10
BRCC  Bin2Asc9
RCALL    PutDecChar
;
SUBI  B1,MTP0
MOV   TL,B1
RCALL    PutDecChar
;
POPM    TL,TH,B1,B2,B3,B4    ;Or whatever....
RET

;*******************************************************************************
;    Output Helper Routine for Bin32ToASC function, TL=Char TH=Width counter
;    Implements Output Width and Override, and Leading Zero Suppression
;    Caller of Bin32ToASC sets Width in TH, uses SET/CLT to turn LZS On/Off
;    If width is set too low, masked non-zero digits are shown anyway
;-------------------------------------------------------------------------------
PutDecChar  DEC    TH             ;Count Digits
BRPL   PutDecCharC    ;Time to start showing them?
BRTC   PutDecCharS    ;Yes, Is Zero Suppression Enabled?
; Two reasons for being here: Not showing due to:      Width  | LZS
PutDecCharC CPI    TL,'0'         ;Non-zero masked by width?  | Still in zeros?
BREQ   PutDecCharX    ;Yes: show anyway           | No: start showing
CLT                   ;Force suppression off      | End of LZS
PutDecCharS RCALL  PutByte        ;Show output char
PutDecCharX RET

A few more lines could make the characters suppressed due to width appear as spaces, thus giving right-justification.
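The right-justification idea can be sketched in C, working on the ten ASCII digits after conversion rather than inside the output callback. This is an illustration only: the function name `right_justify` and its buffer-based interface are assumptions, not part of the assembler routine above.

```c
#include <stdint.h>

/* Copy ten ASCII digits (most significant first) into a right-justified
   field of `width` characters. Leading zeros become spaces, and a field
   too narrow for the number grows to fit. Returns the number of
   characters written; out needs room for 11 bytes. */
static uint8_t right_justify(const char digits[10], uint8_t width, char *out)
{
    uint8_t first = 0;
    while (first < 9 && digits[first] == '0')
        first++;                        /* index of first significant digit */

    uint8_t start = (width >= 10) ? 0 : (uint8_t)(10 - width);
    if (first < start)
        start = first;                  /* too narrow: show all digits anyway */

    uint8_t n = 0;
    for (uint8_t i = start; i < 10; i++)
        out[n++] = (i < first) ? ' ' : digits[i];
    out[n] = '\0';
    return n;
}
```

With the digits `"0000001234"`, a width of 6 produces two spaces followed by `1234`, and a width of 2 still produces all of `1234`, matching the "masked non-zero digits are shown anyway" behaviour described in the helper's header.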