Optimizing libc integer conversion routines


Preamble. A few of us started discussing the optimization of libc integer conversion routines (utoa, itoa, ltoa…) in another thread. All in all, we hijacked that thread, and it became difficult for everyone to follow when it eventually moved back to the original issue (kudos to the original poster for not bitching about it!). So we've decided to keep going in this new dedicated thread…

 

It all started with a cycle count for ltoa(1 000 000, buff, 10): ~2350, which seemed a bit much to several of us.

Here are other typical benchmarks straight from the libc manual:

Function              Units        avr2       avr25      avr4
itoa (12345, s, 10)   Flash bytes  110 (110)  102 (102)  102 (102)
                      Stack bytes  2          2          2
                      MCU clocks   879        875        875
ltoa (12345L, s, 10)  Flash bytes  134 (134)  126 (126)  126 (126)
                      Stack bytes  2          2          2
                      MCU clocks   1597       1593       1593

Note that the ltoa figures here are probably flattering, because ltoa is tested with a value (12345) that fits in 16 bits.

 

Those results are not completely satisfactory, and they have led many experienced freaks to brew their own versions of these routines, which defeats the routines' original purpose. Yet the stdlib code itself is most likely not to blame. The culprit is rather a generic legacy interface that supports, and treats equally, all radices from 2 to 36, whereas in most real-world use cases some radices (10 in particular) are immensely more common than others.

 

Fortunately, the current implementation already leaves room for dedicated implementations of privileged radices:

extern __inline__ __ATTR_GNU_INLINE__
char *utoa (unsigned int __val, char *__s, int __radix)
{
    if (!__builtin_constant_p (__radix)) {
	extern char *__utoa (unsigned int, char *, int);
	return __utoa (__val, __s, __radix);
    } else if (__radix < 2 || __radix > 36) {
	*__s = 0;
	return __s;
    } else {
	extern char *__utoa_ncheck (unsigned int, char *, unsigned char);
	return __utoa_ncheck (__val, __s, __radix);
    }
}

As I understand it, the purpose of this thread is to develop well-tested, optimized alternatives for such radices (starting with 10) and propose them for upstream integration. If that eventually fails, the thread would still provide de facto alternatives for anyone who needs tight and/or fast code for these conversions.

ɴᴇᴛɪᴢᴇᴎ

Last Edited: Sun. Sep 18, 2016 - 10:11 PM

Ok, so let's also link to this important previous thread. It contains a specialized utoa by Peter Dannegger that takes 20-170 cycles. It also contains one by Jens Norgaard-Larsen that takes a roughly constant 60-odd cycles but omits the final BCD-to-ASCII conversion, so it is not a complete utoa. Both are for base 10.


 

First, let's find out which format it should support.

If this is for use from C, the output should be written with an ST Y+ instruction (both routines linked in #2 output to registers).

It should also use as few registers as possible.

My biggest problem is that I'm busy right now.

Somewhere I should have a link to a helper routine for extending my fast code to 32 bits (it's a 32-bit division by 10000).
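Here is a minimal C sketch of the 32-bit extension idea. It is my own illustration, not sparrow2's code. It assumes the value is split into base-10000 chunks so that each chunk fits a 16-bit converter. A single division is not enough, because 4294967295 / 10000 = 429496 does not fit in 16 bits. Two divisions by 10000 give chunks of 0..42, 0..9999 and 0..9999. `put4` and `ultoa10_chunks` are hypothetical names. Plain `/` and `%` stand in for the fast 16-bit routine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Writes exactly four zero-padded decimal digits of v (0..9999).
   Hypothetical stand-in for a fast 16-bit routine. */
static char *put4(char *p, uint16_t v)
{
    p[3] = (char)('0' + v % 10); v /= 10;
    p[2] = (char)('0' + v % 10); v /= 10;
    p[1] = (char)('0' + v % 10); v /= 10;
    p[0] = (char)('0' + v);
    return p + 4;
}

/* Hypothetical ultoa10_chunks: two 32-bit divisions by 10000 split val
   into 16-bit chunks. buf needs at least 11 bytes. */
char *ultoa10_chunks(uint32_t val, char *buf)
{
    uint16_t lo = (uint16_t)(val % 10000u);
    val /= 10000u;
    uint16_t mid = (uint16_t)(val % 10000u);
    uint16_t hi = (uint16_t)(val / 10000u);    /* 0..42 */

    char tmp[13];                              /* 12 digits + '\0' */
    char *p = put4(tmp, hi);
    p = put4(p, mid);
    p = put4(p, lo);
    *p = '\0';

    const char *s = tmp;                       /* drop leading zeros, keep one digit */
    while (s[0] == '0' && s[1] != '\0')
        s++;
    return strcpy(buf, s);
}
```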

Last Edited: Mon. Sep 19, 2016 - 06:41 AM

I guess I should also add that:

Because a 16-bit NEG only takes 3 clocks on an AVR, signed values should use the same routine. For a negative value, either output '-' directly, or remember the sign and negate the number.
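The negate-then-convert idea can be sketched in C. `utoa10_ref` is a hypothetical stand-in for whatever fast unsigned routine comes out of this thread. The signed wrapper writes '-' up front and negates in unsigned arithmetic. This also handles -32768, whose magnitude does not fit in an int16_t.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a fast unsigned base-10 routine. */
static char *utoa10_ref(uint16_t v, char *s)
{
    char tmp[5];
    int n = 0;
    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    char *p = s;
    while (n > 0)
        *p++ = tmp[--n];
    *p = '\0';
    return s;
}

/* Signed front end: emit '-' and negate as unsigned (no signed overflow),
   then reuse the unsigned routine for the magnitude. */
char *itoa10_via_utoa(int16_t v, char *s)
{
    uint16_t u = (uint16_t)v;
    if (v < 0) {
        *s = '-';
        utoa10_ref((uint16_t)(0u - u), s + 1);
        return s;
    }
    return utoa10_ref(u, s);
}
```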


Ok, I wrote an optimized version of uint_16 to BCD, it takes 49 cycles:

 

;             +--------------------------+
;             | uint16_t to 5 byte BCD   |
;             | by El Tangas @AVRFreaks  |
;             |          2016            |
;             +--------------------------+
;
; input: little endian uint16_t in R17:R16
; output: little endian unpacked BCD in R24:R23:R22:R21:R20
; changes: R17,R18,R19,R0,R1

;needed constants - 16777.216 = 1/1000 * 256 * 65536
	ldi	r18, low(16777)
	ldi	r19, high(16777)

	clr	r24		;zero register for arithmetic operations

;16x16 mul input * 16777 = (input/1000) << 24
	mul	r16, r18	;Low * Low
	movw	r20, r0
	mul	r17, r19	;High * High
	movw	r22, r0

	mul	r16, r19	;Low * High
	add	r21, r0
	adc	r22, r1
	adc	r23, r24

	mul	r17, r18	;High * Low
	add	r21, r0
	adc	r22, r1
	adc	r23, r24

;divide original value by 4 (only more significant byte is needed) and add
;in other words, add input * 0.25 to obtain a better approximation (input * 16777.25)
	lsr	r17
	lsr	r17
	subi	r17, -2		;add 2 to correct rounding errors (need to always round up)
	add	r21, r17
	adc	r22, r24	;r22:r21 contains rounded remainder, r20 discarded
	adc	r23, r24	;r23 contains 10000s and 1000s (0-65)

;16x8 mul r22:r21 by 10
	ldi	r19, 10		;load constant 10
	mul	r21, r19
	movw	r20, r0
	mul	r22, r19
	add	r21, r0
	adc	r1, r24
	mov	r22, r1		;r22 contains 100s
	inc	r21		;r21 contains rounded remainder, r20 discarded

;8x8 mul r21 by 10
	mul	r21, r19	;multiply remainder by 10
	movw	r20, r0		;r21 contains 10s, r20 remainder

;8x8 mul r20 by 10
	mul	r20, r19	;multiply remainder by 10
	mov	r20, r1		;r20 contains units

;extract 2 high digits from r23 (does nothing if initial value is in 0-9999 range)
;this algo is only valid for 0-68 range due to sloppy correction but it's enough
;divide by 10 for first digit
	ldi	r18, 26		;load constant 26 = 1/10 * 256 rounded up
	mul	r23, r18	;equivalent to (div by 10) << 8
	mov	r24, r1
	sub	r0, r1		;correct the remainder (equivalent to multiply by 25.9)
;multiply remainder by 10 for second digit
	mul	r0, r19
	mov	r23, r1

The last block doesn't do anything useful if the input is < 10000, so for those particular cases it would take 41 cycles.
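The last block's divide-by-10 trick can be modelled in C to check the "only valid for 0-68" comment. Multiplying q by 26 (256/10 rounded up) puts q/10 in the high byte. Subtracting that high byte from the low byte corrects the fraction before the final multiply by 10. `split_q` and `first_failure` are hypothetical names. By my reading of the arithmetic, the first failing input should be 69, because 69 * 26 = 1794 gives a high byte of 7.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the last asm block: q * 26 puts q / 10 in the high byte; the low
   byte minus the high byte approximates (q % 10) / 10 * 256. */
static void split_q(uint8_t q, uint8_t *tens, uint8_t *units)
{
    uint16_t h = (uint16_t)q * 26u;               /* mul  r23, r18 */
    uint8_t hi = (uint8_t)(h >> 8);               /* r1: tens digit */
    uint8_t frac = (uint8_t)((uint8_t)h - hi);    /* sub  r0, r1 */
    *tens = hi;
    *units = (uint8_t)(((uint16_t)frac * 10u) >> 8);  /* mul r0, r19 -> r1 */
}

/* Returns the smallest q for which the trick gives a wrong digit, or -1. */
static int first_failure(void)
{
    for (int q = 0; q < 256; q++) {
        uint8_t t, u;
        split_q((uint8_t)q, &t, &u);
        if (t != q / 10 || u != q % 10)
            return q;
    }
    return -1;
}
```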

 

The algorithm is as follows:

1) Divide by 1000 to obtain a number between 0-65 (thousands) and a remainder

2) Remainder is multiplied by 10, 3 consecutive times to obtain hundreds, tens and units.

3) The thousands value (0-65) is divided by 10 to obtain 10,000s

4) The remainder is multiplied by 10 to obtain the thousands.

 

I tested it and there seem to be no rounding errors, but if someone wants to confirm, I'd be grateful.
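For anyone who wants to confirm on a PC, below is a C model of the routine. It assumes the model mirrors the assembly register for register. The constant 16777 approximates 2^24/1000 = 16777.216. The ~x/4 term plus a bias of 2 on byte 1 makes the estimate always round up. The fractional bytes are then multiplied by 10 repeatedly. `utobcd_model` and `utobcd_mismatches` are hypothetical names. The loop checks all 65536 inputs against plain division.

```c
#include <assert.h>
#include <stdint.h>

/* d[0] = units ... d[4] = ten-thousands (R20..R24 in the listing above). */
void utobcd_model(uint16_t x, uint8_t d[5])
{
    uint32_t p = (uint32_t)x * 16777u;            /* ~ (x / 1000) << 24, a bit low */
    p += ((uint32_t)((x >> 8) >> 2) + 2u) << 8;   /* + ~x/4 + bias, at byte 1 */
    uint8_t  q = (uint8_t)(p >> 24);              /* x / 1000, 0..65 */
    uint16_t f = (uint16_t)(p >> 8);              /* remainder as 0.16 fraction */

    uint32_t t = (uint32_t)f * 10u;               /* 16x8 mul by 10 */
    d[2] = (uint8_t)(t >> 16);                    /* hundreds */
    uint8_t r = (uint8_t)((t >> 8) + 1u);         /* 0.8 fraction, inc r19 */

    uint16_t s = (uint16_t)r * 10u;               /* 8x8 mul by 10 */
    d[1] = (uint8_t)(s >> 8);                     /* tens */
    d[0] = (uint8_t)(((uint16_t)(s & 0xFFu) * 10u) >> 8);  /* units */

    uint16_t h = (uint16_t)q * 26u;               /* 26 ~ 256/10 rounded up */
    d[4] = (uint8_t)(h >> 8);                     /* ten-thousands */
    uint8_t frac = (uint8_t)((h & 0xFFu) - d[4]); /* sub r0, r1 */
    d[3] = (uint8_t)(((uint16_t)frac * 10u) >> 8);/* thousands */
}

/* Exhaustive check against plain division; returns the number of bad inputs. */
unsigned utobcd_mismatches(void)
{
    unsigned bad = 0;
    for (uint32_t x = 0; x <= 0xFFFFu; x++) {
        uint8_t d[5];
        utobcd_model((uint16_t)x, d);
        if (d[4] != x / 10000 || d[3] != x / 1000 % 10 || d[2] != x / 100 % 10 ||
            d[1] != x / 10 % 10 || d[0] != x % 10)
            bad++;
    }
    return bad;
}
```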


Nice job

 

I haven't checked all numbers, but it seems to be correct.


I wrote an optimized version of uint_16 to BCD

I don't see a label to call the routine or a ret.

John Samperi

Ampertronics Pty. Ltd.

https://www.ampertronics.com.au

* Electronic Design * Custom Products * Contract Assembly


Yeah, I like to keep my options open. Maybe it's a macro, maybe it's inline, maybe add the entry point and ret and it's a function.

Truth is, I don't even know what the avr-gcc calling conventions are; this is just an embodiment of the algorithm, the rest are details.


I was planning to make a string version of my code, but after seeing yours I don't want to spend more time on that.

For the general code, it should output '0'-'9' with ST Y+,Rn, so that the digits come out in the correct order. It should also try to reuse registers so the footprint gets small.

Then I guess Bob should look at the needed formatting, or give input on it.

Since a 16-bit NEG is only 3 clocks, making it signed is not a problem. The open question is where to put the '-', etc.


I've rewritten the code to be callable from C. It's a function (utobcd) that takes a uint16_t argument and returns the base 10 representation as unpacked BCD in the lower 5 bytes of a uint64_t (the 3 high bytes are undefined). I think it will be much more useful like this; maybe eventually the thread objective will actually be reached.

 

;             +--------------------------+
;             | uint16_t to 5 byte BCD   |
;             | by El Tangas @AVRFreaks  |
;             |          2016            |
;             +--------------------------+
;
; input: little endian uint16_t in R25:R24
; output: little endian unpacked BCD in R22:R21:R20:R19:R18
;         accessible from C as 5 lower bytes of an uint64_t
; changes: R18-R25 (the uint64_t output) and R0

.global utobcd
utobcd:
;needed constants - 16777.216 = 1/1000 * 256 * 65536
	ldi	r22, (16777)&0xFF	;Low
	ldi	r23, (16777)>>8		;High

;16x16 mul input * 16777 = (input/1000) << 24
	mul	r24, r22	;Low * Low
	movw	r18, r0
	mul	r25, r23	;High * High
	movw	r20, r0

	mul	r25, r22	;High * Low
	clr	r22		;zero register for arithmetic operations
	add	r19, r0
	adc	r20, r1
	adc	r21, r22

	mul	r24, r23	;Low * High
	add	r19, r0
	adc	r20, r1
	adc	r21, r22

;divide original value by 4 (only more significant byte is needed) and add
;in other words, add input * 0.25 to obtain a better approximation (input * 16777.25)
	lsr	r25
	lsr	r25
	subi	r25, -2		;add 2 to correct rounding errors (need to always round up)
	add	r19, r25
	adc	r20, r22	;r20:r19 contains rounded remainder, r18 discarded
	adc	r21, r22	;r21 contains 10000s and 1000s (0-65)

;16x8 mul r20:r19 by 10
	ldi	r24, 10		;load constant 10
	mul	r19, r24
	movw	r18, r0
	mul	r20, r24
	add	r19, r0
	adc	r1, r22
	mov	r20, r1		;r20 contains 100s
	inc	r19		;r19 contains rounded remainder, r18 discarded

;8x8 mul r19 by 10
	mul	r19, r24	;multiply remainder by 10
	movw	r18, r0		;r19 contains 10s, r18 remainder

;8x8 mul r18 by 10
	mul	r18, r24	;multiply remainder by 10
	mov	r18, r1		;r18 contains units

;extract 2 high digits from r21 (does nothing if initial value is in 0-9999 range)
;this algo is only valid for 0-68 range due to sloppy correction but it's enough
;divide by 10 for first digit
	ldi	r25, 26		;load constant 26 = 1/10 * 256 rounded up
	mul	r21, r25	;equivalent to (div by 10) << 8
	mov	r22, r1
	sub	r0, r1		;correct the remainder (equivalent to multiply by 25.9)
;multiply remainder by 10 for second digit
	mul	r0, r24
	mov	r21, r1

	clr	r1		;restore zero_reg for avr-gcc compatibility
	ret

 

 

As a usage example, this is the test code I wrote to validate this function (Arduino code):

 

#include <avr/io.h>
#include <stdint.h>

extern "C" uint64_t utobcd(uint16_t);

int main() {
	uint16_t counter = 0;
	uint64_t result;
 	do {
		result = utobcd(counter);
		if (((result & 0xFF00000000) >> 32) != (counter / 10000)) break;
		if (((result & 0xFF000000) >> 24) != ((counter % 10000) / 1000)) break;
		if (((result & 0xFF0000) >> 16) != ((counter % 1000) / 100)) break;
		if (((result & 0xFF00) >> 8) != ((counter % 100) / 10)) break;
		if ((result & 0xFF) != (counter % 10)) break;
		counter++;
	} while(counter != 0); //repeat until counter overflows

	//if test successful, light up LED
	if (counter == 0) {
		DDRB |= (1<<5);
		PORTB |= (1<<5);
	}
}

It compares the output of the utobcd function with equivalent calculations made in C. If all 65536 values match, it lights the Arduino built-in LED. The LED would also light if the very first iteration failed, but that's not gonna happen. Result: the LED lights up after ~4 s.

 

edit: changed "5 lower bits" to "5 lower bytes" in code description (thanks for spotting this, Simonetta)

edit 2: initialized 'counter' to zero (thanks, theusch)

Last Edited: Thu. Sep 29, 2016 - 03:32 PM

I thought this was all about replacing avr-libc itoa(int value,  char * buff,  int radix) with a faster version? A replacement would need to slot in and behave identically so it didn't change (apart from speed) any existing use of itoa(). 


And that is why I wrote:

For the general code it should output with '0'-'9' with ST Y+,rnn, so it should output in correct order. (and then try to reuse reg. so the footprint get's small)

 

 


Sure, that is the final goal of the thread. But we don't need to get there in one single leap (I think? This is not "work", right?). I for one have just a very basic understanding of interfacing C and assembly, I'm using this thread to learn that.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Which output do we expect? For example, should 123 be output as:

"123"

"00123"

"  123"

"123  "

 

And for signed values, where should the '-' go,

or is that a pre- and post-processing job?

 


sparrow2 wrote:

which output do we expect like 123 should output:

"123"

"00123"

"  123"

"123  "

 

And for signed where to add the '-'

 

or is that a pre and post job? 

 

https://www.avrfreaks.net/comment...

But according to netizen that is only "a few cycles"...

netizen wrote:

... a few cycles of post-processing. ...

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


I wonder if an optimized routine for selected bases could be written in C, and thus benefit from instructions only available on newer versions of the AVR8 architecture? I'm not sure there is anything really helpful in there, but a lot of the instruction timings are different on XMEGA for example, so... What is the baseline for cycle counting, MEGA AVRs?

 

There is also the question of unrolling loops and the like. Size vs. speed trade off. Seems like base 16 is going to be easy, maybe start there.

 

Why do humans write numbers backwards? If only we had had the foresight to write them little endian we could save a few cycles!


clawson wrote:
I thought this was all about replacing avr-libc itoa(int value, char * buff, int radix) with a faster version? A replacement would need to slot in and behave identically so it didn't change (apart from speed) any existing use of itoa().

Indeed, eventually that's my goal, but this doesn't leave aside intermediary implementations, proof-of-concept code, or different individual goals. Contributions like El Tangas' might actually end up being the best this thread eventually has to offer…

 

sparrow2 wrote:
which output do we expect like 123 should output:

"123"

"00123"

" 123"

"123 "

And for signed where to add the '-' or is that a pre and post job?

libc's generic implementations (of utoa/itoa/ultoa/ltoa) produce a reversed output string, which is then piped through strrev(). Reusing the same strrev should probably be considered innocuous in terms of code size, as every other conversion function relies on it anyway. Thus the order in which an optimized implementation produces its digits isn't paramount, although the end result must read in human order.

Likewise, the minus sign "-" must end up where a human reader expects it.

 

EDIT: There is no concept of right-alignment in these functions, thus no concept of filling char either. That might be considered annoying, but anyway that's how they are in libc. :-)
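For reference, here is a sketch of the generic scheme described above. Repeated division by the radix yields the digits least significant first, and the buffer is reversed at the end. `utoa_generic` and `reverse_in_place` are hypothetical names; strrev itself is not standard C. This is not avr-libc's actual source, and the lowercase letters for digits above 9 are an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local in-place reversal, playing the role of strrev(). */
static char *reverse_in_place(char *s)
{
    size_t n = strlen(s);
    for (size_t i = 0; i < n / 2; i++) {
        char c = s[i];
        s[i] = s[n - 1 - i];
        s[n - 1 - i] = c;
    }
    return s;
}

/* Hypothetical utoa_generic: one division per digit, any radix 2..36. */
char *utoa_generic(unsigned int val, char *s, int radix)
{
    if (radix < 2 || radix > 36) {      /* same early-out as the inline wrapper */
        *s = '\0';
        return s;
    }
    char *p = s;
    do {                                /* least significant digit first */
        unsigned int d = val % (unsigned int)radix;
        *p++ = (char)(d < 10 ? '0' + d : 'a' + (d - 10));
        val /= (unsigned int)radix;
    } while (val != 0);
    *p = '\0';
    return reverse_in_place(s);         /* put digits in human order */
}
```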

ɴᴇᴛɪᴢᴇᴎ

Last Edited: Tue. Sep 27, 2016 - 04:53 PM

netizen wrote:
Usage of the same strrev should probably be considered innocuous in terms of code size, as every other conversion function relies on it anyway.

On the other hand, an implementation of ultoa() (for example) that can piggy-back on utoa() would win bonus points. :-)

ɴᴇᴛɪᴢᴇᴎ

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'm looking into a version for AVRs without MUL (I was thinking of the tinys ;) )

 

And that will be very different in structure, and in the ballpark of 1/2 speed.

 

The clock count holds for any AVR with MUL; the code uses no instructions whose timing differs between mega, xmega and tiny.


FWIW, current libc code like utoa/itoa (resp. ultoa/ltoa) relies on a common __utoa_common (resp. __ultoa_common), which, despite the name, processes a signed input.

ɴᴇᴛɪᴢᴇᴎ

Last Edited: Tue. Sep 27, 2016 - 05:03 PM

netizen wrote:
Usage of the same strrev should probably be considered innocuous in terms of code size, as every other conversion function relies on it anyway.

Put code size aside for the moment, and defend your "few cycles" premise. After all, it appears that your goal is a few dozen cycles. Are you going to add a few dozen more (i.e. 2x) for your post-processing? In the other thread, you denigrated approaches that melded right-justification, leading-zero suppression and similar. Your argument was that the root engine took too many cycles, and that the post-processing only takes a "few".

 

[the seminal thread here, linked to in the other thread, also gave a "best" approach that left the results reversed.]



theusch wrote:
[the seminal thread here, linkied to in the other thread, also gave a "best" approach that left the results reversed.]

Unfortunately, challenging/buggy moderation tools have not allowed us to cleanly split this topic from the original thread it rose from.

If you feel some valuable code has been left aside in the process, please contribute it below…

ɴᴇᴛɪᴢᴇᴎ

Last Edited: Tue. Sep 27, 2016 - 05:19 PM

???  I gave you the link above to your "few cycles" comment, in which reply it stated that the link was in #53 there.  Repeating:

https://www.avrfreaks.net/comment... leads to

https://www.avrfreaks.net/forum/s...

 

And you still have not addressed my repeated queries about the "few cycles" claim, which I re-iterated after sparrow2 mentioned what I thought was related.



There is lots of code in that thread, and you link to two different codes in this comment alone.

Could you be more specific? As in: post some actual code?

 

theusch wrote:
you still have not addressed my repeated queries about the "few cycles" claim,

I'm not sure what those queries are…?

If it's about reversing an intermediary output, that's fine, I covered that in #17. If it's about right-alignment or fill-chars, I've addressed it in the same post.

What part haven't I covered?

 

ɴᴇᴛɪᴢᴇᴎ


netizen wrote:
What part haven't I covered?

See #21.  I think it is quite clear.  You mentioned that strrev doesn't count for code size.  That can be debated, but put that aside.  What about the cycles?  As I quoted you, you claimed a "few" cycles.  And again repeating, as you appear to be getting a few dozen cycles for your work here, doesn't adding that much again for post-processing to make a useful output come out to more than a "few"?

 

How many times and how many ways must I say it?  In your comments in both threads and in your work here, cycles seem quite important.

 

With call and pointer setup and looping and termination check, I'd guess a couple dozen cycles to reverse your five bytes.  Maybe a little less if unrolled and a jump into the series of copies.  Leading-zero suppression?  A few more cycles each loop?

 

"The job isn't over until the paperwork is done."




Digging through my disk, I found the following code, which repeatedly subtracts powers of 10.

It takes fewer than 350 ticks and fewer than 60 bytes of flash.

It's untested; maybe I stopped while playing around with it and turned to a more interesting topic back then ;-)

 

extern void u16_to_string (char*, unsigned);
extern void s16_to_string (char*, signed);

static const __attribute__((__progmem__))
unsigned pows10[] = { 1e4, 1e3, 1e2, 1e1 };

static __inline__ __attribute__((__always_inline__))
void u2str (char *buf, unsigned u)
{
    register char *r26 __asm ("26") = buf;
    register unsigned r24 __asm ("24") = u;
    register const unsigned *r30 __asm ("30") = pows10;
    __asm volatile ("%~call %x[f]"
                    : "+r" (r26), "+r" (r24), "+r" (r30)
                    : [f] "i" (u16_to_string)
                    : "memory", "22", "23");
}

static __inline__ __attribute__((__always_inline__))
void i2str (char *buf, signed i)
{
    register char *r26 __asm ("26") = buf;
    register signed r24 __asm ("24") = i;
    register const unsigned *r30 __asm ("30") = pows10;
    __asm volatile ("%~call %x[f]"
                    : "+r" (r26), "+r" (r24), "+r" (r30)
                    : [f] "i" (s16_to_string)
                    : "memory", "22", "23");
}
__tmp_reg__  = 0
__zero_reg__ = 1

.macro wadi reg, cst
#if defined (__AVR_TINY__)
    subi    \reg+0, lo8 (-(\cst))
    sbci    \reg+1, hi8 (-(\cst))
#else
    adiw    \reg, \cst
#endif
.endm

#if defined (__AVR_HAVE_JMP_CALL__)
#define XJMP    jmp
#define XCALL   call
#else
#define XJMP    rjmp
#define XCALL   rcall
#endif

/*
#undef __AVR_HAVE_LPMX__
#define __AVR_TINY__
*/

#if defined (__AVR_TINY__)
#define __lpm_reg__ R16
#else
#define __lpm_reg__ R0
#endif

;; "+r": Holds power-of-10 as read from pows10[]
#define R_P10_A 22
#define R_P10_B __lpm_reg__

;; "+w": Number to be converted.  This *must* be R24/25
#define R_NUM_A 24
#define R_NUM_B 25

;; "+d": Digit is expanded in this reg
#define R_DIG 23

.text

;; Converts int N to a decimal string in BUF.
;; Length of BUF: at least 7.
;; Return: address of the terminating '\0'.
;; Memory: RAM:    2 bytes dynamic (stack)
;;         Flash: 16 bytes (excluding u16_to_string)
;; Run time: 92...305 ticks (incl. CALL + RET)
;;
;; char* s16_to_string (char *buf, int n)
;;
;; Clobbers: R20--R27, R30--R31

s16_to_string:
	sbrs R_NUM_B, 7
#if defined (__AVR_HAVE_JMP_CALL__)
	rjmp 9f
#else
	rjmp u16_to_string
#endif
	neg  R_NUM_B
	neg  R_NUM_A
	sbci R_NUM_B, 0
	ldi  R_DIG, '-'
	st   X+, R_DIG
9:	XJMP u16_to_string

.global	s16_to_string
.type	s16_to_string, @function
.size	s16_to_string, .-s16_to_string

;; Converts unsigned N to a decimal string in BUF.
;; Length of BUF: at least 6.
;; Return: address of the terminating '\0'.
;; Memory: RAM:    2 bytes dynamic (stack)
;;         Flash: 56 bytes (incl. 8 bytes for pows10[])
;; Run time: 89...317 ticks (incl. CALL + RET)
;;
;; char* u16_to_string (char *buf, unsigned n)
;;
;; Clobbers: R20--R27, R30--R31

u16_to_string:
	clt
0:
#if defined (__AVR_HAVE_LPMX__)
	lpm  R_P10_A, Z+
	lpm  R_P10_B, Z+
#elif defined (__AVR_TINY__)
	ld   R_P10_A, Z+
	ld   R_P10_B, Z+
#else
	lpm
	adiw 30, 1
	mov  R_P10_A, __lpm_reg__
	lpm
	adiw 30, 1
#endif /* case have LPM */

	ldi  R_DIG, '0'
1:
	sub  R_NUM_A, R_P10_A
	sbc  R_NUM_B, R_P10_B
	brlo 2f
	inc  R_DIG
	set
	rjmp 1b
2:
	add  R_NUM_A, R_P10_A
	adc  R_NUM_B, R_P10_B
	brtc 3f
	st   X+, R_DIG
3:
	sbrs R_P10_A, 1
	rjmp 0b

	subi R_NUM_A, -'0'
	st   X+, R_NUM_A
	st   X, __zero_reg__
	ret

.global	u16_to_string
.type	u16_to_string, @function
.size	u16_to_string, .-u16_to_string
/*
.section .progmem.data, "a", @progbits
pows10:
	.word	10000
	.word	1000
	.word	100
	.word	10

.type	pows10, @object
.size	pows10, .-pows10
*/

Caveats:

 

- progmem will need avr-gcc v7+ to work for Tiny cores as expected.

 

- Data in progmem is read without ELPM so that the code will not work if sections are moved freely around (which is not uncommon).  Fixing this will add some more #ifdefs.

 

- Some identifiers clobber namespace, they should be renamed.

 

- Integration into libc has still to be done, all the __builtin_constant_p gaga etc.

 

Have fun!
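Below is a C model of what u16_to_string does, for anyone who wants to check it on a host. For each power of 10, it counts how many times the power fits. The asm subtracts until a borrow occurs, then adds the power back. A flag suppresses leading zeros; it plays the role of the T bit in the assembly. The units digit is always stored. `u16_to_string_c` and `pows10_c` are hypothetical names. Returning the end pointer mimics the assembly leaving X on the terminating '\0'.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static const uint16_t pows10_c[4] = { 10000, 1000, 100, 10 };

/* Hypothetical C model of u16_to_string (repeated subtraction).
   Returns a pointer to the terminating '\0'. */
char *u16_to_string_c(char *buf, uint16_t n)
{
    int started = 0;                   /* T flag: set once a non-zero digit is seen */
    for (int i = 0; i < 4; i++) {
        char dig = '0';
        while (n >= pows10_c[i]) {     /* count how often the power fits */
            n -= pows10_c[i];
            dig++;
            started = 1;
        }
        if (started)                   /* brtc: leading zeros are not stored */
            *buf++ = dig;
    }
    *buf++ = (char)('0' + n);          /* units digit is always stored */
    *buf = '\0';
    return buf;
}
```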

 

avrfreaks does not support Opera. Profile inactive.

Last Edited: Tue. Sep 27, 2016 - 08:42 PM

Please don't forget to change the "lower five bits" in the initial description to "lower five bytes".


SprinterSB wrote:
- Integration into libc has still to be done, all the __builtin_constant_p gaga etc.

Have fun!

Thanks for contributing that code. :-)

I thought we'd just branch statically (constant __radix), as in:

extern __inline__ __ATTR_GNU_INLINE__
char *utoa (unsigned int __val, char *__s, int __radix)
{
    if (!__builtin_constant_p (__radix)) {
	extern char *__utoa (unsigned int, char *, int);
	return __utoa (__val, __s, __radix);
+   } else if (__radix == 10) {
+       extern char *__utoa10_ncheck (unsigned int, char *);
+       return __utoa10_ncheck (__val, __s);
    } else if (__radix < 2 || __radix > 36) {
	*__s = 0;
	return __s;
    } else {
	extern char *__utoa_ncheck (unsigned int, char *, unsigned char);
	return __utoa_ncheck (__val, __s, __radix);
    }
}

Or do you mean we should also branch dynamically, i.e. in __utoa()?

ɴᴇᴛɪᴢᴇᴎ

Last Edited: Wed. Sep 28, 2016 - 09:17 AM

IMO, no. This would add quite some penalty to these functions and introduce a size overhead of about 3-fold compared to the current implementation.

 

avr-libc and avr-gcc's libgcc always followed the optimize-for-size policy, and this proved to be a good and sound approach.

 

Such conversion routines are usually not time critical and are only used in slow display routines, hence hunting for execution ticks is not warranted. It might result in satisfying improvements, but these improvements are likely to be irrelevant for almost all applications. I am also not aware of any user-filed bug that complains about too slow integer-to-ASCII conversions.



The problem is that many people want log data in a readable format, and in those situations speed sometimes matters.

 


Silly question, but if you are logging and speed is of the essence, why not store the binary and worry about human formatting later? (And you get the advantage of packing the log tighter, so you can log more.) In fact, things like human formatting can be left until much later, such as when the data has been delivered to a PC or other destination. If you are doing human formatting on an AVR, it's presumably for immediate display (LCD, UART, etc.).

 

(which kind of brings me back to the first point I made in the first thread on this ;-)


I guess you have never been out making systems for industry; they often insist on a CSV format. Speed doesn't matter for the "main" system, but for the little AVR it does.


Nope, never been involved in loggers but if I were designing one I would pack the binary to get as much data logged into the smallest space possible then just have a post processor to humanize it when someone wants to read the log.


This is not about "loggers", but about logging data on a system that does something else (often while you are trimming the system).


If speed really matters that much, then IMO there is no way around using a radix that is a power of 2, together with a specialized algorithm that goes for speed.

 

In such a situation, I wouldn't rely on the standard library implementation for bleeding-edge performance. I would use some custom lines that do the trick :-)

 

That's no objection against speeding up standard implementations provided this can be done without bloating the code...
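As an illustration of why a power-of-2 radix is cheap, consider hex. The digits can be peeled off most significant first with shifts and masks, so no division and no reversal are needed. This is a sketch with a hypothetical `utoa16_shift`, and lowercase digits are assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical utoa16_shift: no division, digits come out in reading order. */
char *utoa16_shift(uint16_t val, char *s)
{
    static const char hex[] = "0123456789abcdef";
    int shift = 12;
    while (shift > 0 && ((val >> shift) & 0xF) == 0)
        shift -= 4;                    /* skip leading zero nibbles, keep the last one */
    char *p = s;
    for (; shift >= 0; shift -= 4)
        *p++ = hex[(val >> shift) & 0xF];
    *p = '\0';
    return s;
}
```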


Last Edited: Wed. Sep 28, 2016 - 02:27 PM

You can't expect a non-programmer out in industry to read the temperature as F8 F9 FA...; he wants human-readable numbers. With a terminal program, he can log the data and show it in Excel etc. (and that is the way he has done it for the last 20+ years).


You can do that, of course, and it will work as smooth as in the last 20 years.  I am not getting the point that you are trying to make...



sparrow2 wrote:
You can't expect that a none programmer out in the industry have to look at the temperature as F8 F9 FA ..... he want a human numbers, and with a terminal(program), can log the data, and show them in excel etc. (And that is the way he has done it the last 20+ years )
So the program he uses to extract the data from the micro also does a binary->human conversion at that time. What am I missing?


I give up.

Over and out (about why speed matters).


I've added leading zero elimination and ASCII conversion to my routine, now it should be compatible with utoa, with hardcoded base 10.

 

;             +--------------------------+
;             |    uint16_t to ASCIIZ    |
;             | by El Tangas @AVRFreaks  |
;             |          2016            |
;             +--------------------------+
;
; input: little endian uint16_t in R25:R24
;        char* in R23:R22
; output: returns the input char*, that now points to an ASCIIZ string
;         containing a decimal representation of the input uint16_t
; changes: R18-R25 and R0

.global __utoa10
__utoa10:
	movw	r30, r22
;needed constants - 16777.216 = 1/1000 * 256 * 65536
	ldi	r22, (16777)&0xFF	;Low
	ldi	r23, (16777)>>8		;High

;16x16 mul input * 16777 = (input/1000) << 24
	mul	r24, r22	;Low * Low
	movw	r18, r0
	mul	r25, r23	;High * High
	movw	r20, r0

	mul	r25, r22	;High * Low
	clr	r22			;zero register for arithmetic operations
	add	r19, r0
	adc	r20, r1
	adc	r21, r22

	mul	r24, r23	;Low * High
	add	r19, r0
	adc	r20, r1
	adc	r21, r22

;divide original value by 4 (only more significant byte is needed) and add
;in other words, add input * 0.25 to obtain a better approximation (input * 16777.25)
	lsr	r25
	lsr	r25
	subi	r25, -2		;add 2 to correct rounding errors (need to always round up)
	add	r19, r25
	adc	r20, r22	;r20:r19 contains rounded remainder, r18 discarded
	adc	r21, r22	;r21 contains 10000s and 1000s (0-65)

;16x8 mul r20:r19 by 10
	ldi	r24, 10		;load constant 10
	mul	r19, r24
	movw	r18, r0
	mul	r20, r24
	add	r19, r0
	adc	r1, r22
	mov	r20, r1		;r20 contains 100s
	inc	r19			;r19 contains rounded remainder, r18 discarded

;8x8 mul r19 by 10
	mul	r19, r24	;multiply remainder by 10
	movw	r18, r0		;r19 contains 10s, r18 remainder

;8x8 mul r18 by 10
	mul	r18, r24	;multiply remainder by 10
	mov	r18, r1		;r18 contains units

;extract 2 high digits from r21 (does nothing if initial value is in 0-9999 range)
;this algo is only valid for 0-68 range due to sloppy correction but it's enough
;divide by 10 for first digit
	ldi	r25, 26		;load constant 26 = 1/10 * 256 rounded up
	mul	r21, r25	;equivalent to (div by 10) << 8
	mov	r22, r1
	sub	r0, r1		;correct the remainder (equivalent to multiply by 25.9)
;multiply remainder by 10 for second digit
	mul	r0, r24
	mov	r21, r1

	clr	r1				;restore zero_reg for avr-gcc compatibility
	movw	r24, r30	;setup return char*
	ldi	r23, '0'	;constant for ASCII conversion

;leading zero elimination
	cpse	r22, r1		;skip 10000s digit if zero
	rjmp	digit_10000
	cpse	r21, r1		;skip thousands digit if zero
	rjmp	digit_1000
	cpse	r20, r1		;skip hundreds digit if zero
	rjmp	digit_100
	cpse	r19, r1		;skip tens digit if zero
	rjmp	digit_10
	rjmp	digit_1		;units cannot be skipped
;convert to ASCII and store string
digit_10000:
	add	r22, r23
	st	z+, r22
digit_1000:
	add	r21, r23
	st	z+, r21
digit_100:
	add	r20, r23
	st	z+, r20
digit_10:
	add	r19, r23
	st	z+, r19
digit_1:
	add	r18, r23
	st	z+, r18
;add \0 string terminator
	st	z, r1
	ret

 

The code size, 126 bytes, is now larger than the original utoa. Cycle count is 76/78/79/80/81 for 1-5 digits (including call and ret instructions).
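The cpse chain plus fall-through labels above is essentially a computed entry point into an unrolled store sequence. In C, the same shape is a switch with fall-through cases. This sketch assumes the digits arrive as an unpacked BCD array with d[0] = units, in the same order as r18..r22. `bcd5_to_ascii` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* d[0] = units ... d[4] = ten-thousands (r18..r22 in the routine above). */
char *bcd5_to_ascii(const uint8_t d[5], char *s)
{
    char *p = s;
    int first = 4;
    while (first > 0 && d[first] == 0)   /* the cpse chain */
        first--;
    switch (first) {                     /* jump to the first non-zero digit */
    case 4: *p++ = (char)('0' + d[4]);   /* fall through */
    case 3: *p++ = (char)('0' + d[3]);   /* fall through */
    case 2: *p++ = (char)('0' + d[2]);   /* fall through */
    case 1: *p++ = (char)('0' + d[1]);   /* fall through */
    default: *p++ = (char)('0' + d[0]);  /* units cannot be skipped */
    }
    *p = '\0';
    return s;
}
```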

 

Tested in Arduino with this code:

 

#include <avr/io.h>
#include <stdint.h>
#include <stdlib.h>

extern "C" char* __utoa10(uint16_t, char*);

int main() {
	uint16_t counter = 0;
	char result[6];
 	do {
		if (((uint16_t) atol(__utoa10(counter, result))) != counter) break;
		counter++;
	} while(counter != 0); //repeat until counter overflows

	//if test sucessful, light up LED
	if (counter == 0) {
		DDRB |= (1<<5);
		PORTB |= (1<<5);
	}
}

edit: initialized counter to zero.

Last Edited: Thu. Sep 29, 2016 - 03:26 PM

El Tangas wrote:
Tested in Arduino with this code:

No warning about "value of 'counter' used before initialized" or similar?

~80 clocks. Nice.

As you have a separate output "pass", it would be straightforward (nearly, but not quite, trivial) to produce right-justified output in a field of n bytes, and to insert an implied decimal point, e.g. millivolts input to a volts display.

Left as an exercise for the reader: take millivolts and output e.g. 4.56 volts. (In practice this combines field width, the zero-suppression flag, knowing when to stop [don't output the millivolts 1's digit], and making sure to get 0.12 and not " .12".) Maybe another 20 cycles or so, but ~100 cycles for "full featured" is nice.
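A minimal sketch of that exercise in plain C, assuming a uint16_t millivolt input and a fixed 5-character, right-justified field. mv_to_volts is a hypothetical name, and it uses / and % for clarity rather than the multiply tricks above, so it says nothing about cycle counts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>   /* strcmp() for comparing results */

/* Hypothetical helper: format a reading in millivolts as volts with two
   decimals, right-justified in a 5-character field ("65.53" is the widest
   a uint16_t can give). The millivolts 1's digit is dropped (truncated,
   not rounded). buf must hold at least 6 bytes. */
static char *mv_to_volts(uint16_t mv, char *buf)
{
    char d[5];

    /* First get all five digits, most significant first. */
    for (int i = 4; i >= 0; i--) {
        d[i] = (char)('0' + mv % 10);
        mv /= 10;
    }

    /* Separate output pass. */
    buf[0] = (d[0] == '0') ? ' ' : d[0]; /* zero-suppress the 10s of volts */
    buf[1] = d[1];                       /* never suppress: 120 mV -> " 0.12" */
    buf[2] = '.';                        /* implied decimal point */
    buf[3] = d[2];
    buf[4] = d[3];                       /* stop here: d[4] (1 mV) is not output */
    buf[5] = '\0';
    return buf;
}
```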

The fact remains that these efforts are only suited to Megas (the code relies on the hardware MUL instruction).

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Last Edited: Thu. Sep 29, 2016 - 02:52 PM

theusch wrote:
Still remains that these efforts are only suited to Megas.
And does it meet SprinterSB's other constraint (really that of the AVR-LibC developers in general) that the code should not grow?


Yeah, I totally forgot to initialize that variable; I'll correct it.

@clawson:

The ASCII conversion/leading-zero suppression code made it grow larger than the original code. It's a trade-off between speed and size, I suppose...

I read the previous thread and concluded that size optimization should use the successive subtraction technique, based on this code:

https://www.avrfreaks.net/comment/440976#comment-440976

Last Edited: Thu. Sep 29, 2016 - 03:47 PM

El Tangas wrote:
I read the previous thread and concluded that size optimization should use the successive subtraction technique, ...

As mentioned in this thread and the previous one, this has been discussed often.

For base 10, AFAIK, a recursive routine is the smallest, at the expense of cycles and stack space. In the "seminal" thread, Bob G. posted a darned small implementation. I forget the approach.
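Bob G.'s code isn't reproduced here, so this is only a generic sketch of the recursive idea in C (put_dec and utoa10_rec are hypothetical names): the recursion unwinds most significant digit first, so no reversal or leading-zero logic is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>   /* strcmp() for comparing results */

/* Recursive base-10 conversion: recurse on v / 10 first so the most
   significant digit is written first, then append this level's digit.
   Returns a pointer just past the last digit written. */
static char *put_dec(uint16_t v, char *s)
{
    if (v >= 10)
        s = put_dec(v / 10, s);
    *s++ = (char)('0' + v % 10);
    return s;
}

/* Hypothetical utoa-style wrapper: terminates the string, returns buf. */
static char *utoa10_rec(uint16_t v, char *buf)
{
    *put_dec(v, buf) = '\0';
    return buf;
}
```

The trade-off mentioned above is visible: every digit costs a division by 10 and a stack frame (five levels deep for 65535).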


Is this "safe" code?

I mean, don't you need to push and pop the registers you use (other than those the compiler uses for input and output)?


I'm following these guidelines (the avr-gcc register usage conventions), so I expect the code to be "safe". According to these rules, I can change r18-r27, r30-r31 and r0. I can also use r1, but it must be restored to zero on exit.


OK.

Then I'm starting to understand why compiled code is so slow (so for real speed it has to be inline).

Addendum:

Or do the same rules apply to inline asm (i.e. that local variables can't be kept in those high registers)?

I would have expected a function to tell its caller which registers it changes, or to save the ones it uses.

Last Edited: Fri. Sep 30, 2016 - 09:17 AM

sparrow2 wrote:
is that the same rules for inline, (that local vars can't be in those high reg. ?

With inline asm, you shouldn't clobber fixed registers; instead, let the compiler allocate registers that fit your specs (the operand constraints).

sparrow2 wrote:
I would have expected that a function would tell caller which reg that change, or store those it use

If you read El Tangas' last link again, you'll get that information.

Some registers are call-used (a subroutine might clobber them), so the caller needs to save them before calling a subroutine if it needs them afterwards.

Some registers are call-saved: it is the subroutine's responsibility to save and restore them if it clobbers them.

ɴᴇᴛɪᴢᴇᴎ


If you read El Tangas' last link again, you'll get that information.

Some registers are call-used (a subroutine might clobber them), so the caller needs to save them before calling a subroutine if it needs them afterwards.

Some registers are call-saved: it is the subroutine's responsibility to save and restore them if it clobbers them.

I got that part, I think, and if I read it correctly, the compiler can't keep a local int in r19:r18 across a function call (even if the called function doesn't use those registers for anything),

but at least you're telling me that if it's inline it will work.


avr-gcc inline assembly can use R0..R31 in any way one wants.

The compiler has to be informed.

gcc inline assembly has the syntax and semantics to connect registers with C variables.

That is what makes gcc inline assembly so much more powerful than others.
Moderation in all things. -- ancient proverb


skeeve wrote:

avr-gcc inline assembly can use R0..R31 in any way one wants.

The compiler has to be informed.

gcc inline assembly has the syntax and semantics to connect registers with C variables.

That is what makes gcc inline assembly so much more powerful than others.
So, do you think it would be better if avr-libc were rewritten in inline assembly?


Quite a lot of it already is ;-) 

Do bear in mind, though, the requirement to make the best use of 17 different architectures (use MUL when you can, etc.). Handling those variants may be easier in plain asm with conditional sections. Trying to handle that as well in the already fraught inline syntax could be "fun". ;-)


Wouldn't it be easier to just learn to count in hex and then re-define ASCII so that the codes for ABCDEF came right after 0123456789? Ah, wait, no, someone would complain that they want lowercase, forget that.
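The joke has a real point: in ASCII, 'A' (0x41) does not follow '9' (0x39), so hex output needs an extra correction of 7 (or 39 for lowercase). A minimal sketch; hex_digit and its lowercase flag are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Convert the low nibble of n to a hex character. ':' (0x3A) through
   '@' (0x40) sit between '9' and 'A', hence the correction step. */
static char hex_digit(uint8_t n, int lowercase)
{
    char c = (char)('0' + (n & 0x0F));
    if (c > '9')   /* skip the gap: +7 for 'A'-'F', +39 for 'a'-'f' */
        c = (char)(c + (lowercase ? 'a' - '9' - 1 : 'A' - '9' - 1));
    return c;
}
```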


People who use lower case in hex give me the willies! OK, I suppose 0xDEADBEEF and 0xdeadbeef both work but when you have something like 0xACE81ADE ("Ace Blade") it just doesn't look right as 0xace81ade !!

(oh and in my world the 'x' is always 'x' and never 'X'. Euugh! This does of course mean "0x%08X" though! ;-)


Agreed, lowercase hex is nearly as bad as using spaces instead of tabs, or tab sizes other than 4.


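OK, I had some time, and here is an ASM version that can run on all AVRs (with RAM).

It is 78 bytes and takes at most 201 clocks (76 bytes and 200 clocks if you have a dedicated zero register).

But the best part is actually the small number of registers used: only 2 other than the input data and the pointer.

It doesn't print leading zeros, so 123 is printed as 123\0.

It's based on counting down, up, down, ...: subtract 10000 until the value underflows, then add 1000 until it is non-negative again, then subtract 100, then add 10, so the overshoot is never explicitly restored.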

;input r17:r16
;input Z point to string start
;change r16,r17,r24,r25,Z 


        mov     r25,ZL      ;remember org pointer start
        ldi     r24, '0'-1
ML1:	inc     r24
        subi    r16, low(10000)       
        sbci    r17, high(10000)
        brcc    ML1
        cpi     r24,'0'
        breq    PL1         ;never print a 0 on first digit
        st      Z+,r24

PL1:
        ldi     r24, '0'+10
ML2:	dec     r24
        subi    r16, low(-1000)       
        sbci    r17, high(-1000)
        brcs    ML2
        cpi     r24,'0'
        brne    PL2         ;!= '0' print
        cpse	ZL,r25      ;if nothing printed don't print '0'
PL2:	st      Z+,r24


        ldi     r24, '0'-1
ML3:	inc     r24
        subi    r16, low(100)           
        sbci    r17, high(100)
        brcc    ML3
        cpi     r24,'0'
        brne    PL3         ;!= '0' print
        cpse    ZL,r25      ;if nothing printed don't print '0'
PL3:	st      Z+,r24


        ldi     r24, '0'+10
ML4:	dec     r24
        subi    r16, -10 
        brcs    ML4
        cpi     r24,'0'
        brne    PL4         ;!= '0' print
        cpse    ZL,r25      ;if nothing printed don't print '0'
PL4:	st      Z+,r24


        subi    r16, -'0'
        st      Z+,r16      ;always print last digit
        ldi     r16, 0
        st      Z+,r16      ;0 terminator
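A C model of the count down/up idea, for checking the logic on a PC. It is a sketch, not a port: utoa10_updown is a hypothetical name, and an int32_t stands in for the 16-bit register pair plus carry flag, so "negative" here corresponds to "the last subi/sbci borrowed" in the asm.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>   /* strcmp() for comparing results */

/* Non-restoring successive subtraction: alternately subtract and add
   the next power of ten instead of undoing each overshoot. */
static char *utoa10_updown(uint16_t v, char *s)
{
    static const int16_t p10[4] = {10000, 1000, 100, 10};
    char *start = s;
    int32_t x = v;

    for (int i = 0; i < 4; i++) {
        char d;
        if (i % 2 == 0) {
            /* x >= 0: count up while subtracting, until x goes negative */
            d = '0' - 1;
            do { d++; x -= p10[i]; } while (x >= 0);
        } else {
            /* x < 0: count down while adding, until x is non-negative */
            d = '0' + 10;
            do { d--; x += p10[i]; } while (x < 0);
        }
        if (d != '0' || s != start)   /* no leading zeros */
            *s++ = d;
    }
    *s++ = (char)('0' + x);           /* last pass was an "add": x is 0..9 */
    *s = '\0';
    return start;
}
```

Because each pass leaves the value on the "wrong" side of zero, the next pass simply counts in the other direction instead of first adding the overshoot back; that is where the restore step is saved.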

I will look at a faster version; the code will be bigger, but something like 150 bytes and 120 clocks should be possible.

Another way is something like a 40-byte, 300-clock version.


This turned out better than expected:

54 bytes and at most 249 clocks, and only r18 and r19 are added to the list of changed registers.

_digit:
		ldi     r24,'0'-1
_digit1:
		inc     r24
		sub     r16,r18
		sbc     r17,r19
		brcc    _digit1
		cpi	r24,'0'
		brne	_digit2			;!= '0' print
		cpse	ZL,r25		;if nothing printed don't print '0'
_digit2:        st      Z+,r24
		add     r16,r18
		adc     r17,r19
		ret
convert:
		mov     r25,ZL		;remember start; once the pointer has moved, '0' digits are printed
		ldi     r18,low(10000)
		ldi     r19,high(10000)
		rcall   _digit
		ldi     r18,low(1000)
		ldi     r19,high(1000)
		rcall   _digit
		ldi     r18,low(100)
		ldi     r19,high(100)
		rcall   _digit
		ldi     r18,low(10)
;		ldi     r19,high(10)	;not needed: r19 is still 0 from high(100)
		rcall   _digit
                subi    r16, -'0'
		st	Z+,r16	;always print last digit
		st	Z+,r19	;terminator
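The same kind of C model for the subroutine version (emit_digit and utoa10_sub are hypothetical names). Here each digit restores the overshoot with one add, which is what lets all four powers of ten share a single routine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>   /* strcmp() for comparing results */

/* C model of the shared _digit subroutine: subtract p until the 16-bit
   value underflows (carry), then add p back once to restore it.
   Leading zeros are skipped until something has been printed. */
static char *emit_digit(uint16_t *v, uint16_t p, char *s, const char *start)
{
    char d = '0' - 1;
    int borrow;
    do {
        d++;
        borrow = (*v < p);           /* carry out of sub/sbc */
        *v = (uint16_t)(*v - p);     /* wraps modulo 2^16 like the registers */
    } while (!borrow);
    *v = (uint16_t)(*v + p);         /* add/adc: undo the overshoot */
    if (d != '0' || s != start)
        *s++ = d;
    return s;
}

/* Hypothetical wrapper mirroring "convert": one call per power of ten. */
static char *utoa10_sub(uint16_t v, char *buf)
{
    static const uint16_t p10[4] = {10000, 1000, 100, 10};
    char *s = buf;
    for (int i = 0; i < 4; i++)
        s = emit_digit(&v, p10[i], s, buf);
    *s++ = (char)('0' + v);          /* units digit is always printed */
    *s = '\0';
    return buf;
}
```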

Last version before bedtime: for each digit it counts both down and up (down in steps twice as big, then back up in single steps).

It is 68 bytes and takes at most 188 clocks.

_digit:
		ldi r24,'0'
_digit1:
		subi r24,-2
		sub r16,r18
		sbc r17,r19
		brcc _digit1
		lsr r19
		ror r18
		add r16,r18
		adc r17,r19
		brcs _digit3
		dec r24
		add r16,r18
		adc r17,r19
//		rjmp _digit4
_digit3:dec r24
_digit4:
		cpi		r24,'0'
		brne	_digit2			;!= '0' print
		cpse	ZL,r25		;if nothing printed don't print '0'
_digit2:st		Z+,r24
		ret

convert:
		mov		r25,ZL		;remember start; once the pointer has moved, '0' digits are printed
		ldi     r18,low(20000)
		ldi     r19,high(20000)
		rcall    _digit
		ldi     r18,low(2000)
		ldi     r19,high(2000)
		rcall    _digit
		ldi     r18,low(200)
		ldi     r19,high(200)
		rcall    _digit
		ldi     r18,low(20)
;		ldi     r19,high(20)	;not needed: r19 is already 0 after the previous call
		rcall    _digit
        subi    r16, -'0'
		st		Z+,r16	;always print last digit
		st		Z+,r19	;terminator
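And a C model of the 68-byte variant, assuming the caller passes 2*p as the asm does (20000, 2000, 200, 20); emit_digit2 and utoa10_sub2 are hypothetical names. The coarse loop counts two per subtraction, and the fix-up adds p back once or twice, so a digit takes roughly half as many loop iterations.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>   /* strcmp() for comparing results */

/* C model of the 68-byte _digit: count down in steps of 2*p (two per
   step), then step back up by p once or twice. */
static char *emit_digit2(uint16_t *v, uint16_t p2, char *s, const char *start)
{
    uint16_t p = (uint16_t)(p2 >> 1);    /* lsr r19 / ror r18 */
    char d = '0';
    int borrow;
    do {
        d += 2;
        borrow = (*v < p2);              /* carry out of sub/sbc */
        *v = (uint16_t)(*v - p2);
    } while (!borrow);

    uint32_t sum = (uint32_t)*v + p;     /* add/adc: carry = back to >= 0 */
    *v = (uint16_t)sum;
    d--;
    if (sum <= 0xFFFF) {                 /* no carry: still negative */
        *v = (uint16_t)(*v + p);
        d--;
    }
    if (d != '0' || s != start)
        *s++ = d;
    return s;
}

static char *utoa10_sub2(uint16_t v, char *buf)
{
    static const uint16_t p10x2[4] = {20000, 2000, 200, 20};
    char *s = buf;
    for (int i = 0; i < 4; i++)
        s = emit_digit2(&v, p10x2[i], s, buf);
    *s++ = (char)('0' + v);              /* units digit is always printed */
    *s = '\0';
    return buf;
}
```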


skeeve wrote:
avr-gcc inline assembly can use R0..R31 in any way one wants. The compiler has to be informed. gcc inline assembly has the syntax and semantics to connect registers with C variables. That is what makes gcc inline assembly so much more powerful than others.

FYI, you can't use just any register in inline asm; for example, the following code won't compile:

void f (int*);

void g (int a)
{
    f (&a);
    __asm (" " ::: "28");
}

Similar code is used frequently: not taking the address of a parameter, but passing the address of a local, non-static buffer array down to some receiver or transmitter routine.

avrfreaks does not support Opera. Profile inactive.

Last Edited: Fri. Oct 21, 2016 - 06:08 PM

I'm not an expert on inline asm, but isn't that just failing because you're telling avr-gcc you're clobbering r28, and avr-gcc requires r28 to be saved and restored? I.e. you can use it however you want if you think you know better, but telling the compiler what you're doing will make it angry.


SprinterSB wrote:
FYI, you can't use just any register in inline asm, for example the following code won't compile:

void f (int*);

void g (int a)
{
    f (&a);
    __asm (" " ::: "28");
}

Similar code is used frequently: not taking the address of a parameter, but passing the address of a local, non-static buffer array down to some receiver or transmitter routine.

What is the error message?

Does it help to write "r28" instead of "28"?


Adding "r" won't help, that's just some sugar.

foo.c: In function 'g':
foo.c:7:1: error: r28 cannot be used in asm here

That's because r28 is part of the frame pointer.

SprinterSB wrote:
Adding "r" won't help, that's just some sugar.

foo.c: In function 'g':
foo.c:7:1: error: r28 cannot be used in asm here

That's because r28 is part of the frame pointer.

According to #5 here, it can.


I had a look at how small it can be, and I guess code like this is about the smallest I can think of at the moment.

It is zero-terminated, but leading zeros aren't removed, so 123 prints as 00123\0.

It uses a 16/8 division routine, and because the digits come out in reverse order, the zeros are a pain to remove!

It takes r17:r16 as input, and Z is the pointer.

It takes about 760 clocks,

but it's only 38 bytes in size.

Any smaller code?

convert:
        ldi     r24,10           ;div with 10
	adiw	Z,5
	ldi	r23,5
con0:   clr     r18             ;remainder
        ldi     r25,0x10        ;loop 16 bit
con1:   lsl     r16
        rol     r17
        rol     r18
        cp      r18,r24
        brcs    con3
con2:   sub     r18,r24
        inc     r16
con3:   dec     r25
        brne    con1
	subi	r18,-'0'		;convert to ASCII
	st	-Z,r18
        dec     r23
	brne    con0
	std	Z+5,r16			;0-terminate the string (r16 is 0 after 5 divisions)

Edit: forgot an init.

Last Edited: Mon. Oct 24, 2016 - 10:07 PM
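A C model of the 38-byte routine, useful for seeing why the digits come out in reverse order: each round is a full 16-by-8 shift-and-subtract division by 10, and the remainder is the next digit from the right. utoa10_div is a hypothetical name; like the asm, it always writes five digits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>   /* strcmp() for comparing results */

/* Five rounds of restoring division by 10; digits are stored right to
   left, leading zeros kept. buf must hold at least 6 bytes. */
static char *utoa10_div(uint16_t v, char *buf)
{
    char *p = buf + 5;                          /* adiw Z,5 */
    for (int n = 0; n < 5; n++) {
        uint8_t rem = 0;                        /* clr r18 */
        for (int bit = 0; bit < 16; bit++) {
            /* lsl r16 / rol r17 / rol r18: shift the top bit of v into rem */
            rem = (uint8_t)((rem << 1) | (v >> 15));
            v = (uint16_t)(v << 1);
            if (rem >= 10) {                    /* cp r18,r24 / brcs */
                rem -= 10;                      /* sub r18,r24 */
                v |= 1;                         /* inc r16: quotient bit */
            }
        }
        *--p = (char)('0' + rem);               /* st -Z */
    }
    buf[5] = '\0';   /* the asm stores r16 here, which is 0 after 5 divisions */
    return buf;
}
```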

Nothing to do with libc, but when dumping data, say to a 9600 baud serial port, transmission speed can indeed be the most expensive factor in data acquisition time. It's trivial to write a routine that dumps a byte as two hex digits, but when you paste that data dump into a spreadsheet program (e.g. MS Excel), horrible things happen.

(0x omitted)

1234 is interpreted as 1234 decimal, when it should be 4660 decimal

12A4 is interpreted as hex, or 4772 decimal (as it should).

12E4 is interpreted as 120,000 decimal when it should be 4836.

The horrible solution I came up with is to prefix all values with 'F', which entirely defeats the transmission speed benefit, but does force all the data to be interpreted as hexadecimal.

F1234 mod F0000 gives 4660 decimal, always.  &c.
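The arithmetic behind the "F" prefix, sketched in C: the dump side writes "F" plus four uppercase hex digits, and the decoding side takes the hex value mod 0xF0000. put_fhex and get_fhex are hypothetical names, and this says nothing about any particular spreadsheet's functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical dump format: a 16-bit value as "F" + 4 hex digits, so the
   text never looks like a decimal number or an exponent ("12E4"). */
static void put_fhex(uint16_t v, char buf[6])
{
    snprintf(buf, 6, "F%04X", (unsigned)v);
}

/* Decoding side: 0xF0000 + v, taken mod 0xF0000, gives v back. */
static unsigned long get_fhex(const char *s)
{
    return strtoul(s, NULL, 16) % 0xF0000UL;
}
```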

Dunno if that helps any.  S.


But that is a good reason to use decimal numbers in your log.


I had an AVR system in production a while ago that used base-62 for communication. [0-9], [A-Z] and [a-z] were the allowed ASCII characters. I wanted base-64, but the barcode system we were using for input and output wouldn't do punctuation. Ah well...  S.
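For comparison with the radix 2-36 limit of utoa(), a minimal base-62 encoder sketch. The digit order [0-9][A-Z][a-z] is an assumption taken from the order listed above, and utoa62 is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>   /* strcmp() for comparing results */

/* Base-62 encoder; a uint32_t needs at most 6 digits, so buf must hold
   at least 7 bytes. */
static char *utoa62(uint32_t v, char *buf)
{
    static const char digits[] =
        "0123456789"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz";
    char tmp[7];
    int n = 0;
    do {                        /* least significant digit first */
        tmp[n++] = digits[v % 62];
        v /= 62;
    } while (v != 0);
    int i = 0;
    while (n > 0)               /* reverse into the caller's buffer */
        buf[i++] = tmp[--n];
    buf[i] = '\0';
    return buf;
}
```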

Edited for typo.  S.

Last Edited: Tue. Oct 25, 2016 - 01:12 PM

Scroungre wrote:
I wanted base-64
Aha, a reinvention of UUcode? Those of us old enough to remember email before MIME existed will remember passing binary files back and forth UUencoded!


Pretty much!!  Although ours was a homegrown incompatible version.  I am indeed old - uudecode was our pal!  And to be honest, I still have systems out there that are transmission-speed dependent, and I'm always tempted to try.  Few of the Powers-With-Money are inclined to concur.  S.