## integer to string

117 posts / 0 new

## Pages

Do you talk about utoa(16bit) or ltoa (32bit)?

There is a big difference, a utoa can be done in about 70 clk (worst case)

sparrow2 wrote:

Do you talk about utoa(16bit) or ltoa (32bit)?

There is a big difference, a utoa can be done in about 70 clk (worst case)

"Here we go again!"  ;)

Indeed, what are the rules for the "contest"?  Obviously there will be a difference in time/space between 16-bit and 32-bit.  Also a difference between unsigned and signed.  And are you going to right-justify?  Leading-zero-supress?  And one always needs to tell what number is being converted, as that will change the cycles for most approaches.

Starting with

https://www.avrfreaks.net/forum/s...

...one toolchain took a bit over 100 cycles for itoa(9999).  https://www.avrfreaks.net/comment...

sparrow2, what is your worst-case number to convert?

In that thread, danni gave a routine that handles up to 42000 in a couple hundred cycles and is small.

sparrow2 referred to a different approach, and posted the result in

https://www.avrfreaks.net/comment...

...

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

remember this thread is 100% 32 bit!

that was why I made a comment.

I would say that anything about 16 bit in this thread would be noise!

sparrow2 wrote:

remember this thread is 100% 32 bit!

that was why I made a comment.

I would say that anything about 16 bit in this thread would be noise!

???

`void lcd_write_int(unsigned int data)`

It looks like unsigned 16-bit to me...  "utoa"

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Chill out, the OP said:

marsheng wrote:
Once over 32,768 the LCD displays - negative numbers.

This is the behaviour of a signed 16 bit, so I guess I jumped to the conclusion this thread was about 16bit numbers. Now I will read these other threads, I like to learn algos.

sorry but the last 20 post's have been about ltoa.

sparrow2, what is your worst-case number to convert?

As you have in the last link about 70 clk for a full 16 bit

and if max is 9999 it's about 50 clk

And about 300 if it's extented to a ltoa (use same rutine and make two div with 10000)

sparrow2 wrote:
remember this thread is 100% 32 bit!
sparrow2 wrote:
sorry but the last 20 post's have been about ltoa.

Now I'll suspect your clock counts, if 20/50 is 100%. ;)

I'll have to examine your code.  Will this give exact results for all input numbers?

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

For the record this is what you are calling when you call ltoa() in AVR-LibC:

```#define val_lo	r22
#define str_lo	r20
#define rdx_lo	r18
#define rdx_hi	r19

ENTRY	ltoa
ENTRY	__ltoa

cpi	rdx_lo, 37
cpc	rdx_hi, __zero_reg__
brsh	1f
cpi	rdx_lo, 2
brlo	1f
XJMP	_U(__ltoa_ncheck)

1:	X_movw	ZL, str_lo
st	Z, __zero_reg__
X_movw	r24, str_lo
ret
```
```#define val_lo	r22
#define val_hi	r23
#define val_hlo	r24
#define val_hhi	r25
#define str_lo	r20
#define rdx_lo	r18

#define sign	r27	/* Argument for __ultoa_common()	*/

ENTRY	__ltoa_ncheck

clr	sign
cpi	rdx_lo, 10
brne	1f
tst	val_hhi
brpl	1f

; radix == 10 && val < 0: sign = '-' and val = -val
ldi	sign, '-'
com	val_hhi
com	val_hlo
com	val_hi
neg	val_lo
sbci	val_hi, -1
sbci	val_hlo, -1
sbci	val_hhi, -1
1:	XJMP	_U(__ultoa_common)
```
```#define val_lo	r22
#define val_hi	r23
#define val_hlo	r24
#define val_hhi	r25
#define str_lo	r20

#define counter	r19
#define digit	r26
#define sign	r27

ENTRY	__ultoa_ncheck
clr	sign

ENTRY	__ultoa_common
X_movw	ZL, str_lo

1:  ; Saves one iteration of the digit-loop:
; If val < radix we can use the low byte of val as digit
mov	digit, val_lo
cpc	val_hi, __zero_reg__
cpc	val_hlo, __zero_reg__
cpc	val_hhi, __zero_reg__
; now C is set, if val < radix
sbc	counter, counter
bst	counter, 0
brts	4f
; counter == 0 here

; If val >= radix, then pop one digit from val
clr	digit

2:  ; Vanilla 32:8 quotient and remainder to pop the digit
; digit <- val % radix
; val   <- val / radix
lsl	val_lo
rol	val_hi
rol	val_hlo
rol	val_hhi
rol	digit
brlo	3f
; val |= 1
inc	val_lo
3:  ; Loop the 32 bits
subi	counter, 8		; 256 / 8 == 32 loops
brne	2b

4:  ; Convert the digit to ASCII...
subi	digit, -'0'
cpi	digit, '9'+1
brlo	5f
subi	digit, '0'-'a'+10
5:  ; ... and store it to the reversed string
st	Z+, digit

; Popped all digits?
brtc    1b

; Yes:  Store the sign (if any)
cpse	sign, __zero_reg__
st	Z+, sign

; Terminate the string with '\0'
7:	st	Z, __zero_reg__

; Reverse the string and return the original string pointer
X_movw	r24, str_lo
XJMP	_U(strrev)```
```#include "macros.inc"

#define str_hi r25
#define str_lo r24
#define ltemp  r23
#define rtemp  r22

ASSEMBLY_CLIB_SECTION
.global _U(strrev)
.type   _U(strrev), @function

_U(strrev):
X_movw	XL, str_lo	; X is start of string
X_movw	ZL, str_lo	; Z becomes end of string
; find end of string
1:	mov	rtemp, ltemp	; to obtain right nonzero character
ld	ltemp, Z+
tst	ltemp
brne	1b
sbiw	ZL, 2		; to last nonzero byte
rjmp	3f
; swap bytes
2:	ld	ltemp, X
st	X+, rtemp
st	Z, ltemp
ld	rtemp, -Z
3:	cp	XL, ZL
cpc	XH, ZH
brlo	2b

ret```

If anyone can spot ways to speed/improve this feel free to say - you can become famous as one of the AVR-LibC authors and you get your name on this page...

http://www.nongnu.org/avr-libc/u...

(note that you have to willing to make the code available under a BSD licence).

clawson wrote:
If anyone can spot ways to speed/improve this feel free to say

+1

Let's make this interesting…  ^^

ɴᴇᴛɪᴢᴇᴎ

Yes

After staring at this code for quite a while, I concluded they are doing a long division using some clever shifting of the dividend instead of the divisor. This cleverness means the code can probably not be much improved unless a different division algorithm is used (not long division).

I'm not sure if an alternative using the multiply by inverse algo would be faster, it would probably require a lookup table with "magic numbers" for all the valid inverses (1/2, 1/3....1/36).

I think I'll pass on this one.

But in reality who want other than 2 10 and 16 as base numbers?

and 2 and 16 are easy

so it's only 10 we need to look at.

I've had 1G37T30 times I've needed to use base 30!

But as the AVR-LibC [i|l]toa() functions has offered radix 36 since time immemorial I guess any replacement needs to continue to do so for the one person who's used a non 2/10/16 radix. Of course the code could look out for 2/10/16 and use a more optimal solution for those specific bases.

clawson wrote:
But as the AVR-LibC [i|l]toa() functions has offered radix 36 since time immemorial I guess any replacement needs to continue to do so

That's the issue. The code is probably fine, although we might be able to save a cycle here or there. The generic algo is likely good too.

The problem lies in the library interface itself. It makes sense to offer free radix in a powerful library, but we MCU users don't need it as much as we need it fast and small.

You said it yourself, Cliff: most ltoa usage is for user feedback, which means it's gonna be base 10 99% of the time. The need for base 2 and 16 is much smaller already (debugging?) and should not impact our "main" library code. The need for other radices pretty much does not exist (if you need it, just use ltoa and its sisters). If we kept to these radices, as sparrow2 mentioned, we could get much better/faster library code.

That would be the sort of library we really need, despite all the merits of a standardized libc.

ɴᴇᴛɪᴢᴇᴎ

Last Edited: Mon. Sep 12, 2016 - 08:21 PM

Just for fun: how many cycles does it take ltoa to convert 1000000 to binary?

ɴᴇᴛɪᴢᴇᴎ

clawson wrote:
For the record this is what you are calling when you call ltoa() in AVR-LibC:

You discarded an important part (stdlib.h):

```extern __inline__ __ATTR_GNU_INLINE__
char *ltoa (long __val, char *__s, int __radix)
{
extern char *__ltoa (long, char *, int);
*__s = 0;
return __s;
} else {
extern char *__ltoa_ncheck (long, char *, unsigned char);
}
}
```

There is provision here to do further radix checks and branch to radix-optimized code.

ɴᴇᴛɪᴢᴇᴎ

Last Edited: Mon. Sep 12, 2016 - 08:47 PM

netizen wrote:

I wouldn't give up 2300+ cycles to put 1000000 in a string.

It's interesting that you consider 2300+ cycles (147us at 16MHz) to be a great burden that must be avoided.

Every program that I have written for microcontrollers has periods where it wastes cycles waiting for something to happen.

It is during these periods that I schedule operations such as ltoa(), so the burden is nil.

What is nice about having ltoa() in the library is that it's an alternative to sprinf() that uses less flash.  Also, being in the library it is likely to be bug free.

If there was a smaller and faster equivalent function in the library that only did radix 10, I would gladly use that.

Chuck99 wrote:
It's interesting that you consider 2300+ cycles (147us at 16MHz) to be a great burden that must be avoided.

Well, I get your point but I don't want to get drowned into that. Surely you'll agree that you can't implement a library like libc with this mindset though…

Chuck99 wrote:
If there was a smaller and faster equivalent function in the library that only did radix 10, I would gladly use that.

I'm glad to hear that. :-)

In fact, another non-libc dedicated function would be the best choice: I don't think it's realistic to imagine freaks will get to weight on the libc interface though. And if we keep to the current interface, even with branching to radix-optimized code, we'll still be clobering a parameter register and wasting a few cycles because we have to specify a 10 radix. It's not much, but it will happen 99% of the time…

Nevertheless, what we could do is propose alternative implementation for existing libc functions that do provide the benefits we need. That would be an improvement.

ɴᴇᴛɪᴢᴇᴎ

Last Edited: Mon. Sep 12, 2016 - 09:08 PM

netizen wrote:

In fact, another non-libc dedicated function would be the best choice: I don't think it's realistic to imagine freaks will get to weight on the libc interface though. And if we keep to the current interface, even with branching to radix-optimized code, we'll still be clobering a parameter register and wasting a few cycles because we have to specify a 10 radix. It's not much, but it will happen 99% of the time…

Nevertheless, what we could do is propose alternative implementation for existing libc functions that do provide the benefits we need. That would be an improvement.

Maybe not, since the base is a known constant at compile time, the compiler might be smart enough to optimize all the overhead away, and directly call the "right" alternative function. Who knows, only with testing...

El Tangas wrote:
Maybe not, since the base is a known constant at compile time, the compiler might be smart enough to optimize all the overhead away, and directly call the "right" alternative function. Who knows, only with testing...

Good point. I don't think GCC will mess with a function definition (remove an unused/constant parameter) though…

ɴᴇᴛɪᴢᴇᴎ

For fast and small take a look at danni's code there https://www.avrfreaks.net/comment...

It is small  has a max of 170 clk.

If worst case needs to be improved some extra code can be added

either first (forget first digit it's not full) do sub 4000 until negative and then add 1000 until positiv  then with 400 100 40 10 4 1.

or do some bit checks if bit 12 is set (and first digit found) init with 4 and sub 5000 (we know what we can do 4000) and check etc.

I have not made it but I guess about 100- 120 clk for worst-case can be done.

The good thing is that this code works on tiny's.

For mega's the HW mul can be used to make div by mul  with 1/x.

And that can the be done in worst-case of 65 clk (my old version can be improved)

Both can be made for 32 bit

(with the 1/x version I would div 32 bit with 10000 and the use the 16 bit rutine without first digit, then div with 10000 again  .. .. and the the last 2 digits)

Last Edited: Mon. Sep 12, 2016 - 09:38 PM

I don't think that the radix needs to be known at compile time, radix is not a const.

Thanks, sparrow2.

I personally won't bother spending time "fixing" this unless the powers that be agree on the principle beforehand. In which case, they'll probably implement it themselves —but that's another story.

Do we have an agreement here that optimized code for radices 10, 2 and 16 would be beneficial, and provide more opportunities for these stdlib conversion functions to be used more widely?

ɴᴇᴛɪᴢᴇᴎ

netizen wrote:
Do we have an agreement here that optimized code for radices 10, 2 and 16 would be beneficial, and provide more opportunities for these stdlib conversion functions to be used more widely?

FWIW I don't use any of these XtoY functions for this reason alone, but perhaps that's just me…

ɴᴇᴛɪᴢᴇᴎ

netizen wrote:
Do we have an agreement here that optimized code for radices 10, 2 and 16 would be beneficial, and provide more opportunities for these stdlib conversion functions to be used more widely?
The way collaborative software tends to work is that when someone finds some missing/deficient feature in what exists that they simply cannot live without they put in the time/effort to implement it and then if they are a generous soul they donate their work back for others to share later. While it sounds like you may consider Xtoa() to be deficient I get the sneaking feeling that your need to have it "corrected" is possibly not great enough to drive you to be the person that must have it fixed at all costs ;-) If someone comes along saying "I need radix 10/16 as fast as possible at any cost possible!!" then I guess we may have found our candidate to make changes?

clawson wrote:
The way collaborative software tends to work is that when someone finds some missing/deficient feature in what exists that they simply cannot live without they put in the time/effort to implement it and then if they are a generous soul they donate their work back for others to share later.

My experience in dealing with established open source projects is slightly different: it is likely that your personal improvements will be refused because in the general case there are reasons not to implement them that way. It could be constraints you're not aware of, because you only see part of the picture. It could be some less significant habits that someone higher up in the food chain happens to have. Sometimes it might just be an ego trip from someone powerful enough to impose it's tantrum on the rest of us. I could link to examples of each from my own experience, but that wouldn't help. :-)

The thing is, coding something for one project and coding it for something like stdlib is not the same: of course the later requires much higher standards, and they obviously come at a cost. Thus my question: I'm not gonna bear that cost to play bingo and eventually see it all go to waste…

ɴᴇᴛɪᴢᴇᴎ

But I notice you conveniently dodged the question: do you support the effort or not?

ɴᴇᴛɪᴢᴇᴎ

By the way, Cliff: you've got mail. :-)

ɴᴇᴛɪᴢᴇᴎ

netizen wrote:
do you support the effort or not?

How many times has anyone here used conversion to other than base 10 and an occasional base 16 in any application?

And as a "real" part of the application and not a curiosity or "don't care about the cycles"?

There are some that like their base 2 displays.  I cannot remember ever doing that myself.

I can't remember doing octal since PDP and VAX days.

I remember a base 36 in a past life.  Let's call that a curiosity.

So IMO your effort would be a solution in search of a problem.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

theusch wrote:
So IMO your effort would be a solution in search of a problem.

Thanks. We happen to disagree, but at least you made your opinion clear. :-)

ɴᴇᴛɪᴢᴇᴎ

ɴᴇᴛɪᴢᴇᴎ

I've used base 2 sometimes but it sure would help if there'd also been a leading 0 padding option! You end up itoa() and then doing str*() to put in the leading zeroes anyway.

Cliff: did you get my private message?

ɴᴇᴛɪᴢᴇᴎ

clawson wrote:
I've used base 2 sometimes but it sure would help if there'd also been a leading 0 padding option! You end up itoa() and then doing str*() to put in the leading zeroes anyway.

I feel you, but this ain't going to happen: we're not gonna change the stdlib API. What about optimized code for common radices?

ɴᴇᴛɪᴢᴇᴎ

netizen wrote:

Cliff: did you get my private message?

I did eventually - there's something very odd about this whole Freaks/Communities site thing. It seemed to believe I *was* logged into Freaks but was not logged into Communities.
netizen wrote:
I'm not really the person to ask (just another user). The people who develop AVR-LibC gather here:

https://lists.nongnu.org/mailman...

There is this too:

https://lists.nongnu.org/mailman...

(several say they prefer to deal with this stuff via email rather than internet fora).

clawson wrote:
I'm not really the person to ask (just another user). The people who develop AVR-LibC gather here

I know where they gather. I just want to know where freaks stand before I bother them: it's not the same if it's just me, or experienced freaks backing up the effort.

ɴᴇᴛɪᴢᴇᴎ

Well, for binary keeping leading zeroes would be trivially simple, you just need to shift the bits to the carry flag, then adc with '0' char, repeat 32 times, add the zero string terminator, and that's it. For other bases that are powers of 2, it would also be very easy.

netizen wrote:

Isn't that all explored in the link I gave in #53?  (#21; #23; ...)

https://www.avrfreaks.net/comment...

theusch wrote:

Nearly always I want my display numbers to be right-justified [and fixed width]. The biggest exception might be free--format "dump" strings, but then size/speed don't matter that much. So my home-built routines are almost always adaptations of "itoa()"-type.

The same goes for leading-zero suppression. Usually I want it for display, but not for manipulating setpoints and a few other situations. So I've got an option for that.

When catered for, both of those are virtually free on top of the actual conversion itself.  IMO/IME a lot cheaper when integrated into the conversion than with a post-processing pass.  Suit yourself.  If the app has a need for (s)printf otherwise, I'll tend to use that and hang the extra cycles a few times a second.  If the app is updating a many-digit 7-seg high-speed counter https://www.marshbellofram.com/a... then indeed I'll try to keep the binary=>BCD time down.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Last Edited: Wed. Sep 14, 2016 - 03:41 PM

netizen wrote:

clawson wrote:
I'm not really the person to ask (just another user). The people who develop AVR-LibC gather here

I know where they gather. I just want to know where freaks stand before I bother them: it's not the same if it's just me, or experienced freaks backing up the effort.

Can I refer Sir back to:

My personal opinion has not changed from there I'm afraid.

theusch wrote:
netizen wrote:

Isn't that all explored in the link I gave in #53? (#21; #23; ...)

Indeed it is. But I don't want to take chances and let misinterpretation blur my point: I want your explicit opinion on this particular matter.

theusch wrote:
When catered for, both of those are virtually free on top of the actual conversion itself. IMO/IME a lot cheaper when integrated into the conversion than with a post-processing pass.

So, you think having optimized code for common radices is a "solution in search of a problem" (#81), yet you roll your own code with integrated right-justified and fixed width to save a few cycles of post-processing. Am I getting this right?

ɴᴇᴛɪᴢᴇᴎ

netizen wrote:
Am I getting this right?

Define "few".  The best itoa/utoa implementations, depending on the rules, are a couple hundred cycles.  I'll wager that taking that output, placing right-justified into a fixed-width field, and leading-zero-supress will double that.

And indeed, IMO/IME radices other than 10 are rare.  Base 16, once in a while.  [BUT THAT IS SUCH A TRIVIAL CASE THAT IT WILL BLOW THE SHORTS OFF ANY GENERIC IMPLEMENTATION--the value is already in packed-BCD format.]

I don't know what you are pounding at.  If you care to propose changes to this particular toolchain for consideration, that is up to you.  No blessing from me is needed.  It ain't my usual toolchain anyway.

netizen wrote:

Remember that itoa and variants aren't part of any standard C library.  You are free to extend the language however you see fit.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Last Edited: Wed. Sep 14, 2016 - 04:15 PM

theusch wrote:
Define "few". The best itoa/utoa implementations, depending on the rules, are a couple hundred cycles. I'll wager that taking that output, placing right-justified into a fixed-width field, and leading-zero-supress will double that.

First, remember the initial case for this thread is ltoa, not itoa/utoa. And the cycles count is 2300, thus ±200 would indeed be "few".

But let's make it easy on you and consider the fastest: utoa, so you might have a point on a relative basis.

The post-processing function would be something like:

`_utoa(char *src, char *dst, char fill);`

Its job is to move a pointer through *src until it encounters /0 (6 steps max), then move backward copying *src to *dst+i and filling what's left with fill (5 steps max). That, according to you, is 200 cycles long? Challenge accepted: show me the code!

theusch wrote:
And indeed, IMO/IME radices other than 10 are rare. Base 16, once in a while.

Alright, so you actually are agreeing with me/us. :-)

theusch wrote:
I don't know what you are pounding at. If you care to propose changes to this particular toolchain for consideration, that is up to you. No blessing from me is needed.

I need feedback from experienced people to support my case. You happen to be one of them, although you claim not to. :-)

ɴᴇᴛɪᴢᴇᴎ

Last Edited: Wed. Sep 14, 2016 - 05:24 PM

netizen wrote:
First, remember the initial case for this thread is ltoa, not itoa/utoa.

Sigh.

`(unsigned int data)`

and then the complaint "Once over 32,768 the LCD displays - negative numbers."

netizen wrote:
And the cycles count is 2300,

The cycles count for what?  Many cycle counts given in the other thread, and in this one.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Alright, nevermind. You have chosen to object, no matter what. It's your choice and I respect that. Let's leave it at that. :-)

ɴᴇᴛɪᴢᴇᴎ

Last Edited: Wed. Sep 14, 2016 - 05:44 PM

netizen wrote:

Alright, nevermind. You have chosen to object, no matter what. It's your choice and I respect that. Let's leave it at that. :-)

No.  The 2300+ cycles that you are using was your number, from Chuck's ltoa() experiment with 1000000.  How do you relate that to itoa or a utoa implementation?

In the other thread, my toolchain's itoa() was 341 cycles for 9999.

danni's approach was 165 cycles for 9999.

sparrow2's approach is about 100, right?

Mixing apples and oranges will give fruit salad.  And my post-processing comments stand.  As well as how widely arcane bases might be used.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

theusch wrote:

Your post-processing comments will be validated when you accept the challenge I gave you. Until then, they're just FUD.

theusch wrote:
No. The 2300+ cycles that you are using was your number, from Chuck's ltoa() experiment with 1000000. How do you relate that to itoa or a utoa implementation?

You wouldn't ask if you had followed this thread: they're all under-optimized for common radices because they use a generic algorithm that supports uncommon ones.

theusch wrote:
In the other thread, my toolchain's itoa() was 341 cycles for 9999. danni's approach was 165 cycles for 9999. sparrow2's approach is about 100, right?

Right. And precisely my point: I'd like to have optimized code such as these when I use a XtoY function without thinking. That's what stdlib should be about in AVR platforms anyway.

Look, I'm getting tired of this. Please, let's leave it at that.

ɴᴇᴛɪᴢᴇᴎ

If it only needs to work up to 9999 my routine take 45 clk (it's relatively hard to find the first digit)

and for all 16 bit it can be made in 65 clk max (min is almost the same 2 less ).

I was looking on some  6502 code for div a byte with 10

It's correct for all numbers (I have checked it)

input in r24

output in r24

r25 change

``` lsr r24        ;1/2
mov r25,r24
lsr r24        ;1/4
lsr r24        ;3/8
lsr r24        ;3/16
lsr r24        ;3/32
lsr r24        ;19/64
lsr r24        ;51/128
lsr r24        ;51/256
lsr r24        ;51/512
```

It take 13 clk.

for the itoa it only needs to be correct up to 99 are there there anyone that know if it can be done faster ? (with a bit more sloppy fraction code)

fractions to the code

Last Edited: Wed. Sep 14, 2016 - 09:37 PM

This one uses the hardware multiplier, so it's not for ATtinys. And it's not working yet, most results differ from the right one by 1 unit less, needs a less sloppy division, so it will take more cycles. Working on it.

8 bit input in r16. Result in r18:r17:r16 (3 digits)

```ldi	r17, 164
fmul	r16, r17
andi	r18, 3
ldi	r16, 10
mul	r16, r1
mov	r17, r1
mul	r16, r0
mov	r16, r1```

This incorrect version takes 14 cycles for 3 digits, less significant digit is in error by 1 unit. Sorry for no comments, but I'm a bit sleepy now, maybe tomorrow.

Thanks for the code guys.

There is no byte-to-string facility in stdlib, though: candidates are int-to-string and long-to-string, both signed and unsigned.

Even if there's probably less to be gained than with ltoa —but it's probably more commonly used, here is utoa for reference (utoa_ncheck.S):

```#if	!defined (__AVR_TINY__)

#include "asmdef.h"

/* char *__utoa_ncheck (unsigned val, char *s, unsigned char radix)
This function is the utoa() without checking the radix for 2..36 margins.
*/

#define val_lo	r24
#define val_hi	r25
#define str_lo	r22

#define counter	r21
#define digit	r26
#define sign	r27
// clock cycles
ENTRY	__utoa_ncheck
clr	sign                                            // 1

ENTRY	__utoa_common
X_movw	ZL, str_lo                                      // 1
clr	counter                                         // 1

1:  ; Vanilla 16:8 quotient and remainder to pop the digit
; digit <- val % radix
; val   <- val / radix
clr	digit                                           // 1

2:	lsl	val_lo                                          // 1
rol	val_hi                                          // 1
rol	digit                                           // 1
brlo	3f                                              // 1/2
inc	val_lo                                          // 1
3:	subi	counter, 16		; 256 / 16 == 16 loops  // 1
brne	2b                                              // 1/2

; Convert the digit to ASCII...
subi	digit, -'0'                                     // 1
cpi	digit, '9'+1                                    // 1
brlo	4f                                              // 1/2
subi	digit, '0'-'a'+10                               // 1
4:  ; ... and store it to the reversed string
st	Z+, digit                                       // 1(tiny/xmega)/2

; Iterate until all digits are sucked out of VAL.
sbiw	val_lo, 0                                       // 2
brne	1b                                              // 1/2

; Store the sign (if any)
cpse	sign, __zero_reg__                              // 1/2
st	Z+, sign                                        // 1(tiny/xmega)/2

; Terminate the string with '\0'
st	Z+, __zero_reg__                                // 1(tiny/xmega)/2

; Reverse the string and return the original string pointer
X_movw	r24, str_lo                                     // 1
XJMP	_U(strrev)                                      // …

ENDFUNC

#endif	/* !__AVR_TINY__ */```

And its front-end (stdlib.h):

```/**
\ingroup avr_stdlib

\brief Convert an unsigned integer to a string.

The function utoa() converts the unsigned integer value from \c val into an
ASCII representation that will be stored under \c s.  The caller
is responsible for providing sufficient storage in \c s.

\note The minimal size of the buffer \c s depends on the choice of
radix. For example, if the radix is 2 (binary), you need to supply a buffer
with a minimal length of 8 * sizeof (unsigned int) + 1 characters, i.e. one
character for each bit plus one for the string terminator. Using a larger
radix will require a smaller minimal buffer size.

\warning If the buffer is too small, you risk a buffer overflow.

Conversion is done using the \c radix as base, which may be a
number between 2 (binary conversion) and up to 36.  If \c radix
is greater than 10, the next digit after \c '9' will be the letter
\c 'a'.

The utoa() function returns the pointer passed as \c s.
*/
#ifdef  __DOXYGEN__
extern char *utoa(unsigned int val, char *s, int radix);
#else
extern __inline__ __ATTR_GNU_INLINE__
char *utoa (unsigned int __val, char *__s, int __radix)
{
extern char *__utoa (unsigned int, char *, int);
*__s = 0;
return __s;
} else {
extern char *__utoa_ncheck (unsigned int, char *, unsigned char);