Preamble. A few of us started discussing how to optimize the libc integer conversion routines (utoa, itoa, ltoa…) in another thread. In the end we hijacked that thread, and it became difficult for everyone to follow when it eventually moved back to the original issue (kudos to the original poster for not bitching about it!). So we've decided to keep going in this new dedicated thread…
It all started with a cycle count for ltoa(1000000, buff, 10): ~2350 cycles, which seemed a bit much to several of us.
Here are other typical benchmarks straight from the libc manual:
Function | Units | avr2 | avr25 | avr4 |
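itoa (12345, s, 10) | Flash bytes | 110 (110) | 102 (102) | 102 (102) |
itoa (12345, s, 10) | Stack bytes | 2 | 2 | 2 |
itoa (12345, s, 10) | MCU clocks | 879 | 875 | 875 |
ltoa (12345L, s, 10) | Flash bytes | 134 (134) | 126 (126) | 126 (126) |
ltoa (12345L, s, 10) | Stack bytes | 2 | 2 | 2 |
ltoa (12345L, s, 10) | MCU clocks | 1597 | 1593 | 1593 |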
Note that ltoa gets better results here than the ~2350 cycles mentioned above, but it is tested with a value (12345) that fits in 16 bits.
Those results are not completely satisfactory, and they lead many experienced freaks to brew their own versions of these routines, which defeats their original purpose. Yet the stdlib code is most likely not to blame; the culprit is rather a generic legacy interface that supports, and treats equally, all radices from 2 to 36, whereas in most real-world use cases some radices (10 in particular) are immensely more common than others.
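To make the cost of the generic interface concrete, here is a minimal sketch of a radix-agnostic conversion in plain C. It is not the avr-libc code (that is hand-written assembly); the name utoa_generic and the lowercase letter digits are my own assumptions for illustration. The point is only that an arbitrary radix forces one division per output digit, and the AVR core has no hardware divide instruction, so each of those divisions is a software routine.

```c
/* Illustrative only: utoa_generic is a hypothetical name, not an avr-libc symbol. */
char *utoa_generic(unsigned int val, char *s, int radix)
{
    char *p = s;
    char *lo, *hi;

    /* Same range check as the libc interface: radices 2..36 only. */
    if (radix < 2 || radix > 36) {
        *s = '\0';
        return s;
    }

    /* Produce digits least-significant first: one divide/modulo per digit. */
    do {
        unsigned int digit = val % (unsigned int)radix;
        val /= (unsigned int)radix;
        *p++ = (char)(digit < 10 ? '0' + digit : 'a' + (digit - 10));
    } while (val != 0);
    *p = '\0';

    /* The digits came out backwards, so reverse them in place. */
    for (lo = s, hi = p - 1; lo < hi; ++lo, --hi) {
        char t = *lo;
        *lo = *hi;
        *hi = t;
    }
    return s;
}
```

With a radix known in advance, both the division and the range check can in principle be specialised away, which is what the rest of this thread is about.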
Fortunately, there is room in the current implementation to provide dedicated implementations for privileged radices:
extern __inline__ __ATTR_GNU_INLINE__
char *utoa (unsigned int __val, char *__s, int __radix)
{
    if (!__builtin_constant_p (__radix)) {
        extern char *__utoa (unsigned int, char *, int);
        return __utoa (__val, __s, __radix);
    } else if (__radix < 2 || __radix > 36) {
        *__s = 0;
        return __s;
    } else {
        extern char *__utoa_ncheck (unsigned int, char *, unsigned char);
        return __utoa_ncheck (__val, __s, __radix);
    }
}
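The same __builtin_constant_p trick could route a compile-time radix of 10 to a dedicated routine. The sketch below shows the idea with GCC-style builtins; my_utoa, my_utoa10 and convert are hypothetical names, and my_utoa10 is only a placeholder that reuses the generic loop where a real optimized routine would go. It is not a proposed patch to the header.

```c
#include <limits.h>

/* Stand-in generic conversion (hypothetical helper): one division per digit. */
static char *convert(unsigned int val, char *s, unsigned int radix)
{
    char tmp[CHAR_BIT * sizeof val];   /* enough room for radix 2 */
    int n = 0;
    char *p = s;

    do {
        unsigned int d = val % radix;
        tmp[n++] = (char)(d < 10 ? '0' + d : 'a' + (d - 10));
        val /= radix;
    } while (val != 0);

    while (n > 0)                      /* copy digits back in reading order */
        *p++ = tmp[--n];
    *p = '\0';
    return s;
}

/* Hypothetical dedicated entry point; placeholder body for illustration. */
static char *my_utoa10(unsigned int val, char *s)
{
    return convert(val, s, 10u);
}

/* Dispatcher: when the radix is a compile-time constant equal to 10 (and the
 * call is inlined with optimization on), the call resolves to my_utoa10;
 * otherwise it falls through to the generic path. Both give the same string. */
static inline char *my_utoa(unsigned int val, char *s, int radix)
{
    if (radix < 2 || radix > 36) {
        *s = '\0';
        return s;
    }
    if (__builtin_constant_p(radix) && radix == 10)
        return my_utoa10(val, s);
    return convert(val, s, (unsigned int)radix);
}
```

Callers keep writing my_utoa(v, buf, 10) as before; only the symbol that gets linked changes, so code using a variable radix is unaffected.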
As I understand it, the purpose of this thread is to develop some well-tested, optimized alternatives for such radices (starting with 10) and propose them for upstream integration. Should that eventually fail, they would at least serve as de facto alternatives for whoever needs tight and/or fast code for these features.
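As a possible starting point for radix 10, here is one well-known division-free approach: subtract powers of ten and count. This is a sketch under my own assumptions (16-bit unsigned input, the hypothetical name utoa10_sub), not a tuned routine and not something measured yet; whether it beats the libc figures above is exactly what would need benchmarking.

```c
#include <stdint.h>

/* Hypothetical name. Converts a 16-bit unsigned value to decimal without
 * any division: each digit is found by repeated subtraction. */
char *utoa10_sub(uint16_t val, char *s)
{
    static const uint16_t pow10[] = { 10000, 1000, 100, 10 };
    char *p = s;
    uint8_t started = 0;   /* becomes 1 once a non-zero digit has been emitted */
    uint8_t i;

    for (i = 0; i < 4; ++i) {
        char digit = '0';
        /* At most 9 subtractions per position (6 for the leading one). */
        while (val >= pow10[i]) {
            val -= pow10[i];
            ++digit;
        }
        /* Skip leading zeros, but keep zeros inside the number. */
        if (digit != '0' || started) {
            *p++ = digit;
            started = 1;
        }
    }
    *p++ = (char)('0' + val);   /* units digit is always emitted */
    *p = '\0';
    return s;
}
```

The buffer needs 6 bytes for the worst case ("65535" plus the terminator). An ltoa equivalent would extend the table to 32-bit powers of ten and handle the sign separately.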