[request for comment] strlen_P() enhancement

Last post
26 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Proposed enhancement to strlen_P():

#define my_strlen_P(s)                         \
  __builtin_choose_expr(                       \
    __builtin_constant_p(__builtin_strlen(s)), \
    __builtin_strlen(s),                       \
    strlen_P(s)                                \
  )

This should avoid calling the strlen_P function if the string size is known to the compiler beforehand (i.e. if the parameter to strlen_P is not a changing pointer). This is in fact how strlen() works anyway (through a mechanism in __builtin_strlen() hidden to the user).

Of course, the real change would involve also to rename the current library function strlen_P() to say __strlen_P() and then this macro would be part of pgmspace.h, calling __strlen_P() in cases where appropriate.

Comment, please.

JW

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Do you want to know the strlen() of the anonymous string or the sizeof() the anonymous string?

Surely the strlen() will just be sizeof() - 1. And you can use a macro for it.

Just exactly what are you trying to do?
If it is anonymous, you would need a new instance for each reference. e.g.

printf_P(PSTR("anonymous format"));
len = sizeof(PSTR("anonymous format")) - 1;
len = strlen_P(PSTR("anonymous format"));

if the string has an identifier:

char identifier[] PROGMEM = "identified string";
len = sizeof(identifier) - 1;
len = strlen_P(identifier);

Untested. But the non-anonymous version looks preferable. And should be portable across compilers. (just by using appropriate storage types)

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

David,

PSTR("string") is seen as a pointer, so (in avr-gcc) sizeof(PSTR("string")) == 2. That's why sizeof() is not an universal operator to be recommended for this case. C, as a true bastard "middle-level" language, has implemented strings through a kludge; that's why it's better to use the dedicated functions from string.h (and the equivalents from pgmspace.h).

strlen_P(PSTR("anything")) results if function call, which actually scans through the string in FLASH; but as the string in FLASH can't change, this is a known constant compile-time. That is what I seek to remedy. strlen() works in this way already; I think it makes sense to fix strlen_P() to work in the same way.

Portability is no issue in case of strlen_P(), as it is non-standard avr-libc -specific function.

I am not discussing alternative ways of finding a string's length from the user's point of view; I am suggesting an enhancement for an existing library function.

JW

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Jan,

The following source

char identifier[] PROGMEM = "identified string";
size = sizeof(identifier) - 1;

produces this:

.global	identifier
	.section	.progmem.data,"a",@progbits
	.type	identifier, @object
	.size	identifier, 18
identifier:
	.string	"identified string"
.global	size
	.data
	.type	size, @object
	.size	size, 2
size:
	.word	17

Admittedly 'size' is a variable in the DATA segment. It is initialised to the correct value.

I see no problem with using a macro for the operation. The non-anonymous string is declared as an array not as a pointer.

I would agree that trying to get the length of an anonymous pointer to an anonymous string requires strlen_P(). But presumably you only want a single anonymous string, or else you are just wasting FLASH. (and asking for typos)

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

David,

Sorry, but I don't understand where are you aiming.

1. I am asking for comments on a proposed enhancement to strlen_P(). If a compiler would not recognize miltiplication by a constant which is power of 2, and I would propose a fix which would convert such multiplications to shifts, would you recommend just to teach users to use explicit shifts?

2. using sizeof() on strings as a methodology is asking for trouble - one day you'll forget (as you indeed did above) that they don't work on anonymous strings. The correct methodology for strings is to use the dedicated functions prototyped in string.h (and, here, for FLASH strings, pgmspace.h).

Do I misunderstand your comments?

JW

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Just to correct myself, my_strlen_P(PSTR("anything")) still results in function call, but the result is correct. A mere macro just could not fully replace a fully blown support for memory classes built into the compiler.

The following:

char s[] PROGMEM = "anything";
size = my_strlen_P(s);

results in size being loaded by constant; which is not the case if strlen_P(s) is used.

JW

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Looks fine. I use a similar method in LUFA for determining the bit pattern required for the endpoint configuration registers from the user-supplied bank size. If the input value is constant (via the same macro you use) the result is computed at compile time, otherwise it is replaced by a function call.

- Dean :twisted:

Make Atmel Studio better with my free extensions. Open source and feedback welcome!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Jan,

What do you actually want to achieve?

As I see it, I could equally well:

#define my_strlen(char_array) (sizeof(char_array)-1)

That macro would give you the string length of any flexible char array (initialised with a string). Both in Flash or SRAM.

It would always be a constant known at compile time. However it would not always be true if you modified a SRAM array. And it only works on arrays, not on pointers.

But I would assume that you would be using a regular strlen() on SRAM anyway. If you wanted the strlen_P() of a pointer that is known at compile time, you would use the sizeof(*ptr)-1. And it would be a lot clearer to reference the non-anonymous array in the first place.

If you wanted to know the length of a string that was pointed to by a SRAM pointer, you would have to use a function. It is not known at compile time.

I am guessing that you have a load of strings in Flash, and you want to have a table of the individual lengths.

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

david.prentice wrote:
Jan,

What do you actually want to achieve?


Better strlen_P().
david.prentice wrote:

As I see it, I could equally well:

#define my_strlen(char_array) (sizeof(char_array)-1)

That macro would give you the string length of any flexible char array (initialised with a string). Both in Flash or SRAM.


But, as shown above, that would fail with a string literal.

strlen()/strlen_P() both work both on string variables and literals.

As long as there are no generic pointers and true support for memory classes ("named sections" in gcc lingo) as we know it from the '51 and other small-mcu compilers, there's no real need to have functions working *both* in FLASH and RAM. The user HAS to be aware of the difference. The very moment when there will be such support, pgmspace.h will be past anyway.

David wrote:
But I would assume that you would be using a regular strlen() on SRAM anyway.

Exactly.

The user of avr_libc is guided to replace constant strings in RAM with constant strings in FLASH. At the same time, he is told to use the "_P" versions of functions traditionally prototyped in string.h; so, having been accustomised to use strlen() for strings in RAM, he should be seeking to use strlen_P() for strings in FLASH.

David wrote:
If you wanted the strlen_P() of a pointer that is known at compile time, you would use the sizeof(*ptr)-1.

Oh, no. sizeof(*PSTR("anything")) == sizeof(char) == 1.

David wrote:
And it would be a lot clearer to reference the non-anonymous array in the first place.
Sure; just you cannot exert your teacher's will on the users.

If you wanted to know the length of a string that was pointed to by a SRAM pointer, you would have to use a function. It is not known at compile time.

David wrote:
I am guessing that you have a load of strings in Flash, and you want to have a table of the individual lengths.

No. I have no use for it at the moment. I just came through this and though I will throw in a minor enhancement. I was naively under the impression that enhancements to avr-libc may be welcome.

Jan

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

abcminiuser wrote:
Looks fine.
Thanks.

However, I still wait that somebody says "it will fail when XYZ"... ;-)

JW

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'm okay with the change, although I would like some kind of regression test. I believe that is what you are asking for, Jan.

I was originally going to say that I never use strlen_P, but then I did a search on my code. By golly, I use it! I'd forgotten about them completely!

There are two situations that I use it. First, is in my general output routines where I can pass a pointer to a PSTR back to the routine for printing:

   case cmdPT_PSTR:
      if ( ! g_MainAbortPending && ( *pBufLen >= strlen_P( (char*) pCmdPkt->param[parm] ) + 4 ) )
      {
         strcat_P( *ppBuf, (char*) pCmdPkt->param[parm] );
      }
      break;

The other place is where I have matched PSTR in an incoming string and want to skip over it:

   /* See if we got an error */
   pC = strstr_P( pSpbRxBuffer, SpbError );

   if ( pC != NULL )
   {
      /* Skip over the error flag and copy the string to the error string buffer */
      pC += strlen_P( SpbError );
      strcpy( pSpbErBuffer, pC );

      return SPB_TE_STRING_ERR;
   }

In the second case, your improvement would help.

Just a couple of RealLife(tm) examples to fuel the fires.

Stu

Engineering seems to boil down to: Cheap. Fast. Good. Choose two. Sometimes choose only one.

Newbie? Be sure to read the thread Newbie? Start here!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

stu_san wrote:
I'm okay with the change, although I would like some kind of regression test. I believe that is what you are asking for, Jan.

[...]

Just a couple of RealLife(tm) examples to fuel the fires.


Yes, that's it. As I said above in reply to David, I don't have a real-world use for it at the moment. Thanks a lot, Stu.

Actually, your first example made me think again: we should perhaps be more rigorous in properly casting/using pointers to items in FLASH. You should cast that pointer to PGM_P (const char PROGMEM *) rather than just char *, am I right? And then the compiler should perhaps be more rigorous in what it accepts as parameters to "_P" functions, maybe checking the PROGMEM attribute in the same way as it checks type qualifiers match...?

I think the same was requested by a user here the other day...

Jan

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

david.prentice wrote:

#define my_strlen(char_array) (sizeof(char_array)-1)

That macro would give you the string length of any flexible char array (initialised with a string).

Repetition still doesn't make it true.

Your "any" claim fails with

char s[] = "123\0abc";

Stealing Proteus doesn't make you an engineer.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

ArnoldB wrote:

Your "any" claim fails with

char s[] = "123\0abc";


Oh, please, Arnold. This is pervert. C is supposed to be used by "informed users" (read it "hackers" if you wish); it is FAR from being "bombenfest".

Jan

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

wek wrote:
Oh, please, Arnold. This is pervert. C is supposed to be used by "informed users" (read it "hackers" if you wish); it is FAR from being "bombenfest".
I often mop-up failed projects. I can assure your people write code like this. The most common incarnation being

char s[] = "1234\0";

Stealing Proteus doesn't make you an engineer.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Arnold's example does, in fact, blow your "string size is known to the compiler beforehand" out of the water. The only reasonable thing that could be known at compile time is the size of the memory reserved for the variable.

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Koshchi wrote:
Arnold's example does, in fact, blow your "string size is known to the compiler beforehand" out of the water. The only reasonable thing that could be known at compile time is the size of the memory reserved for the variable.

Contrary.
Note that I am using __builtin_strlen().
This:
volatile uint8_t k;
char sx[] PROGMEM = "Arnold\0B";
[...]
    k = my_strlen_P(sx);

yields this:

  40 0030 86E0      		ldi r24,lo8(6)
  41 0032 8093 0000 		sts k,r24
[...]
  79               	.global	sx
  82               	sx:
  83 0008 4172 6E6F 		.string	"Arnold"
  83      6C64 00
  84 000f 4200      		.string	"B"

Is it fair enough?

JW

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
Note that I am using __builtin_strlen().
But David's example did not. His macro will fail on that example.

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

wek wrote:
Proposed enhancement to strlen_P():

#define my_strlen_P(s)                         \
  __builtin_choose_expr(                       \
    __builtin_constant_p(__builtin_strlen(s)), \
    __builtin_strlen(s),                       \
    strlen_P(s)                                \
  )

Doesn't work in C++ as __builtin_choose_expr is not implemented, and make it an inline function:

inline size_t my_strlen_P(const char *s) {
  return __builtin_constant_p(__builtin_strlen(s))
     ? __builtin_strlen(s) : strlen_P(s);
}
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Humm, hummm...
I've tweaked it a bit more:

__attribute__((__always_inline__)) static inline size_t my_strlen_P(PGM_P s);

static inline size_t my_strlen_P(PGM_P s) {
  return __builtin_constant_p(__builtin_strlen(s))
     ? __builtin_strlen(s) : strlen_P(s);
} 


and, surprise, all of the sudden it started optimizing even the my_strlen_P(PSTR("qwerty"))!

Stu, and others, may I ask you to give it a try in the wild?

I would then attach it to https://savannah.nongnu.org/bugs/?28058

Thanks.

Jan

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

wek wrote:
Stu, and others, may I ask you to give it a try in the wild?
I hope to give it a shot sometime today. Kinda busy on a different (embedded Linux) project so I may not get to it today, but will definitely tomorrow.

Stu

Engineering seems to boil down to: Cheap. Fast. Good. Choose two. Sometimes choose only one.

Newbie? Be sure to read the thread Newbie? Start here!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

wek wrote:
Stu, and others, may I ask you to give it a try in the wild?
Jan

Excellent. Appears to behave correctly on a set of code I tried. Code is 25k LOCs on a MEGA2560 and lots of strings amounting to some 20kbytes. No problems observed.

This is one of those things that shouldn't need to be an explicit optimization, it should be a given where possible. There should be no reason to have to compute the length of a string in PROGMEM.

I did have to do some minor reorganization of the code.

Its stating the obvious, but if the strings are not visible in the current object module, the length can't be computed at compile-time. Solving this would be really neat!

With larger projects, there will always be multiple object modules and strings may be reused across them. As a result, I had all strings in one C module. Makes things simpler when it comes to supporting multiple languages. It takes a bit of careful thought to rearrange things, into a header-file, for example.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

dalew1 wrote:
Appears to behave correctly on a set of code I tried.

Thanks for trying.

dalew1 wrote:
This is one of those things that shouldn't need to be an explicit optimization, it should be a given where possible. There should be no reason to have to compute the length of a string in PROGMEM.

While the string itself in FLASH won't change, the *pointer* used in this function may be a variable, pointing to various FLASH strings. In this case, the length "calculation" is inevitable. Also, part of the FLASH may be used as a permanent configuration memory, storing for example the language-dependent strings, being updated through some self-programming mechanism (bootloader), after compilation. Here, functions like strlen/strcmp etc. working on FLASH "variables" may make sense.

JW

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I forgot to report on this - my ATmega2560 app (> 90K LOC) is running perfectly with the mod. I have both cases handled by the macro. Total of 5 uses of strlen_P in the code, but they are exercised pretty often in normal usage.

Stu

Engineering seems to boil down to: Cheap. Fast. Good. Choose two. Sometimes choose only one.

Newbie? Be sure to read the thread Newbie? Start here!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Arigato gozaimasu, Stu-san.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Domo, domo. :D

Stu

Engineering seems to boil down to: Cheap. Fast. Good. Choose two. Sometimes choose only one.

Newbie? Be sure to read the thread Newbie? Start here!