How to force gcc to use only one copy of string?

Hi,
I wonder whether this is possible or whether I'm doing something wrong... Let's assume I have the same string "Cancel" in several places in my program. The string lives entirely in FLASH, in the PROGMEM section. So if the string appears a couple of times in my program, I end up with multiple "Cancel" strings in FLASH. How can I solve this? I could define one file with all the strings I want to use, but that is not very handy.
In case it matters: I use g++ (compiling a C++ program), and I redefined PROGMEM:
#define PROGMEM __attribute__(( section(".progmem.data") ))
Any ideas will be greatly appreciated.

You have already identified the basic solution.

I agree with wg0z.

GCC has no idea about strings, it's all just "data".

I'd use a file with the strings (easy).

But you could also write a "preprocessor" that searches for strings, e.g. "Cancel" (hint: ampersand).
It scans a linked list (struct) for a match.

If
{
  there is a match, replace the string with "labelvalue".
}
else
{
  assign it a "labelvalue", e.g. Str000001 for the first string met, and add the "Cancel" string and the index 000001 (labelindex) to the "current" linked list member. Then increment labelindex and point to the "next" member.

  Replace the "newly found" string with "labelvalue".

  Then add the real string to a MyStrings.c, and maybe to a MyStrings.h ...
  E.g.
  const prog_char	Str000001[]		= "Cancel";

}

Done ....

If you are not careful you could be bitten by ampersands in "C comment blocks".

I did write such a thing for an IBM-S370 C compiler, where the stupid linker only took the first 8 characters as significant.

I just used it to replace function & variable names (labels) with Lxxxxxxx.

That way we could get POV-Ray compiled on a 370 and run it there ... (He...He) it was pretty fast on an S370 back in '91 :-) , nowadays I'd expect a fast PC would match the S370 for floating point.

/Bingo
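
The label-assigning step described above can be sketched roughly as follows. This is only an illustration, assuming the scanner has already pulled the text of each literal out of the source. `StringTable`, `label_for` and `emit_definitions` are made-up names, a `std::map` stands in for the linked list, and plain `const char` is used instead of `prog_char` so the sketch builds on a PC.

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: hands out one label per distinct string literal.
class StringTable {
public:
    // Returns the existing label for `text`, or assigns the next free one.
    std::string label_for(const std::string& text) {
        auto it = labels_.find(text);
        if (it != labels_.end()) return it->second;   // match: reuse the label
        char buf[16];
        std::snprintf(buf, sizeof buf, "Str%06u", next_index_++);
        order_.push_back(text);                       // remember first-seen order
        return labels_[text] = buf;
    }

    // Produces the lines that would go into a generated MyStrings.c.
    // Note: the text is emitted as-is; escaping of quotes and backslashes
    // is left out of this sketch.
    std::string emit_definitions() const {
        std::string out;
        for (const std::string& text : order_)
            out += "const char " + labels_.at(text) + "[] = \"" + text + "\";\n";
        return out;
    }

private:
    std::map<std::string, std::string> labels_;  // literal text -> label
    std::vector<std::string> order_;             // literals in first-seen order
    unsigned next_index_ = 1;
};
```

Each occurrence of a literal in the source would then be replaced by whatever `label_for` returns, so repeated literals end up sharing one label and one definition.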

Quote:
GCC has no idea about strings, it's all just "data".

Not true, the optimizer is capable of finding repeated constant strings and consolidating them.

Regards,
Steve A.

The Board helps those that help themselves.

#ifndef string_hey_guys_sup
#define string_hey_guys_sup
string s = "hey guys 'sup?";
#endif

Worst solution ever.. but I think it would work. :P

Koshchi wrote:
Quote:
GCC has no idea about strings, it's all just "data".

Not true, the optimizer is capable of finding repeated constant strings and consolidating them.

I suppose you can back that up with an example :-)

/Bingo

Thanks for your responses. I thought gcc could manage this situation; it is quite simple. So I disagree with Bingo600. We are talking about constants in FLASH memory, whose optimization is very simple, and at the very least the linker has all the data necessary to do this. I don't want to create a metacompiler to handle this particular situation, as that is not a portable solution. I rather think it is something gcc should deal with, and I'm really surprised that there is no built-in mechanism.

This from the GCC user manual (Manuals eh? I mean why would you? ;-)):

Quote:
-fmerge-constants
Attempt to merge identical constants (string constants and floating point constants) across compilation units.
This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.

Enabled at levels -O, -O2, -O3, -Os.


(note however the "if the assembler/linker support it")

BTW the following entry on that page of the manual is:

Quote:
-fmerge-all-constants
Attempt to merge identical constants and identical variables.
This option implies -fmerge-constants. In addition to -fmerge-constants this considers e.g. even constant initialized arrays or initialized constant variables with integral or floating point types. Languages like C or C++ require each variable, including multiple instances of the same variable in recursive calls, to have distinct locations, so using this option will result in non-conforming behavior.

That does not say that it's enabled in -O1/2/3/s cases.
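
One way to read the difference between the two manual entries, sketched for a host compiler (the function and array names are made up for illustration): identical string literals may legally share storage, while two named arrays are separate objects that a conforming build has to keep apart.

```cpp
// Two identical literals: the implementation may or may not store them
// once, so the result of this comparison is unspecified.
bool literals_share_storage() {
    const char* a = "Cancel";
    const char* b = "Cancel";
    return a == b;
}

// Two named arrays are distinct objects and must have distinct addresses.
// Folding these into one copy is what the manual calls non-conforming,
// i.e. the territory of -fmerge-all-constants.
const char cancel_a[] = "Cancel";
const char cancel_b[] = "Cancel";

bool named_arrays_are_distinct() {
    return static_cast<const void*>(cancel_a) !=
           static_cast<const void*>(cancel_b);
}
```

In a standard-conforming build `named_arrays_are_distinct()` is true, whereas `literals_share_storage()` can go either way depending on compiler, options and target.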

I've tried to compile my C++ program with -fmerge-all-constants, but my constants are still kept separately. So this option is broken in the AVR port, or at least in the avr-g++ port.

Use code generation.
In each file that uses a string,
refer to the string as MyStrings::fred.
Include a structured comment with a line of the form
fred "string"
Use tools to collect all those comments and generate MyStrings.h and MyStrings.cc .
All the members should be static.
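
A rough sketch of the "collect the comments" step, assuming one structured comment per line. The `//@str` marker and the function name `definition_from_comment` are invented for this example; only the `fred "string"` part comes from the description above.

```cpp
#include <string>

// Assumed comment format (hypothetical marker):  //@str fred "Cancel"
// Returns the definition line for the generated MyStrings.cc, or an
// empty string if `line` is not such a comment.
std::string definition_from_comment(const std::string& line) {
    const std::string marker = "//@str ";
    if (line.compare(0, marker.size(), marker) != 0) return "";

    const std::size_t name_end = line.find(' ', marker.size());
    if (name_end == std::string::npos) return "";

    const std::size_t open = line.find('"', name_end);
    const std::size_t close = line.rfind('"');
    if (open == std::string::npos || close <= open) return "";

    const std::string name = line.substr(marker.size(), name_end - marker.size());
    const std::string text = line.substr(open, close - open + 1);  // keeps the quotes
    return "const char MyStrings::" + name + "[] PROGMEM = " + text + ";";
}
```

A real tool would also have to drop duplicate names and write the matching declarations into MyStrings.h.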

Moderation in all things. -- ancient proverb

TFrancuz wrote:
I've tried to compile my C++ program with -fmerge-all-constants, but my constants are still kept separately. So this option is broken in the AVR port, or at least in the avr-g++ port.

But you have not declared it as a constant. You have only told it that it resides in the progmem address space; this does not make it a constant in the eyes of the compiler.

The compiler doesn't know the difference between the address spaces, only that they are distinct regions. These regions may, or may-not, be writable at some point. [actually C as a whole doesn't even have a concept of memory regions, this is an extension to GCC in order to support the different memory spaces of some micros like the AVR]

Try declaring them as "const" as well, to actually make them constants. [may also need to make them static]

Writing code is like having sex.... make one little mistake, and you're supporting it for life.

I've declared them as constants, because I used the PSTR macro from avr/pgmspace.h, which is defined as follows:
# define PSTR(s) ((const PROGMEM char *)(s))

Quote:

So this option is broken

A matter of interpretation, as I see it. In the quotes from the documentation that Cliff gave I read one sentence beginning "attempt" and one formulation beginning "if..".

Quote:
I can define one file with all strings I want to use, but it is not very handful.

Why? If the string actually "is the same string" but set up in several places then you have redundancy, and even if the compiler might remove the bad memory footprint effects of that, you still have the problem of maintaining redundant things. If you want to change one string that is actually used in several places then your desired solution will force you to change it in many places. And if you miss one place you will not get an error, but a misbehaviour at run time. Keeping strings that are the same in only one place removes this problem. (Since you're using C++ you have the option to locate all those strings in a class that all other classes using the strings can inherit. Unlike e.g. C#, C++ supports multiple inheritance.)

As of January 15, 2018, Site fix-up work has begun! Now do your part and report any bugs or deficiencies here

No guarantees, but if we don't report problems they won't get much of  a chance to be fixed! Details/discussions at link given just above.

 

"Some questions have no answers."[C Baird] "There comes a point where the spoon-feeding has to stop and the independent thinking has to start." [C Lawson] "There are always ways to disagree, without being disagreeable."[E Weddington] "Words represent concepts. Use the wrong words, communicate the wrong concept." [J Morin] "Persistence only goes so far if you set yourself up for failure." [Kartman]

Quote:

So this option is broken in AVR port,

"broken" suggests that there's been an attempt to implement it that either never worked or has stopped working. My reading of that manual entry was that it was optional functionality left to the decision of the assembler/linker author. Something that's not been implemented cannot yet be broken.

By the way, remind me again is:

(const PROGMEM char *)

a constant pointer or a pointer to const? (serious question I can't remember without checking). The compiler needs to know that it's the data that is constant, not what's pointing at it.

Quote:

By the way, remind me again is:
Code:
(const PROGMEM char *)

a constant pointer or a pointer to const? (serious question I can't remember without checking).


The right to left reading trick gives "a pointer to a char that is const".
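
The right-to-left reading can be checked at compile time on any C++11 compiler. In this sketch the PROGMEM attribute is left out, since it does not change which part is const; the alias names and `pointee_is_const` are made up for the example.

```cpp
#include <type_traits>

using PtrToConst = const char*;   // pointer to const char: the data is read-only
using ConstPtr   = char* const;   // const pointer to char: the pointer is read-only

// True when the thing pointed at is const, regardless of the pointer itself.
template <typename P>
constexpr bool pointee_is_const() {
    return std::is_const<typename std::remove_pointer<P>::type>::value;
}

static_assert(pointee_is_const<PtrToConst>(), "const char* points at const data");
static_assert(!std::is_const<PtrToConst>::value, "the pointer itself is not const");
static_assert(std::is_const<ConstPtr>::value, "char* const is a const pointer");
static_assert(!pointee_is_const<ConstPtr>(), "its data is not const");
```

So the cast in PSTR does mark the data as const, at least as far as the type system is concerned.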

Maybe it's a matter of interpretation... let's assume so. OK, so in what situations does this option work? Hard to imagine. So is it at least a missed optimization? Or something that should be corrected?
OK, I know that I have the gcc sources and could do that myself... but it is hard to get any support from the gcc maintainers. Once I tried to move vtables in avr-gcc from SRAM to FLASH memory, and I couldn't get any support. Disappointing...

Why don't I want to use one file with all the strings declared? I can do that. But I have a bunch of files with GUI classes, and some of them use typical strings such as "Cancel", "Ok" and so on. So I have to move all string declarations to one file, which is not so bad... but I have to define in one file all the strings I will ever use in my whole library. If the user doesn't use all the widgets, he will end up with a bunch of unused strings... I suppose that if the compiler cannot merge constants, which is a simple task, it cannot automatically remove unused ones either... at least not without some magical switches and options.

Quote:

in what situations this option works? Hard to imagine. So at least it is misoptimization? Or something which should be corrected?

I imagine it's something that should be IMPLEMENTED if it hasn't been in the AVR port of binutils.

If I were you I'd bite the bullet and take a one-time hit to collect your strings in one place - it'll make the translator's job much easier later on!

BTW just to note that this IS implemented for x86:

root@DevSystem:~# cat stest.c
#include 

const char string1[] = "Hello";
const char string2[] = "test";
const char string3[] = "Hello";

int main(void) {
  while(1) {
  }
}
root@DevSystem:~# gcc stest.c -g -o stest

when built the objdump -s output includes:

Contents of section .rodata:
 8048478 03000000 01000200 48656c6c 6f007465  ........Hello.te
 8048488 73740048 656c6c6f 00                 st.Hello.       

But when built:

root@DevSystem:~# gcc stest.c -fmerge-all-constants -g -o stest

it yields:

Contents of section .rodata:
 8048478 03000000 01000200 48656c6c 6f007465  ........Hello.te
 8048488 737400                               st.             

What happens if you put the strings into RAM? (can't try now myself)

What happens if you define strings using char s[]="string" rather than char * s = "string"?

JW

This came up about a year ago:
https://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&t=74835

But on a bit of a side note I did notice
that printf() does some really good optimization on literal strings, but only if your string ends in a newline.

i.e.

printf("hello\n");

is optimized to a simple puts() call
but

printf("hello");

does a more complicated varargs
printf() call.

Seems odd to me, both are simple strings.

--- bill

Quote:

Seems odd to me, both are simple strings.

puts() automatically appends a newline to the end of the given string, while printf() does not. If you use "Hello" in printf() the compiler cannot translate that into a puts() call, as that would append the \n to the end of the string when you have not specifically requested it.

The "puts()-adds-newline" behavior tripped me up a few times.

- Dean :twisted:

Make Atmel Studio better with my free extensions. Open source and feedback welcome!

abcminiuser wrote:
Quote:

Seems odd to me, both are simple strings.

puts() automatically appends a newline to the end of the given string, while printf() does not. If you use "Hello" in printf() the compiler cannot translate that into a puts() call, as that would append the \n to the end of the string when you have not specifically requested it.

The "puts()-adds-newline" behavior tripped me up a few times.

- Dean :twisted:

Yep, totally forgot about that.

But it could have called fputs(str, stdout);
instead.....

--- bill
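
For reference, the two calls side by side, as a small host-side sketch (the wrapper names are made up). Whether a particular GCC version actually rewrites the newline-free printf() into fputs() is not something this thread establishes.

```cpp
#include <cstdio>

// Same output as std::printf("%s\n", s): puts() appends the newline itself.
int emit_line(const char* s) {
    return std::puts(s);
}

// Same output as std::printf("%s", s): fputs() writes the string unchanged,
// with no newline added.
int emit_text(const char* s) {
    return std::fputs(s, stdout);
}
```

Both functions return a non-negative value on success, so neither needs the format-string parsing that a full printf() call drags in.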

JohanEkdahl wrote:
Why? If the string actually "is the same string" but set up in several places then you have redundancy, and even if the compiler might remove the bad memory footprint effects of that, you still have the problem of maintaining redundant things. If you want to change one string that is actually used in several places then your desired solution will force you to change it in many places. And if you miss one place you will not get an error, but a misbehaviour at run time. Keeping strings that are the same in only one place removes this problem. (Since you're using C++ you have the option to locate all those strings in a class that all other classes using the strings can inherit. Unlike e.g. C#, C++ supports multiple inheritance.)
Good point, for which reason, I withdraw my previous suggestion.
Strings which by design should be the same should be the same member.
For strings that are the same by coincidence one can do something like this:
class MyStrings {
public:
    // comefrom walt
    static const PROGMEM char fred[];

    ...

    #define walt fred
} ;

It's possible a namespace would work better.
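
A possible namespace version, as a sketch only and not tested on avr-g++. The empty `PROGMEM` fallback is an assumption so that it builds on a PC; on AVR the macro would come from avr/pgmspace.h. The alias is a constexpr pointer instead of a `#define`, so it stays inside the namespace.

```cpp
#ifndef PROGMEM
#define PROGMEM   // host fallback (assumption); on AVR use <avr/pgmspace.h>
#endif

// --- would live in MyStrings.h ---
namespace MyStrings {
    extern const char fred[];          // declaration only, no storage here

    // "walt" is the same text by coincidence: a second name, no second copy.
    constexpr const char* walt = fred;
}

// --- would live in exactly one .cc file ---
const char MyStrings::fred[] PROGMEM = "Cancel";
```

Because `fred` is only declared in the header and defined once, every file that includes the header refers to the same single copy of the text.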

Moderation in all things. -- ancient proverb

Ok, I can use a class with predefined strings. But it doesn't solve the problem of unnecessary strings. As I said, it is a library; the end-user program sometimes doesn't use all the strings, so the class must somehow know which strings are used... not a big problem to solve. But I think it would be more elegant if the compiler took care of it; it even has the appropriate options...

Quote:

Ok, I can use a class with predefined strings. But it doesn't solve the problem of unnecessary strings. As I said, it is a library; the end-user program sometimes doesn't use all the strings, so the class must somehow know which strings are used...

Oh. That's a different matter. The thread started with the problem of duplicate identical strings.

Yes Johan, I started with duplicated strings, and have ended with possibly unnecessary strings as a result of proposed workaround. Nice.

[Falling back to C now...] What if you place every string in a static in a function that returns a pointer to its string? And then slap -ffunction-sections and -Wl,--gc-sections (or whatever the names are) onto that. Will the strings be garbage collected along with unreferenced functions?

Quote:

Will the strings be garbage collected along with un-referenced functions?

My money is on 95% "no". Initialisers of statics still go into .data and are not part of .text or .named_function_section (in the function-sections case). Unless you were talking about "PROGMEM" copies in flash only, but there's a clue in the name there too: they won't be tied to the function section, they'd be in ".progmem".

Yes, of course, Cliff. That makes very much sense.

TFrancuz wrote:
Ok, I can use a class with predefined strings. But it doesn't solve the problem of unnecessary strings. As I said, it is a library; the end-user program sometimes doesn't use all the strings, so the class must somehow know which strings are used... not a big problem to solve. But I think it would be more elegant if the compiler took care of it; it even has the appropriate options...
Just put every string in a separate library member.

Moderation in all things. -- ancient proverb

Sorry, can you give me an example of how to do it? I don't understand what you mean.

file1.c

char * ret_str1(void) {
 static char str1[] = { "Hello" };
 return str1;
}

file2.c

char * ret_str2(void) {
 static char str2[] = { "World" };
 return str2;
}

etc.

Compile these to file1.o, file2.o. Then, as per the manual:

http://www.cs.mun.ca/~paul/cs472...

use avr-ar to combine all the .o's into a .a

avr-ar rcs strings.a file1.o file2.o

Now link against the .a

If your program calls ret_str2() but not ret_str1() then the binary of file2.c will be built in but file1.c/.o will never be used.

clawson wrote:
file1.c

char * ret_str1(void) {
 static char str1[] = { "Hello" };
 return str1;
}

He wants his strings in flash.
He is using c++.
I hope he doesn't want those function calls.
string1.cc:
#include "MyStrings.h"
static const char PROGMEM MyStrings::string1[]="string1";

string_2.cc:
#include "MyStrings.h"
static const char PROGMEM MyStrings::string_2[]="string 2";

Moderation in all things. -- ancient proverb

Thank you for your explanation. It helps me a lot. But I suspect that Skeeve's solution has a small error: the static keyword implies internal linkage, so the label is not visible to any external module.
Ok, so I have a solution, not perfect, but acceptable; at least I can get what I want.
In the meantime I've checked whether AVR-gcc 4.4 can automatically merge strings. Unfortunately the problem is exactly the same, so what do you think, is it worth sending a bug report? If yes, where is the correct place?

TFrancuz wrote:
Thank you for your explanation. It helps me a lot. But I suspect that Skeeve's solution has a small error: the static keyword implies internal linkage, so the label is not visible to any external module.
In this case, it means a static member of a class:
no corresponding MyStrings object is required.
The member is visible.
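
To spell out where the keyword goes: in standard C++, `static` is written on the declaration inside the class, and the definition outside the class is written without it (a conforming compiler rejects `static` there). A minimal sketch, with an empty `PROGMEM` fallback assumed so it builds on a PC, and the class layout guessed from the file names above:

```cpp
#ifndef PROGMEM
#define PROGMEM   // host fallback (assumption); on AVR use <avr/pgmspace.h>
#endif

// --- MyStrings.h ---
class MyStrings {
public:
    // 'static' here means "class member, no object needed".
    // The members have external linkage and are visible to other modules.
    static const char string1[];
    static const char string_2[];
};

// --- string1.cc: the definition carries no 'static' ---
const char MyStrings::string1[] PROGMEM = "string1";

// --- string_2.cc ---
const char MyStrings::string_2[] PROGMEM = "string 2";
```

With each definition in its own source file and the objects collected in a library archive, only the members that are actually referenced should get linked in.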
Quote:
Ok, so I have a solution, not perfect, but acceptable; at least I can get what I want.
In the meantime I've checked whether AVR-gcc 4.4 can automatically merge strings. Unfortunately the problem is exactly the same, so what do you think, is it worth sending a bug report? If yes, where is the correct place?
I'm not sure where to send it,
but I think the term to use is "request for enhancement".

Moderation in all things. -- ancient proverb