## Removing Dead Code From C++ Project


I'm trying to create a library with some code I reuse in most of my projects. I have some modules grouped in subdirectories of my working directory. In each of these modules I have one class and one global object of that class. I want to remove the unused objects and functions from the final .hex, so I'm compiling with -fdata-sections and -ffunction-sections, and linking with -Wl,--gc-sections. But it looks like some dead code is still linked. The disassembly generated by the Atmel Studio Simulator shows me some functions and constructors that shouldn't be there.

I found this discussion about my problem, but I didn't find a solution there: http://www.avrfreaks.net/forum/dead-code-removal-not-working

Moving my modules to an include path could help in some way? Any help?

Post an actual example (complete project) where this occurs. The linker does not make mistakes. It will garbage collect sections that have a 0 reference count. If sections are included then, however obscure, there IS a reference to them.

In that thread you linked to my post #12 shows it all working as intended in a small test program. So provide a test program like that where it is not working.
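One way such an "obscure" reference can arise is a table of function pointers. The sketch below is only an illustration with made-up handler names, not code from the thread: anything that reaches dispatch() also reaches the table, and through it both handlers, because the linker follows references between sections rather than run-time values (unless the compiler can fold the index away at compile time).

```cpp
#include <cstdint>

// Hypothetical handlers, named for illustration only.
static std::uint8_t handle_adc()  { return 1; }
static std::uint8_t handle_uart() { return 2; }

using Handler = std::uint8_t (*)();

// Storing the addresses in a table creates a reference to BOTH functions.
const Handler handlers[] = { handle_adc, handle_uart };

// Code that calls dispatch() references the table, and the table references
// both handlers, whichever index is actually passed at run time.
std::uint8_t dispatch(std::uint8_t index)
{
    return handlers[index]();
}
```

A program that only ever calls dispatch(0) can therefore still carry handle_uart() in its binary.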

stdio.h wrote:
Moving my modules to an include path could help in some way? Any help?
This has nothing to do with the compilation - it's the linking that does the garbage collect. The only mistake you might make at compile time is not passing the -fdata-sections and -ffunction-sections for every source file involved.

clawson wrote:
This has nothing to do with the compilation - it's the linking that does the garbage collect. The only mistake you might make at compile time is not passing the -fdata-sections and -ffunction-sections for every source file involved.

But what happens when we use a module from the avr libc in our project? I guess Atmel Studio will edit the makefile to include only that module instead of the whole libc, am I right? This is not a rhetorical question, it's just a guess, I really don't know how this works. But if I'm right, moving my library to an include directory would help, wouldn't it?

Are you talking about a real library (avr-ar etc) or just a collection of .h and .cpp files? (which is not a true library in the GCC sense of the word).

The majority of the code in AVR-LibC is true library code and is located in libgcc.a, libc.a and libm.a though some is just inlined macros from .h files. When  avr-gcc builds and links code there is an implied "-lgcc -lc -lm" added to the linker invocation to link against those libs.

For a true libXXX.a library created by avr-ar then the usual ld rules for lib member extraction apply. The libs themselves are not built with -ffunction-sections but that doesn't matter because each archive member is in a separate .o file and the linker will only bind to the .o's that are referenced.

So first say what you actually mean by "library".

Last Edited: Sun. Feb 12, 2017 - 04:24 PM

clawson wrote:
Are you talking about a real library (avr-ar etc) or just a collection of .h and .cpp files?

I'm talking about a real library, just like the majority of the code in AVR-LibC. What I have now is a bunch of .h and .cpp files in the project's directory and subdirectories.

I'm planning to make a compilation of the code I think can be reusable and generate a library (with avr-ar and all of that). As you said, the linker will only bind the .o's that are referenced.

Does creating and using a library, instead of continuing to use my raw collection of files, have any drawbacks? What are the pros and cons?

Last Edited: Sun. Feb 12, 2017 - 07:34 PM

Well for a true bin lib do it the way libc.a, libgcc.a and libm.a do it - one function per .o file and then archive the .o files with avr-ar ("rcs" options). Then when you link, only the archive members required will be pulled in to the link. Of course you face the same issue as libgcc.a/libm.a/libc.a in that to provide binaries for all 300+ AVR models you will need the 17 variants just as is done in AVR-LibC (which is the usual reason no one uses static libs for AVR apart from those in AVR-LibC).

Thanks, clawson. You helped me a lot.

Providing binaries for various models is indeed a big pain in the ass. I think I'll take the collection-of-files way of doing this.

I don't want to bother you, but this chat raises another question in my head: If I put these files (not compiled) into the include directory then the linker will only put the required modules into the link, doesn't it? I can have the benefits of linking just the modules I want, without the drawbacks of generating binaries for all avr models, am I right?

stdio.h wrote:
If I put these files (not compiled) into the include directory then the linker will only put the required modules into the link, doesn't it?
I'd suggest you read a thread in the Tutorial Forum called "Managing Large Projects". It seems to me you have some fundamental misunderstanding about how a C compiler works.

Here's a (one function) example:

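// adc.h
#include <stdint.h>

uint16_t ADC_read();

// adc.c
#include <avr/io.h>
#include "adc.h"

uint16_t ADC_read() {
    // Body assumed for illustration: start one conversion, wait, return the result.
    ADCSRA |= (1 << ADSC);
    while (ADCSRA & (1 << ADSC))
        ;
    return ADC;
}

// main.c
#include <avr/io.h>
#include "adc.h"

int main(void) {
    while (1) {
        PORTB = 0x55;
    }
}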

Let's say that adc.h and adc.c are actually located in C:\avr\libcode while main.c is located in c:\projects\avr\adcread along with project/makefiles etc.

There are now two issues:

1) How does the compiler get to "see" adc.h when it tries to compile main.c and comes to the #include line.

2) How is the code from adc.c built to create adc.o which will then link with main.o after it has been built.

For (1) the solution is the compiler's -I option. This can be used to say "as well as looking in the same directory as main.c for any #include's also take a look in this other directory over at ...". So the compiler options would have:

-I \avr\libcode

added. This ensures that the adc.h file will be found even though it's in a different place to the main.c where it's being used.

For (2) you need to arrange for the files that are to create the code to all be compiled so you arrange for the project to build:

.\main.c
\avr\libcode\adc.c

How you actually do that will depend on the kind of project/make solution you use. If it is AS7 then it has a very useful "Add as Link" which can add a "virtual link" to \avr\libcode\adc.c so it looks like it's alongside main.c but is actually held in a different place on disk.

Can I ask, why are you trying to create a binary module? If the code is your own for use in your own projects, why not just include it as code and have it built with the rest of your project?

There seems to be zero benefit to creating a binary library, beyond saving a few seconds now and again when you do a full rebuild, and many down sides. You can't change the compiler options like optimization or debug level, for example.

I'm just trying to understand what your needs are so I can suggest the best course of action.

Well in the old days (apart from shortening build times - though that's not really an issue for small AVR code) the advantage of libXXX.a and avr-ar was that the archive members were only pulled into the link as needed so libc.a may contain memcpy(), strcmp(), strtoul() and whatever else but if you only called memcpy() then only the contents of memcpy.o would be pulled from the .a file. For a "peripheral library" you might rely on the same - it contains ADC and UART functions but if you only use the UART ones then the ADC stuff is ignored and so on.

That was then. This is now. Now there are -ffunction-sections and --gc-sections so you can build and link all of uart* and ADC* in a project but at the end, when the linker notices that no ADC* function is actually called, it just garbage collects and dumps them anyway.

The other reason for giving mycode.h and libmycode.a was when you didn't want the person using the functions you documented in mycode.h to actually be able to see the source. Perhaps it uses proprietary methods or has some kind of security or similar. That argument might still apply I guess.

clawson,

I understand how the compiling and linking process works, but I was not aware of that "-I option". Thanks for that. That means if I put adc.cpp and adc.h in C:\avr\libcode and use the option -I C:\avr\libcode I could use the directive #include <adc.h> and my files would be compiled and linked with the main code, right? If so then how can I add this new include path in Atmel Studio for all future projects? It seems to me that I can set this directory only in the current project's configuration. Thanks again.

mojo-chan wrote:
Can I ask, why are you trying to create a binary module?

I don't want to do that anymore. I thought this was the usual way of creating and using a custom library, but clawson already opened my eyes to the cons of that approach.

Last Edited: Wed. Feb 15, 2017 - 05:56 PM

stdio.h wrote:
I could use the directive #include <adc.h>
No. You would use #include "adc.h". I lose track of the number of people I have seen who don't understand the significance of <> versus "" with #include!
stdio.h wrote:
and my files would be compiled and linked with the main code, right?
Wrong. All the -I achieves is to tell the C preprocessor another location to look in when it is looking for .h files. This has nothing to do with getting the code that the associated .c or .cpp provides actually built and linked. You achieve that by adding \path\to\file\foo.cpp to your list of build inputs. The compiler will then build that file just as it builds local main.c and other_functions.c or whatever. When it has created a .o file for each they are all fed together into the linker that then links them together.
stdio.h wrote:
If so then how can I add this new include path in Atmel Studio for all future projects? It seems to me that I can set this directory only in the current project's configuration. Thanks again.
Studio (because it is really just Microsoft Visual Studio) allows the creation of "project templates". So you set up an "empty" (or near empty) project with all your favourite selection of options and then export that as a template. Later you create new projects from the template and they inherit all the settings and files from the template.

Damn it! I'm confused now. I'm not a native English speaker, so I'm sorry for so many misunderstandings in a subject that should be so easy.

Let's simplify and make the things easier for me: Where do I have to put my .cpp and .h files and what flags do I need to set to be able to do #include <adc.h>  in all my projects from now on?

You must be tired of answering my questions, so if you could just link some documentation that helps me with my last question I would be glad.

Thanks.

clawson wrote:
I'd suggest you read a thread in the Tutorial Forum called "Managing Large Projects".
Did you do that?

There's nothing particularly "special" about any of this - it's a standard way to use a C compiler. The only "tricks" in locating files away from the main project files are:

1) so that .h files are found in #include's use -I to tell the C preprocessor additional places to look for .h

2) so that the .c/.cpp files are actually built, add them to the project's list of source files (in Studio, "Add as Link" does this)

Of course maybe you don't use Studio? If you just build with Makefiles then you likely have a "SRC = main.c" line? In which case you just modify that to be "SRC = main.c \location\you\gave\adc.c" to add the core adc.c file to the list of files in the project.

clawson wrote:
I'd suggest you read a thread in the Tutorial Forum called "Managing Large Projects". Did you do that?

Yes, I did. Thanks!

I think I've not made it clear what I want. I'm sorry for that. I don't want to add my files in all my future projects with some Atmel Studio "Add" button. I want to put my files in some include directory, set some gcc system variables or flags (once) and be able to just #include <somefile.h> and let the magic happen, the same way we do with the standard library. The only difference is that standard library code is compiled and mine is not.

Last Edited: Fri. Feb 17, 2017 - 04:16 PM

stdio.h wrote:
The only difference is that standard library code is compiled and mine is not.
But that is the point. Precompiled lib code only works because printf(), memcpy(), strlen() and so on are all precompiled as archive members of libc.a as printf.o, memcpy.o, strlen.o etc.

Also the default linker command is effectively:

avr-gcc yourinput1.o yourinput2.o ... -lgcc -lc -lm -o youroutput.elf

You don't type those -lgcc -lc -lm but they are effectively there anyway. This causes the linker to go off searching both in preconfigured library search paths and also any extra -L paths you added to the linker command and for any -l<whatever> it then looks for lib<whatever>.a. So it looks for libc.a, libgcc.a and libm.a. What it finds are:

C:\SysGCC\avr\avr\lib>dir libc.a libm.a /s /b
C:\SysGCC\avr\avr\lib\libc.a
C:\SysGCC\avr\avr\lib\libm.a
C:\SysGCC\avr\avr\lib\avr25\libc.a
C:\SysGCC\avr\avr\lib\avr25\libm.a
C:\SysGCC\avr\avr\lib\avr25\tiny-stack\libc.a
C:\SysGCC\avr\avr\lib\avr25\tiny-stack\libm.a
C:\SysGCC\avr\avr\lib\avr3\libc.a
C:\SysGCC\avr\avr\lib\avr3\libm.a
C:\SysGCC\avr\avr\lib\avr31\libc.a
C:\SysGCC\avr\avr\lib\avr31\libm.a
C:\SysGCC\avr\avr\lib\avr35\libc.a
C:\SysGCC\avr\avr\lib\avr35\libm.a
C:\SysGCC\avr\avr\lib\avr4\libc.a
C:\SysGCC\avr\avr\lib\avr4\libm.a
C:\SysGCC\avr\avr\lib\avr5\libc.a
C:\SysGCC\avr\avr\lib\avr5\libm.a
C:\SysGCC\avr\avr\lib\avr51\libc.a
C:\SysGCC\avr\avr\lib\avr51\libm.a
C:\SysGCC\avr\avr\lib\avr6\libc.a
C:\SysGCC\avr\avr\lib\avr6\libm.a
C:\SysGCC\avr\avr\lib\avrtiny\libc.a
C:\SysGCC\avr\avr\lib\avrtiny\libm.a
C:\SysGCC\avr\avr\lib\avrxmega2\libc.a
C:\SysGCC\avr\avr\lib\avrxmega2\libm.a
C:\SysGCC\avr\avr\lib\avrxmega4\libc.a
C:\SysGCC\avr\avr\lib\avrxmega4\libm.a
C:\SysGCC\avr\avr\lib\avrxmega5\libc.a
C:\SysGCC\avr\avr\lib\avrxmega5\libm.a
C:\SysGCC\avr\avr\lib\avrxmega6\libc.a
C:\SysGCC\avr\avr\lib\avrxmega6\libm.a
C:\SysGCC\avr\avr\lib\avrxmega7\libc.a
C:\SysGCC\avr\avr\lib\avrxmega7\libm.a
C:\SysGCC\avr\avr\lib\tiny-stack\libc.a
C:\SysGCC\avr\avr\lib\tiny-stack\libm.a

So, say you are building for a mega16 which is an "avr5" architecture AVR then it will find and use:

C:\SysGCC\avr\avr\lib\avr5\libc.a
C:\SysGCC\avr\avr\lib\avr5\libm.a


they will be fed to the link but code will only be taken from them if one (or more) of their members are called. For example:

C:\SysGCC\avr\avr\lib\avr5>avr-nm libc.a | grep memcpy
U memcpy
memcpy_P.o:
00000000 T memcpy_P
memcpy_PF.o:
00000000 T memcpy_PF
memcpy.o:
00000000 T memcpy
U memcpy

So if I have made a call to memcpy() then the whole contents of memcpy.o will be added to the load image. Same for something like strlen:

C:\SysGCC\avr\avr\lib\avr5>avr-nm libc.a | grep strlen
strlen_P.o:
00000000 T __strlen_P
strlen_PF.o:
00000000 T strlen_PF
strlen.o:
00000000 T strlen

So the contents of strlen.o will be added to the binary image and so on.

However the issue here is:

C:\SysGCC\avr\avr\lib>dir libc.a /s /b
C:\SysGCC\avr\avr\lib\libc.a
C:\SysGCC\avr\avr\lib\avr25\libc.a
C:\SysGCC\avr\avr\lib\avr25\tiny-stack\libc.a
C:\SysGCC\avr\avr\lib\avr3\libc.a
C:\SysGCC\avr\avr\lib\avr31\libc.a
C:\SysGCC\avr\avr\lib\avr35\libc.a
C:\SysGCC\avr\avr\lib\avr4\libc.a
C:\SysGCC\avr\avr\lib\avr5\libc.a
C:\SysGCC\avr\avr\lib\avr51\libc.a
C:\SysGCC\avr\avr\lib\avr6\libc.a
C:\SysGCC\avr\avr\lib\avrtiny\libc.a
C:\SysGCC\avr\avr\lib\avrxmega2\libc.a
C:\SysGCC\avr\avr\lib\avrxmega4\libc.a
C:\SysGCC\avr\avr\lib\avrxmega5\libc.a
C:\SysGCC\avr\avr\lib\avrxmega6\libc.a
C:\SysGCC\avr\avr\lib\avrxmega7\libc.a
C:\SysGCC\avr\avr\lib\tiny-stack\libc.a


There are SEVENTEEN different copies of libc.a (and hence also strlen.o and memcpy.o etc).

This is also reliant on the fact that these are part of the standard C library - so the compiler expects to be able to find libc.a and that is why there is an implied -lc presented to the link.

If you created adc.o and uart.o and timer.o and then use "avr-ar rcs" to create a libperipherals.a then you would have to manually add a "-lperipherals" to the linker command line. As the lib would (and should) not be located in standard C compiler search paths (the C:\sysGCC\avr\avr\lib\* I showed above) you would need to put this lib somewhere on your disk and provide a "-L \path\to\libs" to the linker so it would know where to go and look for libperipherals.a

And you still face that over-riding issue that to support all models of AVR you need 17 of them! Tiny AVRs for example don't have MUL, CALL or JMP (just RCALL and RJMP) so you cannot build code for them that use any of those opcodes. If they want to do multiplications they need a software implemented shift/add routine. Meanwhile at the top end of the AVRs you have the big ones that have ELPM rather than LPM so if the code uses LPM it needs a different version for "big flash" compared to normal flash and so on.
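To make the "shift/add routine" concrete, here is a conceptual sketch of multiplying without a hardware MUL instruction. The function name is made up for illustration, and this is not the toolchain's own helper routine, only the idea behind it.

```cpp
#include <cstdint>

// Multiply two 8-bit values without a multiply instruction:
// for every set bit in b, add the correspondingly shifted copy of a.
std::uint16_t mul8_shift_add(std::uint8_t a, std::uint8_t b)
{
    std::uint16_t result = 0;
    std::uint16_t addend = a;   // a shifted left by the current bit position
    while (b != 0) {
        if (b & 1u) {
            result = static_cast<std::uint16_t>(result + addend);
        }
        addend = static_cast<std::uint16_t>(addend << 1);
        b = static_cast<std::uint8_t>(b >> 1);
    }
    return result;
}
```

A precompiled library that used the MUL opcode directly could not be linked into a project for a part that lacks it, which is one reason a separate library build per architecture is needed.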

You also face the fact that if you had a uart.o as a member of libperipherals.a how are you going to set baud rates? To do that you need to know what F_CPU is. But that then implies a run time calculation based on F_CPU and desired baud rate.

So anyone writing "lib" code for AVRs will almost always do it as source files and let the consumer add those to their project and build them. That lets them make the build time choice of 300 different models of AVR (and hence 17 different architectures) and it lets them specify at build time F_CPU and baud so the UBRR (and other calculations like timers, SPI baud rates and so on) can be done during the compile so the binary needs no runtime overhead to calculate.
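As a rough sketch of what "done during the compile" can look like, the UBRR value below is computed entirely at compile time. The constant and function names are made up for illustration, the clock and baud values are assumed, and the formula is the usual normal-speed asynchronous one (F_CPU / (16 * baud) - 1) with rounding added.

```cpp
#include <cstdint>

// Assumed values; on a real project F_CPU normally comes from -DF_CPU=... at build time.
constexpr std::uint32_t kCpuHz = 16000000UL;
constexpr std::uint32_t kBaud  = 9600UL;

// Normal-speed asynchronous mode: UBRR = F_CPU / (16 * baud) - 1, rounded to nearest.
constexpr std::uint16_t ubrr_for(std::uint32_t cpu_hz, std::uint32_t baud)
{
    return static_cast<std::uint16_t>((cpu_hz + 8UL * baud) / (16UL * baud) - 1UL);
}

// Evaluated by the compiler; only the resulting constant ends up in the binary.
constexpr std::uint16_t kUbrr = ubrr_for(kCpuHz, kBaud);
static_assert(kUbrr == 103, "16 MHz at 9600 baud gives UBRR 103");
```

A precompiled uart.o could not do this, because F_CPU is not known when the library is built.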

However, as I've been saying, to achieve this you (a) need to tell the project where to read the common headers for the code and (b) add the names of the source files to the list to be built into the project.

However as I've been trying to say, having -ffunction-sections in the compiler and --gc-sections in the linker means you can build:

projdir\main.c

libdir\uart.c

libdir\spi.c

libdir\timer.c

etc

It does not actually matter that ADC_init() and ADC_read() and UART_init() and UART_getchar() are built in to every project because you build all this with -ffunction-sections so you actually get each function in its own section: .text.ADC_init, .text.ADC_read and so on. Then when the linker links all the code it finds that none of those function sections has a reference so --gc-sections (gc = garbage collect) just discards all the pointless bits. So if you have just:

#include "myperipherallib.h"

int main(void) {
    UART_init();   // for example; the single library call the program makes
}

then only that function will be in the binary. All the SPI_init() and Timer_init() and so on will not be there.

So it does not hurt to present ALL the source files in your lib as candidates to be built in the project every time (hence the suggestion of a template) because only what is used will eventually remain in any particular project binary.

A classic example of this is the LUFA USB library for AVR - it is extensive - maybe 100+ source files or more. When you first build they all get built (and after that most don't need to be rebuilt) but the binary only pulls from them what is actually used.

So what magic does Arduino do? I can "install" a "library" by adding the source files to the libraries directory and then do something like #include <TimerOne.h> in every future project.

Back to the dead code, I realized that the linker includes code from every unused module that has a global object with a volatile variable or reference member. I don't even include the header files of those modules, but they are still present in the binary.

Arduino has some clever logic that operates during the build process that looks at the classes/methods you use then includes what's required to satisfy those requests. A lot of people underestimate Arduino - it's quite clever but it can make the unfamiliar think that C compilers have more intelligence than they really have. Arduino is all about making things easy to program - not so true of plain compilers and linkers.

stdio.h wrote:
Back to the dead code, I realized that the linker includes code from every unused module that has a global object with a volatile variable or reference member. I don't even include the header files of those modules, but they are still present in the binary.
Still waiting to see an example of what you are talking about - see my first reply in this thread.

For simplicity I made an example with just one module.

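adc.h

#ifndef ADC_H_
#define ADC_H_

#include <stdint.h>
#include <avr/io.h>

// Class and member-function names are assumed; the declaration lines did not survive in the post.
class Adc
{
public:

    volatile uint16_t& data_register = ADC;

    void init();
    uint16_t read(uint8_t channel);

};

extern Adc adc;

#endif /* ADC_H_ */

adc.cpp

#include "adc.h"
#include <avr/io.h>

// The global instance discussed in the replies (object name assumed).
Adc adc;

void Adc::init()
{
    // some init code
}

uint16_t Adc::read(uint8_t channel)
{
    // select channel
    ADMUX = (ADMUX & ~((1 << MUX2) | (1 << MUX1) | (1 << MUX0))) | channel;

    // start conversion
    ADCSRA |= (1 << ADSC);

    // wait
    while (ADCSRA & (1 << ADSC))
        ;

    return ADC;
}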


main.cpp

int main(void)
{
while (1)
{
}
}

Compiled with avr-g++ 4.9.2 with the following options:

-funsigned-char -funsigned-bitfields -DDEBUG  -I"C:\Program Files (x86)\Atmel\Studio\7.0\Packs\atmel\ATmega_DFP\1.1.130\include"
-O1 -ffunction-sections -fdata-sections -fshort-enums -g2 -Wall -mmcu=atmega328p
-B "C:\Program Files (x86)\Atmel\Studio\7.0\Packs\atmel\ATmega_DFP\1.1.130\gcc\dev\atmega328p"
-c -std=c++11 -MD -MP -MF "$(@:%.o=%.d)" -MT"$(@:%.o=%.d)" -MT"$(@:%.o=%.o)"

Linked with AVR/GNU linker 4.9.2 with the following options:

-Wl,-Map="$(OutputFileName).map" -Wl,--start-group -Wl,-lm  -Wl,--end-group -Wl,--gc-sections -mmcu=atmega328p
-B "C:\Program Files (x86)\Atmel\Studio\7.0\Packs\atmel\ATmega_DFP\1.1.130\gcc\dev\atmega328p"

With ADC files removed from the project
Program Memory Usage     :    134 bytes   0,4 % Full
Data Memory Usage           :    0 bytes   0,0 % Full

With ADC files included in the project
Program Memory Usage     :    200 bytes   0,6 % Full
Data Memory Usage         :    2 bytes   0,1 % Full

Note: In this example the reference data_register is not used, but I keep it in the example code because maybe that variable is the origin of the problem.

Last Edited: Fri. Feb 17, 2017 - 07:14 PM

Why is adc.cpp instantiating the class? Surely it's up to main to instantiate one if it wants one? The instantiation will invoke the c'tor code.
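The effect can be reproduced off-target. The sketch below uses hypothetical names and a plain int in place of the ADC register; it shows a global object that nothing else in the program names, yet whose constructor still has to run before main(). That start-up call is the reference that keeps the object and its constructor in the image (on AVR the constructor table is typically marked to be kept by the linker script, so garbage collection does not drop it).

```cpp
// Counts constructor calls; zero-initialised before any constructor runs.
static int constructed = 0;

struct Peripheral {            // hypothetical stand-in for the class in adc.h
    int& counter;              // reference member, like data_register in the post
    Peripheral() : counter(constructed) { ++constructed; }
};

// No other code refers to this object, but its constructor must still be
// called by the start-up code before main() begins.
Peripheral peripheral_instance;

// Reports how many times the constructor has run.
int times_constructed() { return constructed; }
```

Calling times_constructed() from main() returns 1, even though main() never mentions peripheral_instance.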

Since there is only one adc peripheral, it seems better to have a global object instantiated by the "library" code instead of forcing the user to create an object to control that peripheral. If the user includes adc.h he will use the adc. The std c++ lib does the same with cin, cout and cerr. I like the way we use cin and cout, so I would like to use my own objects the same way. Some Arduino libraries like TimerOne and EEPROM do the same.

Last Edited: Sat. Feb 18, 2017 - 04:25 PM

Then you have misunderstood the purpose of C++. It allows the module author to write a set of building blocks. The user then chooses "I'll have two of those, 1 of those and 3 of those". He instantiates what he needs. Even if you know there's only one ADC it's up to the user to decide he wants to create one and use it - not you. The exception to this may be when ISR()s are involved. They are like a "fixed base point" that you cannot really change the mechanism of so if you want them to call back to a static class member for service then you may need an instance of the class for that (though if static the code should be callable without an instance)
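A minimal sketch of the "static member called from an ISR" case, under stated assumptions: the class name is hypothetical, and a plain function stands in for ISR(...) from <avr/interrupt.h> so the code builds off-target. No instance of the class exists anywhere.

```cpp
#include <cstdint>

class TickCounter {                        // hypothetical class name
public:
    // Static member: there is no `this`, so an interrupt handler can call it directly.
    static void on_tick() { ticks_ = ticks_ + 1; }
    static std::uint32_t ticks() { return ticks_; }
private:
    static volatile std::uint32_t ticks_;  // shared with the handler, hence volatile
};

volatile std::uint32_t TickCounter::ticks_ = 0;

// Stand-in for an interrupt handler; on an AVR this body would sit inside ISR(...).
void fake_timer_isr() { TickCounter::on_tick(); }
```

On a real 8-bit AVR a multi-byte counter read outside the handler would also need protection against being interrupted mid-read, which this sketch leaves out.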

clawson wrote:
Then you have misunderstood the purpose of C++. It allows the module author to write a set of building blocks. The user then chooses "I'll have two of those, 1 of those and 3 of those". He instantiates what he needs. Even if you know there's only one ADC it's up to the user to decide he wants to create one and use it - not you.

But why do C++'s cin and cout and Arduino's Serial, TimerOne, EEPROM and others do exactly what I was doing?

Because Arduino doesn't care about "dead" code. It's not built for efficiency. You will find space taken in the flash for a Serial instance even if you don't call it. Yes they may be using -ffunction-sections and -fdata-sections but if they do things like setting that data_register reference to "ADC" (such as you have) then it will have to generate CTOR code to achieve that.

To be honest I don't actually understand the purpose of that variable. As you say, with something like ADC there only is one and its data register is "ADC" so why does that have to be held in a volatile class member? If you remove that line there is no c'tor code.
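One possible way to keep the register "behind the interface" without a reference member is a static accessor. This is only a sketch with assumed names, and it uses an ordinary variable in place of the real ADC register so it builds off-target; on an AVR the accessor would return the ADC macro from <avr/io.h> instead.

```cpp
#include <cstdint>

// Stand-in for the memory-mapped ADC data register (assumption for off-target builds).
static volatile std::uint16_t fake_adc_register = 0;

class Adc {                                // hypothetical class name
public:
    // Exposes the register through the class without storing anything per object.
    static volatile std::uint16_t& data_register() { return fake_adc_register; }
};

// No data members and no user-written constructor, so this global object
// needs no start-up code to initialise it.
Adc adc;
```

Usage stays close to the original style: adc.data_register() reads or writes the register, and the object itself occupies no meaningful storage.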

clawson wrote:
To be honest I don't actually understand the purpose of that variable. As you say, with something like ADC there only is one and its data register is "ADC" so why does that have to be held in a volatile class member?

It's because I have every operation of the adc behind the interface provided by the adc class. So, I'd like to have the ADC register behind that interface too.

stdio.h wrote:

clawson wrote:
To be honest I don't actually understand the purpose of that variable. As you say, with something like ADC there only is one and its data register is "ADC" so why does that have to be held in a volatile class member?

It's because I have every operation of the adc behind the interface provided by the adc class. So, I'd like to have the ADC register behind that interface too.

Why are you even bothering with C++, when it obviously doesn't suit your purpose?

For a low-level API, it's quite common to write a set of C functions to access peripherals, even if you have an application that is C++.

Bob.

stdio.h wrote:
It's because I have every operation of the adc behind the interface provided by the adc class. So, I'd like to have the ADC register behind that interface too.
But ADC is a preprocessor macro that is globally available - you can't get away from that! In C++ you put things in members to hide them from the outside world. You can't "hide" the ADC register - everyone knows where it is.

clawson wrote:
stdio.h wrote:
It's because I have every operation of the adc behind the interface provided by the adc class. So, I'd like to have the ADC register behind that interface too.
But ADC is a preprocessor macro that is globally available - you can't get away from that! In C++ you put things in members to hide them from the outside world. You can't "hide" the ADC register - everyone knows where it is.
If one really wants, one can come really close.

#ifndef ADC_H

#include <io.h>
...
#endif