Add 2 Bytes to Field, Free Down by 323 Bytes, .text up by 194; Why?

Go To Last Post
23 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Well into C++ development for an ATmega328 / ATmega32U4 target, I'm using a DTO to capture compact results parsed by ArduinoJson so that I can dispose of the buffers before continuing. All execution is single threaded.

 

When I added a new field to the DTO of name[16] the free space reported by MemoryFree dropped from a moderately comfortable 1 kB to below 700 bytes. Surprised and alarmed by the dramatic change (and with more uses of struct just ahead), I looked into various sizes of the name field. The free memory behaved as expected for shorter arrays, add two bytes to the array, lose two bytes of free memory.

 

However, on increasing from a length of 8 to 10 the free memory dropped by 323 bytes and the .text section increased by 194 bytes.

 

Changing from class to struct didn't change the behavior, nor did removing the initializer. I understand that alignment might cause a byte or three of difference, but this is unlike anything I've seen before. Edit: Checking sizeof(ParseDto) behaves as expected, 31 bytes for name[8], 33 bytes for name[10], 39 bytes for name[16].

 

What is behind this so that I can both resolve the specific issue, as well as try to make sure I haven't repeated the mistake elsewhere?

 


 

Toolchain is platformio 3.6.1 on macOS, avr-g++ (GCC) 5.4.0

 

    class ParsedDto {
    public:
        Commands            command;  // enum class Commands
        unified_error_t     error;    // uint16_t
        i2c_addr_t          addr;     // uint8_t
        bool                addr_all;
        float               offset_t;
        float               offset_h;
        char                name[16];
        char                type[SensorRegistry::kType_string_buf_len];  // const size_t kType_string_buf_len{9};
    };

 

A single instance of ParsedDto is created on the stack as local. The Free heap: value shown is from MemoryFree, reported at the time that the instance of ParsedDto is on the stack.

 

With name[8]

===> setup() done
{}
Free heap: 1026

DATA:    [====      ]  36.2% (used 742 bytes from 2048 bytes)
PROGRAM: [=======   ]  69.1% (used 22280 bytes from 32256 bytes)
.pioenvs/uno/firmware.elf  :
section                     size      addr
.data                        510   8388864
.text                      21770         0
.bss                         232   8389374
.comment                      17         0
.note.gnu.avr.deviceinfo      64         0
.debug_aranges               416         0
.debug_info                 3786         0
.debug_abbrev               1702         0
.debug_line                 2374         0
.debug_str                   520         0
Total                      31391

With name[10]

===> setup() done
{}
Free heap: 703

DATA:    [====      ]  36.2% (used 742 bytes from 2048 bytes)
PROGRAM: [=======   ]  69.7% (used 22474 bytes from 32256 bytes)
.pioenvs/uno/firmware.elf  :
section                     size      addr
.data                        510   8388864
.text                      21964         0
.bss                         232   8389374
.comment                      17         0
.note.gnu.avr.deviceinfo      64         0
.debug_aranges               416         0
.debug_info                 3786         0
.debug_abbrev               1702         0
.debug_line                 2374         0
.debug_str                   520         0
Total                      31585

 

 

This topic has a solution.
Last Edited: Mon. Nov 5, 2018 - 10:03 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

If it has that much effect clearly more than one instance of the class is created.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

clawson wrote:
more than one instance of the class is created

 

Only one instance is created within one function, known to be non-reentrant, during single-threaded execution. Confirmed by replacing the default constructor with a "debug print" and only one instance is seen.

 

Edit: Behavior for up through [8] is linear -- add two bytes, lose two bytes of run-time memory -- it is step-wise discontinuous on increasing to [10], so "multiple instances" doesn't seem to explain this behavior either.

 

Also, .text, at least as I understand it, also shouldn't increase non-linearly in that way (based on run-time usage) as it is created at compile time.

 

Edit: Behavior of size of .text and free memory with varying sizes of name[]

 

name  .text   free
----  -----   ----
[2]   21710   1034
[4]   21710   1032
[6]   21710   1030
[8]   21710   1028
[9]   21834    711
[10]  21834    710
[12]  21834    708

 

It is almost as if when the array length exceeds 8 that it is handled very differently. Only one example of this, but 8 curiously is a power of two. However, changing from type[9] to type[8] doesn't "magically" shrink .text and only increases the free memory indication by 1 byte.

Last Edited: Sat. Nov 3, 2018 - 04:34 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Unless you've explicitly defined "heap start" & "heap end" there must be some algorithm based on estimated stack usage. Perhaps you've crossed some magic number boundary.

 

Another theory:

There is a limit to the indexing offset in "Base register+Offset" addressing mode (maybe 64) when you cross that boundary extra code is required. I see you did sizeof(ParseDto)=39 but what else is on the stack ?

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'll look into what explicitly defined "heap start" & "heap end" means and how to try it.

 

The loss of free memory appears at the start of execution, after the Arduino-framework main() calls setup() -- so not anything other than "Arduino essentials" and the frames for main() and setup() should be on the stack.

void
setup() {

    Serial.begin(BAUD_RATE);
    while ( ! Serial ) continue;

    showFreeMemory();

for differing sizes of name[] the indicated free memory:

[7]    1227
[8]    1226
[9]     909
[10]    908
[16]    902

 

This suggests to me that the impact is due to the contents of the .initN sections, which are in the .text section. The "swelling" of the .text section is coincident with the reduction of free memory.

 

If anyone has some suggestions as to how to examine those sections, I'd appreciate it. 

 


 

As a "sanity check", I downloaded the official AVR 8-bit Toolchain 3.6.2 - Mac OS X 64-bit as listed on https://www.microchip.com/mplab/avr-support/avr-and-arm-toolchains-c-compilers and it behaves the same way.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Well, regular visitors here like to solve puzzles.  But you need to give us more to go on, such as a small complete test program that exhibits the symptoms.  Then we can all play along with the home game. [edit] but I don't know how many of us might have your same setup.   Surely it couldn't be an idiosyncrasy in platformio!  Barring full source, full maps  and/or full .LSS.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Last Edited: Sat. Nov 3, 2018 - 07:23 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

theusch wrote:
Well, regular visitors here like to solve puzzles.  But you need to give us more to go on, such as a small complete test program that exhibits the symptoms.

 

I know that feeling well, from other platforms and projects! Right now with close to 4000 lines of code and comments, I need to find a "minimal-failing" example.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

This looks like a bug in Json and nothing to do with your code (excepting that it exposes the bug).  If that's the case then the best you can hope for in this forum is suggestions for a work-around.

 

Though, as the IBM is famously said to claim: "Than's not a bug, its a feature!"

Nicholas O. Lindan

Cleveland Engineering Design, LLC

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

That very well may be and a path that I'll try to pursue (Edit: if a compiler-based problem isn't the root of this).

 

Edit: To be clear, this DTO isn't passed to ArduinoJson code, it is used to store the results of calls to methods in that library. Typically the "raw" result is examined and an intermediate, derived value is set in the DTO. The closest ArduinoJson gets to "touching" the DTO is

                    parsed_dto.offset_t = parsed[AS_FLASH_STRING_HELPER(KeyPstrs::kOffset_t)];
                    parsed_dto.offset_h = parsed[AS_FLASH_STRING_HELPER(KeyPstrs::kOffset_h)];

 

In the mean time, enjoy:

 

http://www.computerhistory.org/t...

Last Edited: Sat. Nov 3, 2018 - 11:46 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

WTF is a JSON DTO?

TYVM!!

 

--Mike

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Something I wouldn't use on a machine with more memory!

 

Basically, when parsing JSON with the ArduinoJson library, the structure of the JSON is stored in a JsonBuffer object with the content often as references to other contemporaneous objects, but as copies of char*, String, and __FlashStringHelper* (recast program-memory references). As the buffer is big by AVR-memory standards (200-400 bytes in my case), it can be imperative to extract the data and dispose of the buffer before taking action on the parsed data. The "trick" is to parse, then copy the "interesting" data to a DTO that is much more compact than the original JSON.

 

In my case I can pass in a reference to a 39-byte DTO to a function, create a transient 200-byte StaticJsonBuffer on the stack, parse, return from the function and reclaim 200 bytes for a net savings of ~160 bytes. As I need to create a transient 400-byte StaticJsonBuffer to generate the output, those 160 bytes come in quite handy.

 

See further https://arduinojson.org/v5/faq/w...

 

Last Edited: Sun. Nov 4, 2018 - 04:52 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

OK I Googled it - "Data Transfer Object"

 

Some people seem to really dislike it!

--Mike

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 1

I need to find a "minimal-failing" example.

Lacking that, at least before&after output from "avr-nm -SC -size-sort foo.elf" (which should at least tell us where the extra code came from.)

 

Last Edited: Sun. Nov 4, 2018 - 09:28 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

jeffsf wrote:
Right now with close to 4000 lines of code and comments, I need to find a "minimal-failing" example.

That exercise (disabling chunks of stuff) will often help to pinpoint the problem area and lead to discovery of the cause.

 

Still, comparing the full maps of the test cases should help.  Then examine the LSS for the suspect areas.

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Last Edited: Sun. Nov 4, 2018 - 01:26 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

westfw wrote:
output from "avr-nm -SC -size-sort foo.elf"

 

Thanks -- that was the kind of hint I was hoping for!

 

So surprising were the results on the binaries I had saved away before, I captured name[8] and name[9] versions as git commits and rechecked things

 

$ git diff 53ee59e 192a81b
diff --git a/arduino/egg_sensor/src/serial_api_v1.cpp b/arduino/egg_sensor/src/serial_api_v1.cpp
index 4c63c36..01f6dea 100644
--- a/arduino/egg_sensor/src/serial_api_v1.cpp
+++ b/arduino/egg_sensor/src/serial_api_v1.cpp
@@ -116,7 +116,7 @@ namespace SerialApiV1
         float               offset_t;
         float               offset_h;
         // TODO: Get magic number from SensorSet::PersistentStorage::MAX_DEVICE_NAME_LENGTH (private, ATM)
-        char                name[8];
+        char                name[9];
         char                type[SensorRegistry::kType_string_buf_len];

 

    With name[8]

    .pioenvs/uno/firmware.elf  :
    section                     size      addr
    .data                        510   8388864
    .text                      21710         0
    .bss                         232   8389374
    .comment                      17         0
    .note.gnu.avr.deviceinfo      64         0
    .debug_aranges               416         0
    .debug_info                 3786         0
    .debug_abbrev               1702         0
    .debug_line                 2374         0
    .debug_str                   520         0
    Total                      31331

    ===> setup() early
    Free heap: 1226

 

    With name[9]

    .pioenvs/uno/firmware.elf  :
    section                     size      addr
    .data                        510   8388864
    .text                      21834         0
    .bss                         232   8389374
    .comment                      17         0
    .note.gnu.avr.deviceinfo      64         0
    .debug_aranges               416         0
    .debug_info                 3786         0
    .debug_abbrev               1702         0
    .debug_line                 2374         0
    .debug_str                   520         0
    Total                      31455

    ===> setup() early
    Free heap: 909

 

So the change from name[8] to name[9]:

  • Increases the size of the .text area by 124 bytes
  • Decreases the indicated free at start by 317 bytes

 

(The code is slightly different than when the thread was started through removal of the explicit ParsedDto() initializing constructor and diagnostic ParsedDto::Dump() that would print the contents)

 

Thanks to @westfw, for each of the two generated firmware.elf:

 

avr-nm -SC --size-sort firmware.elf > nm-SC.out

cut -d ' ' -f 2- nm-SC.out > nm-SC.cut

 

and compare the two (name[8] on the left, name[9] on the right)

 

$ diff --side-by-side --suppress-common 53ee59e/nm-SC.cut 192a81b/nm-SC.cut
00000002 D UnifiedErrors::NO_CMD_RECOGNIZED		      <
							      >	00000002 D SensorRegistry::TypeStrings::kMAX30205
00000004 B timer0_overflow_count			      |	00000004 b timer0_overflow_count
00000006 t MAX30205::GetTypeString() const		      |	00000006 T MAX30205::GetTypeString() const
00000014 t Serial0_available()				      |	00000014 T Serial0_available()
00000018 T millis					      |	00000018 t millis
00000044 t HardwareSerial::_tx_udr_empty_irq()		      |	00000044 T HardwareSerial::_tx_udr_empty_irq()
00000046 t MAX30205::ReadToJson(ArduinoJson::JsonObject&)     |	00000046 T MAX30205::ReadToJson(ArduinoJson::JsonObject&)
00000076 t MAX30205::SetupDefault()			      |	00000076 T MAX30205::SetupDefault()
							      >	00000078 T main
0000008e t MAX30205::Measure()				      |	0000008e T MAX30205::Measure()
0000009a T HardwareSerial::write(unsigned char)		      |	0000009a t HardwareSerial::write(unsigned char)
000000aa t MAX30205::ReadToSerial()			      |	000000aa T MAX30205::ReadToSerial()
000000aa t MAX30205::DescribeToSerial()			      |	000000aa T MAX30205::DescribeToSerial()
000002da T SerialApiV1::parseJsonToDto(LineBuffer*, SerialApi <
000005f6 T main						      |	000008d4 T setup

 

The first two lines are unremarkable to me as they are a uint16_t and a const char* const ("MAX30205" for 9 or 10 bytes of storage) in namespaces that may have had their limited usages inlined or otherwise optimized. 

 

Now where things are very surprising are with main, setup, and SerialApiV1::parseJsonToDto()

 

Calling sequence here is:

 

main()

setup()

while (true) {

// get a line

SerialApiV1::processLine()

ParsedDto parsed_dto;

parseJsonToDto()

...

return;

}

 

With name[8], there is no setup symbol at all. I am guessing that its single usage was subsumed into main.

 

With name[9], there is no SerialApiV1::parseJsonToDto() symbol. Its single usage is in SerialApiV1::processLine() which has a single usage in setup(). processLine() does not appear in either symbol table, so appears to have been optimized out (in?).

 

(I grudgingly deal with the Arduino framework's hidden main() by using setup() as my "main()" and ignoring loop() and all the rest. Getting onto another framework is a topic for another day.)

 

name[8]: 0x2da + 0x5f6 = 2256 bytes

name[9]: 0x078 + 0x8d4 = 2380 bytes

 

with a difference of 124 bytes, which is the difference seen in the size of the .text sections.

 

While I may never know why the compiler made very different decisions based on a single-byte change in a single field, it clearly did.

 

 

 

Now why this leads to a difference of 317 bytes very early in program execution is still a puzzle to me.

 

 

const char* const SensorRegistry::TypeStrings::kMAX30205 -- 8 or 9 bytes of string constant -- maybe there are a dozen usages of this somewhere...

 

Used as a return value for an overriden, virtual class method defined in the .h file, and once in a strcmp() call, so it seems that there should only be one reference to this in a virtual-function table somewhere and one for the strcmp() call.

 

Even still, changes in compiled instructions aren't even in the same address space as SRAM on an AVR device.

 

 

 

I keep coming back to a member of the .text section

.init4:

For devices with > 64 KB of ROM, .init4 defines the code which takes care of copying the contents of .data from the flash to SRAM.

 and think that it has to be doing something very different as well between the two builds.

 

 

Last Edited: Sun. Nov 4, 2018 - 05:17 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 1

I keep coming back to a member of the .text section

>> .init4:

>> For devices with > 64 KB of ROM

 Why do you suspect that?  Your 328p doesn't have >64k of ROM.

While interpreting a disassembly output (avr-objdump -SC) might be a pain on a program with 4kLOC, it should be pretty easy to rule out the .initN sections, all of which should show up at the startup vector location, before main() is called.

 

You don't by any chance have a case where your structure is passed by value rather than by reference/pointer, do you?   I don't know exactly what avr-gcc will do once the parameters no longer fit in the registers, but I'm pretty sure it's not very pretty...

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Times like this have me wishing I had a debugger for the ATmega328P. If the behavior is the same on an ATmega32U4 (which I believe supports JTAG debugging), I'll try to confirm it there.

 

The coincidence of a 32-byte instance and 32 registers (at least as I read the datasheet) pointed out by @westfw is certainly an interesting one. I'll get back to that in a bit.

 

I did check and confirmed that the instance was being passed by reference. For sanity, I also tried passing by pointer, as well as declaring it a global and accessing it directly. No noticeable changes in behavior.

 

For those "playing along at home", a few things tickled my thinking:

  • Decreases the indicated free at start by 317 bytes

In setup():

                // Line done -- call dispatch
                Serial.print('\n');
                showFreeMemory();
                SerialApiV1::processLine(lineBuffer, sensorSet);

Which looked like:

    unified_error_t
    processLine(LineBuffer *lineBuffer, SensorSet *sensor_set)
    {
        unified_error_t error{UnifiedErrors::NO_ERROR};

        ParsedDto parsed_dto;

        parseJsonToDto(lineBuffer, parsed_dto);

        parsed_dto.Dump();

        error = takeAction(parsed_dto, sensor_set);

        lineBuffer->Clear();

        return error;
    }

which, from

With name[9], there is no SerialApiV1::parseJsonToDto() symbol. 

probably doesn't "look" like a function call in the compiled code.

 

Taking that another step, the compiled code that resulted from the declaration of that function may not be step-by-step the same as

    unified_error_t
    parseJsonToDto(LineBuffer* lineBuffer, ParsedDto& parsed_dto)
    {
        using namespace JsonStrings;

        StaticJsonBuffer<300> jsonBuffer; // 300 may not be overkill after all!

Hmmm, 300 bytes from jsonBuffer, pretty close to

a difference of 317 bytes very early in program execution is still a puzzle to me

 

Checking at the "top level" of the calling tree seemed reasonable to determine "idle" memory availability.

 

In retrospect, even though the code looked like it was "outside" of all the functions where allocation was being done, it seems like "optimization" had other ideas of where that print statement was relative to allocation for functions that weren't actually "called".

 

Looking "inside" of the "function calls" (however GCC decided to implement them) reveals that the "peak" memory usage is comparable and suggests that the 300-byte buffer is "pre-allocated" somewhere in the code outside of the "function call" when the code is "rearranged".

 

with name[8]:

===> setup() done
Free heap: 1032
{hello:there}
Free heap: 1032
processLine() entry
Free heap: 1032
processLine() before parseJsonToDto()
Free heap: 1032
parseJsonToDto() entry
Free heap: 703
parseJsonToDto() after StaticJsonBuffer<300> jsonBuffer;
Free heap: 703

Using 26 of 300; Free heap: 697
parseJsonToDto() after jsonBuffer.parseObject();
Free heap: 697
processLine() after parseJsonToDto()
Free heap: 1032
processLine() before takeAction()
Free heap: 1032
takeAction() entry
Free heap: 1032
{"result":null,"error":"0xb000","error_string":"No valid cmd:<value> recognized","api_version":0.900000}
takeAction() just prior to return
Free heap: 1032
processLine() after takeAction()
Free heap: 1032

with name[9]:

===> setup() done
Free heap: 711
{hello:there}
Free heap: 711
processLine() entry
Free heap: 711
processLine() before parseJsonToDto()
Free heap: 711
parseJsonToDto() entry
Free heap: 711
parseJsonToDto() after StaticJsonBuffer<300> jsonBuffer;
Free heap: 711

Using 26 of 300; Free heap: 705
parseJsonToDto() after jsonBuffer.parseObject();
Free heap: 705
processLine() after parseJsonToDto()
Free heap: 711
processLine() before takeAction()
Free heap: 711
takeAction() entry
Free heap: 711
{"result":null,"error":"0xb000","error_string":"No valid cmd:<value> recognized","api_version":0.900000}
takeAction() just prior to return
Free heap: 711
processLine() after takeAction()
Free heap: 711

 

sizeof(StaticJsonBuffer<300>) returns 308, so within a couple of bytes of the "difference" seen.

 

Perhaps something of interest in another application where that "pre-allocation" would push SRAM usage over the limit.

 

Visible optimization flag is -Os

 

For reference, the compiler line was (though I suspect the build system adds more, as -fno-rtti seems to be in effect)

avr-g++ -o .pioenvs/uno/src/serial_api_v1.cpp.o -c -fno-exceptions \
    -fno-threadsafe-statics -fpermissive -std=gnu++11 -Os -Wall \
    -ffunction-sections -fdata-sections -flto -mmcu=atmega328p \
-DPLATFORMIO=30601 -DARDUINO_AVR_UNO -DF_CPU=16000000L \
-DARDUINO_ARCH_AVR -DARDUINO=10805 \
-Isrc -I.piolibdeps/ArduinoJson_ID64/src -Ilib/I2C -Ilib/MemoryFree \
-I/Users/jeff/.platformio/packages/framework-arduinoavr/cores/arduino \
-I/Users/jeff/.platformio/packages/framework-arduinoavr/variants/standard \
src/serial_api_v1.cpp

 

Last Edited: Mon. Nov 5, 2018 - 08:09 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 1

Why can't you just diff the LSS? Why do you need to run the code with a debugger to see what's going on?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

clawson wrote:
Why can't you just diff the LSS?

 

Thanks for the suggestion! I'll look into how to do that.

 

Edit: Thanks for the pointer. 

avr-objdump -h -S .pioenvs/uno/firmware.elf

seems to provide great insight.

Last Edited: Mon. Nov 5, 2018 - 08:23 PM
This reply has been marked as the solution. 
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 1

If you have a function with 300 bytes of local variables, and the compiler decides to inline that function into main(), then the 300 bytes will be allocated at the beginning of main(), not "just before the function code."  That could explain the difference in heap size at the "beginning" of your program.

This should be offset by no longer allocating extra stack space when the function code IS executed.

 

If you have two functions allocating lots of local variables, and the both get inlined, you might not get the "interleaving" of memory allocation that you were hoping to get, and it's time to investigate __attribute__((noinline)) to prevent such behavior (if necessary.)

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

westfw wrote:
it's time to investigate __attribute__((noinline)) to prevent such behavior

 

Always good to know the tricks of beating a compiler into submission. Your suggestions and prodding have been very helpful. 

 

Now if someone can beat the C++ committee into submission so that __flash as an address-space qualifier can be supported in C++;)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

clawson wrote:
Why can't you just diff the LSS?
Does that ever work?

IME one small change near the beginning can cause everything after it to be different. 

Iluvatar is the better part of Valar.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Tell your diff program to ignore the columns with addresses and opcodes and then the changes in the source will be more obvious