mckeemj
PostPosted: Jul 19, 2011 - 06:11 PM
Resident


Joined: Jun 26, 2006
Posts: 625
Location: San Luis Valley, Colorado ( 2,318m )

I finally got around to doing some tests of my own with 4.6.1 against 4.3.3 in WinAVR20100110. The good news is that there were no code breakages from the compiler ( or hidden code issues... yet! ). For the most part I found similar results to wek in that there was little change in size. There are a few cases, however, where this does not hold true.
Code:

   target   |  GCC  | .text| .data|.text %
-------------------------------------------
frequency   | 4.3.3 | 1588 |  184 |  --
meter       | 4.6.1 | 1602 |  184 |  +0.9%
-------------------------------------------
xevent      | 4.3.3 | 1194 |   30 |  --
test        | 4.6.1 | 1220 |   30 |  +2.2%
-------------------------------------------
xevent      | 4.3.3 |  564 |    8 |  --
delay       | 4.6.1 |  382 |    8 |  -32%
-------------------------------------------
xio         | 4.3.3 |   84 |    0 |  --
test        | 4.6.1 |  122 |    0 |  +45%
-------------------------------------------
xio         | 4.3.3 |  352 |   26 |  --
test 2      | 4.6.1 |  342 |   30 |  -2.9%
-------------------------------------------
main        | 4.3.3 | 7834 |  370 |  --
controller  | 4.6.1 | 7218 |  406 |  -7.9%


In all cases, these programs proved quite difficult for GCC to optimize. They make heavy use of function pointers and indirection, including multiple indirection. Two pairs are of particular note: xevent_delay with xio_test, and xio_test_2 with main_controller. The first pair are the obvious outliers, with very wide size variations: in xevent_delay, 4.6.1 found an optimization ( a call through a function pointer ) that 4.3.3 missed; in xio_test, it missed one. The second pair both came out with smaller code, but at the expense of a larger SRAM footprint ( .data grew from 26 to 30 and from 370 to 406 bytes ), i.e. 4.6.1 lost some of the SRAM optimization that 4.3.3 managed.
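The ".text %" column in the table is simply the relative change from the 4.3.3 size to the 4.6.1 size. A minimal sketch of that arithmetic in C follows; the function names are made up for illustration and are not part of any tool used in the tests.

```c
#include <stdio.h>

/* Relative change in percent when a section goes from old_size to
   new_size bytes. Positive means the newer compiler produced more code. */
double pct_change(unsigned old_size, unsigned new_size)
{
    return 100.0 * ((double)new_size - (double)old_size) / (double)old_size;
}

/* Prints one comparison row, e.g. for xio_test: 84 -> 122 bytes. */
void print_change(const char *name, unsigned old_size, unsigned new_size)
{
    printf("%-16s %5u -> %5u  %+.1f%%\n", name, old_size, new_size,
           pct_change(old_size, new_size));
}
```

For xio_test ( 84 -> 122 ) this gives roughly +45%, and for xevent_delay ( 564 -> 382 ) roughly -32%, matching the table.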

It is my feeling that 4.6.1 is ready for prime time, though, of course, I would hardly consider the testing I've done to be conclusive. On average the result is the same or, perhaps, slightly smaller. It would still be interesting to see what difference LTO might make, but I have been unable to find the time to attempt a compilation under cygwin.

Martin Jay McKee

P.S. The options I used for the above tests ( 4.3.3 and 4.6.1 ) were ( compiled for an AtMega8 ),
Code:
Compile:
-Os -std=gnu99 -ffunction-sections -fno-exceptions -fno-inline-small-functions -funsigned-bitfields -fshort-enums -fno-split-wide-types -fno-tree-scev-cprop -ffreestanding

Link:
-Wl,--gc-sections,--relax

_________________
As with most things in engineering, the answer is an unabashed, "It depends."
 
SprinterSB
PostPosted: Jul 19, 2011 - 06:59 PM
Posting Freak


Joined: Dec 21, 2006
Posts: 1483
Location: Saar-Lor-Lux

Is it possible to factor out the module of xio_test that contributes most to the size increase and supply it as a .i file?

Maybe it's related to PR46278.

Did you check whether -fno-tree-loop-optimize fixes some of the size regressions?


Last edited by SprinterSB on Jul 19, 2011 - 08:30 PM; edited 1 time in total
 
wek
PostPosted: Jul 19, 2011 - 07:07 PM
Raving lunatic


Joined: Dec 16, 2005
Posts: 3089
Location: Bratislava, Slovakia

mckeemj wrote:
They have heavy use of function pointers and indirection, including multiple indirection.

Wow.
Doesn't sound like a typical 8-bit microcontroller program... Nevertheless an interesting basis for comparison!

mckeemj wrote:
The second pair, xio_test_2 and main_controller both resulted in a reduced code size at the expense of reduced SRAM footprint optimization.

This would worry me.

In MCUs, RAM is a precious resource, more precious than FLASH (my "personal", completely unscientific, factor is 4 :-) ). Unfortunately, we don't know what the stack usage is...

Could you please try to track down where (at least some of) those extra RAM bytes went?

Jan
 
mckeemj
PostPosted: Jul 20, 2011 - 06:08 PM
Resident


Joined: Jun 26, 2006
Posts: 625
Location: San Luis Valley, Colorado ( 2,318m )

SprinterSB wrote:
Is it possible to factor out the module of xio_test that contributes most to the size increase and supply it as a .i file?

wek wrote:
Could you please try to track down where (at least some of) those extra RAM bytes went?


To both, yes. I'll see what I can do. Unfortunately I wasn't thinking when I did the last set of tests and failed to create map files for each. It's not a major hassle mind you, but I have to be at the right machine.

The results for xio_test were the ones that bothered me the most... 45% size increase is not good; but then, it is 45% of the ( otherwise ) smallest ( trivial actually ) program. So it really isn't as much difference ( code size wise ) as some other cases.

wek wrote:
Wow.
Doesn't sound like a typical 8-bit microcontroller program... Nevertheless an interesting basis for comparison!

No... not exactly standard from an implementation point of view! The project the code was written for is a pretty standard application, however. Since it was a personal project I decided I wanted to test how well avr-gcc handled optimizations in a "non-standard" environment. I'd say it did reasonably well considering.

And, in regards to increased RAM usage...
wek wrote:
This would worry me.

In mcus, the RAM is a precious resource, more precious than FLASH

I agree that a small decrease in Flash usage at the cost of a RAM increase is a poor trade-off. Here again, however, it is a case of 4.6.1 missing optimizations that the previous version ( 4.3.3 ) caught. All of the RAM used is explicitly defined in the code. Some of it, however, can be optimized to constants or register access ( if memory serves -- I'll have to recheck exactly how 4.3.3 removed it ). I'll see if, while I am doing other analysis, I can find a clean example of what changed here.

Martin Jay McKee

dl8dtl
PostPosted: Aug 01, 2011 - 12:20 PM
Raving lunatic


Joined: Dec 20, 2002
Posts: 7277
Location: Dresden, Germany

> So, I'd suggest to Joerg to modify the makefile template in mfile accordingly

The issue with Mfile is that there's no actual release strategy/policy for
it at all. Eric used to maintain a CVS tree for it as part of WinAVR, but
that one eventually got abandoned, and I think he didn't intend to ever
ship Mfile anymore even with a new WinAVR.

So, it's eventually up to the Mfile users to simply edit their template.

Still, as the option used to work once, if it no longer does, either
something is broken, or the option should be dropped from gas.

_________________
Jörg Wunsch

Please don't send me PMs, use email if you want to approach me personally.
Please read the `General information...' article before.
 
clawson
PostPosted: Aug 01, 2011 - 12:25 PM
10k+ Postman


Joined: Jul 18, 2005
Posts: 62324
Location: (using avr-gcc in) Finchingfield, Essex, England

Quote:

Eric used to maintain a CVS tree for it as part of WinAVR, but
that one eventually got abandoned, and I think he didn't intend to ever ship Mfile anymore even with a new WinAVR.

That'd be a huge pity if it were true. How many more "objcopy not including .data" problems would we see here (for example) if a lot of people weren't using an Mfile template? What about missing -lm's, or the ease of selecting printf/scanf support?

wek
PostPosted: Aug 01, 2011 - 12:36 PM
Raving lunatic


Joined: Dec 16, 2005
Posts: 3089
Location: Bratislava, Slovakia

dl8dtl wrote:
The issue with Mfile is that there's no actual release strategy/policy for
it at all.
Why can't it be simply attached to avr-libc? It already contains non-strictly-libc-related stuff, e.g. in the documentation. I faintly recall that nobody complained last time this was suggested.

dl8dtl wrote:
[...]I think he didn't intend to ever
ship Mfile anymore even with a new WinAVR.
Why would he want to do that? What would be the replacement?

--------
[in the following, we are talking about the "h" in "-Wa,-adhlns=filename.lst", which is supposed to add "high level language listing" to the .lst file]

dl8dtl wrote:
Still, as the option used to work once,
Did it? In my 2007 vintage WinAVR it definitely does not work.

dl8dtl wrote:
if it no longer does,

The problem is the reverse: it now DOES work, and results in excessively long assembly times.

Jan
 
dl8dtl
PostPosted: Aug 01, 2011 - 02:25 PM
Raving lunatic


Joined: Dec 20, 2002
Posts: 7277
Location: Dresden, Germany

> Why can't it be simply attached to avr-libc?

Well, documentation about the entire toolchain has always been on the
agenda of the avr-libc project, that's why you can find it there.

Hosting completely unrelated projects is another thing though. There's also
the "AVR Super-Project" on savannah where it could be hosted -- yet,
just a hosting place still wouldn't mean there were any kind of release
policy or strategy, and I'm afraid I'm simply out of resources for doing
so.

> The problem is inverse, namely, that it now DOES work

OK, now I can see the issue with it.

wek
PostPosted: Sep 16, 2011 - 01:34 PM
Raving lunatic


Joined: Dec 16, 2005
Posts: 3089
Location: Bratislava, Slovakia

More findings.

Hijacking a different thread, I remarked that the compiler writes the stack size and stack frame size of every function into the asm output as a comment. I asked for two more similar items: the "net" register usage of a function, and the stack used for parameters passed on the stack for each function call. The former would help to check that C code does not inadvertently get in the way of reserved registers (to be used in asm portions of the program); the latter would enable a crude worst-case stack usage analysis.

Related to the latter, SprinterSB mentioned that there's a new -fstack-usage switch in the newer versions, which outputs some stack usage data into a .su file. While it does *not* do what I requested, I got curious and gave it a few tries. Here are the findings, not all of them related to -fstack-usage.

1.
This is what the .su file looks like for a .c source with 5 functions:
Code:

j.c:4:6:foo_with_long_name   34   static
j.c:16:6:bar   2   dynamic
j.c:36:6:used_uninitialized1   0   static
j.c:42:6:used_uninitialized   0   static
j.c:69:5:main   4   dynamic,bounded
Apparently the file:line:column format is aimed at automatic parsers, such as those in IDEs that jump to the given spot on a click; it is the same as the error/warning format. Nice.

It would be a bonus if the numbers could be kept visually in a column, but that is not simple I admit.
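For anyone who wants to post-process such a file (for instance to get the numbers into an aligned column), here is a minimal sketch of a line parser in C. The struct and function names are hypothetical, and it assumes the layout shown above (file:line:column:function, then size and qualifier separated by TABs or spaces) with no ':' in the file name itself.

```c
#include <stdio.h>

/* One record of a -fstack-usage (.su) file. Field names are illustrative. */
struct su_entry {
    char file[64];
    int line, column;
    char function[64];
    unsigned bytes;
    char qualifier[32];   /* "static", "dynamic" or "dynamic,bounded" */
};

/* Parses one line such as "j.c:69:5:main<TAB>4<TAB>dynamic,bounded".
   Returns 1 on success, 0 if the line does not match the assumed layout. */
int su_parse_line(const char *text, struct su_entry *e)
{
    int n = sscanf(text, "%63[^:]:%d:%d:%63[^ \t] %u %31s",
                   e->file, &e->line, &e->column,
                   e->function, &e->bytes, e->qualifier);
    return n == 6;
}

/* Prints the record with the byte count kept in a fixed column. */
void su_print_aligned(const struct su_entry *e)
{
    printf("%-12s %4d  %-24s %5u  %s\n",
           e->file, e->line, e->function, e->bytes, e->qualifier);
}
```

Feeding every line of the .su file through su_parse_line() and then su_print_aligned() would give the column layout regardless of the TAB width of the viewer.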

2.
Code:
void foo_with_long_name(long a, long b, long c, long d, long e) {
  volatile long l, l1, l2, l3, l4;
 
  l = a;
  l1 = b;
  l2 = c;
  l3 = d;
  l4 = e;
}
Code:
  12                  foo_with_long_name:
  13 0000 4F92            push r4
  14 0002 5F92            push r5
  15 0004 6F92            push r6
  16 0006 7F92            push r7
  17 0008 AF92            push r10
  18 000a BF92            push r11
  19 000c CF92            push r12
  20 000e DF92            push r13
  21 0010 EF92            push r14
  22 0012 FF92            push r15
  23 0014 0F93            push r16
  24 0016 1F93            push r17
  25 0018 CF93            push r28
  26 001a DF93            push r29
  27 001c CDB7            in r28,__SP_L__
  28 001e DEB7            in r29,__SP_H__
  29 0020 6497            sbiw r28,20
  30 0022 0FB6            in __tmp_reg__,__SREG__
  31 0024 F894            cli
  32 0026 DEBF            out __SP_H__,r29
  33 0028 0FBE            out __SREG__,__tmp_reg__
  34 002a CDBF            out __SP_L__,r28
  35                  /* prologue: function */
  36                  /* frame size = 20 */
  37                  /* stack size = 34 */

It appears that the .su file reports the "stack size", i.e. the stack frame (mainly local variables) plus the pushed registers: here 20 bytes of frame plus 14 pushed registers gives 34 (there is more, see below for a correction).

It is sometimes useful to know how much is gulped by local variables, but luckily the .asm/.lst contains this; it might be even possible to add a third number for non-local-variable content of the stack frame (spills, maybe others I don't know of).

3.
Code:

#include <alloca.h>

void bar(unsigned char l) {
  volatile unsigned char * p;
  p = alloca(l);
}

alloca dynamically allocates memory on the stack, which makes it impossible to calculate the stack size at compile time. That seems to be what "dynamic" indicates in the respective line of the .su file, which also reports the 2 bytes that bar() occupies "statically":
Code:

  94                  bar:
  95 0092 CF93            push r28
  96 0094 DF93            push r29


4. (not related to -fstack-usage)
Code:

typedef struct {
  union {
    unsigned char a;
    unsigned char k;
  };
  unsigned char b; 
} Ts;

volatile unsigned char voln;
void used_uninitialized1(void) {
  Ts s;

  voln = s.a;
}

void used_uninitialized(void) {
  Ts s;

  switch(voln) {
    case 0:
      s.a = 1;
      break;
    case 1:
      s.a = 55;
      break;
  };
  switch(s.a) {
    case 1:
      voln = 5;
      break;
  }
}

When compiled with -Wall, a warning is issued for the second function (for the line with "switch(s.a)"): j.c:53:3: warning: 's.<U6960>.a' may be used uninitialized in this function [-Wuninitialized]

The new, a bit surprising but nice feature is that the "anonymous" union in the struct here gets a name (<U6960>) :-)

The other surprising thing is that the first function does NOT trigger a warning, although it's quite obvious that s.a is used without initialisation. The resulting code indicates that the compiler (rightfully) eliminated the s variable entirely, which I guess is the reason why it does not warn:
Code:
 126                  used_uninitialized1:
 127                  /* prologue: function */
 128                  /* frame size = 0 */
 129                  /* stack size = 0 */
 130                  .L__stack_usage = 0
 131 00c0 1092 0000       sts voln,__zero_reg__
 132                  /* epilogue start */
 133 00c4 0895            ret

The corollary is that if you do something stupid, you must do it in a sophisticated way if you want to get warned... Blunt, plain stupidity is overlooked... ;-)
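For completeness, a sketch of how the second test function could be written so that s.a has a defined value on every path. This is only an illustration: the name used_initialized() and the choice of 0 as the fallback value are assumptions, not part of the attached test file.

```c
typedef struct {
    union {                 /* anonymous union, as in the test case */
        unsigned char a;
        unsigned char k;
    };
    unsigned char b;
} Ts;

volatile unsigned char voln;

/* Same logic as used_uninitialized(), but s.a is set before the first
   switch, so it is defined even when voln is neither 0 nor 1. */
void used_initialized(void)
{
    Ts s;

    s.a = 0;                /* assumed fallback for all other voln values */
    switch (voln) {
    case 0:
        s.a = 1;
        break;
    case 1:
        s.a = 55;
        break;
    }
    switch (s.a) {
    case 1:
        voln = 5;
        break;
    }
}
```

With the explicit assignment there is no path on which s.a is read before being written, so the "may be used uninitialized" warning should have nothing left to complain about.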

5.
Code:

void not_supported(void) __attribute__((__naked__));
void not_supported(void) {
}

(i.e. any naked function) results in the following warning
Code:
j.c:64:1: warning: -fstack-usage not supported for this target [enabled by default]
The warning is OK, just the wording is a little bit strange.

6.
Code:
int main(void) {
  foo_with_long_name(1, 2, 3, 4, 5);
  bar(100);
  while(1);
}

foo_with_long_name() was deliberately constructed so that the compiler needs to pass parameters (here, the last one) through stack.
Code:
 162                  main:
 163                  /* prologue: function */
 164                  /* frame size = 0 */
 165                  /* stack size = 0 */
 166                  .L__stack_usage = 0
 167 0000 00D0            rcall .
 168 0002 0F92            push __tmp_reg__
 169 0004 85E0            ldi r24,lo8(5)
 170 0006 90E0            ldi r25,hi8(5)
 171 0008 A0E0            ldi r26,hlo8(5)
 172 000a B0E0            ldi r27,hhi8(5)
 173 000c EDB7            in r30,__SP_L__
 174 000e FEB7            in r31,__SP_H__
 175 0010 8183            std Z+1,r24
 176 0012 9283            std Z+2,r25
 177 0014 A383            std Z+3,r26
 178 0016 B483            std Z+4,r27
[... filling registers with the first 4 parameters as per ABI,
then the function call, purging of the stack,
and the rest of main() ...]


So, apparently, 0 bytes of stack are consumed by the function prologue (register pushes + stack frame), and then 4 bytes are used for this particular function call. (Footnote: this was compiled for a 'm2561, which is why the dummy "rcall ." is used to "reserve" 3 bytes on the stack. AFAIK this optimisation is the only reason why a binary compiled for the 'm64x/'m128x cannot be used on the pin- and feature-compatible 'm256x; a switch allowing for this would be nice, but I am rational enough not to expect that to happen.) Those 4 bytes I would like to have reported in an identifiable comment somewhere around that call.

In the .su file, the respective line says "dynamic,bounded". We see 4 bytes reported, so it apparently adds up the stack usage; the "bounded" indicates that the maximum of all such stack usages is used (tested with the slightly "extended" test file which is attached). So, at the end of the day, the number in the .su file is different from the 2 numbers we already had in the .asm/.lst file... but it is the one that better represents the real stack usage of the function.

7. (not related to -fstack-usage)
In the parameter-into-register-filling portion of main() just before calling foo_with_long_name, the following sequence struck my eye:
Code:
 197 003c AA24            clr r10
 198 003e BB24            clr r11
 199 0040 6501            movw r12,r10
 200 0042 6894            set
 201 0044 A2F8            bld r10,2
I admit I had to reach for the instruction set list to find out what "set" and "bld" stand for. A cunning way to load the constant 4 into a non-ldi-able register indeed! Funny that in the "extended" version in the attachment, in an identical situation, the same register gets loaded through the "standard" procedure using a "high" register. Those are the ways of a compiler... ;-)
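To spell out why this sequence yields 4: "set" sets the T flag in SREG, and "bld r10,2" copies T into bit 2 of r10, i.e. 1 << 2. Below is a small C model of the five instructions, purely as an illustration; the register array and the function name are made up, and the registers start with junk to show that the clr instructions matter.

```c
#include <stdint.h>
#include <string.h>

/* Emulates: clr r10 / clr r11 / movw r12,r10 / set / bld r10,2
   on a toy register file and returns the 32-bit value in r13:r12:r11:r10. */
uint32_t load_const_via_bld(void)
{
    uint8_t r[32];
    uint8_t t;

    memset(r, 0xAA, sizeof r);      /* pretend the registers hold junk */

    r[10] ^= r[10];                 /* clr r10 (encoded as eor r10,r10) */
    r[11] ^= r[11];                 /* clr r11 */
    r[12] = r[10];                  /* movw r12,r10: copy r11:r10 ... */
    r[13] = r[11];                  /* ... into r13:r12 */
    t = 1;                          /* set: T <- 1 */
    r[10] = (uint8_t)((r[10] & ~(1u << 2)) | ((unsigned)t << 2));
                                    /* bld r10,2: bit 2 of r10 <- T */

    /* least significant byte is in r10 */
    return (uint32_t)r[10] | (uint32_t)r[11] << 8 |
           (uint32_t)r[12] << 16 | (uint32_t)r[13] << 24;
}
```

The model returns 4, the value of the fourth argument in the call foo_with_long_name(1, 2, 3, 4, 5).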

JW
 
SprinterSB
PostPosted: Sep 16, 2011 - 09:17 PM
Posting Freak


Joined: Dec 21, 2006
Posts: 1483
Location: Saar-Lor-Lux

wek wrote:
1. It would be a bonus if the numbers could be kept visually in a column
That text file contains TABs, maybe you are using the wrong tab width? I hate TABs.

Quote:
4. The other surprising thing is, that the first function does NOT trigger a warning, although it's quite obvious that the s.a variable is certainly used without initialisation.
Probably worth a bug report.

Quote:
5. [...] any naked function) results in
Code:
j.c:64:1: warning: -fstack-usage not supported for this target [enabled by default]
The warning is OK, just the wording is a little bit strange.
Maybe there are some target hooks to implement in order to help gcc to determine stack usage of functions which are non-standard, i.e. have target specific attributes. Didn't yet look into 4.6 Internals concerning that topic.

Quote:
6. [...] there is 0 stack consumed by the function prologue [...], and then 4 bytes used for the instance of function call [...]. Those 4 bytes I would like to have reported in an identifiable comment somewhere around that call.
The .su file reports the 4 bytes.

Quote:
7.
Code:
clr r10
clr r11
movw r12,r10
set
bld r10,2
I admit I had to reach for the instruction set list to find out what "set" and "bld" stand for. A cunning way to load constant 4 into a non-ldi-able register indeed!
Thanks for drawing attention to that, the 4.7 would print
Code:
set
clr r10
bld r10,2
clr r11
clr r12
clr r13
in the same situation because of changes to output_reload_insisf. So that function will have to be even more complicated :-(
Quote:
Funny that in the "extended" version which is in the attachment, in an identical situation the same register gets loaded through a "standard" procedure using a "high" register.
That's because a high register is available, whereas in the BLD case above, no d-reg is available.
 
wek
PostPosted: Sep 16, 2011 - 09:41 PM
Raving lunatic


Joined: Dec 16, 2005
Posts: 3089
Location: Bratislava, Slovakia

SprinterSB wrote:
Quote:
6. [...] there is 0 stack consumed by the function prologue [...], and then 4 bytes used for the instance of function call [...]. Those 4 bytes I would like to have reported in an identifiable comment somewhere around that call.
The .su file reports the 4 bytes.
It reports ONLY the highest number if there are multiple function calls in a function. For worst-case tree analysis, I need this number for ALL function calls in the function.

Jan
 
SprinterSB
PostPosted: Sep 16, 2011 - 10:09 PM
Posting Freak


Joined: Dec 21, 2006
Posts: 1483
Location: Saar-Lor-Lux

wek wrote:
It reports ONLY the highest number if there are multiple function calls in a function. For worst-case tree analysis, I need this number for ALL function calls in the function.
You have an example showing that GCC stack analysis is lower than the worst case?
 
wek
PostPosted: Sep 16, 2011 - 10:20 PM
Raving lunatic


Joined: Dec 16, 2005
Posts: 3089
Location: Bratislava, Slovakia

SprinterSB wrote:
wek wrote:
It reports ONLY the highest number if there are multiple function calls in a function. For worst-case tree analysis, I need this number for ALL function calls in the function.
You have an example showing that GCC stack analysis is lower than the worst case?
GCC does not analyze the call *tree* (or at least I don't know of it).

I don't want to obtain a value worse than the true worst case, if it's possible to know it more precisely.
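To illustrate what such a call-tree analysis would look like if the per-call numbers were available, here is a crude sketch in C. Everything in it is hypothetical (the structs, the names, the idea of feeding it by hand); it assumes no recursion, no calls through function pointers, no interrupts and no alloca().

```c
#include <stddef.h>

#define MAX_CALLS 8

struct func;

/* One call site: the callee plus the bytes pushed for stack-passed arguments. */
struct call_site {
    const struct func *callee;
    unsigned arg_bytes;
};

/* Per-function data; own_bytes is the "stack size" from the prologue comment
   (pushed registers + frame). */
struct func {
    const char *name;
    unsigned own_bytes;
    size_t n_calls;
    struct call_site calls[MAX_CALLS];
};

/* Worst-case stack needed by f and everything it calls, including f's own
   return address. ret_bytes is the size of a return address:
   2 on most AVRs, 3 on the 'm256x. */
unsigned worst_case_stack(const struct func *f, unsigned ret_bytes)
{
    unsigned deepest = 0;

    for (size_t i = 0; i < f->n_calls; i++) {
        unsigned d = f->calls[i].arg_bytes +
                     worst_case_stack(f->calls[i].callee, ret_bytes);
        if (d > deepest)
            deepest = d;
    }
    return ret_bytes + f->own_bytes + deepest;
}
```

With the figures from the test file above (main: 0 bytes of its own; the call to foo_with_long_name: 4 argument bytes and 34 bytes in the callee; bar: 2 static bytes; 3-byte return addresses on the 'm2561), the sketch gives 3 + 0 + (4 + 3 + 34) = 44 bytes. The dynamic part of bar() is of course not covered. The point is that arg_bytes is needed per call site, not just as a maximum per function.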

JW
 
Magister
PostPosted: Oct 06, 2011 - 07:26 PM
Hangaround


Joined: Aug 06, 2008
Posts: 141
Location: Montréal, QC

I tested this version with my project (running on a 168p), it went from 15822 bytes (using GCC4.3.0) to 17710 bytes (GCC4.6.1), a 12% increase.

I used avr-nm to check the size of various functions, one went from 1302 bytes to 2034 bytes (56% increase!), 170 to 258, 340 to 400, 152 to 248, etc

Analyzing the .lss, it seems every call to certain functions (sprintf_P() in my case) adds a lot of bytes.

EDIT: It's about the same problem since 4.3.2
 
SprinterSB
PostPosted: Oct 07, 2011 - 03:59 PM
Posting Freak


Joined: Dec 21, 2006
Posts: 1483
Location: Saar-Lor-Lux

It's hard to tell from a distance what causes the increased code size. (No source code, no compiler options, etc.)

One bug that can cause such an increase is PR46278. Can you tell whether that PR is the cause of your problem?

SprinterSB
PostPosted: Oct 07, 2011 - 04:07 PM
Posting Freak


Joined: Dec 21, 2006
Posts: 1483
Location: Saar-Lor-Lux

I uploaded a new version of avr-gcc: http://sourceforge.net/projects/mobilec ... p/download

The "Release Notes" are the same as for http://www.avrfreaks.net/index.php?name ... 595#841595 except that the compiler is generated from SVN 179406 of gcc-4_6-branch.

Compared to the gcc-4.6.1-mingw32, following PRs are fixed:

PR50652, PR50289, PR49764, PR49487, PR46779, PR34734, PR44643, PR39633, PR39386.

Moreover, the compiler is configured with --disable-lto to disable LTO which does not work when building with my build environment. Thus, the zip is a bit smaller: It's 21MB and inflates to 68MB on disc.


Last edited by SprinterSB on Oct 21, 2011 - 05:40 PM; edited 2 times in total
 
SprinterSB
PostPosted: Oct 07, 2011 - 06:10 PM
Posting Freak


Joined: Dec 21, 2006
Posts: 1483
Location: Saar-Lor-Lux

There is also avr-gcc 4.7 snapshot 179594 that fixes the following AVR-specific PRs:

PR50652, PR50566, PR50465, PR50449, PR50447, PR50446, PR50358, PR49903, PR49881, PR49868, PR49864, PR49687, PR49313, PR47597, PR45099, PR43746, PR42210, PR36467, PR35860, PR34888, PR34790, PR33049, PR29560, PR29524, PR18145, PR17994.

The code for PR49868 is not upstream yet; you can find it alongside that PR. It enables you to compile the following C code, with the obvious semantics:
Code:
#define _PGM __const __pgm
#define PGM_STR(X) ((_PGM char[]) { X })

int _PGM a = 1;
char _PGM *pstr = PGM_STR ("123");
long _PGM l[] = { 'a', 'b', 'c'};

char get_1 (char _PGM **p)       { return **p; }
char get_2 (char _PGM * _PGM *p) { return **p; }
char get_3 (char * _PGM *p)      { return **p; }
char get_4 (char **p)            { return **p; }

int main (void)
{
    return a + pstr[2] + l[1];
}

Notes:
  1. GCC 4.7 is still work in progress (stage 1)
  2. Implied by PR49687, PR49313 and PR29524, you must use the same version of avr-gcc to compile and link your objects.
  3. Be aware of implications of PR18145 when using section attributes.
  4. -fdata-sections affects data in .progmem, same applies to -fmerge-[all-]constants due to PR43746.
  5. Because of considerable problems, PR46278 is not yet fixed.


Last edited by SprinterSB on Oct 21, 2011 - 05:41 PM; edited 2 times in total
 
abcminiuser
PostPosted: Oct 09, 2011 - 01:36 PM
Moderator


Joined: Jan 23, 2004
Posts: 9830
Location: Trondheim, Norway

Just tried a non-trivial application (my Bluetooth stack and explorerbot, code available here) and the results are:

Code:
  GCC 4.3: 41476 bytes (.text + .data)
GCC 4.6.1: 41276 bytes


Which means that the new compiler performs marginally better (size-wise) in this instance. I was actually expecting a huge difference, due to the sheer number of changes to GCC between the two versions, but it seems this isn't the case.

I'm hoping the introduction of LTO will dramatically decrease the size of some of my applications, so I'm looking forward to being able to test that.

- Dean

_________________
Atmel Studio 6.1 is now released, grab it here.
Report AS6/ASF bugs here.
 
wek
PostPosted: Oct 09, 2011 - 03:43 PM
Raving lunatic


Joined: Dec 16, 2005
Posts: 3089
Location: Bratislava, Slovakia

abcminiuser wrote:
Which means that the new compiler performs marginally better (size-wise) in this instance. I was actually expecting a huge difference, due to the sheer number of changes to GCC between the two versions

Why?

If you read the compiled results, do you see some obvious sources of inefficiency?

Have you ever tried also some "historical" version, perhaps some 3.x.x? You might be surprised.

It's quite visible how the newer versions approach things in a very different manner - but I expect no miracles, just the minor improvements both I (in reports above) and you have seen.

I guess there are cases when a certain writing "style" can be optimized better, perhaps in conjunction with certain gcc command-line switches (see Martin's report above) - but in a typical complete real-world embedded program, it's quite unlikely that a large program is written in such a "style".

abcminiuser wrote:
I'm hoping the introduction of LTO will dramatically decrease the size of some of my applications, so I'm looking forward to being able to test that.

I wouldn't hold my breath.

I personally have more expectations towards bugfixes than miraculously reduced code/data sizes.

But the really big thing I am looking forward to trying is the true "named" progmem support - thanks Johan!

Jan
 
abcminiuser
PostPosted: Oct 10, 2011 - 02:24 AM
Moderator


Joined: Jan 23, 2004
Posts: 9830
Location: Trondheim, Norway

Quote:

Why?

If you read the compiled results, do you see some obvious sources of inefficiency?



I'm not expecting miracles, but I do expect some improvements (otherwise what is the point of the newer versions, other than the obvious device support and bug fixes!) over the old. I'd expect to see improvements on register allocation, better peephole optimizations, smarter pointer usage and the like.

I'm not expecting a 40KB application to suddenly decrease to 20KB or anything, but I would expect the difference to be more than a few tenths of a percent.


Very excited to see the PROGMEM improvements however; that alone will make 4.7 a must have while also making my tutorial on the subject completely obsolete.

- Dean
