Function calls and optimizations in avr-gcc


I'm very new to embedded programming and have been doing lots of reading on the subject.  Much to my surprise, I recently heard that a C function call takes quite a few clock cycles, because the compiled code saves and restores all sorts of registers.  (I heard the number 70-100 thrown out in a video from circa 2012, so it may be dated or refer to an older compiler.  My own playing with the simulator suggests it's more like 30-40.)  I'm wondering how well optimization handles this.

 

I'm used to writing modular code - breaking things up into functions, for ease of use and maintainability.  But if that comes with this sort of overhead, it may not be worth it, especially in code that needs to run fast.  It doesn't much matter on a modern PC, but on an 8-bit micro (my "standard" micro right now is the ATmega328P running at 8MHz), especially one running some sort of time-constrained loop, it can be critical.

 

So I wonder how well modern avr-gcc optimizes code.  For example, I often write functions like this:

 

void initialize(void);
void cleanUp(void);

void foo(void)
{
    initialize();
    // ... do stuff ...
    cleanUp();
}

void initialize(void)
{
    // do start-up tasks
}

void cleanUp(void)
{
    // do clean-up tasks
}

In an example like this, the initialize() and cleanUp() functions are only ever run from foo(), and only run once each.  There's no reason not to include that code in foo() itself, except that I like to keep things broken up this way for neatness.  More style than anything.

 

Is the compiler smart enough to optimize away the function calls and collapse it into one function?  Would marking them "inline" help?  Or should I just change my style around this?

 

Thanks.


Declare your functions to be static.
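
Applied to the foo() example from the question, that suggestion might look like the sketch below. The fake_port variable is a hypothetical stand-in for a real I/O register, used only so the code builds on any target, and the helper bodies are invented for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for a hardware register (assumption, not from the thread). */
static volatile uint8_t fake_port;

/* 'static' gives the helpers internal linkage, so the compiler can see that
   foo() is their only caller and is free to inline them. */
static void initialize(void)
{
    fake_port = 0x01;                 /* start-up tasks */
}

static void cleanUp(void)
{
    fake_port &= (uint8_t)~0x01u;     /* clean-up tasks: clear bit 0 again */
}

void foo(void)
{
    initialize();
    fake_port |= 0x80;                /* do stuff */
    cleanUp();
}
```

Whether the helpers really get folded into foo() still depends on the optimization level, so it is worth checking the listing.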

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 


The number of cycles required to call/return from a function depends on the number of parameters passed to the function and how many registers the function uses, so it will vary.  Looking at the assembler listing (<project>.lss) will show you exactly how many instructions/cycles are required for a particular function call.

 

Is the compiler smart enough to optimize away the function calls and collapse it into one function?

No.

Greg Muth

Portland, OR, US

Xplained/Pro/Mini Boards mostly

 

Make Xmega Great Again!

 


Is the compiler smart enough to optimize away the function calls and collapse it into one function?

No.

Yes, actually, provided the function is either declared inline or static.  However, it's true that this isn't always enough, and certain optimisation options can affect the behaviour.

 

Without static or inline:

$ cat foo.c
#include <avr/io.h>

void bar() {
  PORTB = PORTB;
}

void bat() {
  PORTC = PORTC;
}

void foo() {
  bar();
  bat();
}

int main(void) {
  foo();
  while (1) {}
}
$ avr-gcc -Wall -O1 -g -mmcu=atmega328p foo.c -o foo.elf
$ avr-objdump -S foo.elf
.
.
.
00000080 <bar>:
#include <avr/io.h>

void bar() {
  PORTB = PORTB;
  80:	85 b1       	in	r24, 0x05	; 5
  82:	85 b9       	out	0x05, r24	; 5
  84:	08 95       	ret

00000086 <bat>:
}

void bat() {
  PORTC = PORTC;
  86:	88 b1       	in	r24, 0x08	; 8
  88:	88 b9       	out	0x08, r24	; 8
  8a:	08 95       	ret

0000008c <foo>:
}

void foo() {
  bar();
  8c:	0e 94 40 00 	call	0x80	; 0x80 <bar>
  bat();
  90:	0e 94 43 00 	call	0x86	; 0x86 <bat>
  94:	08 95       	ret

00000096 <main>:
}

int main(void) {
  foo();
  96:	0e 94 46 00 	call	0x8c	; 0x8c <foo>
  while (1) {}
  9a:	ff cf       	rjmp	.-2      	; 0x9a <main+0x4>

Declaring bar() and bat() to be inline:

0000008c <foo>:
#include <avr/io.h>

inline void bar() {
  PORTB = PORTB;
  8c:	85 b1       	in	r24, 0x05	; 5
  8e:	85 b9       	out	0x05, r24	; 5
}

inline void bat() {
  PORTC = PORTC;
  90:	88 b1       	in	r24, 0x08	; 8
  92:	88 b9       	out	0x08, r24	; 8
  94:	08 95       	ret

00000096 <main>:
  bar();
  bat();
}

int main(void) {
  foo();
  96:	0e 94 46 00 	call	0x8c	; 0x8c <foo>
  while (1) {}
  9a:	ff cf       	rjmp	.-2      	; 0x9a <main+0x4>

Declaring them to be static but not inline has a similar result:

00000080 <foo>:
#include <avr/io.h>

static void bar() {
  PORTB = PORTB;
  80:	85 b1       	in	r24, 0x05	; 5
  82:	85 b9       	out	0x05, r24	; 5
}

static void bat() {
  PORTC = PORTC;
  84:	88 b1       	in	r24, 0x08	; 8
  86:	88 b9       	out	0x08, r24	; 8
  88:	08 95       	ret

0000008a <main>:
  bar();
  bat();
}

int main(void) {
  foo();
  8a:	0e 94 40 00 	call	0x80	; 0x80 <foo>
  while (1) {}
  8e:	ff cf       	rjmp	.-2      	; 0x8e <main+0x4>

 

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

Last Edited: Sat. Mar 10, 2018 - 06:58 PM

Is the compiler smart enough to optimize away the function calls and collapse it into one function?

Sometimes.

A function that is only used once will normally be handled inline, so the compiler does not make a "call" but just places the code at the call site.

If you want speed, then you can force the compiler to inline functions.
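
As a rough sketch of what forcing the compiler can look like, GCC provides an always_inline function attribute. The swap_nibbles() and encode() functions below are hypothetical examples, not code from this thread.

```c
#include <stdint.h>

/* always_inline is a GCC function attribute: the body is expanded at every
   call site, even when no optimization level is specified. 'static inline'
   keeps the helper local to this file. */
static inline __attribute__((always_inline)) uint8_t swap_nibbles(uint8_t v)
{
    return (uint8_t)((v << 4) | (v >> 4));
}

/* Hypothetical caller: after inlining, no call/ret pair is needed here. */
uint8_t encode(uint8_t v)
{
    return swap_nibbles(v);
}
```

Forced inlining trades flash for speed when the helper has several call sites, so it suits small, hot functions best.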

 

Oops, too slow :)

Last Edited: Sat. Mar 10, 2018 - 07:01 PM

The reason the compiler has a -fno-inline-small-functions option is that, in its absence, it will usually try to inline them.


Sometimes, rather than studying the assembler listing to see if you have arm-wrestled the compiler to do things in a desired manner, it is easier to describe what you want directly...using assembler.

 

C saves and restores all sorts of registers when compiling to assembly (I heard the number 70-100 thrown out in a video from circa 2012, so it may be dated or refer to an older compiler. My own playing with the simulator suggests it's more like 30-40)

Of course any such excess also uses up/wastes the program flash space, since those instructions have to originate from somewhere.

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!


Quote:
Quote:
Is the compiler smart enough to optimize away the function calls and collapse it into one function?

Sometimes.

 

MANY times, for small-ish functions, especially with the newer "link-time optimization" that you can enable.  It happens to the point where it can interfere with debugging, or lead to slightly larger programs than you'd want.  (I maintain a bootloader that has to fit in 512 bytes, and I had to declare the "uart_getc()" function as "never inline" to force the compiler to use an actual function call for it, even though it was not declared static or inline and was called multiple times.)
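
The post does not show the actual declaration, but one way to express "never inline" in GCC is the noinline function attribute. The sketch below is only an illustration: the rx_data buffer is a hypothetical stand-in for the UART data register so the code builds anywhere, and read_word() is an invented caller.

```c
#include <stdint.h>

/* Hypothetical receive source standing in for the UART hardware. */
static const uint8_t rx_data[] = { 'o', 'k' };
static uint8_t rx_pos;

/* noinline asks GCC to keep this as one real function, so every caller
   shares a single copy of the body instead of getting its own. */
__attribute__((noinline)) uint8_t uart_getc(void)
{
    uint8_t c = rx_data[rx_pos];
    rx_pos = (uint8_t)((rx_pos + 1u) % sizeof rx_data);
    return c;
}

/* Two call sites: with noinline, both become calls to the one copy. */
uint16_t read_word(void)
{
    uint16_t lo = uart_getc();
    uint16_t hi = uart_getc();
    return (uint16_t)(lo | (hi << 8));
}
```

In a size-limited program this keeps one copy of the function body in flash, at the cost of a call and return per use.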

 

 

C saves and restores all sorts of registers when compiling to assembly (I heard the number 70-100 thrown out in a video from circa 2012

I doubt that you will find an instance where avr-gcc adds 70 instructions (or even cycles) of overhead to a function call.  An EXTREMELY NAIVE compiler might push all the function arguments to the stack, and then save all the registers to the stack, and then put local variables on the stack, and undo all that when it returns (which could amount to that many instructions, given that the AVR has 32 registers), but it has been a very long time since compilers were that naive.   The typical "small" function will have all its variables passed in registers, use only registers that don't need to be saved, and have very little overhead (even assuming that it isn't inlined.)  What overhead it does have is frequently stuff that you would have needed to do anyway...
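
To make the "small function" case concrete, here is a hypothetical leaf function. Under the avr-gcc calling convention as I understand it, the first argument arrives in r24, the second in r22, and an 8-bit result goes back in r24; those are all call-clobbered registers, so nothing needs to be pushed or popped. The add_sat() name and body are invented for illustration.

```c
#include <stdint.h>

/* Saturating 8-bit add. On avr-gcc, 'a' is expected in r24 and 'b' in r22,
   and the function needs no registers beyond the call-clobbered ones, so the
   only call overhead should be the call/ret pair itself. */
uint8_t add_sat(uint8_t a, uint8_t b)
{
    uint8_t sum = (uint8_t)(a + b);
    return (sum < a) ? UINT8_MAX : sum;   /* wrapped around: clamp to 255 */
}
```

The listing for a function like this should show just a handful of instructions followed by ret.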

 

 

My own playing with the simulator suggests it's more like 30-40).

Do you have a specific example?   If a function is particularly complex, or has lots of arguments, the overhead can get larger (and probably needs to be larger.)  And it turns out that calling a function from an ISR on AVR is particularly "expensive" because it has to save all those "don't need to be saved" registers.
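
One way to sidestep that ISR cost is to make sure any helper the handler uses can be inlined. The sketch below assumes the ATmega328P Timer0 overflow vector; the non-AVR branch exists only so the code also builds on a host, and tick_count, bump_tick and tick_isr are hypothetical names.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/interrupt.h>
#define TICK_HANDLER ISR(TIMER0_OVF_vect)   /* real interrupt handler on AVR */
#else
#define TICK_HANDLER void tick_isr(void)    /* plain function for host builds */
#endif

static volatile uint8_t tick_count;

/* Defined 'static inline' in the same file as the handler, so the compiler
   can expand it in place and save only the registers the handler uses. */
static inline void bump_tick(void)
{
    tick_count = (uint8_t)(tick_count + 1u);
}

TICK_HANDLER
{
    bump_tick();
}
```

If the helper lived in another file and was not visible to the compiler, the handler would have to save every call-clobbered register before calling it.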

 

You can always look at the code produced and check...

 


In joeymirin's example, the compiler is allowed to inline all the function calls it sees.

If optimizing for speed, this is what it should do.

It still needs to emit the separate functions as well, since they are not static and could be called from another file.

 

If optimizing for size, main would likely be two instructions.

The rest of user code would be the three functions.

 

Which version uses the most flash would depend on whether the linker garbage-collects the unused functions.

Iluvatar is the better part of Valar.