Any rules of thumb for predicting/affecting optimiser behaviour?

#1

Compiler design is a black art and definitively predicting optimiser behaviour is impossible, but are there any rules of thumb?

 

I just moved some unrelated code around inside my main loop and the code size reduced by 32 bytes. No code added, none removed. The functions are simple procedures: no args, no return values, no logical interdependencies. Surprised? No, but I'm bemused by the unpredictability of it all :)

 

Or, to put it the other way around: when trying to improve/refactor/etc., is there anything that can be done to help the optimiser without changing the functional behaviour of the program?

 

btw I'm not asking about obvious coding strategies that are under direct developer control, e.g. better algorithms, data structures, knowing your hardware, etc.

 

#2

Take a look at the before/after ASM listings to get an idea of how your changes affected things. But one thing you might want to ask yourself is: does it really matter? Unless space/speed is of the essence, let the compiler get on with its job. If, however, you are 3 bytes from having to trade up to the next biggest micro at a delta of $0.15 and you are building 250,000 units, then yeah, 32 bytes do matter!
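
For anyone wanting to make that comparison, a sketch of the usual commands (assumes an avr-gcc toolchain on the PATH; the MCU and file names here are placeholders):

    avr-gcc -mmcu=atmega328p -Os -S main.c -o main.s   # compiler's asm output
    avr-objdump -d main.elf > main.lst                 # disassembly of the linked binary
    avr-size main.elf                                  # section sizes, for before/after diffs

Diffing the listing before and after a change is usually the quickest way to see where the 32 bytes went.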

#3

No, it doesn't matter in this instance. I was wondering more about how to 'help' the optimiser, or is it utterly unpredictable?

 

Or, put another way, can one write code that is more optimiser-friendly, other than code which is basically 'well-written' anyway?

 

(I dislike relying on black boxes that I don't understand and can't predict, but my education was too long ago to have any hope of understanding the compiler source code).

#4

I'm sure each compiler, even each different version of the same compiler, has its quirks. So it depends...

I've found all kinds of unpredictable behaviour, like x / 2 generating better code than x >> 1, or changing a variable from 8-bit to 16-bit resulting in smaller code. Go figure.
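
On the x / 2 case, one plausible explanation (an assumption on my part, not verified against that exact code) is signedness: for signed operands the two forms aren't equivalent, so the compiler can't simply swap one for the other. A small C sketch:

    #include <stdint.h>

    /* Unsigned: the two are equivalent and typically compile to the
       same single logical shift. */
    uint8_t half_u(uint8_t x)  { return x / 2;  }
    uint8_t shift_u(uint8_t x) { return x >> 1; }

    /* Signed: division rounds toward zero, so the compiler must emit a
       correction for negative values, while >> on a negative value is
       implementation-defined (an arithmetic shift with avr-gcc). */
    int8_t half_s(int8_t x)  { return x / 2;  }   /* -3 / 2  == -1 */
    int8_t shift_s(int8_t x) { return x >> 1; }   /* -3 >> 1 == -2 */

So which form is 'better' depends on the types involved and what the surrounding code does with the result.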

#5

Some day compilers will compare notes, attend virtual conferences and figure it all out, including best practices.   They'll be sure to let us know, when they feel the need to.  They've been silently adding code to keep tabs on you!   

When in the dark, remember: the future looks brighter than ever. I look forward to being able to predict the future!

#6

obdevel wrote:

No, it doesn't matter in this instance. I was wondering more about how to 'help' the optimiser, or is it utterly unpredictable?

Or, put another way, can one write code that is more optimiser-friendly, other than code which is basically 'well-written' anyway?

(I dislike relying on black boxes that I don't understand and can't predict, but my education was too long ago to have any hope of understanding the compiler source code).

You may want to explore how GCC as a whole works. The fact is that it's not just an AVR compiler. The AVR just "slots into" a generic infrastructure, and quite a lot of that is actually designed to generate the "best" ARM or x86 code, so whether it happens to work well for AVR is something of a gamble (the developers don't much care about AVR and, in fact, it is slated for removal at V10 anyway).

So the AVR C code you write is first parsed into a CPU-agnostic representation (GENERIC, which is then lowered to GIMPLE). Then, I think, something like 64 optimisation passes are made over that before it reaches the AVR code generator, which produces AVR-specific code that is then optimised again over multiple passes. So there's quite a lot of potential for things getting moved around and re-ordered at any point in this process.

While it is deterministic, it is so complex that predicting how some particular C sequence will end up in opcodes is close to impossible (short of running the compiler and seeing what actually happens). What's more, just one small command-line flag change, or one small version increment in any stage of the compiler/linker chain, can have a dramatic impact. So the combination of command-line flags and compiler versions magnifies the complexity a thousand-fold when it comes to trying to predict how it will behave.

 

Bottom line - just let the C compiler do its job. If you notice something particularly unsatisfactory in what it comes up with, maybe focus on that specific thing and reimplement/re-order to get "better" code just for that bit.

 

BTW, if you are really interested in how GCC works "inside" the black box, read https://gcc.gnu.org/onlinedocs/gccint/index.html#Top - perhaps starting here: https://gcc.gnu.org/onlinedocs/gccint/Passes.html#Passes
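
If you want to watch those passes at work on your own code, GCC can dump every intermediate stage (a sketch; the exact dump-file names vary a little between GCC versions):

    avr-gcc -mmcu=atmega328p -Os -c main.c -fdump-tree-all -fdump-rtl-all

That writes one numbered dump file per pass (main.c.004t.gimple and friends) next to the source, which makes the "dozens of passes" claim very tangible.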

#7

In the last couple of years, the only time I have 'helped' the compiler is to force a uint8_t into a register. I had a bit of time-critical code that looped around several times, and on inspecting the ASM I realised that in doing so I would gain a significant speed increase.
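
I don't know the original code, but one common pattern that achieves this (names here are made up) is caching an SRAM-resident variable in a local for the duration of the loop, so the optimiser can keep it in a register:

    #include <stdint.h>

    uint8_t state;                      /* lives in SRAM */

    uint8_t churn(const uint8_t *buf, uint8_t len)
    {
        uint8_t s = state;              /* cached; can live in a register */
        for (uint8_t i = 0; i < len; i++)
            s ^= buf[i];                /* tight loop works on the register */
        state = s;                      /* single write-back at the end */
        return s;
    }

Inspecting the generated ASM afterwards is the only way to be sure the compiler took the hint.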

#1 Hardware Problem? https://www.avrfreaks.net/forum/...

#2 Hardware Problem? Read AVR042.

#3 All grounds are not created equal

#4 Have you proved your chip is running at xxMHz?

#5 "If you think you need floating point to solve the problem then you don't understand the problem. If you really do need floating point then you have a problem you do not understand."

#8

obdevel wrote:
'help' the optimiser

In general, it is far better to help the human reader to understand the intent of the code.

So, first and foremost, focus on making the code clear and understandable.

 

If you do get to a position where "clever tricks" are unavoidable to meet requirements, then be sure to comment in great detail what you've done, and why you've done it.

 

Premature optimisation is the root of all evil

 

https://en.wikipedia.org/wiki/Program_optimization#When_to_optimize

 

Everyone knows that debugging is twice as hard as writing a program in the first place.
So, if you're as clever as you can be when you write it, how will you ever debug it?

 

https://www.goodreads.com/quotes/273375-everyone-knows-that-debugging-is-twice-as-hard-as-writing

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...
#9

awneil wrote:

In general, it is far better to help the human reader to understand the intent of the code.

+100

 

Code size: only important if it does not fit in the chip.

Code speed / efficiency: only important in repetitive sequences, e.g. an innermost loop.

If you were to optimise ASM manually, you would concentrate on the 1% of the code where the execution spends 90% of the time.

You do exactly the same with C or C++ code, i.e. concentrate on the 1%. Use appropriate-size variables with efficient algorithms.
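
To make "appropriate-size variables" concrete on AVR (a sketch with made-up names): an 8-bit loop counter occupies one register and increments/compares in single instructions, whereas an int (16 bits on AVR) drags a register pair through the same loop.

    #include <stdint.h>

    uint8_t buf[32];

    uint16_t sum_buf(void)
    {
        uint16_t sum = 0;
        /* uint8_t index: one register, one-instruction increment and
           compare; an int counter would cost a register pair here. */
        for (uint8_t i = 0; i < sizeof buf; i++)
            sum += buf[i];
        return sum;
    }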

 

David.

#10

david.prentice wrote:
Code speed / efficiency: only important in repetitive sequences.

Not necessarily?

 

There might be "occasional" events which have to be handled with a very tight deadline - so "optimisation" may be warranted there

#11

Thank you for the interesting comments. I guess it's like a lot of algorithms: fundamentally deterministic but beyond practical day-to-day understanding. Which frustrates me, as my favourite question is "yes, but why?".

 

A comparison could be made with PCB autorouters. A basically 'good' layout will assist the autorouter, and it can try many, many more possibilities in a given period of time than a human could. It may never be better than the best human, but 99% of the time it'll be better than 99% of humans, in 1% of the time.

 

Veering OT, I was recently reading about the early (1980s) CISC vs RISC comparisons. One problem with CISC processors, with their beautifully orthogonal instruction sets, was that the compilers of the time just didn't have enough CPU and memory available to make good use of them. Great for assembler writers with large brains and lots of time; compilers not so much. A lot of standard C library code of the time was hand written and painstakingly tuned. 

#12

obdevel wrote:
A lot of standard C library code of the time was hand written and painstakingly tuned. 

Probably still true - especially where targeted specifically at small micros.

 

Clearly, this is a prime example of where optimisation is justified - the code is written just once, but used vast numbers of times in vast numbers of applications by vast numbers of people.

#13

You may be aware that most AVR CPU cores have two types of subroutine call:

  1. CALL - long call to a subroutine (absolute address): 32-bit opcode
  2. RCALL - relative call to a subroutine (offset): 16-bit opcode

The RCALL variant can only be used if the function lies within ±4kB of the instruction.

I'll wager that, with your code re-organisation, the compiler was able to use the shorter RCALL variant instead of the longer CALL.
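
That wager is easy to check (assumes standard binutils and an ELF called main.elf; adjust names to suit):

    avr-objdump -d main.elf | grep -cw call     # long calls (4 bytes each)
    avr-objdump -d main.elf | grep -cw rcall    # relative calls (2 bytes each)

If, after the re-organisation, the call count dropped and the rcall count rose, that's the saving: each CALL converted to RCALL is 2 bytes, so 16 conversions would account for the 32.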

 

That of course throws up an interesting question: what would you have done if the code size had increased by 32 bytes?

 

#14

>> That of course throws up an interesting question: what would you have done if the code size had increased by 32 bytes?

 

Put it back as it was, or maybe not. I was re-factoring for readability :)

#15

N.Winterbottom wrote:

I'll wager by your code re-organisation, the compiler was able to use the shorter RCALL variant instead of the longer CALL.

 

 

Now that you mention it: avr-gcc doesn't always do this automatically; it's important to enable the replacement by passing the -mrelax option.
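
For the record, a minimal invocation (the MCU and file names are placeholders):

    avr-gcc -mmcu=atmega2560 -Os -mrelax main.c -o main.elf

-mrelax has the assembler emit relaxable relocations and passes --relax to the linker, which is the stage that actually rewrites in-range CALL/JMP into RCALL/RJMP.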

Last Edited: Mon. Jul 13, 2020 - 12:38 PM
#16

awneil wrote:

 

Everyone knows that debugging is twice as hard as writing a program in the first place.
So, if you're as clever as you can be when you write it, how will you ever debug it?

 

https://www.goodreads.com/quotes/273375-everyone-knows-that-debugging-is-twice-as-hard-as-writing

Clever is not the same as hard to understand.  To me, clever means hard to think of.

Duff's device and xmacros are clever.  Neither is hard to understand.
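
For instance, Duff's device (a textbook rendition here, not production code): the interleaved switch and do-while look alien at first sight, but once you see that the switch simply jumps into the middle of an 8x-unrolled loop, it's perfectly understandable.

    #include <stddef.h>

    /* Copy 'count' bytes, unrolled 8x; the switch handles the
       remainder by entering the loop body part-way through. */
    void copy8(char *to, const char *from, size_t count)
    {
        if (count == 0)
            return;                  /* the classic version assumes count > 0 */
        size_t n = (count + 7) / 8;  /* number of loop rounds */
        switch (count % 8) {
        case 0: do { *to++ = *from++;
        case 7:      *to++ = *from++;
        case 6:      *to++ = *from++;
        case 5:      *to++ = *from++;
        case 4:      *to++ = *from++;
        case 3:      *to++ = *from++;
        case 2:      *to++ = *from++;
        case 1:      *to++ = *from++;
                } while (--n > 0);
        }
    }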

 

In general, clever things with the preprocessor either work or fail loudly, usually during compilation.

In suspicious cases, one can examine the preprocessor output.
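
X-macros are a good example of that: define the list once, expand it twice, and the enum and its name table can never drift apart. A minimal sketch with made-up names (avr-gcc -E shows the expansion if you're suspicious):

    /* The list is the single source of truth. */
    #define COLOUR_LIST \
        X(RED)          \
        X(GREEN)        \
        X(BLUE)

    /* First expansion: the enum. */
    #define X(name) COLOUR_##name,
    enum colour { COLOUR_LIST COLOUR_COUNT };
    #undef X

    /* Second expansion: a matching string table. */
    #define X(name) #name,
    static const char *colour_names[] = { COLOUR_LIST };
    #undef X

Get the list wrong and it fails loudly at compile time, exactly as described.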

 

Template metaprogramming is clever and ....

Moderation in all things. -- ancient proverb

Last Edited: Tue. Jul 14, 2020 - 02:54 PM
#17

Your point is good, but one small correction to:

  The RCALL variant can only be used if the function lies within ±4kB of the instruction

The range is ±2K instructions (i.e. ±2048 words, which comes to ±4KB of flash).