more avr-g++ stupid optimizations (integer promotion?)


avr-gcc 5.4.0 (also tested 4.9.2), -Os -flto, compiled as C

uint8_t printHex(uint8_t value)
{
    dbgtx(nibbletohex(value>>4));
  68:   28 2f           mov     r18, r24
  6a:   22 95           swap    r18
  6c:   2f 70           andi    r18, 0x0F       ; 15

Same code, compiled as C++:

uint8_t printHex(uint8_t value)
{
    dbgtx(nibbletohex(value>>4));
  68:   48 2f           mov     r20, r24
  6a:   50 e0           ldi     r21, 0x00       ; 0
  6c:   94 e0           ldi     r25, 0x04       ; 4
  6e:   55 95           asr     r21
  70:   47 95           ror     r20
  72:   9a 95           dec     r25
  74:   e1 f7           brne    .-8             ; 0x6e

Using __builtin_avr_swap(value) & 0x0F will get it to compile the same way in C and C++, but I'd prefer to keep the code portable.  I also like having simple functions like this just in a header, but so far my only fully portable solution is to put the definitions in a .c file and the declarations in a .h with extern "C".  But can I trust future versions of gcc not to do the same stupid "optimization"?
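
The split I mean looks roughly like this (a sketch only; high_nibble and the nibble.h / nibble.c names are made up for the example, and the AVR branch is just the builtin form mentioned above):

/* nibble.h -- declaration visible to both C and C++ callers */
#ifndef NIBBLE_H
#define NIBBLE_H
#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

uint8_t high_nibble(uint8_t value);

#ifdef __cplusplus
}
#endif
#endif /* NIBBLE_H */

/* nibble.c -- compiled as C, so it gets the C code generation */
#include "nibble.h"

uint8_t high_nibble(uint8_t value)
{
#ifdef __AVR__
    return __builtin_avr_swap(value) & 0x0F;   /* the builtin form mentioned above */
#else
    return value >> 4;                         /* portable fallback */
#endif
}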

What's particularly confusing is that g++ will compile the code to swap + andi when I disable lto.

 

The issue seems to be related to integer promotion rules, but I can't understand why it only happens with lto.  I can trick gcc into using 8-bit unsigned math using a vector, but that's sure to confuse most people.

uint8_t printHex(uint8_t value)
{
    uint8_t v1u8 __attribute__(( vector_size(1) ));  /* one-element vector of uint8_t */
    v1u8[0] = value;
    v1u8 >>= 4;
    dbgtx(nibbletohex(v1u8[0]));

Another "solution" is to use -mint8, but that breaks any code using int16_t & uint16_t.
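
For reference, the promotion at the root of this: because value is a uint8_t, the integer promotions convert it to int before the shift, so value >> 4 is really a 16-bit shift on AVR, and the compiler has to prove the upper byte is irrelevant before it can emit 8-bit code.  A minimal sketch of what the expression means (high_nibble_promoted is a made-up name):

#include <stdint.h>

uint8_t high_nibble_promoted(uint8_t value)
{
    /* Integer promotion: value becomes an int before the shift, so this is
     * conceptually a 16-bit shift; the conversion back to uint8_t is what
     * lets the compiler (ideally) narrow it to swap + andi. */
    return (uint8_t)((int)value >> 4);   /* spelled-out form of value >> 4 */
}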

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 


avr-g++ and avr-gcc are distinct compilers.

They have some, but not all, of their avr-specific stuff in common.

My guess is that the swap optimization is an example of "but not all".

avr-g++ gets a lot less use from users and less attention from compiler authors.

This will no doubt contribute to the myth that C++ is inherently bloated.

Iluvatar is the better part of Valar.


skeeve wrote:

avr-g++ and avr-gcc are distinct compilers.

They have some, but not all, of their avr-specific stuff in common.

 

That explains some of the differences, but I'm still mystified as to why g++ uses swap without lto.

I always thought gcc and g++ used the same back end (which I thought was part of libgcc) after the GIMPLE code is generated.

Maybe I should just stick to asm, where I know what code I'm going to get, and 98% of the time it's as good as or better than the compiler.  The only thing about asm I find a bit of a pain is manual register assignment.  Give me an advanced assembler that does intelligent global register allocation and I'd rarely touch gcc.

 

Adding to the g++ randomness, I found another way to get it to use swap:

struct ui8
{
    ui8(uint8_t val) : val(val) {}
    ui8 operator>>(int count)
    {
        ui8 tmp(*this);
        tmp.val >>= count;
        return tmp;    // return the shifted copy, not the original
    }

    uint8_t val;
};

uint8_t printHex(ui8 value)
{
    dbgtx(nibbletohex( (value>>4).val ) );
    dbgtx(nibbletohex(value.val));
}

I have no special talents.  I am only passionately curious. - Albert Einstein

 


I would expect C++ to use SWAP and ANDI.  

 

The Optimiser might do different things for -O1, -O2, ..., -Os but it seems a no-brainer for uint8_t.

 

I will have a look later.

 

David.


ralphd wrote:
That explains some of the differences, but I'm still mystified as to why g++ uses swap without lto.
My understanding is that the GNU compilers have more inter-layer dependencies than originally intended, and that this is one of the things that makes life interesting for newcomers who want to work on them.

 

'Twould be interesting to know whether avr-g++ finds the swap optimization with -flto.

My suspicion is that the swap might be undone by an LTO optimization designed for 32- or 64-bit machines, e.g. Intel processors.

 

I've read of avr-gcc recognizing AVR code designed to perform a fast multiply and changing it into a generic, i.e. slow, multiply.

 

Iluvatar is the better part of Valar.


Thanks almost exclusively to Georg-Johann Lay, there has been far, far more work on optimising cc1 (the C compiler proper) than avr-g++.

 

So if you need an "efficient bit" in a principally C++ project, do that bit in .c, not .cpp.


clawson wrote:

Thanks almost exclusively to Georg-Johann Lay, there has been far, far more work on optimising cc1 (the C compiler proper) than avr-g++.

 

So if you need an "efficient bit" in a principally C++ project, do that bit in .c, not .cpp.

 

Having had some time to consider the following comment, specifically the part about gcc ALWAYS following the ABI, I'm now less optimistic about getting efficient results from gcc (even in C mode).

https://www.avrfreaks.net/commen...

 

Instead, I'll be writing more code in asm, using my lightweight asm function technique for code that needs to be called from C/C++.
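
The general shape of such a function is roughly the following (a generic sketch only, not the exact technique; asm_high_nibble and nibble.S are placeholder names, and it relies on the usual avr-gcc convention of the first 8-bit argument arriving in r24 and the 8-bit return value going back in r24):

; nibble.S -- placeholder file name; assemble with avr-gcc so it gets preprocessed
.global asm_high_nibble
asm_high_nibble:
    swap  r24           ; first 8-bit argument arrives in r24
    andi  r24, 0x0F     ; keep what was the high nibble
    ret                 ; 8-bit return value goes back in r24

/* C/C++ side: declare it like any ordinary function */
#include <stdint.h>

#ifdef __cplusplus
extern "C"
#endif
uint8_t asm_high_nibble(uint8_t value);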

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 


ralphd wrote:
I'll be writing more code in asm

But then it will not be portable!??

Unless there is a pressing need for small code size or speed, it hardly seems worth the effort; if you later want to port it to another MPU, even a different AVR, you will have to review the code optimization again and, if needed, learn another asm!??

 

Jim

Perhaps your time would be better spent becoming a gcc(++) compiler writer!

ki0bk wrote:

ralphd wrote:
I'll be writing more code in asm

But then it will not be portable!??

 

Of course.  Most of the embedded code I write is non-portable anyway.  A lot of it, like bootloaders and bit-banged communications, can't be done in a portable way.  I still write a lot of portable code in languages like Python, C/C++, Go, and OpenCL.

If you don't understand the headaches an unpredictable embedded-systems compiler can cause, look at some of Bill W's posts about maintaining Optiboot.  My picobootArduino bootloader, written in asm, is half the size and has never caused me problems when I upgraded to a new version of avr-gcc.

 

I'd also challenge anyone who thinks becoming an expert asm programmer is hard and that programming in C++ is easier.  C++ is a FAR more complex language than asm, even when you leave out exceptions, threading, and the STL.  I started programming in C++ 30 years ago (yes, in the cfront days), and I still have a long way to go to become an expert.  I do consider myself an expert in 6502 and AVR assembler, and am getting there with ARM.  Even when I do write AVR code in C/C++, I spend most of my debugging time looking at the asm anyway.

 

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 


ralphd wrote:
Having had some time to consider the following comment, specifically the part about gcc ALWAYS following the ABI, I'm now less optimistic about getting efficient results from gcc (even in C mode).

https://www.avrfreaks.net/commen...

 

Instead, I'll be writing more code in asm, using my lightweight asm function technique for code that needs to be called from C/C++.

When measuring efficiency, be sure to include author time spent writing code and author time spent debugging code.

There was a time when compilers were much worse and were still considered useful.  The first compilers had to be written in assembly or worse.  Compiler authors were off to the races when compilers could be written in the languages they compiled.

Iluvatar is the better part of Valar.


I think someone's being a bit fast 'n' loose with the facts. Since when does avr-gcc 'compile' assembler? avr-as has that job. As for the compiler being 'unpredictable'? Surely for a given piece of code it gives the same results - or does it spin the wheel to decide what to do? Sure - different versions give different output, but it's hardly unpredictable. Surely with a new version you expect some things to change - otherwise what would be the need for a new version?

If you spend a lot of time debugging, then surely write fewer bugs!
The reality is that if you want a very specific outcome, a compiler will usually disappoint. When you try to force its hand, you're opening the door to compiler changes doing things differently. The simple solution is to let the compiler do what it does best and write the specific stuff in asm.
Have you tried the IAR compiler? I'm not sure if the demo mode lets you write a bootloader or gives you full optimisations. For ARM Cortex, it is a lot more aggressive than arm-gcc.


Go simple when seeking alternatives-

 

dbgtx( nibbletohex(value/16) );

 

https://godbolt.org/z/s9v2oU

 

now you have more time for asm.

 


It's the same issue we observed in a previous thread: it seems shifts are poorly optimized, while multiplication and division kick-start the optimizer for some reason.

This is bad because our optimizing instincts as programmers tell us the exact opposite: use shifts when possible.
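
To make the comparison concrete, a minimal sketch (for unsigned 8-bit values the two forms compute the same result, so any difference in the generated code is purely an optimizer quirk; the function names are made up):

#include <stdint.h>

/* Both return the high nibble; ideally both would compile to swap + andi. */
uint8_t high_nibble_shift(uint8_t v) { return v >> 4; }
uint8_t high_nibble_div(uint8_t v)   { return v / 16; }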


Kartman wrote:
I think someone's being a bit fast 'n' loose with the facts. Since when does avr-gcc 'compile' assembler? avr-as has that job. As for the compiler being 'unpredictable'? Surely for a given piece of code it gives the same results - or does it spin the wheel to decide what to do?

 

You can spin the facts however you want.  As I explained above, for a given piece of code there are DIFFERENT results for gnu11 than for gnu++11.  And I challenge you to provide a way of predicting which versions of gcc will give the same results.

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 


ralphd wrote:
You can spin the facts however you want.  As I explained above, for a given piece of code there are DIFFERENT results for gnu11 than for gnu++11.  And I challenge you to provide a way of predicting which versions of gcc will give the same results.
For the most part, one does not need to.

In cases where one needs to read the compiler's assembly output to make sure, one should probably write the assembly as well.

Iluvatar is the better part of Valar.


The following code

 

#include <stdlib.h>
#include <stdint.h>

uint8_t test(uint8_t num) {
  return num >> 4;
}

int main()
{
  return test(rand() % 16);
}

produces the following assembly with avr-gcc 5.4.0 in C mode at `-Os`:

 

test(unsigned char):
        swap r24
        andi r24,lo8(15)
        ret
main:
        rcall rand
        ldi r22,lo8(16)
        ldi r23,0
        rcall __divmodhi4
        clr r25
        ldi r18,4
        1:
        lsr r25
        ror r24
        dec r18
        brne 1b
        ret

As you can see, the body of `test` uses `swap`, while the inlined copy of `test` inside `main` suddenly uses that dumb shift-loop version.

 

So, the behavior is not necessarily dependent on compiling in C or C++ mode. Even compiling in C mode may produce either version within the same translation unit, apparently depending on the surrounding context.
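
One way to at least pin down which version you get, offered only as an untested sketch (test_noinline is a made-up variant of the function above): keep the call out of line, so the swap/andi body shown above is the code that actually runs.

#include <stdint.h>

/* noinline forces a real call to the out-of-line body; whether the extra
 * rcall/ret beats an inlined shift loop depends on the call site. */
__attribute__((noinline)) uint8_t test_noinline(uint8_t num)
{
    return num >> 4;
}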

Last Edited: Mon. Mar 2, 2020 - 10:15 PM