GCC optimization fail

Go To Last Post
12 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

This simple program shows a poor optimization by GCC:

 

#include <avr/io.h>

int main () {
	OCR1A = (PIND << 8) | PIND;
}

gcc 4.8.1 and 5.4.0 give:

 

int main () {
	OCR1A = (PIND << 8) | PIND;
  80:	29 b1       	in	r18, 0x09	; 9
  82:	89 b1       	in	r24, 0x09	; 9
  84:	90 e0       	ldi	r25, 0x00	; 0
  86:	92 2b       	or	r25, r18
  88:	90 93 89 00 	sts	0x0089, r25
  8c:	80 93 88 00 	sts	0x0088, r24
  90:	80 e0       	ldi	r24, 0x00	; 0
  92:	90 e0       	ldi	r25, 0x00	; 0
  94:	08 95       	ret

 

I also tested gcc 7.2.0, which I use because of C++17 support. It's even worse:

00000080 <main>:
  80:   89 b1           in      r24, 0x09       ; 9
  82:   99 b1           in      r25, 0x09       ; 9
  84:   89 27           eor     r24, r25
  86:   98 27           eor     r25, r24
  88:   89 27           eor     r24, r25
  8a:   90 93 89 00     sts     0x0089, r25     ; 0x800089 <__TEXT_REGION_LENGTH__+0x7e0089>
  8e:   80 93 88 00     sts     0x0088, r24     ; 0x800088 <__TEXT_REGION_LENGTH__+0x7e0088>
  92:   90 e0           ldi     r25, 0x00       ; 0
  94:   80 e0           ldi     r24, 0x00       ; 0
  96:   08 95           ret

Yes, very clever the use of 3 eors to exchange r24 and r25. Very clever but very useless since you could just revert the 2 in instructions and achieve the same result... or reverse r24 and r25 in the store instructions.

I'm using -Os flag, but the other optimization levels don't seem to do any difference.

 

Is there one of those obscure gcc options I usually miss that might fix this?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

No.
 

avrfreaks does not support Opera. Profile inactive.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

You could optimize the C! smiley

#include <avr/io.h>

int main () {
//    OCR1A = (PIND << 8) | PIND;
    uint8_t test = PIND;
    OCR1AH = test;
    OCR1AL = test;

    while(1) {};
}

which yields:

int main () {
//    OCR1A = (PIND << 8) | PIND;
    uint8_t test = PIND;
  aa:	89 b1       	in	r24, 0x09	; 9
    OCR1AH = test;
  ac:	80 93 89 00 	sts	0x0089, r24	; 0x800089 <__TEXT_REGION_LENGTH__+0x7e0089>
    OCR1AL = test;
  b0:	80 93 88 00 	sts	0x0088, r24	; 0x800088 <__TEXT_REGION_LENGTH__+0x7e0088>
  b4:	ff cf       	rjmp	.-2      	; 0xb4 <main+0xa>


 

David (aka frog_jr)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

El Tangas wrote:
This simple program shows a poor optimization by GCC:

Is it?  >>You<< essentially forced the sequence of operations, as the code >>you<< chose to write does two reads of a volatile location.

 

As pointed out in #3, if you write to code without forcing said operation the compiler seems to do just fine.

 

If indeed you do the double-read of PIND, aren't you racing a change?  Or is that the whole point of the sequence?

 

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

It also works without volatile:

unsigned char c, d;
unsigned cd;

void f (void)
{
    cd = (d << 8) | c;
}

The responsible pattern is:

(define_insn_and_split "*iorhi3.ashift8-ext.zerox"
  [(set (match_operand:HI 0 "register_operand"                        "=r")
        (ior:HI (ashift:HI (any_extend:HI
                            (match_operand:QI 1 "register_operand"     "r"))
                           (const_int 8))
                (zero_extend:HI (match_operand:QI 2 "register_operand" "r"))))]
  "optimize"
  { gcc_unreachable(); }
  "&& reload_completed"
  [(set (match_dup 1) (xor:QI (match_dup 1) (match_dup 2)))
   (set (match_dup 2) (xor:QI (match_dup 2) (match_dup 1)))
   (set (match_dup 1) (xor:QI (match_dup 1) (match_dup 2)))]
  {
    rtx hi = simplify_gen_subreg (QImode, operands[0], HImode, 1);
    rtx lo = simplify_gen_subreg (QImode, operands[0], HImode, 0);

    if (!reg_overlap_mentioned_p (hi, operands[2]))
      {
        emit_move_insn (hi, operands[1]);
        emit_move_insn (lo, operands[2]);
        DONE;
      }
    else if (!reg_overlap_mentioned_p (lo, operands[1]))
      {
        emit_move_insn (lo, operands[2]);
        emit_move_insn (hi, operands[1]);
        DONE;
      }

    gcc_assert (REGNO (operands[1]) == REGNO (operands[0]));
    gcc_assert (REGNO (operands[2]) == 1 + REGNO (operands[0]));
  })

It expresses the combination of 5 16-bit operations:  2 extensions to 16 bits, one 16-bit shift, a 16-bit IOR and a 16-bit assignment.  We are hitting the corner case of 2 overlaps (which will execute the 2 assertions at the end) that triggers the 3 XORs.

 

So see the effect of removing the pattern (actually anything synthesized be combine) use -fdisable-rtl-combine

 

You can try to change the constraint of operand 2 from "r" to "0" and elaborate what happens for possible operand combinations / overlaps.
 

avrfreaks does not support Opera. Profile inactive.

Last Edited: Sat. Sep 30, 2017 - 04:18 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I was recently a bit disappointed by:

 

void loop() {
  for (byte d = 0; d < 8; d++) {
    uint32_t s = cathall;  // all cathodes 1 (off)
    :  // more manipulations of s
    PORTB = s & 255;
 250:   85 b9           out     0x05, r24       ;;;  Good.
    PORTC = (s >> 8) & 255;
 252:   49 2f           mov     r20, r25        ;;; Would rather have out 0x8, r25
 254:   5a 2f           mov     r21, r26
 256:   6b 2f           mov     r22, r27
 258:   77 27           eor     r23, r23
 25a:   48 b9           out     0x08, r20       ; 8
    PORTD = (s >> 16);
 25c:   cd 01           movw    r24, r26      ;;;  And out 0xb, r26
 25e:   aa 27           eor     r26, r26
 260:   bb 27           eor     r27, r27
 262:   8b b9           out     0x0b, r24       ; 11

 

Weren't there at one time some optimizations in the avr-gcc code generator to specially deal with byte-sized results?

Last Edited: Sat. Sep 30, 2017 - 09:40 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I have tried the option -fdisable-rtl-combine, for the older versions of GCC, the code indeed is de-optimized, but we can now see clearly how the compiler is "thinking":

 

00000080 <main>:
  80:	89 b1       	in	r24, 0x09	; 9
  82:	29 b1       	in	r18, 0x09	; 9
  84:	90 e0       	ldi	r25, 0x00	; 0 ;zero extend to 16 bit
  86:	98 2f       	mov	r25, r24   ;shift: move the data
  88:	88 27       	eor	r24, r24   ;shift: clear lower bits
  8a:	30 e0       	ldi	r19, 0x00	; 0 ;zero extend
  8c:	82 2b       	or	r24, r18  ; do the or operation
  8e:	93 2b       	or	r25, r19  ; do the or operation
  90:	90 93 89 00 	sts	0x0089, r25	; 0x800089 <__TEXT_REGION_LENGTH__+0x7e0089>
  94:	80 93 88 00 	sts	0x0088, r24	; 0x800088 <__TEXT_REGION_LENGTH__+0x7e0088>
  98:	80 e0       	ldi	r24, 0x00	; 0
  9a:	90 e0       	ldi	r25, 0x00	; 0
  9c:	08 95       	ret

 

For the newer gcc 7.2.0, -fdisable-rtl-combine seemed to have no effect.

 

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

-fdisable-rtl-combine has still the effect of disabling insn combine, but the above pattern was added just recently as http://gcc.gnu.org/r242907 hence in older versions there is no such pattern.
 

avrfreaks does not support Opera. Profile inactive.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

The first asm of which OP complains has two extra instructions: LDI and OR.

frog_jr's suggestion is semantically different and is an example of what one should not have to do.

 

The pattern (fred<<8) | greg

where fred and greg are 8-bit variables often seems to give avr-gcc trouble.

Sometimes (fred<<8) & 0xFF00 | greg works better.
 

Note that OP's code does require reading PIND twice.

International Theophysical Year seems to have been forgotten..
Anyone remember the song Jukebox Band?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Based on #5, I have a further test:

 

#include <avr/io.h>

int main () {
	return (PIND << 8) | PIND;
}

This compiles to

 

00000080 <main>:
  80:	29 b1       	in	r18, 0x09	; 9
  82:	89 b1       	in	r24, 0x09	; 9
  84:	90 e0       	ldi	r25, 0x00	; 0
  86:	92 2b       	or	r25, r18
  88:	08 95       	ret

on gcc 4.8.1 and 5.4.0, but on the newer version 7.2.0 I get:

00000080 <main>:
  80:   99 b1           in      r25, 0x09       ; 9
  82:   89 b1           in      r24, 0x09       ; 9
  84:   08 95           ret

 

So at least this case has been fixed.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0
Added a "r,r,0" alternative to the mentioned pattern, you can try it from gcc trunk svn 253343 or newer.

avrfreaks does not support Opera. Profile inactive.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Thank you, however I wouldn't dare to actually compile avr-gcc, it seems to be quite a challenge. I will wait until this is incorporated in a new release and someone compiles it, then I will test for sure.