avr-gcc optimization of xor incorrectly eliminates volatile access!

Go To Last Post
43 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Has anyone else seen this?  Any ideas?

 

avr-gcc (AVR_8_bit_GNU_Toolchain_3.6.1_495) 5.4.0

 

#include <avr/io.h>
int aRead() {
    uint8_t h,l;
    l = ADCL;
    h = ADCH;
    return (h<<8) | l;
}
int main() {
    volatile uint8_t x;
    x = aRead() ^ 42;
}

Compile with "avr-gcc -mmcu=atmega328p -Os -flto adctest.c" or "avr-gcc -mmcu=atmega328p -Os -flto adctest.c"

Produces:

0000008e <main>:
  8e:   cf 93           push    r28
  90:   df 93           push    r29
  92:   1f 92           push    r1
  94:   cd b7           in      r28, 0x3d       ; 61
  96:   de b7           in      r29, 0x3e       ; 62
  98:   80 91 78 00     lds     r24, 0x0078     ; 0x800078 <__TEXT_REGION_LENGTH__+0x7e0078>
  9c:   9a e2           ldi     r25, 0x2A       ; 42
  9e:   89 27           eor     r24, r25
  a0:   89 83           std     Y+1, r24        ; 0x01
  a2:   80 e0           ldi     r24, 0x00       ; 0
  a4:   90 e0           ldi     r25, 0x00       ; 0
  a6:   0f 90           pop     r0
  a8:   df 91           pop     r29
  aa:   cf 91           pop     r28
  ac:   08 95           ret

Note that the read of ADCH (0x0079) isn't there.  It's a volatile io registers; it SHOULD be there.  (It NEEDS to be there.  If you read ADCL, you essentially shut down the ADC until ADCH has been read as well.  Original Arduino thread: http://forum.arduino.cc/index.php?topic=581523.msg3959529#msg3959529)

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I think that xor of upper byte and 0x00 was omitted because the result does not change.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Yes, but it still should have read the register, due to "volatile."

Interestingly, other equally-"doesn't affect the high byte" operations, like "|", behave correctly.

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

westfw wrote:
Compile with "avr-gcc -mmcu=atmega328p -Os -flto adctest.c" or "avr-gcc -mmcu=atmega328p -Os -flto adctest.c"
I can't spot the difference.

 

Dropping -flto fixes it, so I'd blame LTO.

 

The problem is not present in 3.5.4.

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

Last Edited: Tue. Nov 27, 2018 - 05:40 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Bad code is produced by -O2, -O3 or -flto...

Hmm.  Or by -Os and -O1 if I add the always_inline attribute to the aread() function.

So, something with inlining in general?

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Here's the .i version.  Nothing interesting that I see :-(

 

int aRead() __attribute__((always_inline));
int aRead() {
    uint8_t h,l;
    l = 
# 7 "adctest.c" 3
       (*(volatile uint8_t *)(0x78))
# 7 "adctest.c"
           ;
    h = 
# 8 "adctest.c" 3
       (*(volatile uint8_t *)(0x79))
# 8 "adctest.c"
           ;
    return (h<<8) | l;
}

int main() {
    volatile uint8_t x;
    x = aRead()^42;
}

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

The access to ADCH is already gone from the .s.  I only checked with -O2.  The .s isn't human readable with -flto.

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

westfw wrote:
Note that the read of ADCH (0x0079) isn't there.

Doesn't volatile only mean that a read must be performed immediately before using the result I.e. don't use the result of any previous read.

So to omit the read because you don't use the result I think that's perfectly valid.

 

I also think the separate register reads and combiing them are suspect, that's why ADC exists as a pseudo 16-bit register. If used, the compiler will always emit the correct sequence to read the ADC.

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Presumably it's the actual mechanics of LINK TIME optimization? The fact that ADCH was volatile must not have been conveyed all the way down the line to the point where the link optimizer plugin executes?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

There is definitely something fishy going on here. If you use I/O registers that are in range of in/out, they don't get discarded, for example:

#include <avr/io.h>
int aRead() {
	uint8_t h, l;
	l = TCNT0;
	h = ADCH;
	return (h<<8) | l;
}
int main() {
	volatile uint8_t x;
	x = aRead() ^ 42;
}

 

yields:

 

0000007a <main>:
  7a:	cf 93       	push	r28
  7c:	df 93       	push	r29
  7e:	1f 92       	push	r1
  80:	cd b7       	in	r28, 0x3d	; 61
  82:	de b7       	in	r29, 0x3e	; 62
  84:	86 b5       	in	r24, 0x26	; 38   *** not discarded ***
  86:	9a e2       	ldi	r25, 0x2A	; 42
  88:	89 27       	eor	r24, r25
  8a:	89 83       	std	Y+1, r24	; 0x01
  8c:	80 e0       	ldi	r24, 0x00	; 0
  8e:	90 e0       	ldi	r25, 0x00	; 0
  90:	0f 90       	pop	r0
  92:	df 91       	pop	r29
  94:	cf 91       	pop	r28
  96:	08 95       	ret

 

Only registers using load/store instructions are discarded by the optimizer. So, bug IMO. Also exists in later versions of avr-gcc (tested with 8.0.1).

 

 

edit: oops, yes they do, I'm just not seeing rightblush

Last Edited: Tue. Nov 27, 2018 - 11:01 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

LTO is probably one of the least visited areas of avr-gcc so I'm not entirely surprised if it hasn't got the test coverage of the other tools in GCC.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Yes,   it does not read ADCH.   But it does not need to.   I get this with -O3

    l = ADCL;
00000045  LDS R24,0x0078		Load direct from data space
    x = aRead() ^ 0x2A;   //42
00000047  LDI R25,0x2A		Load immediate
00000048  EOR R24,R25		Exclusive OR
00000049  STD Y+1,R24		Store indirect with displacement

and if I use the more conventional:

int wRead()
{
    return ADCW;
}

int main()
{
    volatile uint8_t x;
    x = wRead() ^ 0x2A;    //42
}

I get this:

    return ADCW;
00000045  LDS R18,0x0078		Load direct from data space
00000047  LDS R19,0x0079		Load direct from data space
    x = wRead() ^ 0x2A;     //42
00000049  LDI R24,0x2A		Load immediate
0000004A  EOR R24,R18		Exclusive OR
0000004B  STD Y+1,R24		Store indirect with displacement

Your version takes 6 machine cycles.   The conventional takes 8 cycles.

You can read a 16-bit register as many times as you like.   Each read of ADCL is a fresh value of ADCW.   It simply saves the hi-byte in a secret register.  ADCH only reads the historic value from the secret register.    It does not read the current hi-byte.

 

If you read ADCH without reading ADCL first,  the hardware reads the fresh value of ADCW.

 

Hey-ho.    I doubt if you would ever notice the difference in real life for the ADC.    Yes,   it would be useful when reading a Timer to save a couple of cycles.     But it is nicer when the 16-bit register is in IN/OUT space so you don't need the LDS instructions.

 

Yes,  I originally altered the XOR value.   It was easier on my brain when running through the Simulator.

 

It seems that GCC is cleverer than I ever thought.

 

David.

 

Edit.    Regarding incorrect volatile access.     You have requested to read an 8-bit register.   GCC "knows" about the 16-bit behaviour and that the optimisation is safe.    If you had requested 16-bit register e.g. ADCW,  GCC would have read ADCL, ADCH in the conventional way.

If you want to force foreign hardware to work in a particular way,   you specify volatile access registers e.g. memory-mapped peripherals on an external address bus.

Last Edited: Tue. Nov 27, 2018 - 11:30 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

david.prentice wrote:
You can read a 16-bit register as many times as you like.   Each read of ADCL is a fresh value of ADCW.   It simply saves the hi-byte in a secret register.  ADCH only reads the secret register.    It does not read the current hi-byte.

 

This is normally true, but not for the specific case of the ADC. Remember what the OP said:

westfw wrote:
If you read ADCL, you essentially shut down the ADC until ADCH has been read as well.

 

This is because the ADC uses a different mechanism for 16 bit access. Probably seemed like a good idea at the time, I guess...

From the datasheet:

When ADCL is read, the ADC Data Register is not updated until ADCH is read. Consequently, if the result is left adjusted and no more than 8-bit precision is required, it is sufficient to read ADCH. Otherwise, ADCL must be read first, then ADCH.

So in this case, it's reading the high byte that updates the register, that's the opposite of the 16 bit timer, for example. 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
Each read of ADCL is a fresh value of ADCW. It simply saves the hi-byte in a secret register. ADCH only reads the historic value from the secret register. It does not read the current hi-byte.

That is not what the data sheet says...

And it doesn’t matter if it “needs to”; I think C definitions say that it MUST!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Oops.    I take your point.    I still think it is easier to just read ADCW in the first place.     It is generally wise to let the Compiler do the optimising.

 

(and I should have read the link in the original post)

 

David.

 

Edit.   I have now read the link.   And it looks as if their conclusion was the same as mine i.e. use ADCW

Last Edited: Tue. Nov 27, 2018 - 11:44 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

westfw wrote:
And it doesn’t matter if it “needs to”; I think C definitions say that it MUST!
Not necessarily.  The standard has this to say:

5.1.2.3 Program execution

 

1 The semantic descriptions in this International Standard describe the behavior of an
abstract machine in which issues of optimization are irrelevant.

 

2 Accessing a volatile object, modifying an object, modifying a file, or calling a function
that does any of those operations are all side effects, 11) which are changes in the state of
the execution environment. Evaluation of an expression may produce side effects. At
certain specified points in the execution sequence called sequence points, all side effects
of previous evaluations shall be complete and no side effects of subsequent evaluations
shall have taken place. (A summary of the sequence points is given in annex C.)

 

3 In the abstract machine, all expressions are evaluated as specified by the semantics. An
actual implementation need not evaluate part of an expression if it can deduce that its
value is not used and that no needed side effects are produced
(including any caused by
calling a function or accessing a volatile object).

It is clear to me how in your example the compiler can conclude that the value of ADCH is not needed.  It is not clear to me how it can conclude that there are no needed side effects when accessing it.

 

In any event, it seems as though it may be correct to omit the access.  In fact, failure to omit the access with 'other equally-"doesn't affect the high byte" operations, like "|" ' may in fact be a missed optimisation.

 

The standard also has this to say in 6.7.3:

6 An object that has volatile-qualified type may be modified in ways unknown to the
implementation or have other unknown side effects. Therefore any expression referring
to such an object shall be evaluated strictly according to the rules of the abstract machine,
as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the
object shall agree with that prescribed by the abstract machine, except as modified by the
unknown factors mentioned previously. 116) What constitutes an access to an object that
has volatile-qualified type is implementation-defined.

 

116) A volatile declaration may be used to describe an object corresponding to a memory-mapped
input/output port or an object accessed by an asynchronously interrupting function. Actions on
objects so declared shall not be ‘‘optimized out’’ by an implementation or reordered except as
permitted by the rules for evaluating expressions.

At any rate, the 'solution' (or at least, 'workaround') is to use ADCW as AVR Libc intended.

 

I am not especially confident in this analysis.

 

Edit: formatting

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

Last Edited: Tue. Nov 27, 2018 - 09:30 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I wonder what you get if you compile:

int main(void) {
    if (0) {
        PORTB = 0x55;
    }
}

On the one hand you'd say PORTB was volatile so there must be an access but on the other we can all see that line will never be reached.

 

This just seems to be a simpler case of the same thing.

 

(I don't actually know the answer so off to try it....)

 

EDIT: not entirely surprised to get...

00000034 <main>:

int main(void) {
    if (0) {
        PORTB = 0x55;
    }
}
  34:	80 e0       	ldi	r24, 0x00	; 0
  36:	90 e0       	ldi	r25, 0x00	; 0
  38:	08 95       	ret

so it didn't bother to access the thing that was pointless. To be honest, right or wrong, I think I prefer it to behave like this - why waste time/space for an access that ultimately serves no purpose?

 

(yeah, yeah, I know about the "read it to clear it" kind of 'volatile' thing!)

Last Edited: Tue. Nov 27, 2018 - 03:33 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I would be fairly certain that the oldest compiler in the world would eliminate an if (0) block.

 

I was surprised by this thread.    It appears that GCC is being too clever.

If ADCW behaves differently to TCNT1W then it should be identified as such.

 

From the 328P header file:

#define ADCW    _SFR_MEM16(0x78)
#define TCNT1 _SFR_MEM16(0x84)

I was intrigued by the behaviour of USIBR and USIDR in a recent thread https://www.avrfreaks.net/forum/...

Admittedly that example was pointless but surprising.

 

westfw's example came up in real life.

 

David.

Last Edited: Tue. Nov 27, 2018 - 04:22 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

clawson wrote:
why waste time/space for an access that ultimately serves no purpose?

Well, yeah -- you example is in if (0), code that can never be reached.

 

It is not as common as when I was your age, but "memory-mapped peripherals" might have seemingly pointless accesses and the external hardware "oh; an access to 0x1234 which means to reset myself".  It would look pointless in the code.  Is that covered by that clause in the standards.

 

Y'all need to try similar, e.g. PINB; or PORTB; statements.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

theusch wrote:
  It would look pointless in the code.
Hence my:

clawson wrote:
yeah, yeah, I know about the "read it to clear it" kind of 'volatile' thing!
;-)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Not a GCC user myself but how does it know to treat on-chip registers as volatile?

#1 Hardware Problem? https://www.avrfreaks.net/forum/...

#2 Hardware Problem? Read AVR042.

#3 All grounds are not created equal

#4 Have you proved your chip is running at xxMHz?

#5 "If you think you need floating point to solve the problem then you don't understand the problem. If you really do need floating point then you have a problem you do not understand."

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Brian Fairchild wrote:
Not a GCC user myself but how does it know to treat on-chip registers as volatile?

 

<avr/iom328p.h>:

#define PINB _SFR_IO8(0x03)

 

<avr/sfr_defs.h>:

#define _MMIO_BYTE(mem_addr) (*(volatile uint8_t *)(mem_addr))
.
.
.
#ifndef __SFR_OFFSET
#  if __AVR_ARCH__ >= 100
#    define __SFR_OFFSET 0x00
#  else
#    define __SFR_OFFSET 0x20
#  endif
#endif
.
.
.
#define _SFR_IO8(io_addr) _MMIO_BYTE((io_addr) + __SFR_OFFSET)

 

Ergo, PINB expands to:

(*(volatile uint8_t *)((0x03) + 0x20)

 

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Thanks.

#1 Hardware Problem? https://www.avrfreaks.net/forum/...

#2 Hardware Problem? Read AVR042.

#3 All grounds are not created equal

#4 Have you proved your chip is running at xxMHz?

#5 "If you think you need floating point to solve the problem then you don't understand the problem. If you really do need floating point then you have a problem you do not understand."

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I'm thinking that this is somehow because the options are saying

an int is only 8 bits.  Then when you shift it left by 8, it always

becomes zero, so why bother.

 

--Mike

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).

  :

What constitutes an access to an object that has volatile-qualified type is implementation-defined.

I *was* pretty sure that the avr-gcc "implementation definition" included an assumption that "all read accesses to volatile variable have side effects."  That is how the code usually behaves.  For example, if I read ADCL twice:

int aRead() {
    uint8_t h,l;
    l = ADCL;
    l = ADCL;
    h = ADCH;
    return (h<<8) | l;
}

Then the non-inline version of aRead() does do two reads (even when inlined.)

(and without the XOR: just "x = aRead();" it access both registers, even though the ADCH value is equally unused.)

 

    if (0) {
        PORTB = 0x55;
    }

Surely there is a difference between "this code can never be reached" and "this result is not used"?

 

Last Edited: Wed. Nov 28, 2018 - 01:07 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

westfw wrote:
I *was* pretty sure that the avr-gcc "implementation definition" included an assumption that "all read accesses to volatile variable have side effects." That is how the code usually behaves.
Agree that this is the behaviour most often observed, but is there a GCC document which states this?

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

What happens if you alter the code like this:

uint16_t aRead() {
    uint16_t h;
    uint8_t l;
    l = ADCL;
    h = ADCH;
    return (h<<8) | l;
}

 

--Mike

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0
uint8_t aRead() {
    uint8_t h,l;
    l = ADCL;
    h = ADCH;
    return l;
}

does both reads...

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Not relevant to this thread, but interesting reading nonetheless:

https://gcc.gnu.org/onlinedocs/gcc/Volatiles.html

... particularly the bit about memory barriers and ordering.

 

There's no suggestion either way w.r.t. side effects.

 

This bit:

As bit-fields are not individually addressable, volatile bit-fields may be implicitly read when written to, or when adjacent bit-fields are accessed. Bit-field operations may be optimized such that adjacent bit-fields are only partially accessed, if they straddle a storage unit boundary. For these reasons it is unwise to use volatile bit-fields to access hardware.

Brings to mind clawson's alternative to <avr/io.h>...

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

What happens if you alter the code like this:

uint16_t aRead() {
    uint16_t h;

Same (incorrect) behavior.

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Try:

    volatile uint16_t x;
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

The problem is not present in 3.5.4.

Really?  I see it with "avr-gcc (AVR_8_bit_GNU_Toolchain_3.5.3_460) 4.9.2"

 

Not with gcc 4.8.1, though...

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

For these reasons it is unwise to use volatile bit-fields to access hardware.

Depending on the nature of the "volatile", of course.   This has shown up and caused unexpected problems on the SAM chips, which make extensive use of structures and bitfields mapped onto hardware.  https://community.atmel.com/comment/2247561#comment-2247561   (Probably mega-0 and xtiny, too...)

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

(still broken in the avr-gcc 8.1 that someone made available a while ago.)

 

So, I figured - "what the H* - I don't need to actually know anything about how gcc works to try to get more info, right?"

 

avr-gcc -mmcu=atmega328p -O3 -g adctest.c -c -fdump-rtl-all

Produces a bunch of mysterious files, having some relationship to what the code is supposed to do.

Up through adctest.c.221r.ud_dce, there are references to reading both volatile registers in main:

Dataflow summary:
;;  invalidated by call      0 [r0] 1 [r1] 18 [r18] 19 [r19] 20 [r20] 21 [r21] 22 [r22] 23 [r23] 24 [r24] 25 [r25] 26 [r26] 27 [r27] 30 [r30] 31 [r31] 33 [__SP_H__] 35 [argH]
;;  hardware regs used      28 [r28] 32 [__SP_L__] 34 [argL]
;;  regular block artificial uses      28 [r28] 32 [__SP_L__] 34 [argL]
;;  eh block artificial uses      28 [r28] 32 [__SP_L__] 34 [argL]
;;  entry block defs      2 [r2] 8 [r8] 9 [r9] 10 [r10] 11 [r11] 12 [r12] 13 [r13] 14 [r14] 15 [r15] 16 [r16] 17 [r17] 18 [r18] 19 [r19] 20 [r20] 21 [r21] 22 [r22] 23 [r23] 24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__] 34 [argL]
;;  exit block uses      24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__]
;;  regs ever live      24[r24] 25[r25] 29[r29]
;;  ref usage     r2={1d} r8={1d} r9={1d} r10={1d} r11={1d} r12={1d} r13={1d} r14={1d} r15={1d} r16={1d} r17={1d} r18={1d} r19={1d} r20={1d} r21={1d} r22={1d} r23={1d} r24={2d,2u} r25={2d,2u} r28={1d,3u} r29={1u} r32={1d,2u} r34={1d,1u} r43={1d,1u} r44={1d,2u} r45={1d,1u} r51={1d,1u} r52={1d,1u} r53={1d,1u} r54={1d,1u} r55={1d,1u} r56={1d,1u,1e} r57={1d,1u}
;;    total ref usage 57{34d,22u,1e} in 16{16 regular + 0 call} insns.
(note 3 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 3 5 2 NOTE_INSN_FUNCTION_BEG)
(insn 5 2 6 2 (set (reg/f:HI 51)
        (const_int 120 [0x78])) adctest.c:5 83 {*movhi}
     (nil))
(insn 6 5 7 2 (set (reg/v:QI 44 [ l ])
        (mem/v:QI (reg/f:HI 51) [0 MEM[(volatile uint8_t *)120B]+0 S1 A8])) adctest.c:5 71 {movqi_insn}
     (expr_list:REG_DEAD (reg/f:HI 51)
        (nil)))
(debug_insn 7 6 8 2 (var_location:QI l (reg/v:QI 44 [ l ])) adctest.c:5 -1
     (nil))
(insn 8 7 9 2 (set (reg/f:HI 52)
        (const_int 121 [0x79])) adctest.c:6 83 {*movhi}
     (nil))
(insn 9 8 11 2 (set (reg/v:QI 45 [ h ])
        (mem/v:QI (reg/f:HI 52) [0 MEM[(volatile uint8_t *)121B]+0 S1 A8])) adctest.c:6 71 {movqi_insn}
     (expr_list:REG_DEAD (reg/f:HI 52)
        (nil)))
(debug_insn 11 9 12 2 (var_location:QI l (clobber (const_int 0 [0]))) adctest.c:12 -1
     (nil))

 

In the next (timewise) file, adctest.c.222r.combine, the 2nd volatile reference is gone.   That means that sometime during this "combine" optimization step is when the compiler is deciding to delete the reference.   Does that help anyone understand what's going on?  It doesn't help me much...

 

main

Dataflow summary:
;;  invalidated by call      0 [r0] 1 [r1] 18 [r18] 19 [r19] 20 [r20] 21 [r21] 22 [r22] 23 [r23] 24 [r24] 25 [r25] 26 [r26] 27 [r27] 30 [r30] 31 [r31] 33 [__SP_H__] 35 [argH]
;;  hardware regs used      28 [r28] 32 [__SP_L__] 34 [argL]
;;  regular block artificial uses      28 [r28] 32 [__SP_L__] 34 [argL]
;;  eh block artificial uses      28 [r28] 32 [__SP_L__] 34 [argL]
;;  entry block defs      2 [r2] 8 [r8] 9 [r9] 10 [r10] 11 [r11] 12 [r12] 13 [r13] 14 [r14] 15 [r15] 16 [r16] 17 [r17] 18 [r18] 19 [r19] 20 [r20] 21 [r21] 22 [r22] 23 [r23] 24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__] 34 [argL]
;;  exit block uses      24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__]
;;  regs ever live      24[r24] 25[r25] 29[r29]
;;  ref usage     r2={1d} r8={1d} r9={1d} r10={1d} r11={1d} r12={1d} r13={1d} r14={1d} r15={1d} r16={1d} r17={1d} r18={1d} r19={1d} r20={1d} r21={1d} r22={1d} r23={1d} r24={2d,2u} r25={2d,2u} r28={1d,3u} r29={1u} r32={1d,2u} r34={1d,1u} r43={1d,1u} r44={1d,2u} r51={1d,1u} r57={1d,1u}
;;    total ref usage 44{28d,16u,0e} in 10{10 regular + 0 call} insns.
(note 3 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 3 5 2 NOTE_INSN_FUNCTION_BEG)
(insn 5 2 6 2 (set (reg/f:HI 51)
        (const_int 120 [0x78])) adctest.c:5 83 {*movhi}
     (nil))
(insn 6 5 7 2 (set (reg/v:QI 44 [ l ])
        (mem/v:QI (reg/f:HI 51) [0 MEM[(volatile uint8_t *)120B]+0 S1 A8])) adctest.c:5 71 {movqi_insn}
     (expr_list:REG_DEAD (reg/f:HI 51)
        (nil)))
(debug_insn 7 6 8 2 (var_location:QI l (reg/v:QI 44 [ l ])) adctest.c:5 -1
     (nil))
(note 8 7 9 2 NOTE_INSN_DELETED)
(note 9 8 11 2 NOTE_INSN_DELETED)
(debug_insn 11 9 12 2 (var_location:QI l (clobber (const_int 0 [0]))) adctest.c:12 -1
     (nil))
(debug_insn 12 11 13 2 (var_location:QI h (clobber (const_int 0 [0]))) adctest.c:12 -1
     (nil))
(note 13 12 14 2 NOTE_INSN_DELETED)
(note 14 13 15 2 NOTE_INSN_DELETED)
(note 15 14 16 2 NOTE_INSN_DELETED)
(note 16 15 17 2 NOTE_INSN_DELETED)
(insn 17 16 18 2 (set (reg:QI 57)
        (const_int 42 [0x2a])) adctest.c:12 71 {movqi_insn}
     (nil))
(insn 18 17 19 2 (set (reg:QI 43 [ D.1635 ])
        (xor:QI (reg:QI 57)
            (reg/v:QI 44 [ l ]))) adctest.c:12 270 {xorqi3}
     (expr_list:REG_DEAD (reg/v:QI 44 [ l ])
        (expr_list:REG_DEAD (reg:QI 57)
            (nil))))
(insn 19 18 24 2 (set (mem/v/c:QI (plus:HI (reg/f:HI 28 r28)
                (const_int 1 [0x1])) [0 x+0 S1 A8])
        (reg:QI 43 [ D.1635 ])) adctest.c:12 71 {movqi_insn}
     (expr_list:REG_DEAD (reg:QI 43 [ D.1635 ])
        (expr_list:REG_DEAD (reg:QI 29 r29)
            (nil))))
(insn 24 19 25 2 (set (reg/i:HI 24 r24)
        (const_int 0 [0])) adctest.c:13 83 {*movhi}
     (nil))
(insn 25 24 0 2 (use (reg/i:HI 24 r24)) adctest.c:13 -1
     (nil))

 

 

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

westfw wrote:
Does that help anyone understand what's going on? 
If you want to know the answer you are going to have to locate and ask Georg-Johann Ley ("SprinterSB")

 

PS one way to do that (indirectly) is to raise the issue in the GCC Bugzilla.

Last Edited: Wed. Nov 28, 2018 - 09:38 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

raise the issue in the GCC Bugzilla.

Done.  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88253

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

The combine pass tries to merge multiple insns by substituting operands with the RTL pattern that set them. If you pass -fdump-rtl-all-raw-details and inspect .combine, you'll find something like

 

Trying 8, 12 -> 14:
Successfully matched this instruction:
(set (reg:QI 44 [ D.1630 ])
    (xor:QI (reg:QI 58)
        (subreg:QI (reg:HI 55 [ D.1631 ]) 0)))
allowing combination of insns 8, 12 and 14
original costs 4 + 20 + 4 = 28
replacement cost 4
deferring deletion of insn with uid = 12.
deferring deletion of insn with uid = 8.

 

where insn 8 is the load from 121B. This is why it goes missing later.

 

If the exclusive or is replaced with a plain bitwise or, a previous combine attempt with just two insns succeeds, and therefore the load insn remains untouched.

 

Trying 12 -> 13:
Successfully matched this instruction:
(set (reg:QI 44 [ D.1630 ])
    (ior:QI (subreg:QI (reg:HI 55 [ D.1631 ]) 0)
        (const_int 42 [0x2a])))
allowing combination of insns 12 and 13
original costs 20 + 4 = 24
replacement cost 4
deferring deletion of insn with uid = 12.
modifying insn i3    13: r44:QI=r55:HI#0|0x2a
      REG_DEAD r55:HI
deferring rescan insn with uid = 13.
deferring deletion of insn with uid = 7.
modifying insn i3    14: r44:QI=r58:QI^r55:HI#0
      REG_DEAD r55:HI
      REG_DEAD r58:QI
deferring rescan insn with uid = 14.

 

I haven't figured out why this match did not occur for XOR. I suspect the "register_operand" predicate for the XOR pattern, versus "nonmemory_operand" for IOR (because the ISA has ORI but no EORI) is somehow causing this, need to dig into this further.

Regards

Senthil

 

blog | website

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Oh, I wasn't aware there was another avr-gcc developer lurking around (besides SprinterSB). Coolyes

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Response from Microchip:

 

Microchip Engineering Support has added comments to support Case 00358500.

 

 **** Microchip Comments Begin ****

Hi Bill,

The issue was confirmed and reported as Bug (JIRA). I hope it will be fixed in the future release.
Thank you for bringing this issue to us.

Have a great new year,
Mahalakshmi.

 **** Microchip Comments End ******

 

I guess that's good.

 

They now want me to "Accept" or "Reject" the resolution of this case.  That's not so good.  Admitting that there is a bug is not a "resolution", at least not in my view!

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

El Tangas wrote:

Oh, I wasn't aware there was another avr-gcc developer lurking around (besides SprinterSB). Coolyes


Senthil is one of the engineers that do Atmel (now Microchip's) work on GCC.
.
BTW, It's quite enlightening to look at the identities on recent commits to AVR-LibC and the AVR specific bits of GCC.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

And, there's a patch!

https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01028.html

 

(originated via the gcc team, I guess...)

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

westfw wrote:
(originated via the gcc team, I guess...)

  • From: <SenthilKumar dot Selvaraj at microchip dot com>

AKA @saaadhu

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

And it got approved with a minor change too, yay! (https://gcc.gnu.org/ml/gcc-patch...)

Regards

Senthil

 

blog | website