Even describing it is lengthy... I was hunting it down for half the day.
In a main() running some 2k lines (don't frown, it's sort of generated) with lots of local variables (neatly enclosed in blocks), one place goes like:
    typedef struct __attribute__((__packed__)) {
      uint16_t outVector;
      uint16_t time;
      uint8_t x;
    } T_something;

    // we don't need x
    #define SIZEOF_T_something_copy (sizeof(T_something) - 1)

    {
      union {
        T_something a;
        uint8_t b[0];
      } buf;
      uint8_t i;

      for (i = 0; i < SIZEOF_T_something_copy; i++) {
        buf.b[i] = PopByteFromCommBuffer();
      }
      [...]
      if (buf.a.time > someTime) {  // only use of buf.a.time
        [...]
      }
    }
The problem is the b[0] in the union, intended as a placeholder, which I use routinely. Now, we had that discussion on how unions should not be used to convert types, but I said back then that I like to use them that way, that I routinely do, and that I never had any problem (yes, these are exactly the words that earn a slap on the hands with a keyboard). (For those who'd suggest b[1]: the problematic behaviour remains the same.)
The compiler decides that there is no way I can write to buf.a.time through buf.b, as the b[] array does not reach that far. So, just before the main loop starts, it fills a local variable with what is basically garbage, namely the not yet written buf.a.time at Y+19/Y+20 (the variable lives in the stack frame; note that the frame is around 600 bytes, so the compiler cannot reach all of it through Y+d and has to calculate the address):
    1c18 2B89       ldd r18,Y+19
    1c1a 3C89       ldd r19,Y+20
    1c1c C85D       subi r28,lo8(-552)
    1c1e DD4F       sbci r29,hi8(-552)
    1c20 3983       std Y+1,r19
    1c22 2883       st Y,r18
    1c24 C852       subi r28,lo8(552)
    1c26 D240       sbci r29,hi8(552)
then uses a different place in the stack frame for buf.b (Y+17 and on, going by the subi/sbci pair that sets up r16/r17):
    4624 8E01       movw r16,r28
    4626 0F5E       subi r16,lo8(-(17))
    4628 1F4F       sbci r17,hi8(-(17))
                .L1090:
    4638 0E94 0000  call PopByteFromCommBuffer
    463c D801       movw r26,r16
    463e 8D93       st X+,r24
    4640 8D01       movw r16,r26
    4642 CA5C       subi r28,lo8(-566)
    4644 DD4F       sbci r29,hi8(-566)
    4646 E881       ld r30,Y
    4648 F981       ldd r31,Y+1
    464a C653       subi r28,lo8(566)
    464c D240       sbci r29,hi8(566)
    464e AE17       cp r26,r30
    4650 BF07       cpc r27,r31
    4652 01F4       brne .L1090
[at Y+566, there is a pre-calculated end pointer for this loop, another "optimisation feature"],
and then, at the place where buf.a.time is used, it merrily picks up the copy from the stack frame (which is never changed):
    469e C85D       subi r28,lo8(-552)
    46a0 DD4F       sbci r29,hi8(-552)
    46a2 2881       ld r18,Y
    46a4 3981       ldd r19,Y+1
    46a6 C852       subi r28,lo8(552)
    46a8 D240       sbci r29,hi8(552)
    46aa 8217       cp r24,r18
    46ac 9307       cpc r25,r19
    46ae 00F4       brsh .L1098
Declaring the byte array in the union as b[sizeof(T_something)] solves the "problem"; buf.a.time is then read directly from Y+19/Y+20 at the point of use:
    468c 2B89       ldd r18,Y+19
    468e 3C89       ldd r19,Y+20
    4690 838D       ldd r24,Z+27
    4692 948D       ldd r25,Z+28
    4694 8217       cp r24,r18
    4696 9307       cpc r25,r19
    4698 00F4       brsh .L1097
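For anyone who wants to try this outside the 2k-line main(), here is a minimal sketch of the working variant, assuming GCC (for the packed attribute) and a little-endian target such as the AVR. The fake comm buffer, its contents, and the names ReadTimeViaUnion and ReadTimeViaBytePointer are made up for illustration; only T_something, SIZEOF_T_something_copy and the name PopByteFromCommBuffer come from the code above. The second function is an alternative that is not from the original code: it skips the union and fills the struct through an unsigned char pointer, which is allowed to alias any object.

```c
#include <assert.h>
#include <stdint.h>

typedef struct __attribute__((__packed__)) {
  uint16_t outVector;
  uint16_t time;
  uint8_t x;
} T_something;

/* we don't need x */
#define SIZEOF_T_something_copy (sizeof(T_something) - 1)

/* Hypothetical stand-in for the real PopByteFromCommBuffer():
   it just cycles through four fixed bytes. */
static const uint8_t fakeCommBuffer[4] = {0x34, 0x12, 0x78, 0x56};
static uint8_t fakeCommIndex = 0;

static uint8_t PopByteFromCommBuffer(void) {
  return fakeCommBuffer[fakeCommIndex++ % sizeof fakeCommBuffer];
}

/* Variant from the post: the byte array covers the whole struct,
   so the compiler has to assume that writes to b[] can reach a.time. */
static uint16_t ReadTimeViaUnion(void) {
  union {
    T_something a;
    uint8_t b[sizeof(T_something)];
  } buf;
  uint8_t i;

  assert(sizeof buf.b == sizeof buf.a); /* packed struct: 5 bytes, no padding */
  for (i = 0; i < SIZEOF_T_something_copy; i++) {
    buf.b[i] = PopByteFromCommBuffer();
  }
  return buf.a.time; /* x is never written and never read */
}

/* Alternative (an assumption, not the original code): no union at all,
   the struct is filled through an unsigned char pointer. */
static uint16_t ReadTimeViaBytePointer(void) {
  T_something a;
  unsigned char *p = (unsigned char *)&a;
  uint8_t i;

  for (i = 0; i < SIZEOF_T_something_copy; i++) {
    p[i] = PopByteFromCommBuffer();
  }
  return a.time;
}
```

With the fake bytes above, both functions should hand back the third and fourth byte as a little-endian 16-bit value. Whether either variant produces nicer code than the other on the AVR is something I have not checked.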
JW