Help tracking down "HardwareSerial" gcc 4.5.x/mega

Last post
20 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I've been a lurker here for a while and have decided to register to get some help. There is a bug in code generation by gcc 4.5.0/4.5.1 that causes a crash/reset when class instances are declared at global scope. It affects the mega1280. People have come to see it as a bug in the HardwareSerial arduino class because that class creates several global instances of itself:

http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1250084886/all
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44617

I think I am close to narrowing down the problem and have created the following minimal code that, when compiled, will exhibit the problem. It's designed to light the LED on pin 13 if initialization works. As presented, it will fail and the LED stays off. However, change anything significant at all, such as commenting out the _u2x = u2x; constructor line will cause it to work, as will commenting out one of the Instance* declarations.

What I'm asking for is a little help from someone that can read assembly language. I've attached broken and working assembler listing files. Can anyone spot where the compiler is messing this up?

Regards,
- Andy Brown

#include 


class MyClass
{
private:
    uint8_t _rxen;
    uint8_t _txen;
    uint8_t _rxcie;
    uint8_t _udre;
    uint8_t _u2x;

public:
  MyClass(unsigned char rxen, unsigned char txen, unsigned char rxcie, unsigned char udre, unsigned char u2x)
	{
	  _rxen = rxen;
	  _txen = txen;
	  _rxcie = rxcie;
	  _udre = udre;
	  _u2x = u2x;
	}
};


MyClass Instance1(RXEN1, TXEN1, RXCIE1, UDRE1, U2X1);
MyClass Instance2(RXEN2, TXEN2, RXCIE2, UDRE2, U2X2);


int main()
{
	DDRB=0xff;
	PORTB=128;

	for(;;);
}

The compiler is gcc 4.5.1 (Windows 7 x64, mingw compiled), invocation flags:

avr-g++ -Wall -g2 -gstabs -Os -fpack-struct -fshort-enums -funsigned-char -funsigned-bitfields -fno-exceptions -mmcu=atmega1280 -DF_CPU=16000000UL -MMD -MP -MF"main.d" -MT"main.d" -c -o"main.o" "../main.cpp"

Attachment(s): 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

OK. I'm game. If you strip the disassembled object files of all the text and look at just the lines that claim to be emitted object code, it looks like the "broken" image has an unexplained "gap" between address 0x6 and address 0x10. I don't have a disassembler handy, but it LOOKS like the code is otherwise similar in length and content. But who knows what will happen if that "gap" stays uninitialized in flash.
broken.txt also has a mysterious

  98               	68,0,31,.LM1-.LFBB1

in that same spot, which would occupy the right amount of space if those are somehow "int" values. (intermediate code that wasn't translated? I dunno.)
It's a bit hard to tell what is really there and what is an artifact of the listing that gcc is producing; do you have disassembly of the final object code?

Attachment(s): 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Actually, neither looks correct. I think that the "working" one just happens to set the bit in PORTB and the other doesn't. Both listings look quite jumbled and are missing output codes in places. Both don't have the LDI R24, 0xFF, and the working does not have the first STS. The gap that westfw spotted should contain the RJMP 0 and the first LDI to initialize the member variables of the instances. It is also missing that first STS and three others.

The emitted address/opcodes should be beside the assembly that they represent. Here is the same code compiled in 4.3.3:

Attachment(s): 

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Hmmph. I tried to build gcc 4.5.1 on my Mac so I could look into this further, but was unsuccessful. Admittedly, the Mac environment is a bit weird (a native gcc and lives in /usr/bin, but fink and other downloaded packages live in /sw/... (and the target directory for the avr version is /sw/cross/...) But still...

1) Unlike the configure steps for (all) other packages like gmp, mpfr, and mpc, gcc's configure doesn't seem to figure out that it should use -m64, leading to "architecure mismatches" in the sanity testing. search web, examine logs, Add CLFAGS=-m64 to configure cmd line.

2) Building gcc/dfp.o can't find gstdint.h, even though that file exists in the apparently correct place (builddir/../libgcc/). (build the .o manually with added -I switch.)

3) libcpp is apparently missing a "configure" file? (the build doesn't find libcpp.a, and all the steps I can think of won't build it...) (still stuck on this one.)

these are errors that ought not exist :-(
(or problems with my system that ought to be caught during "configure" ?) Grr.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Finally got gcc/g++ installed. Other issue were:
1) Several other "configure" files missing. (fix: download them individually via svn browser.) (maybe I just downloaded a bad tar file?)
2) libgmpxx needed, but not checked for. (fix: configure gmp with --enable-cxx)
2) libppl too old, but not checked in configure.
3) libppl required newer libgmp (not checked by configure for gcc; not sure it should have.)
4) newer libgmp apparently doesn't compile on macs due to problems in the apple version of gcc WRT "extern inline". (workaround: patch gmp.h so that it thinks it doesn't have "extern inline")
Grr.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I have duplicated your findings with my 4.5.1 install an Mega. Attached are the "full" (elf) disassemblies, which look much better than the "assembly listing" that C seems to generate (! a bug in itself?)
However, both main() and initializer functions look fine

Quote:
The gap that westfw spotted should contain the RJMP 0 and the first LDI to initialize the member variables of the instances.

Well, no. The static class initialization is done by a function that is called BEFORE main(). You won't see it if you just disassemble the .o file...

It looks a bit like the initialization code was changed to allow the table of initialization functions to be located beyond the first 64k of flash. Perhaps this is not quite correct (though it's a bit mysterious that it would work at all, ever, if not...)

Attachment(s): 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Ah hah! The new initialization code uses R20 to hold the high 8 bits (of 24) of the table of initialization functions. However, R20 is NOT preserved by the initialization functions!! So the next time through the loops, we fetch an address from some random location of flash instead of where it should be coming from, and go off into never-never land!

I'll have to read up to decide whether this is a problem with the init functions using registers that they shouldn't, or the loop not preserving a register that needs preserved... But it's definitely wrong!

Relevant bits of code in __do_global_ctors

 	ldi	r20, hh8(__ctors_end)
	rjmp	.L__do_global_ctors_start
.L__do_global_ctors_loop:
	sbiw	r28, 2
	sbc     r20, __zero_reg__
	mov_h	r31, r29
	mov_l	r30, r28
	out     __RAMPZ__, r20
	XCALL	__tablejump_elpm__
.L__do_global_ctors_start:
	cpi	r28, lo8(__ctors_start)
	cpc	r29, r17
	cpc	r20, r16
	brne	.L__do_global_ctors_loop

And in the class setup code

  MyClass(unsigned char rxen, unsigned char txen, unsigned char rxcie, unsigned char udre, unsigned char u2x)
   {
     _rxen = rxen;
 152:	44 e0       	ldi	r20, 0x04	; 4
 154:	40 93 00 02 	sts	0x0200, r20
     _txen = txen;
 158:	33 e0       	ldi	r19, 0x03	; 3
 15a:	30 93 01 02 	sts	0x0201, r19
     _rxcie = rxcie;
 15e:	27 e0       	ldi	r18, 0x07	; 7
 160:	20 93 02 02 	sts	0x0202, r18
     _udre = udre;
 164:	95 e0       	ldi	r25, 0x05	; 5
 166:	90 93 03 02 	sts	0x0203, r25
     _u2x = u2x;
 16a:	81 e0       	ldi	r24, 0x01	; 1
 16c:	80 93 04 02 	sts	0x0204, r24
    uint8_t _u2x;
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

This patch appears to fix things. I'm not sure it's a fully correct patch, or if it's the best patch, but it works with your example case!
I'll see if I can navigate bug submition...

Attachment(s): 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

That was more fun than I've had in a while...
:-)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

westfw, you're a genius :)

Interesting that you should mention that the bug seemed to come about as the result of a change to allow initializers above 64K. Perhaps fixing this bug introduced the new one.

I'll apply your patch and see if it works for my projects. Glad you had fun tracking this one down!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

For better or worse, I've had a lot of experience tracking down bugs. "What's changed since it worked" is one of those easy questions, IF you can narrow down the pieces that are likely to have broken.

I want to be sure you realize how important your "minimalized" example was. I'm not a compiler guru, and would normally not have tried to fix an apparent compiler bug. But since this was down to the "this 400 byte program works, but this nearly identical 400 byte program doesn't work", I figured I had a reasonable chance of at least indentifying an area for the compiler folks to look at. An in the end, it wasn't exactly a compiler bug at all (assuming that my diagnosis is correct.)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Aw; someone else found the same problem about 2 weeks ago...

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Also, if you're going to post a patch to GCC, please use the -u switch when generating the patch. It makes it easier to read. ;)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I got a private email along the lines of "great, this is fixed now, right?" Lest anyone else be similarly confused:

NO, this is still an open problem.

It's now gcc bug http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45263 (which as of today, is still "unassigned" and "unconfirmed"; it has two "amateur" patches but no official attention.)

There is currently NO version of gcc that correctly supports (c++ only?) programs larger than 80k. Older versions cannot access the class initializer table at all if it its beyond the 64k limit, and newer versions access it "wrong" under some (common) circumstances...

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

And the other bad news:

I don't think I believe that the fix implemented for 29141 actually fixes all cases of the bug, even not counting that it has bad code in the piece it does implement. The fix for 29141 allows the table of constructors created by the compiler to be at a location beyond the first 64k, but the constructor function addresses IN that table seem to still be only 16 bits long. It looks like the linker only produces a 16bit table, and then the code that walks through the table hardwires a zero into the upper 8 bits for the called constructor as well. This is beyond me; since I'm not even aware of how gcc handles "far" addresses on big AVRs. (You might be able to work around this by making sure the .o files containing constructors occur in the first 64k of flash.)

The good news is that I think it ought to be possible to "fix" older versions of the compiler by using linker switches (or a linker script) to make sure that the table of constructor addresses (the .ctors and .dtors segments from .o files) is put into flash at the beginning of the flash .text segment instead of at the end. This would theoretically allow an updated Arduino IDE or Makefile to work around the bug without needing a patched compiler. (OTOH, linker scripts seem even more like Black Magic than AVR Assembler. I can't figure out how to actually do this, or even if it's really possible. And link wizards out there?)

As explanation, the pseudocode looks like this:

code_t * constructor_table[] = {serial_constructor_func, ethernet_constructor_func, ...};
init() {
   :
  function_ptr_t *entry = &constructor_table[0];
  while (entry < END_OF(constructor_table)) {
     function_ptr func = *entry;
     envokeFunction(func);
     entry += sizeof(code_t *);
  }
  :
}

So, 29141 fixed things so that function_ptr_t is now 24 bits, and fixed "envokeFunction()" to handle a 24bit pointer, but it 1) used a register that the constructors could trash, and 2) did NOT increase the size of "code_t *"

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I just duplicated this entire process and got to the same point. Anyone taken a further look at this?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
The fix for 29141 allows the table of constructors created by the compiler to be at a location beyond the first 64k, but the constructor function addresses IN that table seem to still be only 16 bits long.

I can't tell whether this is a bug or not. there are "hints" that all call destinations >64k are handled by a "trampoline" area in the low 64k, in which case things could be OK.

But I can't find any summary of just how >64k AVRs are supposed to be supported :-(

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Here is a hint: Most of the GCC developers for the AVR port don't come to AVR Freaks on a regular basis. If you are going to be involved in working on AVR GCC, then I suggest that you subscribe to the avr-gcc-list mailing list and/or the avr-libc-dev mailing list for the avr-libc project. All of the AVR GCC developers are subscribed to those mailing lists and you're likely to get better feedback than posting a thread here.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Actually the bug with clobbering R20 when calling __tablejump_elpm__ from __do_global_ctors has now been fixed in the GNU mainline v4.6.1. (BUG # 45263, New Revision: 174427, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45263) My question is, since Atmel is still shipping 4.5.1, how do I get this fix into my libgcc.a? And where do I get it? Or do I have to assemble it myself?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

Or do I have to assemble it myself?

Yes.