sprintf_P never returns unless the next line is repeated!


Hi. I have a very odd and frustrating bug. I had some working code which I was doing a bit of reorganization on (in an unrelated area) and after a full re-build it locks up.

Here is my code for XMEGA:

char 	temp_string[160];

void TERM_print_date_time(RTC_DATA *time)
{
	uint8_t 	y, m, d;

	RTC_days_offset_to_ymd(time->days, &d, &m, &y);
	sprintf_P(temp_string, PSTR("%02u/%02u/%02u %02u:%02u:%02u"), d, m, y, time->hours, time->minutes, time->seconds);
	LED_PORT.OUTCLR = LED_RED_PIN_bm;
	TERM_tx_string_r(temp_string);
}

The LED never comes on. If I put the LED setting line just before sprintf_P the LED comes on as expected. sprintf_P is not returning for some reason.

Now it gets really weird. The following works:

char 	temp_string[160];

void TERM_print_date_time(RTC_DATA *time)
{
	uint8_t 	y, m, d;

	RTC_days_offset_to_ymd(time->days, &d, &m, &y);
	sprintf_P(temp_string, PSTR("%02u/%02u/%02u %02u:%02u:%02u"), d, m, y, time->hours, time->minutes, time->seconds);
	LED_PORT.OUTCLR = LED_RED_PIN_bm;
	TERM_tx_string_r(temp_string);
	TERM_tx_string_r(temp_string);
}

The only difference is that the final statement is repeated. If I print temp_string twice, sprintf_P returns fine. In fact I can put other function calls in there and it will work, but not things like nop() or other non-call statements.

What... the... hell?

I can't see any reason why it wouldn't work. Interrupts disabled, over 5000 bytes of stack available, temp_string is easily long enough...
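The buffer-size claim can be checked with a little arithmetic. The following is a minimal host-side sketch, assuming every printed field is an 8-bit value (the layout of RTC_DATA is not shown in the post, so that is an assumption). The helper name date_time_worst_case_len is made up for illustration.

```c
#include <stdio.h>
#include <stdint.h>

/* Worst-case length (excluding the terminating NUL) of the date/time
 * string, assuming every field is an 8-bit value. UINT8_MAX (255) is
 * the widest number such a field can print. */
static int date_time_worst_case_len(void)
{
    unsigned v = UINT8_MAX;

    /* With a NULL buffer and size 0, snprintf only counts characters (C99). */
    return snprintf(NULL, 0, "%02u/%02u/%02u %02u:%02u:%02u",
                    v, v, v, v, v, v);
}
```

Six fields of at most three digits plus five separators give 23 characters, so 24 bytes including the NUL, well inside the 160-byte temp_string.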

Anyone seen this before? It's driving me nuts.


.lss for the two?


Quote:

The only difference is that the final statement is repeated twice. If I print temp_string twice sprintf_P returns fine.

So, the sprintf_P can "look into the future" and see that there will be two prints executed? I think not. You are seeing ghosts.

Back off, and try to take a different perspective on things.

E.g. are you calling TERM_print_date_time more than once?

As of January 15, 2018, Site fix-up work has begun! Now do your part and report any bugs or deficiencies here

No guarantees, but if we don't report problems they won't get much of  a chance to be fixed! Details/discussions at link given just above.

 

"Some questions have no answers."[C Baird] "There comes a point where the spoon-feeding has to stop and the independent thinking has to start." [C Lawson] "There are always ways to disagree, without being disagreeable."[E Weddington] "Words represent concepts. Use the wrong words, communicate the wrong concept." [J Morin] "Persistence only goes so far if you set yourself up for failure." [Kartman]


I am only calling TERM_print_date_time once, and it does not matter where I call it from.

I know exactly what you mean about sprintf_P being affected by things that come later. I decided to look at the assembler code but can't see anything wrong with it. The only differences between the two versions are the extra call and different addresses.


How about if you try using a simulator or better yet a debugger? Something like sprintf_P is not reliant on external events so is a prime candidate for simulation.


The code needed a fair bit of hacking to run in the simulator, but now it does. And all of a sudden it works...

This is what I experienced last time. I changed something, did a full clean/rebuild and it stopped. There must be something that causes the compiler or linker to produce code that is broken somehow.

I noticed that when it failed in the simulator the first few times it jumped off to unset addresses containing invalid op-codes. I have seen the micro go into a restart loop because of that, but it doesn't always do it. Unfortunately I can't debug into sprintf_P because the debugger doesn't seem to know where the source is and asks me for a file.


Quote:

There must be something that causes the compiler or linker to produce code that is broken somehow.

I think you'll find he's sat in a chair in front of your computer ;-)
Quote:

Unfortunately I can't debug into sprintf_P because the debugger doesn't seem to know where the source is and asks me for a file.

Yes but you can step the asm? But the first key question is: if you attempt a step-over rather than a step-into, does it come back? Your LED experiment would seem to suggest it would not. If that's the case then just follow it in using disassembly and, if necessary, refer to the source code here:

http://svn.savannah.nongnu.org/v...


clawson wrote:

Yes but you can step the asm?

No. The debugger won't let me; when I click the icon nothing happens. It seems to be a bit confused somehow, which is probably because the build system somehow managed to generate code that is inconsistent with the source.

I have seen it before. Code compiles apparently without issue but fails to run properly. You do a clean/rebuild from scratch and the compiler notices various issues. You correct them and the problem goes away. Something in the build system must fail to notice when certain files need recompiling.

This is the first time that I have seen it happen after a full rebuild though.

If/when it happens again I will spend some more time trying to get the debugger to let me step through it instruction by instruction in the disassembly view.


Quote:

which is probably because the build system somehow managed to generate code that is inconsistent with the source.

Are you using -Wl,-relax by any chance?

BTW I just built this:

#include <avr/io.h>
#include <avr/pgmspace.h>
#include <stdio.h>

char buffer[20];

int main(void) {
	sprintf_P(buffer, PSTR("Hello %u"), 123);
}

When I start to debug then try to single step it throws up a file selector asking me to tell it where the source of sprintf_P() is but if I simply [Cancel] it switches to a disassembly view of:

--- C:\home\tools\hudson\workspace\avr8-gnu-toolchain\src\avr-libc\libc\stdio\sprintf_p.c 

00000071 ae.e0                LDI R26,0x0E		Load immediate 
00000072 b0.e0                LDI R27,0x00		Load immediate 
00000073 e7.e7                LDI R30,0x77		Load immediate 
00000074 f0.e0                LDI R31,0x00		Load immediate 
00000075 0c.94.1f.03          JMP 0x0000031F		Jump 
etc.

Does this not happen for you?


Hi clawson. I will have to continue this debugging on Monday because I finished work for the week :)

I do get the assembler listing, but when I try to step the debugger clicking the icon does not do anything. Normally it works.


Quote:
There must be something that causes the compiler or linker to produce code that is broken somehow.

I try to be humble, but this time I'll go the boasty road: I have taught programming for 10+ years at uni (a long time ago, but still..). I can not count the times when I had students saying "it must be a compiler bug", and in all cases it turned out that it was the usual "pilot error".

It is not uncommon, especially for a beginner, to not understand why problems suddenly pop up and then vanish without a trace and totally inexplicably. This is not an indication of a faulty compiler, but that you have a challenging but exciting learning experience ahead. :D

The absolutely best favour you can do yourself is to write code in small chunks, change only one thing at a time and test three times more than you think is necessary. Always write code so that you can test a small chunk of code independently of the rest of the program. NEVER write code that can not be tested until the whole application is complete - this is a definitive recipe for disastrous failure.

For reference, I will show you one of my first compiler bugs when I started off with C. It kept me puzzled for a whole day. I was already an experienced programmer, e.g. in Pascal (which also explains why I coded this snippet wrongly in the first place). Roughly:

void foo()
{
   printf("In foo\n");
}

int main(void)
{
   printf("Before foo\n");
   foo;
   printf("After foo\n");
   return 0;
}

The compiler bug showed itself by the fact that foo was clearly not called as I never saw the "In foo" output while the other two promptly appeared.
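For readers who have not hit this one: in C a bare function name is just an expression that is evaluated and thrown away, whereas Pascal calls a parameterless procedure by name alone. A minimal sketch of the difference, with a call counter added purely for illustration (foo_calls and the two helper functions are not part of the original snippet):

```c
#include <stdio.h>

static int foo_calls = 0;   /* counts how often foo() actually runs */

static void foo(void)
{
    foo_calls++;
    printf("In foo\n");
}

/* Names the function but never calls it; most compilers warn
 * "statement with no effect" when warnings are enabled. */
static int calls_after_bare_name(void)
{
    foo_calls = 0;
    foo;
    return foo_calls;
}

/* The parentheses are what make it a call. */
static int calls_after_real_call(void)
{
    foo_calls = 0;
    foo();
    return foo_calls;
}
```

Building with warnings enabled (for example -Wall on GCC) flags the bare `foo;` line, which is usually the quickest way to catch this.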

I hope you will not sink as low as I did that day.. :wink:



JohanEkdahl wrote:
I try to be humble, but this time I'll go the boasty road: I have taught programming for 10+ years at uni (a long time ago, but still..). I can not count the times when I had students saying "it must be a compiler bug", and in all cases it turned out that it was the usual "pilot error".

Well, it isn't exactly a problem with the compiler per se; it is a build issue. Sometimes files that need to be rebuilt are not. Normally if, say, a header file changes then all the .c files that include it are recompiled, but every now and then it doesn't work. A full rebuild of the project fixes it.

So in that sense the build system produces an executable that is not consistent with the source code. This is the first time I have ever had a full rebuild fail to work though.

I wish it was an open source project so I could just post it and let others see, but it is proprietary.


Okay, I had another look today and predictably am now unable to recreate the problem. If it happens again though I will report back.

Fortunately I kept listings files from versions that didn't work, and they show the exact problem:

    793e:	0e 94 ff 93 	call	0x127fe	; 0x127fe 

...

00012692 :

So somehow the output of the compiler+linker managed to jump to a seemingly random address beyond the end of the program code. Eventually I did something that caused the build system to fix itself. Unfortunately I have no idea what; I was just trying to debug it at the time.
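The target in that listing line can be cross-checked from the raw opcode bytes. This is a rough host-side sketch, assuming the standard 32-bit AVR CALL encoding (1001 010k kkkk 111k followed by sixteen more k bits, where k is a word address). The function name avr_call_target is made up for illustration.

```c
#include <stdint.h>

/* Decode the byte address targeted by a 32-bit AVR CALL instruction.
 * b holds the four opcode bytes in listing order (two little-endian
 * 16-bit words). Returns UINT32_MAX if the bytes are not a CALL. */
static uint32_t avr_call_target(const uint8_t b[4])
{
    uint16_t w0 = (uint16_t)(b[0] | (b[1] << 8));
    uint16_t w1 = (uint16_t)(b[2] | (b[3] << 8));
    uint32_t k;

    /* Fixed bits of CALL: 1001 010x xxxx 111x */
    if ((w0 & 0xFE0E) != 0x940E)
        return UINT32_MAX;

    k = ((uint32_t)((w0 >> 4) & 0x1F) << 17)   /* k[21:17] */
      | ((uint32_t)(w0 & 1) << 16)             /* k[16]    */
      | w1;                                    /* k[15:0]  */

    return k * 2;   /* word address to byte address */
}
```

Feeding it 0e 94 ff 93 gives word address 0x93ff, i.e. byte address 0x127fe, which matches the disassembly and sits above the 64 KB mark (0x10000) in flash.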


Okay, similar problem again and changing from -O3 to -O2 fixes it...


So this time what do the .lss files show?


The .lss files don't really provide any clues. The problem seems to be related to the position of certain functions relative to the 64 KB boundary in flash, but I'm struggling to figure out what causes it. Adding code in one place can move something else to an address that triggers this mis-addressing issue.

I also found another problem of my own making which is worth mentioning here. On XMEGA you need to make sure that NVM.CMD = 0 before using any of the pgm_read_* functions. Sometimes I would add some random code, totally unrelated to NVM, and for some reason the USB device descriptors would not be read properly until I fixed this issue. We are talking about adding a bit of arithmetic or something, so I have no idea why it would make this issue manifest. Anyway, even with it fixed the mis-addressing still happens very occasionally.
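One way to apply the NVM.CMD advice is to route flash reads through a small wrapper. This is only a sketch, not the poster's actual code: flash_read_byte and demo_table are hypothetical names, and the non-AVR branch is a stand-in added so the snippet also builds on a PC.

```c
#include <stdint.h>

#if defined(__AVR_XMEGA__)
#include <avr/io.h>
#include <avr/pgmspace.h>
#else
/* Host-side stand-ins (assumption) so the sketch builds off-target. */
static struct { volatile uint8_t CMD; } NVM;
#define PROGMEM
#define pgm_read_byte(addr) (*(const uint8_t *)(addr))
#endif

/* Hypothetical helper: clear any pending NVM command, then read flash. */
static uint8_t flash_read_byte(const uint8_t *addr)
{
    NVM.CMD = 0;   /* "no operation", the value the post says is required */
    return pgm_read_byte(addr);
}

/* Hypothetical table standing in for something like a USB descriptor. */
static const uint8_t demo_table[] PROGMEM = { 0x12, 0x01, 0x00, 0x02 };
```

A caller would then use flash_read_byte(&demo_table[i]) instead of pgm_read_byte directly. Whether it is better to clear NVM.CMD at each read or once after every NVM operation depends on the rest of the firmware.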