GCC compiles BAD code with "optimization" on... H

Go To Last Post
43 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Hello Expert GCCers...

Can you please help me determine why avr-gcc generates erroneous code for the following "super simple" program when optimization (-Os) is used?

Please Note:
(1) I am an experienced AVR developer (10+yrs) but very new to avr-gcc.
(2) I researched this problem for several days before posting my question... and I still don't understand it.
(3) I have removed all setup/initialization code not necessary to demonstrate the compile problem. I know the program would not be useful as is :)

Here is the problem code for an Xmega128a1:

#include 
#include 

/*Global Varaibles*/
/*---------------------------------------------------------*/
uint16_t UpdatePWM=0xFF;


/*Main-----------------------------------------------------*/
void main( void ){
	uint16_t MainLoopCounter;			//main loop counter
	
	//infinite loop
	while(1){
		MainLoopCounter+=1;				//increment loop counter
		
		//if we have reached 65500
		if (MainLoopCounter>=65500){
			MainLoopCounter=0;			//reset counter
			PORTE.OUT|=0x80;			//Toggle PIN E7
			UpdatePWM+=1;				//increment PWM Var
			TCC1.CCA=UpdatePWM;		//Set Compare A Value
		}
	}
}

When compiled with optimization turned on (-Os) the MainLoopCounter variable never gets incremented or compared against 65500. The PORTE.OUT register gets updated every single main loop cycle, as does the TCC1.CCA register. This was verified/determined by analyzing the assembly code in the *.lss file.

When compiled with optimization off (-O0), the code compiles functionally correct, albeit not very efficiently for speed or code size.

As small change to the code will make the compile functionally correct with or without optimazation.

Code that compiles correctly:

#include 
#include 

/*Global Varaibles*/
/*---------------------------------------------------------*/
uint16_t UpdatePWM=0xFF;


/*Main-----------------------------------------------------*/
void main( void ){
	uint16_t MainLoopCounter;			//main loop counter
	
	//infinite loop
	while(1){
		MainLoopCounter+=1;				//increment loop counter
		
		//if we have reached 65500
		if (MainLoopCounter>=65500){
			MainLoopCounter=0;			//reset counter
			PORTE.OUT|=0x80;			//Toggle PIN E7
			//UpdatePWM+=1;				//increment PWM Var
			TCC1.CCA+=1;		//Set Compare A Value
		}
	}
}

For testing, I have also declared MainLoopCounter as volatile, and that works, but it shouldn't be declared as volatile... nothing outside the scope of the main while() loop has access/visibility to it. Declaring it as volatile makes the code unnecessarily big and slow with expense calls to SRAM.

As this is my first experience with the gcc compiler, I really need to understand this problem. Am I missing something or is this a serious bug in the compiler? Any help would be GREATLY appreciated.

I am using WinAVR-20100110....

Thanks,
Jim

Attachment(s): 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

PORTE.OUT|=0x80; //Toggle PIN E7

Well, that isn't really a toggle, is it?

As such, the code as shown does nothing useful (in the "eyes" of the compiler). Hmmm--except for the .CCA increment.

Under a microscope the "volatile" makes the nearly-null program bigger/slower. But in a full app there would be other references, etc.

Dunno what the compiler is thinking. (I used to be a compiler writer in past lives.)

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Can you post the .lss of what WAS generated (I'm not in range of build system to make it myself). It would appear that MainLoopCounter is being used for nothing more than temporal spacing of the activity - it's not an input to what's being done. So I suspect the optimiser (knowing that no time must be wasted) just poo-poo's the idea of wasting time by counting on your fingers. If you want temporal spacing use non-optimisable delays to achieve it.

Cliff

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

No, it does not generate bad code. You wrote bad code.

Your while(1) loop and the MainLoupCounter is the equivalent of applying a modulo 65500 operation (division by 65500) to infinity (the while(1)). Dividing infinity by 65500 is still infinity. Therefore, and because the if() has no alternative branch (no else), the division / modulo operation can be eliminated.

In other words, with while(1) you say "do this an infinite amount of times". The compiler generates code to do "this" an infinite amount of times, and because you turned on optimization, it does it as fast as possible. Your count to 65500 just prohibits fast execution of the code, and has no other side effects, so the compiler eliminates it and gives you 65500 times faster code. You asked for fast code, you got it.

You just did what all the "the compiler is broken" whiners did before you: You rolled your own delay loop, and then you are surprised the compiler optimized the delay away.

Stealing Proteus doesn't make you an engineer.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

Your while(1) loop and the MainLoupCounter is the equivalent of applying a modulo 65500 operation (division by 65500) to infinity (the while(1)).

You are way beyond me on this one. I have a big main loop in all my programs. Many/most have a "lps" counter--loops per second--that I use to gauge processor usage. Or whatever.

What does my "lps++", followed by a check and reset (periodically in this case, but all my "soft timers" have a trigger value) have to do with modulus of infinity?!?

I'd say (in the absence of any volatiles) that the code could be "constant" in steady-state and the compiler might scrunch it out. But there are two things: the first |= on the PORT won't be done until a delay of 65500 reps, and the increment of the .CCA which would take place every full "pass".

If the timer were configured, one could indeed say that the above is a complete app to vary e.g. LED brightness in a never-ending cycle using main loop timing as the time base.

It will be interesting to see the generated codes and the resolution. Perhaps the .CCA is not seen as volatile in Xmega?

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I think that the compiler would complain about MainLoopCounter not being initialized. The first use of MainLoopCounter is adding one to an uninitialized value.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

This works for me with -Os on an ATxmega128a1

uint16_t MainLoopCounter;         //main loop counter
   //infinite loop
   while(1){
      MainLoopCounter+=1;            //increment loop counter
      //if we have reached 65500
      if (MainLoopCounter>=65500){
         MainLoopCounter=0;         //reset counter
         PORTD_OUTTGL=1 ;
      }
   } 

C:\WinAVR-20090313\bin\avr-gcc.exe

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

clawson wrote:
Can you post the .lss of what WAS generated ...
Cliff

Cliff, your right... that wasn't a toggle!! To many cut, pastes and modifying to test what influences the compilation. Oops. My original code wrote to the DIRTGL register. I have put it back, recompiled and am attaching the lss code.

Here is the code that executes the correct incrementing and checking of the loop counter. Every 65500 iterations of the main loop it updates the PORTE output pin and the TCC1 compare register.

00000234 
: #include #include uint16_t UpdateVal_PWM=255; void main(void){ 234: 80 e0 ldi r24, 0x00 ; 0 236: 90 e0 ldi r25, 0x00 ; 0 while(1){ MainLoopCounter+=1; if (MainLoopCounter>=65500) { MainLoopCounter=0; PORTE.DIRTGL=0x80; 238: a0 e8 ldi r26, 0x80 ; 128 23a: b6 e0 ldi r27, 0x06 ; 6 23c: 20 e8 ldi r18, 0x80 ; 128 //UpdateVal_PWM+=1; TCC1.CCA+=1; 23e: e0 e4 ldi r30, 0x40 ; 64 240: f8 e0 ldi r31, 0x08 ; 8 void main(void){ uint16_t MainLoopCounter; while(1){ MainLoopCounter+=1; 242: 01 96 adiw r24, 0x01 ; 1 if (MainLoopCounter>=65500) { 244: 3f ef ldi r19, 0xFF ; 255 246: 8c 3d cpi r24, 0xDC ; 220 248: 93 07 cpc r25, r19 24a: d8 f3 brcs .-10 ; 0x242 MainLoopCounter=0; PORTE.DIRTGL=0x80; 24c: 13 96 adiw r26, 0x03 ; 3 24e: 2c 93 st X, r18 250: 13 97 sbiw r26, 0x03 ; 3 //UpdateVal_PWM+=1; TCC1.CCA+=1; 252: 80 91 68 08 lds r24, 0x0868 256: 90 91 69 08 lds r25, 0x0869 25a: 01 96 adiw r24, 0x01 ; 1 25c: 80 a7 std Z+40, r24 ; 0x28 25e: 91 a7 std Z+41, r25 ; 0x29 260: 80 e0 ldi r24, 0x00 ; 0 262: 90 e0 ldi r25, 0x00 ; 0 264: ee cf rjmp .-36 ; 0x242

As can be seen, the code increments the MainLoopCounter variable and updates the TCC1, PORTE regesters every time it reaches 65,500... just like my C source specified.

Now here is the lising with the only functional change being that I also want to increment a particular variable (UpdatePWM) during the "once every 65500 cylces" branch.

00000234 
: #include #include uint16_t UpdateVal_PWM=255; void main(void){ 234: 80 91 00 20 lds r24, 0x2000 238: 90 91 01 20 lds r25, 0x2001 23c: 01 96 adiw r24, 0x01 ; 1 while(1){ MainLoopCounter+=1; if (MainLoopCounter>=65500) { MainLoopCounter=0; PORTE.DIRTGL=0x80; 23e: a0 e8 ldi r26, 0x80 ; 128 240: b6 e0 ldi r27, 0x06 ; 6 242: 20 e8 ldi r18, 0x80 ; 128 UpdateVal_PWM+=1; TCC1.CCA=UpdateVal_PWM; 244: e0 e4 ldi r30, 0x40 ; 64 246: f8 e0 ldi r31, 0x08 ; 8 while(1){ MainLoopCounter+=1; if (MainLoopCounter>=65500) { MainLoopCounter=0; PORTE.DIRTGL=0x80; 248: 13 96 adiw r26, 0x03 ; 3 24a: 2c 93 st X, r18 24c: 13 97 sbiw r26, 0x03 ; 3 UpdateVal_PWM+=1; TCC1.CCA=UpdateVal_PWM; 24e: 80 a7 std Z+40, r24 ; 0x28 250: 91 a7 std Z+41, r25 ; 0x29 252: 01 96 adiw r24, 0x01 ; 1 254: f9 cf rjmp .-14 ; 0x248

As can be seen, now the code does not increment and check the MainLoopCounter variable. Rather, it now updates the TCC1, PORTE regesters every single iteration of the main loop. This is NOT what my C source specified... no matter how you optimize it!

I could see it optimizing out the increment of the UpdatePWM val if nothing else in the code path uses it... but not the entire purpose of updating certain values (PORTE.DIRTGL,TCC1.CCA) only every 65500 times through a loop.

Again, if I am missing something here. Please kindly point it out. I still see this as compiled code that does not match my C source.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

ossi wrote:
This works for me with -Os on an ATxmega128a1

uint16_t MainLoopCounter;         //main loop counter
   //infinite loop
   while(1){
      MainLoopCounter+=1;            //increment loop counter
      //if we have reached 65500
      if (MainLoopCounter>=65500){
         MainLoopCounter=0;         //reset counter
         PORTD_OUTTGL=1 ;
      }
   } 

C:\WinAVR-20090313\bin\avr-gcc.exe

That works for me also.... does exactly what the C code asked it to do! But now add a line that also increments a global variable inside the if{} branch....and you get totally different behavior! See my last post with *.lss code added.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Just a try, replace:

MainLoopCounter>=65500

by

MainLoopCounter>=65500UL
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

ArnoldB wrote:
No, it does not generate bad code. You wrote bad code.

Your while(1) loop and the MainLoupCounter is the equivalent of applying a modulo 65500 operation (division by 65500) to infinity (the while(1)). Dividing infinity by 65500 is still infinity. Therefore, and because the if() has no alternative branch (no else), the division / modulo operation can be eliminated.

In other words, with while(1) you say "do this an infinite amount of times". The compiler generates code to do "this" an infinite amount of times, and because you turned on optimization, it does it as fast as possible. Your count to 65500 just prohibits fast execution of the code, and has no other side effects, so the compiler eliminates it and gives you 65500 times faster code. You asked for fast code, you got it.

You just did what all the "the compiler is broken" whiners did before you: You rolled your own delay loop, and then you are surprised the compiler optimized the delay away.

Thank you for your response, but I think you are way off base here though. Cliff pointed this out in his reply.

I am not "whining" that the compiler is broken. I am trying to understand how it generated the code that it did. In this case, I don't think it behaves according to my C source. If I am missing something, I want to find what it is...

Last Edited: Fri. Apr 9, 2010 - 02:49 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0
uint16_t MainLoopCounter;         //main loop counter
   //infinite loop
   while(1){
      MainLoopCounter+=1;            //increment loop counter
      //if we have reached 65500
      if (MainLoopCounter>=65500){
         MainLoopCounter=0;         //reset counter
	   	PORTD_OUTTGL=1 ;
		 TCC1.CCA+=1; 
      }
   } 

This runs for me also :shock:

Perhaps the compiler believes that

MainLoopCounter>=65500

is always true due to some type promotion or so ?

( unit16_t always bigger than a negative number )

Perhaps give it a try with 30000 instead of 65500 ?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

Just a try, replace:
Code:
MainLoopCounter>=65500

by
Code:
MainLoopCounter>=65500UL


Aah, interesting. Or even 65500u?

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

For all.... Here is a recap of my currently unresolved question.

This compiles correctly:

#include 
#include 

uint16_t UpdateVal_PWM=255;

void main(void){
	uint16_t MainLoopCounter=0;

	while(1){
		MainLoopCounter+=1;
		if (MainLoopCounter>=65500) {
			MainLoopCounter=0;
			PORTE.OUT^=0x80;
		}

	}
}

Meaning... every 65500 cycles through the main loop the MainLoopCounter gets reset to zero and Pin7 on PORTE gets toggled. This is Good.

This does NOT compile correctly:

#include 
#include 

uint16_t UpdateVal_PWM=255;

void main(void){
	uint16_t MainLoopCounter=0;

	while(1){
		MainLoopCounter+=1;
		if (MainLoopCounter>=65500) {
			MainLoopCounter=0;
			PORTE.OUT^=0x80;
			UpdateVal_PWM++;
		}

	}
}

Meaning that the compiled code Toggles PortE PIN7 EVERY SINGLE cycle through the main loop. It never increments the loop counter or compares it against 65500.

I agree that the incrimenting of UpdateVal_PWM should be optimized out... but PORTE PIN7 should only be toggled every 65500 cycles of the main loop.

DO PEOPLE AGREE WITH THIS STATEMENT?
DO PEOPLE DISAGREE WITH THIS STATEMENT?

Again, I appreciate the valuable time that everybody has taken to look at this and respond. Thank you very much.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Cliff is right.
Optimization results from applying the as-if rule.
Under the as-if the only variables that matter are TCC1 and PORTE.
After main starts,
the others are never seen by the outside world.
If the compiler does the right things
with TCC1 and PORTE in the right order,
it has done its job correctly.

Why accessing UpdatePWM has such a drastic effect on the OP's code,
I have no idea.

Iluvatar is the better part of Valar.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I have tried this with with a mega168 (changing PORTE.OUT with PORTB) and it does the same thing. It also does the same thing on -O1, -O2 and -O3.

If I instead use a volatile global variable (volatile uint8_t count) instead of a port, the if is ignored in all circumstances. So it appears that with just the PORTE.OUT statement, the optimizer misses an optimization. Whether that is a good optimization is another question. Should modifying a volatile variable be a hint to the optimizer that the if should not be optimized out? I would tend to think so.

But certainly adding in the UpdateVal_PWM++ line should not trigger the optimization.

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

CNC4ME wrote:
I agree that the incrimenting of UpdateVal_PWM should be optimized out... but PORTE PIN7 should only be toggled every 65500 cycles of the main loop.

DO PEOPLE AGREE WITH THIS STATEMENT?
DO PEOPLE DISAGREE WITH THIS STATEMENT?


If the code is perfectly optimized I disagree.

Here is how I look at it:
What productive thing is the main loop doing during the other 65499 cycles? Specifically what is it doing that the optimizer should care about?

I would say it's doing absolutely nothing, as there is no reason that I can see for the optimizer to care about the value of "MainLoopCounter". It doesn't get used for anything other then the "if" statement.

Given the fact that those 65499 iterations of the loop arn't doing anything productive why should the optimizer keep them in the code?

I'm personally a little surprised by your comment "It never increments the loop counter or compares it against 65500" because I wouldn't expect the loop counter to even exist, but it looks like the optimizer isn't perfect.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

Given the fact that those 65499 iterations of the loop arn't doing anything productive why should the optimizer keep them in the code?

I'm personally a little surprised by your comment "It never increments the loop counter or compares it against 65500" because I wouldn't expect the loop counter to even exist, but it looks like the optimizer isn't perfect.


I'm losing track a bit here, but in response to the above:

"What is the purpose of your head? To keep your ears apart."

Given that there is an output toggle in the loop, there is indeed a purpose to n loops: to keep those toggles exactly n loops apart. (Yes, it is arbitrary timekeeping, but it >>is<< timekeeping nonetheless.) I wouldn't expect the compiler to know it is a toggle but it is a write to a volatile register.

I guess you could say that the loop count is arbitrary since the timekeeping is arbitrary? I.e., each loop might take x cycles with one optimization setting, and x/3 with another, and x/7 with another. And that is where infinity comes in? I think I'm getting it now. The compiler reduces it to x/infinity so only the toggles happen as fast as practical.

It's taking a while, but sans volatile I think I see it now.

But the .CCA++ based on a loop count--

(now my head is starting to hurt)

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I now have put it into a project and see the same
effects. It seems not to depend
on the value 65500 nor its type nor the type of
MainLoopCounter.

Seems we need Jörg here :(

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

I now have put it into a project

Just for grins, could there be an X-Files (errr, Xmega) factor here? Do you get the same result when compiling the equivalent for, say, a Mega88 and writing to PINx?

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Compiling the nasty program for an ATmega32
gives the same result of code-elimination.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
Just for grins, could there be an X-Files (errr, Xmega) factor here? Do you get the same result when compiling the equivalent for, say, a Mega88 and writing to PINx?

As I said in my earlier post, the mega168 behaves the same.

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

theusch wrote:
Quote:

Given that there is an output toggle in the loop, there is indeed a purpose to n loops: to keep those toggles exactly n loops apart. (Yes, it is arbitrary timekeeping, but it >>is<< timekeeping nonetheless.) I wouldn't expect the compiler to know it is a toggle but it is a write to a volatile register.

Thank you once again for you very intelligent response! It seems some people are trying to justify the fact that the compiled code does not execute setting of a volatile register (PORTE.OUT) based on the loop count as my C code asks it to do... unless I am missing something about settings/configurations/etc... that I don't know about in GCC.

This counter is not for time keeping (as in creating a delay) it is for a visual indication (flashing an LED) of how busy the processor is with other tasks... those other tasks are obviously not shown in my code, because I have simplified it down to the "heart" of the question.

I am not arguing that this is the best way to do what I want... I am just arguing that I instructed the processor to do that with C statements and the output of the compiled code does not behave that way...

Quote:
But the .CCA++ based on a loop count--
(now my head is starting to hurt)

Incrimenting the cca based on the loop count was only a temporary debugging call... not how I would use it in an "actual" application.

As I think most people will apprecieate - I don't care about "tricking" the compiler to do what I want in this one particular instance... I want to know why it did what it did so I can have confidence to use it (and know how to correctly use it) in MUCH larger and more complicated programs.

My current view is:
I asked it to toggle an Output Pin every 65500 cycles through a loop. One time it decided to create code that does exactly that. The next time it didn't. The only difference was adding and unnessary call to UpdatePWM++; How do I know what it will do with a very complex program?

If it simply "missed" an optimization without the extraneous command to UpdatePWM++, I could be happy and understand that...BUT I have not heard any satisfactory reasons why this is a legal optimization or could be anticipated.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

CNC4ME wrote:
My current view is:
I asked it to toggle an Output Pin every 65500 cycles through a loop. One time it decided to create code that does exactly that. The next time it didn't. The only difference was adding and unnessary call to UpdatePWM++; How do I know what it will do with a very complex program?

This is an incorrect view of the compiler.

You asked it to make 65500 unnecessary cycles, which it is free to decide how to implement. Although you might be outraged, the compiler is absolutely free to implement it for example as 131000 cycles, or whatever else it wants, as long as it has the same side effect as the original program, namely nothing. As maybe even more surprise, the same compiler under the same circumstances is absolutely free to produce completely different results - as long as the side effects are the same. These are extreme cases and the vast majority of compilers won't do these, but they are absolutely free to do it under the terms of definition of the language.

If you desire control over timing or consumption of other resources, please resort to assembler.

JW

PS. You might perhaps want to read the introduction to (and then also the rest of) http://blog.regehr.org/archives/28 - and of course the related chapters of the standard.

Last Edited: Thu. Apr 8, 2010 - 09:01 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
I have not heard any satisfactory reasons why this is a legal optimization or could be anticipated.

But the explanation that has been given is perfectly reasonable. You told the compiler to continuously toggle the bit. The resultant code does that. The difference between the two results is only how fast that the pin is toggled. Part of the optimizers job is to make the code as fast as possible while still keeping the functionality of the code the same. The optimizer did that.

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Interestingly I compiled the code with CVAVR with default settings (whatever they are) and it works correctly. However I had to set MainLoopCounter to 0 or it would not compile.

#include 
#include 

/*Global Varaibles*/ 
/*---------------------------------------------------------*/ 
uint16_t UpdatePWM=0xFF; 


/*Main-----------------------------------------------------*/ 
void main( void ){ 
   uint16_t MainLoopCounter=0;         //main loop counter 
    
   //infinite loop 
   while(1){ 
      MainLoopCounter +=1;            //increment loop counter 
       
      //if we have reached 65500 
      if (MainLoopCounter>=65500){ 
         MainLoopCounter=0;         //reset counter 
         PORTE.OUT|=0x80;         //Toggle PIN E7 
         UpdatePWM+=1;            //increment PWM Var 
         TCC1.CCA=UpdatePWM;      //Set Compare A Value 
      } 
   } 
} 

John Samperi

Ampertronics Pty. Ltd.

www.ampertronics.com.au

* Electronic Design * Custom Products * Contract Assembly

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

CNC4ME wrote:
This does NOT compile correctly:

#include 
#include 

uint16_t UpdateVal_PWM=255;

void main(void){
	uint16_t MainLoopCounter=0;

	while(1){
		MainLoopCounter+=1;
		if (MainLoopCounter>=65500) {
			MainLoopCounter=0;
			PORTE.OUT^=0x80;
			UpdateVal_PWM++;
		}

	}
}

Meaning that the compiled code Toggles PortE PIN7 EVERY SINGLE cycle through the main loop. It never increments the loop counter or compares it against 65500.

The code above is equivalent to
#include 
#include 

uint16_t UpdateVal_PWM=255;

void main(void){
	uint16_t MainLoopCounter=0;

	while(1){
		while(MainLoopCounter< 65500)
                              MainLoopCounter+=1;
		MainLoopCounter=0;
		PORTE.OUT^=0x80;
		UpdateVal_PWM++;
	}
}

The nested while loop and the assignment
after it together constitute a nop.
The compiler is free to treat them as such.
Once that is done,
MainLoopCounter is an unused variable
and *may* be eliminated altogether.
UpdateVal_PWM is likewise unused and *may* also be eliminated.

Iluvatar is the better part of Valar.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Yes, I have been using CodeVision for quite a while and have not experienced code that was optimized away like this. In fact... what got me started down this path was porting an application that was originally compiled in CV over to AVR-GCC. The CV compilation behaved like I expected (intended it to)... The AVR-GCC compilation did not.

Because of this, I have years of experience that are "biasing" my thought process about the legitamacy of the code "disappearing" in the compilation.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

skeeve wrote:
The code above is equivalent to
#include 
#include 

uint16_t UpdateVal_PWM=255;

void main(void){
	uint16_t MainLoopCounter=0;

	while(1){
		while(MainLoopCounter< 65500)
                              MainLoopCounter+=1;
		MainLoopCounter=0;
		PORTE.OUT^=0x80;
		UpdateVal_PWM++;
	}
}

The nested while loop and the assignment
after it together constitute a nop.
The compiler is free to treat them as such.
Once that is done,
MainLoopCounter is an unused variable
and *may* be eliminated altogether.
UpdateVal_PWM is likewise unused and *may* also be eliminated.

Skeeve,

Thanks for the post. That may have made the light bulb go off in my head!

I can see that if the code were executed exactly as you wrote it in "C" it would produce the same behavior as if the code executed exactly like I wrote it in "C". So, yes... they are functionally equivalent.

I can also see how

while(MainLoopCounter< 65500)
                              MainLoopCounter+=1;

would "legitametly" be fine to throw out... as long as nothing else is done in the main while loop.

Interesting...I've just never used a compiler that was this good at optimizing before. I need to stew...

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

wek wrote:

PS. You might perhaps want to read the introduction to (and then also the rest of) http://blog.regehr.org/archives/28 - and of course the related chapters of the standard.
Thanks for the link... very interesting reading. My view of what is "right" is starting to turn :?

I have some engrained "expectations" from previous programming experiences (compilers/languages) that have made accepting this learning harder...

Last Edited: Fri. Apr 9, 2010 - 02:40 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
Interestingly I compiled the code with CVAVR with default settings (whatever they are) and it works correctly.

But the point is, there is nothing "incorrect" with the way avr-gcc compiled the code. The code may not do what the OP wanted, but the compiler has no knowledge of what the programmer wants, only knowledge of what the programmer wrote.

Regards,
Steve A.

The Board helps those that help themselves.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Thank you to everybody for responding.... I know it takes time and energy to do it.

I have learned a lot from reading the posts... and analyzing what was said.

I admit the compiler did a "great" job of legally optimizing the compiled code. It took a while for it to sink in, but I am coming to grips with it now.

It seems in all my previous C programming experiences I have used compilers that generated code that behaved more like a direct C "interpreter"... but the C compiler isn't required to execute the commands that I give it. The paper posted by Koshchi helped me understand that better. Anyone who has a hard time with this should go read that paper NOW.

Of course, that makes my choice of subject lines for this topic regrettable :oops: I knew it would get immediate attention though...

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

Interestingly I compiled the code with CVAVR with default settings (whatever they are) and it works correctly.

CV is not nearly as "aggressive" in these kind of situations. I don't necessarily disagree with the "level" of CV optimization--if y9ou write crap code in C you get crap results. Would one expect the compiler to make a silk purse out of a sow's ear? Let's just summarize and say that with CV WYSIWYG.

But there is nothing wrong with "aggressive" optimization, when you call for it. If you ask for smallest/fastest, then you might get it. There have been a few different variation on the theme posted. I now get the infinity part. Whether the particular situation is right or wrong...

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Just to note again what I said in my post above. If what you are after is temporal spacing then counting on your fingers with an incrementing counter is NOT the way to achieve this given the aggression of the GCC optimiser. It will spot such tricks as skeeve explained above and discard them. For this very reason the authors of AVR-LibC have provided and to give you temporal delays that will not be optimised out of existence - in fact the use of these routines RELIES on the optimiser being enabled (so that loop constants are calculated at compile time, not run time). So while I can't be too accurate how long a delay you'd hoped for with a 65500 count I guess there'd be about 6 cycles in a 16bit counter loop - so say 393,000 cycles. Lets also guess at an 8MHz F_CPU so that would be about 49 milliseconds. Thus:

#define F_CPU 8000000UL
#include  
#include  
#include 

/*Global Varaibles*/ 
/*---------------------------------------------------------*/ 
uint16_t UpdatePWM=0xFF; 


/*Main-----------------------------------------------------*/ 
void main( void ){ 
   //infinite loop 
   while(1){ 
         _delay_ms(49);
         PORTE.OUT|=0x80;         //Toggle PIN E7 
         UpdatePWM+=1;            //increment PWM Var 
         TCC1.CCA=UpdatePWM;      //Set Compare A Value 
      } 
   } 
}

Assuming this WAS the intention here?

Cliff

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

clawson wrote:
Just to note again what I said in my post above. If what you are after is temporal spacing then counting on your fingers with an incrementing counter is NOT the way to achieve this given the aggression of the GCC optimiser....

Assuming this WAS the intention here?

Cliff

Cliff,
My original intent was NOT to add a time delay to the loop. In fact, I want to spend as little time looking at the loop counter as possible and get on with the rest of the loop...of course, I would be doing MANY more things in an "actual" program's main loop (and in ISRs).

Typically, I would have a loop counter in the main loop that triggers a toggle of an LED every so many times through the loop. This serves as a visual indication of how busy the uC is and indication that the software is executing properly (not hung).

As I was using AVR-GCC for the first time, I wanted to start with simple code, figure out how to use the new tools to compile it and program it to Flash. Once this was successful I would add in the complexity to the program.

When it didn't toggle the LED as expected... I got alarmed and confused.

I was missing a fundamental concept of what a "C" program actually means to the compiler... I was under the false impression that it had to literally do what I asked, how I asked. Now I understand why I didn't and life makes sense again!

If I add ANY meaningful code after the if() statement, the compiler generates the code I expected. This was unintentionally exposed because I stripped away all the "complexity" of the real program...

With all my previous experiences being with less aggressive optimizers the issue never came up before.

Again, I learned a lot from this. Thank to everyone.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

CNC4ME wrote:
For all.... Here is a recap of my currently unresolved question.

This compiles correctly:

#include 
#include 

uint16_t UpdateVal_PWM=255;

void main(void){
	uint16_t MainLoopCounter=0;

	while(1){
		MainLoopCounter+=1;
		if (MainLoopCounter>=65500) {
			MainLoopCounter=0;
			PORTE.OUT^=0x80;
		}

	}
}

Meaning... every 65500 cycles through the main loop the MainLoopCounter gets reset to zero and Pin7 on PORTE gets toggled. This is Good.

This does NOT compile correctly:

#include 
#include 

uint16_t UpdateVal_PWM=255;

void main(void){
	uint16_t MainLoopCounter=0;

	while(1){
		MainLoopCounter+=1;
		if (MainLoopCounter>=65500) {
			MainLoopCounter=0;
			PORTE.OUT^=0x80;
			UpdateVal_PWM++;
		}

	}
}

Meaning that the compiled code Toggles PortE PIN7 EVERY SINGLE cycle through the main loop. It never increments the loop counter or compares it against 65500.

I agree that the incrimenting of UpdateVal_PWM should be optimized out... but PORTE PIN7 should only be toggled every 65500 cycles of the main loop.

DO PEOPLE AGREE WITH THIS STATEMENT?
DO PEOPLE DISAGREE WITH THIS STATEMENT?

Again, I appreciate the valuable time that everybody has taken to look at this and respond. Thank you very much.

I disagree with that statement.

Your code is semantically equivalent to this:

#include 
#include 

uint16_t UpdateVal_PWM = 255;

int main(void)
{
    while(1)
    {
        for (uint16_t MainLoopCounter = 0; MainLoopCounter < 65500; MainLoopCounter++)
        {
        }
        PORTE.OUT ^= 0x80;
        UpdateVal_PWM++;
    }
}

Now, when optimizing, the compiler will detect that the UpdateVal_PWM++ statement has no side effects, i.e. updating that variable doesn't do anything because that variable is not being used anywhere. For that statement to be included in the code generation that variable would have to be declared as volatile.

It has been discussed here and elsewhere (e.g. the avr-gcc-list mailing list) that empty loops *can* be optimized away by the compiler. This is, and has always been, a poor way to implement any timing in embedded code. Please see prior discussions on this forum about this. It is better to use a hardware timer, and even if this case, and an output compare that automatically toggles a pin (I didn't check to see if this particular pin is available to do this). Doing so would be more accurate and be automated, allowing the processor to do other things.

Now, when the UpdateVal_PWM++ statement is commented out, it just so happens that the compiler kept the generated code for the empty loop. You just got lucky. The compiler optimizer could have easily removed the empty loop then too. No, I don't know off-hand why it was kept in.

In the end, this is not very well written code to accomplish the task. The fact that GCC behaves differently than other compilers does not make it wrong. GCC is very standards compliant. Sometimes its optimization can be very aggressive. But the compiler is not psychic; it doesn't have a switch that says do-as-I-mean-and-not-how-I-write.

The best thing to do is to change your code.

Eric

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Well, late to the party, as usual. But it sounds like you got the point. ;)

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

> Meaning that the compiled code Toggles PortE PIN7 EVERY SINGLE
> cycle through the main loop. It never increments the loop counter
> or compares it against 65500.

It has been asked to optimize, and it simply decided that it could do faster
and with less code by not counting anything that is in the end (from its
point of view) just a waste of time. See it in another way: the compiler
got knowledge of a very specific CPU feature that could count up 65500
times within zero CPU cycles ;-), provided that the result of that count
operation is not used afterwards. Doesn't it look like a really cute
optimization then to also *use* that "CPU feature"?

As Arnold and Cliff mentioned, if you want a delay, use the routines from
(or ), or even better, use a timer.
That's why the designers of your controller added timers to them in the
first place.

Jörg Wunsch

Please don't send me PMs, use email if you want to approach me personally.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

dl8dtl wrote:
> As Arnold and Cliff mentioned, if you want a delay, use the routines from (or ), or even better, use a timer.

As CNC4ME mentioned, he did NOT want a delay.
CNC4ME wrote:
My original intent was NOT to add a time delay to the loop. [...]
Typically, I would have a loop counter in the main loop that triggers a toggle of an LED every so many times through the loop. This serves as a visual indication of how busy the uC is and indication that the software is executing properly (not hung).

This might confuse even a seasoned avr-gcc user: normally, the main loop would contain at least one volatile access, directly or indirectly (otherwise it has no purpose, isn't it); then the counter couldn't be optimised out (actually it could, but there would be then some other mechanism ensuring the loop repeats the required number of times before the LED toggles). So, the seasoned user would NEVER see the counter optimised out in a real-world program.

I think this deserves a decent explanation made sticky/tutorial, and those who come across this problem should be directed to that tutorial rather than subtly ridiculed.

JW

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

rather than subtly ridiculed.

I've read the entire thread - I don't see where that occurred?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

As CNC4ME mentioned, he did NOT want a delay.
But he implemented one. And that's what counts for the compiler.

You know what I find funny?

When avr-gcc doesn't optimize, e.g. because a function is used in an ISR, people complain.

When avr-gcc optimizes, like in the given case, people complain.

Stealing Proteus doesn't make you an engineer.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

ArnoldB wrote:
Quote:

You know what I find funny?

When avr-gcc doesn't optimize, e.g. because a function is used in an ISR, people complain.

When avr-gcc optimizes, like in the given case, people complain.

I'm not complaining anymore. I now understand why the compiler generated the code that it did and now I can use it properly.:D

I was missing a fundamental concept: the difference between the literal implementation of C statements/structure as written and the "as if" rules the compiler actually uses to generate/optimize the code.

Now I know my "simple" trial program that doesn't actually do anything other than prove I can compile with avr-gcc and succesfully program the Xmega AVR should have been written like this:

	uint16_t MainLoopCounter;
	//Main application Loop
	while(1){
	
		//Main Loop Counter.... this serves to create a visual
		//indication that the main loop is exectuing (not hung) 
		//and of how busy the uC is.
		//Faster toggles-> lots of idle time
		//Slower toggles-> busy CPU
		MainLoopCounter++;   			//inc  loop counter
		if(MainLoopCounter>=65500){		//check count  
			MainLoopCounter=0;			//if time, reset counter
			PORTE.DIRTGL=0x80;			//and toggle the LED
		}
		//force at least one "always required" access in 
		//the body of the main loop. this is required so
		// the loop counter does not get optimized out and
		// can be removed when any other volatile access calls 
		//are added to the program
		asm volatile("nop");
		
		//Actual Application Code To Be Writtten Here
	}

Whats funny to me is that I have been (successfully) programming AVRs with another compiler for so many years and I never knew this.... the only reason I "discovered" it was because I tried to create such a simple program to avoid any "bugs". Had any other useful call been in that main body - I still would not know.

I am glad this happened and that I was forced to understand this... and now I can be a better programmer. Thanks again to everybody who helped walk me through this!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

clawson wrote:
Quote:

rather than subtly ridiculed.

I've read the entire thread - I don't see where that occurred?
Please don't rip the quote off the context of my post above: not in this thread, but I've seen this quite recently in maybe two or three threads with very similar problem (I admit the problem was presented in maybe a slightly different manner).

JW