question about optimization


Hello: Using the latest Studio 6 & GCC install, for an XMEGA 16D4 with compiler optimization setting -O3. I set up a timer0 compare interrupt @ 2 kHz from a 20 MHz clock...all works fine, no problem at all.

To check on a scope, I put in a small dumb delay to give some pulse width, works fine.

When I remove the lone semicolon after the "for", the for loop quits working (compiles into dust) & I get very narrow 2 kHz pulses (due to the 2 kHz IRQ being extremely quick in & out)...OK

 

With the semicolon removed the second led I/O becomes part of the "for"...WHY does the "for" get optimized away??--just wondering.  I'd think it would stay functional.

 

Maybe the compiler is smart enough to see the volatile abc will just reach 50 & the port will be cleared 50 times, so it just sets it and be done with it (no loop).

If that is the case, why does it keep the for loop when the semicolon is present?

 


volatile int abc=0;
volatile int slow_tick=0;

ISR(TCC0_CCA_vect,ISR_BLOCK)   //TIMER tick interrupt
{
  slow_tick++;

  PORTE.OUTSET=TESTPIN_PE;
  for(abc=0;abc<50;abc++) 
   ;    //note this semicolon.....
  PORTE.OUTCLR=TESTPIN_PE;

}

Edit: typo in title

 

 

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Last Edited: Mon. Sep 28, 2015 - 12:51 AM

The 'lone semicolon' is a valid C statement, called the 'null statement'. In this case, it provides the executable body of the for loop. Without a body (i.e. without anything to do), the compiler optimizes away the for loop.

 


Without a body (i.e. without anything to do), the compiler optimizes away the for loop

 

Except (as I mentioned) without the semicolon, the last PORTE.OUTCLR=TESTPIN_PE;  then  "becomes" the for loop's body...so the loop should exist just as much (or even more so) as when using the "  ;  "  loop body

 

Am I overlooking something?

 

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!


Why do you think the loop gets optimized away? Have you looked at the .lst file?
Not sure what you mean by "very narrow" pulses? What else do you expect once you remove the delay between SET and CLR?


-O3 does some pretty interesting things. I have found in a previous application that it actually removed the following break statement:

 

for (uint8_t i = 0;i < 3;i++)
{
    if (!buf[i])
        break;
    Uart_putchar(buf[i]);
}

So be careful when you use anything above -O3. I am unsure of what exactly causes it to remove the above code, probably because it expanded my for loop.

 

 

 

In the case of your example, even with -O1 it would expand your for loop out to evaluate the end result of abc because nothing else is used inside the for loop. I have also seen that when just playing around with the following code:

 

int main(void)
{
    volatile uint8_t tmp = 0;
    for(;;)
    {
        for (uint8_t i = 0;i < 30;i++)
        {
            tmp += 2;
        }
    }
}

 

You can actually see in simulation that a full pass of the loop takes only a few clock cycles, most of which is the jump. The compiler simply adds 60 to tmp instead of evaluating every iteration.


-O3 does some pretty interesting things. I have found in a previous application that it actually removed the following break statement:

 

for (uint8_t i = 0;i < 3;i++)
{
    if (!buf[i])
        break;
    Uart_putchar(buf[i]);
}

So be careful when you use anything above -O3. I am unsure of what exactly causes it to remove the above code, probably because it expanded my for loop.

 

All of us would be very interested in seeing that code, and the "evidence" of the dropped break;

 

If indeed you have the hard evidence, then my speculation would be that buf[0..3] had known non-zero values at that point in the code.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


theusch wrote:

All of us would be very interested in seeing that code, and the "evidence" of the dropped break;

 

If indeed you have the hard evidence, then my speculation would be that buf[0..3] had known non-zero values at that point in the code.

 

Sorry to say, the code was quite similar to that and I have changed it many times since then, was working off memory. I now only work with -O1 after seeing this.


ryanafleming wrote:
You can actually see in simulation that a full pass of the loop takes only a few clock cycles, most of which is the jump. The compiler simply adds 60 to tmp instead of evaluating every iteration.
Since tmp is volatile, I'd expect not.
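
To make the distinction concrete, here is a minimal host-side sketch (the function names sum_plain and sum_volatile are made up for illustration, and nothing here is AVR-specific). Both functions return 60; the difference is only in what the optimizer is permitted to do on the way there. Under the C standard, accesses to a volatile object count as observable behavior, so a conforming compiler is expected to keep all 30 read-modify-write sequences in the volatile version, while the plain version may be folded to a constant. What a particular compiler build really emits is best checked in the listing.

```c
#include <stdint.h>

/* Plain accumulator: the optimizer is free to replace the loop with "60". */
uint8_t sum_plain(void)
{
    uint8_t tmp = 0;
    for (uint8_t i = 0; i < 30; i++)
        tmp += 2;
    return tmp;
}

/* Volatile accumulator: every "tmp += 2" is a volatile read followed by a
   volatile write, which the standard treats as side effects to be kept. */
uint8_t sum_volatile(void)
{
    volatile uint8_t tmp = 0;
    for (uint8_t i = 0; i < 30; i++)
        tmp += 2;
    return tmp;
}
```

Comparing the generated code for the two functions at the same -O level is a quick way to see whether a given toolchain honors the volatile accesses.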

 

Edit: emphasis added

"Demons after money.
Whatever happened to the still beating heart of a virgin?
No one has any standards anymore." -- Giles

Last Edited: Mon. Sep 28, 2015 - 05:33 PM

If indeed you have the hard evidence, then my speculation would be that buf[0..3] had known non-zero values at that point in the code.

Exactly. The only way the compiler could possibly have ignored that break; at any optimization level was if it could determine (at compile time!) that buf[i] all held non-zero. For instance if you wrote:

char buf[3] = "ABC";
for (uint8_t i = 0;i < 3;i++)
{
    if (!buf[i])
        break;
    Uart_putchar(buf[i]);
}

I would certainly hope that the compiler would remove the pointless if() statement.
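
A host-side sketch of that situation, assuming a stand-in Uart_putchar that only records characters (sent, sent_len, breaks_taken and send_abc are hypothetical names, not from the original code). With all three elements known to be non-zero, the break branch is never reached, which is why a compiler could legitimately leave it out.

```c
#include <stdint.h>

static char    sent[8];       /* hypothetical stand-in for the UART output */
static uint8_t sent_len;      /* number of characters "transmitted" */
static uint8_t breaks_taken;  /* how often the early-exit branch ran */

static void Uart_putchar(char c) { sent[sent_len++] = c; }

uint8_t send_abc(void)
{
    /* Same contents as char buf[3] = "ABC": three chars, no '\0' stored. */
    char buf[3] = { 'A', 'B', 'C' };

    sent_len = 0;
    breaks_taken = 0;
    for (uint8_t i = 0; i < 3; i++)
    {
        if (!buf[i])          /* never true for this buffer */
        {
            breaks_taken++;
            break;
        }
        Uart_putchar(buf[i]);
    }
    return sent_len;          /* characters sent */
}
```

After a call to send_abc(), three characters have been recorded and breaks_taken is still zero, so dropping the test changes nothing observable.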

 

BTW the "normal" way to write that is simply:

char buf[] = "ABC";
char * p = buf;
while(*p)
{
    Uart_putchar(*p++);
}

or did you really want to put a limit of three on it? If so perhaps:

char buf[] = "ABC";
char * p = buf;
uint8_t lim = 0;
while(*p)
{
    if (++lim > 3) break;
    Uart_putchar(*p++);
}

 


skeeve wrote:

 

ryanafleming wrote:

You can actually see in simulation that a full pass of the loop takes only a few clock cycles, most of which is the jump. The compiler simply adds 60 to tmp instead of evaluating every iteration.

Since tmp is volatile, I'd expect not.

 

 

With more recent versions of compilers, volatile may no longer do what you expect : http://blog.regehr.org/archives/28

 

In the example, and allowing for the code being subtly different from what the poster remembered, tmp is local to main(), and there is nothing external that could possibly alter tmp, so it might well be optimised in unexpected ways.

Bob.


Sorry to say, the code was quite similar to that and I have changed it many times since then, was working off memory. I now only work with -O1 after seeing this.

Sorry, ryan, but this reminds me of a thread where the OP claimed that CodeVision didn't do floating-point arithmetic correctly.   But no source code (or compiler-generated code) was ever produced.

 

As Cliff and I have opined, our first guess is that it is not a "compiler problem".  Can never say never, though, right?  But without seeing the original "problem" code as well as the "fixed" code, it is hard to tell.

 

Why -O3 and not -O1?  Just speculation, but O3 might "look more deeply" and find e.g. the optimization situation we mentioned.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Re #10, I agree.

"What can the compiler validly do?" can be a different question from

"What can the compiler usefully do?".

See discussions regarding cbi and sbi, especially the latter.

"What will the compiler do?" is yet another question.

"Demons after money.
Whatever happened to the still beating heart of a virgin?
No one has any standards anymore." -- Giles


Re #10, I agree.

Well, many of us have seen that volatile article before.  On a quick re-read, I don't see what I wouldn't expect.  Perhaps y'all can elaborate, especially in terms of the code snippets in this thread.

 

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Why do you think the loop gets optimized away? Have you looked at the .lst file?
Not sure what you mean by "very narrow" pulses? What else do you expect once you remove the delay between SET and CLR?

Indeed...if the CLR is part of the loop, then the pin will be cleared 50 times in a row after a very brief time SET.  What more do you expect?
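
The effect on the pulse width can be modelled on a PC with a rough sketch like this one. Note the assumptions: pin_set() and pin_clr() are hypothetical stand-ins for the PORTE.OUTSET and PORTE.OUTCLR writes, and high_iters merely counts loop passes that end with the simulated pin still high, as a crude proxy for time on the scope.

```c
/* Hypothetical stand-ins for the PORTE.OUTSET / PORTE.OUTCLR writes. */
static int pin;         /* simulated pin level */
static int high_iters;  /* loop passes that ended with the pin still high */
static int clr_writes;  /* number of clear operations performed */

static void pin_set(void) { pin = 1; }
static void pin_clr(void) { pin = 0; clr_writes++; }

static void reset_counters(void) { pin = 0; high_iters = 0; clr_writes = 0; }

/* Null statement as loop body: the pin stays high for all 50 passes. */
void isr_with_semicolon(void)
{
    volatile int abc;
    reset_counters();
    pin_set();
    for (abc = 0; abc < 50; abc++, high_iters += pin)
        ;
    pin_clr();
}

/* Semicolon removed: pin_clr() is the loop body, so the pin drops on the
   first pass and is then cleared 49 more times while already low. */
void isr_without_semicolon(void)
{
    volatile int abc;
    reset_counters();
    pin_set();
    for (abc = 0; abc < 50; abc++, high_iters += pin)
        pin_clr();
}
```

In the first version high_iters ends at 50 with a single clear; in the second it stays at 0 with 50 clears. The loop runs in both cases; only the point at which the pin goes low has moved.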

 

And did you actually look at the generated code?  I did... (built with -Os)

 

volatile int abc=0;
volatile int slow_tick=0;

#define TESTPIN_PE 2

#include <avr/io.h>
#include <avr/interrupt.h>

ISR(TCC0_CCA_vect,ISR_BLOCK)   //TIMER tick interrupt
{
	slow_tick++;

	PORTE.OUTSET=TESTPIN_PE;
	for(abc=0;abc<50;abc++)
// without semicolon... 	;    //note this semicolon.....
	PORTE.OUTCLR=TESTPIN_PE;

}

volatile int frog;
int main(void)
{
    while(1)
    {
        //TODO:: Please write your application code 
		frog++;
    }
}

Generated listing:

00000186 <.do_clear_bss_start>:
 186:	a6 30       	cpi	r26, 0x06	; 6
 188:	b2 07       	cpc	r27, r18
 18a:	e1 f7       	brne	.-8      	; 0x184 <.do_clear_bss_loop>
 18c:	35 d0       	rcall	.+106    	; 0x1f8 <main>
 18e:	3e c0       	rjmp	.+124    	; 0x20c <_exit>

00000190 <__bad_interrupt>:
 190:	37 cf       	rjmp	.-402    	; 0x0 <__vectors>

00000192 <__vector_16>:

#include <avr/io.h>
#include <avr/interrupt.h>

ISR(TCC0_CCA_vect,ISR_BLOCK)   //TIMER tick interrupt
{
 192:	1f 92       	push	r1
 194:	0f 92       	push	r0
 196:	0f b6       	in	r0, 0x3f	; 63
 198:	0f 92       	push	r0
 19a:	11 24       	eor	r1, r1
 19c:	2f 93       	push	r18
 19e:	8f 93       	push	r24
 1a0:	9f 93       	push	r25
	slow_tick++;
 1a2:	80 91 00 20 	lds	r24, 0x2000
 1a6:	90 91 01 20 	lds	r25, 0x2001
 1aa:	01 96       	adiw	r24, 0x01	; 1
 1ac:	80 93 00 20 	sts	0x2000, r24
 1b0:	90 93 01 20 	sts	0x2001, r25

	PORTE.OUTSET=TESTPIN_PE;
 1b4:	82 e0       	ldi	r24, 0x02	; 2
 1b6:	80 93 85 06 	sts	0x0685, r24
	for(abc=0;abc<50;abc++)
 1ba:	10 92 02 20 	sts	0x2002, r1
 1be:	10 92 03 20 	sts	0x2003, r1
// without semicolon... 	;    //note this semicolon.....
	PORTE.OUTCLR=TESTPIN_PE;
 1c2:	22 e0       	ldi	r18, 0x02	; 2
ISR(TCC0_CCA_vect,ISR_BLOCK)   //TIMER tick interrupt
{
	slow_tick++;

	PORTE.OUTSET=TESTPIN_PE;
	for(abc=0;abc<50;abc++)
 1c4:	80 91 02 20 	lds	r24, 0x2002
 1c8:	90 91 03 20 	lds	r25, 0x2003
 1cc:	c2 97       	sbiw	r24, 0x32	; 50
 1ce:	64 f4       	brge	.+24     	; 0x1e8 <__vector_16+0x56>
// without semicolon... 	;    //note this semicolon.....
	PORTE.OUTCLR=TESTPIN_PE;
 1d0:	20 93 86 06 	sts	0x0686, r18
ISR(TCC0_CCA_vect,ISR_BLOCK)   //TIMER tick interrupt
{
	slow_tick++;

	PORTE.OUTSET=TESTPIN_PE;
	for(abc=0;abc<50;abc++)
 1d4:	80 91 02 20 	lds	r24, 0x2002
 1d8:	90 91 03 20 	lds	r25, 0x2003
 1dc:	01 96       	adiw	r24, 0x01	; 1
 1de:	80 93 02 20 	sts	0x2002, r24
 1e2:	90 93 03 20 	sts	0x2003, r25
 1e6:	ee cf       	rjmp	.-36     	; 0x1c4 <__vector_16+0x32>
// without semicolon... 	;    //note this semicolon.....
	PORTE.OUTCLR=TESTPIN_PE;

}
 1e8:	9f 91       	pop	r25
 1ea:	8f 91       	pop	r24
 1ec:	2f 91       	pop	r18
 1ee:	0f 90       	pop	r0
 1f0:	0f be       	out	0x3f, r0	; 63
 1f2:	0f 90       	pop	r0
 1f4:	1f 90       	pop	r1
 1f6:	18 95       	reti

000001f8 <main>:
int main(void)
{

Sure looks like a loop to me...

 

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.