clawson
PostPosted: Aug 27, 2010 - 07:10 PM

GCC is an optimizing C compiler. The advantage of this is that, used right, it can (like some of the other good AVR C compilers) generate code that is perhaps only 10% larger/slower than hand-crafted assembler. There is, however, a penalty to be paid for its attempts to generate the most efficient code possible.

Those beginning to program the AVR usually first encounter this in one of a number of places:

1) you build a C program then try to run it in AVR Studio's debugger or simulator. As you single step the code the yellow arrow that shows the point of execution appears to jump about and not follow the sequential path of execution you may have been expecting to see.

2) a program uses one or more interrupts and there are one or more variables used in both the interrupt handlers (ISR) and the main() code that appear to be ignored by the code in main. The ISR changes the variable but the main code does not "see" the change.

3) you try to insert a delay into the C code by simply using an empty for() loop counting from 0 to N but it doesn't delay at all.

4) you try to use one of the features of the AVR where Atmel dictates that two operations must happen within a fixed number of cycles (often 4 cycles) and you cannot make the sequence work (such as writing twice to the JTD bit to disable the JTAG interface)

5) You attempt to debug a program in the Studio debugger/simulator but when you try to add a variable to the "watch window" it always shows "variable not in scope"

6) You use the delay routines in <util/delay.h> but suddenly your program grows in size by 1..3K (this is actually an effect of not optimising, not of the delay routines themselves)

Looking at each of these in turn:

1) When GCC's optimiser is switched on (using a -O option other than -O0, such as -Os or -O3) it applies many techniques to reduce the size and increase the speed of the resultant code: discarding code sections that apparently do nothing useful, re-using the same sequence of opcodes if it appears several times, inlining small functions, reordering code so that registers involved in one C statement may be initialized long before they are actually used, and so on.

Without optimization there's usually a simple one-to-many relationship between each C source statement and the opcodes generated to implement it. When a disassembly listing (such as the .s or .lss file) is studied, a single C statement may be found to have generated 5..10 or more AVR opcodes, but each block of opcodes is distinct and identifiable. This example program:
Code:
int main(void) {
   PORTB = 0x55;
   PORTD = 0x55;
   while(1);
}

generates:
Code:
   PORTB = 0x55;
  74:   e8 e3          ldi   r30, 0x38   ; 56
  76:   f0 e0          ldi   r31, 0x00   ; 0
  78:   85 e5          ldi   r24, 0x55   ; 85
  7a:   80 83          st   Z, r24
   PORTD = 0x55;
  7c:   e2 e3          ldi   r30, 0x32   ; 50
  7e:   f0 e0          ldi   r31, 0x00   ; 0
  80:   85 e5          ldi   r24, 0x55   ; 85
  82:   80 83          st   Z, r24

You don't need to know what all that (from the .lss file that was output) means - though a C programmer should have an understanding of the underlying Asm and what it means. But simplistically, note that because the value 0x55 was used to set two different AVR registers (PORTB and PORTD) it has been loaded into R24 twice. Each C statement gets its own distinct block of opcodes, but it's very wasteful.

If the same code is built with optimization the result is much more compact and efficient:
Code:
   PORTB = 0x55;
  6c:   85 e5          ldi   r24, 0x55   ; 85
  6e:   88 bb          out   0x18, r24   ; 24
   PORTD = 0x55;
  70:   82 bb          out   0x12, r24   ; 18

In this new version the value 0x55 is only loaded into R24 once. On the one hand this is good, as it's (in part) what makes the code more compact (the other part being the switch from ST to the much more efficient OUT). But hopefully it's now obvious that the code used to implement "PORTB = 0x55" and "PORTD = 0x55" is effectively sharing an AVR opcode.

So there's no longer a simple connection between one C statement and one block of AVR opcodes. Now, in this simplest of examples, one opcode is being shared by two C statements. This is simply because the optimizer has recognized "why would I bother loading 0x55 into R24 again when I know it already contains that value?"

But it's more complex versions of this that can lead to the "yellow arrow" jumping about. Even just a small modification to the example:
Code:
int main(void) {
   PORTB = 0x55;
   PORTC = 0xAA; // this line added
   PORTD = 0x55;
   while(1);
}

leads to the code being:
Code:
   PORTB = 0x55;
  6c:   95 e5          ldi   r25, 0x55   ; 85
  6e:   98 bb          out   0x18, r25   ; 24
   PORTC = 0xAA;
  70:   8a ea          ldi   r24, 0xAA   ; 170
  72:   85 bb          out   0x15, r24   ; 21
   PORTD = 0x55;
  74:   92 bb          out   0x12, r25   ; 18

The first and third lines of the source now share "ldi r25, 0x55". If you run this code in the simulator (it will build/simulate for a mega16 and other processors) you will find that the yellow arrow starts on the opening brace of main(), but on the next step it goes straight to the PORTC=0xAA line, apparently skipping the PORTB=0x55 line. It then moves to the PORTD=0x55 line and finally disappears into the while(1) at the end, never to be seen again.

If you were just stepping the C this might lead you to believe that the PORTB=0x55 line had never been executed. However, the IO view in the simulator/debugger will show that all the statements were executed.

A very useful technique for debugging optimized C code is to start the debugger/simulator which will show the yellow arrow on the opening brace of main(). Now select "Disassembler" on the View menu. This opens a new window where the yellow arrow is now positioned on the first opcode of:
Code:
+00000036:   E595        LDI       R25,0x55       Load immediate
+00000037:   BB98        OUT       0x18,R25       Out to I/O location
6:           PORTC = 0xAA;
+00000038:   EA8A        LDI       R24,0xAA       Load immediate
+00000039:   BB85        OUT       0x15,R24       Out to I/O location
7:           PORTD = 0x55;
+0000003A:   BB92        OUT       0x12,R25       Out to I/O location

While this window (and not the C source window) has the focus pressing the [Step Into] icon (or pressing F11) will execute just a single AVR opcode, not an entire C statement on each click/press. If you press it once the LDI is executed and the yellow arrow halts on the first OUT instruction. If you keep stepping you will be happy to see that ALL these statements are executed in turn and nothing is really being missed out in the execution of the program at all.

It's just that Studio's job is made a bit tricky when it can no longer just execute a single block of opcodes for each C statement. Hence the "yellow arrow jumps around" effect. If you are puzzled always switch to the mixed C/Asm view and follow the opcodes.

For points (2) and (3) above here is a program that demonstrates the two points:
Code:
#include <avr/io.h>
#include <avr/interrupt.h>

char count;

int main(void) {
   char i;
   
   count = 0;
   TIMSK = (1<<TOIE0); // overflow interrupts
   TCCR0 |= (1<<CS01); // start timer0 no prescale
   sei();
   for (i=0; i<100; i++) {
      // just delay
   }
   while (count < 10) {
      PORTB = 0xAA;
   }
   PORTB = 0x55;
   while(1);
}

ISR(TIMER0_OVF_vect) {
  count++;
}

When built for a mega16 with -Os this shows two of the commonest optimization "gotchas" that beginners may not be aware of. The code generated is:
Code:
0000007c <main>:
char i;

int main(void) {
   char i;
   
   count = 0;
  7c:   10 92 61 00    sts   0x0061, r1
   TIMSK = (1<<TOIE0); // overflow interrupts
  80:   81 e0          ldi   r24, 0x01   ; 1
  82:   89 bf          out   0x39, r24   ; 57
   TCCR0 |= (1<<CS01); // start timer0 no prescale
  84:   83 b7          in   r24, 0x33   ; 51
  86:   82 60          ori   r24, 0x02   ; 2
  88:   83 bf          out   0x33, r24   ; 51
   sei();
  8a:   78 94          sei
   for (i=0; i<100; i++) {
      // just delay
   }
   while (count < 10) {
      PORTB = 0xAA;
  8c:   8a ea          ldi   r24, 0xAA   ; 170
  8e:   88 bb          out   0x18, r24   ; 24
  90:   fe cf          rjmp   .-4         ; 0x8e <main+0x12>

Again you don't have to be an AVR Asm expert and try to understand all of this (though if you can it's a really useful skill to have) but it's hopefully obvious that a lot of the program appears to have "gone missing"!

The for() loop does not seem to have generated any code, there's no sign of code using the value 0x55, and execution appears to be stuck forever in the first while() loop. This is because:

a) The compiler, with optimization, will discard pointless code. After starting the timer there is a "delay" using a count from 0 to 99. That for() loop has generated no code whatsoever (the disassembly shows the source lines but no opcodes generated for them). This is because the delay has no inputs and no outputs so, to the compiler, which is trying to make the code as small and fast as possible, it just seems pointless - so it is discarded.

A simple solution is presented below but, to be honest, if you are using AVR-LibC it makes far more sense not to code your own delay loops and instead use _delay_ms(), _delay_us(), _delay_loop_1() and _delay_loop_2() from <util/delay.h> and <util/delay_basic.h>, which are explained on these two pages (a short usage sketch follows the links):

http://www.nongnu.org/avr-libc/user-man ... delay.html
http://www.nongnu.org/avr-libc/user-man ... basic.html
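
As an illustration of the library approach, here is a minimal sketch. The F_CPU value and the use of PORTB on a mega16 are just assumptions for the example - F_CPU must be set to match your actual clock:
Code:
#define F_CPU 1000000UL        // assumed clock frequency - must match the real hardware
#include <avr/io.h>
#include <util/delay.h>

int main(void) {
   DDRB = 0xFF;                // all of PORTB as output
   while (1) {
      PORTB ^= 0xFF;           // toggle the pins
      _delay_ms(500);          // library delay - survives optimization (in fact it relies on it)
   }
}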

b) The real "gotcha" in this example (and in the use of the optimizer in general) is the use of the variable 'count'. The idea had been that the timer would be started with interrupts enabled, it would interrupt on each overflow, and the ISR would increment 'count' each time - at least it does this bit OK:
Code:
ISR(TIMER0_OVF_vect) {
  92:   1f 92          push   r1
  94:   0f 92          push   r0
  96:   0f b6          in   r0, 0x3f   ; 63
  98:   0f 92          push   r0
  9a:   11 24          eor   r1, r1
  9c:   8f 93          push   r24
  count++;
  9e:   80 91 61 00    lds   r24, 0x0061
  a2:   8f 5f          subi   r24, 0xFF   ; 255
  a4:   80 93 61 00    sts   0x0061, r24
}
  a8:   8f 91          pop   r24
  aa:   0f 90          pop   r0
  ac:   0f be          out   0x3f, r0   ; 63
  ae:   0f 90          pop   r0
  b0:   1f 90          pop   r1
  b2:   18 95          reti

The "problem" is that the while loop in main() was supposed to keep an eye on 'count' and, while it was less than 10, output 0xAA to PORTB. Then, when main() saw that 'count' had reached 10, it should have left the while(count<10) loop, output 0x55 instead, and entered the final infinite while(1) loop. Yet in the generated code there's no sign of anything that would ever output 0x55 or fall into that final "while(1)" loop.

The reason is that, as far as the compiler is concerned when it's compiling main(), the function is entered with count=0 (all globals default to 0) and there is then no way, as far as main() alone is concerned, that count can ever change value. The compiler cannot "see" that the value of 'count' may be changed in the separate ISR() function, so it compiles the program as if it had been written as:
Code:
   sei();
   for (i=0; i<100; i++) {
      // just delay
   }
   while (1) { // count would ALWAYS be less than 10
      PORTB = 0xAA;
   }
   PORTB = 0x55;
   while(1);

In this version execution can never escape that first while() loop, so the PORTB=0x55 and the final while(1) can never be reached. The program therefore ends at:
Code:
  8c:   8a ea          ldi   r24, 0xAA   ; 170
  8e:   88 bb          out   0x18, r24   ; 24
  90:   fe cf          rjmp   .-4         ; 0x8e <main+0x12>

which repeatedly outputs 0xAA to PORTB just as the compiler believes it was asked to do.

If the program is required to behave as originally written, it's possible to tell the compiler that the variables 'i' and 'count' must not be ignored by using the keyword 'volatile', which means "this variable may be accessed elsewhere, so you must always read/write it when told to". With the modification as follows:
Code:
volatile char count;

int main(void) {
   volatile char i;

the code generated becomes:
Code:
   sei();
  94:   78 94          sei
   for (i=0; i<100; i++) {
  96:   19 82          std   Y+1, r1   ; 0x01
  98:   03 c0          rjmp   .+6         ; 0xa0 <main+0x24>
  9a:   89 81          ldd   r24, Y+1   ; 0x01
  9c:   8f 5f          subi   r24, 0xFF   ; 255
  9e:   89 83          std   Y+1, r24   ; 0x01
  a0:   89 81          ldd   r24, Y+1   ; 0x01
  a2:   84 36          cpi   r24, 0x64   ; 100
  a4:   d0 f3          brcs   .-12        ; 0x9a <main+0x1e>
  a6:   02 c0          rjmp   .+4         ; 0xac <main+0x30>
      // just delay
   }
   while (count < 10) {
      PORTB = 0xAA;
  a8:   98 bb          out   0x18, r25   ; 24
  aa:   01 c0          rjmp   .+2         ; 0xae <main+0x32>
  ac:   9a ea          ldi   r25, 0xAA   ; 170
   TCCR0 |= (1<<CS01); // start timer0 no prescale
   sei();
   for (i=0; i<100; i++) {
      // just delay
   }
   while (count < 10) {
  ae:   80 91 60 00    lds   r24, 0x0060
  b2:   8a 30          cpi   r24, 0x0A   ; 10
  b4:   c8 f3          brcs   .-14        ; 0xa8 <main+0x2c>
      PORTB = 0xAA;
   }
   PORTB = 0x55;
  b6:   85 e5          ldi   r24, 0x55   ; 85
  b8:   88 bb          out   0x18, r24   ; 24
  ba:   ff cf          rjmp   .-2         ; 0xba <main+0x3e>

in which the for() loop has generated some delaying code, and the 'count' variable is repeatedly re-read and checked. Because the ISR will eventually raise it to 10, the code then goes on to output the value 0x55 and enter the final "rjmp .-2", which is the final while(1) loop. The complete corrected program is collected just below.
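
Here it is in full - the original demo with only the two 'volatile' qualifiers added:
Code:
#include <avr/io.h>
#include <avr/interrupt.h>

volatile char count;       // shared with the ISR, so it must be volatile

int main(void) {
   volatile char i;        // volatile only so the delay loop is not discarded

   count = 0;
   TIMSK = (1<<TOIE0);     // overflow interrupts
   TCCR0 |= (1<<CS01);     // start timer0
   sei();
   for (i=0; i<100; i++) {
      // just delay
   }
   while (count < 10) {
      PORTB = 0xAA;
   }
   PORTB = 0x55;
   while(1);
}

ISR(TIMER0_OVF_vect) {
   count++;
}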

4) the problem with timed sequences is that Atmel dictates that certain registers must be written twice within 4 cycles. For example a typical sequence to disable JTAG is:
Code:
MCUCSR = (1<<JTD);
MCUCSR = (1<<JTD);

and to set a new value into CLKPR it's typically:
Code:
CLKPR = (1<<CLKPCE);
CLKPR = 0;

The MCUCSR or CLKPR will be written using either OUT or STS depending on where they are located in memory. The requirement is that the two writing instructions happen within 4 cycles.

When this code is built with optimization enabled (-Os in this case) the sequences generated are:
Code:
   MCUCSR = (1<<JTD);
  6c:   80 e8          ldi   r24, 0x80   ; 128
  6e:   84 bf          out   0x34, r24   ; 52
   MCUCSR = (1<<JTD);
  70:   84 bf          out   0x34, r24   ; 52

and
Code:
   CLKPR = (1<<CLKPCE);
  80:   80 e8          ldi   r24, 0x80   ; 128
  82:   80 93 61 00    sts   0x0061, r24
   CLKPR = 0;
  86:   10 92 61 00    sts   0x0061, r1

In both of these the writes (OUT, STS) are so close together that there is no worry about meeting the four cycle requirement.

If the same code is built using -O0 to turn off optimization then the generated code is far more long-winded and it's far more likely that it will not meet the 4 cycle timing requirement:
Code:
  74:   e4 e5          ldi   r30, 0x54   ; 84
  76:   f0 e0          ldi   r31, 0x00   ; 0
  78:   80 e8          ldi   r24, 0x80   ; 128
  7a:   80 83          st   Z, r24
   MCUCSR = (1<<JTD);
  7c:   e4 e5          ldi   r30, 0x54   ; 84
  7e:   f0 e0          ldi   r31, 0x00   ; 0
  80:   80 e8          ldi   r24, 0x80   ; 128
  82:   80 83          st   Z, r24

and
Code:
   CLKPR = (1<<CLKPCE);
  88:   e1 e6          ldi   r30, 0x61   ; 97
  8a:   f0 e0          ldi   r31, 0x00   ; 0
  8c:   80 e8          ldi   r24, 0x80   ; 128
  8e:   80 83          st   Z, r24
   CLKPR = 0;
  90:   e1 e6          ldi   r30, 0x61   ; 97
  92:   f0 e0          ldi   r31, 0x00   ; 0
  94:   10 82          st   Z, r1

Finally, another way to "fix" the counting program would have been to build the entire program using -O0, which would have generated:
Code:
0000007c <main>:
#include <avr/io.h>
#include <avr/interrupt.h>

char count;

int main(void) {
  7c:   df 93          push   r29
  7e:   cf 93          push   r28
  80:   0f 92          push   r0
  82:   cd b7          in   r28, 0x3d   ; 61
  84:   de b7          in   r29, 0x3e   ; 62
   char i;
   
   count = 0;
  86:   10 92 60 00    sts   0x0060, r1
   TIMSK = (1<<TOIE0); // overflow interrupts
  8a:   e9 e5          ldi   r30, 0x59   ; 89
  8c:   f0 e0          ldi   r31, 0x00   ; 0
  8e:   81 e0          ldi   r24, 0x01   ; 1
  90:   80 83          st   Z, r24
   TCCR0 |= (1<<CS01); // start timer0 no prescale
  92:   a3 e5          ldi   r26, 0x53   ; 83
  94:   b0 e0          ldi   r27, 0x00   ; 0
  96:   e3 e5          ldi   r30, 0x53   ; 83
  98:   f0 e0          ldi   r31, 0x00   ; 0
  9a:   80 81          ld   r24, Z
  9c:   82 60          ori   r24, 0x02   ; 2
  9e:   8c 93          st   X, r24
   sei();
  a0:   78 94          sei
   for (i=0; i<100; i++) {
  a2:   19 82          std   Y+1, r1   ; 0x01
  a4:   03 c0          rjmp   .+6         ; 0xac <main+0x30>
  a6:   89 81          ldd   r24, Y+1   ; 0x01
  a8:   8f 5f          subi   r24, 0xFF   ; 255
  aa:   89 83          std   Y+1, r24   ; 0x01
  ac:   89 81          ldd   r24, Y+1   ; 0x01
  ae:   84 36          cpi   r24, 0x64   ; 100
  b0:   d0 f3          brcs   .-12        ; 0xa6 <main+0x2a>
  b2:   04 c0          rjmp   .+8         ; 0xbc <main+0x40>
      // just delay
   }
   while (count < 10) {
      PORTB = 0xAA;
  b4:   e8 e3          ldi   r30, 0x38   ; 56
  b6:   f0 e0          ldi   r31, 0x00   ; 0
  b8:   8a ea          ldi   r24, 0xAA   ; 170
  ba:   80 83          st   Z, r24
   TCCR0 |= (1<<CS01); // start timer0 no prescale
   sei();
   for (i=0; i<100; i++) {
      // just delay
   }
   while (count < 10) {
  bc:   80 91 60 00    lds   r24, 0x0060
  c0:   8a 30          cpi   r24, 0x0A   ; 10
  c2:   c0 f3          brcs   .-16        ; 0xb4 <main+0x38>
      PORTB = 0xAA;
   }
   PORTB = 0x55;
  c4:   e8 e3          ldi   r30, 0x38   ; 56
  c6:   f0 e0          ldi   r31, 0x00   ; 0
  c8:   85 e5          ldi   r24, 0x55   ; 85
  ca:   80 83          st   Z, r24
  cc:   ff cf          rjmp   .-2         ; 0xcc <main+0x50>

000000ce <__vector_9>:
   while(1);
}

ISR(TIMER0_OVF_vect) {
  ce:   1f 92          push   r1
  d0:   0f 92          push   r0
  d2:   0f b6          in   r0, 0x3f   ; 63
  d4:   0f 92          push   r0
  d6:   11 24          eor   r1, r1
  d8:   8f 93          push   r24
  da:   df 93          push   r29
  dc:   cf 93          push   r28
  de:   cd b7          in   r28, 0x3d   ; 61
  e0:   de b7          in   r29, 0x3e   ; 62
  count++;
  e2:   80 91 60 00    lds   r24, 0x0060
  e6:   8f 5f          subi   r24, 0xFF   ; 255
  e8:   80 93 60 00    sts   0x0060, r24
}
  ec:   cf 91          pop   r28
  ee:   df 91          pop   r29
  f0:   8f 91          pop   r24
  f2:   0f 90          pop   r0
  f4:   0f be          out   0x3f, r0   ; 63
  f6:   0f 90          pop   r0
  f8:   1f 90          pop   r1
  fa:   18 95          reti

I'll leave you to contemplate whether you really want your C compiler to be generating such long winded, slow and bloated code just so you don't have to think about using 'volatile' or so that the yellow arrow doesn't "jump about". I know what I'd want from a C compiler!

5) The watch window in AVR Studio is (usually) very simplistic. Each time the code stops executing, after either a single step or hitting a breakpoint, Studio redraws the contents of the watch window. It knows which SRAM locations the variables are located at, and it simply reads what is in those locations to display each variable's current value.

This is all well and good as long as the code updates the SRAM location of a variable every time it is written (as happens with -O0 or 'volatile'). But one of the functions of the optimizer is to recognise when it can simply hold a local copy of the variable in a machine register and not bother to update the copy in SRAM. Also, sometimes a variable may never actually exist at all - in which case there's never any chance of watching it.

Here is a simple program to demonstrate some of this:
Code:
#include <avr/io.h>

int main(void) {
   uint8_t a, b, c;

   a = 5;
   b = 7;
   c = a * b;
   PORTB = c;   
   while(1);
}

When this is built without optimization (-O0) the generated code is:
Code:
   uint8_t a, b, c;

   a = 5;
  78:   85 e0          ldi   r24, 0x05   ; 5
  7a:   8b 83          std   Y+3, r24   ; 0x03
   b = 7;
  7c:   87 e0          ldi   r24, 0x07   ; 7
  7e:   8a 83          std   Y+2, r24   ; 0x02
   c = a * b;
  80:   9b 81          ldd   r25, Y+3   ; 0x03
  82:   8a 81          ldd   r24, Y+2   ; 0x02
  84:   98 9f          mul   r25, r24
  86:   80 2d          mov   r24, r0
  88:   11 24          eor   r1, r1
  8a:   89 83          std   Y+1, r24   ; 0x01
   PORTB = c;   
  8c:   e8 e3          ldi   r30, 0x38   ; 56
  8e:   f0 e0          ldi   r31, 0x00   ; 0
  90:   89 81          ldd   r24, Y+1   ; 0x01
  92:   80 83          st   Z, r24

The compiler creates the variables on the stack and uses the Y register to access them. 'a' is at RAM location 'Y+3', 'b' is at 'Y+2' and 'c' is at 'Y+1'. They are in RAM and they are updated each time they are written to by an "STD Y+n, Rn". This means that the Studio "watcher" has no problem showing you their current values as you step through this code.

Now consider the same program built with -Os, but first consider what the intention of the entire program is. Its final output is to write a value to PORTB. The input values are 5 and 7, and sitting in your armchair you can already tell that 5*7 is 35 (or 0x23). So now look at what the optimizing compiler actually generates:
Code:
0000006c <main>:
   uint8_t a, b, c;

   a = 5;
   b = 7;
   c = a * b;
   PORTB = c;   
  6c:   83 e2          ldi   r24, 0x23   ; 35
  6e:   88 bb          out   0x18, r24   ; 24

Well, that certainly does what the programmer intended and outputs 35/0x23 to the PORTB I/O location (0x18 for a mega16). But where are 'a', 'b' and 'c'? What RAM locations are they in? The answer is that they never existed, so no SRAM was ever set aside to hold their values. Why should the compiler waste time and space on that when all the program really does is output 35 to PORTB? That is exactly what the generated code does. Notionally a/b/c existed during compilation, but the compiler could see that their only use was to be assigned 5 and 7 and then the result of multiplying these, and that they are never used for anything else in the program. So it might as well do the multiplication (5*7=35) at compile time rather than leaving it to be done by the AVR at runtime, as was seen in the non-optimized code. An 80x86 processor running at several gigahertz is much quicker at multiplying 5 by 7 while it's compiling your AVR program than the AVR is at doing it. The final result (35) is known at compile time and there's no chance of it changing while the AVR is running, so why bother leaving the maths to the AVR?

But the bottom line is that you won't be able to watch a, b or c in the optimized version of that program; any attempt to do so will just show "Location not valid". Note that this is even true of 'c'. You might say that in the "LDI r24, 0x23" R24 was effectively the 'c' variable ('c' holds 35 and so does R24), but (unlike some other debuggers) Studio is not quite smart enough to make the association between the AVR's register R24 and 'c'. It (usually) needs 'c' to be an actual location in SRAM for the debugger watch window to be able to "see" it.

While we're here I'll just mention another example of point (3) - the pointless for() loop using this example:
Code:
#include <avr/io.h>

int main(void) {
   uint8_t a;

   for (a=0; a<10; a++) {
   }
   PORTB = a;   
   while(1);
}

So you might think this will create a variable 'a' in RAM and have it count from 0 to 10 and then put 10 into PORTB. You'd be right about the very last part of that but not about 'a' being created in RAM or there being a counting loop:
Code:
0000006c <main>:
int main(void) {
   uint8_t a;

   for (a=0; a<10; a++) {
   }
   PORTB = a;   
  6c:   8a e0          ldi   r24, 0x0A   ; 10
  6e:   88 bb          out   0x18, r24   ; 24

All this program really does is output 10 to PORTB so that's all the compiler has generated.

Now at this stage you may be wondering (a) why didn't it discard that bit too, and (b) why do I keep using PORTB/C/D in these examples? Well, it comes down to this: what you know as PORTB is what the C compiler really sees as:
Code:
(*(volatile uint8_t *)((0x18) + 0x20))

Don't worry about how complex this looks except to note our old friend 'volatile' in there. While PORTB is really just the label for IO location 0x18 (SRAM location 0x38 - hence the +0x20 above) this construct tells the compiler to treat that location as if it were volatile - so code must always be generated to write to that location.

Again this all comes down to the core function of the optimizer - it will discard code that does not do anything useful. All computer programs have inputs, outputs or both. If they have neither then they might as well not exist. By using 'volatile' you are saying "this thing is the "output" of this program so you must generate code to write to the output". If the program started by reading PINA as an input value then guess what, PINA is defined as being 'volatile' too so the compiler MUST go and read it before using the value it finds there.
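
To illustrate, here is a hypothetical stand-alone version of the PORTB-style construct shown above. MY_PORTB is just an invented name for this sketch (the real definition comes from <avr/io.h>), and the 0x18/0x38 addresses assume a mega16:
Code:
#include <stdint.h>

// Invented macro for illustration only: the same construct as the real PORTB,
// an I/O register at I/O address 0x18 (SRAM address 0x38) treated as volatile.
#define MY_PORTB (*(volatile uint8_t *)((0x18) + 0x20))

int main(void) {
   uint8_t scratch = 5 * 7;   // ordinary variable - free to be optimized away entirely
   MY_PORTB = scratch;        // volatile access - a write must always be generated
   while (1);
}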

6) The delay functions in <util/delay.h> have already been mentioned (as a better alternative to trying to use empty for() loops to create delays). They do have a "gotcha" of their own though. The functions _delay_ms(N) and _delay_us(N) are defined to take their input 'N' as a floating point value. So if you say _delay_ms(2) it's really _delay_ms(2.0), and the code in _delay_ms() then does a floating point calculation using this "2.0" to work out how many times _delay_loop_2() must be called to achieve the number of milliseconds requested.

If you use _delay_ms(2) and build with optimization switched on then, just like the c=a*b (c=35) example above, the 80x86 processor in your PC (which is better at maths than your AVR!) will do all the necessary calculation while compiling, because 2 (or rather 2.0) is a constant known at compile time. If however you use the function with -O0 (optimizer off) then you are saying "don't precalculate this during compilation but generate code so that the AVR will calculate it at run time". Unfortunately the floating point library code the AVR needs to do this is about 1K in size (and if you aren't linking libm.a a far worse version of about 3K is used). So your AVR program "bloats" by 1K..3K if you use the <util/delay.h> functions and don't have optimization switched on.

This also explains why, if instead of using _delay_ms(2) you use _delay_ms(some_variable_that_changes) then even with optimization switched on the program drags in 1K..3K of floating point library code. This is because even with optimization the 80x86 can no longer do the sums at compile time and, instead, the AVR must do them as it runs.
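
A common workaround when a run-time variable delay really is needed is to wrap the constant-argument call in your own loop, so the floating point maths is still done at compile time. This is only a sketch - variable_delay_ms() is an invented helper, the F_CPU value is an assumption, and the timing is approximate because the loop itself adds a few cycles per millisecond:
Code:
#define F_CPU 1000000UL        // assumed clock frequency
#include <stdint.h>
#include <util/delay.h>

static void variable_delay_ms(uint16_t ms) {
   while (ms--) {
      _delay_ms(1);            // constant argument: still folded at compile time
   }
}

int main(void) {
   uint16_t pause = 250;       // could be any value computed at run time
   variable_delay_ms(pause);
   while (1);
}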

.... more to come in the next edit ...

Last edited by clawson on Aug 28, 2010 - 02:48 PM; edited 1 time in total

Phil.Barlow
PostPosted: Aug 27, 2010 - 07:45 PM

Oh how I wish you'd written (and I'd subsequently read) this months ago... I spent hours trying to figure out what was wrong with my ISRs when I started coding in C.

As for the optimizing, is there any way, short of writing out the asm, of getting gcc to leave certain blocks of code alone?

Come to that am I right in thinking that asm is left undisturbed?

Thank you for taking the time to write this up.
 
clawson
PostPosted: Aug 27, 2010 - 08:23 PM

Quote:

As for the optimizing, is there any way, short of writing out the asm, of getting gcc to leave certain blocks of code alone?

Come to that am I right in thinking that asm is left undisturbed?

Phil,

Depends what you mean. If you are talking about inline Asm (basically anything that the C compiler gets its hands on) there are no absolute guarantees about ordering. If, however, the Asm code is in separate .S files only presented to the assembler, then you have no worries about it being "mangled". Obviously calling from a .c into a .S involves a CALL and a RET - this is the price you pay for getting the Asm advantage.

You may want to search out a thread in the GCC forum from the last month where the use of the word volatile in the context of "asm volatile ("...")" was explored at length. The bottom line was that volatile in this context does not mean what the above article might have you believe - there are no guarantees about code ordering. This even goes as far as the library routines sei() and cli(), and whether you can guarantee the actual SEI and CLI opcodes appearing at the exact point you positioned them in a .c file.

But this is really the subject for a different tutorial, as the use of volatile in that context (a bit like the multiple meanings of the 'static' keyword) is different from its use to guarantee read/write access to variables.

Cliff

PS I've had the following signature graphic for a long time - there's a reason why FAQ#1 is where it appears.

Phil.Barlow
PostPosted: Aug 27, 2010 - 08:41 PM

Quote:
bit like the multiple meanings of the 'static' keyword
Indeed, I'm still coming to grips with this one.

As for the "asm volatile" stuff, I've seen it around and sort of guessed it was something of the sort but not yet found myself in a situation where I needed to find out more... The thought just popped into my head while reading your tutorial. Shall read the thread now:)

Quote:
Asm code is in separate .S
and this I've not yet seen, shall have to file this nugget away for the right occasion.

Quote:
I've had the following signature graphic for a long time
Atmel should look into putting this on their packaging, especially 1, 3 & 5.

Thanks for the response Cliff.

Phil
 
clawson
PostPosted: Aug 27, 2010 - 08:54 PM

Quote:

and this I've not yet seen, shall have to file this nugget away for the right occasion.

Start here:

http://www.nongnu.org/avr-libc/user-man ... mdemo.html

skeeve
PostPosted: Aug 27, 2010 - 10:42 PM

clawson wrote:
You may want to search out a thread in the GCC forum from the last month where the use of the word volatile in the context of "asm volatile ("...")" was explored at length. The bottom line was that volatile in this context does not mean what the above article might have you believe it would mean - there's no guarantees about code ordering. This even goes as far as the library routines sei() and cli() and whether you can guarantee the actual SEI and CLI opcodes appearing at the exact point you may have positioned them in a .c file.
In a way, asm volatile is a complement to the other volatile.
One says something might be done to me behind your back.
The other says I might do something behind your back.
Quote:
But this is really the subject for a different tutorial as the use of volatile in that context (a bit like the multiple meanings of the 'static' keyword) is different from it's use to guarantee read/write access to variables.

_________________
Michael Hennebry
"Religious obligations are absolute." -- Relg
 
js
PostPosted: Aug 27, 2010 - 11:25 PM

It's volatile not volaltile.

_________________
John Samperi
Ampertronics Pty. Ltd.
www.ampertronics.com.au
* Electronic Design * Custom Products * Contract Assembly
 
clawson
PostPosted: Aug 28, 2010 - 10:44 AM

John,

The irony of that is that it's my 'l' key that normally doesn't work right, yet this time it's done overtime!

Cliff

(I'll correct it when I make the next major edit above - I've already thought of (5) and (6) to add, and I got some useful feedback from Jan)

skeeve
PostPosted: Aug 30, 2010 - 04:03 PM

clawson wrote:
You might say that in the "LDI r24, 0x23" that R24 was effectively the 'c' variable ('c' holds 35 and so does R24) but (unlike some other debuggers) Studio is not quite this smart to make the association between the AVR's register number 24 and 'c'. It (usually) needs 'c' to be an actual location in SRAM for the debugger watch window to be able to "see" it.
From this I infer that at least some compilers emit
the kind of debugging information necessary to follow
a variable when it is not stored in main memory.
Do they also emit the information necessary
when the variable changes locations?

In the case at hand, R24 shares an address space with SRAM.
Is the problem that c is local and memory location 24 is global?

_________________
Michael Hennebry
"Religious obligations are absolute." -- Relg
 
clawson
PostPosted: Aug 30, 2010 - 04:31 PM

Quote:

From this I infer that at least some compilers emit
the kind of debugging information necessary to follow
a variable when it is not stored in main memory.
Do they also emit the information necessary
when the variable changes locations?

In the case at hand, R24 shares an address space with SRAM.
Is the problem that c is local and memory location 24 is global?

Michael, I think you'd have to ask the AVR Studio developers about that. You and I know that R24 is SRAM address 0x0018 so it should be possible to watch it but I can only assume that either the ELF doesn't contain the info to say 'c' is just R24 or Atmel's programmers don't parse this out of the ELF data and make the association.

Phil.Barlow
PostPosted: Aug 31, 2010 - 08:30 AM

Quote:
a thread in the GCC forum from the last month where the use of the word volatile in the context of "asm volatile ("...")" was explored at length
Should anyone be interested, I think this is the thread Cliff was talking about. http://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&t=94571 Well worth a read:)
 
demcanulty
PostPosted: Sep 23, 2010 - 08:00 AM

Thanks for this, great stuff!
 
Seppo
PostPosted: Jan 05, 2011 - 03:39 PM

Also, thank you from here too!... I ran into the jumping arrow/weird debug/no variable watch problems exactly as described above. It makes complete sense after reading your excellent tutorial on the topic.
Thank you!
 
Perfesser
PostPosted: Apr 27, 2011 - 01:15 AM

Hi Cliff.
I just signed up as a member, and I did so simply because I admired your posting on this topic.
It is very eloquent, yet concise.
Well done !
When I grow up, I want to be like you.
Thanks,
-Karl
 
clawson
PostPosted: Apr 27, 2011 - 09:10 AM

Quote:

When I grow up, I want to be like you.

Me too - but the chances of me ever growing up at this stage seem very remote.

Cliff (48years 18days)

Perfesser
PostPosted: Apr 27, 2011 - 09:21 PM

I wonder what my chances are...

Karl = 56Y 269D
 
Civic_Power
PostPosted: Apr 29, 2011 - 03:24 PM

Grow old?

I was not born... I was downloaded!!!
 
bareligion
PostPosted: Jun 19, 2011 - 03:32 AM

Thanks for taking your time to write this tutorial! It was very helpful!
 
mvadu
PostPosted: Aug 14, 2011 - 02:45 AM

Thanks Clawson, this is really helpful. Without this knowledge, I was going nuts trying to understand the debugger results.
 
valleyman
PostPosted: Aug 14, 2011 - 06:06 AM

clawson wrote:
This also explains why, if instead of using _delay_ms(2) you use _delay_ms(some_variable_that_changes) then even with optimization switched on the program drags in 1K..3K of floating point library code.


Cliff, I remember you mentioned in another post that _delay_ms() is to be used as a base delay, fixed at compile time, in order to build variable delays. If I try to pass a variable to this function, the compiler complains

AVR Studio 5 wrote:
Error 2 __builtin_avr_delay_cycles expects an integer constant. util/delay.h 152 28


Great tutorial. Because AS5 defaults to -O1, I had been wondering why the sample code that uses an empty loop for delay declares volatile. I'll switch that to -Os now.

Other than the "deterministic empty loop" that an optimiser drops, is there a rule of thumb to decide when to use volatile to defeat the optimiser and when to encourage optimisation?
 
clawson
PostPosted: Aug 14, 2011 - 11:32 AM

Quote:

the compiler complains

Yes, you are using the later version of delay.h which has been fixed to emit an error when it is used wrong - this is a distinct improvement as it forces users to actually bother reading the manual rather than simply guessing at how the function probably works.

As for when to optimise: always and often - why would you want a compiler to generate sub-optimal code?

(debugging is the one exception but you don't deploy the debug code - you deploy the release code and you want it as efficient as possible).

valleyman
PostPosted: Aug 15, 2011 - 02:16 AM

clawson wrote:
(debugging is the one exception but you don't deploy the debug code - you deploy the release code and you want it as efficient as possible).


(BTW, AS5 only defaults to -O1 in Debug configuration; in "Release" configuration, the default is -Os. Debug, in turn, is the default configuration for new project template as well as in all sample applications I downloaded from Atmel.) When I wrote my first program from scratch (using a blank template), I didn't use volatile. Tests worked as expected. Had I changed to Release configuration, I would be unpleasantly surprised to see all my elaborate arithmetic only resulted in a steady blink. (After reading this, I purposely removed volatile and switched to Release. Yes, all variations are gone.)

The tutorial seems to suggest that I shouldn't be throwing volatile at all variables, or the compiler will not optimise as much as it should. How do I know when I should force volatile even though I don't expect another program or peripheral to alter a variable? I can see that an interrupt handler should be considered another program. But there's another example in which a non-empty loop also needs a volatile declaration to maintain the "expected", or designed, behaviour. Should every "loop variable" be volatile?
 
clawson
PostPosted: Aug 15, 2011 - 09:22 AM

Atmel, you have to realise, are like kids with a new toy who haven't quite worked out how it actually works. Any regular user of avr-gcc knows that the -O0 they use for "Debug" is not acceptable in any circumstance. Some idiot at Atmel thought this was the right choice for a "Debug" build. It is not. Its only purpose is for the compiler writers to check the unoptimized code that is initially generated. While it does make programs that can be easily debugged, because of the straight one-to-many relation between C source and generated Asm, it is utterly and totally pointless to debug that code as it in no way represents the code you are finally going to be running. Try adjusting the watchdog, changing the CLKPR register or setting the JTD bit in a program built with -O0 and it will never work. At the very least Atmel should have used -O1 for "Debug", but even then it's going to behave differently to the final -Os/-O3 program. What Atmel should do is improve the debugger so it can more easily track locals that are cached into registers, and encourage users to debug -Os code.

As I say, the best solution is to use -Os and simply not use 'volatile' just for the sake of debugging. Instead, debug in the mixed C+Asm view (by which you also quickly learn AVR assembler) and work out which machine registers the locals are cached into, and watch those rather than using the "Watch window". If you do have a problem with this then, only while debugging a localised section of the code, temporarily make any locals you need to watch there 'volatile' - see the sketch below. When you are happy that the code section works, remove the volatile again.
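
As a minimal sketch of that last suggestion (reusing the a*b example from the tutorial above), the 'volatile' here exists only so the value lands in SRAM where the watch window can see it:
Code:
#include <avr/io.h>

int main(void) {
   // 'volatile' added only while debugging this section so that 'c' is kept in
   // SRAM and shows up in the watch window; remove it again once the code works.
   volatile uint8_t c;

   c = 5 * 7;
   PORTB = c;
   while (1);
}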

None of this has anything to do with the FAQ#1 use of volatile: any variable that is accessed by two threads of execution (for example main() and an ISR) must ALWAYS be volatile, whether you want to watch it in a debugger or not.

But keep this thought in mind - every time you make a variable volatile you make your program bigger and slower. So do it with care and don't hand out the 'volatile's like they were sweeties.

ashokok
PostPosted: Aug 16, 2011 - 06:36 AM

We are using the -Os -mcall-prologues optimization.
The avr-libc user manual says that this is the most universal "best" optimization level. Can you explain this optimization, and what considerations need to be taken into account when using it?

P.Ashok Kumar
 
clawson
PostPosted: Aug 16, 2011 - 09:06 AM

Quote:

can you explain about this optimization

I'd suggest this is the ultimate source:

http://gcc.gnu.org/onlinedocs/gcc-4.3.6 ... logues-953
http://www.nongnu.org/avr-libc/user-man ... tools.html

abcminiuser
PostPosted: Aug 16, 2011 - 12:04 PM

Quote:

We are using the -Os -mcall-prologues optimization.
The avr-libc user manual says that this is the most universal "best" optimization level. Can you explain this optimization, and what considerations need to be taken into account when using it?



Normally each ISR has a unique "prologue" and "epilogue" - a few housekeeping instructions to push and pop various portions of the AVR's register set which the ISR needs for its own use, to prevent the existing values from being lost. This is great for low-latency ISRs, as only the registers used are saved and restored.

If you have a large set of ISRs each with a lengthy prologue and epilogue, there can be a lot of flash memory wasted storing the individual ISR prologue/epilogue code. To combat this, you can use the -mcall-prologues switch. This will make every ISR call a single unified ISR prologue and epilogue routine, which will in turn save and restore the entire AVR register set regardless of which registers are actually required in the ISR.

The downside is that every ISR now has a lot more latency, due to the extra CALL/RET instruction pairs to jump to the unified prologue/epilogue functions, and because you now have to wait for all the registers to be saved and restored regardless of which are used. The upside is a space savings if the space taken up by the one unified prologue/epilogue function pair is less than the individual ISR prologue/epilogue sequences.


TLDR; It increases ISR latency, but will reduce overall flash memory consumption if you have a lot of complex ISR handlers in your application.
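
As a concrete example, the flags simply go on the avr-gcc command line (or in your makefile's CFLAGS); the device and file names below are only placeholders:
Code:
avr-gcc -mmcu=atmega16 -Os -mcall-prologues -o main.elf main.c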

- Dean

_________________
Make Atmel Studio better with my free extensions. Open source and feedback welcome!
 
ashokok
PostPosted: Aug 16, 2011 - 02:31 PM

Thanks Cliff and Dean!
 
thinki_cao
PostPosted: Jul 12, 2012 - 04:15 AM

This is a great help! Thanks.
 