understanding lss


Hi All,

I was trying to grasp the concept of variable scope, and I was looking at the .lss file.

So I have this:

#include 

uint8_t fcn(void)
{
	uint8_t foobar = 0;
	
	foobar++;
	
	return foobar;
}

int main(void)
{
    while(1)
    {
		uint8_t foo = 0;
		static uint8_t bar = 0;
		
		bar++;
		foo = fcn();
        //TODO:: Please write your application code 
    }
}

which generates this.

#include 

uint8_t fcn(void)
{
  84:	cf 93       	push	r28
  86:	df 93       	push	r29
  88:	1f 92       	push	r1
  8a:	cd b7       	in	r28, 0x3d	; 61
  8c:	de b7       	in	r29, 0x3e	; 62
	uint8_t foobar = 0;
  8e:	19 82       	std	Y+1, r1	; 0x01
	
	foobar++;
  90:	89 81       	ldd	r24, Y+1	; 0x01
  92:	8f 5f       	subi	r24, 0xFF	; 255
  94:	89 83       	std	Y+1, r24	; 0x01
	
	return foobar;
  96:	89 81       	ldd	r24, Y+1	; 0x01
}
  98:	0f 90       	pop	r0
  9a:	df 91       	pop	r29
  9c:	cf 91       	pop	r28
  9e:	08 95       	ret

000000a0 :
int main(void)
{
  a0:	cf 93       	push	r28
  a2:	df 93       	push	r29
  a4:	1f 92       	push	r1
  a6:	cd b7       	in	r28, 0x3d	; 61
  a8:	de b7       	in	r29, 0x3e	; 62
    while(1)
    {
		uint8_t foo = 0;
  aa:	19 82       	std	Y+1, r1	; 0x01
		static uint8_t bar = 0;
		
		bar++;
  ac:	80 91 00 01 	lds	r24, 0x0100
  b0:	8f 5f       	subi	r24, 0xFF	; 255
  b2:	80 93 00 01 	sts	0x0100, r24
		foo = fcn();
  b6:	0e 94 42 00 	call	0x84	; 0x84
  ba:	89 83       	std	Y+1, r24	; 0x01
        //TODO:: Please write your application code 
    }
  bc:	f6 cf       	rjmp	.-20     	; 0xaa

I have a few questions about this.
1. why subi? why not inc?

2. I know that every time I enter a function, the non-static variables are "re-created" and they vanish when I exit the function. But looking at the lss, is this also true for loops? foo was re-initialized to zero.

3. The static bar was given a fixed address e.g. 0x100. Is this in RAM?

4. What about the non-static variables? They are referenced as Y+1. Where is this? Instruction set says that Y=R29:R28. Does this mean that foo is in 0x3E3D+1? why there?

5. Also, r1 is always zero?

I have a few questions about this.
1. why subi? why not inc?

Logically, SUBI with -1 (0xFF) is the same as INC. GCC always seems to use the SUBI approach. I actually had a look at the instruction encoding, as I imagined that the opcode would be the same... but it is not.
http://www.atmel.com/Images/doc0...

2. I know that every time I enter a function, the non-static variables are "re-created" and they vanish when I exit the function. But looking at the lss, is this also true for loops? foo was re-initialized to zero.

The code says that foo is set to 0 at the top of each loop iteration.
I am a little surprised that foo was not optimised out, as it is never used. What optimisation level were you using?

3. The static bar was given a fixed address e.g. 0x100. Is this in RAM?

Yes. All variables are in RAM unless you explicitly direct the compiler/locator to put them in flash or EEPROM.

4. What about the non-static variables? They are referenced as Y+1. Where is this? Instruction set says that Y=R29:R28. Does this mean that foo is in 0x3E3D+1? why there?

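I am not sure why there are three pushes at the beginning of main, but one of them is just making space on the stack for "foo". The two IN instructions load SP into the Y register, and Y+1 points to foo on the stack. Note that 0x3D/0x3E are the I/O addresses of the SPL/SPH registers, not a RAM address: Y holds the value read from them, so foo is not at "0x3E3D+1". To find its address you need to know the value of SP.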
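A toy C model of main's prologue may make the Y+1 addressing easier to see. Everything here is an assumption for illustration: the names avr_model, push() and main_prologue() are hypothetical, RAMEND is taken as 0x08FF (as on an ATmega328P), and the return address pushed by the CALL into main is ignored. It only relies on the AVR rule that PUSH stores at SP and then decrements SP.

```c
#include <stdint.h>

#define RAMEND 0x08FFu            /* assumed top of SRAM (ATmega328P-like) */

typedef struct {
    uint8_t  ram[RAMEND + 1];     /* the data address space */
    uint16_t sp;                  /* the value held in SPH:SPL */
} avr_model;

/* PUSH: store at the current SP, then move SP down (post-decrement). */
static void push(avr_model *m, uint8_t value)
{
    m->ram[m->sp] = value;
    m->sp--;
}

/* Imitates: push r28 ; push r29 ; push r1 ; in r28,0x3d ; in r29,0x3e
 * Returns Y. The byte reserved by "push r1" (r1 is always 0) is at Y+1,
 * which is where the local `foo` lives. */
static uint16_t main_prologue(avr_model *m, uint8_t r28, uint8_t r29)
{
    push(m, r28);                 /* save the caller's Y, low byte */
    push(m, r29);                 /* save the caller's Y, high byte */
    push(m, 0);                   /* push r1: reserves 1 byte for foo */
    return m->sp;                 /* Y = SP, so foo is at Y+1 */
}
```

Because SP always points at the next free byte, Y+1 is the last byte pushed, and Y+2/Y+3 hold the saved copy of the caller's Y.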

5. Also, r1 is always zero?

Yes.

regards
Greg


gregd99 wrote:

I have a few questions about this.
1. why subi? why not inc?

logically SUB -1 is the same as INC. GCC always seems to use the SUBI approach. I actually had a look at the instruction encoding as I imagined that the opcode would be the same.... but it is not.
http://www.atmel.com/Images/doc0...

INC doesn't set the carry flag. That is irrelevant in this case, but not in others (variables wider than 8 bits). Why should the compiler use a different instruction in the 8-bit case when that instruction has no advantage (both are 1 word/1 clock)?
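To see why the carry matters, here is a small C model of a 16-bit increment done as "subi lo,0xFF ; sbci hi,0xFF", one pattern that follows from the SUBI approach (avr-gcc may pick other sequences, such as ADIW, depending on the registers). The functions subi(), sbci() and inc16() are hypothetical and model only the C flag. Subtracting 0xFF is adding 1 modulo 256, and the borrow out of the low byte tells the high byte whether to change; INC leaves C untouched, so it cannot be chained this way.

```c
#include <stdbool.h>
#include <stdint.h>

/* SUBI Rd,K: Rd = Rd - K; C is set when K > Rd (a borrow). */
static uint8_t subi(uint8_t rd, uint8_t k, bool *c)
{
    *c = k > rd;
    return (uint8_t)(rd - k);
}

/* SBCI Rd,K: Rd = Rd - K - C; C is set when K + C > Rd. */
static uint8_t sbci(uint8_t rd, uint8_t k, bool *c)
{
    unsigned sub = (unsigned)k + (*c ? 1u : 0u);
    *c = sub > rd;
    return (uint8_t)(rd - sub);
}

/* x + 1 on a register pair, using only subtract-immediate instructions. */
static uint16_t inc16(uint16_t x)
{
    bool c;
    uint8_t lo = subi((uint8_t)x, 0xFF, &c);         /* lo + 1, borrow in C */
    uint8_t hi = sbci((uint8_t)(x >> 8), 0xFF, &c);  /* hi + 1 - C */
    return (uint16_t)(hi << 8 | lo);
}
```

For example, inc16(0x00FF) gives 0x0100: the low byte wraps to 0 with C clear, so the high byte is incremented.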

Stefan Ernst


Quote:
INC doesn't set the carry flag.
I learn something every day - thanks. I should have read the instruction set manual more carefully.

regards
Greg


Quote:

all variables are in RAM

Suggest you read this:

http://www.nongnu.org/avr-libc/u...

Uninitialised statics and globals will be in .bss (block started by symbol), while those initialised to a non-zero value will be in .data.

On most modern AVRs, SRAM starts at 0x100. The fact that your static was located at 0x100 just serves to show that, on this occasion, .data was empty.
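The arithmetic behind "bar at 0x100 means .data was empty" can be sketched as follows. The ram_layout type and layout() helper are hypothetical; they assume .data starts at the beginning of SRAM and .bss follows it directly, and they ignore smaller sections such as .noinit.

```c
#include <stdint.h>

typedef struct {
    uint16_t data_start;   /* first byte of .data */
    uint16_t bss_start;    /* .bss follows .data immediately */
    uint16_t heap_start;   /* the heap (if used) begins after .bss */
} ram_layout;

/* Section sizes are fixed at link time, so the boundaries are too. */
static ram_layout layout(uint16_t ram_start, uint16_t data_size, uint16_t bss_size)
{
    ram_layout l;
    l.data_start = ram_start;
    l.bss_start  = (uint16_t)(ram_start + data_size);
    l.heap_start = (uint16_t)(l.bss_start + bss_size);
    return l;
}
```

With an empty .data and one 1-byte static (bar is initialised to 0, so it goes in .bss), layout(0x0100, 0, 1) puts .bss, and therefore bar, at 0x0100. Four bytes of .data would push it to 0x0104.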

Y is known as the frame pointer. Inside a function, the local variables will be on the stack from Y+1 upwards. I described something about it in this thread:

https://www.avrfreaks.net/index.p...

(explanation a few posts on from this).

Last Edited: Tue. Jun 25, 2013 - 08:20 AM

The generated code is quite long because it is not optimized.

You'll get a better impression of the generated code with -O i.e. with optimizations turned on.

avrfreaks does not support Opera. Profile inactive.


Quote:

with optimizations turned on.

So this:
00000066 :
	uint8_t foobar = 0;
	
	foobar++;
	
	return foobar;
}
  66:	81 e0       	ldi	r24, 0x01	; 1
  68:	08 95       	ret

0000006a :
int main(void)
{
  6a:	ff cf       	rjmp	.-2      	; 0x6a

or to put it another way:

	.section	.text.fcn,"ax",@progbits
.global	fcn
	.type	fcn, @function
fcn:
//==> {
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
//==> }
	ldi r24,lo8(1)
	ret
	.size	fcn, .-fcn

	.section	.text.startup.main,"ax",@progbits
.global	main
	.type	main, @function
main:
//==> {
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
.L3:
	rjmp .L3
	.size	main, .-main


midea31 wrote:

2. I know that every time I enter a function, the non-static variables are "re-created" and they vanish when I exit the function. But looking at the lss, is this also true for loops? foo was re-initialized to zero.

They are block scope variables.
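The original loop can be turned into a testable C sketch that shows the two kinds of block-scope variable side by side. run_loop() is a hypothetical wrapper that runs the while(1) body a fixed number of times; fcn() is the function from the first post.

```c
#include <stdint.h>

/* As in the original post: foobar is automatic, so fcn() always returns 1. */
static uint8_t fcn(void)
{
    uint8_t foobar = 0;
    foobar++;
    return foobar;
}

/* foo: block scope, automatic storage. It is created and set to 0 on every pass.
 * bar: block scope, static storage. It is initialised once and keeps its value
 * across passes and across calls to run_loop(). */
static void run_loop(unsigned passes, uint8_t *foo_out, uint8_t *bar_out)
{
    while (passes--) {
        uint8_t foo = 0;
        static uint8_t bar = 0;

        bar++;
        foo = fcn();
        *foo_out = foo;
        *bar_out = bar;
    }
}
```

After three passes foo is still 1 while bar is 3; two more passes leave foo at 1 and bar at 5.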


Hmm... thanks for the link to the avr-libc manual.

Please correct me if I'm wrong:

> globals and statics (and volatile?) have a set space in RAM (bss?) and do not get reassigned to any other variables

> locals are pushed to the stack while still in use, and popped out when they expire

> but what about the registers? aren't they faster to access? shouldn't they be used first before assigning variables to RAM? or does the compiler know the frequently used variables and assign them to the registers?


Volatile is a type qualifier, not a storage specifier: it has no bearing on where the variable is allocated, only on how the compiler can optimize its use.

Your original sample was obviously compiled without optimization. It is a documented fact that GCC intentionally produces very verbose code in that case, typically loading and storing every value to memory between every operation. Experiment with the different optimization levels and compare the output.


Quote:

> globals and statics (and volatile?) have a set space in RAM (bss?) and do not get reassigned to any other variables

They have two or more sections in RAM. Anything global/static that you don't initialise (or initialise to 0) is placed in .bss, and the startup code (__do_clear_bss) guarantees to wipe it all to 0 before main() starts (so, in fact, you don't need "= 0" on the definitions of those variables). Anything that you assign a non-zero initial value to is placed in the .data section, and its initial value is placed in a copy of .data in flash (beyond the end of .text). When the code starts, code added by the toolchain (__do_copy_data) block-copies the entire set of initial values from flash to RAM. .data is placed at the start of RAM, with .bss immediately following.

As ander_m says, this has nothing to do with "volatile". That's simply an instruction to the compiler to avoid optimising accesses, saying "whenever this variable is mentioned, code must always be generated to access it". That has nothing to do with placement in RAM.
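The startup work described above can be imitated in a few lines of C. This is a toy model only: DATA_SIZE, BSS_SIZE, flash_init, sram and crt_startup() are hypothetical, and the loops merely mimic what __do_copy_data and __do_clear_bss do; they are not those routines.

```c
#include <stddef.h>
#include <stdint.h>

#define DATA_SIZE 3u   /* made-up size of .data */
#define BSS_SIZE  4u   /* made-up size of .bss  */

/* Copy of .data's initial values kept in flash after .text, e.g. a
 * uint16_t baud = 9600 (0x80, 0x25 little-endian) and a uint8_t n = 7. */
static const uint8_t flash_init[DATA_SIZE] = { 0x80, 0x25, 0x07 };

/* SRAM: .data at the start, .bss immediately after it. */
static uint8_t sram[DATA_SIZE + BSS_SIZE];

static void crt_startup(void)
{
    /* like __do_copy_data: block-copy the initial values into .data */
    for (size_t i = 0; i < DATA_SIZE; i++)
        sram[i] = flash_init[i];

    /* like __do_clear_bss: zero every byte of .bss */
    for (size_t i = DATA_SIZE; i < DATA_SIZE + BSS_SIZE; i++)
        sram[i] = 0;
}
```

Whatever RAM held at power-on, after crt_startup() the .data bytes hold their initial values and every .bss byte is 0, which is why "= 0" on a global is redundant.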

I gave you a link above:

Quote:

Suggest you read this:

http://www.nongnu.org/avr-libc/u...


I'm guessing you haven't read it yet. If you had, you would have seen this extremely descriptive picture:

That shows you everything you need to know about variable placement in RAM. You can see that .data (cyan) is placed first in RAM and has a fixed size known at link time. This is followed by .bss (green), which again is known and fixed at link time. Everything to the right of (above) that in memory varies at run time. Not many embedded AVR programs use malloc()/free() and the heap memory, but if they did, you can see they use the "puce" coloured heap, which grows from the end of .bss upwards.

Meanwhile, at power on, almost the first thing the CRT does is initialise the stack pointer to RAMEND. As you call functions, return addresses and register contents are pushed to the stack, and SP descends towards the heap and .bss. In GCC there is just one stack, so it's also the place where locals are created on entry to a function: SP is adjusted downwards by the amount of memory required to hold the variables and then, as already noted, Y is set to the base of this area (actually the byte below it), so that Y+1 to Y+n can be used to access those "stack frame variables", with Y acting as the "frame pointer".

When a function that created locals exits, SP is first adjusted back up by the amount it was reduced to hold the locals, then any PUSHed registers are popped, and finally the RET instruction takes the return address from the stack and puts it into the PC.
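The "stack descends towards the heap and .bss" point boils down to one subtraction. free_ram() below is a hypothetical helper, not an avr-libc function; it assumes heap_top is the first byte not yet used by .data, .bss or the heap, and that sp is the AVR stack pointer, which points at the next free stack byte.

```c
#include <stdint.h>

/* Bytes heap_top..sp (inclusive) are unused by either side.
 * 0 means the gap is exhausted; a negative value means the stack has
 * grown down into the heap or .bss and is corrupting it. */
static int32_t free_ram(uint16_t heap_top, uint16_t sp)
{
    return (int32_t)sp - (int32_t)heap_top + 1;
}
```

For example, with .bss ending at 0x0100 (heap_top = 0x0101) and SP still at an assumed RAMEND of 0x08FF, 2047 bytes remain; every push, call and stack frame eats into that figure at run time.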

Quote:

> locals are pushed to the stack while still in use, and popped out when they expire

So, yes, that's true.
Quote:

> but what about the registers? aren't they faster to access? shouldn't they be used first before assigning variables to RAM? or does the compiler know the frequently used variables and assign them to the registers?

This is the difference between -O0 (non-optimised) and -O1/2/3/s code. If you leave the optimiser turned off you get exactly what you asked for: all locals ARE created on the stack, and whenever the code accesses them, code is generated to read/write Y+n. Only a fool builds code with -O0, though (or someone testing the compiler's basic code generation). Most users will elect to use at least -O1, and usually either -Os or -O3. When those are selected, the optimising part of the compiler goes out of its way to find ways to speed up the code or reduce its size (which often amounts to the same thing). Any variable that isn't used simply isn't created. So if you write:

void foo(void) {
	int n = 12345;
	long l = 0xDEADBEEF;
	char buff[10];
	buff[9] = 0x55;
}

and build it with -O0 you get:

00000046 :
void foo(void) {
  46:	cf 93       	push	r28
  48:	df 93       	push	r29
  4a:	cd b7       	in	r28, 0x3d	; 61
  4c:	de b7       	in	r29, 0x3e	; 62
  4e:	60 97       	sbiw	r28, 0x10	; 16
  50:	0f b6       	in	r0, 0x3f	; 63
  52:	f8 94       	cli
  54:	de bf       	out	0x3e, r29	; 62
  56:	0f be       	out	0x3f, r0	; 63
  58:	cd bf       	out	0x3d, r28	; 61
	int n = 12345;
  5a:	89 e3       	ldi	r24, 0x39	; 57
  5c:	90 e3       	ldi	r25, 0x30	; 48
  5e:	9a 83       	std	Y+2, r25	; 0x02
  60:	89 83       	std	Y+1, r24	; 0x01
	long l = 0xDEADBEEF;
  62:	8f ee       	ldi	r24, 0xEF	; 239
  64:	9e eb       	ldi	r25, 0xBE	; 190
  66:	ad ea       	ldi	r26, 0xAD	; 173
  68:	be ed       	ldi	r27, 0xDE	; 222
  6a:	8b 83       	std	Y+3, r24	; 0x03
  6c:	9c 83       	std	Y+4, r25	; 0x04
  6e:	ad 83       	std	Y+5, r26	; 0x05
  70:	be 83       	std	Y+6, r27	; 0x06
	char buff[10];
	buff[9] = 0x55;
  72:	85 e5       	ldi	r24, 0x55	; 85
  74:	88 8b       	std	Y+16, r24	; 0x10
}
  76:	60 96       	adiw	r28, 0x10	; 16
  78:	0f b6       	in	r0, 0x3f	; 63
  7a:	f8 94       	cli
  7c:	de bf       	out	0x3e, r29	; 62
  7e:	0f be       	out	0x3f, r0	; 63
  80:	cd bf       	out	0x3d, r28	; 61
  82:	df 91       	pop	r29
  84:	cf 91       	pop	r28
  86:	08 95       	ret

while if you build that with -O1 or above you get:

00000046 :
void foo(void) {
  46:	08 95       	ret
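Going back to the -O0 listing above, its 16-byte frame can be mirrored with a C struct to check the Y offsets. The struct foo_frame and the Y_OFFSET macro are hypothetical; the member order is read off that listing (the compiler is not obliged to use declaration order), and byte arrays are used so a host compiler adds no padding (AVR has no alignment padding anyway).

```c
#include <stddef.h>
#include <stdint.h>

/* The locals of foo() at -O0: n at Y+1..Y+2, l at Y+3..Y+6, buff at Y+7..Y+16. */
struct foo_frame {
    uint8_t n[2];      /* int n = 12345: low byte first (0x39, 0x30) */
    uint8_t l[4];      /* long l = 0xDEADBEEF: 0xEF, 0xBE, 0xAD, 0xDE */
    char    buff[10];  /* char buff[10] */
};

/* Y points one byte below the frame, so frame offset k is addressed as Y+(k+1). */
#define Y_OFFSET(member) (offsetof(struct foo_frame, member) + 1)
```

The frame is 16 bytes (the "sbiw r28, 0x10"), and buff[9] lands at Y_OFFSET(buff) + 9 = 16, matching "std Y+16, r24".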

The optimiser has recognised that none of those variables serves any purpose, so there's no code to create them. As for register versus RAM access, here is some code that actually does something:

void foo(void) {
	uint8_t n;
	n = PINB;
	PORTC = n;
}

(it "does something" because PINB and PORTC are "volatile" so must be read and written). When built without optimisation you get:

foo:
//==> void foo(void) {
	push r28
	push r29
	push __zero_reg__
	in r28,__SP_L__
	in r29,__SP_H__
/* prologue: function */
/* frame size = 1 */
/* stack size = 3 */
.L__stack_usage = 3
//==> 	n = PINB;
	ldi r24,lo8(35)
	ldi r25,0
	movw r30,r24
	ld r24,Z
	std Y+1,r24
//==> 	PORTC = n;
	ldi r24,lo8(40)
	ldi r25,0
	ldd r18,Y+1
	movw r30,r24
	st Z,r18
/* epilogue start */
//==> }
	pop __tmp_reg__
	pop r29
	pop r28
	ret

In the prologue, this pushes __zero_reg__ (that is R1, which always contains 0) onto the stack to create "n". PINB is then read from its data-space address (35 = 0x23) and stored at Y+1 (that is, "n"). The next part reads Y+1 and writes it to the PORTC address (40 = 0x28).

Now see what happens when the code is built with -O1 or above:

foo:
//==> void foo(void) {
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
//==> 	n = PINB;
	in r24,0x3
//==> 	PORTC = n;
	out 0x8,r24
	ret

In effect "n" is never really created. It's just a synonym for "register 24" so the code reads from PINB to n using the most efficient opcode possible and then writes that immediately out to PORTC again using the efficient OUT opcode. No stack manipulation is required as "n" does not need to be created on the stack and the function is free to corrupt R24 anyway so it does not need to be PUSH/POP'd either.

It's true that if I made up an example with so many local variables that they couldn't all be kept in registers (with optimisation enabled), the compiler would have no choice but to create some space on the stack too. But even in functions that define a lot of variables at the top, it could be the case, for example, that you define two for() loop variables "i" and "j" but only use "i" near the top of the function and "j" towards the end. In that case the code would likely just reuse a single register (often R24) for the two different things and, again, there'd be no need to create i/j on the stack.


hmmm...
I did read the link to the manual, but it seems I cannot quite grasp a lot of the concepts.
For example, you talk about variables having a fixed size that is known at "link time", and I have also read about other kinds of "time", e.g. compile time and run time.

Perhaps this has to do with the compile process? As far as I know, I just press F7 in AS6 and my C file is converted to a hex or elf file, which I can burn to my AVR.

Is there some resource you could point me to that describes these? Hopefully one centred on the AVR.


When you press F7, a number of processes take place. For the 'compilation' there are three main ones:

preprocess
compile
link

Then there may be another process to create the hex file.

As for the different 'times' spoken of: a global or local static is 'known' at compile time, and space is allocated for it. At this point the space is just allocated; no address is assigned to it yet. The same goes for functions and other things. The linker 'knits' everything together and resolves the addresses. This is especially necessary when you have multiple files, and for the C libraries and run-time code.

Local vars, on the other hand are created at run time - when the code is actually running on your cpu - space is allocated on the stack by the code.

Since the linker 'knows' how much RAM your CPU has, it can complain if your statically allocated variables exceed the RAM size. Dynamically allocated memory (i.e. locals and other space allocated at run time) is not known at link time, so managing it is up to the programmer.
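A small C sketch of the two cases, with hypothetical names (rx_buffer, sum_to()): the size of a static object is a compile-time constant that the linker can add up, whereas how much stack a recursive call chain uses depends on an argument only known while the program runs.

```c
#include <stdint.h>

/* Known before the program runs: the size is fixed, so the linker can total
 * .data + .bss and check it against the RAM size. */
static uint8_t rx_buffer[64];
_Static_assert(sizeof rx_buffer == 64, "static sizes are compile-time constants");

/* Known only at run time: as written, each active call has its own `scratch`
 * on the stack, so stack use grows with n, which no tool can see at link time. */
static unsigned sum_to(unsigned n)
{
    volatile uint8_t scratch = (uint8_t)n;   /* per-call local, kept in memory */
    return n == 0 ? 0u : scratch + sum_to(n - 1);
}
```

Calling sum_to(10) needs ten nested frames (and returns 55); calling it with a larger value needs more, and nothing in the build can warn you if that exceeds the free RAM.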


Quote:

Is there some resource you could point me that would describe these? Hopefully centered on the AVR.

Well, the Wiki here is the slowest website on Earth and consequently unusable, which is a shame. If you wait a bit for it to load and navigate to the right place, you see this very succinct diagram:

(it just took me 2+ minutes to find that on the Wiki (and I knew where to look!) so it could take a while to download/draw too).

That shows what goes in one end and out the other. You pass .c/.cpp files to the C/C++ compilers (avr-gcc/avr-g++) and they output .s assembler source files. These are passed to avr-as, the GNU assembler from the Binutils package. You can also pass your own Asm source files (.S files) directly to it. Each is assembled and output as an ELF/Dwarf object file (.o extension).

All these are passed to avr-ld, the linker, and it fixes up CALL/jump destinations that link from one .c/.cpp file to another (when the .o files are first built, they have all such destinations set to 0x0000, plus a table saying what needs to be linked). The linker also conglomerates all the variables, putting all those with initial values into .data, one after another, and all those without initial values into .bss. It then writes the details of .text (with all the joined code), .data and .bss (and a few other things) to a .elf file which, surprise, surprise, also contains ELF/Dwarf just as the .o files did.

Finally, avr-objcopy is used to extract (usually) just .text and .data from the .elf file and write them in Intel Hex format as a .hex file, which is what you program into the micro.
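For a feel of what the final .hex file contains, here is a minimal sketch of how one Intel Hex record is laid out. format_hex_record() is a hypothetical helper written for illustration, not part of avr-objcopy; it handles only the basic 16-bit-address records (type 00 data, type 01 end of file).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One record is  :LLAAAATT<data>CC
 *   LL = byte count, AAAA = load address, TT = record type,
 *   CC = two's complement of the 8-bit sum of all preceding bytes.
 * `out` needs room for 11 + 2*len characters plus the terminating NUL.
 * Returns the number of characters written. */
static size_t format_hex_record(char *out, uint8_t type, uint16_t addr,
                                const uint8_t *data, uint8_t len)
{
    uint8_t sum = (uint8_t)(len + (addr >> 8) + (addr & 0xFFu) + type);
    size_t n = (size_t)sprintf(out, ":%02X%04X%02X",
                               (unsigned)len, (unsigned)addr, (unsigned)type);

    for (uint8_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + data[i]);
        n += (size_t)sprintf(out + n, "%02X", (unsigned)data[i]);
    }
    n += (size_t)sprintf(out + n, "%02X", (unsigned)(uint8_t)(0x100u - sum));
    return n;
}
```

Four example bytes 0C 94 34 00 at address 0x0000 become ":040000000C94340028", and a type-01 record with no data gives ":00000001FF", the standard end-of-file line.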

The sizes of variables are known when you use them in C/C++. With avr-gcc, a char is 1 byte, int/short is 2 bytes and long/float/double is 4 bytes. When the C compiler generates the .s file, it puts in instructions to generate these variables with the right number of bytes, and that information goes all the way through the process to the link stage, so the linker can add them all together, knowing their widths, to create the .data and .bss areas (if any).