[SOFT] [C] AVRGCC: Monitoring Stack Usage


Hi folks,

This is my first ever posting here - I hope it is useful!

A common problem is wanting to know how much stack space a program uses. This can reveal how much valuable RAM can be used for static variables, or whether the stack is growing too large and corrupting program data.

With AVR gcc, two symbols are defined by the linker that can make this easy. These are _end and __stack which define the first free byte of SRAM after program variables, and the starting address of the stack, respectively.

The stack starts at __stack, which is conventionally the highest byte of SRAM, and grows towards zero; _end will be somewhere between zero and __stack. If the stack ever falls below _end, it has almost certainly corrupted program data.
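
As a rough host-side sketch of that layout rule (not AVR code): the addresses are passed in as plain integers, and the helper names stack_headroom and stack_has_collided are made up for illustration. On a real part the bounds would come from &_end and the stack pointer.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bytes between the stack pointer and the first free byte after the
 * program's variables. A negative result means the stack pointer has
 * dropped below end_addr. Hypothetical helper, addresses are plain ints. */
int32_t stack_headroom(uint16_t sp, uint16_t end_addr)
{
    return (int32_t)sp - (int32_t)end_addr;
}

/* True once the stack pointer has fallen below end_addr. */
bool stack_has_collided(uint16_t sp, uint16_t end_addr)
{
    return stack_headroom(sp, end_addr) < 0;
}
```

For example, with a stack pointer at 0x08FF and an end address of 0x0200 the headroom is 0x06FF bytes.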

The following C declarations gain access to these linker symbols:

extern uint8_t _end;
extern uint8_t __stack;

Taking the address of these symbols (e.g. &_end) gives the memory addresses that bound the stack. The following function therefore 'paints' the stack with a known value:

void StackPaint(void) __attribute__ ((naked)) __attribute__ ((section (".init1")));

void StackPaint(void)
{
#if 0
    uint8_t *p = &_end;

    while(p <= &__stack)
    {
        *p = STACK_CANARY;
        p++;
    }
#else
    __asm volatile ("    ldi r30,lo8(_end)\n"
                    "    ldi r31,hi8(_end)\n"
                    "    ldi r24,lo8(0xc5)\n" /* STACK_CANARY = 0xc5 */
                    "    ldi r25,hi8(__stack)\n"
                    "    rjmp .cmp\n"
                    ".loop:\n"
                    "    st Z+,r24\n"
                    ".cmp:\n"
                    "    cpi r30,lo8(__stack)\n"
                    "    cpc r31,r25\n"
                    "    brlo .loop\n"
                    "    breq .loop"::);
#endif
}

This is declared in such a way that AVR-libc will execute the assembly before the program has started running or configured the stack. It also runs at a point before some of the normal runtime setup, hence assembly should be used as C may not be fully reliable (this is discussed in the AVR libc manual).

The function itself simply fills the stack with 0xc5, the idea being that stack usage will overwrite this with some other value, hence making stack usage detectable.

Finally the following function can be used to count how many bytes of stack have not been overwritten:

uint16_t StackCount(void)
{
    const uint8_t *p = &_end;
    uint16_t       c = 0;

    /* Check the bound first so *p is never read beyond __stack. */
    while(p <= &__stack && *p == STACK_CANARY)
    {
        p++;
        c++;
    }

    return c;
}

This function can be called at any time to check how much stack space has never been overwritten. If it returns 0, you are probably in trouble as all the stack has been used, most likely destroying some program variables.
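
The paint-then-count idea can be tried out on a PC with a simulated RAM buffer. This is only a sketch under that assumption: sim_paint, sim_count and sim_use_stack are hypothetical names, and the buffer stands in for the region between _end and __stack.

```c
#include <stdint.h>
#include <stddef.h>

#define STACK_CANARY 0xc5u

/* Fill every byte from lo up to and including hi with the canary. */
void sim_paint(uint8_t *lo, uint8_t *hi)
{
    for (uint8_t *p = lo; p <= hi; p++) {
        *p = STACK_CANARY;
    }
}

/* Count canary bytes upward from lo, stopping at the first byte that
 * has been overwritten or once hi has been passed. */
uint16_t sim_count(const uint8_t *lo, const uint8_t *hi)
{
    uint16_t c = 0;
    for (const uint8_t *p = lo; p <= hi && *p == STACK_CANARY; p++) {
        c++;
    }
    return c;
}

/* Pretend the stack grew down from hi by depth bytes, writing
 * non-canary data as it went. */
void sim_use_stack(uint8_t *hi, size_t depth)
{
    for (size_t i = 0; i < depth; i++) {
        hi[-(ptrdiff_t)i] = 0x00;
    }
}
```

Painting a 64-byte buffer and then "using" 20 bytes from the top leaves 44 canary bytes, which is the never-used headroom the count reports.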


Nice idea and clean implementation. Thanks for posting.


Thanks for the kind comment :)


Thanks for the great post.
I was interested in monitoring the stack size of a subset of functions in my application.
Is there any way I can paint the stack only up to the current stack pointer?
That way I could paint it right before calling the function I want to monitor and then do the count after the call.


great idea, saved a copy in case I ever need it! :)

(hopefully you put it in the avrfreaks projects section too)


The code may be really useful as long as no use is made of malloc:

pre malloc free stack: 3469 bytes
post malloc free stack: 0 bytes

printf_P(PSTR("pre malloc free stack: %u bytes\r\n"), StackCount());
//
// Allocate memory for the volume
//
Volume = malloc(sizeof(s_VOLUME));
if (Volume != NULL)
{
	printf_P(PSTR("post malloc free stack: %u bytes\r\n"), StackCount());
}

MY MICROCONTROLLER CAN BEAT THE HELL OUT OF YOUR MICROCONTROLLER /ATMEL


Quite interesting!
But is it correct, that _end and __stack are uint8_t?
The bigger controllers have more than 256 bytes internal RAM.

Michael


But he's not using what they contain (as they don't "contain" anything) but the addresses where they are pointing.
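
A small host-side sketch of the same point, using an ordinary array in place of SRAM (the names ram and offset_of_byte are invented for illustration): the element type is uint8_t, yet addresses within it range far beyond 255.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in for SRAM. Each element is a uint8_t, like the declared type
 * of the linker symbols. */
uint8_t ram[4096];

/* Distance in bytes from the start of ram to the given byte. The result
 * is limited by the pointer width, not by the uint8_t element type. */
size_t offset_of_byte(const uint8_t *p)
{
    return (size_t)(p - ram);
}
```

Calling offset_of_byte(&ram[3000]) gives 3000, even though no single uint8_t could hold that value.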

 


Sweet! I will definitely use this when trying to optimize. Every little bit helps. I mean I guess I could program asm style and be super efficient, but this could at least show me any gross C memory mismanagement. Thanks bro!


Thanks for the post. Very useful!

-Brad


MaxK wrote:
The code may be really useful as long as no use is made of malloc

pre malloc free stack: 3469 bytes
post malloc free stack: 0 bytes

Yup - that is a shortcoming of the code. As illustrated in the AVR libc manual, the malloc arena starts at __bss_end: http://www.nongnu.org/avr-libc/user-manual/malloc.html

If you really need malloc, you could use the malloc tunables to set up the stack painting to avoid the malloc arena - probably not too difficult to do.

Regards,

Mike


Could you please tell me what's going wrong? My stack usage is returning 5 bytes. I am using an ATmega169. I fill my RAM (0x100 to 0x4ff) with the value 0xaa, then calculate my stack usage. I look for 3 consecutive occurrences of 0xaa so that I can confirm I have reached the end of the stack. Is there any logical problem in how my implementation finds 3 consecutive occurrences of 0xaa? Any help?

My asm:
#define MAX_MEM_ADDR 0x4FF

#define RAM_PATTERN ((unsigned char)0xAA)
/* The minimum number of times this pattern must occur consecutively in memory */
#define RAM_MIN_PATTERN 3
#define RAMSTART 0x0100

	ldi r27,0x04 ; upper RAM address, high byte
	ldi r26,0xff ; upper RAM address, low byte
	ldi r25,0x01 ; lower RAM address, high byte
	ldi r24,0x00 ; lower RAM address, low byte
	ldi r16,0xAA ; value to be copied

loop:
	st x,r16    ; store the 0xAA value in the RAM location
	sbiw r26,1  ; decrement the RAM address
	cp r26,r24  ; compare the low bytes of the current and starting RAM addresses
	cpc r27,r25 ; compare the high bytes of the current and starting RAM addresses, with carry
	brne loop   ; loop continues until they are equal


int GetStackByteCount(void)
{
	int StackCount = 0;
	unsigned char consecutive = 0;
	unsigned char *p = NULL;
	unsigned char *temp = NULL;

	/* Scan down from the top of RAM to the first byte that holds the pattern. */
	for (p = (unsigned char *)MAX_MEM_ADDR; p >= (unsigned char *)RAMSTART; --p)
	{
		if (*p == RAM_PATTERN)
		{
			break;
		}
	}

	while ((p >= (unsigned char *)RAMSTART) && (p <= (unsigned char *)MAX_MEM_ADDR))
	{
		temp = p;

		if (*temp == RAM_PATTERN)
		{
			StackCount++;
			consecutive++;

			if (*(temp + 1) == RAM_PATTERN)
			{
				StackCount++;
				consecutive++;

				if (*(temp - 1) == RAM_PATTERN)
				{
					StackCount++;
					consecutive++;
				}
				else
				{
					--p;
					consecutive = 0;
				}
			}
			else
			{
				--p;
				consecutive = 0;
			}
		}
		else
		{
			--p;
			consecutive = 0;
		}

		if (consecutive == RAM_MIN_PATTERN)
		{
			break;
		}
	}

	return StackCount;
}
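
For comparison, here is one way the "N consecutive pattern bytes" idea could be written, as a host-side sketch over a simulated RAM array rather than real SRAM addresses. The function name used_from_top is hypothetical, and the sketch assumes the stack grows down from the last element of the array.

```c
#include <stdint.h>
#include <stddef.h>

#define RAM_PATTERN     0xAAu
#define RAM_MIN_PATTERN 3u

/* Scan ram[0..size-1] from the top down and return how many bytes lie
 * above the first run of RAM_MIN_PATTERN consecutive pattern bytes.
 * Returns size if no such run exists (everything looks used). */
size_t used_from_top(const uint8_t *ram, size_t size)
{
    size_t run = 0;

    for (size_t i = size; i > 0; i--) {
        if (ram[i - 1] == RAM_PATTERN) {
            run++;
            if (run == RAM_MIN_PATTERN) {
                /* The run starts at index i-1 and is RAM_MIN_PATTERN
                 * bytes long; everything above it counts as used. */
                return size - (i - 1 + RAM_MIN_PATTERN);
            }
        } else {
            run = 0;   /* an isolated pattern byte does not end the stack */
        }
    }
    return size;
}
```

Keeping a single running counter avoids looking at the neighbours of each byte separately, so a lone 0xAA inside the used region simply resets the run.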


Stealing Proteus doesn't make you an engineer.


Thanks for this tutorial! Can this be used as it is for xmegas, or if not what needs to change? This is going to be very useful in my app as I need as much free SRAM as I can get for other buffers.

Mark.


The stack works the same in xmega as in mega and tiny, so the same principles apply. But this is only a technique for monitoring usage; it won't magically free up SRAM. You do that by moving data to flash and EEPROM and reusing RAM when you can.

 


I understand the principles are the same, but does the code posted work unaltered for xmega parts? If not, what needs to change in the code?

I will use this to monitor and measure stack usage, add a little contingency just to be sure and set the stack accordingly. Of course I'll use the standard techniques to reduce SRAM usage.


What do you mean by "set the stack"? GCC does that for you, setting it to RAMEND. The only requirement on your part is to ensure it never descends as low as the end of .bss (or .noinit if you use that).

 


clawson wrote:
What do you mean by "set the stack"? GCC does that for you, setting it to RAMEND. The only requirement on your part is to ensure it never descends as low as the end of .bss (or .noinit if you use that).

Yes, my bad. This is actually an LED display project and the more frame buffers I can allocate the better the system performance, so by 'set the stack' I actually meant making sure I don't allocate so many buffers that the stack would overwrite one. The stack pointer would move down from the top of memory and my buffers would be allocated from the bottom up. I don't use the heap or malloc().

Mark.


Nice work, I've pasted it into my project and it worked without a glitch. I'm trying to find out if my stack overflows due to nested interrupts. Hope this will help.

Kind regards.


_end should be __bss_end, according to http://www.nongnu.org/avr-libc/user-manual/malloc.html. I found that in my code it only produced meaningful results if I used __bss_end. I am using an XMEGA if it makes any difference, I have not tested it with MEGA/TINY. I don't use any memory allocation functions.

Thanks for this really useful bit of code.


Just to note that:

void StackPaint(void) __attribute__ ((naked)) __attribute__ ((section (".init1")));

void StackPaint(void)
{ 

can be written more simply as:

__attribute__ ((naked,section (".init1")))
void StackPaint(void)
{ 

Also note that .init3 is probably more appropriate for this than .init1.

 


Sorry for the dumb question, but how should I use this StackPaint()? If I call it at the beginning of main(), it executes, then main() is entered again as if called back as a subroutine, and so on, so I get an infinite cycle. What am I doing wrong?


Don't call it manually - the special "init1" section that Cliff mentions links the function into the C startup code, so it is called automatically on power-on as part of the C startup.

- Dean :twisted:

Make Atmel Studio better with my free extensions. Open source and feedback welcome!


@MichaelMcTernan

Thank you for this useful piece of code and clear instructions. You will be pleased to know that, 6 years after your post, the information you provided is still relevant for some applications :)

I am using Atmega 2560 and planning to use your code in my project but there is one thing I am not too clear about.

In your code you have mentioned that __end and __stack need to be defined at the linker level for the monitoring application to work as expected.

The way I have done this is via the AVR Studio v4.19 as follows:

1 - Go to Project Options > Memory Settings

2 - Under "Stack Settings" tick the "Specify Initial Stack Address" and specify the setting as 0x8021FF

3 - Then go to Project Options > Custom Options

4 - On the left window of the "Custom Compiler Options", locate [Linker Options] and left click it.

5 - On the window to the right following should be displayed already, which is one of the linker options that will be used during compilation:
-Wl,--defsym=__stack=0x8021FF

6 - Similarly, I added the following option on the same window which defines the __end at the linker level:
-Wl,--defsym=__end=0x800200

I then clean built my code and checked the generated map file and confirmed that the following appeared in that file:

Linker script and memory map
Address of section .data set to 0x800200
LOAD c:/winavr-20100110/bin/../lib/gcc/avr/4.3.3/../../../../avr/lib/avr6/crtm2560.o
0x008021ff __stack = 0x8021ff
0x00800200 __end = 0x800200

Does the above meet the criterion on the __stack and __end you have mentioned?

Your quick feedback will be greatly appreciated.

Regards...


Quote:

In your code you have mentioned that __end and __stack need to be defined at the linker level for the monitoring application to work as expected.

You have misunderstood. These symbols belong to the compiler/linker(*) (you can tell because the names start with two underscores), so they are not something you create; indeed you mustn't. They are generated by the linker and available to you for read-only access.

If you want to influence their value, use --section-start or change the linker script or something. But don't mess with the linker's own symbols.

(Just as a general note: whenever you see a symbol whose name starts with one underscore, it belongs to the C library. When you see one that starts with two underscores, it belongs to the toolchain. This convention is to prevent name pollution. Your code can have a symbol called delay_ms(), the library can have _delay_ms() and the compiler/linker can have __delay_ms() and they won't clash. You should never create your own symbols that start with _ or __.)

(*) _end is from the linker script. In your case that will be avr6.x, the RAM layout is:

  .data	  : AT (ADDR (.text) + SIZEOF (.text))
  {
     PROVIDE (__data_start = .) ;
    /* --gc-sections will delete empty .data. This leads to wrong start
       addresses for subsequent sections because -Tdata= from the command
       line will have no effect, see PR13697.  Thus, keep .data  */
    KEEP (*(.data))
    *(.data*)
    *(.rodata)  /* We need to include .rodata here if gcc is used */
    *(.rodata*) /* with -fdata-sections.  */
    *(.gnu.linkonce.d*)
    . = ALIGN(2);
     _edata = . ;
     PROVIDE (__data_end = .) ;
  }  > data
  .bss   : AT (ADDR (.bss))
  {
     PROVIDE (__bss_start = .) ;
    *(.bss)
    *(.bss*)
    *(COMMON)
     PROVIDE (__bss_end = .) ;
  }  > data
   __data_load_start = LOADADDR(.data);
   __data_load_end = __data_load_start + SIZEOF(.data);
  /* Global data not cleared after reset.  */
  .noinit  :
  {
     PROVIDE (__noinit_start = .) ;
    *(.noinit*)
     PROVIDE (__noinit_end = .) ;
     _end = . ;
     PROVIDE (__heap_start = .) ;
  }  > data

Towards the end of that you can see that _end marks the end of the .noinit section.

 


I wrote an Atmel Studio 6.1 extension to automate this:

http://gallery.atmel.com/Products/Details/c34c555d-86dc-4514-af24-013bc3fd2141?

- Dean :twisted:


I have just integrated the .c and the .h files into my project and it all works very well. The device I am using is Atmega 2560 and defined _end and __stack.

I use the LCD on my system to monitor the remaining stack in real-time and it looks like I currently have 1400 bytes of headroom (total RAM available is 8192 bytes in my device).

I will leave this stack monitoring on my system that will help me make RAM usage more efficient.

@MichaelMcTernan Thank you once again for this extremely useful contribution.

Regards...


@clawson

Thank you for your clarifications. I thought by the following assignments I was merely assigning values rather than defining the symbols ( Though, I did correct __end as _end. Double underscore was my mistake! ).

-Wl,--defsym=__stack=0x8021FF
-Wl,--defsym=_end=0x800200

So, to test what happens if I do not make this assignment, I just removed the _end assignment above. I left the __stack assignment as this is provided as a valid user option in the linker configuration of AVR Studio 4.19.

The code still worked as needed which means internally the _end assignment is done correctly for my device.

I presume the following line is where the _end is assigned its initial value in the avr6.x file
_end = . ;

Since I am not very familiar with the structure of the linker script files, I am unable to comment further. I think I will need to take a few things for granted for the time being.

One thing I am not clear on though is where is the __stack in the avr6.x file? The only reference to the keyword "stack" in that file is on the following line in that file:

KEEP (*(.init1))
*(.init2) /* Clear __zero_reg__, set up stack pointer. */

Regards...


Quote:

One thing I am not clear on though is where is the __stack in the avr6.x file?

Sit back and think for a moment. Why does _end actually have one under-score and __stack have two?

They are created/owned by different components. _end is a static symbol that is created when you link (usually the end address of .bss in fact as few people use .noinit).

 


This (thread subject) works! Thanks a lot!


I did not get the original code to work when compiling with Arduino IDE 1.5.7. The StackPaint function did not fire at all.

 

I modified it slightly, which instead requires that you run StackPaint() manually.

 

extern uint8_t _end;

void StackPaint(void)
{
    uint8_t *p = &_end;

    while(p < (uint8_t*)&p)
    {
        *p = 0xc5;
        p++;
    }
} 

uint16_t StackCount(void)
{
    const uint8_t *p = &_end;
    uint16_t       c = 0;

    while(*p == 0xc5 && p < (uint8_t*)&p)
    {
        p++;
        c++;
    }

    return c;
} 

 


which instead requires that you run StackPaint() manually.

How does that work? When you call StackPaint() isn't its return address obliterated?

 

The entire "trick" of this whole tutorial is the very fact that it's put into init1 and made "naked" that means:

 

a) it runs before .data and .bss are set up

b) it's just "fall through" code early in the CRT so no stack usage is made and it certainly is not CALL'd or RET'd from.

 


clawson wrote:

How does that work? When you call StackPaint() isn't its return address obliterated?

 

The entire "trick" of this whole tutorial is the very fact that it's put into init1 and made "naked" that means:

 

a) it runs before .data and .bss are set up

b) it's just "fall through" code early in the CRT so no stack usage is made and it certainly is not CALL'd or RET'd from.

 

My modified StackPaint() function will work similarly to the original in that it starts at _end and works its way "up" toward the stack. It stops "painting" when it reaches the address of its own stack variable p. The return info in the stack is just above this point, and not touched.

 

I do not use malloc() at all so this should be safe.

 

Unfortunately the "naked" init1 "trick" did not work for me with Arduino IDE. I did not dive too deeply into the why at the time.

 

To make it malloc()-safe, you can replace _end with:

 

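void StackPaint(void)
{
    /* In avr-libc, __brkval is a pointer variable (0 until the first
       malloc), while __heap_start is a linker symbol whose address is used. */
    extern char *__brkval;
    extern char  __heap_start;
    uint8_t *p = (uint8_t *)(__brkval == 0 ? &__heap_start : __brkval);

    while(p < (uint8_t*)&p)
    {
        *p = 0xc5;
        p++;
    }
}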
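
The choice of starting address can be checked on a PC with a simulated buffer. This is just a sketch: paint_start and paint_below are hypothetical helpers, and their parameters stand in for the avr-libc heap values and for the address of the local variable used as the upper limit.

```c
#include <stdint.h>
#include <stddef.h>

/* Choose where painting may begin: at brkval if malloc() has already
 * handed out memory (brkval is non-NULL), otherwise at heap_start.
 * Both parameters are stand-ins for the avr-libc values. */
uint8_t *paint_start(uint8_t *heap_start, uint8_t *brkval)
{
    return (brkval == NULL) ? heap_start : brkval;
}

/* Paint from start up to, but not including, limit.
 * Returns the number of bytes painted. */
size_t paint_below(uint8_t *start, const uint8_t *limit, uint8_t canary)
{
    size_t n = 0;
    for (uint8_t *p = start; p < limit; p++) {
        *p = canary;
        n++;
    }
    return n;
}
```

With a heap that has grown to offset 10 and a limit at offset 24, only the 14 bytes in between are painted, so neither the heap below nor the live stack above is touched.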

 


Oh I see you are starting beyond .bss and filling RAM to just below 'p' - I guess that if it's the last thing on the stack you should be OK. I thought you were doing what the originally presented routine was doing which would have obliterated the entire stack including the return address from StackPaint() (and .data and .bss).

 

I'd still explore why the .init1 stuff did not work in Arduino - there's no reason why it shouldn't work there.