[SOFT] [C] AVRGCC: Monitoring Stack Usage


Hi folks,

This is my first ever posting here - I hope it is useful!

A common problem is wanting to know how much stack space a program uses. This can reveal how much valuable RAM can be used for static variables, or whether the stack is growing too large and corrupting program data.

With AVR gcc, two symbols provided by the toolchain make this easy: _end and __stack. They mark, respectively, the first free byte of SRAM after the program's variables and the starting address of the stack.

The stack starts at __stack, which is conventionally the highest byte of SRAM, and grows towards zero; _end will be somewhere between zero and __stack. If the stack ever falls below _end, it has almost certainly corrupted program data.
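A host-side sketch (ordinary C for a PC, not AVR code) of the address arithmetic implied by this layout, treating both symbols as plain 16-bit data-space addresses. The function names are hypothetical, and the figures in the test assume an ATmega2560 (SRAM from 0x0200 to 0x21FF, as in the map file quoted later in this thread) with no static data at all, so that _end sits at 0x0200.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of bytes between _end and __stack, inclusive: the most the
 * stack can grow before it reaches program data. */
uint16_t stack_gap(uint16_t end_addr, uint16_t stack_addr)
{
    return (uint16_t)(stack_addr - end_addr + 1u);
}

/* True if the lowest byte the stack has ever written lies below _end,
 * i.e. the stack has overwritten program data. */
bool stack_hit_data(uint16_t lowest_stack_byte, uint16_t end_addr)
{
    return lowest_stack_byte < end_addr;
}
```

On a real AVR the two inputs would come from (uint16_t)&_end and (uint16_t)&__stack, since data pointers there are 16 bits wide.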

The following C declarations gain access to these linker symbols:

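#include <stdint.h>
#define STACK_CANARY 0xc5   /* fill pattern; must match the value used in the asm below */
extern uint8_t _end;
extern uint8_t __stack;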

Taking the address of these symbols (e.g. &_end) gives the memory addresses that bound the stack. The following function therefore 'paints' the stack with a known value:

void StackPaint(void) __attribute__ ((naked)) __attribute__ ((section (".init1")));

void StackPaint(void)
{
#if 0 /* C reference version: unsafe here because __zero_reg__ (r1) is not yet cleared */
    uint8_t *p = &_end;

    while(p <= &__stack)
    {
        *p = STACK_CANARY;
        p++;
    }
#else
    __asm volatile ("    ldi r30,lo8(_end)\n"
                    "    ldi r31,hi8(_end)\n"
                    "    ldi r24,lo8(0xc5)\n" /* STACK_CANARY = 0xc5 */
                    "    ldi r25,hi8(__stack)\n"
                    "    rjmp .cmp\n"
                    ".loop:\n"
                    "    st Z+,r24\n"
                    ".cmp:\n"
                    "    cpi r30,lo8(__stack)\n"
                    "    cpc r31,r25\n"
                    "    brlo .loop\n"
                    "    breq .loop"::);
#endif
}

This is declared so that AVR-libc executes the code from the .init1 section, before the program has started running or configured the stack. It also runs before some of the normal runtime setup, in particular before __zero_reg__ (r1) has been cleared, so the loop is written in assembly: compiled C code assumes r1 is zero and may not be reliable at this point (this is discussed in the AVR libc manual).

The function itself simply fills the stack with 0xc5, the idea being that stack usage will overwrite this with some other value, hence making stack usage detectable.
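The painting step can be tried out on a PC before trusting it on a target. This is a host-side simulation, not AVR code: a plain array stands in for SRAM, and the hypothetical paint_region() mirrors the disabled C loop above, filling every byte from lo up to and including hi.

```c
#include <stdint.h>

#define STACK_CANARY 0xc5

/* Fill [lo, hi] inclusive with the canary, like the #if 0 branch above. */
void paint_region(uint8_t *lo, uint8_t *hi)
{
    for (uint8_t *p = lo; p <= hi; p++)
        *p = STACK_CANARY;
}
```

Stack activity can then be simulated by writing other values near the top of the array and checking which canary bytes survive.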

Finally the following function can be used to count how many bytes of stack have not been overwritten:

uint16_t StackCount(void)
{
    const uint8_t *p = &_end;
    uint16_t       c = 0;

    while(p <= &__stack && *p == STACK_CANARY)  /* check the bound before dereferencing */
    {
        p++;
        c++;
    }

    return c;
}

This function can be called at any time to check how much stack space has never been overwritten. If it returns 0, you are probably in trouble: all of the stack has been used, most likely destroying some program variables.
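Continuing the same kind of host-side simulation (an array in place of SRAM, hypothetical names), counting looks like this. The bound is checked before dereferencing, and the peak stack usage follows by subtracting the untouched bytes from the size of the painted region. A byte that happens to be written with 0xc5 is indistinguishable from an untouched one, so the result is an estimate.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_CANARY 0xc5

/* Count canary bytes from lo upwards, stopping at hi or at the first
 * overwritten byte, like StackCount(). */
size_t count_untouched(const uint8_t *lo, const uint8_t *hi)
{
    size_t c = 0;
    for (const uint8_t *p = lo; p <= hi && *p == STACK_CANARY; p++)
        c++;
    return c;
}

/* Peak (high-water) stack usage in bytes for the painted region [lo, hi]. */
size_t peak_usage(const uint8_t *lo, const uint8_t *hi)
{
    size_t region = (size_t)(hi - lo) + 1;
    return region - count_untouched(lo, hi);
}
```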

Attachment(s): 


Nice idea and clean implementation. Thanks for posting.


Thanks for the kind comment :)


Thanks for the great post.
I was interested in monitoring the stack usage of a subset of functions in my application.
Is there any way I can paint the stack only up to the current stack pointer?
That way I could paint it right before calling the function I want to monitor and then do the count after the function returns.


great idea, saved a copy in case I ever need it! :)

(hopefully you put it in the avrfreaks projects section too)


The code may be really useful as long as malloc() is not used:

pre malloc free stack: 3469 bytes
post malloc free stack: 0 bytes

printf_P(PSTR("pre malloc free stack: %u bytes\r\n"), StackCount());

// Allocate memory for the volume
Volume = malloc(sizeof(s_VOLUME));
if (Volume != NULL)
{
    printf_P(PSTR("post malloc free stack: %u bytes\r\n"), StackCount());
}

MY MICROCONTROLLER CAN BEAT THE HELL OUT OF YOUR MICROCONTROLLER /ATMEL


Quite interesting!
But is it correct that _end and __stack are declared as uint8_t?
The bigger controllers have more than 256 bytes of internal RAM.

Michael


But he's not using what they contain (they don't really "contain" anything); he is using their addresses, and a pointer such as &_end is 16 bits wide regardless of the declared type.


Sweet! I will definitely use this when trying to optimize. Every little bit helps. I mean, I guess I could program asm style and be super efficient, but this could at least show me any gross C memory mismanagement. Thanks bro!


Thanks for the post. Very useful!

-Brad


MaxK wrote:
The code may be really usefull as long no use is made from Malloc

pre malloc free stack: 3469 bytes
post malloc free stack: 0 bytes

Yup - that is a shortcoming of the code. As illustrated in the AVR libc manual, the malloc arena starts at __bss_end: http://www.nongnu.org/avr-libc/u...

If you really need malloc, you could use the malloc tunables (__malloc_heap_start, __malloc_heap_end) to set up the stack painting so that it avoids the malloc arena - probably not too difficult to do.

Regards,

Mike
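One possible reading of the malloc-tunables suggestion, as a host-side simulation: if the heap is capped at a fixed address (avr-libc's __malloc_heap_end variable exists for this), counting can start just above the cap, so heap allocations below it no longer make the result drop to 0. The array layout, the cap position and the function name are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_CANARY 0xc5

/* Count untouched canary bytes in ram[first..last], stopping at the first
 * overwritten byte. With `first` just above a fixed heap cap, heap
 * allocations below the cap cannot make this return 0. */
size_t count_above_cap(const uint8_t *ram, size_t first, size_t last)
{
    size_t c = 0;
    while (first + c <= last && ram[first + c] == STACK_CANARY)
        c++;
    return c;
}
```

The test mimics the report above: counting from the bottom of the painted area returns 0 once the heap has been used, while counting from above an assumed cap at index 16 does not.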


Could you please tell me what's going wrong? My stack usage is returning 5 bytes. I am using an ATmega169. I am filling my RAM (0x100 to 0x4FF) with the value 0xAA, and then I calculate my stack usage. I look for 3 consecutive occurrences of 0xAA so I can confirm that I have reached the end of the stack. Is there a logical problem in how my implementation finds 3 consecutive occurrences of 0xAA? Any help?

My asm:
#define MAX_MEM_ADDR 0x4FF

#define RAM_PATTERN ((unsigned char)0xAA)
/* The minimum number of times this pattern must occur consecutively in memory */
#define RAM_MIN_PATTERN 3
#define RAMSTART 0x0100

    ldi r27, 0x04    ; upper RAM address, high byte
    ldi r26, 0xFF    ; upper RAM address, low byte
    ldi r25, 0x01    ; lower RAM address, high byte
    ldi r24, 0x00    ; lower RAM address, low byte
    ldi r16, 0xAA    ; value to be copied

loop:
    st X, r16        ; store 0xAA at the current RAM location
    sbiw r26, 1      ; decrement the RAM address
    cp r26, r24      ; compare the low bytes of the current and start addresses
    cpc r27, r25     ; compare the high bytes, including the carry
    brne loop        ; loop until they are equal


int GetStackByteCount(void)
{
    int StackCount = 0;
    unsigned char consecutive = 0;
    unsigned char *p = NULL;
    unsigned char *temp = NULL;

    for (p = (unsigned char *)MAX_MEM_ADDR; p >= (unsigned char *)RAMSTART; --p)
    {
        if (*p == RAM_PATTERN)
        {
            break;
        }
    }

    while ((p >= (unsigned char *)RAMSTART) && (p <= (unsigned char *)MAX_MEM_ADDR))
    {
        temp = p;

        if (*temp == RAM_PATTERN)
        {
            StackCount++;
            consecutive++;

            if (*(temp + 1) == RAM_PATTERN)
            {
                StackCount++;
                consecutive++;

                if (*(temp - 1) == RAM_PATTERN)
                {
                    StackCount++;
                    consecutive++;
                }
                else
                {
                    --p;
                    consecutive = 0;
                }
            }
            else
            {
                --p;
                consecutive = 0;
            }
        }
        else
        {
            --p;
            consecutive = 0;
        }

        if (consecutive == RAM_MIN_PATTERN)
        {
            break;
        }
    }

    return StackCount;
}



Stealing Proteus doesn't make you an engineer.
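For comparison, here is a simpler way to express the search described in the question above, written as a host-side function over a RAM image rather than live AVR code (the function name and array interface are illustrative assumptions). It scans down from the top for the first run of RAM_MIN_PATTERN consecutive pattern bytes and reports everything above that run as used stack. The posted GetStackByteCount() instead increments StackCount whenever it finds pattern bytes, so it counts matches rather than used bytes, which would explain a small result such as 5. Separately, the fill loop exits as soon as X equals 0x0100, so that lowest byte is never painted.

```c
#include <stddef.h>
#include <stdint.h>

#define RAM_PATTERN     0xAA
#define RAM_MIN_PATTERN 3

/* ram[0] is the lowest painted address, ram[len - 1] the top of RAM.
 * Returns the number of bytes above the highest run of RAM_MIN_PATTERN
 * consecutive pattern bytes, i.e. the estimated peak stack usage. */
size_t stack_used_from_top(const uint8_t *ram, size_t len)
{
    size_t consecutive = 0;

    for (size_t i = len; i-- > 0; ) {
        if (ram[i] == RAM_PATTERN) {
            if (++consecutive == RAM_MIN_PATTERN)
                return len - (i + RAM_MIN_PATTERN);  /* bytes above the run */
        } else {
            consecutive = 0;   /* a stray pattern byte does not count */
        }
    }
    return len;   /* no run found: assume everything was used */
}
```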


Thanks for this tutorial! Can this be used as-is on XMEGAs, or if not, what needs to change? This is going to be very useful in my app, as I need as much free SRAM as I can get for other buffers.

Mark.


The stack works the same way on XMEGA as on mega and tiny, so the same principles apply. But this is only a technique for monitoring usage; it won't magically free up SRAM. You do that by moving data to flash and EEPROM and reusing RAM when you can.


I understand the principles are the same, but does the code posted work unaltered for xmega parts? If not, what needs to change in the code?

I will use this to monitor and measure stack usage, add a little contingency just to be sure, and set the stack accordingly. Of course I'll use the standard techniques to reduce SRAM usage.


What do you mean by "set the stack"? GCC does that for you, setting it to RAMEND. The only requirement on your part is to ensure it never descends as low as the end of .bss (or .noinit if you use that).


clawson wrote:
What do you mean by "set the stack"? GCC does that for you, setting it to RAMEND. The only requirement on your part is to ensure it never descends as low as the end of .bss (or .noinit if you use that).

Yes, my bad. This is actually an LED display project and the more frame buffers I can allocate the better the system performance, so by 'set the stack' I actually meant make sure I don't allocate too many buffers such that the stack would over-write a buffer. The stack pointer would move down from the top of memory and my buffers would be allocated from the bottom up. I don't use the heap or malloc().

Mark.
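Mark's plan, measuring the peak stack, adding some contingency and then fitting as many frame buffers as possible, comes down to simple arithmetic. A sketch with hypothetical names; static_bytes means everything in .data/.bss/.noinit other than the buffers themselves, and the numbers in the test are made up.

```c
#include <stddef.h>

/* How many fixed-size buffers fit between the static data and the stack,
 * keeping `margin` spare bytes beyond the measured peak stack usage. */
size_t buffers_that_fit(size_t ram_size, size_t static_bytes,
                        size_t peak_stack, size_t margin, size_t buf_size)
{
    size_t reserved = static_bytes + peak_stack + margin;

    if (buf_size == 0 || reserved >= ram_size)
        return 0;   /* nothing left over (or a nonsensical buffer size) */
    return (ram_size - reserved) / buf_size;
}
```

For example, 8192 bytes of RAM with 1000 bytes of other static data, a measured peak stack of 600 bytes and a 100-byte margin leaves 6492 bytes, enough for twelve 512-byte buffers.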


Nice work, I've pasted it into my project and it worked without a glitch. I'm trying to find out if my stack overflows due to nested interrupts. Hope this will help.

Kind regards.


_end should be __bss_end, according to http://www.nongnu.org/avr-libc/u.... I found that in my code it only produced meaningful results if I used __bss_end. I am using an XMEGA if it makes any difference, I have not tested it with MEGA/TINY. I don't use any memory allocation functions.

Thanks for this really useful bit of code.


Just to note that:

void StackPaint(void) __attribute__ ((naked)) __attribute__ ((section (".init1")));

void StackPaint(void)
{ 

can be written more simply as:

__attribute__ ((naked,section (".init1")))
void StackPaint(void)
{ 

Also note that .init3 is probably more appropriate for this than .init1, since by then .init2 has cleared __zero_reg__ and set up the stack pointer.


Sorry for the dumb question, but how should I use this StackPaint()? If I call it at the beginning of main(), it executes, then main() gets called again as a subroutine, and so on, so I end up in an infinite loop. What am I doing wrong?


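Don't call it manually. The special ".init1" section that Cliff mentions makes the function link into the C startup code, so it runs automatically at power-on as part of the C startup. Because it is also naked, it has no return instruction, so calling it from main() just falls through into the rest of the startup code, which ends up calling main() again.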

- Dean :twisted:

Make Atmel Studio better with my free extensions. Open source and feedback welcome!


@MichaelMcTernan

Thank you for this useful piece of code and the clear instructions. Six years after your post, you will be pleased to know that the information you provided is still relevant for some applications :)

I am using an ATmega2560 and plan to use your code in my project, but there is one thing I am not too clear about.

In your code you have mentioned that __end and __stack need to be defined at the linker level for the monitoring application to work as expected.

The way I have done this is via the AVR Studio v4.19 as follows:

1 - Go to Project Options > Memory Settings

2 - Under "Stack Settings" tick the "Specify Initial Stack Address" and specify the setting as 0x8021FF

3 - Then go to Project Options > Custom Options

4 - On the left window of the "Custom Compiler Options", locate [Linker Options] and left click it.

5 - On the window to the right following should be displayed already, which is one of the linker options that will be used during compilation:
-Wl,--defsym=__stack=0x8021FF

6 - Similarly, I added the following option on the same window which defines the __end at the linker level:
-Wl,--defsym=__end=0x800200

I then clean built my code and checked the generated map file and confirmed that the following appeared in that file:

Linker script and memory map
Address of section .data set to 0x800200
LOAD c:/winavr-20100110/bin/../lib/gcc/avr/4.3.3/../../../../avr/lib/avr6/crtm2560.o
0x008021ff __stack = 0x8021ff
0x00800200 __end = 0x800200

Does the above meet the criterion on the __stack and __end you have mentioned?

Your quick feedback will be greatly appreciated.

Regards...


Quote:

In your code you have mentioned that __end and __stack need to be defined at the linker level for the monitoring application to work as expected.

You have misunderstood. These symbols belong to the compiler/linker(*) (you can tell because the names start with two underscores), so they are not something you create; indeed you mustn't. They are generated by the linker and available to you for read-only access.

If you want to influence their value, use --section-start or change the linker script or something. But don't mess with the linker's own symbols.

(Just as a general note: whenever you see a symbol whose name starts with one underscore, it belongs to the C library. When you see one that starts with two underscores, it belongs to the toolchain. This convention is to prevent name pollution. Your code can have a symbol called delay_ms(), the library can have _delay_ms() and the compiler/linker can have __delay_ms(), and they won't clash. You should never create your own symbols that start with _ or __.)

(*) _end is from the linker script. In your case that will be avr6.x, the RAM layout is:

  .data	  : AT (ADDR (.text) + SIZEOF (.text))
  {
     PROVIDE (__data_start = .) ;
    /* --gc-sections will delete empty .data. This leads to wrong start
       addresses for subsequent sections because -Tdata= from the command
       line will have no effect, see PR13697.  Thus, keep .data  */
    KEEP (*(.data))
    *(.data*)
    *(.rodata)  /* We need to include .rodata here if gcc is used */
    *(.rodata*) /* with -fdata-sections.  */
    *(.gnu.linkonce.d*)
    . = ALIGN(2);
     _edata = . ;
     PROVIDE (__data_end = .) ;
  }  > data
  .bss   : AT (ADDR (.bss))
  {
     PROVIDE (__bss_start = .) ;
    *(.bss)
    *(.bss*)
    *(COMMON)
     PROVIDE (__bss_end = .) ;
  }  > data
   __data_load_start = LOADADDR(.data);
   __data_load_end = __data_load_start + SIZEOF(.data);
  /* Global data not cleared after reset.  */
  .noinit  :
  {
     PROVIDE (__noinit_start = .) ;
    *(.noinit*)
     PROVIDE (__noinit_end = .) ;
     _end = . ;
     PROVIDE (__heap_start = .) ;
  }  > data

Towards the end of that you can see that _end marks the end of the .noinit section.
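As a worked example of that layout, the sketch below computes where _end lands from the section sizes, following the avr6.x fragment: .data is padded to an even size by ". = ALIGN(2)", and .bss and .noinit follow directly. The function name is hypothetical, it ignores any alignment requirements of individual input sections, and the sizes in the test are invented. With an empty .noinit section, _end and __bss_end have the same value.

```c
#include <stdint.h>

/* Address of _end given the start of .data and the section sizes.
 * data_size is the .data contents before the ALIGN(2) padding. */
uint16_t end_symbol(uint16_t data_start, uint16_t data_size,
                    uint16_t bss_size, uint16_t noinit_size)
{
    /* . = ALIGN(2) at the end of .data: round up to an even address */
    uint16_t data_end = (uint16_t)((data_start + data_size + 1u) & ~1u);

    /* .bss and then .noinit follow immediately; _end = . after .noinit */
    return (uint16_t)(data_end + bss_size + noinit_size);
}
```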


I wrote an Atmel Studio 6.1 extension to automate this:

http://gallery.atmel.com/Product...?

- Dean :twisted:



I have just integrated the .c and .h files into my project and it all works very well. The device I am using is an ATmega2560, and I defined _end and __stack.

I use the LCD on my system to monitor the remaining stack in real-time and it looks like I currently have 1400 bytes of headroom (total RAM available is 8192 bytes in my device).

I will leave this stack monitoring on my system that will help me make RAM usage more efficient.

@MichaelMcTernan Thank you once again for this extremely useful contribution.

Regards...


@clawson

Thank you for your clarifications. I thought that with the following assignments I was merely assigning values rather than defining the symbols (though I did correct __end to _end; the double underscore was my mistake!).

-Wl,--defsym=__stack=0x8021FF
-Wl,--defsym=_end=0x800200

So to test what happens if I did not make this assignment, I just removed the _end assignment above. I left the __stack assignment, as it is provided as a valid user option in the linker configuration of AVR Studio 4.19.

The code still worked as needed which means internally the _end assignment is done correctly for my device.

I presume the following line is where the _end is assigned its initial value in the avr6.x file
_end = . ;

Since I am not very familiar with the structure of the linker script files, I am unable to comment further. I think I will need to take a few things for granted for the time being.

One thing I am not clear on, though, is where __stack is in the avr6.x file. The only reference to the keyword "stack" in that file is on the following line:

KEEP (*(.init1))
*(.init2) /* Clear __zero_reg__, set up stack pointer. */

Regards...


Quote:

One thing I am not clear on though is where is the __stack in the avr6.x file?

Sit back and think for a moment. Why does _end actually have one underscore and __stack two?

They are created/owned by different components. _end is a static symbol that is created when you link (usually the end address of .bss in fact as few people use .noinit).


This (thread subject) works! Thanks a lot!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I did not get the original code to work when compiling with Arduino IDE 1.5.7. The StackPaint function did not fire at all.

 

I modified it slightly, which instead requires that you run StackPaint() manually.

 

extern uint8_t _end;

void StackPaint(void)
{
    uint8_t *p = &_end;

    while(p < (uint8_t*)&p)
    {
        *p = 0xc5;
        p++;
    }
} 

uint16_t StackCount(void)
{
    const uint8_t *p = &_end;
    uint16_t       c = 0;

    while(*p == 0xc5 && p < (uint8_t*)&p)
    {
        p++;
        c++;
    }

    return c;
} 

 


which instead requires that you run StackPaint() manually.

How does that work? When you call StackPaint() isn't its return address obliterated?

 

The entire "trick" of this whole tutorial is the very fact that it's put into init1 and made "naked" that means:

 

a) it runs before .data and .bss are set up

b) it's just "fall through" code early in the CRT so no stack usage is made and it certainly is not CALL'd or RET'd from.

Last Edited: Mon. Sep 15, 2014 - 10:20 AM

clawson wrote:

How does that work? When you call StackPaint() isn't its return address obliterated?

 

The entire "trick" of this whole tutorial is the very fact that it's put into init1 and made "naked" that means:

 

a) it runs before .data and .bss are set up

b) it's just "fall through" code early in the CRT so no stack usage is made and it certainly is not CALL'd or RET'd from.

 

My modified StackPaint() function will work similarly to the original in that it starts at _end and works its way "up" toward the stack. It stops "painting" when it reaches the address of its own stack variable p. The return info in the stack is just above this point, and not touched.

 

I do not use malloc() at all so this should be safe.

 

Unfortunately the "naked" init1 "trick" did not work for me with Arduino IDE. I did not dive too deeply into the why at the time.

 

To make it malloc()-safe, you can replace _end with:

 

void StackPaint(void)
{
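    extern char *__brkval;     /* current heap top; 0 until malloc() is first called */
    extern char __heap_start;  /* first byte after .bss/.noinit */
    uint8_t *p = (__brkval == 0) ? (uint8_t *)&__heap_start : (uint8_t *)__brkval;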

    while(p < (uint8_t*)&p)
    {
        *p = 0xc5;
        p++;
    }
}
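The painting-below-the-current-frame idea also answers the earlier question about measuring a single function: paint the free area below the current stack position, call the function, then count. Below is a host-side simulation with a hypothetical sim_stack type whose sp moves downwards like the AVR stack pointer; on real hardware the upper limit would be the stack pointer or, as in the code above, the address of a local variable.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_CANARY 0xc5
#define SIM_RAM_SIZE 64

/* Simulated RAM plus a stack pointer that, like the AVR's, points at the
 * next free byte and moves downwards on every push. */
typedef struct {
    uint8_t ram[SIM_RAM_SIZE];
    size_t  sp;
} sim_stack;

void sim_push(sim_stack *s, uint8_t v) { s->ram[s->sp--] = v; }
void sim_pop(sim_stack *s)             { s->sp++; }

/* Paint from `bottom` (the end of program data) up to and including the
 * current stack pointer, leaving the live part of the stack untouched. */
void paint_free(sim_stack *s, size_t bottom)
{
    for (size_t i = bottom; i <= s->sp; i++)
        s->ram[i] = STACK_CANARY;
}

/* Bytes of stack used below start_sp since painting: the distance from
 * start_sp down to the lowest overwritten byte. */
size_t used_since_paint(const sim_stack *s, size_t bottom, size_t start_sp)
{
    size_t i = bottom;
    while (i <= start_sp && s->ram[i] == STACK_CANARY)
        i++;
    return start_sp + 1 - i;
}

/* Stand-in for the function being measured: 3 bytes deep at its peak. */
void measured_function(sim_stack *s)
{
    sim_push(s, 0x01);
    sim_push(s, 0x02);
    sim_push(s, 0x03);
    sim_pop(s);
    sim_pop(s);
    sim_pop(s);
}
```

Usage: record s.sp, call paint_free(), run the function, then call used_since_paint() with the recorded value.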

 


Oh I see you are starting beyond .bss and filling RAM to just below 'p' - I guess that if it's the last thing on the stack you should be OK. I thought you were doing what the originally presented routine was doing which would have obliterated the entire stack including the return address from StackPaint() (and .data and .bss).

 

I'd still explore why the .init1 stuff did not work in Arduino - there's no reason why it shouldn't work there.


I'm a bit late to this thread, almost 9 years on but thank you for this terrific piece of code, works brilliantly!


Is there a better/modern way to do this? It doesn't work at all with avr-gcc 5.3 xmega. I tried with _end and __bss_end... StackCount always returns 0 even with an empty main().


Post an LSS for the code that you are building.


Thanks! Got it working. The prototype for StackPaint() can't be in a header. Not that it needs to be but I guess gcc ain't that smart and even the gcc docs don't mention this?


Hello,

 

could someone help me translate the ASM code for the AVR32? The opcodes are not the same, and while I have a clear understanding of the memory model, my assembler is very rusty...

 

Just for reference:

This is the AVR8 ASM documentation: http://www.atmel.com/webdoc/avra...

This is the AVR32 Architecture manual, which contains a full description of the new ASM: http://www.atmel.com/Images/doc3...

(I wasn't able to find an equivalent web version of the ASM documentation, like they have for AVR8).

 

This is the original ASM from the first post:

#if 0
    uint8_t *p = &_end;
    while(p <= &__stack) {
        *p = STACK_CANARY;
        p++;
    }
#else
    __asm volatile ("    ldi r30,lo8(_end)\n"
                    "    ldi r31,hi8(_end)\n"
                    "    ldi r24,lo8(0xc5)\n" /* STACK_CANARY = 0xc5 */
                    "    ldi r25,hi8(__stack)\n"
                    "    rjmp .cmp\n"
                    ".loop:\n"
                    "    st Z+,r24\n"
                    ".cmp:\n"
                    "    cpi r30,lo8(__stack)\n"
                    "    cpc r31,r25\n"
                    "    brlo .loop\n"
                    "    breq .loop"::);
#endif

 

And to be honest, I don't really get how it will reach the same condition as expressed in the C while() loop. All I see is the comparison block:

cpi r30,lo8(__stack)
cpc r31,r25
brlo .loop
breq .loop

which just compares __stack and _end again and again, without really updating the comparison terms (in the C code, the p pointer gets updated), so how is this not an infinite loop? That would be a very nice first step :-)

 

After that, I need to find the equivalent instructions for those initial LDIs. In AVR32 that seems to be the MOV instruction, but I'm not sure why the documentation seems to imply that only immediates of either 8 or 21 bits can be moved... the registers are 32 bits wide, so I expected to load the whole addresses straight into one register (instead of dividing them into two parts as in the original code).

 

I think it would be a nice addition to have this code translated for AVR32, in conjunction with the original one for AVR8, so any help will be welcome!
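On the infinite-loop question above: the pointer does move, because st Z+,r24 post-increments the Z register pair (r31:r30) after every store, which is the asm counterpart of p++. The C transliteration below follows the same control flow, run against a byte array instead of real SRAM (the function name is hypothetical):

```c
#include <stdint.h>

/* C transliteration of the AVR8 inline asm, with a byte array standing in
 * for the data space and z playing the role of the Z register (r31:r30). */
void paint_like_asm(uint8_t *mem, uint16_t end, uint16_t stack, uint8_t canary)
{
    uint16_t z = end;          /* ldi r30/r31, lo8/hi8(_end) */
    goto cmp;                  /* rjmp .cmp */
loop:
    mem[z++] = canary;         /* st Z+,r24: store, THEN increment Z */
cmp:
    if (z < stack)             /* cpi/cpc, then brlo .loop */
        goto loop;
    if (z == stack)            /* breq .loop: so __stack itself is painted too */
        goto loop;
}
```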


But the code doesn't need to be written in Asm in the first place? Just do it in C. 


The original post didn't include all the relevant info, but it is included in the attached files. In the .c file, there is a comment saying so:

Fill the stack space with a known pattern.
This fills all stack bytes with the 'canary' pattern to allow stack usage
to be later estimated.  This runs in the .init1 section, before normal
stack initialisation and unfortunately before __zero_reg__ has been
setup.  The C code is therefore replaced with inline assembly to ensure
the zero reg is not used by compiled code.

 

So what I understand from that is that it is safer to write the code in assembly. In fact, I've just compiled the C code, and these are the instructions generated (from the LSS file):

80002004  lddpc R8, 0x80002018
80002006  lddpc R9, 0x8000201c
80002008  cp.w R9, R8
8000200A  retcc R12
8000200C  mov R8, R9
8000200E  mov R10, -59
80002010  lddpc R9, 0x80002018
80002012  st.b R8++ R10
80002014  cp.w R8, R9
80002016  brcs 0x80002012
80002018  add R1, R0
8000201A  add R0, R0
8000201C  add R0, R0

so it is true that the generated assembly makes use of the zero register.

 

In any case, based on what I've learned from the LSS file, I've been able to translate the original assembly to AVR32:

void StackPaint(void) __attribute__ ((naked)) __attribute__ ((section (".init")));
void StackPaint(void)
{
#if 0
    uint8_t *p = &_stack;
    while (p < &_estack) {
        *p = STACK_CANARY;
        ++p;
    }
#else
    __asm__ volatile (
        "    mov R8, lo(_stack)\n"  // p = &_stack
        "    orh R8, hi(_stack)\n"
        "    mov R9, lo(_estack)\n" // q = &_estack
        "    orh R9, hi(_estack)\n"
        "    mov R10, lo(0xC5)\n"   // v = STACK_CANARY = 0xC5
        "    rjmp .cmp\n"
        ".loop:\n"
        "    st.b R8++, R10\n"      // *(p++) = v
        ".cmp:\n"
        "    cp.w R8, R9\n"         // c = p < q
        "    brcs .loop"            // if (c) goto loop
        );
#endif
}

 

I think this would work, but I'm not able to try it! The AVR32 start-up process doesn't seem to have the old .init1 or .init3 sections; instead there is (among others) a single .init section, which is unused and totally ignored by the start-up code in trampoline_uc3.S, which comes with all ASF projects.

This file contains the following code:

.section  .reset, "ax", @progbits


  .global _trampoline
  .type _trampoline, @function
_trampoline:
  // Jump to program start.
  rjmp    program_start

  .org  PROGRAM_START_OFFSET
program_start:
  // Jump to the C runtime startup routine.
  lda.w   pc, _stext

and the code in _stext is similar to the old .init2: it initializes the stack and other parameters (such as interrupts, exceptions, .bss, etc.).

 

So how to make the .init section to be executed at startup?


Good news:

I've been able to run the code and it works perfectly. The not-so-good news is that, to do so, I had to manually edit one of the start-up files that Atmel Studio automatically generates for the project...

 

Canary function for AVR32 UC3:

/* Fill the stack space with a known pattern.
 * This fills all stack bytes with the 'canary' pattern to later allow
 * estimation of maximum stack usage.
 * This runs from the .init section, before normal stack initialization
 * and unfortunately before __zero_reg__ has been setup. The C code is
 * therefore replaced with inline assembly to ensure the zero reg is not
 * used by compiled code.
 *
 * This function must be called before the .text._stext section, which is
 * where all the stack initialization code resides. To accomplish that,
 * the start-up trampoline file (trampoline_uc3.S) must be modified, and
 * this function will be the one in charge of jumping to .text._stext after
 * finishing.
 *
 * Edit trampoline_uc3.S so it reads like this:
 * program_start:
 *     rjmp mem_canary_init
 *
 * To check that this function is working, just use the Debug mode called
 * "Start Debugging and Break", and open a Memory View on the address
 * 0x0000_FFFF (end of the SRAM space), to see that the canary value has
 * been written.
 */
void mem_canary_init(void) __attribute__((naked, section(".init")));
void mem_canary_init(void)
{
#if 0
    uint8_t* p = &_stack;
    while (p < &_estack) {
        *p = STACK_CANARY;
        ++p;
    }
#else
    __asm__ volatile (
        "    mov R8, lo(_stack)\n"  // p = &_stack
        "    orh R8, hi(_stack)\n"
        "    mov R9, lo(_estack)\n" // q = &_estack
        "    orh R9, hi(_estack)\n"
        "    mov R10, lo(0xC5)\n"   // v = STACK_CANARY = 0xC5
        "    rjmp .cmp\n"
        ".loop:\n"
        "    st.b R8++, R10\n"      // *(p++) = v
        ".cmp:\n"
        "    cp.w R8, R9\n"         // c = p < q
        "    brcs .loop\n"          // if (c) goto loop
        "    lda.w pc, _stext"      // Jump to the C runtime startup routine
        );            
#endif
}

 

Modified trampoline file (src\ASF\avr32\utils\startup\trampoline_uc3.S):

  // This must be linked @ 0x80000000 if it is to be run upon reset.
  .section  .reset, "ax", @progbits


  .global _trampoline
  .type _trampoline, @function
_trampoline:
  // Jump to program start.
  rjmp    program_start

  .org  PROGRAM_START_OFFSET
program_start:
  // Jump to the C runtime startup routine.
  //lda.w pc, _stext    // <----------- CHANGED HERE
  rjmp mem_canary_init  // <-----------|

 

 

Now it works flawlessly, but there is something to improve: I'd love to avoid having to modify auto-generated files. The beauty of the original solution on the first post is that it works as-is by just placing that function in any .c file. It just inserts itself into the .init1 section, which is already run by default.

 

I'd like to achieve the same for this function. Could anyone throw a comment on this?


I just picked an AVR32 (admittedly an old installation!) linker script file at random and it contains this:

  .interp         : { *(.interp) } >FLASH AT>FLASH
  .reset : {  *(.reset) } >FLASH AT>FLASH

...

  .init           :
  {
    KEEP (*(.init))
  } >FLASH AT>FLASH =0xd703d703
  .plt            : { *(.plt) } >FLASH AT>FLASH
  .text           :
  {
    *(.text .stub .text.* .gnu.linkonce.t.*)
    KEEP (*(.text.*personality*))
    /* .gnu.warning sections are handled specially by elf32.em.  */
    *(.gnu.warning)
  } >FLASH AT>FLASH =0xd703d703
  .fini           :
  {
    KEEP (*(.fini))
  } >FLASH AT>FLASH =0xd703d703

So just as you use .init1 or .init3 in the AVR8 case to insert code "before" .text, can you not define your code in a .canary.init section or similar to have it linked before the .text stuff? What you are doing at present is simply using the fact that the trampoline code appears to be in section ".reset".

 

Presumably the use of .init and .fini sections in AVR32 is documented somewhere?


Update (Studio 7): the current end of the heap, as set by malloc(), is held in __brkval.

Modifying StackCount as follows:

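extern uint8_t *__brkval;     /* heap top; stays 0 until malloc() is first called */
extern uint8_t __heap_start;  /* fallback when the heap has not been used yet */

uint16_t StackCount(void)
{
  const uint8_t *p = (__brkval != 0) ? __brkval : &__heap_start;
  uint16_t       c = 0;

  while(p <= &__stack && *p == STACK_CANARY)
  {
    p++;
    c++;
  }

  return c;
}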

After a call to malloc() or calloc(), this counts the untouched canary bytes between the current top of the heap and the top of the stack.
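__brkval stays 0 until malloc() has been called for the first time, so a count that starts at __brkval needs a fallback to __heap_start for that case (the same fallback used in the Arduino variant earlier in the thread). A host-side simulation, with array indices standing in for addresses and a hypothetical function name:

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_CANARY 0xc5

/* heap_start: index of __heap_start; brkval: current heap top, or 0 if
 * malloc() has never been called (mirroring avr-libc's __brkval). */
size_t count_from_heap_top(const uint8_t *ram, size_t heap_start,
                           size_t brkval, size_t top)
{
    size_t p = (brkval == 0) ? heap_start : brkval;
    size_t c = 0;

    while (p <= top && ram[p] == STACK_CANARY) {   /* bound first */
        p++;
        c++;
    }
    return c;
}
```

The test shows why the start point matters: once malloc() has handed out bytes, counting from __heap_start returns 0, while counting from the heap top does not.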


I use __heap_start for avr-gcc.  Hard to go wrong.  Works fine for me.

 

I don't use malloc because I don't have an overwhelming desire to churn the heap.  You can allocate heap memory with a few lines of code as long as you don't want to churn the heap. That is, you don't use free().
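A minimal sketch of such a no-free() allocator, assuming a fixed pool array rather than the memory above __heap_start (the pool size and names are hypothetical). There is no alignment handling, which is acceptable on 8-bit AVR where objects only need byte alignment, but would need adding on other targets. Because nothing is ever returned, the amount allocated only grows, so the heap's high-water mark is simply pool_used.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 256

static uint8_t pool[POOL_SIZE];
static size_t  pool_used;   /* bytes handed out so far; never decreases */

/* Hand out n bytes from the pool; returns NULL when it is exhausted.
 * There is deliberately no matching free(). */
void *bump_alloc(size_t n)
{
    if (n > POOL_SIZE - pool_used)
        return NULL;

    void *p = &pool[pool_used];
    pool_used += n;
    return p;
}
```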

 

I can see my RAM usage on the PC.  I rarely look at it because my Xmega has much more RAM than I've ever used. 

I have a habit of displaying numbers in hex. That's what the computer uses, and I don't want to overwork the computer ;-) In this case I decided to display the free RAM in decimal too.