Potentially dangerous optimization


I ran into a problem trying to run the "Etherrape" software (I didn't choose the name) on an ATmega324P, which has less memory than the ATmega644 the software normally runs on. As a result, there isn't much stack space available, and the program behaved very erratically until I figured out what was happening.

WinAVR 20071221 applies an optimization (at every level except -O0) that accumulates "stack corrections" and performs the actual correction only at the end of a region of straight-line code. This shows up with the "printf_P" function, which takes its parameters on the stack rather than in registers. If you make a sequence of printf_P calls, each requiring at least a 2-byte correction to the Stack Pointer (SP) after the call, none of those corrections is done until the end, even if other (external) functions are called in between. With 8 calls to printf_P, a single 16-byte correction is made at the end. The generated code is correct, but the program has up to 16 bytes less stack space available until the correction happens, which may make the difference between running and crashing when SRAM is tight.
If you make printf_P calls with many arguments, the situation gets a lot worse.
Maybe there is a hard-to-find optimization option that changes this behavior. Please let me know if there is.
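
To put rough numbers on this, here is a minimal host-side C sketch of the arithmetic. The function names are made up for illustration. It assumes that every argument of a variadic call goes on the stack, that pointers and ints are 2 bytes each, and that nothing forces an earlier correction inside the region.

```c
#include <stddef.h>

/* Assumption: a variadic call passes all arguments on the stack, and
   both a pointer and an int occupy 2 bytes on this target. */
enum { AVR_PTR_BYTES = 2, AVR_INT_BYTES = 2 };

/* Bytes pushed for one printf_P call: the format-string pointer plus
   any int-sized arguments. */
size_t printf_call_bytes(size_t int_args)
{
    return AVR_PTR_BYTES + int_args * AVR_INT_BYTES;
}

/* Stack bytes still awaiting correction after `calls` consecutive calls,
   if every pop is deferred to the end of the straight-line region. */
size_t deferred_stack_bytes(size_t calls, size_t int_args)
{
    return calls * printf_call_bytes(int_args);
}
```

With no extra arguments, deferred_stack_bytes(8, 0) is 16, the figure quoted above. Eight calls with three int arguments each would hold back 64 bytes.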

Here is a test program:

#include <avr/io.h>
#include <avr/pgmspace.h>
#include <stdint.h>
#include <stdio.h>

/* use 115200 baud at 20 MHz */
#define DEBUG_UART_UBRR 10

#define noinline __attribute__((noinline))
#define HI8(x)  ((uint8_t)((x) >> 8))
#define LO8(x)  ((uint8_t)(x))

void DEBUG_INIT_UART(void);
int noinline debug_uart_put(char d, FILE *stream);

void noinline foo(void);

int main(void)
{

	DEBUG_INIT_UART();

	printf_P(PSTR("hello world\n"));
	printf_P(PSTR("hello world\n"));
	printf_P(PSTR("hello world\n"));
	printf_P(PSTR("hello world\n"));
	foo();
	printf_P(PSTR("hello world\n"));
	printf_P(PSTR("hello world\n"));
	printf_P(PSTR("hello world\n"));
	printf_P(PSTR("hello world\n"));

	return (1);
}

void DEBUG_INIT_UART()
/* {{{ */ {

    /* set baud rate */
    UBRR0H = HI8(DEBUG_UART_UBRR);
    UBRR0L = LO8(DEBUG_UART_UBRR);

    /* set mode */
    UCSR0C = _BV(UCSZ00) | _BV(UCSZ01);

    /* enable transmitter and receiver */
    UCSR0B = _BV(TXEN0) | _BV(RXEN0);

    /* open stdout/stderr */
    fdevopen(debug_uart_put, NULL);

} /* }}} */

int noinline debug_uart_put(char d, FILE *stream)
/* {{{ */ {

    if (d == '\n')
        debug_uart_put('\r', stream);

    /* wait until the transmit data register is empty (UDRE0 is the flag in UCSR0A) */
    while (!(UCSR0A & _BV(UDRE0)));
    UDR0 = d;

    return 0;

} /* }}} */

Here is a disassembly listing (relevant part only):

 1a6:	0e 94 f8 00 	call	0x1f0	; 0x1f0 
 1aa:	83 eb       	ldi	r24, 0xB3	; 179
 1ac:	90 e0       	ldi	r25, 0x00	; 0
 1ae:	9f 93       	push	r25
 1b0:	8f 93       	push	r24
 1b2:	0e 94 47 01 	call	0x28e	; 0x28e 
 1b6:	86 ea       	ldi	r24, 0xA6	; 166
 1b8:	90 e0       	ldi	r25, 0x00	; 0
 1ba:	9f 93       	push	r25
 1bc:	8f 93       	push	r24
 1be:	0e 94 47 01 	call	0x28e	; 0x28e 
 1c2:	89 e9       	ldi	r24, 0x99	; 153
 1c4:	90 e0       	ldi	r25, 0x00	; 0
 1c6:	9f 93       	push	r25
 1c8:	8f 93       	push	r24
 1ca:	0e 94 47 01 	call	0x28e	; 0x28e 
 1ce:	8c e8       	ldi	r24, 0x8C	; 140
 1d0:	90 e0       	ldi	r25, 0x00	; 0
 1d2:	9f 93       	push	r25
 1d4:	8f 93       	push	r24
 1d6:	0e 94 47 01 	call	0x28e	; 0x28e 
 1da:	8d b7       	in	r24, 0x3d	; 61 SPL
 1dc:	9e b7       	in	r25, 0x3e	; 62 SPH
 1de:	40 96       	adiw	r24, 0x10	; 16
 1e0:	0f b6       	in	r0, 0x3f	; 63 SREG
 1e2:	f8 94       	cli
 1e4:	9e bf       	out	0x3e, r25	; 62 SPH
 1e6:	0f be       	out	0x3f, r0	; 63 SREG
 1e8:	8d bf       	out	0x3d, r24	; 61 SPL

It's also interesting to see that the SREG register is restored (possibly re-enabling interrupts) before the SPL value is restored. I hope no immediate interrupt can happen since the SP value would be invalid.

jrseattle
oscilloscopeclock.com


One instruction is guaranteed to execute after the instruction that enables interrupts, so SP will be valid.
Dave Raymond


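Try -fno-defer-pop. It makes GCC pop the arguments of each function call as soon as that call returns, instead of accumulating the pops.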

Iluvatar is the better part of Valar.


Quote:
It's also interesting to see that the SREG register is restored (possibly re-enabling interrupts) before the SPL value is restored.

That ordering is harmless. At least one instruction is guaranteed to run after the global interrupt flag is set, so the write to SPL completes before any interrupt can be taken. From the instruction set datasheet:
Quote:
The instruction following the SEI will be executed before any pending interrupts.

Quote:
If you do printf_P calls with many arguments, the situation gets a lot worse.

Possibly not too much worse. ADIW can only add up to 63. We would have to see what would happen when that limit was exceeded. It is possible that the optimizer would do multiple adjustments in that case.
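
As a quick check of that limit, here is a small host-side C sketch (the name fits_single_adiw is made up for illustration). It only tests whether a given correction fits the 0..63 immediate of a single ADIW. It makes no claim about what the compiler emits for larger values.

```c
#include <stdbool.h>

/* ADIW encodes its constant in 6 bits, so the immediate range is 0..63. */
#define ADIW_MAX_IMMEDIATE 63u

/* True if a stack correction of `adjust_bytes` fits in one ADIW. */
bool fits_single_adiw(unsigned adjust_bytes)
{
    return adjust_bytes <= ADIW_MAX_IMMEDIATE;
}
```

The 16-byte correction in the listing fits easily. A deferred correction of 64 bytes or more would not fit in one ADIW.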

Regards,
Steve A.

The Board helps those that help themselves.