How can I tell GCC to use STD instead of STS?


I'm using the standard WinAVR makefile (-Os) and WinAVR20080512.

Question 1:
I have a small application that uses struct data (located in SRAM) which needs to be reinitialized periodically:

typedef struct
{
    uint16_t max;
    uint16_t upper_warning_limit;
    uint32_t upper_warning_count;
    uint16_t min;
    uint16_t lower_warning_limit;
    uint32_t lower_warning_count;
    uint32_t sum;
}
pulse_data_t;


struct input_capture_data_t
{
    uint32_t pulse_count;
    
    pulse_data_t high;
    pulse_data_t low;
}
input_capture;


input_capture.pulse_count = 0;

input_capture.high.max = 0;
input_capture.high.upper_warning_count = 0;
input_capture.high.min = 0xFFFF;
input_capture.high.lower_warning_count = 0;
input_capture.high.sum = 0;

input_capture.low.max = 0;
input_capture.low.upper_warning_count = 0;
input_capture.low.min = 0xFFFF;
input_capture.low.lower_warning_count = 0;
input_capture.low.sum = 0;

I'm compiling with -Os, so I expected GCC to load a pointer once and then access the SRAM with STD instructions, which take only 2 bytes of program memory each, whereas each STS instruction takes 4 bytes. But it doesn't.

So I declared a pointer and accessed the struct via the pointer:

struct input_capture_data_t *input_capture_ptr = &input_capture;

input_capture_ptr->pulse_count = 0;

input_capture_ptr->high.max = 0;
input_capture_ptr->high.upper_warning_count = 0;
input_capture_ptr->high.min = 0xFFFF;
input_capture_ptr->high.lower_warning_count = 0;
input_capture_ptr->high.sum = 0;

input_capture_ptr->low.max = 0;
input_capture_ptr->low.upper_warning_count = 0;
input_capture_ptr->low.min = 0xFFFF;
input_capture_ptr->low.lower_warning_count = 0;
input_capture_ptr->low.sum = 0;

But GCC still prefers the STS instruction. Next, I moved the code into a function declared with the noinline attribute that takes the struct pointer as a parameter. Now GCC does use the STD instruction, but I pay for it with an RCALL + MOVW + RET overhead.
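The noinline variant described above might look roughly like the following. This is a sketch only: the function name reset_input_capture is an assumption (the post does not name it), and whether the compiler emits STD for the body depends on the GCC version and options.

```c
#include <stdint.h>

typedef struct {
    uint16_t max;
    uint16_t upper_warning_limit;
    uint32_t upper_warning_count;
    uint16_t min;
    uint16_t lower_warning_limit;
    uint32_t lower_warning_count;
    uint32_t sum;
} pulse_data_t;

struct input_capture_data_t {
    uint32_t pulse_count;
    pulse_data_t high;
    pulse_data_t low;
} input_capture;

/* Hypothetical name. noinline stops GCC from inlining the body at the
   call site, where it could turn the pointer back into the absolute
   address of input_capture. The two warning limits are left untouched. */
__attribute__((noinline))
void reset_input_capture(struct input_capture_data_t *ic)
{
    ic->pulse_count = 0;

    ic->high.max = 0;
    ic->high.upper_warning_count = 0;
    ic->high.min = 0xFFFF;
    ic->high.lower_warning_count = 0;
    ic->high.sum = 0;

    ic->low.max = 0;
    ic->low.upper_warning_count = 0;
    ic->low.min = 0xFFFF;
    ic->low.lower_warning_count = 0;
    ic->low.sum = 0;
}
```

It would be called as reset_input_capture(&input_capture); the call, the argument setup and the return are where the RCALL + MOVW + RET overhead comes from.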

Is there an easy way to tell GCC to use STD instead of STS? I want to fit a lot of stuff into an ATtiny2313, so I'm happy about every byte I can save.

Question 2:
Another thing I have noticed is that GCC doesn't take advantage of the r1 register whenever possible:

pulse_data_ptr->sum += input_capture_value;

sum is a 32-bit unsigned integer and input_capture_value is a 16-bit unsigned integer. The code above is compiled to:

288:	a0 e0       	ldi	r26, 0x00	; 0
28a:	b0 e0       	ldi	r27, 0x00	; 0
28c:	28 89       	ldd	r18, Y+16	; 0x10
28e:	39 89       	ldd	r19, Y+17	; 0x11
290:	4a 89       	ldd	r20, Y+18	; 0x12
292:	5b 89       	ldd	r21, Y+19	; 0x13
294:	82 0f       	add	r24, r18
296:	93 1f       	adc	r25, r19
298:	a4 1f       	adc	r26, r20
29a:	b5 1f       	adc	r27, r21
29c:	88 8b       	std	Y+16, r24	; 0x10
29e:	99 8b       	std	Y+17, r25	; 0x11
2a0:	aa 8b       	std	Y+18, r26	; 0x12
2a2:	bb 8b       	std	Y+19, r27	; 0x13

But the first two instructions aren't necessary. GCC could have simply compiled the code to:

28c:	28 89       	ldd	r18, Y+16	; 0x10
28e:	39 89       	ldd	r19, Y+17	; 0x11
290:	4a 89       	ldd	r20, Y+18	; 0x12
292:	5b 89       	ldd	r21, Y+19	; 0x13
294:	28 0f       	add	r18, r24
296:	39 1f       	adc	r19, r25
298:	41 1d       	adc	r20, r1
29a:	51 1d       	adc	r21, r1
29c:	28 8b       	std	Y+16, r18	; 0x10
29e:	39 8b       	std	Y+17, r19	; 0x11
2a0:	4a 8b       	std	Y+18, r20	; 0x12
2a2:	5b 8b       	std	Y+19, r21	; 0x13

In this case that saves 4 bytes of program memory plus 2 registers, which might then not even need to be pushed onto the stack when the corresponding function is called.
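For reference, the C-level operation behind that listing can be sketched as below. The type pulse_sum_t and the function accumulate are hypothetical stand-ins, not names from my code. The point is that the 16-bit operand is zero-extended before the 32-bit add, so its two upper bytes are always zero, which is exactly what avr-gcc's fixed zero register r1 already holds.

```c
#include <stdint.h>

/* Hypothetical stand-in for the struct that holds the 32-bit sum. */
typedef struct {
    uint32_t sum;
} pulse_sum_t;

/* Hypothetical helper: adds a 16-bit capture value to the 32-bit sum. */
void accumulate(pulse_sum_t *p, uint16_t input_capture_value)
{
    /* The cast only makes the implicit zero extension visible:
       bytes 2 and 3 of the widened operand are always 0, so only
       the carry out of the low 16 bits reaches the upper half. */
    p->sum += (uint32_t)input_capture_value;
}
```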

Are any makefile hacks necessary to enable these optimizations?

Regards
Sebastian


-hO2a (hand optimized to assembly)

I think you could wipe out the struct, then put in the non-zero data:

uint8_t i = sizeof(input_capture);
uint8_t *ip = (uint8_t *)&input_capture;
while (i--) {
    *ip++ = 0;   // clear the whole struct byte by byte
}
input_capture.low.min = 0xFFFF;
input_capture.high.min = 0xFFFF;
// the other warning limits here

Quote:
-hO2a (hand optimized to assembly)
:lol: :lol: :lol:

Quote:

uint8_t i = sizeof(input_capture);
uint8_t *ip = (uint8_t *)&input_capture;
while (i--) {
    *ip++ = 0;
}
input_capture.low.min = 0xFFFF;
input_capture.high.min = 0xFFFF;
// the other warning limits here

Yes, I thought about that, too. But the struct also contains the members high.upper_warning_limit, low.upper_warning_limit, high.lower_warning_limit and low.lower_warning_limit, which mustn't be changed at this point in the program flow.

Regards
Sebastian


This matter was previously discussed here

https://www.avrfreaks.net/index.p...

Regards. Carlos.


Quote:

#define FIX_POINTER(_ptr) __asm__ __volatile__("" : "=b" (_ptr) : "0" (_ptr))

That did the trick. Many thanks Carlos!!!

Regards
Sebastian
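For anyone finding this thread later, here is a rough sketch of how the macro can be applied to the reset code from the first post. The function name reset_input_capture is an assumption, and so is the non-AVR fallback branch, which exists only so the snippet also builds on a PC. On AVR the "b" constraint asks for a base pointer register pair (Y or Z), and the empty asm hides the pointer's value from the optimizer, so it should no longer fold each access back into an absolute address. Whether STD is actually emitted still depends on the compiler version, so check the listing.

```c
#include <stdint.h>

typedef struct {
    uint16_t max;
    uint16_t upper_warning_limit;
    uint32_t upper_warning_count;
    uint16_t min;
    uint16_t lower_warning_limit;
    uint32_t lower_warning_count;
    uint32_t sum;
} pulse_data_t;

struct input_capture_data_t {
    uint32_t pulse_count;
    pulse_data_t high;
    pulse_data_t low;
} input_capture;

#if defined(__AVR__)
/* Macro as quoted above: "b" = base pointer register pair on AVR. */
#define FIX_POINTER(_ptr) __asm__ __volatile__("" : "=b" (_ptr) : "0" (_ptr))
#else
/* Assumption: generic register constraint for non-AVR test builds. */
#define FIX_POINTER(_ptr) __asm__ __volatile__("" : "=r" (_ptr) : "0" (_ptr))
#endif

/* Hypothetical name; the body may be inlined, no noinline needed. */
void reset_input_capture(void)
{
    struct input_capture_data_t *p = &input_capture;
    FIX_POINTER(p);   /* from here on the compiler treats p as unknown */

    p->pulse_count = 0;

    p->high.max = 0;
    p->high.upper_warning_count = 0;
    p->high.min = 0xFFFF;
    p->high.lower_warning_count = 0;
    p->high.sum = 0;

    p->low.max = 0;
    p->low.upper_warning_count = 0;
    p->low.min = 0xFFFF;
    p->low.lower_warning_count = 0;
    p->low.sum = 0;
}
```

Compared with the noinline function, this keeps the code at the call site, so there is no RCALL/RET; the cost is the two instructions that load the pointer.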