GP Register Access Optimization

In C++ ( gcc 4.3.3 ) I am working on a big, ugly, somewhat ridiculous source library; that is going well. At its very core, however, is one template class that wraps access to virtual registers ( and, as part of its job, physical registers ). The class is, again, working well and doing what it is expected to do; no problem there either. The problem, such as it is, lies in the optimization of the accesses. I have, more or less, put to rest my fear that the missed optimizations are a result of using C++: the problem does indeed appear to be internal to gcc and its optimizer.

To wit, I have pulled the basic functionality out of the class to create this test code:

#include <avr/io.h>
#include <stdint.h>

int main(void) __attribute__((OS_main));

static inline uint8_t set_r18( const uint8_t& _value, const bool _volatile = false ) {
	// Local register variable in r18, used as the asm output operand
	register uint8_t return_value asm("r18");
	if( _volatile ) asm volatile("mov %0, %1" : "=r" (return_value) : "r" (_value) );
	else asm("mov %0, %1" : "=r" (return_value) : "r" (_value) );
	return return_value;
}

static inline uint8_t get_r18( const bool _volatile = false ) {
	// Local register variable in r18, used as the asm input operand
	register uint8_t reg_value asm("r18");
	uint8_t return_value;
	if( _volatile ) asm volatile("mov %0, %1" : "=r" (return_value) : "r" (reg_value) );
	else asm("mov %0, %1" : "=r" (return_value) : "r" (reg_value) );
	return return_value;
}

static inline const uint8_t& set_r18_mem( const uint8_t& _value, const bool _volatile = false ) {
	// r18 is also mapped into the data space at address 0x12
	if ( _volatile ){
		*(( volatile uint8_t * const ) 0x12) = _value;
	}
	else{
		*(( uint8_t * const ) 0x12) = _value;
	}

	return _value;
}

static inline const uint8_t& get_r18_mem( const bool _volatile = false ) {
	// Both branches are currently identical: volatile reads are not implemented yet
	if( _volatile ) return ( const uint8_t & ) (*(( uint8_t const * const ) 0x12));
	else return ( const uint8_t & ) (*(( uint8_t const * const ) 0x12));
}

int main(){
	DDRB = 0xFF;
	set_r18( 0x03, false );
	PORTB = get_r18( false );

	set_r18_mem( 0x03, false );
	PORTB = get_r18_mem( false );

	while( true );
}
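The `_volatile` flag in the functions above is a runtime parameter, so the unused branch only disappears when the call is inlined with a constant argument. A minimal sketch of the same get/set pair with the address and the volatility choice as template parameters, so both are fixed at compile time. `Reg`, `R18`, `R18v` and `fake_data_space` are hypothetical names, not the author's class, and a plain array stands in for the AVR data space so the sketch runs on a host; it says nothing about what code avr-gcc would generate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side stand-in for the start of the AVR data space.
// On classic AVRs r0..r31 are mapped at data addresses 0x00..0x1F, so r18 is 0x12.
static std::uint8_t fake_data_space[0x60];

// Hypothetical wrapper: address and volatility are template parameters,
// so the volatile/non-volatile choice never exists at run time.
template <std::size_t Addr, bool Volatile = false>
struct Reg {
    static_assert(Addr < sizeof fake_data_space, "address outside the simulated space");

    static void set(std::uint8_t value) {
        if (Volatile) {
            volatile std::uint8_t& cell = fake_data_space[Addr];
            cell = value;                  // every write is performed
        } else {
            fake_data_space[Addr] = value; // optimizer may merge or drop writes
        }
    }

    static std::uint8_t get() {
        if (Volatile) {
            volatile std::uint8_t& cell = fake_data_space[Addr];
            return cell;                   // every read is performed
        }
        return fake_data_space[Addr];
    }
};

using R18  = Reg<0x12>;        // non-volatile view of r18's data address
using R18v = Reg<0x12, true>;  // volatile view of the same address
```

On the target, `fake_data_space[Addr]` would be replaced by a cast of the real address, as in `set_r18_mem`; whether gcc then emits the desired register instructions is the open question of this thread and is not demonstrated by the sketch.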

The command line options:

CPP_FLAGS="-O2 -ffunction-sections -fno-exceptions -std=c++0x -fno-inline-small-functions
-funsigned-char -funsigned-bitfields -fshort-enums -fno-split-wide-types -fno-tree-scev-cprop -ffreestanding"
LINK_FLAGS="-Wl,--gc-sections,--relax"

The code has two ways to get and set a value in r18, both tested in main: inline asm on the register itself, and a pointer to r18's data-space address ( 0x12 ). At some point I will need to be able to control the volatility of the access, but for now I'd be happy just to get them optimized as non-volatile. This is what I am getting in the lss file ( after quite a bit of cleanup ):

0000005e :
// DDRB = 0xFF
  5e:  8f ef        ldi  r24, 0xFF  ; 255
  60:  87 bb        out  0x17, r24  ; 23
// set_r18( 0x03 );
  62:  83 e0        ldi  r24, 0x03  ; 3
  64:  28 2f        mov  r18, r24
// PORTB = get_r18();
  66:  22 2f        mov  r18, r18
  68:  28 bb        out  0x18, r18  ; 24
// set_r18_mem( 0x03 );
// * ldi r24, 0x03 - this is required here too
  6a:  80 93 12 00  sts  0x0012, r24
// PORTB = get_r18_mem();
  6e:  88 bb        out  0x18, r24  ; 24

// Desired set_r18( 0x03 );
                    ldi  r18, 0x03
// Desired PORTB = get_r18();
                    out  0x18, r18

The listing ends with what I would like to see as the optimized implementation. Ideally, anyway, all of this would use the GP register instructions. Using inline asm I am able to use the register instructions but, as expected, the optimizer is stalled and there are some duplicate ( or at least, from an optimization point of view, redundant ) instructions: a total cost of four clock cycles. The pointer-based version uses a direct data-space store ( sts ) and also costs four cycles in total. Because there is no inline asm, the optimizer is able to see the second read ( the equivalent of mov r18, r18 ) and remove it. The optimal code would take a mere two cycles. However, to get there the optimizer would need to recognize that the address lies within the GP register space and that a register instruction can be used in place of the sts.
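The cycle arithmetic above can be tallied mechanically. A minimal sketch, assuming a classic ATmega-class core where LDI, MOV and OUT take one cycle and STS takes two; `cycles`, `total`, `percent_saved` and the three sequence arrays are hypothetical names used only for illustration, and the shared `ldi r24, 0x03` is counted in both existing versions, as in the post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Cycle counts for a classic ATmega-class core, per the AVR instruction set
// manual. Only the mnemonics that appear in the listing are covered.
int cycles(const char* mnemonic) {
    if (std::strcmp(mnemonic, "ldi") == 0) return 1;
    if (std::strcmp(mnemonic, "mov") == 0) return 1;
    if (std::strcmp(mnemonic, "out") == 0) return 1;
    if (std::strcmp(mnemonic, "sts") == 0) return 2;  // 32-bit STS
    assert(false && "mnemonic not covered by this sketch");
    return 0;
}

// Sum the cycles of a fixed-length instruction sequence.
template <std::size_t N>
int total(const char* const (&seq)[N]) {
    int sum = 0;
    for (std::size_t i = 0; i < N; ++i) {
        sum += cycles(seq[i]);
    }
    return sum;
}

// Instruction sequences from the listing.
const char* const inline_asm_version[] = {"ldi", "mov", "mov", "out"};
const char* const pointer_version[]    = {"ldi", "sts", "out"};
const char* const desired_version[]    = {"ldi", "out"};

// Integer percentage saved when going from `before` to `after` cycles.
int percent_saved(int before, int after) {
    return 100 * (before - after) / before;
}
```

Both existing versions come to four cycles and the desired sequence to two, which is where the 50% figure comes from.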

Does anyone have any ideas about how to convince the optimizer to do something with this, or a different way to achieve the same result? One thing I can't lose is flexibility in the wrapper. If a change negatively affects that, speed and size will just have to take second place, but if I can do better... well. This is the heart of a monster; a small improvement ( if you can call a 50% reduction small ) here would be nice.

Martin Jay McKee

As with most things in engineering, the answer is an unabashed, "It depends."

R18 is in use constantly by the compiler. By trying to use it yourself you are going to create very big problems.

Regards,
Steve A.

The Board helps those that help themselves.

Don't worry, I am well aware of the problems with r18. I very seriously doubt that I will do much with the core registers; I expect to work much more with the IO registers. Since the SRAM version of the code optimizes nicely, what I need is to work on getting closer to the hardware. In a real application, should I find a place where I need that access, the register would, of course, be reserved from the compiler ( e.g. with gcc's -ffixed-r18 ).

Next on the list of tests are, indeed, the IO registers, where there is less possibility of a major disaster. That should be interesting as well.
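For the IO-register tests it may help that, on classic AVRs, the GP registers, the I/O registers and SRAM occupy fixed, adjacent ranges of the data space, so a wrapper could decide at compile time which kind of access an address allows. A minimal sketch, assuming an ATmega8/16/32-style map (consistent with DDRB at I/O 0x17 and PORTB at I/O 0x18 in the listing); `Space`, `classify` and `io_address` are hypothetical names, and larger parts with an extended I/O range would need a different table.

```cpp
#include <cassert>
#include <cstdint>

// Assumed data-space map for an ATmega8/16/32-class part:
//   0x00..0x1F  general-purpose registers r0..r31
//   0x20..0x5F  I/O registers, reachable with IN/OUT at (address - 0x20)
//   0x60..      internal SRAM, data-space access only (e.g. LDS/STS)
enum class Space { GpRegister, IoRegister, Sram };

constexpr Space classify(std::uint16_t data_addr) {
    return data_addr < 0x20 ? Space::GpRegister
         : data_addr < 0x60 ? Space::IoRegister
                            : Space::Sram;
}

// I/O address as used by IN/OUT; only meaningful for the I/O range.
constexpr std::uint8_t io_address(std::uint16_t data_addr) {
    return static_cast<std::uint8_t>(data_addr - 0x20);
}

// Because both functions are constexpr, the decision costs nothing at run time.
static_assert(classify(0x12) == Space::GpRegister, "r18 is a GP register");
static_assert(classify(0x38) == Space::IoRegister, "PORTB's data address");
static_assert(io_address(0x38) == 0x18, "PORTB is I/O address 0x18");
```

A wrapper could branch on `classify(Addr)` to choose between a register path, an IN/OUT path and plain pointer access; this moves the decision into the wrapper but does not by itself make gcc emit the register instructions.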

Martin Jay McKee
