optimization and warning


I have a string that I receive by serial and need to test the first 2 chars to take further action.

char buf[20];
...
if(buf[0]=='O' && buf[1]=='K')
{
...
}

I optimized it by reading the first two bytes as one 16-bit word and comparing it against the ASCII values 0x4F ('O') and 0x4B ('K'):

char buf[20];
...
if(*(uint16_t*)buf==0x4B4F)
{
...
}

I have a couple of tests like this and it saves a lot of bytes. It works well in my tests too.

However it throws a

Quote:
warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]

I am using the atmel toolchain gcc version 4.7.2 (AVR_8_bit_GNU_Toolchain_3.4.2_939)

How can I remove this warning?
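
A side note on the constant itself: 0x4B4F has 'K' (0x4B) in the high byte and 'O' (0x4F) in the low byte, because AVR-GCC stores 16-bit values little-endian, so the byte at the lower address ends up as the low byte. A hypothetical helper macro (the name CHAR_PAIR is made up here, it is not from any toolchain header) can build such constants from the characters themselves, which is harder to mistype than a hand-computed hex value. A minimal sketch, assuming a little-endian target:

```c
#include <stdint.h>

/* Hypothetical helper: packs two characters into the 16-bit value that a
 * little-endian target (such as AVR) sees when the pair is read as one
 * word. 'first' is the byte at the lower address. */
#define CHAR_PAIR(first, second) \
    ((uint16_t)((uint8_t)(first) | ((uint16_t)(uint8_t)(second) << 8)))
```

With this, the test could be written against CHAR_PAIR('O', 'K') instead of 0x4B4F; both are compile-time constants, so the generated code should not change.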


Well, I am not an expert, but I can suggest 2 things that might help:
1) You could use a union to skip the background pointer creation and dereferencing, like this:

union intbytes
{
char b[20];
int  i;   // int is 16 bits on AVR: overlays b[0] (low byte) and b[1] (high byte)
};
union intbytes b2i;
// so you receive into b2i.b[],
// and compare:
if (b2i.i == 0x4b4f) { ... }

2) I think there are some builtins to get 2 bytes from an array and return an int. look it up in compiler.h
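
The thread does not name the builtin meant here, so the following is a hand-written sketch and not something taken from compiler.h. The hypothetical helper load_le16 combines two bytes with a shift and an OR, which involves no pointer cast and therefore should not trigger the strict-aliasing warning:

```c
#include <stdint.h>

/* Hypothetical helper, not a toolchain builtin: reads two bytes and
 * combines them low byte first, matching AVR's little-endian layout. */
static inline uint16_t load_le16(const char *p)
{
    return (uint16_t)((uint8_t)p[0] | ((uint16_t)(uint8_t)p[1] << 8));
}
```

The test would then read if (load_le16(buf) == 0x4B4F). Whether this is as small as the pointer cast depends on the compiler and optimization level, so it is worth checking the generated listing.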

I know I didn't answer your direct question; this is just an alternative.
Alex

There are 10 kinds of people... those who digg binary and those who don't


And another thing that came to me...
What you did doesn't seem to optimize anything, in my opinion. The compiler evaluates the parts of your "if" in left-to-right order. So if you do two byte-wide compares (the initial code) and the first fails, the second will never run, so it's likely to give faster code than your second choice, where the 16-bit number must be created in the background and go through a word comparison, which is slower (talking about an 8-bit architecture, of course).
So IMHO you'd better stick with any plan except the int pointer thing!

Alex

There are 10 kinds of people... those who digg binary and those who don't


Quote:

so its likely to get faster code

He's right you know:

#include <avr/io.h>

char buf[20];

void fill(char *);

int main(void) {
	fill(buf);
	if(buf[0]=='O' && buf[1]=='K')
	{
		PORTB = 0x55;
	}
	if(*(uint16_t*)buf==0x4B4F)
	{
		PORTB = 0xAA;
	}	
}

yields:

main:
//==> int main(void) {
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
//==>   fill(buf);
        ldi r24,lo8(buf)
        ldi r25,hi8(buf)
        call fill
//==>   if(buf[0]=='O' && buf[1]=='K')
        lds r24,buf
        cpi r24,lo8(79)
        brne .L2
//==>   if(buf[0]=='O' && buf[1]=='K')
        lds r24,buf+1
        cpi r24,lo8(75)
        brne .L2
//==>           PORTB = 0x55;
        ldi r24,lo8(85)
        out 0x18,r24
.L2:
//==>   if(*(uint16_t*)buf==0x4B4F)
        lds r24,buf
        lds r25,buf+1
        cpi r24,79
        sbci r25,75
        brne .L3
//==>           PORTB = 0xAA;
        ldi r24,lo8(-86)
        out 0x18,r24
.L3:
//==> }
        ldi r24,0
        ldi r25,0
        ret

Oh and here's the union solution:

#include <avr/io.h>

typedef union {
	char buf[20];
	int i;
} join_t;

void fill(char *);

join_t b2i;

int main(void) {
	fill(b2i.buf);
	if (b2i.i == 0x4B4F)
	{
		PORTB = 0xAA;
	}	
}

and the code it generates:

main:
//==> int main(void) {
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
//==>   fill(b2i.buf);
        ldi r24,lo8(b2i)
        ldi r25,hi8(b2i)
        call fill
//==>   if (b2i.i == 0x4B4F)
        lds r24,b2i
        lds r25,b2i+1
        cpi r24,79
        sbci r25,75
        brne .L2
//==>           PORTB = 0xAA;
        ldi r24,lo8(-86)
        out 0x18,r24
.L2:
//==> }
        ldi r24,0
        ldi r25,0
        ret

which confirms it's the same as the type-punned thing.


Thanks people, I am looking for space, not speed.

Not sure how the compiler does it, but a "make clean all" before/after optimizing a single test saves 10 bytes sometimes. I have dozens of them.


To avoid the warning, disable the strict-aliasing optimization with -fno-strict-aliasing.

A single test should save 2 bytes (BRxx) and in some rare cases 4 or 6 bytes ([R]JMP + BRxx).
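
An alternative that keeps strict aliasing enabled is to copy the two bytes into a real uint16_t with memcpy, which is allowed by the C standard for any object type. A minimal sketch, assuming a little-endian target such as AVR; the function name starts_with_ok is made up for illustration, and whether this particular GCC version reduces the memcpy to two plain loads is an assumption to verify in the listing:

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the first two bytes of buf are 'O','K'.
 * Assumes little-endian byte order (true for AVR). */
static inline int starts_with_ok(const char *buf)
{
    uint16_t word;
    memcpy(&word, buf, sizeof word);  /* no type-punned pointer involved */
    return word == 0x4B4F;            /* 'K' high byte, 'O' low byte */
}
```

If the size turns out worse than the cast, -fno-strict-aliasing remains the simpler way to silence the warning.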

avrfreaks does not support Opera. Profile inactive.


Wouldn't a function save all the space in the world?

#include <stdbool.h>

bool CheckThis(int *v,int C)
{
return(*v==C);
}

So

if(CheckThis((int *)&buf[0],0x4B4F)) // pass the address of buf[0], not its value
{
// your code
}

And if the location you check is always &buf[0], then you don't even need the argument:

bool CheckThis(int C)
{
return(b2i.i==C); // using the union
}

So

if(CheckThis(0x4B4F))
{
// your code
}

And if the preamble is always 0x4B4F then you don't even need any argument:

bool CheckThis(void)
{
return(b2i.i==0x4B4F); // using the union
}

So

if(CheckThis())
{
// your code
}

There are 10 kinds of people... those who digg binary and those who don't


Quote:
Wouldn't a function save all the space in the world?
How would it save space? Certainly if you were calling it in many places it may save space, but I see no evidence that the OP is needing this functionality in more than one place.

Regards,
Steve A.

The Board helps those that help themselves.


Quote:
, I am looking for space, not speed.

I gotta say that I don't understand how your optimization will save any space, either. The AVR is an 8-bit machine, so regardless of whether you compare against a 16-bit constant or two 8-bit constants, you're going to have two compare instructions. You might save a conditional jump by doing 16-bit math, but you might add a mov instruction to create the 16-bit number.

Looking at Cliff's example:

//==>   if(buf[0]=='O' && buf[1]=='K')

        lds r24,buf      ; 4 bytes (lds is a two-word instruction)
        cpi r24,lo8(79)  ; 2
        brne .L2         ; 2 (extra jump)
        lds r24,buf+1    ; 4
        cpi r24,lo8(75)  ; 2
        brne .L2         ; 2.  16 bytes total.
//==>           PORTB = 0x55;
        ldi r24,lo8(85)
        out 0x18,r24
.L2:
//==>   if(*(uint16_t*)buf==0x4B4F)
        lds r24,buf      ; 4
        lds r25,buf+1    ; 4 (assemble 16bits is free!)
        cpi r24,79       ; 2
        sbci r25,75      ; 2 (16 bit math!)
        brne .L3         ; 2. 14 bytes total.
//==>           PORTB = 0xAA;
        ldi r24,lo8(-86)
        out 0x18,r24

So it looks like you get to save one instruction, at least as long as you have free registers to use. Saving 10 bytes with one such optimization would be totally unexpected; it implies that the check was completely optimized away, and you should probably explain what is going on before proceeding!


It saves one instruction because of sbci and the way it sets the zero flag based on the first cpi, thus one only needs a single branch. From the 8-bit instruction ref:

CPI: Z flag set if K matches Rd, cleared otherwise.
SBCI: Z flag cleared if the result is non-zero, unchanged otherwise.
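
To see why a single branch is enough, the flag behavior can be modeled in plain C. This is only a sketch of the Z and carry semantics as described in the AVR instruction set manual (CPI sets Z on a match; SBCI subtracts with borrow and can only clear Z, never set it); the function names are made up for illustration:

```c
#include <stdint.h>
#include <stdbool.h>

/* CPI Rd,K: Z is set when Rd - K == 0, cleared otherwise.
 * The borrow (carry flag) is returned through *carry. */
static bool cpi_z(uint8_t rd, uint8_t k, bool *carry)
{
    *carry = rd < k;
    return (uint8_t)(rd - k) == 0;
}

/* SBCI Rd,K: computes Rd - K - C. Z keeps its previous value when the
 * result is zero and is cleared otherwise. */
static bool sbci_z(uint8_t rd, uint8_t k, bool carry_in, bool z_prev)
{
    uint8_t result = (uint8_t)(rd - k - (carry_in ? 1 : 0));
    return result == 0 ? z_prev : false;
}

/* Z after "cpi lo,k_lo / sbci hi,k_hi": true only if both bytes match. */
static bool word_equal_z(uint8_t lo, uint8_t hi, uint8_t k_lo, uint8_t k_hi)
{
    bool carry;
    bool z = cpi_z(lo, k_lo, &carry);
    return sbci_z(hi, k_hi, carry, z);
}
```

The interesting case is a low byte that is too small combined with a high byte that is one too large: the borrow makes the SBCI result zero, but Z was already cleared by the CPI and stays cleared, so the final brne still sees a mismatch.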

Perhaps you could make an asm macro, something like both_match( a1, a2, b1, b2 ) that does the equivalent of (a1==a2 && b1==b2).
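
Before reaching for inline asm, a plain C version may be worth trying. The sketch below is a hypothetical helper using the argument order suggested above; it ORs the two XOR differences so that only one test remains. Whether avr-gcc turns this into fewer instructions than the && form is an assumption that should be checked in the generated listing:

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 when a1 == a2 and b1 == b2.
 * Each XOR is zero only for equal bytes, so the OR of both is zero
 * only when both pairs match. */
static inline int both_match(uint8_t a1, uint8_t a2, uint8_t b1, uint8_t b2)
{
    return ((a1 ^ a2) | (b1 ^ b2)) == 0;
}
```

Usage would look like if (both_match(buf[0], 'O', buf[1], 'K')), which needs no pointer cast and so raises no aliasing warning.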


Koshchi wrote:

Quote:
How would it save space? Certainly if you were calling it in many places it may save space, but I see no evidence that the OP is needing this functionality in more than one place.

and the Initial poster Magister wrote earlier:

Quote:
Not sure how the compiler does it, but a "make clean all" before/after optimizing a single test saves 10 bytes sometimes. I have dozens of them.

That is why I suggest the function.

Correct me if I misunderstood.

Alex

There are 10 kinds of people... those who digg binary and those who don't