Code generation question

Go To Last Post
21 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Given the following C expression (or equivalent if-else statement)

uint8_t x = (var & 0x80) ? 0x87 : 0

avr-gcc generates code like this

    sbrs r21,7        ; 1,2
    rjmp .L10         ; 2,0
    ldi r24,lo8(-121) ; 0,1
    rjmp .L12         ; 0,2
 .L10:
    ldi r24,lo8(0)    ; 1,0
 .L12:

The comment at the end of each line were added by me to indicate the number of cycles required in the false and true cases. Results are 4 cycles for false and 5 cycles for true. With a total of 5 words of code.

I have two questions about that:
1) Why lo8(-121) instead of just lo8(135)?
2) Why not generate the code below instead?

   ldi r24,lo8(-121)  ; 1,1
   sbrs r21,7         ; 1,2
   ldi r24,lo8(0)     ; 1,0

Results are 3 cycles for false and 3 cycles for true, and a total of 3 words of code. Admittedly a small difference, but with no down-side that I see. Particularly given that the equivalent C code is fairly common in avr code.

I'm guessing this may be a side effect of the fact that GCC targets multiple platforms?

-Brad

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Because the standard says every number you type there is a signed integer or something like that.

If you want smaller code, you should say your constant magical numbers are unsigned chars maybe by casting.

- Jani

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Jani thanks for the comment, but I just tried that and it doesn't make any difference. And I don't see why it would make any difference, because none of those rjmp's or ldi's depend on the sign.

-Brad

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Jepael wrote:
Because the standard says every number you type there is a signed integer or something like that.

If you want smaller code, you should say your constant magical numbers are unsigned chars maybe by casting.

It shouldn't matter.
0x80 has one nonzero bit regardless.
The signedness of var doesn't matter either.
I think he has optimization turned off.

Iluvatar is the better part of Valar.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
Particularly given that the equivalent C code is fairly common in avr code.

Hmmm--interesting conjecture. I find that I don't use ? all that much but that may be a style issue. Where I'd often like to use it is on I/O bits, but often that can result in multiple tickles of the I/O register.

And your method would change that I/O bit for a few cycles--bad! Bad puppy!

Now, you picked the case where both the tested value and the result were in registers. Repeat the analysis with one or both (especially the result) in SRAM, and perhaps indexed or pointer accessed and the analysis might be different.

Your example also used very simple expressions as the values to be stored.

I've got places where I do use

frog = 0;
if (some conditions, often complex exceptions) frog = 123;

and if you feel strongly then you could write that way too.

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Lee,

That construction produces exactly the same code.

Personally I think that there will always be opportunities for humans to out-think compilers. But compilers do not get tired or have silly moments.

David.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

theusch wrote:
Quote:
Particularly given that the equivalent C code is fairly common in avr code.

Hmmm--interesting conjecture. I find that I don't use ? all that much but that may be a style issue. Where I'd often like to use it is on I/O bits, but often that can result in multiple tickles of the I/O register.

Its not about the use of ?. That's why I said the "equivalent" C code. Because an if statement generates the same code.

if(var & 0x80)
   x = 0x87;
else
   x = 0;

This isn't really about I/O bits... just byte values loaded into general purpose registers (no matter where they come from or are heading to).

Quote:
I've got places where I do use
frog = 0;
if (some conditions, often complex exceptions) frog = 123;

and if you feel strongly then you could write that way too.


True, or write it directly in ASM. I'm just surprised the compiler doesn't do this automatically when -Os is used.

Edit: fixed typos

-Brad

Last Edited: Tue. Apr 22, 2008 - 08:13 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

skeeve wrote:
I think he has optimization turned off.

When you compile that C code, do you get different results? The compilation step is pasted below. Is it missing something?

avr-gcc -c -mmcu=at90usb162 -I. -gdwarf-2 -DF_CPU=8000000UL -DAVRGCC -DENABLE_TRACE -DENABLE_STACK_CHECK -Os -funsigned-char -funsigned-bitfields -fpack-struct -fshort-enums -fgnu89-inline -finline-functions-called-once -Wall -Wstrict-prototypes -Wundef -Wsign-compare -Wa,-adhlns=obj/input_sj.lst -I"/usr/local/avr/include/" -std=gnu99 -MMD -MP -MF .dep/input_sj.o.d input_sj.c -o obj/input_sj.o

-Brad

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

This isn't really about I/O bits...

Well, it certainly could be in the general case. As mentioned, if the target destination is PORTx (most obviously but could be any other I/O register) then the output would incorrectly be turned on for a few cycles. Another I/O register example could be that an ...IE. bit is turned on and the condition is pending so the ISR fires even though it is not "supposed" to, and examination of the source code would reveal no problems.

I'd think the same could happen to a variable, let's say it is the "do_something_flag", that is only set under certain conditions and turned off otherwise. This fits your criteria, I believe. An ISR could fire during the few cycles it is turned on and "do something" even when it is not supposed to.

Yep, I'll stick to that one. Just as multi-byte variable access shared with ISRs needs to be kept atomic, so would the code sequence that you propose to cover the same cases.

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

theusch wrote:
Quote:

This isn't really about I/O bits...

Well, it certainly could be in the general case. As mentioned, if the target destination is PORTx (most obviously but could be any other I/O register) then the output would incorrectly be turned on for a few cycles.

But I'm not talking about some general case. I'm talking about this specific case where the compiler clearly knows that the values are in GP registers (the compiler did the register allocation).

Perhaps the existence of the I/O case caused the compiler's developers to keep GCC's code simpler by using a single safe approach. Or maybe no one has had the time/desire to do something about it. But just because the I/O example exists, doesn't mean you can't optimize other cases.

Quote:
I'd think the same could happen to a variable, let's say it is the "do_something_flag", that is only set under certain conditions and turned off otherwise. This fits your criteria, I believe. An ISR could fire during the few cycles it is turned on and "do something" even when it is not supposed to.

Yep, I'll stick to that one. Just as multi-byte variable access shared with ISRs needs to be kept atomic, so would the code sequence that you propose to cover the same cases.


That's a good thought, but I don't think it is correct. If the "do_something_flag" was in a register the ISR would not see its local state changes anyway. Variables shared between normal code and ISRs would be declared volatile. And then the value would be written back to SRAM after the entire operation was complete. So the ISR would see it either before being changed or after, but not in an intermediate state while loaded into a register.

-Brad

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

And then the value would be written back to SRAM after the entire operation was complete. So the ISR would see it either before being changed or after, but not in an intermediate state while loaded into a register.

Nope--the multi-byte read or write, regardless of "volatile" status, needs to be protected to make it atomic. [At least AFAIK--I've never seen a posted code fragment where it >>was<< protected implicitly by the compiler.]

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

theusch wrote:
Quote:

And then the value would be written back to SRAM after the entire operation was complete. So the ISR would see it either before being changed or after, but not in an intermediate state while loaded into a register.

Nope--the multi-byte read or write, regardless of "volatile" status, needs to be protected to make it atomic. [At least AFAIK--I've never seen a posted code fragment where it >>was<< protected implicitly by the compiler.]

Lee

Correct, and it likely won't, as the compiler has no idea if interrupts are enabled or not at any given point. If it was, it would have to add protection code around each operation. Potentially making things very inefficient, and large space wise. Best to leave those protections to the programmer, so that they can be placed at the optimum locations.

"volatile" also does not indicate that it's an ISR that is changing/depending on the value, only that the value relates to something EXTERNAL to the compilers knowledge. Since the "external" item could be an ISR or hardware, adding CLI/SEI protection code, may or may-not, help.

Writing code is like having sex.... make one little mistake, and you're supporting it for life.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

theusch wrote:
Quote:

And then the value would be written back to SRAM after the entire operation was complete. So the ISR would see it either before being changed or after, but not in an intermediate state while loaded into a register.

Nope--the multi-byte read or write, regardless of "volatile" status, needs to be protected to make it atomic. [At least AFAIK--I've never seen a posted code fragment where it >>was<< protected implicitly by the compiler.]

Fine, but its a moot point. Multi-byte read or write needs to be protected if the variable is to be shared with external code (aka ISRs). Point to a spot in the code below where an ISR could trigger causing a problem. This is a local (stack based), non-volatile, single byte variable.

   ldi r24,lo8(-121)
   sbrs r21,7
   ldi r24,lo8(0)

The entire thing is completely invisible to an ISR which would just push and pop r24 and r21 if used.

I checked to see what would happen if the byte being written to was global and volatile, and it does indeed do what I suggested. All modifications of r24 and branches are done first, and then r24 is stored to the global. Probably because this is in fact not a multi-byte write. Potentially writing one byte twice isn't the same thing.

-Brad

Last Edited: Tue. Apr 22, 2008 - 10:44 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

"volatile" also does not indicate that it's an ISR that is changing/depending on the value, only that the value relates to something EXTERNAL to the compilers knowledge. Since the "external" item could be an ISR or hardware, adding CLI/SEI protection code, may or may-not, help.

True enough--memory-mapped peripheral comes to mind.

I see Brad's point. If he thinks ? handling is bad, try all the combinations of toggling an I/O point (both in low & high address space) if you want to see ugly code. I finally just do if-else as the best method for that. In my old age, if I don't like the code the compiler makes I try to make it easier on the compiler by re-wording/casting/re-ordering to get it to output the sequence that I want.

I've been caught by the "it's a lot easier/more efficient to just turn this on now and then decide if it needs to be turned off in a few microseconds" before but not directly in this context. Both were multiplex-type apps; one was a piezo element and the other an LED display. those couple microseconds were enough to give sound/light so I had to do it the "clean" way. Thus my caution.

One more note:

Quote:
If the "do_something_flag" was in a register the ISR would not see its local state changes anyway.

Yes, it certainly will see them if it were a variable bound to that register. [I do that a lot in tight fast ISRs--use global register variables.]

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

schickb wrote:
Multi-byte read or write needs to be protected if the variable is to be shared with external code (aka ISRs). Point to a spot in the code below where an ISR could trigger causing a problem. This is a local (stack based), non-volatile, single byte variable.

   ldi r24,lo8(-121)
   sbrs r21,7
   ldi r24,lo8(0)

The entire thing is completely invisible to an ISR which would just push and pop r24 and r21 if used.

But now you've turned it around. Firstly it's not invisible to the ISR, it's irrelevant to the ISR. It's the ISR executing that is invisible to the code. If it's not a shared variable, then who-cares when the ISR executes? It does not make any difference.

However, if the code executing, or the code in the ISR relies on the variable being manipulated, then it does matter. And the ISR executing between any 2 of those instructions could result in an error. And note that it's not just multi-byte operations that need to be protected, but also any multi-step RMW (read-modify-write) sequence.

Writing code is like having sex.... make one little mistake, and you're supporting it for life.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

theusch wrote:
I've been caught by the "it's a lot easier/more efficient to just turn this on now and then decide if it needs to be turned off in a few microseconds" before but not directly in this context.

Point taken.

Quote:
Quote:
If the "do_something_flag" was in a register the ISR would not see its local state changes anyway.

Yes, it certainly will see them if it were a variable bound to that register. [I do that a lot in tight fast ISRs--use global register variables.]

Can this be done in C code with r0-r31 registers? Or do you mean using an AVR GPIO register?. In any case, that isn't what is happening in this example.

-Brad

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

glitch wrote:
schickb wrote:
Multi-byte read or write needs to be protected if the variable is to be shared with external code (aka ISRs). Point to a spot in the code below where an ISR could trigger causing a problem. This is a local (stack based), non-volatile, single byte variable.

   ldi r24,lo8(-121)
   sbrs r21,7
   ldi r24,lo8(0)

The entire thing is completely invisible to an ISR which would just push and pop r24 and r21 if used.

But now you've turned it around. Firstly it's not invisible to the ISR, it's irrelevant to the ISR. It's the ISR executing that is invisible to the code. If it's not a shared variable, then who-cares when the ISR executes? It does not make any difference.


Argg... I'm just returning this thread to my original point. This isn't shared code and I don't care about ISRs. That' my point. GCC could generate the code as I've written without any problems.

glitch wrote:
However, if the code executing, or the code in the ISR relies on the variable being manipulated, then it does matter. And the ISR executing between any 2 of those instructions could result in an error. And note that it's not just multi-byte operations that need to be protected, but also any multi-step RMW (read-modify-write) sequence.

I understand the general multi-step problem, but I don't agree that it applies here. When I change my code to make it a volatile global variable, the value held in the register gets written back to SRAM after those lines of code. So if the ISR runs anywhere before its written back to SRAM it will simply see the value as it was before those lines. The isr won't use the "wrong value" that temporarily lives in r24 because that value is never written back to SRAM.

With a volatile global, GCC currently generates this

    sbrs r21,7
    rjmp .L10
    ldi r24,lo8(-121)
    rjmp .L12
 .L10:
    ldi r24,lo8(0)
 .L12:
    sts global,r24

My suggested "optimized" version would be this

   ldi r24,lo8(-121)
   sbrs r21,7
   ldi r24,lo8(0)
   sts global,r24

Unless I'm really missing something I just don't see how that could cause an error in an ISR.... even if that was my original question... which it wasn't :)

Edit: I'm the typo master.

-Brad

Last Edited: Tue. Apr 22, 2008 - 11:33 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

schickb wrote:
theusch wrote:
...
Quote:
If the "do_something_flag" was in a register the ISR would not see its local state changes anyway.

Yes, it certainly will see them if it were a variable bound to that register. [I do that a lot in tight fast ISRs--use global register variables.]

Can this be done in C code with r0-r31 registers? Or do you mean using an AVR GPIO register?.
avr-gcc has a syntax just for the former.
One doesn't need it for the latter.
Just include # and use the right name.

Iluvatar is the better part of Valar.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

schickb wrote:

Unless I'm really missing something I just don't see how that could cause an error in an ISR.... even if that was my original question... which it wasn't :)

Single byte reads or writes are atomic by nature, so they are relatively safe, as you will never get caught mid-write for the entire value, resulting in corruption. Thus for single operation RMW tasks, the worst case is that the ISR grabs the stale value. As the new/correct value is still in the register waiting to be written back.

However for volatile variables, and a multi-step update there is a chance for error. When the var is volatile, GCC will write it back after each operation. This means that if you do a bit clear, followed by a bit set operation on separate lines, there is a chance that the ISR will see the cleared bits, but not the set bits on a particular pass. This could result in failure of your program to operate correctly.

volatileVar &= BITS_TO_CLEAR;
volatileVar |= BITS_TO_SET;

will result in something like:

lds r24, volatileVar
andi r24, BITS_TO_CLEAR
sts volatileVar, r24
lds r24, volatileVar
ori r24, BITS_TO_SET
sts volatileVar, r24

The compiler must write it out after the first operation, because it is volatile, and it has to assume that any manipulation has side-effects. It must also re-read the value before the second operation, because it has to allow for the possibility that the value has changed externally, since the previous read. (that would include for multiple reads in the same statment)

[sorry that it was off-topic form the original question, was only responding to the latter discussion going on, I'll no go back to see what the original question was ;)]

Writing code is like having sex.... make one little mistake, and you're supporting it for life.

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
Why lo8(-121) instead of just lo8(135)?

They are equivalent and the selection is the choice of whatever you are using to disassemble. Mine emits "LDI r24,0x87".

GCC is an excellent compiler, worth far more than it costs. But expecting it to emit "ideal" code is asking more than is paid for.

Consider:

unsigned char x= 0;
if (var&0x80) // (var was assigned PINA)
	x=0x87;

With -O2 we get:

SBIC 0x19,7
RJMP 0x4
LDI r24,0x0
LDI r25,0x0
RJMP 0x3
LDI r24,0x87
LDI r25,0x0

So, not only is the condition tested prior to assignment, but GCC has seen fit to convert the 8 bit quantity to 16 bits.

Compiling with -Os:

SBIS 0x19,7
RJMP 0x3
LDI r24,0x87
RJMP 0x2
LDI r24,0x0

Perhaps the lesson is: if you want "ideal" performance, code it in assembler, or consider trying another compiler. Generally I trust GCC to emit correct code and if performance is critical I'll hand tune with assembler.

C: i = "told you so";

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

cpluscon wrote:
Perhaps the lesson is: if you want "ideal" performance, code it in assembler, or consider trying another compiler. Generally I trust GCC to emit correct code and if performance is critical I'll hand tune with assembler.

Yes, I think that is a good summary. What surprises me, however, is that some of the optimizations GCC does perform are quite good (aka complex) while in other situations it seems to miss rather simple opportunities. I suspect if I dug into gcc's internals, the simple opportunities might be more complex than they appear. But in some cases (like this one) I can't guess how.

-Brad