__builtin_expect in embedded applications

Go To Last Post
12 posts / 0 new
Author
Message
#1
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Hi,

After browsing through a link posted in another thread earlier today I came accross this snippet:

#define likely(x)      __builtin_expect(!!(x), 1)
#define unlikely(x)    __builtin_expect(!!(x), 0)

:shock:
Which seems to be used to tell the compiler which event in an if statement is more likely to occur. One assumes the compiler then knows to keep the likely code next to the compare and jump to the unlikely code...

So, question 1: is this how it works?
And question 2: does this offer any measurable advantage in embedded applications? I'm thinking largely of routines where there is an initialisation block that is only run once right below a compare like

if (b_init_has_run == 0)
{
//initialise stuff
b_init_has_run = 1;
}
else{
//run...
}

would replacing the first line with

if (unlikely(b_init_has_run == 0))

be a good idea?

Thanks,

Phil

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

So, question 1: is this how it works?

I presume you already saw this in the user manual?

http://gcc.gnu.org/onlinedocs/gc...

I also assume you recognise the !! construct as the way to convert any integer expression result to a simple 0 or 1 boolean result?

PS does branch prediction have any relevance to a RISC processor without a pipeline anyway?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

PS does branch prediction have any relevance to a RISC processor without a pipeline anyway?

I don't read the above as about branch prediction (at runtime) but as the programmer knowing what will be the likely case and tipping the compiler off so that if it has two alternative ways of generating code it will choose the one that is most efficient for the likely case.

Example: Assume test-and-branch instruction takes two clocks if the branch is not taken, and three if the branch is taken. If the instruction set is orthogonal" w.r.t the test-and-branch instructions, eg there is both a test for zero and a test for non-zero, then we could tip the compiler to use the test that makes the likely case the two-clock one. (OTOH, we often need an unconditional jump at the end of that case (the if-leg, so to say), while the unlikely case (the else-case) might just fall through out of the whole conditional construct, and if a jump takes two clocks then the same advice about what is likely might be used in the opposite way.)

As I see it this holds even for non-pipelined architectures.

Am I too tired?

As of January 15, 2018, Site fix-up work has begun! Now do your part and report any bugs or deficiencies here

No guarantees, but if we don't report problems they won't get much of  a chance to be fixed! Details/discussions at link given just above.

 

"Some questions have no answers."[C Baird] "There comes a point where the spoon-feeding has to stop and the independent thinking has to start." [C Lawson] "There are always ways to disagree, without being disagreeable."[E Weddington] "Words represent concepts. Use the wrong words, communicate the wrong concept." [J Morin] "Persistence only goes so far if you set yourself up for failure." [Kartman]

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

The manual doesn't seem to say how the compiler handles this. Johan's post clears this up a bit, thank you.

Realising I'm still new to this, intuition suggests that a method that uses fewer cycles in the case that occurs most frequently would be advantageous regardless of architecture, as alluded to by Johan. Am I thinking wrong/misunderstanding what was said?

Quote:
we often need an unconditional jump at the end of that case (the if-leg, so to say), while the unlikely case (the else-case) might just fall through out of the whole conditional construct, and if a jump takes two clocks then the same advice about what is likely might be used in the opposite way

If you wouldn't mind explaining this in a little more detail it would be much appreciated, I'm not quite following...

Aside from this, I'd still like to know if anyone has made use of this with and what where the results?

Here's the link if anyone wants a closer look http://lxr.linux.no/linux+v2.6.35.4/include/linux/compiler.h#L140

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Three programs:

	c = PINA;
  6c:	89 b3       	in	r24, 0x19	; 25
	if (__builtin_expect(c == 0, 1)) {
  6e:	88 23       	and	r24, r24
  70:	11 f4       	brne	.+4      	; 0x76 
		PORTB = 0x55;
  72:	85 e5       	ldi	r24, 0x55	; 85
  74:	01 c0       	rjmp	.+2      	; 0x78 
	} else {
		PORTB = 0xAA;
  76:	8a ea       	ldi	r24, 0xAA	; 170
  78:	88 bb       	out	0x18, r24	; 24
	}
	c = PINA;
  6c:	89 b3       	in	r24, 0x19	; 25
	if (__builtin_expect(c == 0, 0)) {
  6e:	88 23       	and	r24, r24
  70:	11 f4       	brne	.+4      	; 0x76 
		PORTB = 0x55;
  72:	85 e5       	ldi	r24, 0x55	; 85
  74:	01 c0       	rjmp	.+2      	; 0x78 
	} else {
		PORTB = 0xAA;
  76:	8a ea       	ldi	r24, 0xAA	; 170
  78:	88 bb       	out	0x18, r24	; 24
	}
	c = PINA;
  6c:	89 b3       	in	r24, 0x19	; 25
	if (c == 0) {
  6e:	88 23       	and	r24, r24
  70:	11 f4       	brne	.+4      	; 0x76 
		PORTB = 0x55;
  72:	85 e5       	ldi	r24, 0x55	; 85
  74:	01 c0       	rjmp	.+2      	; 0x78 
	} else {
		PORTB = 0xAA;
  76:	8a ea       	ldi	r24, 0xAA	; 170
  78:	88 bb       	out	0x18, r24	; 24
	}
}

Anyone for a game of "spot the difference" ?

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
The manual doesn't seem to say how the compiler handles this. Johan's post clears this up a bit, thank you.

Please note that my post was highly speculative. I am in no position to say how the compiler actually handles this. It was rather a reaction to Cliffs comment about branch prediction.

Anyhow, Cliff has demonstrated with one example that the compiler does not use the hint in all cases.

Quote:

Quote:
we often need an unconditional jump at the end of that case (the if-leg, so to say), while the unlikely case (the else-case) might just fall through out of the whole conditional construct, and if a jump takes two clocks then the same advice about what is likely might be used in the opposite way

If you wouldn't mind explaining this in a little more detail it would be much appreciated, I'm not quite following...


Imagine this piece of code:

// Something happens before the if-stateent

if (PINA == 00)
{
   //do stuff...
}
else
{
   //do some other stuff...
}

// Something happens after the if-statement

In a more or less imagined assembler dialect, that could end up as something like this

        ;Something happens before...

        TEST PINA, 0
        BRANCH_IF_NOT_EQUAL else

        ;do stuff...
        JUMP endif               ; <--- This is the jump that I am referring to

else:
        ;do some other stuff...
                                 ; <--- No jump needed here - the else-leg just "falls out" of the if/else-construct
endif:

        ; Something happens after...

Without actually stating any clock cost for the different instructions it is clear that to get the total clock costs you need to add

If PINA == 0:
- The cost of the TEST
- The cost of the BRANCH_IF_NOT_EQUAL for the case that the branch is not taken.
- The cost of the JUMP

If PINA==1:
- The cost of the TEST
- The cost of the BRANCH_IF_NOT_EQUAL for the case that the branch is taken.

Depending on the individual costs of those instructions it might be best to place the likely case in the if-leg or the else-leg.

But this is probably all moot or only theoretically fun anyway, as Cliffs test seems to imply that avr-gcc does not use the hint.

And BTW the AVR is pipe-lined, at least in some sense, as I understand it. That is the reason that branch-instructions costs an extra clock when a branch is actually taken. Or maybe look at it like this: AVR has one branch prediction - that branches wil not be taken. When that does not hold (when this prediction is wrong) there will be an associated cost to get back on track. Did that make sense?

As of January 15, 2018, Site fix-up work has begun! Now do your part and report any bugs or deficiencies here

No guarantees, but if we don't report problems they won't get much of  a chance to be fixed! Details/discussions at link given just above.

 

"Some questions have no answers."[C Baird] "There comes a point where the spoon-feeding has to stop and the independent thinking has to start." [C Lawson] "There are always ways to disagree, without being disagreeable."[E Weddington] "Words represent concepts. Use the wrong words, communicate the wrong concept." [J Morin] "Persistence only goes so far if you set yourself up for failure." [Kartman]

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

I think this has more to do with "speculative execution" architectures than it does pipelines. In PA-RISC, a mis-predicted branch could be quite costly: Not only would all the computations need to be flushed, but the instruction fetch cache would need to be flushed and reloaded. If the compiler is given a hint as to the most likely case, it can provide clues to the CPU when it comes to executing the code.

In the extreme case, compiler-generated superscalar architectures, such as the Itanium and DSP VLIW architectures, would be solely dependent on these clues for the most efficient code.

The AVR is none of these. And as Cliff has shown, the compiler cheerfully ignores the directives.

"Mostly harmless."

Stu

Engineering seems to boil down to: Cheap. Fast. Good. Choose two. Sometimes choose only one.

Newbie? Be sure to read the thread Newbie? Start here!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

And as Cliff has shown

The only thing he has shown is that the compiler ignored the directive in that particular case.

If you ask me, then I speculate that it is highly likely that it does not use the hint. Still, that is just that - speculation. Not a proven fact.

As of January 15, 2018, Site fix-up work has begun! Now do your part and report any bugs or deficiencies here

No guarantees, but if we don't report problems they won't get much of  a chance to be fixed! Details/discussions at link given just above.

 

"Some questions have no answers."[C Baird] "There comes a point where the spoon-feeding has to stop and the independent thinking has to start." [C Lawson] "There are always ways to disagree, without being disagreeable."[E Weddington] "Words represent concepts. Use the wrong words, communicate the wrong concept." [J Morin] "Persistence only goes so far if you set yourself up for failure." [Kartman]

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Perfect sense! Thanks for taking the time to clear that up for me. I failed to think past the actual comparison to what else must take place. Meanwhile, in most cases, you're going to have to jump over the unused code at some point...

And Cliff, thanks for the test results... As well as the original link. I think I'll be spending a fair bit of my time going through bits of that code!

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Anyone want to sprinkle a true, reasonable sized AVR app with _builtin_expect's on each conditional and see if you can take any cycles/bytes out of it (I know what I'm predicting ;-)). If I could work out where in the GCC source the hook for AVR handling this actually occurred I'm willing to bet we'd find a comment like:

// this looks interesting - must remember to fill something in here sometime

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:

Meanwhile, in most cases, you're going to have to jump over the unused code at some point...

Absolutely. But the clock cost for jumping "one way or the other" might differ.

As of January 15, 2018, Site fix-up work has begun! Now do your part and report any bugs or deficiencies here

No guarantees, but if we don't report problems they won't get much of  a chance to be fixed! Details/discussions at link given just above.

 

"Some questions have no answers."[C Baird] "There comes a point where the spoon-feeding has to stop and the independent thinking has to start." [C Lawson] "There are always ways to disagree, without being disagreeable."[E Weddington] "Words represent concepts. Use the wrong words, communicate the wrong concept." [J Morin] "Persistence only goes so far if you set yourself up for failure." [Kartman]

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
Anyone want to sprinkle a true, reasonable sized AVR app with _builtin_expect's on each conditional and see if you can take any cycles/bytes out of it
My curiosity has reached the point where I'll have to do this and see... Thanks for all the info guys.