Memory barrier: what it does and what it does not do


Programs contain sequences of statements, and a naive compiler would execute them exactly in the order in which they are written. But an optimizing compiler is free to reorder the statements, or even parts of them, if the resulting "net effect" is the same. The "measure" of the "net effect" is what the standard calls "side effects", and is accomplished exclusively through accesses (reads and writes) to variables tagged as volatile. So, as long as all volatile reads and writes are to the same addresses and in the same order (and writes write the same values), the program is correct, regardless of other operations in it. (One important point to note here is that the time elapsed between consecutive volatile accesses is not considered at all.)
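A minimal sketch of this rule, assuming a GCC-compatible compiler; status_reg is a hypothetical volatile variable standing in for a hardware register, and scratch is an ordinary variable. With optimization on, the compiler may collapse the two stores to scratch into one, but it must emit both volatile stores, with these values and in this order.

```c
#include <stdint.h>

volatile uint8_t status_reg; /* hypothetical stand-in for an I/O register */
uint8_t scratch;             /* ordinary variable: its stores are not side effects */

void update(void)
{
    /* Not volatile: the optimizer may drop the first store, because
       only the final value of scratch matters to later code. */
    scratch = 1;
    scratch = 2;

    /* Volatile: both stores are side effects and must be emitted,
       in this order and with these values. */
    status_reg = 1;
    status_reg = 2;
}
```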

Unfortunately, there are also operations which are not covered by volatile accesses. An example of this in avr-gcc/avr-libc is the cli()/sei() macros defined in , which expand directly to the respective assembler instructions through the __asm__() statement. These don't constitute a variable access at all, not even a volatile one, so the compiler is free to move them around. Although a "volatile" qualifier can be attached to the __asm__() statement, its effect on (re)ordering is not clear from the documentation (more likely, it only prevents the optimiser from removing the statement entirely), as the documentation, among other things, states:

avr-gcc manual wrote:
Note that even a volatile asm instruction can be moved relative to other code, including across jump instructions. [...] Similarly, you can't expect a sequence of volatile asm instructions to remain perfectly consecutive.
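To make this concrete, here is a sketch of cli()/sei()-style macros written as volatile asm without a "memory" clobber. The names CLI_NO_CLOBBER/SEI_NO_CLOBBER are hypothetical and are not the avr-libc definitions; on non-AVR hosts the asm body is left empty so the sketch still compiles. Because neither statement declares any memory effect, the compiler may legally schedule the store to shared outside the cli/sei pair.

```c
#include <stdint.h>

#if defined(__AVR__)
#  define CLI_NO_CLOBBER() __asm__ __volatile__ ("cli")
#  define SEI_NO_CLOBBER() __asm__ __volatile__ ("sei")
#else
/* Host build: empty asm, so the file compiles on a PC. */
#  define CLI_NO_CLOBBER() __asm__ __volatile__ ("")
#  define SEI_NO_CLOBBER() __asm__ __volatile__ ("")
#endif

uint16_t shared; /* assumed to be read by an interrupt handler as well */

void set_shared(uint16_t v)
{
    CLI_NO_CLOBBER();
    /* The asm statements above and below declare no memory effects,
       so nothing stops the compiler from moving this store before
       the cli or after the sei. */
    shared = v;
    SEI_NO_CLOBBER();
}
```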

There is another mechanism which can be used to achieve something similar: memory barriers. A memory barrier is created by adding a special "memory" clobber to the assembler statement. It ensures that all variables residing in memory (globals, statics, and locals whose address has been taken) are written back from registers to memory before the statement, and re-read from memory after it. The purpose of memory barriers is slightly different from enforcing code ordering: they are supposed to ensure that no variables are "cached" in registers, so that it is safe to change the content of registers, e.g. when switching context in a multitasking OS. (On "big" processors with out-of-order execution, memory barriers also involve special instructions which force the processor to complete memory accesses in order; this is not needed on AVRs.)
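A minimal sketch of a pure compiler memory barrier, assuming GCC or Clang; the macro name COMPILER_BARRIER is hypothetical. The empty asm emits no instruction; its "memory" clobber only forbids the compiler from keeping memory-resident values cached in registers across it. Here it forces ready, a plain non-volatile global, to be reloaded on every loop iteration, while the local counter spins can stay in a register throughout.

```c
#include <stdint.h>

/* Emits no code; tells the compiler that any memory may have changed. */
#define COMPILER_BARRIER() __asm__ __volatile__ ("" ::: "memory")

uint8_t ready; /* assumed to be set elsewhere, e.g. by an interrupt handler */

uint16_t wait_until_ready(void)
{
    uint16_t spins = 0; /* local, address not taken: may live in a register */

    while (!ready) {
        /* Without this barrier, the compiler could load ready once
           and spin forever on the cached value. */
        COMPILER_BARRIER();
        spins++;
    }
    return spins;
}
```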

A memory barrier works well in ensuring that all volatile accesses (in fact, all accesses to variables residing in memory) before and after the barrier occur in the given order with respect to the barrier. However, it does not prevent the compiler from moving statements which don't access memory, such as computations on values held in registers, across the barrier. Peter Dannegger provided a nice example of this effect:

#define cli() __asm volatile( "cli" ::: "memory" )
#define sei() __asm volatile( "sei" ::: "memory" )

unsigned int ivar;

void test2( unsigned int val )
{
  val = 65535U / val;

  cli();

  ivar = val;

  sei();
}

compiles with optimisations switched on (-Os) to

00000112 :
 112:	bc 01       	movw	r22, r24
 114:	f8 94       	cli
 116:	8f ef       	ldi	r24, 0xFF	; 255
 118:	9f ef       	ldi	r25, 0xFF	; 255
 11a:	0e 94 96 00 	call	0x12c	; 0x12c <__udivmodhi4>
 11e:	70 93 01 02 	sts	0x0201, r23
 122:	60 93 00 02 	sts	0x0200, r22
 126:	78 94       	sei
 128:	08 95       	ret

where the potentially slow division (the call to __udivmodhi4) is moved across cli(), so interrupts stay disabled longer than intended. Note that the store to ivar still occurs in order with respect to cli()/sei(), as ivar is a global variable residing in memory; so the "net effect" required by the standard is achieved as intended, and it is "only" the timing which is off. However, in most embedded applications timing is an important, sometimes critical, factor.
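Translated back into C, the listing above corresponds roughly to the following illustrative rewrite. The name test2_as_compiled is hypothetical, and on non-AVR hosts cli()/sei() are replaced by empty asm statements that keep the same "memory" clobber, so the sketch builds off-target. Both versions leave the same value in ivar, which is why the reordering is legal; only the time spent with interrupts disabled differs.

```c
#if defined(__AVR__)
#  define cli() __asm volatile( "cli" ::: "memory" )
#  define sei() __asm volatile( "sei" ::: "memory" )
#else
/* Host stand-ins: same barrier semantics, no instruction emitted. */
#  define cli() __asm volatile( "" ::: "memory" )
#  define sei() __asm volatile( "" ::: "memory" )
#endif

unsigned int ivar;

/* Order as written in the source. */
void test2( unsigned int val )
{
  val = 65535U / val;
  cli();
  ivar = val;
  sei();
}

/* Order as emitted by the compiler: the division now runs
   with interrupts disabled. */
void test2_as_compiled( unsigned int val )
{
  cli();
  val = 65535U / val;
  ivar = val;
  sei();
}
```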

Unfortunately, at the moment neither avr-gcc nor the C standard provides a mechanism to enforce a complete match between the written and the executed order of code, except perhaps switching optimization off completely (-O0) or writing all the critical code in assembly.

To sum it up:

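  • memory barriers ensure proper ordering of volatile accesses (and of all other accesses to variables in memory)
  • memory barriers don't prevent statements which access no memory (e.g. computations on local variables held in registers) from being reordered across the barrier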

[This article was written as supporting documentation for related items in avr-libc: the sei()/cli() macros in , the ATOMIC_BLOCK mechanism in and the newly introduced _MemoryBarrier() in . It also drew from http://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&t=89990&start=20&postdays=0&postorder=asc&highlight=.]

Comments please. Thanks.

Jan Waclawek

[edit] fixed the last link


Quote:

The "measure" of the "net effect" is what the standard calls "side effects", and is accomplished exclusively through accesses (reads and writes) to variables tagged as volatile.

I cannot agree with this, especially the "exclusively" part. The optimization situations that you allude to existed long before microcontrollers and "volatile". You start with basic blocks first, mix well with a portion of sequence points, and add a dash of register lifetime. Volatile is like the sprinkles on the top of the cupcake icing. (generalizations based on past live(s) doing compilers and the like)

However, I can certainly see your point as you expand on the particular situation(s) being addressed.

You can put lipstick on a pig, but it is still a pig.


@Jan

Tell the poor compiler that you insist on using val "NOW ....."

#define cli() __asm volatile( "cli" ::: "memory" )
#define sei() __asm volatile( "sei" ::: "memory" ) 


unsigned int ivar;

void test2( unsigned int val )
{

  val = 65535U / val;

  asm volatile ("" : : "b"(val): "memory");


  cli();

  ivar = val;

  sei();
} 
test2:
/* prologue: function */
/* frame size = 0 */
	movw r22,r24
	ldi r24,lo8(-1)
	ldi r25,hi8(-1)
	rcall __udivmodhi4
	movw r30,r22
/* #APP */
 ;  23 "testw.c" 1
	cli
 ;  0 "" 2
/* #NOAPP */
	sts (ivar)+1,r23
	sts ivar,r22
/* #APP */
 ;  27 "testw.c" 1
	sei
 ;  0 "" 2
/* epilogue start */
/* #NOAPP */
	ret

Some hints here

http://blog.regehr.org/archives/28

http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf

And I think Dean or Danni also mentioned this trick.

Btw: I'm not 100% sure about the "b" in

: "b"(val):

I used b to get it to shut up :-)

And you don't actually access the variable at all; you just tell the compiler that you want to use it.

/Bingo

PS: If you complain about the

 movw r30,r22 

then "put the beast out of its misery" and go buy an IAR compiler :-)

I wonder where it came from... :-)
Was it the "b" access...?


Bingo600 wrote:

asm volatile ("" : : "b"(val): "memory");

Well, some form of "volatilisation" should help. (By the way, "b" is not a particularly good choice in this case: it stands for "Base pointer register (r28–r31)", so in a more convoluted situation the register allocator might feel pressed and move things around unnecessarily... or not, as it's a pretty unpredictable beast... :-) ) Danni's solution is also a form of "volatilisation", except that he applied it on the other side of the barrier, involving some C-ish cast magic: http://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&p=672085#672085.
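One way to "volatilise" the intermediate result through a cast is sketched below. This is an assumption about the general technique, not a reproduction of the linked post; the function name test3 is hypothetical, and the host stand-ins for cli()/sei() are empty asm statements with the same "memory" clobber. Because val's address is taken, it lives in memory; the volatile store of the quotient must then complete before the barrier in cli(), so the division has to happen first. The price is an extra store and reload through the stack.

```c
#if defined(__AVR__)
#  define cli() __asm volatile( "cli" ::: "memory" )
#  define sei() __asm volatile( "sei" ::: "memory" )
#else
/* Host stand-ins: same barrier semantics, no instruction emitted. */
#  define cli() __asm volatile( "" ::: "memory" )
#  define sei() __asm volatile( "" ::: "memory" )
#endif

unsigned int ivar;

void test3( unsigned int val )
{
  /* Volatile store to memory: must be done before the barrier in cli(),
     so the division is computed before interrupts are disabled. */
  *(volatile unsigned int *)&val = 65535U / val;

  cli();
  ivar = *(volatile unsigned int *)&val; /* volatile reload */
  sei();
}
```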

But my point is slightly different: the article is NOT intended to provide a solution, it is intended to WARN about the effect. Once you KNOW this may happen, you will be aware of it when hunting down the subtle bugs it may cause.

Jan


@Jan

Well, I'm not an assembler guru, but I take it that you get the hint...
Tell gcc that you want to use the variable, as above, but maybe with the right "magic" instead of "b".
Then it will deliver the value at that point.

But you are right that it would be best if the plain "::: memory" clobber alone could have done it.
But I do think a volatile would help; see the link below.

There was a previous issue here
http://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&t=89378

/Bingo


It seems that "r" (i.e. any register) gets rid of the extra code.

void test2( unsigned int val )
{

  val = 65535U / val;

  asm volatile ("" : : "r"(val): "memory");


  cli();

  ivar = val;

  sei();
} 
test2:
/* prologue: function */
/* frame size = 0 */
	movw r22,r24
	ldi r24,lo8(-1)
	ldi r25,hi8(-1)
	rcall __udivmodhi4
/* #APP */
 ;  23 "testw.c" 1
	cli
 ;  0 "" 2
/* #NOAPP */
	sts (ivar)+1,r23
	sts ivar,r22
/* #APP */
 ;  27 "testw.c" 1
	sei
 ;  0 "" 2
/* epilogue start */
/* #NOAPP */
	ret

Could any of the "C" gurus make a TOUCH(val) macro out of this one: asm volatile ("" : : "r"(val): "memory");

Could come in handy ....
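A minimal sketch of such a macro, assuming GCC-style extended asm; TOUCH is the hypothetical name suggested above, not an avr-libc API. The empty asm takes var as an input in any register ("r"), so the value must be fully computed at that point, and its "memory" clobber keeps it ordered against the barriers in cli()/sei(). It only works for values that fit in registers (scalars), and the generated code should still be checked. The host stand-ins for cli()/sei() are empty asm statements with the same clobber.

```c
#if defined(__AVR__)
#  define cli() __asm volatile( "cli" ::: "memory" )
#  define sei() __asm volatile( "sei" ::: "memory" )
#else
/* Host stand-ins: same barrier semantics, no instruction emitted. */
#  define cli() __asm volatile( "" ::: "memory" )
#  define sei() __asm volatile( "" ::: "memory" )
#endif

/* Force var to be computed and held in a register at this point. */
#define TOUCH(var) __asm volatile( "" : : "r"(var) : "memory" )

unsigned int ivar;

void test2( unsigned int val )
{
  val = 65535U / val;
  TOUCH(val);   /* the division must be finished here, before cli() */

  cli();
  ivar = val;
  sei();
}
```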

/Bingo


theusch wrote:
Quote:

The "measure" of the "net effect" is what the standard calls "side effects", and is accomplished exclusively through accesses (reads and writes) to variables tagged as volatile.

I cannot agree with this, especially the "exclusively" part.

That's just a fancy word... ;-)

Lee wrote:
The optimization situations that you allude to existed long before microcontrollers and "volatile". You start with basic blocks first, mix well with a portion of sequence points, and add a dash of register lifetime. Volatile is like the sprinkles on the top of the cupcake icing. (generalizations based on past live(s) doing compilers and the like)

This all comes from C99 5.1.2.3. Sure, I've committed an (over)simplification, but I don't think the description of the problem needs to go into further detail (which I think is the same as what you said in the following:)

Lee wrote:
However, I can certainly see your point as you expand on the particular situation(s) being addressed.

You are free to suggest different wording, of course.

Jan


wek wrote:

But my point is slightly different: the article is NOT intended to provide a solution, it is intended to WARN about the effect. Once you KNOW this may happen, you will be aware of it when hunting down the subtle bugs it may cause.

Jan

Ahh... I get it (now...)
It was a "warning/info", not a "how to avoid" question...

Sorry... But a "TOUCH" macro could come in handy anyway.

Even so, one probably still has to check whether "the beast" does what "you think/expect you have told it to do" :-)

/Bingo


Do I get it right, you are trying to move an age-old avr-libc discussion to avrfreaks, to build up some pressure on the decision makers there?

Stealing Proteus doesn't make you an engineer.


ArnoldB wrote:
Do I get it right, you are trying to move an age-old avr-libc discussion to avrfreaks, to build up some pressure on the decision makers there?

Well, there are no real decisions made there, as far as substantial issues of gcc are concerned.

I'm just trying to document the status quo.

JW


I've asked Jan to discuss this here, so his contribution can go into
the avr-libc documentation, as what's currently there might be a
little terse if you never thought about all those details.

Jörg Wunsch

Please don't send me PMs, use email if you want to approach me personally.
Please read the `General information...' article before.
