Memory barrier: what it does and what it does not do


It looks like the problem was ancient when this discussion was young.

ArnoldB wrote:
... move an age-old avr-libc discussion to avrfreaks ...

Do you know which avr-libc discussion ArnoldB means? I still don't see the difference between cli and sei, which use a memory barrier, and wdr, sleep, reti and nop, which do not need (?) one. Adding the memory barrier may be quite expensive, so the authors of the libraries used by default in Atmel Studio probably had a good reason to do it. Does anyone know an example where removing the barrier causes a wrong result? I cannot find any scenario where reordering sei() causes problems but reordering sleep_cpu() is OK. My reasoning:

 

cli() and sei() are used for atomic operations, that is, operations that must not be interrupted. Those are either manipulations of variables in RAM shared with an ISR (which need to be declared "volatile") or timing-sensitive accesses to I/O registers. AFAIK, from the compiler's point of view I/O registers are just another part of RAM, all of it marked "volatile", so there is in fact no difference between accessing I/O and accessing a user-defined volatile location in RAM. Suppose that without the memory barrier the compiler is allowed to move cli() and sei() around such volatile accesses. If that is true, the compiler may also move sleep_cpu() around volatile accesses, because it has no barrier by default. So it should be able to move it in front of sei(), leading to an endless sleep, or in front of some I/O access that must happen just before sleep (such as enabling some interrupt or disabling outputs to conserve power). Yet such problems are either very rare or do not happen at all (they would have been fixed otherwise). If reordering of sleep_cpu() never happens, reordering of cli() and sei() should never happen either, and the memory barrier is wasteful. OTOH, if such reordering is possible in rare (or specific?) conditions, those other instructions should have the memory barrier added too.
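To make the comparison concrete, here is a minimal sketch of the two flavors being discussed. It is not copied from the avr-libc headers: my_cli(), my_sei(), my_sleep(), INSN() and shared_counter are made-up names, and on a non-AVR host the instruction string is empty so the snippet compiles with any GCC-compatible compiler. The only difference between the two flavors is the "memory" clobber.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: on a non-AVR host we emit no instruction at all, so the
   sketch still compiles; on AVR the real opcode is used. */
#ifdef __AVR__
#  define INSN(s) s
#else
#  define INSN(s) ""
#endif

/* With a "memory" clobber: the compiler must assume memory may change
   here, so it does not move loads or stores across the statement. */
#define my_cli()   __asm__ __volatile__ (INSN("cli") ::: "memory")
#define my_sei()   __asm__ __volatile__ (INSN("sei") ::: "memory")

/* Without the clobber: nothing tells the compiler that memory accesses
   must stay on their side of the statement. */
#define my_sleep() __asm__ __volatile__ (INSN("sleep"))

/* Hypothetical variable standing in for data shared with an ISR. */
uint16_t shared_counter;

uint16_t read_counter_atomically(void)
{
    uint16_t copy;

    my_cli();
    copy = shared_counter;   /* kept between the two barriers */
    my_sei();                /* simplification: unconditionally re-enables */
    return copy;
}

/* Host-side usage example. */
void example(void)
{
    shared_counter = 1234;
    assert(read_counter_atomically() == 1234);
}
```

Note that shared_counter is deliberately not volatile here: the clobber alone is what keeps the read inside the critical section in this sketch.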


Read the AVR-LibC manual; it gives a link to the dev list where the devs discuss things like this (not on Freaks, which is kind of the point Arnold was making).

 

The manual ultimately leads here:  https://savannah.nongnu.org/mail...

 

BTW you are wasting your time worrying about this. Minds greater than yours or mine (the LibC devs) have explored this in infinite depth and are satisfied that the sequencing of sei()/cli()/etc. is always right. In fact, if sei(), cli(), etc. did not insert the opcodes at exactly the right point, every time, there could well be a lot of very broken avr-gcc code out there! (But there isn't.)

Last Edited: Sun. Aug 26, 2018 - 11:50 AM

js wrote:
and I thought a memory barrier means you forget things easily, I have one of those.

You have to remember things first before you can forget.  A classic barrier to memory:

[Image: tequila]

Timber/Pitbull

It's going down, I'm yelling timber
You better move, you better dance
Let's make a night, you won't remember
I'll be the one, you won't forget

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Last Edited: Sun. Aug 26, 2018 - 06:38 PM

theusch wrote:
Indeed the gurus will need to comment.  Does it depend on the "something"?  If that is a sequence point, then won't the volatile take care of it?

#define wdt_reset() \
	__asm__ __volatile__ ("wdr") 
No.

volatile, like static, multitasks.

In this case, it implies that if the code is reached, it will need to be executed,

even if all its outputs are unused.

In this case, the __volatile__ is redundant.

Inline assembly without outputs is implicitly volatile.

Also, volatile inline assemblies may not be reordered with respect to each other.

Reordering with respect to volatile memory access is not prohibited by the __volatile__ in __asm__ __volatile__.
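If the ordering against a volatile access matters, the caller can add a barrier explicitly. The following is only a sketch under assumptions: my_wdt_reset(), compiler_barrier() and progress_flag are made-up names, and on a non-AVR host the instruction string is empty so that the code compiles anywhere GCC-style inline asm is available.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __AVR__
#  define WDR_INSN "wdr"
#else
#  define WDR_INSN ""      /* assumption: host build, no instruction emitted */
#endif

/* Same shape as the macro quoted above: no operands, no clobbers. */
#define my_wdt_reset()      __asm__ __volatile__ (WDR_INSN)

/* Hypothetical helper: emits no instruction; it only tells the compiler
   that memory may change here, so memory accesses are not moved across it. */
#define compiler_barrier()  __asm__ __volatile__ ("" ::: "memory")

/* Stands in for a volatile variable or an I/O register. */
volatile uint8_t progress_flag;

void finish_step_then_kick_watchdog(void)
{
    progress_flag = 1;     /* volatile store */
    compiler_barrier();    /* the store above stays above this line */
    my_wdt_reset();        /* per the post above, volatile asm statements
                              keep their order relative to each other */
}

/* Host-side usage example. */
void example(void)
{
    progress_flag = 0;
    finish_step_then_kick_watchdog();
    assert(progress_flag == 1);
}
```

The barrier costs no instructions by itself; the possible cost is that the compiler may have to reload values it was keeping in registers.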

 

Being able to put volatile in the clobber list with the

obvious semantics might be a useful enhancement.

 

"Demons after money.
Whatever happened to the still beating heart of a virgin?
No one has any standards anymore." -- Giles

Last Edited: Mon. Aug 27, 2018 - 01:28 PM

skeeve wrote:
No. ...

So, then, what is the answer to the query about how e.g. wdr() works without possibly being moved?

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


theusch wrote:
skeeve wrote:
No. ...

So, then, what is the answer to the query about how e.g. wdr() works without possibly being moved?

My guess is that sometimes (rarely) it doesn't.

IIRC wdr is a one-cycle instruction.

Also, regardless of how far the compiler moves it,

the wdr must be performed the correct number of times.

That limits the amount of damage likely to be done by a non-hostile compiler.

"Demons after money.
Whatever happened to the still beating heart of a virgin?
No one has any standards anymore." -- Giles


skeeve wrote:

...

Also, regardless of how far the compiler moves it,

the wdr must be performed the correct number of times.

That limits the amount of damage likely to be done by a non-hostile compiler.

 

And what about sleep_cpu()? Usually you do VERY important I/O manipulations around sleep_cpu(), such as enabling sleep just before it and disabling sleep just after. Changing the order of those will lead to code failure every time, not only during quite rare interrupt race events. Yet there are no complaints AFAIK.


Very often the sleep instruction is immediately preceded by an sei which includes the memory barrier.
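For readers who have not seen it, the usual sequence looks roughly like the sketch below (sleep mode selection omitted). The names work_pending, idle_until_work() and log_call() are made up for illustration, and the host stand-ins only log the call order so the snippet can be compiled and checked off-target; on AVR the real avr-libc macros are used.

```c
#include <assert.h>
#include <string.h>

#ifdef __AVR__
#  include <avr/interrupt.h>
#  include <avr/sleep.h>
#else
/* Host stand-ins (hypothetical): log the call order instead of
   touching hardware. */
static char trace[32];
static unsigned trace_len;

static void log_call(char c)
{
    if (trace_len < sizeof trace - 1)
        trace[trace_len++] = c;      /* buffer stays NUL-terminated */
}
#  define cli()           log_call('c')
#  define sei()           log_call('s')
#  define sleep_enable()  log_call('E')
#  define sleep_cpu()     log_call('Z')
#  define sleep_disable() log_call('D')
#endif

/* Hypothetical flag, set by an ISR in a real program. */
volatile unsigned char work_pending;

void idle_until_work(void)
{
    cli();                   /* check the flag with interrupts off */
    if (!work_pending) {
        sleep_enable();
        sei();               /* avr-libc's sei() carries the memory barrier */
        sleep_cpu();         /* on AVR the instruction after sei executes
                                before any pending interrupt is served */
        sleep_disable();
    }
    sei();
}

#ifndef __AVR__
/* Host-side usage example: the logged order matches the source order. */
void example(void)
{
    work_pending = 0;
    idle_until_work();
    assert(strcmp(trace, "cEsZDs") == 0);
}
#endif
```

In this shape the barrier in sei() sits directly in front of sleep_cpu(), which is the situation described above; nothing comparable follows sleep_cpu() unless the application adds it.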

 

--Mike

 


1) Not always

2) The need to place a cli() (with its memory barrier) right after sleep_cpu() is application-specific, so at least that barrier is not there "very often".


Note that I said "sei" and "before" the sleep instruction.

 

--Mike

 


Note that for quite some time, sei() and cli() did not have the memory barrier.

They usually worked anyway.

With no inputs or outputs,

the toolchain rarely had any reason to move them.

Exceptions exist.

"Demons after money.
Whatever happened to the still beating heart of a virgin?
No one has any standards anymore." -- Giles


Noted.

 

--Mike

 


skeeve wrote:

Exceptions exist.

Do they? If so, what causes such an exception? If such a thing can happen to an unprotected cli, is it also possible for an unprotected sleep?


Smajdalf wrote:
skeeve wrote:

Exceptions exist.

Do they? If so, what causes such an exception? If such a thing can happen to an unprotected cli, is it also possible for an unprotected sleep?

I'm not sure the cause is well-understood.

avrfreaks has had posts on the subject.

Such exceptions were the reason the memory barrier was added.

Dean Camera's ATOMIC_BLOCK was written before the change.

There have been discussions about whether the

memory barrier obviates the need for ATOMIC_BLOCK.

"Demons after money.
Whatever happened to the still beating heart of a virgin?
No one has any standards anymore." -- Giles


skeeve wrote:
I'm not sure the cause is well-understood. avrfreaks has had posts on the subject. Such exceptions were the reason the memory barrier was added. Dean Camera's ATOMIC_BLOCK was written before the change. There have been discussions about whether the memory barrier obviates the need for ATOMIC_BLOCK.

Sign us "Confused Since 2009".  In

https://www.avrfreaks.net/forum/...

#10 has links to discussions earlier than the one you gave.  Dean expounds on ATOMIC_ and the memory barrier and similar.  In particular see #14

abcminiuser wrote:
Which shows why the memory barrier is needed: without it the compiler will always do the SEI after the CLI, but it doesn't always put the code written between them actually between the two, as that code isn't specifically dependent on the volatile operations. - Dean :twisted:

Like you, I thought there were posted "real" examples from live code where the reordering occurred.  Hard to find.

[edit] An anecdotal example here https://www.avrfreaks.net/commen...

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.

Last Edited: Tue. Aug 28, 2018 - 03:09 PM

@theusch: I was reading the same post a few minutes ago. Note in #15 KKP explains the #14 example is incorrect:

abcminiuser wrote:

From the original discussion thread in the avr-libc development lists:

Quote:

The main problem is that the generated code doesn't always do what is
expected. Something like:

 

> uint16_t counter1, counter2;
>
> ...
>

> int main(void)
> {
> uint16_t a, b;
>
> ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
> a = counter1;
> b = counter2;
> }
>
> PORTB = a;
> PORTC = b;
>
> return 0;
> }

compiles to (using gcc 4.2.0):

> ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
> 64: 8f b7 in r24, 0x3f ; 63
> 66: f8 94 cli
> 68: 8f bf out 0x3f, r24 ; 63
> a = counter1;
> b = counter2;
> }
>
> PORTB = a;
> 6a: 80 91 60 00 lds r24, 0x0060
> 6e: 88 bb out 0x18, r24 ; 24
> PORTC = b;
> 70: 80 91 62 00 lds r24, 0x0062
> 74: 85 bb out 0x15, r24 ; 21

Neither of the variables used is declared volatile, so it is no surprise that the compiler is free to move the accesses at will! If this is the only known "example" where reordering happens, and every other use of inline assembly is a known counterexample where reordering did not happen, maybe it is time to rethink the memory barrier thing?
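For comparison, here is a sketch of the same snapshot with the counters declared volatile and the status register saved and restored by hand. It is an illustration under assumptions, not the code from the mailing list: snapshot_counters() is a made-up name, and the host stand-ins for SREG and cli() exist only so that the snippet compiles and can be checked off-target.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __AVR__
#  include <avr/io.h>
#  include <avr/interrupt.h>
#else
/* Host stand-ins (hypothetical). Bit 7 models the global interrupt flag. */
static volatile uint8_t SREG = 0x80;
#  define cli() do { SREG &= (uint8_t)~0x80; \
                     __asm__ __volatile__ ("" ::: "memory"); } while (0)
#endif

/* volatile: each read written in the source becomes a real load, kept in
   program order relative to other volatile accesses such as SREG. */
volatile uint16_t counter1, counter2;

void snapshot_counters(uint16_t *a, uint16_t *b)
{
    uint8_t saved = SREG;    /* volatile read of the status register */

    cli();
    *a = counter1;           /* volatile reads: they cannot drift past */
    *b = counter2;           /* the volatile SREG write below          */
    SREG = saved;            /* restores the previous interrupt state  */
}

/* Host-side usage example. */
void example(void)
{
    uint16_t a, b;

    counter1 = 11;
    counter2 = 22;
    snapshot_counters(&a, &b);
    assert(a == 11 && b == 22);
}
```

With volatile counters the two reads are pinned between the SREG read and the SREG write; whether they are also pinned after the cli instruction itself is exactly the part the memory barrier is meant to settle.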

Last Edited: Tue. Aug 28, 2018 - 03:17 PM

You can force some things to occur in a guaranteed order by using the mechanisms

provided by C++ classes.  Objects are constructed where they're declared and then

destructed at the end of the block they're in.  I have been using the following code

and have not had any problems with reordered cli instructions:

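#include <stdint.h>
#include <avr/io.h>   // SREG

// asm_code_sei() and asm_code_cli() are helpers not shown in this post;
// presumably they are thin wrappers around the sei and cli instructions.

// Saves SREG on construction and restores it on destruction.
struct SavedStatusRegister {
    SavedStatusRegister  () : status_ (SREG) { }
    ~SavedStatusRegister () { restore_status(); }

    void restore_status  () { SREG = status_; }
    operator uint8_t     () { return status_; }

    // Non-copyable: a copy would restore SREG twice.
    SavedStatusRegister (const SavedStatusRegister&) = delete;
    SavedStatusRegister& operator= (const SavedStatusRegister&) = delete;

private: uint8_t status_;
};

#define PROTECT_STATUS_REGISTER SavedStatusRegister saved_status_

// Saves SREG, then disables interrupts; the base-class destructor
// restores the saved state at the end of the enclosing block.
struct InterruptGuard : SavedStatusRegister {
    InterruptGuard () : SavedStatusRegister () { disable_interrupts(); }

    void enable_interrupts  () const { asm_code_sei(); }
    void disable_interrupts () const { asm_code_cli(); }
};

#define DISABLE_INTERRUPTS InterruptGuard int_guard_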

At the location where you want interrupts disabled, you just say:

{
    some_code();
    
    DISABLE_INTERRUPTS;
    
    more_code();
    
    // interrupts automatically restored here
}
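For plain C, GCC's cleanup attribute gives a similar scope-exit hook; as far as I know this is also the mechanism avr-libc's ATOMIC_BLOCK is built on. The sketch below is an assumption-laden illustration, not the avr-libc code: restore_sreg(), DISABLE_INTERRUPTS_IN_SCOPE(), shared_total and add_to_total() are made-up names, and the host stand-ins for SREG and cli() exist only so the snippet can be compiled off-target.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __AVR__
#  include <avr/io.h>
#  include <avr/interrupt.h>
#else
/* Host stand-ins (hypothetical). Bit 7 models the global interrupt flag. */
static volatile uint8_t SREG = 0x80;
#  define cli() do { SREG &= (uint8_t)~0x80; \
                     __asm__ __volatile__ ("" ::: "memory"); } while (0)
#endif

/* Hypothetical helper: GCC calls it with a pointer to the guarded
   variable when that variable goes out of scope. */
static inline void restore_sreg(const uint8_t *saved)
{
    SREG = *saved;
    __asm__ __volatile__ ("" ::: "memory");   /* compiler barrier */
}

/* Hypothetical macro: save SREG, register the restore for scope exit,
   then disable interrupts. Use it at the start of a block. */
#define DISABLE_INTERRUPTS_IN_SCOPE() \
    uint8_t saved_sreg_ __attribute__((cleanup(restore_sreg))) = SREG; \
    cli()

/* Stands in for data shared with an ISR. */
volatile uint16_t shared_total;

uint16_t add_to_total(uint16_t n)
{
    DISABLE_INTERRUPTS_IN_SCOPE();
    shared_total += n;
    return shared_total;     /* value is read first, then restore_sreg() runs */
}

/* Host-side usage example. */
void example(void)
{
    shared_total = 5;
    assert(add_to_total(3) == 8);
}
```

As with the C++ guard, the restore happens on every path out of the block, including early returns, which is the main reason to prefer it over a hand-written cli()/sei() pair.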

--Mike

 
