how to write EEPROM with extreme safety?


I want to be robust to power drops at least.  I'm interested in robustness to interrupts as well (see below).

 

What I have now is a system with three bytes per byte of data: copy_a, copy_b, and valid_copy (0x00 means a, any other value means b).

Writes work like this:

 

  1. Check valid_copy.

  2. Write the new value into the other copy.

  3. Read it back and check. If it matches, both copies are now valid (one at the old value, one at the new); otherwise return fail.

  4. Write valid_copy to point to the newly written copy.

  5. Read back valid_copy. If it matches, the write can be reported as successful; otherwise return fail.

 

After a failure, at least the old copy is OK.  A write to valid_copy cannot fail in such a way that it points to neither copy, because every possible value selects one of them.

After a failure it isn't known whether the stored value is the old one or the new one, so clients must check.
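
The write sequence above could be sketched in C roughly as follows. This is only an illustration under stated assumptions: ee_read() and ee_write() are hypothetical stand-ins backed by a RAM array so the sketch runs on a PC (on an AVR they would wrap eeprom_read_byte() and eeprom_write_byte() from avr-libc), and the record layout and the names safe_read()/safe_write() are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the real EEPROM (assumption for this sketch). */
#define EE_SIZE 64u
static uint8_t ee_mem[EE_SIZE];

static uint8_t ee_read(uint16_t addr)
{
    assert(addr < EE_SIZE);
    return ee_mem[addr];
}

static void ee_write(uint16_t addr, uint8_t value)
{
    assert(addr < EE_SIZE);
    ee_mem[addr] = value;
}

/* Assumed layout of one protected byte, starting at address `base`. */
enum { OFF_COPY_A = 0, OFF_COPY_B = 1, OFF_VALID = 2 };

/* valid_copy == 0x00 selects copy_a; any other value selects copy_b. */
static uint8_t safe_read(uint16_t base)
{
    bool use_a = (ee_read((uint16_t)(base + OFF_VALID)) == 0x00);
    return ee_read((uint16_t)(base + (use_a ? OFF_COPY_A : OFF_COPY_B)));
}

/* Returns true if the new value is stored and selected. On false the
   caller must use safe_read() to find out which value is current. */
static bool safe_write(uint16_t base, uint8_t value)
{
    uint16_t flag_addr = (uint16_t)(base + OFF_VALID);
    bool a_is_valid = (ee_read(flag_addr) == 0x00);               /* step 1 */
    uint16_t spare = (uint16_t)(base + (a_is_valid ? OFF_COPY_B : OFF_COPY_A));
    uint8_t new_flag = a_is_valid ? 0xFF : 0x00;

    ee_write(spare, value);                                       /* step 2 */
    if (ee_read(spare) != value)                                  /* step 3 */
        return false;

    ee_write(flag_addr, new_flag);                                /* step 4 */
    return ee_read(flag_addr) == new_flag;                        /* step 5 */
}
```

Note that the sketch only covers the bookkeeping. It does nothing about a write that is torn by a power drop and still reads back correctly once, which is what the question is really about.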

 

Multi-byte writes use a single valid_copy byte (the one corresponding to the first byte), which is written only after all the other bytes have been written and read back OK.

 

Is this approach safe against power drops?  The datasheet says incorrect CPU instructions can happen as well at power drop, so I guess the readback check could manage to fail in such a way that a corrupted write gets missed.  But if this is used together with brown-out detection (as described in Preventing EEPROM Corruption section of datasheet) is it entirely safe?

 

Is it safe against interrupts during EEPROM write, or must ATOMIC_BLOCK() be used?  Note that I'm not interested in doing any other EEPROM activity from the interrupt context so maybe interrupts aren't even a problem, I'm not clear on this.

 

Thanks,

Britton


When the voltage gets too low an in-process EEPROM write can fail, possibly infect/corrupt any locations...the logic will go haywire.  Who can say what the logic does when its voltage lifeline is cut down? So any such scheme has a chance of failing.

 

You need to ensure that, at the moment it commences, a byte write will have sufficient power to run all the way through its write process.  Don't start a byte write until you are certain enough energy remains to complete it.  You only need to hold the power for several milliseconds.  Size the caps to allow the AVR to continue at least this long.  If the BOD threshold is set high, that assures that at the start of the sequence the voltage is sufficient to proceed with the write (assuming the cap's charging is not slowed by resistance).  The worst that can happen is a controlled reset if the brown-out detector trips, rather than low-voltage mayhem.
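
As a rough sizing sketch for the hold-up capacitor: assume the board draws a roughly constant current I while the rail sags from V_start (the voltage at which the write begins) to V_min (the lowest voltage at which the write still completes reliably), and that the write takes time t. The numbers in the example are purely illustrative assumptions, not datasheet values.

```latex
C \;\ge\; \frac{I \, t}{V_{\text{start}} - V_{\text{min}}}
\qquad\text{e.g.}\qquad
C \;\ge\; \frac{10\,\text{mA} \times 5\,\text{ms}}{1\,\text{V}} = 50\,\mu\text{F}
```

In practice you would add margin for capacitor tolerance and for anything else hanging off the same rail.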

When in the dark remember-the future looks brighter than ever.   I look forward to being able to predict the future!

Last Edited: Sat. Jul 4, 2020 - 02:34 AM

No write is safe against a power fail. The result is indeterminate. The brown-out detector protects against random writes, but if you have a write in progress and you lose power, then the outcome is not assured. You need to ensure an EEPROM write is not interrupted.
You can mitigate the problem by using a journalling technique that guarantees valid old data or new data. Or you can add extra hardware that can detect an impending power fail and maintain sufficient power for the duration of the EEPROM write.


You can create several copies of the data, and write a checksum or some check value in each copy. Write them in a round-robin fashion (so you always overwrite the oldest one) and add an incrementing version number on each one.

 

Then when you start up, read the latest valid copy of the data, ignore any corrupted or older copies of the data.
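
A minimal sketch of that round-robin idea, under stated assumptions: the EEPROM is simulated by a RAM array (ee_read()/ee_write() are hypothetical helpers), each slot holds a one-byte version, one data byte and a one-byte check value, and the check function (XOR with a constant, so that blank slots read as invalid) is just an example, not a recommendation. Versions are compared with a signed 8-bit difference so that wrap-around from 255 to 0 still sorts correctly while there are far fewer than 128 slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical RAM-backed stand-in for the EEPROM. */
#define EE_SIZE 64u
static uint8_t ee_mem[EE_SIZE];

static uint8_t ee_read(uint16_t addr)
{
    assert(addr < EE_SIZE);
    return ee_mem[addr];
}

static void ee_write(uint16_t addr, uint8_t value)
{
    assert(addr < EE_SIZE);
    ee_mem[addr] = value;
}

#define NUM_SLOTS 4u
#define SLOT_SIZE 3u   /* [0] = version, [1] = data, [2] = check */

/* Example check value; the constant makes all-0x00 and all-0xFF slots invalid. */
static uint8_t slot_check(uint8_t version, uint8_t data)
{
    return (uint8_t)(version ^ data ^ 0xA5u);
}

/* Loads one slot; returns true only if its check value matches. */
static bool slot_load(uint8_t slot, uint8_t *version, uint8_t *data)
{
    uint16_t base = (uint16_t)(slot * SLOT_SIZE);
    *version = ee_read(base);
    *data = ee_read((uint16_t)(base + 1u));
    return ee_read((uint16_t)(base + 2u)) == slot_check(*version, *data);
}

/* Finds the newest valid slot. Returns its index, or -1 if none is valid. */
static int find_latest(uint8_t *version, uint8_t *data)
{
    int best = -1;
    for (uint8_t s = 0; s < NUM_SLOTS; s++) {
        uint8_t v, d;
        if (!slot_load(s, &v, &d))
            continue;                       /* corrupted or blank: ignore */
        if (best < 0 || (int8_t)(uint8_t)(v - *version) > 0) {
            best = s;
            *version = v;
            *data = d;
        }
    }
    return best;
}

/* Writes the value into the slot after the newest one (the oldest slot). */
static void store(uint8_t data)
{
    uint8_t version = 0, old = 0;
    int latest = find_latest(&version, &old);
    uint8_t slot = (latest < 0) ? 0 : (uint8_t)((latest + 1) % (int)NUM_SLOTS);
    uint8_t next = (latest < 0) ? 0 : (uint8_t)(version + 1u);
    uint16_t base = (uint16_t)(slot * SLOT_SIZE);

    ee_write(base, next);
    ee_write((uint16_t)(base + 1u), data);
    ee_write((uint16_t)(base + 2u), slot_check(next, data));
}
```

At start-up you would call find_latest() once and treat a return of -1 as "no stored data, use defaults".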

 

There is always a possibility that a glitch or cosmic ray or something corrupts data in memory, and your good code then writes out bad data with a valid checksum. That is difficult to prevent.


This problem is a lot like error-correcting RAM, as markxr pointed out.

 

What you want to do is, every time, write out 'copy_A', 'copy_B', and then 'valid' but 'valid' should not be a flag - it should be a checksum of A and B (XOR or a parity check will work well for bytes - longer sequences will need a somewhat more complex algorithm).

 

When reading, take A and B and calculate the checksum, proving all three EEprom write operations completed successfully*.

 

If you need valid readings even when an EEwrite has failed, you'll need a series of writes, and you can pick the most recent one that did not fail.

 

S.

* Or you have a wildly unlikely multiple error.  There are algorithms for detecting and correcting multiple bit errors as well, but the mathematics are beyond the scope of this post.

 

Edited to add PS:

PS - Protect your EEwrites from interrupts.  To do otherwise is sheer stupidity.  If you really need that fast interrupt reactions that you cannot tolerate the timing of an ATOMIC_BLOCK, get some faster hardware.  S.

Last Edited: Wed. Jul 8, 2020 - 03:33 AM

I write two copies of the data, each of which includes its own CRC. The copies are stored in different memory pages, or one in internal and the other in external memory. I don't start writing the second copy until the first has finished. When restoring the data, I check whether both copies have a correct CRC. If only one does, I restore from the copy with the correct CRC. If both do, I test whether the copies are identical. If they aren't, I restore from the first (two different copies, both with a correct CRC, mean that only the first was written and the interruption came between the two writes).
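
The restore decision described above might look something like this in C. It is a sketch under assumptions: the record layout, the 4-byte payload and the names record_t, record_seal() and choose_copy() are invented for the example, and the CRC shown is a plain bitwise CRC-16/CCITT-FALSE, which is just one reasonable choice. The records are ordinary structs here; reading them from and writing them to EEPROM is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAYLOAD_LEN 4u

/* Assumed record layout: payload followed by its CRC. */
typedef struct {
    uint8_t payload[PAYLOAD_LEN];
    uint16_t crc;
} record_t;

/* Bitwise CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF. */
static uint16_t crc16(const uint8_t *p, size_t n)
{
    uint16_t crc = 0xFFFFu;
    while (n--) {
        crc ^= (uint16_t)(*p++ << 8);
        for (int i = 0; i < 8; i++) {
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Stamps the record with the CRC of its current payload. */
static void record_seal(record_t *r)
{
    assert(r != NULL);
    r->crc = crc16(r->payload, PAYLOAD_LEN);
}

static bool record_ok(const record_t *r)
{
    assert(r != NULL);
    return r->crc == crc16(r->payload, PAYLOAD_LEN);
}

typedef enum { RESTORE_NONE, RESTORE_FIRST, RESTORE_SECOND } restore_t;

/* Decides which copy to restore from. `first` is the copy written first. */
static restore_t choose_copy(const record_t *first, const record_t *second)
{
    bool ok1 = record_ok(first);
    bool ok2 = record_ok(second);

    if (ok1)
        return RESTORE_FIRST;   /* covers: both identical, both valid but
                                   different (break between the two writes),
                                   and second copy corrupted */
    if (ok2)
        return RESTORE_SECOND;  /* break happened while writing the first */
    return RESTORE_NONE;        /* nothing usable, fall back to defaults */
}
```

When the result is RESTORE_FIRST or RESTORE_SECOND and the copies differ, the start-up code would normally also rewrite the other copy so that both are valid again.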


Kartman wrote:
No write is safe against a power fail. The result is indeterminate. The brown-out detector protects against random writes, but if you have a write in progress and you lose power, then the outcome is not assured. You need to ensure an EEPROM write is not interrupted. You can mitigate the problem by using a journalling technique that guarantees valid old data or new data. Or you can add extra hardware that can detect an impending power fail and maintain sufficient power for the duration of the EEPROM write.
OK, but what granularity is guaranteed? Byte? Page? Because the EEPROM has a page size bigger than one byte (why?). Without this information, you cannot do journalling. The EEPROM emulation techniques used for chips without a separate EEPROM are based on the fact that the most you can lose is one flash sector/page.


Scroungre wrote:

Edited to add PS:

PS - Protect your EEwrites from interrupts.  To do otherwise is sheer stupidity.  If you really need that fast interrupt reactions that you cannot tolerate the timing of an ATOMIC_BLOCK, get some faster hardware.  S.

What do you mean by "EEwrites"? I think the EEPROM in the AVRs is kind of RWW (read-while-write). The CPU can run freely from flash for as long as the EEPROM write takes. If the CPU can poll for the EEPROM completion flag, it can do anything in the meantime, including interrupt processing.


awit wrote:

I write two copies of the data, each of which includes its own CRC. The copies are stored in different memory pages, or one in internal and the other in external memory.

 

Elaborate, but powerful.  If your data is that valuable, throwing steel and rivets and boilerplate at it is not a bad thing.  S.


rammon wrote:

Scroungre wrote:

Edited to add PS:

PS - Protect your EEwrites from interrupts.  To do otherwise is sheer stupidity.  If you really need that fast interrupt reactions that you cannot tolerate the timing of an ATOMIC_BLOCK, get some faster hardware.  S.

What do you mean by "EEwrites"? I think the EEPROM in the AVRs is kind of RWW (read-while-write). The CPU can run freely from flash for as long as the EEPROM write takes. If the CPU can poll for the EEPROM completion flag, it can do anything in the meantime, including interrupt processing.

 

Yes.  That's why we don't let the CPU poll for EEprom completion flags while doing other things.  We remain atomic until the EEprom is done writing.

 

The EEprom in an AVR is not the same as the SRAM.  It requires timing details.  Think of it as a 'wait state', perhaps.

 

What I meant by 'EEwrites' is "putting some amount of data (could be many bytes) into the EEprom".  Muck with that timing at your own risk.

 

Yeah, there are reasons why you might want the CPU to do something else while the EE is writing, but unless you're writing highly optimized assembler, knock it off.  It's Not A Good Idea.

 

S.


To be safe (against a power failure) you need two things:

- a control sum for the value: something that can say your data is valid.

- a history (journal): the previous value is perfectly fine in case you lost the current one.

Let's say we have byte granularity (if you write to a byte, you lose only that byte).

For a byte, we need three bytes: the first two of them are the previous/current (circularly) and the third is the control sum for the "right" one.

The write order is important: first write the new value, then write the control sum.

The simplest control sum is just the bitwise-inverted value: it is simple, fast, and at least guaranteed to be different from both the "good" value and the possibly "wrong" one. Here we postulate that we won't write a new value identical to the previous one, but even in that case the technique works.

So, let's say we have the following three bytes "record" for our one byte eeprom storage:

A B ~A  (~A is the control sum, and it is for the A of course)

We want to write the new value C

Step1: We detect that the good value is the first byte (A) because the control sum is for it. So we write in the second byte (B):

A C ~A   --- so far so good, but the valid one is still the previous (A)

             A X ~A   --- oops, it was a power failure, the C got an X (anything). We still have a valid value, the previous A.

Step 2: write the control sum for the new value (~C)

A C ~C   --- done, we have the new value valid

             A C X     --- oops, but things are simple: although the control sum is indeterminate, both the A and C values are perfectly fine, so we can choose either of them. This record needs fixing at power-up, say to A C ~A

Just for completion, when the second byte is the current:

A C ~C   --- we write a new value D

D C ~C  /  X C ~C

D C ~D  /  D C X ---> fix to D C ~D  (in this case we restore the new value not the old one!)

 

That's all

 

If the EEPROM page (usually 4/8 bytes) is the granularity, every byte in the above algorithm should be in a different page, so you need three pages, not three bytes.

 

Conclusion: For one byte storage: 3 bytes (pages?), only two writes.
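
For what it's worth, the three-byte record above can be sketched in C like this. Assumptions are labeled loudly: the EEPROM is a RAM array, the names rec_read(), rec_write() and rec_repair() are invented, and writes_until_power_fail is purely a test hook that makes one write store garbage (the "X" above) so the failure cases can be exercised on a PC. On real hardware ee_write() would not return anything useful after a power loss; the bool is only there for the simulation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical RAM-backed stand-in for the EEPROM. */
#define EE_SIZE 16u
static uint8_t ee_mem[EE_SIZE];

/* Test hook (assumption): -1 = never fail, 0 = the next write is torn. */
static int writes_until_power_fail = -1;

static uint8_t ee_read(uint16_t addr)
{
    assert(addr < EE_SIZE);
    return ee_mem[addr];
}

/* Returns false if the simulated power failure hit this write. */
static bool ee_write(uint16_t addr, uint8_t value)
{
    assert(addr < EE_SIZE);
    if (writes_until_power_fail == 0) {
        ee_mem[addr] = 0x5A;               /* indeterminate content "X" */
        return false;
    }
    if (writes_until_power_fail > 0)
        writes_until_power_fail--;
    ee_mem[addr] = value;
    return true;
}

/* Record layout: [0] and [1] are the two value bytes, [2] is the control
   sum (bitwise inverse of the current value).
   Returns 0 or 1 for the value byte the control sum vouches for,
   or -1 if it matches neither (the control sum itself was torn). */
static int rec_current(uint16_t base)
{
    uint8_t sum = ee_read((uint16_t)(base + 2u));
    if ((uint8_t)~ee_read(base) == sum)
        return 0;
    if ((uint8_t)~ee_read((uint16_t)(base + 1u)) == sum)
        return 1;
    return -1;
}

/* With a torn control sum both value bytes are intact; take the first,
   which matches both "fix" cases in the post. */
static uint8_t rec_read(uint16_t base)
{
    return ee_read((uint16_t)(base + (rec_current(base) == 1 ? 1u : 0u)));
}

/* Power-up fix: make a torn control sum vouch for the first byte again. */
static void rec_repair(uint16_t base)
{
    if (rec_current(base) < 0)
        (void)ee_write((uint16_t)(base + 2u), (uint8_t)~ee_read(base));
}

/* Step 1: write the new value into the non-current byte.
   Step 2: write its control sum. */
static bool rec_write(uint16_t base, uint8_t value)
{
    uint16_t spare = (uint16_t)(base + (rec_current(base) == 1 ? 0u : 1u));
    if (!ee_write(spare, value))
        return false;
    return ee_write((uint16_t)(base + 2u), (uint8_t)~value);
}
```

The sketch assumes byte granularity, as in the post. With page granularity the three offsets would have to land in three different pages.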

 

 

 

 

Last Edited: Wed. Jul 8, 2020 - 09:36 AM

Scroungre wrote:

Yes.  That's why we don't let the CPU poll for EEprom completion flags while doing other things.  We remain atomic until the EEprom is done writing.

 

The EEprom in an AVR is not the same as the SRAM.  It requires timing details.  Think of it as a 'wait state', perhaps.

 

What I meant by 'EEwrites' is "putting some amount of data (could be many bytes) into the EEprom".  Muck with that timing at your own risk.

 

Yeah, there are reasons why you might want the CPU to do something else while the EE is writing, but unless you're writing highly optimized assembler, knock it off.  It's Not A Good Idea.

 

S.

I don't really understand what you are talking about. I want to do a very simple thing: the CPU deals with the EEPROM in the "main" program, strictly in order (waiting for EEPROM write completion byte by byte), but in the meantime lets the interrupts do their jobs as normal (for example, during a 5 millisecond EEPROM write we can have 5 UART interrupts at 9600 bps, and these interrupts can work just fine, putting the received byte in a buffer, etc.).

 


Never mind.  S.

Last Edited: Wed. Jul 8, 2020 - 11:31 AM

The next code examples show assembly and C functions for reading the EEPROM. The examples
assume that interrupts are controlled so that no interrupts will occur during execution of
these functions.

A quotation from the spec sheet I have for the ATmega328.

 

You are expected to control your interrupts during EEprom write operations.  S.