EEPROM strategies?


This is a followup from an earlier thread where I was seeing EEPROM corruption (cured by enabling BOD). Two things were mentioned that I wanted to ask about:

1) Don't use the first EEPROM location since it is said to be more prone to corruption. True or not?

2) Store every value in 3 EEPROM locations, using majority logic when reading back. How many people do this?

Mike


I have never used the second item, though I might if it were something critical like a "password" or network address which, if corrupted, would lead to system failure.

I usually do the first. While it may be mythology with current technology, the "cost" is very small, so there is little downside.

Jim

 

Until Black Lives Matter, we do not have "All Lives Matter"!

 

 


I do skip the first location, but only because I've read that I should, not because I have any experience with it failing.

I don't do triple redundancy unless the application specifically calls for it.

JC


I have done double redundancy, using two (global) flags to indicate which copy is currently being updated, mainly to prevent partly written values: most of the data were 32 bits wide and the device only 8 bits.

This was not on an AVR, though, but on a system using one of those fancy Simtek chips that have an EEPROM cell next to each SRAM cell; the chip copies the contents of the SRAM to EEPROM on power-down and back again on power-up, using a capacitor as an energy buffer.
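A minimal sketch of the two-copy, two-flag idea, assuming a memory-mapped non-volatile area (such as the nvSRAM mentioned above) that can be accessed like ordinary RAM. The struct layout and the names `nv_record_t`, `nv_write()`, `nv_read()` and `nv_repair()` are hypothetical. The point is the ordering: each copy is flagged before it is touched and unflagged afterwards, and the copies are never updated at the same time, so after a power loss at most one copy can be suspect.

```c
#include <stdint.h>

/* Hypothetical non-volatile record: two copies of a 32-bit value plus one
   "being updated" flag per copy. volatile keeps the compiler from
   reordering or merging the stores, which matters on real NV memory. */
typedef struct {
    volatile uint8_t  updating[2];
    volatile uint32_t copy[2];
} nv_record_t;

/* Update the two copies strictly one after the other. */
void nv_write(nv_record_t *nv, uint32_t value)
{
    for (int k = 0; k < 2; k++) {
        nv->updating[k] = 1;   /* copy k is now in flight */
        nv->copy[k] = value;   /* several stores on an 8-bit CPU: can be torn */
        nv->updating[k] = 0;   /* copy k is complete again */
    }
}

/* Use the copy that was not being written when power failed.
   If copy 0 was interrupted, copy 1 still holds the previous value;
   if copy 1 was interrupted, copy 0 already holds the new value. */
uint32_t nv_read(const nv_record_t *nv)
{
    return nv->updating[0] ? nv->copy[1] : nv->copy[0];
}

/* Call once at power-up: overwrite a flagged (possibly torn) copy. */
void nv_repair(nv_record_t *nv)
{
    if (nv->updating[0] || nv->updating[1]) {
        nv_write(nv, nv_read(nv));
    }
}
```

This only protects against interrupted writes; corruption from other causes would still need something like a CRC per copy.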

Most of the data were statistics, like the number of products produced, the number of rejects, the standard deviation, etc., which are read continuously by another system but need to be saved in case of a sudden power loss.

For the really critical blocks of data that are not updated frequently, like ADC calibration and the serial number, I also protected the blocks with a CRC, so that I can at least reload defaults and give a warning if all else fails.
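One way to protect a rarely updated block with a CRC and fall back to defaults. The post doesn't say which CRC was used; this sketch swaps in CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF), and the `calib_t` fields and default values are made up for illustration. On an AVR, avr-libc's `<util/crc16.h>` also provides ready-made CRC update functions.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {            /* hypothetical calibration data */
    uint16_t adc_offset;
    uint16_t adc_gain;
    uint32_t serial_no;
} calib_t;

typedef struct {
    calib_t  data;
    uint16_t crc;           /* CRC over data only */
} calib_block_t;

static const calib_t calib_defaults = { 0, 1000, 0 };   /* hypothetical defaults */

/* CRC-16/CCITT-FALSE, bit by bit (poly 0x1021, init 0xFFFF, no reflection). */
uint16_t crc16_ccitt(const uint8_t *p, size_t len)
{
    uint16_t crc = 0xFFFF;
    while (len--) {
        crc ^= (uint16_t)((uint16_t)*p++ << 8);
        for (int i = 0; i < 8; i++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Compute and store the CRC before writing the block out. */
void calib_seal(calib_block_t *blk)
{
    blk->crc = crc16_ccitt((const uint8_t *)&blk->data, sizeof blk->data);
}

/* Returns 1 if the stored block is intact; otherwise loads the defaults
   and returns 0 so the caller can raise a warning. */
int calib_load(const calib_block_t *stored, calib_t *out)
{
    if (crc16_ccitt((const uint8_t *)&stored->data, sizeof stored->data) == stored->crc) {
        *out = stored->data;
        return 1;
    }
    *out = calib_defaults;
    return 0;
}
```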

I also made sure the backup copies were not a power-of-two number of bytes apart, in case an address line gets stuck or picks up a transient.
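The reasoning behind the spacing rule: if address line k is stuck or glitches, two addresses that differ only in bit k reach the same cell, so both "copies" silently become one. Two addresses a and a + s can differ in exactly one bit only when s is a power of two. A small hypothetical helper for checking a layout:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a and b differ in exactly one address bit, i.e. a single stuck or
   glitched address line could make both accesses hit the same cell. */
bool single_line_alias(uint16_t a, uint16_t b)
{
    uint16_t diff = (uint16_t)(a ^ b);
    return diff != 0 && (diff & (diff - 1u)) == 0;
}
```

A power-of-two spacing only aliases when the base address has that bit clear (0x030 and 0x050 are 0x20 apart yet differ in two bits), but a non-power-of-two spacing never yields a single-bit difference, so it is the simpler rule to follow.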


Did something similar on Ramtron's FRAM: two copies of a hundred-byte record, flagged appropriately for "currentness", because the application was updating them all the time (around 10 times a second). Never lost the data in hundreds of installations under rather harsh conditions. Having learned the trade on battery-backed RAMs...
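The post doesn't say how "currentness" was flagged. One common approach, assumed here, is a generation counter plus a check byte per copy, always overwriting the older copy so the current one is never touched mid-write. `rec_t`, `rec_check()` and the other names are hypothetical, and the simple additive check stands in for what should really be a CRC.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REC_PAYLOAD 100u   /* "a hundred bytes record" */

typedef struct {
    uint8_t payload[REC_PAYLOAD];
    uint8_t seq;    /* generation counter, wraps at 256 */
    uint8_t check;  /* written last; a torn write leaves it mismatched */
} rec_t;

/* Additive check over payload and seq; the nonzero seed makes an
   all-zero slot invalid. */
static uint8_t rec_check(const rec_t *r)
{
    uint8_t s = 0xA5;
    for (size_t i = 0; i < REC_PAYLOAD; i++)
        s = (uint8_t)(s + r->payload[i]);
    return (uint8_t)(s + r->seq);
}

static int rec_valid(const rec_t *r) { return r->check == rec_check(r); }

/* Index of the current copy, or -1 if neither copy is valid. */
int rec_current(const rec_t s[2])
{
    int v0 = rec_valid(&s[0]), v1 = rec_valid(&s[1]);
    if (v0 && v1)   /* both valid: the newer one is exactly one generation ahead */
        return ((uint8_t)(s[1].seq - s[0].seq) == 1) ? 1 : 0;
    return v0 ? 0 : (v1 ? 1 : -1);
}

/* Always write the non-current copy, so the current one survives a power loss. */
void rec_write(rec_t s[2], const uint8_t *data)
{
    int cur = rec_current(s);
    int dst = (cur == 0) ? 1 : 0;
    uint8_t seq = (cur < 0) ? 0 : (uint8_t)(s[cur].seq + 1);
    memcpy(s[dst].payload, data, REC_PAYLOAD);
    s[dst].seq = seq;
    s[dst].check = rec_check(&s[dst]);
}
```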

Just FYI, Simtek is gone, bought by Cypress; and since Simtek had managed to buy out the nvRAM line of its sole competitor, ZMD, shortly before, Cypress is now the only place to go for these. We used to use the ZMD ones, fine and convenient parts indeed.

JW


Item 2 sounds like a fictional decision-making method for a supercomputer.

I thought the EEPROM was good for a million writes (OK, 100,000). Does it actually get corrupted often enough to justify a design that leaves you only 1/3 of the available space?


If the cell is being written when you lose power, then yes, especially when your data is multi-byte.


It depends on how critical the data is. Losing the last selected channel on a TV is not too bad. Losing odometer readings is, and so might be losing carefully adjusted calibration constants. For some devices, such as medical devices, a malfunction due to NVRAM failure can be costly, because they are expected to always work, or at least required to indicate that something is wrong.

The EEPROM of an AVR has one big advantage: it is not so easy to write to, so the chances of a runaway program or transients on a bus causing problems are slimmer. That is unlike an external battery-backed asynchronous SRAM, with dozens of connections that can pick up noise.

Jack Ganssle has a nice article on it.


kk6gm wrote:
This is a followup from an earlier thread where I was seeing EEPROM corruption (cured by enabling BOD). Two things were mentioned that I wanted to ask about:

1) Don't use the first EEPROM location since it is said to be more prone to corruption. True or not?


AVR-LibC FAQ #33, which I wrote:
"Why are some addresses of the EEPROM corrupted (usually address zero)?"

Quote:
In older generation AVRs the EEPROM Address Register (EEAR) is initialized to zero on reset, be it from Brown Out Detect, Watchdog or the Reset Pin. If an EEPROM write has just started at the time of the reset, the write will be completed, but now at address zero instead of the requested address. If the reset occurs later in the write process both the requested address and address zero may be corrupted.

To distinguish which AVRs may exhibit corruption of address zero when a reset occurs while a write is in progress, look at the "initial value" section for the EEPROM Address Register. If EEAR shows the initial value as 0x00 or 0x0000, then address zero, and possibly the one being written, will be corrupted. Newer parts show the initial value as "undefined"; these will not corrupt address zero during a reset (unless it was address zero that was being written).


jayjay1974 wrote:
But this was not on an AVR, but on a system using one of those fancy Simtek chips that have an EEPROM cell next to each SRAM cell that copies the contents of the SRAM to EEPROM on powerdown and vice versa, using a capacitor as energy buffer.

I've just learned that Cypress now makes these with a serial interface, too, up to 8 Mbit (1 MB). Very handy.

JW


Here is another technique to make sure that what you read back is what you saved to EEPROM:

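#include <stdint.h>
#include <avr/eeprom.h>   // EEMEM, eeprom_read_byte()

// ERROR_FLASH() is the poster's own error-indicator macro; it is not part of avr-libc.

uint8_t addcrc(uint8_t added_data, uint8_t crc);   // defined after main()

// have to be in global space
// (the initializers only reach the chip if the .eep file is programmed too)
uint8_t  EEMEM EEPROM1_check_byte = 0x55;
uint8_t  EEMEM EEPROM1_address_lo_byte = 0;
uint8_t  EEMEM EEPROM1_address_hi_byte = 0;
uint8_t  EEMEM EEPROM1_crc = 0;


int main(void)
{
   uint16_t current_EEPROM1_address = 0;
   uint16_t temp_EEPROM1_address = 0;
   uint8_t local_crc = 0;


// test to make sure the EEPROM is there and retrieve stored current_EEPROM1_address
   if((eeprom_read_byte(&EEPROM1_check_byte)) != 0x55)
   {
      ERROR_FLASH('1');   // check byte wrong: EEPROM contents not initialized
   }
   else
   {
      temp_EEPROM1_address = eeprom_read_byte(&EEPROM1_address_lo_byte);
      local_crc = addcrc((uint8_t)temp_EEPROM1_address, 0);
      current_EEPROM1_address = temp_EEPROM1_address;

      temp_EEPROM1_address = eeprom_read_byte(&EEPROM1_address_hi_byte);
      local_crc = addcrc((uint8_t)temp_EEPROM1_address, local_crc);
      current_EEPROM1_address = ((temp_EEPROM1_address << 8) & 0xFF00) + current_EEPROM1_address;

      if(local_crc != eeprom_read_byte(&EEPROM1_crc))
      {
         ERROR_FLASH('2');   // address bytes don't match their stored CRC
      }
      else
         {;} // everything is OK
   }

   for(;;)
   {
      // ... rest of the application
   }
}

and

// Feeds the bits of added_data (LSB first) into an 8-bit shift register with
// feedback taps on bits 7, 1 and 0; returns the running check value.
uint8_t addcrc(uint8_t added_data, uint8_t crc)
{
   uint8_t i = 0;


   for(i = 0; i < 8; i++)
   {
      crc = (uint8_t)((crc << 1) | (((crc >> 7) ^ (crc >> 1) ^ crc ^ added_data) & 1));
      added_data >>= 1;
   }

   return crc;
}
/* ---------- end of addcrc() ---------- */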
/* ---------- end of addcrc() ---------- */

dancanada wrote:

I thought the EEPROM was good for a million writes (ok 100,000). Does it actually get corrupted often enough to implement a design that gives you 1/3 of the room available?

No, but the EEPROM on an AVR is written in pages (typically 4 bytes long). So any write to an address within a page erases the whole page's data and causes the page to be written again. Effectively, if you write adjacent bytes, you shorten the life of the cells in the whole page. By placing your variables in separate pages you can get the full declared 100,000 writes.


TFrancuz wrote:
No, but EEPROM on AVR is written in pages (typically 4 bytes long).

Says who?

I don't doubt it, but would like to know the exact details.

JW


Hi,

See my project in the projects area:

https://www.avrfreaks.net/index.p...

Thanks,

Alan


wek wrote:
TFrancuz wrote:
No, but EEPROM on AVR is written in pages (typically 4 bytes long).

Says who?

I don't doubt it, but would like to know the exact details.

JW


From this thread, the AVRLIBC doc writers?
https://www.avrfreaks.net/index.p...

Now, see my discussion near the bottom of that thread on how this could be, given the posted erase-only and write times. Then convince me.

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


Thanks, Lee.

Well, well. Now I am left with doubts anyway. Not that I believed the Atmel datasheet too much so far.

Jan


kk6gm wrote:
2) Store every value in 3 EEPROM locations, using majority logic when reading back. How many people do this?

I do this even with BOD enabled because I have a value that is pivotal to the device's behaviour. However, it is still no guarantee because if you start off with 3 values:

1) dog
2) dog
3) dog

and you then run an update to "cat" on the 3 items. You can update 1) to "cat" successfully and then lose power while updating 2), corrupting that cell. You then have 1 dog, 1 cat and one piece of garbage. Which one is correct?
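A sketch of majority logic that compares whole copies instead of voting bit by bit, assuming the three equal-length copies have already been read into RAM (`tmr_read()` is a hypothetical name). It cannot resolve the dog/cat/garbage case either, but it detects it and reports failure instead of silently returning a mix of old and new bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Majority read over three stored copies of an n-byte value.
   Returns false when no two copies agree. */
bool tmr_read(const uint8_t *c0, const uint8_t *c1, const uint8_t *c2,
              size_t n, uint8_t *out)
{
    if (memcmp(c0, c1, n) == 0 || memcmp(c0, c2, n) == 0) {
        memcpy(out, c0, n);        /* c0 agrees with at least one other copy */
        return true;
    }
    if (memcmp(c1, c2, n) == 0) {
        memcpy(out, c1, n);        /* c0 is the odd one out */
        return true;
    }
    return false;                  /* all three differ: no majority */
}
```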

According to the Atmel datasheet, with BOD a write already in progress should still complete if there is sufficient voltage to finish it. But when running off a lithium coin cell you must be pretty conservative to guarantee this, as the voltage drops like a stone once the battery starts to go flat.


The best approach is to have an early warning of power failure (an external BOD tied to an interrupt) and then some hold-up for the AVR itself, to ensure it can complete any writes in progress.

As for strategies in EEPROM, you can take a page, or two, out of the RAID manuals.

Here's one method I've used. It wasn't on an AVR, but it could easily be applied here.

You keep 2 direct copies of the data, but arranged in an order offset from each other. A 3rd set is then simply the XOR of the two corresponding bytes. This allows you to find which data entry is incorrect, and even correct it.

Copy_A  Copy_B     XOR
 d.1     d.2    d.1 x d.2
 d.2     d.3    d.2 x d.3
 d.3     d.4    d.3 x d.4
 d.4     d.1    d.4 x d.1

You can use any arrangement for the 2nd copy; just be sure that it is in a different order than the first.
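A sketch of the rotated-copy-plus-XOR layout from the table, rotating the second copy by one position. The function names are hypothetical; it assumes at most one corrupted byte overall and at least two data bytes. Each data byte d[i] is stored twice (as copy_a[i] and copy_b[i-1]); when those two disagree, the XOR row for position i decides which one to trust.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout (n >= 2): copy_a[i] = d[i], copy_b[i] = d[(i+1) % n],
   xor_set[i] = copy_a[i] ^ copy_b[i]. */
void xr_encode(const uint8_t *d, size_t n,
               uint8_t *copy_a, uint8_t *copy_b, uint8_t *xor_set)
{
    for (size_t i = 0; i < n; i++) {
        copy_a[i] = d[i];
        copy_b[i] = d[(i + 1) % n];
        xor_set[i] = (uint8_t)(copy_a[i] ^ copy_b[i]);
    }
}

/* Rebuilds d[] and returns how many bytes had to be corrected. */
size_t xr_decode(const uint8_t *copy_a, const uint8_t *copy_b,
                 const uint8_t *xor_set, size_t n, uint8_t *d)
{
    size_t fixed = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t a = copy_a[i];
        uint8_t b = copy_b[(i + n - 1) % n];   /* the other copy of d[i] */
        if (a == b) {
            d[i] = a;
        } else {
            /* Row i (copy_a[i], copy_b[i], xor_set[i]) still checks out
               only if copy_a[i] is the good one. */
            int row_ok = (uint8_t)(copy_a[i] ^ copy_b[i]) == xor_set[i];
            d[i] = row_ok ? a : b;
            fixed++;
        }
    }
    return fixed;
}
```

A corrupted XOR byte never changes the decoded data here; it only matters as a tie-breaker when the two data copies disagree.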

Writing code is like having sex.... make one little mistake, and you're supporting it for life.


For reasons still unknown to me, one of the m164Ps I used when developing my last project suddenly refused to store data to location 0 in the chip's internal EEPROM.

The chip had been programmed (flash) a large number of times, and might even have hit the 10,000-cycle limit.
BUT...
I did not reprogram the EEPROM when programming the flash. Location 0 is read at power-up, and its value decides whether the chip acts as master or slave in the system. It is rarely written to.

It took some time to figure out why the application failed to start. It is not that obvious that the EEPROM has failed to store data, so I looked for the cause in my code, hardware connections, etc. before even starting to consider the possibility of an EEPROM failure (in location 0 only).

After this I stopped using location 0, but it's still a mystery to me how this could happen, and I still don't feel confident that this type of failure can't occur at any location.

I have used AVRs for more than a decade, and almost all of my apps store some of their data in EEPROM.
I have never seen a failure like this in a single cell before, and the thought that it might occur again is pretty scary.

Has anyone else experienced this?


wek wrote:
Thanks, Lee.

Well, well. Now I am left with doubts anyway. Not that I believed the Atmel datasheet too much so far.

Jan

You can find information about that in Atmel datasheets too.


TFrancuz wrote:
wek wrote:
Thanks, Lee.

Well, well. Now I am left with doubts anyway. Not that I believed the Atmel datasheet too much so far.

Jan

You can find informations about that in Atmel datasheets too.

Can you please point me to this information in the datasheets?

I just went through 5 different devices' datasheets and found no trace of anything similar.

Thanks,

Jan


The 4 byte paging thing only applies when using an external programmer.

Perhaps there's some logic that hides this fact when using the firmware interface.


I hope the topic fits my question, or the other way around.

I have an application for which the requirement is to save some working parameters and also error freeze-frame data to EEPROM.

The question is what strategy to use to get a good trade-off between not reaching the end of the EEPROM's life and still saving the data.

For now my approach is quite restrictive: I just use a static flag so that I save to EEPROM only once per power cycle of the uC. Theoretically this could also kill the EEPROM if the power were cycled every second, 100,000 times over about 27 hours, but realistically the time between power cycles would be much longer.

But this is not enough for the current functionality, because I might get an error and save its freeze-frame data to the EEPROM, then receive a parameter request, which is then ignored, so after a shutdown the parameters are not saved.

So I am now thinking of adding two static variables: one for the power cycle, so that I save to EEPROM every time a shutdown is performed, and a counter with the maximum number of error records that can be saved within a cycle, so that the EEPROM is not written continuously if some errors are permanently active.

But I probably also need a counter to save to EEPROM periodically, for example every hour or so. The EEPROM writing function already has a check so that it does not write to EEPROM if the data read back is the same as the data to be written. I also write to 3 redundant locations and do a majority check when reading back.
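One possible way to structure the policy described above, keeping the error-frame budget separate from parameter saving so that a parameter change after an error is not lost. All names and limits here are hypothetical; the actual write would still go through the compare-before-write routine.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERROR_FRAMES_PER_CYCLE 4u              /* hypothetical budget */
#define PERIODIC_SAVE_MS (60UL * 60UL * 1000UL)    /* "every hour or so" */

typedef struct {
    uint8_t  error_frames_saved;   /* zeroed at power-up */
    bool     params_dirty;         /* parameters changed since last save */
    uint32_t last_param_save_ms;
} save_policy_t;

/* Error freeze frames get their own per-cycle budget, so a permanently
   active error cannot keep writing the EEPROM. */
bool policy_allow_error_frame(save_policy_t *p)
{
    if (p->error_frames_saved >= MAX_ERROR_FRAMES_PER_CYCLE)
        return false;
    p->error_frames_saved++;
    return true;
}

/* Parameters are tracked separately: saved on shutdown or once per period,
   and only if they changed. Returns true when a save is due now. */
bool policy_params_due(save_policy_t *p, uint32_t now_ms, bool shutting_down)
{
    if (!p->params_dirty)
        return false;
    if (shutting_down || (uint32_t)(now_ms - p->last_param_save_ms) >= PERIODIC_SAVE_MS) {
        p->params_dirty = false;
        p->last_param_save_ms = now_ms;
        return true;
    }
    return false;
}
```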

 

What are you using for your projects?

Last Edited: Mon. Feb 24, 2020 - 01:58 PM

Remember that the 100,000 limit is not for the entire EEPROM but for every location in it. So if there are 128 bytes then, theoretically, you could write byte 0 100,000 times, then use byte 1, then byte 2, and so on, so the life of the entire device is 128 * 100,000 writes. But to do this you need to know when to move on to the next location, and that is the tricky bit. You can't just keep another counter in the EEPROM with "writes until we move on", because that counter would itself wear out the location it is held in. So maybe the best idea is to just cycle storage through part of, or all of, the device. If each write packet is, say, 10 bytes, then write the 10 bytes, but next time somehow mark those 10 as "stale" and move on to the next 10. In the read-back code, work your way through the range looking for the first entry that is not marked "stale". When you reach the end of the device (or range), simply cycle back and start to reuse the oldest "stale" entry, and so on.

There are all kinds of strategies along these lines for "wear levelling" chips that age, so you may simply want to google "wear levelling" and look at some of the suggested approaches.
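A minimal sketch of the rotating-slot idea on a simulated 128-byte EEPROM, assuming each 10-byte packet is 9 data bytes plus 1 status byte. Rather than taking the first non-stale entry, it looks for the valid slot whose successor is not valid, which also copes with a power loss between writing the new entry and retiring the old one. The array `ee`, the status codes and the `wl_*` names are hypothetical; on an AVR the accesses would go through the avr-libc `eeprom_*()` routines.

```c
#include <stdint.h>
#include <string.h>

#define EE_SIZE     128u                  /* simulated device size */
#define ENTRY_DATA  9u                    /* payload bytes per entry */
#define SLOT_SIZE   (ENTRY_DATA + 1u)     /* + status byte = 10-byte packet */
#define NSLOTS      (EE_SIZE / SLOT_SIZE) /* 12 slots; must be >= 3 */

enum { ST_EMPTY = 0xFF, ST_STALE = 0x00, ST_VALID = 0xA5 };  /* 0xFF = erased */

uint8_t ee[EE_SIZE];   /* stand-in for the real EEPROM */

static uint8_t *slot(unsigned i) { return &ee[i * SLOT_SIZE]; }

/* Newest entry: a VALID slot whose successor is not VALID.
   Returns -1 if nothing has been stored yet. */
int wl_find(void)
{
    for (unsigned i = 0; i < NSLOTS; i++)
        if (slot(i)[0] == ST_VALID && slot((i + 1) % NSLOTS)[0] != ST_VALID)
            return (int)i;
    return -1;
}

void wl_write(const uint8_t *data)
{
    int cur = wl_find();
    unsigned next = (cur < 0) ? 0u : ((unsigned)cur + 1u) % NSLOTS;
    memcpy(slot(next) + 1, data, ENTRY_DATA);
    slot(next)[0] = ST_VALID;                          /* commit the new entry first... */
    if (cur >= 0) slot((unsigned)cur)[0] = ST_STALE;   /* ...then retire the old one */
}

int wl_read(uint8_t *data)
{
    int cur = wl_find();
    if (cur < 0) return -1;
    memcpy(data, slot((unsigned)cur) + 1, ENTRY_DATA);
    return cur;
}
```

Each payload cell is rewritten only once every NSLOTS saves; the status byte takes two writes per use of its slot.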

 

Oh, and whatever EEPROM system you use, make sure your writing routine does a "read before write" check: if the value about to be written is the same as what is already there, just skip the write. It's actually the erase (all bits back to 1, so bytes return to 0xFF) that "costs". It involves an internal high-voltage charge pump that forces charge through the thin insulating oxide around a floating gate, and each time this happens the oxide degrades a little. So there's no point erasing and rewriting if what you are about to write is the same bit pattern already in that location: you would use up one of that cell's 100,000 cycles for no reason.
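A sketch of such a read-before-write routine on a simulated EEPROM that also counts cycles per cell, so the saving is visible. On an AVR, avr-libc already provides this behaviour in `eeprom_update_byte()` and `eeprom_update_block()`, so you would normally just call those; the `ee` array, `ee_cycles` and `ee_update_block()` here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define EE_SIZE 64u

uint8_t  ee[EE_SIZE];          /* simulated EEPROM contents */
uint32_t ee_cycles[EE_SIZE];   /* erase/write cycles used per cell */

/* Write only the bytes that actually change; returns how many were written. */
size_t ee_update_block(const uint8_t *src, size_t addr, size_t n)
{
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        if (ee[addr + i] != src[i]) {   /* read first... */
            ee[addr + i] = src[i];      /* ...and only then spend a cycle */
            ee_cycles[addr + i]++;
            written++;
        }
    }
    return written;
}
```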

 


Write only if not the same: already done.
The problem is the strategy for not writing when it isn't needed.
For example, the freeze-frame data is based on an ADC value when some digital input is active.
I tried to implement a simple counter when errors are present, so that only the data for the first n errors would be written, but I shot myself in the foot as