EEPROM checksum


Hi.

I'm programming an ATmega644P, as a part of a set-top box main board. After some problems with EEPROM corruption, I want to create an EEPROM checksum to be stored in the EEPROM when writing a new value to the EEPROM (that's a lot of EEPROM's in one sentence :)). And I'm pondering on how to achieve this in an easy way.

After some searching, I've opted for going for a 16-bit Fletcher's checksum. But the question is how I am to generate this.
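For readers unfamiliar with it, a 16-bit Fletcher checksum keeps two running sums, each reduced modulo 255. The following is a minimal portable C sketch, not code from the attached eeprom.c; the function name `fletcher16` is chosen here for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Fletcher-16: sum1 is the running byte sum, sum2 is the running sum of
 * sum1, both modulo 255. The result packs sum2 into the high byte. */
uint16_t fletcher16(const uint8_t *data, size_t len)
{
    uint16_t sum1 = 0;
    uint16_t sum2 = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        sum1 = (uint16_t)((sum1 + data[i]) % 255u);
        sum2 = (uint16_t)((sum2 + sum1) % 255u);
    }
    return (uint16_t)((sum2 << 8) | sum1);
}
```

One property worth keeping in mind: because the sums are taken modulo 255, a byte of 0xFF contributes the same as a byte of 0x00, so this checksum cannot tell an erased byte from a zero byte.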

Presently we use the ATmega's EEPROM to store various settings, and we've mapped a list of variables and their locations in the EEPROM. As a result, the stored values are not in one contiguous part of the EEPROM.

I've thought about just creating a RAM stored array the same size as the total amount of EEPROM stored values, and just copying the values from the EEPROM to the RAM, one by one. And finally, calculating the checksum and storing it in the EEPROM.
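A possible variation on this idea avoids the RAM copy altogether: walk a table of (address, length) pairs and feed each EEPROM byte straight into the running checksum. The sketch below is an assumption-laden illustration in portable C. The array `fake_eeprom`, the helper `eeprom_read`, and the addresses in `ee_map` are all hypothetical stand-ins; on the real target the read would go through whatever EEPROM access the compiler provides.

```c
#include <stddef.h>
#include <stdint.h>

#define EEPROM_SIZE 64u

/* Stand-in for the real EEPROM so the sketch runs on a PC (assumption). */
uint8_t fake_eeprom[EEPROM_SIZE];

/* Hypothetical byte read; replace with the target's EEPROM access. */
uint8_t eeprom_read(uint16_t addr)
{
    return fake_eeprom[addr];
}

/* Hypothetical map of the scattered settings: start address and length. */
struct ee_field {
    uint16_t addr;
    uint8_t  len;
};

const struct ee_field ee_map[] = {
    { 0x02, 1 },
    { 0x10, 2 },
    { 0x20, 4 },
};

/* Fletcher-16 over the mapped fields only, in table order. */
uint16_t settings_checksum(void)
{
    uint16_t sum1 = 0;
    uint16_t sum2 = 0;
    size_t f;
    uint8_t i;

    for (f = 0; f < sizeof ee_map / sizeof ee_map[0]; f++) {
        for (i = 0; i < ee_map[f].len; i++) {
            uint8_t b = eeprom_read((uint16_t)(ee_map[f].addr + i));
            sum1 = (uint16_t)((sum1 + b) % 255u);
            sum2 = (uint16_t)((sum2 + sum1) % 255u);
        }
    }
    return (uint16_t)((sum2 << 8) | sum1);
}
```

Bytes outside the table do not affect the result, so unrelated EEPROM contents can change without invalidating the stored checksum.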

Alternatively, it's possible to just create a checksum based on the whole EEPROM, from start to finish. But I'm not sure how to do this in CodeVisionAVR, as the EEPROM is accessed through global variables that are placed at fixed EEPROM addresses.

Has anyone done this before, and do you have some thoughts on how this is best implemented? I'm thankful for any and all feedback. :-)

Thanks!

EDIT: Added relevant code: eeprom.c


You didn't say but is that CodeVision?

Anyway, if you want to read all the variables to RAM in one go then group them all in a single struct. Your compiler should then support:

ram_struct = eeprom_struct;

to invoke a complete block copy from EEPROM to RAM.
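As a rough illustration of how the single-struct approach combines with a checksum, the sketch below puts the checksum in the last field of the struct and computes it over every byte before that field. It is portable C for a PC, so `memcpy` stands in for the CodeVisionAVR struct assignment above, and the field names and the helpers `settings_seal` and `settings_valid` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical settings block; the checksum must stay the last field. */
struct settings {
    uint8_t  volume;
    uint8_t  brightness;
    char     name[14];
    uint16_t checksum;   /* Fletcher-16 over all bytes before this field */
};

uint16_t fletcher16(const uint8_t *data, size_t len)
{
    uint16_t sum1 = 0;
    uint16_t sum2 = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        sum1 = (uint16_t)((sum1 + data[i]) % 255u);
        sum2 = (uint16_t)((sum2 + sum1) % 255u);
    }
    return (uint16_t)((sum2 << 8) | sum1);
}

/* Recompute and store the checksum; call before writing the block out. */
void settings_seal(struct settings *s)
{
    s->checksum = fletcher16((const uint8_t *)s,
                             offsetof(struct settings, checksum));
}

/* Returns nonzero if the stored checksum matches the payload bytes. */
int settings_valid(const struct settings *s)
{
    return s->checksum == fletcher16((const uint8_t *)s,
                                     offsetof(struct settings, checksum));
}
```

A typical flow would be: copy the EEPROM struct to RAM at boot, call `settings_valid`, and fall back to defaults if it fails. On a compiler that pads structs, zero the struct before filling it so the padding bytes are deterministic.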


I'm sorry. I mentioned CodeVisionAVR, but I see that it's a little vague. Yes, I'm using CodeVisionAVR. :-)

But do you know if it's possible to access areas of the EEPROM that aren't defined by a global variable?


Quote:

After some problems with EEPROM corruption,

While copies of configuration setups and integrity checks and repair options ("restore factory defaults") are indeed useful and wise in many situations, I'd try to find out the cause of this "corruption".

AVR EEPROM doesn't "corrupt" by itself. There, I've said it. Now, there are a number of ways it can get corrupted. The most obvious is not having a brown-out detector (internal or external). Weird things happen when operating below minimum Vcc levels e.g. ~1V. That usually takes care of things in most apps.

Severe noise can cause AVRs to do all kinds of weird things. Not just EEPROM, but indeed a rogue EEPROM write can happen. So cure the noise. Note that noise spikes severe enough to cause problems inside the AVR will WAY violate Absolute Maximum Ratings. So if you indeed have that noisy an environment then the effort of your checksum/repair is largely putting lipstick on a pig.

Another cause of apparent corruption is a write sequence, of say a multi-byte variable or set of related items, doesn't complete before e.g. power is lost. The approach there is to detect power loss early, and suspend operations and wait to die.

And of course, there can be a rogue pointer in your code...

Lee

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


clawson wrote:
Anyway, if you want to read all the variables to RAM in one go then group them all in a single struct. Your compiler should then support:

ram_struct = eeprom_struct;

to invoke a complete block copy from EEPROM to RAM.

That's not a bad idea, really. But is this possible to achieve with EEPROM variables that are spread out and not in one area of the EEPROM?

The following will work, but this will of course place the variables in one continuous area:

struct eeprom_structure {
    char a;
    int  b;
    char c[15];
};

eeprom struct eeprom_structure test_structure@0x90;

theusch wrote:
While copies of configuration setups and integrity checks and repair options ("restore factory defaults") are indeed useful and wise in many situations, I'd try to find out the cause of this "corruption".

Not a bad idea. I've added BOD, and it will be interesting to see if this has any effect. Of course, now I wonder why this is not enabled by default? Why would you *not* want to have protection against undervoltage?

theusch wrote:
Severe noise can cause AVRs to do all kinds of weird things. Not just EEPROM, but indeed a rogue EEPROM write can happen. So cure the noise. Note that noise spikes severe enough to cause problems inside the AVR will WAY violate Absolute Maximum Ratings. So if you indeed have that noisy an environment then the effort of your checksum/repair is largely putting lipstick on a pig.

True (and I love your analogy :)). But in this case, voltage spikes are not an issue. The setup here is a set-top box main board that has a main CPU running Linux firmware, and an ATmega644P microcontroller connected to the CPU through both USART0 and the SPI controller. The problem here happens when a main CPU crashes. For some reason, we see that when the main CPU has crashed, the microcontroller connected to it in some cases gets its EEPROM scrambled. Seeing as this happens in the microcontroller when an external component malfunctions, I get the feeling that it's not the code in the microcontroller that causes it.

Then again, it could be a compiler error... :)

theusch wrote:
And of course, there can be a rogue pointer in your code...

Pardon the n00b question, but could you elaborate on that one? Is this an issue when using a C compiler, and not coding ASM?

Thanks!


Quote:

Pardon the n00b question, but could you elaborate on that one?

Quote:

when a main CPU crashes.

An AVR doesn't "crash". If your program is running amok, it could be doing anything, including "corrupting" -- writing unexpected values -- to EEPROM.



When you say the EEPROM is corrupted, do you mean you are reading a random value, or do you read back all ones? In the latter case I would try setting the "preserve eeprom" fuse. I have also seen corruption where I could not read the EEPROM right after a reset of the processor. However, if I then power cycled the CPU again it was readable. Perhaps a noisy power supply during reset was locking up the EEPROM read circuitry of the CPU? Also try setting your clock fuse to use the longest start-up delay.


theusch wrote:

Quote:

when a main CPU crashes.

An AVR doesn't "crash". If your program is running amok, it could be doing anything, including "corrupting" -- writing unexpected values -- to EEPROM.

I apologize if I miswrote. It's not the AVR that "crashes". The CPU connected to the AVR sometimes crashes when the developers break out of the application. In this state, I don't know what the CPU is doing. But I imagine that it *could* be locked in a state where it sends random data to the AVR on either the SPI bus or the USART0. And initially, considering that the CPU is running at a frequency more than 10 times higher than the AVR, I was wondering if this could "saturate" the AVR in such a way that the EEPROM could get corrupted.


kscharf wrote:
When you say the EEPROM is corrupted, do you mean you are reading a random value, or do you read back all ones? In the latter case I would try setting the "preserve eeprom" fuse. I have also seen corruption where I could not read the EEPROM right after a reset of the processor. However, if I then power cycled the CPU again it was readable. Perhaps a noisy power supply during reset was locking up the EEPROM read circuitry of the CPU? Also try setting your clock fuse to use the longest start-up delay.

No, it's not the "preserve eeprom" fuse bit: we actively enable it when flashing the AVR, and the corrupted values are not necessarily 0xff.

But we're doing some EEPROM "checkups" during the AVR boot, so I guess something *could* happen then.

The clock fuse could be a good idea. I'll look into it.


what about #include

It's got 4 implementations of CRC routines including their equivalent C.

I use it in my bootloader to ensure that the downloaded program is complete! It's saved me a few times!

I recommend you use Xmodem CRC16 since srec also can append CRC16's to files.

srec_cat .eep -binary -l-e-crc16 0x -xmodem output.eep -binary

[http://blog.schicks.net/wp-conte...

Check out the link it has a few words about loading files and verifying them.
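For reference, the XMODEM CRC-16 recommended above (polynomial 0x1021, initial value 0, most significant bit first) can be sketched in plain C as below. The header name is missing from the post; if it was avr-libc's `<util/crc16.h>`, that header provides `_crc_xmodem_update(crc, data)` with the same per-byte shape. The names `crc16_xmodem_update` and `crc16_xmodem` are chosen here for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold one byte into a running XMODEM CRC-16 (poly 0x1021, MSB first). */
uint16_t crc16_xmodem_update(uint16_t crc, uint8_t data)
{
    uint8_t bit;

    crc ^= (uint16_t)((uint16_t)data << 8);
    for (bit = 0; bit < 8; bit++) {
        if (crc & 0x8000u)
            crc = (uint16_t)((crc << 1) ^ 0x1021u);
        else
            crc = (uint16_t)(crc << 1);
    }
    return crc;
}

/* CRC over a whole buffer, starting from the XMODEM initial value 0. */
uint16_t crc16_xmodem(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    size_t i;

    for (i = 0; i < len; i++)
        crc = crc16_xmodem_update(crc, data[i]);
    return crc;
}
```

The per-byte update form suits EEPROM use, since bytes can be fed in one at a time as they are read, without a RAM buffer.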


Quote:
we've mapped a list of variables and their locations in the EEPROM

I do not know IAR. Does it always put globals in SRAM? Perhaps you can make a contiguous section/structure holding the variables (SRAM), their mirrors (EEPROM) and their default values (flash), as suggested, so that the addresses correspond in some way? A single loop could then easily check the data and overwrite it with defaults if it is erroneous.
With a CRC there is no way to find out which location's write was interrupted or damaged (the whole area must be overwritten with defaults). What about keeping EEPROM writes atomic? You need one extra location for a backup_value (union) and one for a backup_pointer (void*). Both EEPROM byte erase and EEPROM byte write are atomic if you keep the supply within specification (U > 1.8 V) for 1.8 ms after power-down, so it is pretty easy to design the circuit to guarantee that under any possible condition.

        if (eeprom_tv_start_volume > 50)
            eeprom_tv_start_volume = DEFAULT_TV_START_VOLUME;

Mind that this will not work on signed variables: an erased byte (0xFF) reads as -1, which is not greater than 50, so the check never fires. On unsigned variables it is fine.
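The backup_value / backup_pointer idea described above might look roughly like the following sketch. It is written in portable C against a simulated EEPROM array so it can run on a PC. The layout, the names (`ee`, `EE_JOURNAL_PTR`, `EE_JOURNAL_VAL`, `ee_write16_journaled`, `ee_recover`) and the choice of 16-bit values are all assumptions for illustration, and it relies on the poster's premise that each single-byte write completes atomically.

```c
#include <stdint.h>

#define EE_SIZE         64u
#define EE_JOURNAL_PTR  61u    /* hypothetical: address of the write in progress */
#define EE_JOURNAL_VAL  62u    /* hypothetical: two bytes holding the old value */
#define EE_JOURNAL_IDLE 0xFFu  /* erased state means "no write in progress" */

/* Stand-in for the real EEPROM (assumption); data lives below address 61. */
uint8_t ee[EE_SIZE];

/* Write a 16-bit value so that an interrupted write can be rolled back. */
void ee_write16_journaled(uint8_t addr, uint16_t value)
{
    ee[EE_JOURNAL_VAL]     = ee[addr];          /* 1. save the old value   */
    ee[EE_JOURNAL_VAL + 1] = ee[addr + 1];
    ee[EE_JOURNAL_PTR]     = addr;              /* 2. arm the journal      */
    ee[addr]     = (uint8_t)(value & 0xFFu);    /* 3. the risky part       */
    ee[addr + 1] = (uint8_t)(value >> 8);
    ee[EE_JOURNAL_PTR]     = EE_JOURNAL_IDLE;   /* 4. disarm the journal   */
}

/* Call once at boot: undo a write that was cut off between steps 2 and 4. */
void ee_recover(void)
{
    uint8_t addr = ee[EE_JOURNAL_PTR];

    if (addr != EE_JOURNAL_IDLE && addr + 1u < EE_JOURNAL_PTR) {
        ee[addr]     = ee[EE_JOURNAL_VAL];
        ee[addr + 1] = ee[EE_JOURNAL_VAL + 1];
    }
    ee[EE_JOURNAL_PTR] = EE_JOURNAL_IDLE;
}
```

The ordering is the point of the design: the pointer is only written after the old value is safely stored, so at boot a non-idle pointer always refers to a valid backup. The price is extra write cycles on the journal cells for every update.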

No RSTDISBL, no fun!


Hi,

You might check out my project here:

https://www.avrfreaks.net/index.p...

It is for gcc, though the concepts may help you, or it could be adapted.

Good luck,

Alan


Hi!

Bassmus wrote:

ATmega644P microcontroller connected to the CPU through both USART0 and the SPI controller. The problem here happens when a main CPU crashes. For some reason, we see that when the main CPU has crashed, the microcontroller connected to it in some cases gets its EEPROM scrambled.

You should protect the telegrams between the CPU and the ATmega. After a CPU crash the mega will not receive valid telegrams.

I use a CRC (16 or 8 bit) on every connection I control myself.
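A minimal sketch of what such telegram protection could look like, assuming a hypothetical frame layout of payload bytes followed by one CRC-8 byte (polynomial 0x07, initial value 0). The thread does not specify the actual protocol on the SPI or USART0 link, so the layout and the names `crc8` and `telegram_valid` are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-8 with polynomial 0x07, initial value 0, MSB first. */
uint8_t crc8(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    size_t i;
    uint8_t bit;

    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (bit = 0; bit < 8; bit++) {
            if (crc & 0x80u)
                crc = (uint8_t)((crc << 1) ^ 0x07u);
            else
                crc = (uint8_t)(crc << 1);
        }
    }
    return crc;
}

/* Hypothetical frame: payload bytes followed by one CRC byte.
 * Returns nonzero only if the trailing CRC matches the payload. */
int telegram_valid(const uint8_t *frame, size_t len)
{
    if (len < 2)
        return 0;   /* need at least one payload byte plus the CRC */
    return crc8(frame, len - 1) == frame[len - 1];
}
```

The receiver would simply discard any frame for which `telegram_valid` returns 0, so garbage sent by a crashed CPU is not acted on and never reaches the code that writes EEPROM.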

Ilya