EEPROM errors


Hello all,

I'm having a problem with EE chips. After changing from an M24C08 from ST to a Microchip 24LC08B, we now see blocks of 16 bytes turning up as all FF. We're unable to recreate the issue and can't find any difference in the specs that would lead to this. Any ideas?

They are in high lightning areas but the old M24C08 never had a problem with this. Could it possibly be a batch of counterfeit parts?

Thank you for any thoughts you may have on this.


Did you change anything else, or just this IC? Anything on the power supply?

Quote:
They are in high lightning areas but the old M24C08 never had a problem with this. Could it possibly be a batch of counterfeit parts?

Did any other equipment near yours fail? Was there a storm when it happened? The idea is to find out whether anything unusual happened around your equipment...

Regards,

Bruno Muswieck


Could it be a timing issue?
Or is the WP pin floating?


Political correctness says it's wrong to blame Microchip, as it's not acceptable to say they make worse chips than anyone else. Just say it twice and watch. You could waste your time searching (maybe forever) for the real (acceptable) cause of this phenomenon.

Is it worth investigating such a volatile phenomenon when you already have the M24C08 working well? You'd better try the cheapest Chinese 24C08 batch on the market and you'll get the answer.

Dor


Thanks for all the suggestions guys. It's very much appreciated...

Unfortunately going back to the M24C08 doesn't appear to be an option.

The WP pin is not floating, and the lightning is not new and hasn't been a problem before. Thought I'd mention it in case there's a reason why it would affect one chip and not the other. The rest of the components appear to be working just fine. The only issue is the corrupt data on the EE.

Timing could very well be the problem. Will do a thorough investigation tomorrow...


I've been caught out by Microchip in the past. Although they label pins 1-3 as A0, A1, A2, they are in fact not connected (page 5 of the data sheet). On the ST part they form part of the device address. Could this be your problem?


Thank you fingar.

Had a look at the old circuit diagrams and we've never connected those pins.

Our best guess at this point is that, during a power-on reset, the buffer is getting cleared (filled with FF) but the address pointer is being left as is and a write is being triggered. Does this sound plausible to you? It turns out the Microchip part only stops working down at 1.5V (we thought it was 2.5V), which may well be the root of the problem.


Well, can you describe more about the hardware and software?

-Are there any other chips on the I2C bus?
-Do you have an I2C power-up reset sequence in the code that safely aborts bus operations without starting a write?
-What size pull-ups do you have on the bus?
-Is the bus long?
-Bus voltage?
-Bus speed?
-Do you have control over the write protect pin in software?
-Do you do acknowledge polling when writing, or do you have a delay between writes?
-Are 16-byte writes aligned to 16-byte pages?
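
The page-alignment question can be made concrete with a small helper. This is only a sketch, assuming the 16-byte page size mentioned in this thread; `ee_chunk_len` and `ee_count_page_writes` are hypothetical names, not taken from the poster's firmware. On 24xx-series parts a page write that runs past the end of a page typically wraps within that page instead of continuing into the next one, so a driver usually splits a buffer at page boundaries.

```c
#include <stddef.h>
#include <stdint.h>

#define EE_PAGE_SIZE 16u  /* page size as discussed in the thread */

/* Largest number of bytes that can be written starting at addr
 * without crossing a page boundary. */
static size_t ee_chunk_len(uint16_t addr, size_t remaining)
{
    size_t room = EE_PAGE_SIZE - (addr % EE_PAGE_SIZE);
    return remaining < room ? remaining : room;
}

/* Number of separate page writes needed for len bytes starting at addr. */
static unsigned ee_count_page_writes(uint16_t addr, size_t len)
{
    unsigned writes = 0;
    while (len > 0) {
        size_t n = ee_chunk_len(addr, len);
        addr = (uint16_t)(addr + n);
        len -= n;
        writes++;
    }
    return writes;
}
```

A 16-byte write that starts on a page boundary fits in one page write; the same write starting 10 bytes into a page has to be split into two.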


Hi Jepael,

To answer your questions:

- No, just the EE on the I2C.
- No, looking back at the code, this wasn't implemented in the problematic product. It seems quite likely that this may be our problem. Once we manage to recreate the problem, I'll be able to confirm this. This does tie in with the current theory and points to a way to solve the problem :)
- Pull-ups on the bus are 10K
- No, the bus length would be measured in mm
- The voltage is kept at 5V on the bus
- The speed should be 100 kHz
- Yes, the write protect pin is controlled in software. Does anyone know how the Microchip part handles this pin? At what point or points in the write cycle is it polled?
- During writing a delay is used (from the data sheets the delay should be sufficient for both parts)
- All 16-byte writes are page aligned. To the best of my knowledge the corruption is always one page long.


Based on that, even before you think it is a power-up reset problem, it may be one of these too:

-Write protect pin is controlled wrong, so it won't write.

-Delay between writing is too small. You should make sure the device responds to next write command properly, or you should replace the delay with polling when the device responds.

-Also if it only gets corrupted on power-up or reset, maybe you have something odd regarding the IO pin initialization? Do you make sure you initialize the IO pins to be high-impedance inputs before enabling TWI hardware?

-Oh yeah, do you use TWI hardware or do you have a software I2C implementation?
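
The suggestion to replace the fixed delay with polling could look roughly like the sketch below. It assumes a hypothetical driver hook (`ee_probe_fn`) that sends a START plus the write control byte and reports whether the EEPROM acknowledged; none of these names come from the original firmware. The fake device exists only so the loop can be exercised off-target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hook: sends START + control byte (write) and returns
 * true if the device acknowledged. Supplied by the I2C driver. */
typedef bool (*ee_probe_fn)(void *ctx, uint8_t control_byte);

/* Poll until the EEPROM acknowledges or max_tries is exhausted.
 * Returns the number of attempts used, or -1 on timeout. */
static int ee_wait_ready(ee_probe_fn probe, void *ctx,
                         uint8_t control_byte, int max_tries)
{
    for (int i = 1; i <= max_tries; i++) {
        if (probe(ctx, control_byte))
            return i;
    }
    return -1;
}

/* Fake device for demonstration: stays busy for a fixed number of probes. */
struct fake_ee { int busy_probes; };

static bool fake_probe(void *ctx, uint8_t control_byte)
{
    struct fake_ee *ee = ctx;
    (void)control_byte;
    if (ee->busy_probes > 0) {
        ee->busy_probes--;
        return false;   /* NACK: internal write cycle still running */
    }
    return true;        /* ACK: ready for the next command */
}
```

Bounding the loop with `max_tries` is a design choice in this sketch: a device that never acknowledges then shows up as an error instead of hanging the firmware.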


This particular product is using a purely software I2C implementation.

I should add, the product has been deployed in other areas with no such problem. The only environmental difference we know of is the lightning which is causing more power failures... I have considered the bus free time as a possible problem but surely this would cause the problem to show up all over the place? Same with the write protect pin?

We have also sent a part that has experienced the corruption to Microchip to see if it's just part of a bad batch.


Quote:
-Do you have a I2C power up reset sequence in the code that safely aborts bus operations without starting a write?

Quote:
- No, looking back at the code, this wasn't implemented in the problematic product. It seems quite likely that this may be our problem. Once we manage to recreate the problem, I'll be able to confirm this. This does tie in with the current theory and points to a way to solve the problem

Quote:
I should add, the product has been deployed in other areas with no such problem. The only environmental difference we know of is the lightning which is causing more power failures...

Since you say the product works in other areas with no corruption, the problem is in the power supply (for sure) and the firmware (maybe).

Does your uC have a reset source for when the power supply drops? If so, could your uC get a power supply reset in the middle of communication with the memory, after which things get lost?

How are the decoupling caps and power supply lines around the 24LC08B?
Do all the products in the lightning area have this problem?

Regards,

Bruno Muswieck


So what kind of software implementation is it? I2C pins are driven between output low and input only? Never output high?

Those devices should go up to 400kHz, so running a 100kHz bus should not be a problem, even if you violated some 100kHz bus specifications a bit. But a standard is a standard and every timing aspect should be considered to be within specifications, or things don't have to work.


About 2% of the devices are showing this problem in the affected area. A power fail routine allows for the completion of any EE writes that may be happening before doing anything else.

Thanks for the head start on this problem, you guys have all given me helpful leads with this. Still waiting to get my hands on one of the units but now, when I do, I'll have a couple of places to start poking it in earnest :)


Quote:
I2C pins are driven between output low and input only? Never output high?

I'm afraid I don't really understand. The data is just bit banged on a normal IO port onto the 5V bus through a 2k2 onto the SDA pin.

Checking the code again, I see that ACK polling is actually implemented during a write sequence.


Phil.Barlow wrote:
Quote:
I2C pins are driven between output low and input only? Never output high?

I'm afraid I don't really understand. The data is just bit banged on a normal IO port onto the 5V bus through a 2k2 onto the SDA pin.

Checking the code again, I see that ACK polling is actually implemented during a write sequence.

So your hardware is designed wrong, and maybe even your code? Have you ever read I2C specifications or seen how I2C hardware or software is done?

If you have 10k pull-ups on the bus and 2k2 in series with the AVR, your low voltage on the bus is 0.9 volts. While that is within specification, it is usually much lower so you get better noise margins. Typically you would use 0 to 100 ohms in place of the 2k2 resistors.

And the I2C bus is an open collector bus. Any device can either pull a line low or let it float via the pull-up resistors; pulling a line high is never allowed. If you have pulled data high with your AVR, fortunately your 2k2 series resistors have helped here, so no permanent damage is done: the EEPROM has seen 1k8 to 5V when pulling low, so you have been within the 3mA limit.
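
The two figures above (0.9 V and 1k8) can be checked with a few lines of arithmetic. This is a sketch under idealised assumptions: the MCU output is treated as a perfect 0 V or 5 V source, the EEPROM pulls SDA all the way to 0 V, and the 10k pull-up sits on the EEPROM side of the 2k2. The function names are made up for illustration.

```c
/* Voltage at the bus node when the MCU drives low through r_series
 * while r_pullup pulls the node to vcc (simple divider, ideal driver). */
static double bus_low_voltage(double vcc, double r_pullup, double r_series)
{
    return vcc * r_series / (r_pullup + r_series);
}

/* Parallel combination of two resistors. */
static double parallel(double r1, double r2)
{
    return (r1 * r2) / (r1 + r2);
}

/* Current the EEPROM must sink when it pulls SDA to about 0 V while the
 * MCU wrongly drives high: pull-up and series resistor both feed from vcc. */
static double sink_current(double vcc, double r_pullup, double r_series)
{
    return vcc / parallel(r_pullup, r_series);
}
```

With 5 V, 10k and 2k2 this gives roughly 0.90 V for the low level, about 1.8k for the parallel resistance, and a little under 2.8 mA of sink current, which agrees with the numbers quoted in the post.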

So in theory your hardware and code should work, but I do see some suspicious things here, so it may be that some EEPROM chips work and others just won't with your hardware and software.


Fortunately the current implementation is hardware based and much improved on the version that's giving troubles. I've inherited this problem and am not totally familiar with the design.

The data is never pulled high by the microcontroller but I see there may be situations where this could occur on the product in question. This, in my mind, would still affect either device equally... They both have identical ratings for high and low logic levels. Were the issue noise on the data line I would also expect to see more random corruption. It is always exactly one page that goes bad.


And do you have the 2k2 series resistor or not?

Another question: do you have an AC/DC power supply? Which kind, switching or linear? And what protection do you have?

Regards,

Bruno Muswieck


Quote:
The data is never pulled high by the microcontroller

That is correct, it should not be. The line should either be pulled low or left floating with an external pull-up of 5K to 10K.

You should wait a certain amount of time after writing the first bank before continuing with the second.
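
One common way to get the "pulled low or left floating" behaviour from an ordinary push-pull IO pin is to keep the output latch at 0 and switch only the pin direction. The sketch below models that with a plain struct standing in for the port registers; `struct io_port` and the `od_*` helpers are hypothetical, and on real hardware the two fields would be the MCU's own direction and output registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for one I/O port: a direction register and an output latch. */
struct io_port {
    uint8_t dir;    /* 1 = output, 0 = input (high impedance) */
    uint8_t latch;  /* output level when the pin is an output */
};

/* Actively pull the line low: clear the latch first, then enable output. */
static void od_drive_low(struct io_port *p, uint8_t bit)
{
    p->latch &= (uint8_t)~(1u << bit);
    p->dir   |= (uint8_t)(1u << bit);
}

/* Release the line: make the pin an input so the pull-up sets the level. */
static void od_release(struct io_port *p, uint8_t bit)
{
    p->dir   &= (uint8_t)~(1u << bit);
    p->latch &= (uint8_t)~(1u << bit);  /* keep latch low; never drive high */
}

/* True if the pin is configured as an output driving a high level. */
static bool od_drives_high(const struct io_port *p, uint8_t bit)
{
    uint8_t m = (uint8_t)(1u << bit);
    return (p->dir & m) && (p->latch & m);
}
```

Clearing the latch before enabling the output means the pin never passes through an "output high" state on its way to driving low.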


Can you post the relevant bit-banging code, and your schematic near SDA, SCL, WP?

I can see no reason for any 24Cxx chip to behave differently from different manufacturers. Providing of course that you are driving it properly.

It is seriously bad news to change chips in the field. But upgrading the firmware (or exchanging boards) is not too much trouble.

Let's face it. You know that 2% have failed. Identical firmware / hardware will also fail in time. You have to resolve it before your customer(s) get upset.

I am horrified by a 2k2 resistor. Is it in series with the SDA line?

David.


Good morning,

The 2k2 is in series on the SDA line with a 10k pull up to a 5V rail in between the 2k2 and the IO pin. Otherwise the circuit is stock standard. WP connected to a 5V line through another 10k, controlled by another IO. Clock connected to an IO with a 10k to ground.

I'll try and have some code posted here soon, I don't have access to it at the moment.

PSU is a rather involved linear regulator. There is more than enough protection on the supply for the conditions. The supply itself is showing no signs of damage in any way.


Phil.Barlow wrote:
Good morning,

The 2k2 is in series on the SDA line with a 10k pull up to a 5V rail in between the 2k2 and the IO pin. Otherwise the circuit is stock standard. WP connected to a 5V line through another 10k, controlled by another IO. Clock connected to an IO with a 10k to ground.

Putting a 10k to ground is NOT "stock standard" in any way. Please google and read "i2c specification", here is the latest:
http://www.nxp.com/documents/user_manual/UM10204.pdf

So again, based on that new information: with a resistor on SCL to GROUND, your SCL line cannot be driven in open-collector mode, so you must bit-bang the line as output high and output low. While that is yet again well outside the specifications, in this case it does not matter much, since the EEPROM is the only device on the bus and the EEPROM never tries to pull SCL low to slow down communications.

But what this means is that always when your AVR goes into reset, the SCL line is pulled low by a resistor and SDA line is pulled high by a resistor.

At least this means that whenever the AVR goes into reset, your EEPROM might see a) high SDA with falling SCL (not dangerous), b) rising SDA with low SCL (not dangerous), or c) rising SDA with falling SCL (such operation is not defined and should not happen or devices go crazy).

Also, when the AVR comes out of reset and configures the IO pins, it matters very much in which order you initialize the pin state and direction, and in which order you initialize the SDA and SCL pins. You may accidentally send a stop condition, and the EEPROM may trigger a write. It seems this is what you are experiencing here.
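
To see why pin initialisation order matters, it helps to spell out how the bus interprets line changes: an SDA edge while SCL stays high is a START (falling) or STOP (rising), while SDA changes with SCL low are ordinary data setup. The classifier below is an illustrative sketch with made-up names; it treats simultaneous edges on both lines as undefined, matching case (c) above.

```c
#include <stdbool.h>

enum i2c_event { I2C_NONE, I2C_START, I2C_STOP, I2C_UNDEFINED };

/* Logic levels on the two bus lines at one instant. */
struct lines { bool scl; bool sda; };

/* Classify what a bus device would see between two line states. */
static enum i2c_event classify(struct lines before, struct lines after)
{
    bool scl_changed = before.scl != after.scl;
    bool sda_changed = before.sda != after.sda;

    if (scl_changed && sda_changed)
        return I2C_UNDEFINED;          /* both edges at once: case (c) */
    if (sda_changed && before.scl)     /* SDA edge while SCL stays high */
        return after.sda ? I2C_STOP : I2C_START;
    return I2C_NONE;                   /* clock edge, or data change with SCL low */
}
```

If start-up code releases SDA while SCL happens to be high and SDA was low, this model reports a STOP, which is the accidental condition described in the post.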

I have to agree with what was said before: the chips are identical in specs, but you are not exactly using them by the specs or how they should be used, so they don't have to work identically in weird conditions.


Quote:
PSU is a rather involved linear regulator. There is more than enough protection on the supply for the conditions. The supply itself is showing no signs of damage in any way.

No sign of damage doesn't mean that no surge reached your board.

I see here that a combination of things is creating your problem.

Regards,

Bruno Muswieck


Thank you for all the help everyone. It would seem it is indeed a combination of things causing the problem but that we could've avoided it by closer adherence to the I2C spec (thanks for the link).

We're suggesting that a new start-up routine will probably fix the issue (an application note from Microchip hints that this is a problem they may actually know about).
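
For readers wondering what such a start-up routine might look like, here is a rough sketch of one widely described I2C recovery sequence: START, nine clocks with SDA released, another START, then STOP. Whether this matches the application note mentioned above is an assumption. The pin hooks, struct names, and the logging fake bus are all hypothetical, and a real port would add the delays required by the bus speed.

```c
#include <stdbool.h>

/* Hypothetical pin hooks: true = release the line, false = pull it low. */
struct i2c_pins {
    void (*scl)(void *ctx, bool high);
    void (*sda)(void *ctx, bool high);
    void *ctx;
};

static void i2c_start(const struct i2c_pins *p)
{
    p->scl(p->ctx, false);   /* change SDA only while SCL is low */
    p->sda(p->ctx, true);
    p->scl(p->ctx, true);
    p->sda(p->ctx, false);   /* SDA falls while SCL is high: START */
    p->scl(p->ctx, false);
}

static void i2c_stop(const struct i2c_pins *p)
{
    p->scl(p->ctx, false);
    p->sda(p->ctx, false);
    p->scl(p->ctx, true);
    p->sda(p->ctx, true);    /* SDA rises while SCL is high: STOP */
}

/* START, nine clocks with SDA released, START, STOP. */
static void i2c_bus_reset(const struct i2c_pins *p)
{
    i2c_start(p);
    p->sda(p->ctx, true);    /* release SDA so each clocked bit reads as '1' */
    for (int i = 0; i < 9; i++) {
        p->scl(p->ctx, true);
        p->scl(p->ctx, false);
    }
    i2c_start(p);
    i2c_stop(p);
}

/* Fake bus for demonstration: tracks levels and counts bus events. */
struct bus_log { bool scl, sda; int starts, stops, clocks; };

static void log_scl(void *ctx, bool high)
{
    struct bus_log *b = ctx;
    if (high && !b->scl) b->clocks++;              /* rising SCL edge */
    b->scl = high;
}

static void log_sda(void *ctx, bool high)
{
    struct bus_log *b = ctx;
    if (b->scl && b->sda && !high) b->starts++;    /* START condition */
    if (b->scl && !b->sda && high) b->stops++;     /* STOP condition */
    b->sda = high;
}
```

Pulling SCL low before touching SDA is a deliberate choice in this sketch: it keeps the routine itself from producing a stray STOP, whatever state the lines were left in by the reset.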

We've still not recreated the problem so I can't yet confirm any of your suggestions but if we ever do I'll be sure to let you know.

Thank you again for all the help.

Phil