Is FatFS file system robust?


Is the FatFS file system robust or is it my SD cards?

 

I'm using the FatFs file system on an ATmega2560 to write to an SD card via an SPI bus. fopen, fclose, and fwrite often return errors and I need to repeat the calls to get them to work, and even after 50 tries I occasionally get failures. It happens on some SD cards a lot more than others, so perhaps my SD cards have been damaged (I've got more on order).

Occasionally I can't initialize the card, and after 50 tries, I put it in a wait loop and use the WDT to restart.  Just ran two systems overnight, one was fine, the other failed and had corrupted files on the SD card.
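
The retry-then-restart policy described above could be sketched roughly like this. It is only an illustration: `retry_op`, `op_fn` and `flaky_op` are hypothetical names, not part of FatFs, and the cap of 50 simply echoes the figure in the post.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operation type: returns true on success. In a real system it
   would wrap a card-init or file call; this is NOT a FatFs signature. */
typedef bool (*op_fn)(void *ctx);

/* Try op up to max_tries times. Returns the number of attempts used on
   success, or 0 if every attempt failed (the caller can then stop kicking
   the watchdog and let it reset the system). */
unsigned retry_op(op_fn op, void *ctx, unsigned max_tries)
{
    assert(op != 0 && max_tries > 0);
    for (unsigned attempt = 1; attempt <= max_tries; ++attempt) {
        if (op(ctx)) {
            return attempt;
        }
    }
    return 0;
}

/* Stub for experimenting on a PC: fails until the counter reaches zero. */
bool flaky_op(void *ctx)
{
    unsigned *failures_left = ctx;
    if (*failures_left > 0) {
        --*failures_left;
        return false;
    }
    return true;
}
```

Logging the returned attempt count per call is a cheap way to see whether failures cluster on particular cards or particular times.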

 

I also had to be very careful with interrupts during the SD writes.  

 

At the high level the code runs on a one-second loop (taking data, updating control loops, etc.).

I've got a clock chip generating a 1024 Hz interrupt on the INT7 pin, which controls the timing.

I've got Timer2 generating an interrupt at 115.2 Hz (which just samples an input pin).

There are also 200 bytes coming in on UART1 at 115.2 kbaud once a second, but that is on a different clock that drifts relative to the 1024 Hz.

 

When interrupts occurred during SD writes, the SPIF flag would get cleared, so I check on entering and leaving each interrupt whether the SPIF bit is set, and I changed the wait_for_spif() function to:

 

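/* spif_seen_in_interrupt is set by the ISRs when they find SPIF set;
   it must be declared volatile so this loop re-reads it. */
static void wait_for_spif(void)
{
    /* Wait until the hardware flag is set or an ISR reports having seen it. */
    while (!(bit_is_set(SPSR, SPIF) || spif_seen_in_interrupt)) {
    }
    spif_seen_in_interrupt = 0;
}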

 

Does all of this sound reasonable? 

I am hoping new SD cards will be more robust.

This topic has a solution.
Last Edited: Mon. May 18, 2020 - 10:58 PM

There are two aspects to this question:

 

1) FatFs, properly implemented with correct electronics, is very reliable and generally does what it promises to do without error.

 

2) But FAT12/FAT16/FAT32 is inherently NOT a robust filing system. If there are interruptions/power fails part way through an activity then the filing system can be left in a "bad way". This is not a new phenomenon. If you cast your mind back to the mid-80s, when people were running the text-based Microsoft Word or Lotus 1-2-3 on an IBM PC with two floppy drives, there was a tendency for people to finish their work each day, write the document/spreadsheet they had been working on to floppy disk and then either pop out the disks to lock them in a filing cabinet or simply turn the PC off at the wall (this predates things like sleeping laptops). If they popped the disks or killed the power at the wrong moment (often!) then they might catch things in mid-write. The FAT filing system would not finish all the activity and the files were left in a "broken" state.

Users of that era may well remember "chkdsk.exe" and later "scandisk.exe", DOS utilities that would try to recover things. Those tools would often tell you about things like "orphaned chain found". That's because the FAT had "used" entries that appeared to give a linked list of clusters that were supposed to make up a file, but there was no matching directory entry as an "anchor point" to start the FAT chain.

FAT32 on SD cards is just as susceptible to this 35-40 years later as it was back then. Nothing "clever" happened to FAT over all these decades that suddenly turned it from a vulnerable into a robust filing system. It's as vulnerable today as it ever was, even if the silicon storage media is a bit more robust than the previous magnetic media was.
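
To make the "orphaned chain" idea concrete, here is a minimal sketch on a toy FAT held in an array. The layout is simplified and the names (`count_orphaned_clusters`, `FAT_EOC`, `MAX_CLUSTERS`) are made up for illustration; it is not how chkdsk or FatFs is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAT_FREE     0u       /* cluster not in use */
#define FAT_EOC      0xFFFFu  /* end-of-chain marker (toy value) */
#define MAX_CLUSTERS 64u

/* Count clusters that the FAT marks as used but that no directory entry's
   chain reaches, i.e. the lost clusters a checker would report.
   fat[i] holds the cluster that follows i, FAT_EOC, or FAT_FREE.
   dir_starts[] holds the first cluster of each file in the directory. */
size_t count_orphaned_clusters(const unsigned fat[], size_t n_clusters,
                               const unsigned dir_starts[], size_t n_files)
{
    bool reachable[MAX_CLUSTERS] = { false };
    assert(n_clusters <= MAX_CLUSTERS);

    for (size_t f = 0; f < n_files; ++f) {
        unsigned c = dir_starts[f];
        /* Follow the linked list; stop on end marker, bad index, or loop. */
        while (c >= 2 && c < n_clusters && !reachable[c]) {
            reachable[c] = true;
            c = fat[c];
        }
    }

    size_t orphaned = 0;
    for (size_t c = 2; c < n_clusters; ++c) {  /* entries 0 and 1 are reserved */
        if (fat[c] != FAT_FREE && !reachable[c]) {
            ++orphaned;
        }
    }
    return orphaned;
}
```

A write interrupted after the FAT was updated but before the directory entry was written leaves exactly this state: a used chain with nothing pointing at its first cluster.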


clawson wrote:
Users of that era may well remember... 

Norton Utilities !

 


Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...

Yup, but Norton was only fractionally better than chkdsk. The latter just found the orphaned chains and made new FILE0000.CHK, FILE0001.CHK and so on out of them. Norton was a bit smarter in trying to connect the dots between directory entries that had invalid chains and orphaned chains with missing directory entries.

 

BTW the reason I know all about this (apart from my usual interest in FAT format anyway) is that I worked on a project that involved a whole bunch of "electronic advertising hoardings" that would show fixed graphics and Flash .swf files to "advertise stuff", and they were dotted around in WH Smiths, BP filling stations, shopping malls and other places. In all of those, the habit at night when the places were "shutting up shop" was just to go round and switch off anything electrical at the wall. Unfortunately for us this meant Compact Flash based Linux machines that had "VFAT" on the card, and once in a while (which became significant in a large population!) the filing system would corrupt to the point of non-recoverability because some FAT operation was caught in "mid action".

 

We then explored using "journalling filing systems" in which file changes are all "atomic". Each filing operation either works completely or doesn't work at all, and you can "replay journal entries" to get things back when things go wrong.
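
As a rough picture of what "replaying journal entries" means, here is a toy write-ahead journal in C. The record layout and the names (`journal_rec`, `journal_replay`) are invented for this sketch and do not correspond to any particular filing system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum rec_kind { REC_WRITE, REC_COMMIT };

struct journal_rec {
    enum rec_kind kind;
    unsigned txn;    /* transaction this record belongs to */
    size_t   block;  /* REC_WRITE only: which block to change */
    int      value;  /* REC_WRITE only: new contents */
};

/* True if a COMMIT record for txn made it into the journal. */
static bool txn_committed(const struct journal_rec *j, size_t n, unsigned txn)
{
    for (size_t i = 0; i < n; ++i) {
        if (j[i].kind == REC_COMMIT && j[i].txn == txn) {
            return true;
        }
    }
    return false;
}

/* Replay after a crash: apply writes only from transactions whose COMMIT
   record reached the journal; half-finished transactions are dropped.
   Returns the number of writes applied. */
size_t journal_replay(const struct journal_rec *j, size_t n,
                      int *blocks, size_t n_blocks)
{
    size_t applied = 0;
    for (size_t i = 0; i < n; ++i) {
        if (j[i].kind != REC_WRITE || !txn_committed(j, n, j[i].txn)) {
            continue;
        }
        assert(j[i].block < n_blocks);
        blocks[j[i].block] = j[i].value;
        ++applied;
    }
    return applied;
}
```

The point is that a power cut between a write record and its commit record leaves the on-disk blocks untouched, which is the all-or-nothing behaviour FAT lacks.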

 

Bottom line: don't trust FAT if it's not being used in a "stable" operating environment.

 

(Oh and don't talk to me about Sky+ HD TV recording boxes. In those we used a kind of "dual FAT" system with some small clusters (data) and some very large clusters (MPEG2 recorded TV). As with the advertising hoardings, people have a very nasty habit of turning the things off at the most inappropriate times. It especially doesn't help that a Sky+ is actually recording TV all the time that it is not in Standby, so it's almost inevitable that it is "mid write" when the user switches it off. Oh and don't talk about thunderstorms and power cuts - those were always interesting times, trying to do a forensic analysis of all the damage that had occurred across the UK and Europe!)


glewis wrote:

Is the FatFS file system robust or is it my SD cards?

Neither - It's probably your power supply:

  • SD cards can go from a modest 1 mA to over 250 mA within a few µs.
  • The current profile is very peaky as erases & writes take place, you need very good de-coupling near the card socket.
  • The spec. for voltage levels on the I/O lines is important. Typically VIL (input low voltage) can be quite strict.
  • Some cards are much more susceptible to bad supplies than others.
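
For a feel of why de-coupling matters with current steps like these, a back-of-envelope estimate is dV = I * dt / C for a capacitor that has to supply the step on its own. The sketch below just encodes that formula; the helper name `droop_mv` and the example values (5 µs, 10 µF, 100 µF) are assumptions chosen for illustration, not measurements from this thread.

```c
#include <assert.h>

/* Rough droop on a decoupling capacitor that must supply a current step by
   itself: dV = I * dt / C. With mA, us and uF the result comes out in mV.
   Ignores capacitor ESR/ESL and any help from the regulator. */
unsigned droop_mv(unsigned step_ma, unsigned duration_us, unsigned cap_uf)
{
    assert(cap_uf > 0);
    return (step_ma * duration_us) / cap_uf;
}
```

With those example numbers, a 250 mA step lasting 5 µs pulls roughly 125 mV out of 10 µF but only about 12 mV out of 100 µF, before the regulator has had time to react.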

 

 


Looking on a scope there is +/- ~0.3V peak noise on the power to the SD card (i.e. looking at the 3.3V SD card power, the scope does not trigger unless the trigger level is between 3.05 and 3.54V). RMS is a small fraction of that.

Not sure if that qualifies as particularly noisy. I'll add some capacitors and see what happens.

 

I started with a Transcend TS4GSDHC4 4GB SD card. This card is approved by NASA for use on the ISS (not a particularly good electrical environment) so it should be good. Yesterday I bought several SanDisk Ultra Plus 32 GB cards. I ran three units overnight with the new cards, with a new file every hour. Most hours the data was complete, but each unit ended up with corrupted files (garbage filenames and ridiculously large files).

 

BTW, at the beginning of each write, I set an ADC interrupt to continually monitor the 12V power supply voltage. If it drops a few tenths of a volt, I abort the write, and close the file well before the 5V on the uC droops. 

 

Greg


Quote:
When interrupts occurred during SD writes, it would clear the spif flag,
That should only happen if you are using the SPIF interrupt, or if code in your other interrupts is (mistakenly) reading the SPDR.
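
For reference, the ATmega datasheets describe two ways SPIF gets cleared: the hardware clears it when the SPI interrupt vector executes, or software clears it by reading SPSR while SPIF is set and then accessing SPDR. The second path can be pictured with a toy model like the one below. It runs on a PC, touches no real registers, and every name in it (`spi_model`, `model_read_spsr_spif`, and so on) is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the SPIF clear-by-read sequence. Not real register access. */
struct spi_model {
    bool    spif;          /* transfer-complete flag */
    bool    spsr_was_read; /* SPSR has been read while SPIF was set */
    uint8_t spdr;          /* data register */
};

/* A byte has finished shifting: latch data and raise the flag. */
void model_transfer_done(struct spi_model *m, uint8_t rx)
{
    m->spdr = rx;
    m->spif = true;
    m->spsr_was_read = false;
}

/* Reading SPSR reports SPIF and arms the clear-on-next-SPDR-access step. */
bool model_read_spsr_spif(struct spi_model *m)
{
    if (m->spif) {
        m->spsr_was_read = true;
    }
    return m->spif;
}

/* Accessing SPDR after SPSR was read with SPIF set clears SPIF. */
uint8_t model_read_spdr(struct spi_model *m)
{
    if (m->spif && m->spsr_was_read) {
        m->spif = false;
        m->spsr_was_read = false;
    }
    return m->spdr;
}
```

In this model, if an ISR performs the SPSR-then-SPDR sequence, a main loop that polls SPIF afterwards never sees the flag, which matches the symptom in the quote.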


Transcend state they make their own flash cards and don't re-brand generic stuff. Genuine Sandisk likewise.

Quality of both cards should be consistent therefore.

 

glewis wrote:
+/- ~0.3V peak noise on the power to the SD card

Perhaps a bit high but not alarmingly high, BUT you say corruptions are rare, so somehow you need to capture the electrical signals around the corruption event.

 

Do you have the electrical datasheet ?

Go over each item of electrical interface specification and verify compliance with your scope.

 

This reply has been marked as the solution. 

Turns out I was disabling all interrupts in one of the interrupt routines!  That was clearing the flag.

Thanks to all for the comments/help.

 

 


Thanks for feeding back

 

please mark the solution - see Tip #5 in my signature, below:

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...

glewis wrote:
Turns out I was disabling all interrupts in one of the interrupt routines!

Was that something completely unrelated to FatFS ?

I'm surprised that bug didn't reveal itself ALL the time instead of just randomly.

 


I was disabling and enabling them in one of the timer interrupts (leftover code from elsewhere!). The SPI was also used for reading/writing the FRAM, which is done only when changing parameters and when loading initial settings, and for reading the RTC, which is only done on startup or when serial commands change the time settings (the uC just counts interrupts from the RTC and keeps time by itself rather than reading it continually). So my error would only have shown up in the SD write. The failure happened more often at lower SPI clock speeds. It could go many hours to a few days without problems when the SPI clock was set to 7.32 MHz, but failed fairly often at 921 kHz.
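
A possible reason the slower clock failed more often (a guess, not something verified in this thread): each byte takes longer to shift out, so there is a wider window for an interrupt to arrive while the code is waiting on SPIF. The arithmetic, as a small C sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Time on the wire for one 8-bit SPI transfer, in whole nanoseconds. */
uint32_t spi_byte_time_ns(uint32_t sck_hz)
{
    assert(sck_hz > 0);
    return (uint32_t)((8ull * 1000000000ull) / sck_hz);
}
```

At 921 kHz a byte takes about 8.7 µs, against about 1.1 µs at 7.32 MHz, so the time spent waiting on the flag for the same amount of data is roughly eight times longer at the slow setting.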

Last Edited: Mon. May 18, 2020 - 10:59 PM