DMA MMC SD Cards

Just how fast is a DMA transfer between memory (an array) and an SD card? I'm thinking of dropping the buffers that hold the data and directory clusters and instead loading/saving the data using DMA when required.

I can't actually remember the speed of the MMC interface.

What are your views, guys?

This topic has a solution.
Last Edited: Wed. Jun 19, 2019 - 01:55 PM

Not enough info. The data rate depends on the card and the interface, and that in turn determines the DMA rate. You still need to issue the commands to the card and wait for the responses. I doubt there's much difference in speed between programmed I/O and DMA. Do some tests to determine the average time for a read, then decide whether you want to cache the FAT etc.
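One way to run the kind of test suggested above is sketched below. It is only an illustration: `block_read_fn`, `tick_fn`, `average_read_ticks` and the `sim_` helpers are hypothetical names invented for this example, and the simulated read simply pretends every block takes 7 timer ticks. On real hardware the two function pointers would be replaced by the actual block read routine and a free-running timer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks: a block read that returns 0 on success, and a
 * free-running tick counter. Neither is taken from a real driver. */
typedef int (*block_read_fn)(uint32_t lba, uint8_t *buf);
typedef uint32_t (*tick_fn)(void);

/* Average number of ticks per successful read, or 0 if no read succeeded. */
uint32_t average_read_ticks(block_read_fn read_block, tick_fn now,
                            uint32_t first_lba, uint32_t count, uint8_t *buf)
{
    uint64_t total = 0;
    uint32_t ok = 0;

    for (uint32_t i = 0; i < count; i++) {
        uint32_t start = now();
        if (read_block(first_lba + i, buf) == 0) {
            /* Unsigned subtraction stays correct across one timer wrap. */
            total += (uint32_t)(now() - start);
            ok++;
        }
    }
    return ok ? (uint32_t)(total / ok) : 0u;
}

/* Host-side stand-ins so the sketch can run without a card. */
uint32_t sim_ticks;

uint32_t sim_now(void)
{
    return sim_ticks;
}

int sim_read_block(uint32_t lba, uint8_t *buf)
{
    (void)lba;
    buf[0] = 0xAA;      /* pretend some data arrived */
    sim_ticks += 7u;    /* pretend the read took 7 ticks */
    return 0;
}
```

Calling `average_read_ticks(sim_read_block, sim_now, 0, 10, buf)` with a 512-byte `buf` exercises the loop. Running the same measurement once with programmed I/O and once with DMA would give the comparison discussed above.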

This reply has been marked as the solution. 

This does depend on the data rate you are trying to achieve. If you are going to play 4K video from a card you need far more bandwidth than if you are just logging a few bytes of temperature readings every 3 minutes.

 

On low-end micros like AVR8 there is no 4-bit SD interface peripheral, so the cards are driven in the legacy 1-bit SPI mode, which immediately throws away 3/4 of the available bandwidth. AVRs still do a pretty good job of reading/writing, but you wouldn't use one for reading/writing high-density video (for example). At the other end of the scale are top-end devices using the 4-bit interface, DMA and caching (often megabytes of DRAM cache) that can get high data rates from SD/MMC. Just look at the classes available: https://en.wikipedia.org/wiki/SD_card#Class There are cards there that can achieve 90MB/s, and you need quite fancy software support to maintain such a rate!
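The "3/4 of the bandwidth" figure follows directly from the number of data lines. A rough sketch of the arithmetic, assuming the same clock in both modes and ignoring command/response overhead, CRC and card busy time (`bus_peak_bytes_per_sec` is a made-up helper name, and 25 MHz is chosen purely for illustration):

```c
#include <stdint.h>

/* Theoretical peak payload rate in bytes per second for a card bus that
 * moves one bit per data line per clock. Real throughput will be lower
 * because protocol overhead is ignored here. */
uint32_t bus_peak_bytes_per_sec(uint32_t clock_hz, uint32_t bus_width_bits)
{
    /* 64-bit intermediate avoids overflow for fast clocks and wide buses. */
    return (uint32_t)(((uint64_t)clock_hz * bus_width_bits) / 8u);
}
```

With a 25 MHz clock this gives 3,125,000 bytes/s for one data line and 12,500,000 bytes/s for four, so the 1-bit mode tops out at a quarter of the 4-bit figure.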

 

So it's up to you. Do you just want to drive any card (whatever the class) at something like 1MB/s .. 10MB/s, or do you want to try and deliver blistering performance?

This comes back to other threads you started: it's very difficult to produce a "one size fits all" filing system. What is appropriate for an 8-bit micro is quite different to an octa-core 3GHz Cortex-A15 or whatever!


Hi clawson, the SAMA5D44 has an onboard hardware MMC controller supporting very high speeds. I've gone ahead and dumped all the buffering in favour of DMA.

 


Yeah, but previously I thought you said that you were trying to make a generic solution for multiple micros. If you start tying things to specific peripherals like DMA etc. then you are going to make it very specifically targeted.


Yup, I've a hardware adaptation layer that deals with this.

I have this baby sorted. All memory accesses are done using DMA.

One thing to note: I'm starting a DMA transaction and then immediately reading/writing the data that the transfer covers. Is that okay? I'm assuming the data is ready straight away (well, the start of the buffer is ready, with the rest being ready shortly after). It seems fine!

Last Edited: Wed. Jun 19, 2019 - 05:18 PM

I think you might just be lucky. I’d be testing for completion or maybe there’s an interrupt on complete.
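To illustrate why testing for completion matters, here is a small host-side simulation. Everything in it is invented for the example: `sim_dma_t` and the `sim_dma_` functions are not part of any real driver, and the "transfer" only advances 16 bytes each time it is polled, which stands in for a controller working in the background.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated DMA channel (hypothetical, for illustration only). */
typedef struct {
    const uint8_t *src;
    uint8_t *dst;
    size_t remaining;
} sim_dma_t;

/* Start a transfer; nothing has been copied when this returns. */
void sim_dma_start(sim_dma_t *ch, uint8_t *dst, const uint8_t *src, size_t len)
{
    ch->src = src;
    ch->dst = dst;
    ch->remaining = len;
}

/* Advance the simulated transfer by up to 16 bytes and report whether it
 * has finished. A real driver would read a status flag instead. */
bool sim_dma_poll_done(sim_dma_t *ch)
{
    size_t n = ch->remaining < 16u ? ch->remaining : 16u;

    memcpy(ch->dst, ch->src, n);
    ch->src += n;
    ch->dst += n;
    ch->remaining -= n;
    return ch->remaining == 0;
}

/* Block until the whole transfer is complete. Real code might sleep on a
 * semaphore released from the completion interrupt rather than spin. */
void sim_dma_wait(sim_dma_t *ch)
{
    while (!sim_dma_poll_done(ch)) {
        /* busy-wait */
    }
}
```

In this model, reading the tail of the destination buffer straight after `sim_dma_start()` returns the old contents; only after `sim_dma_wait()` does the buffer match the source. Code that touches the buffer early works only while the CPU happens to stay behind the transfer.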


Kartman wrote:
I think you might just be lucky. I’d be testing for completion or maybe there’s an interrupt on complete.

 

That's what I was thinking, Kartman.


Be the luck ‘o the irish!


Kartman wrote:
Be the luck ‘o the irish!

lol

 

EDITED: I'm guessing that by the time I get to reading or writing the first byte of the buffer, the DMA has already written or read those first few bytes.

Last Edited: Wed. Jun 19, 2019 - 10:48 PM

Chasing the dragon's tail, as it were. The other thing to consider is the cache operation. If the memory area is uncached you might get away with it, but if the area is cacheable, your first read will likely pull in the next 32, 64 or however many bytes the cache line size is. Realistically, you want the area cacheable, as you're dealing with sizeable data blocks where the cache is going to improve performance, but you need to make sure you invalidate the cache at the correct point to ensure consistency with main memory. If you've tried running the SAMA5 with no cache, you'll have found the performance is like a Tesla running on a couple of Energizers.
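Because the cache works in whole lines, a single access can pull in bytes on either side of the one requested. The sketch below shows how to work out which lines a buffer touches. It assumes a 32-byte line purely for illustration (check the core's documentation for the real value), and `CACHE_LINE_SIZE`, `cache_line_floor`, `cache_line_ceil` and `cache_lines_touched` are names made up for this example.

```c
#include <stdint.h>

#define CACHE_LINE_SIZE 32u   /* assumed line size, must be a power of two */

/* Round an address down to the start of its cache line. */
uintptr_t cache_line_floor(uintptr_t addr)
{
    return addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1u);
}

/* Round an address up to the next cache line boundary. */
uintptr_t cache_line_ceil(uintptr_t addr)
{
    return (addr + CACHE_LINE_SIZE - 1u) & ~(uintptr_t)(CACHE_LINE_SIZE - 1u);
}

/* Number of cache lines that overlap the region [start, start + length). */
uint32_t cache_lines_touched(uintptr_t start, uint32_t length)
{
    if (length == 0) {
        return 0;
    }
    return (uint32_t)((cache_line_ceil(start + length) - cache_line_floor(start))
                      / CACHE_LINE_SIZE);
}
```

A 512-byte sector starting on a line boundary covers 16 lines under this assumption, while the same sector starting 16 bytes into a line covers 17. The extra line is shared with whatever sits next to the buffer, which is one reason DMA buffers are usually aligned to the line size.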


Totally agree, I'm using a cached region for the DMA. I know how a cache works, but what do you mean by "invalidate the cache"?

EDITED: Okay, I know what you mean by invalidating the cache. How do you do it? I suppose this thread should be moved to another forum.

Last Edited: Thu. Jun 20, 2019 - 12:06 AM

By 'invalidating' the cache I mean that the relevant cache lines are either discarded or flushed to main memory. I have no direct experience with the ARM Cortex-A5 caching hardware, but the general scheme is similar for most caching systems.

For example, your DMA has written a block of data to memory. At this point, you don't know whether the cache is holding 'stale' data for that memory area. So there is usually a means of scanning through the cache to see if it is holding anything for the memory area, and if it is, you invalidate it; that is, the cache no longer holds data for the given memory area.

For the case of having the DMA write data to the peripheral, you need to ensure main memory has the data you expect, as some of your write data might still be held in the cache. Of course, whether writes are cached at all depends on the caching strategy. You scan the cache to see if it holds any data for your memory area, then force it to flush, i.e. write out any data it holds and invalidate that cache line.

My recent experience with caching was on a fast Cortex-M4 with DDR RAM. The RTOS (MQX) had calls to invalidate the cache: you just passed a base address and size and it did the rest. Without these calls you got weird read and write data.


I came across a bug tonight whereby the system freezes if the task running exFAT has a low priority. I'm hoping it's the cache that's causing it.

I'll try invalidating the cache tomorrow.

 

/*----------------------------------------------------------------------------
 *        Exported functions
 *----------------------------------------------------------------------------*/

/**
 *  \brief Invalidate cache lines corresponding to a memory region
 *
 *  \param start Beginning of the memory region
 *  \param length Length of the memory region
 */
extern void cache_invalidate_region(void *start, uint32_t length);

/**
 *  \brief Clean cache lines corresponding to a memory region
 *
 *  \param start Beginning of the memory region
 *  \param length Length of the memory region
 */
extern void cache_clean_region(const void *start, uint32_t length);
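A sketch of where these two calls would usually sit around a DMA transfer. The bodies of `cache_clean_region` and `cache_invalidate_region` below are stand-ins that only log the call order so the example can run on a host; they share the signatures declared above but are not the real implementations. `dma_mem_to_card`, `dma_card_to_mem`, `card_write_block` and `card_read_block` are hypothetical names, and the DMA helpers are assumed to return only once the transfer has completed.

```c
#include <stddef.h>
#include <stdint.h>

/* Events recorded by the stand-in functions below. */
enum { EV_CLEAN = 1, EV_INVALIDATE, EV_DMA_TO_CARD, EV_DMA_FROM_CARD };

int event_log[8];
size_t event_count;

static void log_event(int ev)
{
    if (event_count < sizeof event_log / sizeof event_log[0]) {
        event_log[event_count++] = ev;
    }
}

/* Stand-ins for the cache calls: record the call, do nothing else. */
void cache_clean_region(const void *start, uint32_t length)
{
    (void)start; (void)length;
    log_event(EV_CLEAN);
}

void cache_invalidate_region(void *start, uint32_t length)
{
    (void)start; (void)length;
    log_event(EV_INVALIDATE);
}

/* Hypothetical blocking DMA helpers. */
void dma_mem_to_card(const void *buf, uint32_t len)
{
    (void)buf; (void)len;
    log_event(EV_DMA_TO_CARD);
}

void dma_card_to_mem(void *buf, uint32_t len)
{
    (void)buf; (void)len;
    log_event(EV_DMA_FROM_CARD);
}

/* Memory to card: push CPU writes out to RAM before the DMA reads RAM. */
void card_write_block(const void *buf, uint32_t len)
{
    cache_clean_region(buf, len);
    dma_mem_to_card(buf, len);
}

/* Card to memory: drop stale lines after the DMA has filled RAM and
 * before the CPU reads the buffer. */
void card_read_block(void *buf, uint32_t len)
{
    dma_card_to_mem(buf, len);
    cache_invalidate_region(buf, len);
}
```

Some drivers also clean or invalidate the buffer before starting the read, so that a dirty line cannot be written back over freshly transferred data. Whether that is needed depends on how the buffer was used beforehand.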


I had a bug whereby, if the SD/MMC card was doing a DMA read, the task issuing the DMA had to have a higher priority than the other tasks.

To fix it, I had to invalidate the cache.

 

Thanks guys for introducing me to cache!


Last question: when and where do you invalidate the cache?

EDITED: I had a bug, and when I invalidated the cache it worked. Now, out of interest, I reran the test, this time without invalidating the cache, and it still works. Nothing gives.

Last Edited: Thu. Jun 20, 2019 - 05:43 PM

The challenge is that you don't really know what is cached at a given time unless you force its hand by accessing the area of interest. In a multitasking system this might not be reliable, as your task might get swapped out, and when you come back the cache will be in a totally different state.


Finished the hardware interface. In order to keep the architecture simple to port, I set up a memory allocator that can use a cached region, along with functions for reading from and writing to the MMC using DMA.
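For anyone wanting to try something similar, a minimal sketch of a DMA-friendly allocation helper is shown below. It is not the code described above: `dma_buffer_alloc`, `dma_buffer_free` and the 32-byte `DMA_ALIGN` are assumptions made for the example, and it relies on C11 `aligned_alloc` from a hosted C library, whereas a bare-metal port would more likely carve buffers out of a dedicated memory region.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DMA_ALIGN 32u   /* assumed cache line size, must be a power of two */

/* Allocate a buffer that starts on a cache line boundary and occupies a
 * whole number of lines, so cache maintenance on it never touches a
 * neighbouring object. Returns NULL on failure or for a zero size. */
void *dma_buffer_alloc(size_t size)
{
    if (size == 0 || size > SIZE_MAX - (DMA_ALIGN - 1u)) {
        return NULL;
    }

    /* aligned_alloc expects the size to be a multiple of the alignment. */
    size_t padded = (size + DMA_ALIGN - 1u) & ~(size_t)(DMA_ALIGN - 1u);

    return aligned_alloc(DMA_ALIGN, padded);
}

/* Release a buffer obtained from dma_buffer_alloc(). */
void dma_buffer_free(void *buf)
{
    free(buf);
}
```

Rounding the size up as well as aligning the start means a clean or invalidate over the buffer covers only lines owned by that buffer.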