On a mega1284 last night I was using the regular SPI at clk/2 and because of no buffering there are gaps between bytes being transferred. I suppose this is the nature of SPI because you have to pull the byte being received out of the SPDR. I am running the clock at 7.3728 MHz and the SPI at 3.6864 MHz. Has anyone used a UART in MSPIM mode to gain the buffering or performance? How did it work out?
Using MSPIM for buffering
Yes, if you use USART_MSPI you get gap-less comms.
It is a mystery why anyone ever uses SPI as Master.
Especially when you can remap pins on later Xmega chips.
David.
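For anyone wanting to try this on the mega1284, here is a minimal sketch of putting USART1 into MSPIM mode. It follows the initialisation order from the datasheet: zero UBRR, make XCK an output, set the mode and enables, then load the real baud value. The names mspim_ubrr() and mspim1_init() are made up for illustration. XCK1 on PD4 is an assumption for the 1284, so check your pinout. The helper also assumes the requested SCK is no higher than F_CPU/2.

```c
#include <stdint.h>

/* Hypothetical helper: UBRR value for a wanted SCK in MSPIM mode.
 * Datasheet formula: SCK = F_CPU / (2 * (UBRR + 1)).
 * Assumes sck <= f_cpu / 2, otherwise the subtraction wraps. */
static uint16_t mspim_ubrr(uint32_t f_cpu, uint32_t sck)
{
    return (uint16_t)(f_cpu / (2UL * sck) - 1UL);
}

#ifdef __AVR__
#include <avr/io.h>

/* Hypothetical init for USART1 as SPI master (SPI mode 0, MSB first). */
static void mspim1_init(uint32_t f_cpu, uint32_t sck)
{
    UBRR1 = 0;                              /* must be zero while enabling   */
    DDRD |= _BV(PD4);                       /* XCK1 as output (assumed PD4)  */
    UCSR1C = _BV(UMSEL11) | _BV(UMSEL10);   /* select Master SPI mode        */
    UCSR1B = _BV(RXEN1) | _BV(TXEN1);       /* enable receiver + transmitter */
    UBRR1 = mspim_ubrr(f_cpu, sck);         /* now set the real clock rate   */
}
#endif
```

With F_CPU = 7372800 and a wanted SCK of 3686400, mspim_ubrr() gives 0, i.e. clk/2.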
In most cases the gap does not matter!
Jim
I've always been fond of the SPI interface for its simplicity. I agree Jim that the gaps don't really hurt anything, but if you want maximum performance then they are time when nothing is going on!
At those data rates surely the issue is how optimal you can make the AVR code to get the actual data bytes into and out of SPDR ? You only have a (*)few cycles per byte. Where are they coming from or going to that can be serviced at such a rate??
(*) well OK, if SPI is F_CPU/2 and 8 bits per byte I guess it's 16 cycles to get a byte in/out ?
It is not going to make much difference to an SPI temperature sensor.
But if you are writing 153600 bytes to draw a 240x320 TFT every cycle counts.
And the popular 320x480 TFTs require 3 bytes per pixel i.e. 460800 bytes.
This is still 461ms @ 8MHz (or 92ms @ 40MHz with ARM or Espressif)
David.
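The arithmetic behind those figures can be checked with a short sketch. The function names are made up for illustration. It assumes 2 bytes per pixel for the 240x320 panel (5-6-5) and 3 bytes per pixel for the 320x480 one, and a best case of 8 SCK periods per byte with no gaps at all.

```c
#include <stdint.h>

/* Bytes needed to fill one full frame. */
static uint32_t frame_bytes(uint32_t width, uint32_t height,
                            uint32_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Best-case transfer time in microseconds:
 * 8 SCK periods per byte, no gaps between bytes. */
static uint32_t transfer_us(uint32_t bytes, uint32_t sck_hz)
{
    return (uint32_t)(((uint64_t)bytes * 8u * 1000000u) / sck_hz);
}
```

frame_bytes(240, 320, 2) gives 153600 and frame_bytes(320, 480, 3) gives 460800. transfer_us(460800, 8000000) gives 460800 us, i.e. about 461ms, and that is before any gaps are added.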
But if you are writing 153600 bytes to draw a 240x320 TFT every cycle counts.
In my case this is FATFS on microSD and so the data is coming from/going to an SRAM buffer. Not a huge deal, but I'd like to improve it by using a different peripheral if I can, and I wouldn't mind freeing up the main SPI for something else anyway.
Here it is at clk/2 with SPI at 3.6864 MHz - I'll try the USART/MSPIM tonight and see how much difference it might make...
It is quite likely that some optimization could be done even with the SPI - this is what I'm using:
uint8_t spi_transfer(uint8_t AByte)
{
    //read to clear them
    SPSR;
    SPDR;
    //send
    SPDR = AByte;
    //wait until complete
    while (!(SPSR & _BV(SPIF)))
        ;
    //return result
    return SPDR;
}
In my case this is FATFS on microSD
//read to clear them SPSR; SPDR;
I've never really come across an SPI_xchg() routine that is any more than write; wait; return read. So you can probably save yourself the 4 cycles (or whatever it is) per byte that the two needless reads use.
EDIT: yup, datasheet says:
Alternatively, the SPIF bit is cleared by first reading the SPI Status Register with SPIF set, then accessing the SPI Data Register (SPDR).
So the while() loop and the subsequent read of SPDR for the return should clear SPIF for the next invocation anyway.
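A minimal sketch of the trimmed routine, with the two leading reads dropped. The #else branch is only a host-side stand-in so the snippet compiles off-target (an assumption for illustration, not real hardware behaviour). On the AVR, only the <avr/io.h> branch applies.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#else
/* Host-only stand-ins (assumption, not real hardware):
 * SPIF always reads as set and SPDR just holds the last byte written. */
static volatile uint8_t SPDR;
static volatile uint8_t SPSR = 0x80;
#define SPIF 7
#define _BV(bit) (1 << (bit))
#endif

/* Write, wait, read. Reading SPSR with SPIF set and then reading SPDR
 * clears SPIF, so no dummy reads are needed before the write. */
static uint8_t spi_transfer(uint8_t AByte)
{
    SPDR = AByte;                     /* start the transfer       */
    while (!(SPSR & _BV(SPIF)))       /* wait until it completes  */
        ;
    return SPDR;                      /* fetch the received byte  */
}
```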
Think about it. Drawing a fixed colour is simply writing the R, G, B bytes. Or the HL bytes of a 5-6-5 encoded pixel.
The bytes are pre-prepared in registers. And it only takes 2-3 cycles to write to SPDR / UDR.
Likewise the polling is a tight 4-cycle loop.
Even if you are writing data from SRAM or Flash. LD or LPM with post-increment is a 2-cycle op.
If your hardware can read an SD card with DMA you can drive your SPI device at full speed.
David.
The datasheet says:
Alternatively, the SPIF bit is cleared by first reading the SPI Status Register with SPIF set, then accessing the SPI Data Register (SPDR).
Looking at it, it would be difficult to keep the outgoing and incoming bytes aligned to each other unless you send it, wait for it to send, wait for the rx, then receive it. The 512 byte transfers (send or receive) could probably be optimized however and that is where the bulk of the time is wasted.
Looking at the 2017 version of ffsample.zip I see the following in the mmc_avr_spi.c file:
/*-----------------------------------------------------------------------*/
/* Transmit/Receive data from/to MMC via SPI (Platform dependent)        */
/*-----------------------------------------------------------------------*/

/* Exchange a byte */
static BYTE xchg_spi (      /* Returns received data */
    BYTE dat                /* Data to be sent */
)
{
    SPDR = dat;
    loop_until_bit_is_set(SPSR, SPIF);
    return SPDR;
}

/* Receive a data block fast */
static void rcvr_spi_multi (
    BYTE *p,                /* Data read buffer */
    UINT cnt                /* Size of data block */
)
{
    do {
        SPDR = 0xFF;
        loop_until_bit_is_set(SPSR, SPIF);
        *p++ = SPDR;
        SPDR = 0xFF;
        loop_until_bit_is_set(SPSR, SPIF);
        *p++ = SPDR;
    } while (cnt -= 2);
}

/* Send a data block fast */
static void xmit_spi_multi (
    const BYTE *p,          /* Data block to be sent */
    UINT cnt                /* Size of data block */
)
{
    do {
        SPDR = *p++;
        loop_until_bit_is_set(SPSR, SPIF);
        SPDR = *p++;
        loop_until_bit_is_set(SPSR, SPIF);
    } while (cnt -= 2);
}
I don't see Chan clearing SPDR/SPSR in his implementation here?
He isn't; I was using the SPI for a display and FATFS so I was using my SPI code from the programmer I developed.
Does assigning a value to SPDR on its own clear the SPIF flag? I guess if it doesn't, the loop exits possibly before this SPDR is fully sent one time and then we would be reading the previous result possibly? Could it become unaligned?
Looks like Chan has already got this worked out in his mmc_avr_usart.c. Note the flags he uses in the multi functions.
/* Exchange a byte */
static BYTE xchg_spi (      /* Returns received data */
    BYTE dat                /* Data to be sent */
)
{
    UDR1 = dat;
    loop_until_bit_is_set(UCSR1A, RXC1);
    return UDR1;
}

/* Receive a data block fast */
static void rcvr_spi_multi (
    BYTE *p,                /* Data read buffer */
    UINT cnt                /* Size of data block */
)
{
    UDR1 = 0xFF;
    loop_until_bit_is_set(UCSR1A, UDRE1);
    UDR1 = 0xFF;
    cnt -= 2;
    do {
        loop_until_bit_is_set(UCSR1A, UDRE1);
        *p++ = UDR1;
        UDR1 = 0xFF;
    } while (--cnt);
    loop_until_bit_is_set(UCSR1A, RXC1);
    *p++ = UDR1;
    loop_until_bit_is_set(UCSR1A, RXC1);
    *p++ = UDR1;
}

/* Send a data block fast */
static void xmit_spi_multi (
    const BYTE *p,          /* Data block to be sent */
    UINT cnt                /* Size of data block */
)
{
    UDR1 = *p++;
    loop_until_bit_is_set(UCSR1A, UDRE1);
    UDR1 = *p++;
    cnt -= 2;
    do {
        loop_until_bit_is_set(UCSR1A, UDRE1);
        UDR1;
        UDR1 = *p++;
    } while (--cnt);
    loop_until_bit_is_set(UCSR1A, RXC1);
    UDR1;
    loop_until_bit_is_set(UCSR1A, RXC1);
    UDR1;
}
Note that your trace in #9 is showing about 32 cycles per byte (16 for shift + 16 for gap)
Yes, you can optimise USART_MSPI for 16 cycles per byte. i.e. no gap.
And you can optimise SPDR for 18 cycles per byte. i.e. 16+2
Chan library gives examples.
Arduino SPI library does too.
A lot of time can be wasted with poorly designed loop structures.
David.
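To put those cycles-per-byte figures in terms of a 512 byte sector, here is a small sketch. The function name is made up for illustration. It assumes the 7.3728 MHz CPU clock from the first post and counts only the transfer loop, not card latency or command overhead.

```c
#include <stdint.h>

/* Time in microseconds to move `bytes` bytes when each byte costs
 * `cycles_per_byte` CPU cycles at `f_cpu` Hz (result is truncated). */
static uint32_t block_time_us(uint32_t bytes, uint32_t cycles_per_byte,
                              uint32_t f_cpu)
{
    return (uint32_t)(((uint64_t)bytes * cycles_per_byte * 1000000u) / f_cpu);
}
```

At 7372800 Hz, 512 bytes at 32 cycles per byte is about 2.22ms, at 18 cycles per byte it is 1.25ms, and at 16 cycles per byte it is about 1.11ms.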
After changing out the code and rewiring the breadboard...
Final result - reading 512 byte sector at 3.6864 MHz from microsd:
USART MSPIM is 1.609ms
SPI is 2.675ms
It is 1.066ms faster, or the USART MSPIM takes 60.15% of the time that the SPI does.
512 bytes @ SCK = 3.6864MHz would take 1.11ms.
So there is still room for improvement. But you have certainly done better than SPI.
You can inspect the performance with your Logic Analyser.
I would expect GCC to make a good job of compiling Chan's xxx_spi_multi() functions.
I might try it in AS7.0 Simulator later.
David.
The difference is mostly packet overhead and waiting on the microSD to be ready.
Between where nCS goes low and A1 begins is the read sector request. This tiny area could be improved, but it wouldn't make much difference and might be difficult to synchronize the bytes being sent with the ones being received. Not worthwhile given how small it is and likely why Chan didn't bother with it.
Between A1 and A2 it is waiting on the microsd to be ready. Still not back to back SPI here, but we are waiting on the microsd anyway so back to back isn't going to make it ready any faster.
After A2 the sector is being read in. There are no gaps in the clock here; it couldn't be faster.
If you want the best performance, the USART in SPI mode can transfer a sector in about 60% of the time it takes the traditional unbuffered SPI to do it.
After A2 the sector is being read in. There are no gaps in the clock here; it couldn't be faster.
That is the important section. i.e. read_multi()
If you have no gaps, you have cracked it.
Yes, there will always be housekeeping that involves calculating FAT and sector addresses. But the main grunt is down to reading the 512 byte sectors.
David.