Using MSPIM for buffering


On a mega1284 last night I was using the regular SPI at clk/2 and because of no buffering there are gaps between bytes being transferred.  I suppose this is the nature of SPI because you have to pull the byte being received out of the SPDR.  I am running the clock at 7.3728 MHz and the SPI at 3.6864 MHz.  Has anyone used a UART in MSPIM mode to gain the buffering or performance?  How did it work out?
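For reference, the ATmega datasheet gives the MSPIM clock as f_osc / (2 × (UBRRn + 1)), so UBRRn = 0 yields the same clk/2 rate as the hardware SPI. The host-side sketch below only does that arithmetic; the function names are hypothetical and nothing here touches AVR registers.

```c
#include <assert.h>
#include <stdint.h>

/* SCK frequency a USART in master SPI mode produces for a given UBRRn value. */
uint32_t mspim_sck_hz(uint32_t f_osc_hz, uint16_t ubrr)
{
    return f_osc_hz / (2UL * ((uint32_t)ubrr + 1UL));
}

/* UBRRn for a requested SCK rate. Integer division truncates, so for rates
 * that do not divide evenly, check the result with mspim_sck_hz(). */
uint16_t mspim_ubrr_for(uint32_t f_osc_hz, uint32_t sck_hz)
{
    assert(sck_hz > 0 && sck_hz <= f_osc_hz / 2);
    return (uint16_t)(f_osc_hz / (2UL * sck_hz) - 1UL);
}
```

With the 7.3728 MHz clock mentioned above, `mspim_ubrr_for(7372800UL, 3686400UL)` evaluates to 0, i.e. the MSPIM can match the SPI's clk/2 rate.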


Yes,   if you use USART_MSPI you get gap-less comms.

 

It is a mystery why anyone ever uses SPI as Master.

Especially when you can remap pins on later Xmega chips.

 

David.


In most cases the gap does not matter!

 

Jim

 


I've always been fond of the SPI interface for its simplicity.  I agree, Jim, that the gaps don't really hurt anything, but if you want maximum performance then they are time when nothing is going on!


At those data rates surely the issue is how optimal you can make the AVR code to get the actual data bytes into and out of SPDR ? You only have a (*)few cycles per byte. Where are they coming from or going to that can be serviced at such a rate??

 

(*) well OK, if SPI is F_CPU/2 and 8 bits per byte I guess it's 16 cycles to get a byte in/out ?

Last Edited: Thu. Jul 25, 2019 - 01:21 PM

It is not going to make much difference to an SPI temperature sensor.

 

But if you are writing 153600 bytes to draw a 240x320 TFT every cycle counts.

And the popular 320x480 TFTs require 3 bytes per pixel i.e. 460800 bytes.

 

This is still 461ms @ 8MHz (or 92ms @ 40MHz with ARM or Espressif)

 

David.
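The byte counts and timings in this post can be reproduced with a few lines of arithmetic. This is a host-side sketch only; the helper names are hypothetical, and the time is a lower bound that assumes a gap-free clock.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to repaint a full frame. */
uint32_t frame_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Minimum transfer time in milliseconds: 8 SCK periods per byte, no gaps. */
double min_transfer_ms(uint32_t nbytes, double sck_hz)
{
    assert(sck_hz > 0.0);
    return (double)nbytes * 8.0 * 1000.0 / sck_hz;
}
```

`frame_bytes(240, 320, 2)` gives the 153600 bytes quoted for a 16-bit 240x320 panel, and `frame_bytes(320, 480, 3)` gives 460800. Any gap between bytes adds directly to the result of `min_transfer_ms()`.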


david.prentice wrote:
But if you are writing 153600 bytes to draw a 240x320 TFT every cycle counts.
I'd still wonder where those 153600 bytes are coming from, and whether you can really get them out of the source and into SPDR in the 16 cycles per byte you have available. Is this streaming pages out of a dataflash that contains pre-prepared images or something? If it's an external RAM used as a frame buffer, isn't the 64K boundary handling going to stymie your 16-cycle data flow when it occurs?


In my case this is FATFS on microSD, so the data is coming from/going to an SRAM buffer.  Not a huge deal, but I'd like to improve it by using a different peripheral, and I wouldn't mind freeing up the main SPI for something else anyway.



Here it is at clk/2 with SPI at 3.6864 MHz - I'll try the USART/MSPIM tonight and see how much different it might make...

 


It is quite likely that some optimization could be done even with the SPI - this is what I'm using:

uint8_t spi_transfer(uint8_t AByte)
{
  //read to clear them
  SPSR;
  SPDR;

  //send
  SPDR=AByte;

  //wait until complete
  while (!(SPSR & _BV(SPIF)))
    ;

  //return result
  return SPDR;
}

alank2 wrote:
In my case this is FATFS on microSD
So you are talking about 512 byte "burst transfers" - not a continuous data stream then?
alank2 wrote:

  //read to clear them
  SPSR;
  SPDR;

Umm - why is that necessary? SPDR is going to be updated as a result of the transfer you initiate anyway. Also it seems the only bit in SPSR of any importance is the SPIF you are going to block on, but doesn't the subsequent read of SPDR (for the return) clear the flag anyway?

 

I've never really come across an SPI_xchg() routine that is any more than write; wait; return read. So you can probably save yourself the 4 cycles (or whatever it is) per byte that the two needless reads use.

 

EDIT: yup, datasheet says:

Alternatively, the SPIF bit is cleared by first reading the SPI Status Register with SPIF set, then accessing the SPI Data Register (SPDR).

So the while() loop and the subsequent read of SPDR for the return should clear SPIF for the next invocation anyway.

Last Edited: Thu. Jul 25, 2019 - 02:40 PM
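
The flag behaviour described in the datasheet quote can be illustrated with a toy host-side model. This is a sketch under stated assumptions, not AVR code: the `spi_model` type and the `model_*` functions are hypothetical, the shift completes instantly, and the only rule modelled is "read SPSR with SPIF set, then access SPDR, and SPIF is cleared".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the SPI registers; runs on a PC, not an AVR. */
typedef struct {
    uint8_t data;   /* models SPDR */
    bool    spif;   /* models the SPIF bit in SPSR */
    bool    armed;  /* SPSR has been read while SPIF was set */
} spi_model;

/* Reading the status register with SPIF set arms the clear sequence. */
bool model_read_spsr(spi_model *s)
{
    if (s->spif)
        s->armed = true;
    return s->spif;
}

/* Any SPDR access after an armed status read clears SPIF. */
void model_access_spdr(spi_model *s)
{
    if (s->armed) {
        s->spif = false;
        s->armed = false;
    }
}

/* Writing SPDR starts a transfer; the model finishes it immediately,
 * leaving the slave's reply in SPDR and setting SPIF. */
void model_write_spdr(spi_model *s, uint8_t out, uint8_t slave_reply)
{
    assert(s != 0);
    (void)out;               /* the outgoing byte is not needed by the model */
    model_access_spdr(s);
    s->data = slave_reply;
    s->spif = true;
}

uint8_t model_read_spdr(spi_model *s)
{
    model_access_spdr(s);
    return s->data;
}

/* write; wait; return read, with no preliminary dummy reads. */
uint8_t model_xchg(spi_model *s, uint8_t out, uint8_t slave_reply)
{
    model_write_spdr(s, out, slave_reply);
    while (!model_read_spsr(s))
        ;
    return model_read_spdr(s);
}
```

In this model SPIF is already clear when `model_xchg()` returns, so back-to-back calls start from a clean flag without any extra reads of SPSR or SPDR up front.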

Think about it.    Drawing a fixed colour is simply writing the R, G, B bytes.    Or the HL bytes of a 5-6-5 encoded pixel.

 

The bytes are pre-prepared in registers.    And it only takes 2-3 cycles to write to SPDR / UDR.

Likewise the polling is a tight 4-cycle loop.

 

Even if you are writing data from SRAM or Flash.   LD or LPM with post-increment is a 2-cycle op.

 

If your hardware can read an SD card with DMA you can drive your SPI device at full speed.

 

David.


The datasheet says :

 

Alternatively, the SPIF bit is cleared by first reading the SPI Status Register with SPIF set, then accessing the
SPI Data Register (SPDR).

 

Looking at it, it would be difficult to keep the outgoing and incoming bytes aligned to each other unless you send it, wait for it to send, wait for the rx, then receive it.  The 512 byte transfers (send or receive) could probably be optimized however and that is where the bulk of the time is wasted.


Looking at the 2017 version of ffsample.zip I see the following in the mmc_avr_spi.c file:

/*-----------------------------------------------------------------------*/
/* Transmit/Receive data from/to MMC via SPI  (Platform dependent)       */
/*-----------------------------------------------------------------------*/

/* Exchange a byte */
static
BYTE xchg_spi (		/* Returns received data */
	BYTE dat		/* Data to be sent */
)
{
	SPDR = dat;
	loop_until_bit_is_set(SPSR, SPIF);
	return SPDR;
}


/* Receive a data block fast */
static
void rcvr_spi_multi (
	BYTE *p,	/* Data read buffer */
	UINT cnt	/* Size of data block */
)
{
	do {
		SPDR = 0xFF;
		loop_until_bit_is_set(SPSR, SPIF);
		*p++ = SPDR;
		SPDR = 0xFF;
		loop_until_bit_is_set(SPSR, SPIF);
		*p++ = SPDR;
	} while (cnt -= 2);
}


/* Send a data block fast */
static
void xmit_spi_multi (
	const BYTE *p,	/* Data block to be sent */
	UINT cnt		/* Size of data block */
)
{
	do {
		SPDR = *p++;
		loop_until_bit_is_set(SPSR, SPIF);
		SPDR = *p++;
		loop_until_bit_is_set(SPSR, SPIF);
	} while (cnt -= 2);
}

I don't see Chan clearing SPDR/SPSR in his implementation here?


He isn't; I was using the SPI for a display and FATFS so I was using my SPI code from the programmer I developed.

 

Does assigning a value to SPDR on its own clear the SPIF flag?  I guess if it doesn't, the loop exits possibly before this SPDR is fully sent one time and then we would be reading the previous result possibly?  Could it become unaligned?


Looks like Chan has already got this worked out in his mmc_avr_usart.c.  Note the flags he uses in the multi functions.

 

/* Exchange a byte */
static
BYTE xchg_spi (		/* Returns received data */
	BYTE dat		/* Data to be sent */
)
{
	UDR1 = dat;
	loop_until_bit_is_set(UCSR1A, RXC1);
	return UDR1;
}


/* Receive a data block fast */
static
void rcvr_spi_multi (
	BYTE *p,	/* Data read buffer */
	UINT cnt	/* Size of data block */
)
{
	UDR1 = 0xFF;
	loop_until_bit_is_set(UCSR1A, UDRE1); UDR1 = 0xFF;
	cnt -= 2;
	do {
		loop_until_bit_is_set(UCSR1A, UDRE1);
		*p++ = UDR1; UDR1 = 0xFF;
	} while (--cnt);
	loop_until_bit_is_set(UCSR1A, RXC1); *p++ = UDR1;
	loop_until_bit_is_set(UCSR1A, RXC1); *p++ = UDR1;
}


/* Send a data block fast */
static
void xmit_spi_multi (
	const BYTE *p,	/* Data block to be sent */
	UINT cnt		/* Size of data block */
)
{
	UDR1 = *p++;
	loop_until_bit_is_set(UCSR1A, UDRE1); UDR1 = *p++;
	cnt -= 2;
	do {
		loop_until_bit_is_set(UCSR1A, UDRE1);
		UDR1; UDR1 = *p++;
	} while (--cnt);
	loop_until_bit_is_set(UCSR1A, RXC1); UDR1;
	loop_until_bit_is_set(UCSR1A, RXC1); UDR1;
}

 


Note that your trace in #9 is showing about 32 cycles per byte (16 for shift + 16 for gap)

 

Yes,  you can optimise USART_MSPI for 16 cycles per byte.  i.e. no gap.

And you can optimise SPDR for 18 cycles per byte. i.e. 16+2

 

Chan library gives examples.

Arduino SPI library does too.

 

A lot of time can be wasted with poorly designed loop structures.

David.
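
To put those cycle counts in terms of time, here is a small host-side sketch. The function name is hypothetical and the result covers the data phase only, with no command overhead or card wait.

```c
#include <assert.h>
#include <stdint.h>

/* Time in milliseconds to move nbytes at a fixed cost in CPU cycles per byte. */
double block_time_ms(uint32_t f_cpu_hz, uint32_t cycles_per_byte, uint32_t nbytes)
{
    assert(f_cpu_hz > 0);
    return (double)nbytes * (double)cycles_per_byte * 1000.0 / (double)f_cpu_hz;
}
```

At F_CPU = 7.3728 MHz a 512-byte sector works out to roughly 1.11 ms at 16 cycles per byte, 1.25 ms at 18, and 2.22 ms at 32.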


After changing out the code and rewiring the breadboard...

 

Final result - reading 512 byte sector at 3.6864 MHz from microsd:

 

USART MSPIM is 1.609ms

SPI is 2.675ms

 

It is 1.066ms faster, or the USART MSPIM takes about 60.15% of the time that the SPI does.
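
The comparison can be checked, and converted into an effective cost per byte, with the host-side sketch below. The helper names are hypothetical. The measured times cover the whole sector read, including the command and the wait for the card, so the cycles-per-byte figures overstate the cost of the data phase itself.

```c
#include <assert.h>
#include <stdint.h>

/* part as a percentage of whole. */
double percent_of(double part_ms, double whole_ms)
{
    assert(whole_ms > 0.0);
    return 100.0 * part_ms / whole_ms;
}

/* Average CPU cycles spent per byte over a measured transfer. */
double effective_cycles_per_byte(double time_ms, uint32_t f_cpu_hz, uint32_t nbytes)
{
    assert(nbytes > 0);
    return time_ms / 1000.0 * (double)f_cpu_hz / (double)nbytes;
}
```

For the figures above, at 7.3728 MHz and 512 bytes, this comes to roughly 23 cycles per byte for the USART MSPIM and roughly 38.5 for the SPI.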


512 bytes @ SCK = 3.6864MHz would take 1.11ms.

 

So there is still room for improvement.   But you have certainly done better than SPI.

 

You can inspect the performance with your Logic Analyser.

 

I would expect GCC to make a good job of compiling Chan's xxx_spi_multi() functions.

I might try it in AS7.0 Simulator later.

 

David.


 

The difference is mostly packet overhead and waiting on the microSD to be ready.

 

Between where nCS goes low and A1 begins is the read sector request.  This tiny area could be improved, but it wouldn't make much difference and might be difficult to synchronize the bytes being sent with the ones being received.  Not worthwhile given how small it is and likely why Chan didn't bother with it.

 

Between A1 and A2 it is waiting on the microsd to be ready.  Still not back to back SPI here, but we are waiting on the microsd anyway so back to back isn't going to make it ready any faster.

 

After A2 the sector is being read in. There are no gaps in the clock here, so it couldn't be faster.

 

If you want the best performance, the USART in SPI mode can transfer a sector in about 60% of the time it takes the traditional unbuffered SPI to do it.

 

 

Last Edited: Fri. Jul 26, 2019 - 12:10 PM

alank2 wrote:
After A2 the sector is being read in. There are no gaps in the clock here, so it couldn't be faster.

 

That is the important section.   i.e. read_multi()

If you have no gaps,  you have cracked it.

 

Yes,  there will always be housekeeping that involves calculating FAT and sector addresses.   But the main grunt is down to reading the 512 byte sectors.

 

David.