Highest SPI transfer rate


What's the highest SPI transfer rate anyone has been able to achieve on an ATmega? Let's assume a clock of 16 MHz. I was able to do 2.4 Mbit/s between two chips reliably for a 19-byte packet (16 bytes of data, 2 bytes of address and 1 command byte). I find it hard to go beyond that because of the delays that creep in for reloading SPDR and storing bytes into a buffer on the slave. According to my research, interrupt-driven SPI could only manage 600 kbit/s reliably due to interrupt latency.

LibK - device driver support for flash based microcontrollers: GitHub project

http://oskit.se - join me on my quest to become a better programmer


See the 'double speed' bit (SPI2X, in the SPI status register SPSR)? That lets you clock the SPI at 8 MHz. So that's 1 µs per byte. One megabyte per second.
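To make the arithmetic concrete, here is a minimal host-side sketch of how the SCK rate follows from the prescaler bits. It assumes the usual ATmega divider table (SPR1:SPR0 select /4, /16, /64, /128, and SPI2X halves the divider); the function names are made up for illustration.

```c
#include <stdint.h>

/* SCK frequency in Hz for an ATmega SPI master.
 * spr   : SPR1:SPR0 bits from SPCR (0..3) -> base divider 4, 16, 64, 128
 * spi2x : SPI2X bit from SPSR; when set, the divider is halved
 * (hypothetical helper, not part of any library) */
uint32_t spi_sck_hz(uint32_t f_cpu, uint8_t spr, uint8_t spi2x)
{
    static const uint16_t base_div[4] = { 4, 16, 64, 128 };
    uint16_t div = base_div[spr & 3];
    if (spi2x)
        div /= 2;
    return f_cpu / div;
}

/* Time to shift one byte (8 SCK periods), in nanoseconds. */
uint32_t spi_byte_time_ns(uint32_t sck_hz)
{
    return (uint32_t)(8000000000ULL / sck_hz);
}
```

With F_CPU = 16 MHz, SPR = 0 and SPI2X set, this gives 8 MHz and 1000 ns per byte, which is the "one megabyte per second" figure if nothing else costs time.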

 

Imagecraft compiler user


"due to interrupt latency"

Do you mean the 12 cycles (or whatever it is) to get into the ISR or the "extra" cycles executed once you get there? If the latter, you can obviously make an ISR() leaner and meaner by implementing it entirely in Asm.

 

If you need things even faster (and can stand 3V3 operation) possibly look at Xmega. The chips can run at 32MHz and I think you'll find you can set things up to feed/receive the SPI via DMA too.

 

Only thing is how are you generating/processing the data bytes you are SPIing so fast? Surely the bandwidth limit is actually in how fast you can generate/process the data not the actual speed of transfer?

 

(or is this about getting the fastest possible read/write to SD/MMC or something?)


The new Arduino SPI library is tuned for near optimal performance.  Here's one of my posts from the Arduino Developer's google group, detailing the optimization I contributed to the library.

 

On Tuesday, 29 April 2014 17:52:04 UTC-3, Mikael Patel wrote:

---------------------------------------------------------------------------------------------------------------------------------------------------
// Snippet from Cosa/SPI

void transfer_start(uint8_t data) __attribute__((always_inline))
{
   SPDR = data;
}

uint8_t transfer_await() __attribute__((always_inline))
{
   loop_until_bit_is_set(SPSR, SPIF);
   return (SPDR);
}

void
SPI::transfer(void* buf, size_t count)
{
   if (count == 0) return;
   uint8_t* dp = (uint8_t*) buf;
   uint8_t data = *dp;
   while (1) {
     transfer_start(data);
     if (--count == 0) break;  // stop once the last byte has been started
     uint8_t* tp = dp + 1;
     data = *tp;
     *dp = transfer_await();
     dp = tp;
   }
   *dp = transfer_await();
}

 

Pretty good. It compiles to code that has st and rjmp instructions after transfer_await(). The following code instead gives an in from SPDR immediately followed by an out to SPDR:

uint8_t tmp;
uint8_t* dp = (uint8_t*) buf;
uint8_t data = *dp;
transfer_start(data);
while (--count) {
     uint8_t* tp = dp + 1;
     data = *tp;
     tmp = transfer_await();
     transfer_start(data);
     *dp = tmp;
     dp = tp;
}
*dp = transfer_await();

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 


The theoretical maximum is F_CPU/2.  i.e. 8MHz

 

You will struggle to get this with the SPI peripheral unless you carefully count cycles so that you load SPDR exactly when it is sending out the last bit of the previous byte.

 

However,  modern AVRs have USART_MSPI.    These can transfer SPI with no gaps.

 

In practice,   the 2-cycle gap while you poll for completion is not too bad.   i.e. 18-cycles instead of the theoretical 16-cycles.
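As a rough sketch of what those cycle counts mean for throughput, the helpers below convert cycles per byte into bytes per second and back. The function names are hypothetical, and reading the "2.4Mbs" from the first post as 2.4 Mbit/s is an assumption.

```c
#include <stdint.h>

/* Bytes per second when every byte costs `cycles_per_byte` CPU cycles. */
uint32_t spi_bytes_per_sec(uint32_t f_cpu, uint32_t cycles_per_byte)
{
    return f_cpu / cycles_per_byte;
}

/* CPU cycles spent per byte for an observed bit rate (rounded down). */
uint32_t spi_cycles_per_byte(uint32_t f_cpu, uint32_t bits_per_sec)
{
    return (uint32_t)(((uint64_t)f_cpu * 8u) / bits_per_sec);
}
```

At 16 MHz, 16 cycles per byte is 1,000,000 bytes/s and 18 cycles per byte is about 888,888 bytes/s. Going the other way, 2.4 Mbit/s works out to roughly 53 cycles per byte, so most of that time is software overhead rather than shifting bits.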

 

I have every sympathy for 'speed' when you are sending 100000 bytes.    Does it really make a difference with a 19-byte packet ?

 

David.


"So that's 1 µs per byte. One megabyte per second."

But as the OP said, you also have to read and write the register, which gives you at most 16 clocks to read the register, put the value someplace reasonable, read the next byte from wherever it is stored and write it to the register. It would be very difficult to maintain 1 megabyte per second.

Regards,
Steve A.

The Board helps those that help themselves.


It is quite possible.    You just have to do it carefully.

 

After all,   it only takes 2 cycles to do a "ld r24,Z+"

And 1 cycle for "out SPDR,r24"

And 3 cycles for a loop counter e.g. "dec r16; brne more"

 

You want to avoid a RCALL/RET because that will eat up cycles.
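The budget above can be tallied in a small host-side sketch. The timings are the classic ATmega figures (16-bit program counter assumed for RCALL/RET); the enum and function names are invented for illustration, and a real loop still has to be padded so that SPDR is not written before the previous byte has finished.

```c
/* Classic AVR (ATmega) instruction timings, in CPU cycles. */
enum {
    CYC_LD_ZPLUS   = 2,  /* ld  r24, Z+                    */
    CYC_ST_ZPLUS   = 2,  /* st  Z+, r24                    */
    CYC_IN         = 1,  /* in  r24, SPDR                  */
    CYC_OUT        = 1,  /* out SPDR, r24                  */
    CYC_DEC        = 1,  /* dec r16                        */
    CYC_BRNE_TAKEN = 2,  /* brne more (branch taken)       */
    CYC_RCALL      = 3,  /* assumes a 16-bit PC device     */
    CYC_RET        = 4   /* assumes a 16-bit PC device     */
};

#define SPI_BYTE_CYCLES 16  /* 8 bits at SCK = F_CPU/2 */

/* Transmit-only loop: load, write SPDR, count down, branch. */
int loop_cycles_tx_only(void)
{
    return CYC_LD_ZPLUS + CYC_OUT + CYC_DEC + CYC_BRNE_TAKEN;
}

/* Full duplex adds reading SPDR and storing the received byte. */
int loop_cycles_full_duplex(void)
{
    return loop_cycles_tx_only() + CYC_IN + CYC_ST_ZPLUS;
}

/* Cost of wrapping the per-byte work in a subroutine. */
int call_overhead_cycles(void)
{
    return CYC_RCALL + CYC_RET;
}

/* Cycles left over in one byte time after `used` cycles of work. */
int slack_cycles(int used)
{
    return SPI_BYTE_CYCLES - used;
}
```

Under these assumptions the transmit-only loop needs 6 cycles and the full-duplex loop 9, but adding an RCALL/RET pair per byte (7 cycles) uses up the whole 16-cycle byte time.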

 

David.

 

Edit. corrected the "loop counter" cycles.

Last Edited: Tue. Sep 23, 2014 - 04:25 PM

And 2 cycles for a loop counter e.g. "dec r16; brne more"

Well, if really on a quest for max speed to dump a 19-byte packet:

-- What is the receiver/slave?  Can its engine handle continuous transmission?  (An AVR8 slave could not.)

-- Already mentioned, USART-as-SPI-master might be more forgiving.

-- It depends on whether the slave is responding with data, as well.  I.e., true full-duplex SPI.

-- For max speed, unroll the loop.

-- For ultimate speed, cycle-count in the unrolled loop.

 

With all that said, it might be tricky trying to get no "gap" in the SCK.

 

 

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


In practice,    most hardware SPI Slaves can manage 8MHz or even 24MHz SCK rate.

 

A 'MCU Slave' is very unlikely to cope with the highest speeds.

It can probably keep up with 4MHz.

 

The critical timing is likely to be the "/CS to first SCK transition" for a MCU Slave.

 

David.


david.prentice wrote:

 

The critical timing is likely to be the "/CS to first SCK transition" for a MCU Slave.

 

On an AVR, no CPU intervention is required to start receiving when SS goes low, so I'd guess the limiting factor is sampling SCK.

 

 

I have no special talents.  I am only passionately curious. - Albert Einstein

 


"It is quite possible. You just have to do it carefully."

For short bursts. Not for "sustained rates". And SPI is bidirectional. You might be able to send a stream of data, but with most SPI communication you need to read and interpret the response.

Regards,
Steve A.

The Board helps those that help themselves.

Last Edited: Tue. Sep 23, 2014 - 08:32 PM

"I have every sympathy for 'speed' when you are sending 100000 bytes. Does it really make a difference with a 19-byte packet?"

I split the data into 19-byte "packets" to keep the transfers short so that I do not tie up interrupts for too long. Also, the data can be any length. I have a working concept of "memory mapped" SPI devices, where the master accesses a number of slaves by first sending an address and then clocking bytes out of the slave's internal buffer while holding the shared SS line low. All other slaves are required to shut up and not talk while the master talks to the addressed slave. So the master can get data out of a slave quickly, and the slave can then continue servicing its own interrupts when the transfer is complete.

LibK - device driver support for flash based microcontrollers: GitHub project

http://oskit.se - join me on my quest to become a better programmer

Last Edited: Tue. Sep 23, 2014 - 09:30 PM

ralphd wrote:

The new Arduino SPI library is tuned for near optimal performance.  Here's one of my posts from the Arduino Developer's google group, detailing the optimization I contributed to the library.

 

 

 

Thanks! Will try it out :)

LibK - device driver support for flash based microcontrollers: GitHub project

http://oskit.se - join me on my quest to become a better programmer


theusch wrote:

And 2 cycles for a loop counter e.g. "dec r16; brne more"

Well, if really on a quest for max speed to dump a 19-byte packet:

-- What is the receiver/slave?  Can its engine handle continuous transmission?  (An AVR8 slave could not.)

-- Already mentioned, USART-as-SPI-master might be more forgiving.

-- It depends on whether the slave is responding with data, as well.  I.e., true full-duplex SPI.

-- For max speed, unroll the loop.

-- For ultimate speed, cycle-count in the unrolled loop.

 

With all that said, it might be tricky trying to get no "gap" in the SCK.

 

 

 

I'm writing a driver for my own kind of "Arduino system bus". It is based on SPI but implements addressing and error detection, plus a kind of "memory map" interface. So in a one-master/multiple-slave system the driver makes it possible to read and write data on the slaves as though it were a normal addressable buffer. No other special protocol is required; the devices just have to make important data buffers available to the SPI driver on the slave side. The master can then read and write these data buffers through SPI whenever it wants to.

 

So the slaves are other atmegas. Slaves are both responding with and getting data from the master - so full duplex is used. During write to slave, the slave will in fact send the previous received byte back to the master as a form of integrity check.
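The echo-back integrity check could be verified on the master side along these lines. This is only a sketch: the function name is hypothetical, and the number of bytes by which the echo lags (`lag`) is an assumption that depends on the slave's timing and would have to be confirmed on real hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Compare what the master sent (tx) with what the slave echoed back (rx).
 * The echo of tx[i - lag] is expected in rx[i]; the first `lag` received
 * bytes carry no echo and are ignored.
 * Returns the index of the first mismatch, or -1 if every echo matches. */
int echo_first_mismatch(const uint8_t *tx, const uint8_t *rx,
                        size_t n, size_t lag)
{
    for (size_t i = lag; i < n; i++) {
        if (rx[i] != tx[i - lag])
            return (int)i;
    }
    return -1;
}
```

A master would call this after a write burst and retry the transfer if the result is not -1. Note that the final `lag` bytes of the burst are never echoed within the same transfer, so they stay unverified unless extra dummy bytes are clocked out.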

 

Also, can anyone help me figure this out: I want to have a fast, general-purpose serial interface between multiple ATmegas. I want the bus to support interrupts from the slaves as well, but without any extra wires for each slave, as in "message signaled interrupts" (I'm studying PCI Express as an example). So I'm thinking in terms of using I2C on the ATmegas for slave interrupt signaling. This way, an arbitrary number of devices can be connected to the master, and interrupts can be signaled by any slave even while the master is in the process of transferring data. The interrupts are for things like "hey master, my data is ready, come and get it" kind of stuff. Is this a good way to do it? Is using I2C alongside SPI the best solution? Is there any other concept that would work better?

 

See schematic below for the way I'm connecting things together.

 

 

Attachment(s): 

LibK - device driver support for flash based microcontrollers: GitHub project

http://oskit.se - join me on my quest to become a better programmer

Last Edited: Tue. Sep 23, 2014 - 11:15 PM

If anyone cares to read, I have put down some notes from the last couple of days researching this: http://oskit.se/developing-ardui...

 

Here is the code that I currently use:

 

On the master for reading memory on the slave:

// reads data from expansion slot peripheral
uint8_t ext_read(uint16_t addr, char *data, uint8_t size){
	uint8_t retry = 10;
	uint8_t rx_count = 0; 
	while(retry--){
		cli();
		
		SSlo;
		_delay_us(2);

		spi_writereadbyte(addr);
		//_delay_us(1);

		spi_writereadbyte(addr >> 8);
		//_delay_us(1);

		spi_writereadbyte(EXT_RD | (size & 0x0f));
		//_delay_us(1);

		// When the slave gets the last address byte, it loads its data register with the
		// previous "next" byte. When it gets the command, it loads it with our status.
		uint8_t stat = spi_writereadbyte(0);
		//_delay_us(5);

		if(stat != E_READY){
			SShi;
			
			sei();
			
			_delay_us(50); 
			continue; 
		}
		
		spi_writereadbyte(0);
		__asm("nop"); __asm("nop"); __asm("nop");
		
		// transfer data
		while(rx_count < size){
			data[rx_count] = spi_writereadbyte(0);
			rx_count++; 
			//_delay_us(1); 
		}
		
		SShi;
		break; 
	}
	_delay_us(2);
	sei();
	
	return rx_count; 
}
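The three header bytes that ext_read() sends (address low, address high, then command OR'ed with size) can be sketched as a pair of pack/unpack helpers that compile on a PC. The value of `EXT_RD_EXAMPLE` is a placeholder, since the real EXT_RD is not shown in the post, and the struct and function names are invented for illustration.

```c
#include <stdint.h>

/* Hypothetical command code: the real EXT_RD value is not given above.
 * The slave decodes the command from the high nibble (cmd & 0xf0). */
#define EXT_RD_EXAMPLE 0x10u

typedef struct {
    uint8_t bytes[3];   /* addr low, addr high, command | size */
} ext_header_t;

/* Pack the header in the same order the master clocks it out. */
ext_header_t ext_make_header(uint16_t addr, uint8_t cmd, uint8_t size)
{
    ext_header_t h;
    h.bytes[0] = (uint8_t)(addr & 0xff);
    h.bytes[1] = (uint8_t)(addr >> 8);
    h.bytes[2] = (uint8_t)((cmd & 0xf0) | (size & 0x0f));
    return h;
}

/* Unpack it the way the slave does: low byte first, then high byte. */
void ext_parse_header(const ext_header_t *h,
                      uint16_t *addr, uint8_t *cmd, uint8_t *size)
{
    *addr = (uint16_t)(h->bytes[0] | ((uint16_t)h->bytes[1] << 8));
    *cmd  = h->bytes[2] & 0xf0;
    *size = h->bytes[2] & 0x0f;
}
```

One thing this makes visible: the size field is only four bits wide, so a request for 16 bytes would be encoded as 0. If 16-byte payloads are intended, the length would have to be carried some other way (for example as size minus one).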

And code used on the slave to either read or write local buffer and send stuff to master:

/// Only enabled in slave mode
ISR(PCINT0_vect){
	DDRB |= _BV(1);  // used for analysis with a scope
	//PORTB |= _BV(1);
	
	uint8_t changed = _prev_pinb ^ SPI_PIN; 
	_prev_pinb = SPI_PIN;

	if(SPI_is_slave && (changed & _BV(SPI_SS))){
		// if ss went low then we have a start of transmission

		MISOin; // does not need to be output before a read instruction
		
		if(!SS){
			uint8_t _rx_count = 0;
			uint16_t _addr = 0;
			uint8_t _command = 0;
			uint8_t _slave_buffer_ptr = 0;
			uint8_t _out_data = 0;
			
			while(!SS){
				uint8_t data; 
				while(!SS && (SPSR & (1<<SPIF)) == 0) data = SPDR;
				// Note that this swap MUST happen right away, before the master starts
				// sending the next byte. The timing is critical!
				SPDR = _out_data; 

				PORTB |= _BV(1);
				if(_rx_count < 2){
					if(_rx_count){
						_addr |= (((uint16_t)data) << 8);
						if((_addr >= _slave_addr) &&
								(_addr < (_slave_addr + _slave_buffer_size))){
									
							_out_data = E_READY; 
							MISOout; 
						}
					}
					else
						_addr |= data;
					
				} else if(_rx_count == 2){
					_command = data;
				} else if(_rx_count > 2){
					uint8_t cmd = _command & 0xf0;
					uint8_t size = _command & 0x0f; 
					if(cmd == EXT_RD && (_slave_buffer_ptr < _slave_buffer_size)){
						_out_data = _slave_buffer[_slave_buffer_ptr++];
					} else if(cmd == EXT_WR && (_slave_buffer_ptr < _slave_buffer_size)){
						_slave_buffer[_slave_buffer_ptr++] = data;
						_out_data = data; // send it back to master for inspection
					}
				}
				_rx_count++;
				PORTB &= ~_BV(1);
			}
		}
		MISOin; // always make sure it is an input when not used
	}
}
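The address-window test in the ISR can also be written as a small pure function that is easy to check on a PC. This is a sketch with a hypothetical name. Unlike the ISR above, which always starts `_slave_buffer_ptr` at 0, it also returns the offset of the address within the window; whether the address is meant to select an offset is an assumption, not something the post states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decide whether `addr` falls inside this slave's window
 * [base, base + size) and, if so, report the offset into the buffer.
 * The unsigned subtraction wraps to a large value when addr < base,
 * so a single comparison covers both bounds without overflow. */
bool slave_addr_to_offset(uint16_t addr, uint16_t base, uint16_t size,
                          uint16_t *offset)
{
    uint16_t rel = (uint16_t)(addr - base);
    if (rel >= size)
        return false;
    *offset = rel;
    return true;
}
```

A slave would only switch MISO to output and answer E_READY when this returns true, and stay silent otherwise so that the addressed slave can drive the shared line.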

 

LibK - device driver support for flash based microcontrollers: GitHub project

http://oskit.se - join me on my quest to become a better programmer

Last Edited: Tue. Sep 23, 2014 - 11:26 PM

16MHz mega328, I run SPI at 8MHz.

That's max, as I recall (1/2 clock rate)

 


Sounds like you want CAN bus.

Or, if you want a higher transfer rate, go to Ethernet. If you use an ENC chip, it has its own RAM buffer that you read at your leisure.

SPI doesn't comply with your requirement of not having multiple signals. CAN does, as it has embedded clocking information like PCIe has.


Koshchi wrote:

It is quite possible.    You just have to do it carefully.

For short bursts. Not for "sustained rates". And SPI is bidirectional. You might be able to send a stream of data, but with most SPI communication you need to read and interpret the response.

Flash chips have burst read .. lots of bytes/sec.