TUTORIAL - optimize SPI transfer speed on AVR (e.g. MEGA328P) in C/C++


 - EDIT - Don't use this code, it's buggy. I will post an update when it is fixed.

 

Hello there,

the AVR world is old, but I couldn't find a better way to speed up the AVR's SPI, so I worked one out myself and want to share it with you.

 

Usually (in the AVR datasheets as well) you will find SPI transfer/write code like this:

void SPI::write(uint8_t data)
{
    SPDR = data;                // SPI Data Register
    while(!(SPSR & 0x80));      // SPI Interrupt Flag  -EDIT: corrected this line
    dummy = SPDR;               // clear IF & WCOL
}

BUT this function wastes a LOT of time, and even more if your SPI clock is low. Here is why:

The first thing we do is put new data into the SPI data register. The SPI hardware then starts the transmission automatically.

The next step is... to WAIT. Busy-waiting in code is always a problem, because the CPU can do nothing else in the meantime.

 

Let's assume your AVR runs at 16 MHz. Then a single-cycle instruction takes 62.5 ns; keep this number in mind for now.

 

If your hardware setup is somehow limited to e.g. 200 kHz (transmission over long cables, whatever), your clock period is 5 µs. So it takes 5 µs * 8 bits = 40 µs for the SPI to shift out all the bits. (Let's ignore the time spent in the while() loop itself for simplicity.)

That means we are blocking the CPU for 40 µs / 0.0625 µs, a total of 640 single-cycle instructions! Wow, that's a big number.

As we increase the SPI clock the problem gets smaller, but it is still annoying.

For an SPI clock of 1 MHz we still waste 128 cycles.

For an SPI clock of 8 MHz we still waste 16 cycles. (That is the fastest possible clock for an ATmega328P at 16 MHz; here we can add some cycles for the while() loop.)
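
The numbers above can be reproduced with a small host-side helper. This is only a sketch: the name cyclesPerSpiByte and its parameter names are made up for illustration, and like the estimate above it ignores the overhead of the polling loop.

```cpp
#include <cassert>
#include <cstdint>

// CPU cycles that pass while one byte (8 bits) is shifted out over SPI.
// f_cpu_hz and f_spi_hz are illustrative names, not taken from any AVR header.
inline std::uint32_t cyclesPerSpiByte(std::uint32_t f_cpu_hz, std::uint32_t f_spi_hz)
{
    // The AVR derives the SPI clock by dividing the CPU clock by an integer,
    // so the ratio is expected to be a whole number.
    assert(f_spi_hz > 0 && f_cpu_hz % f_spi_hz == 0);
    return (f_cpu_hz / f_spi_hz) * 8u;
}
```

With a 16 MHz CPU clock, cyclesPerSpiByte(16000000, 200000) gives 640, cyclesPerSpiByte(16000000, 1000000) gives 128 and cyclesPerSpiByte(16000000, 8000000) gives 16, matching the figures in the post.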

 

So how can we work around this problem?

When I switched from PIC16F to AVR I realized both have strengths and weaknesses. To me, one weakness of the AVR is this SPI hardware controller.

The difference? The AVR sets a flag when a data transmission is complete. The PIC sets a flag while a data transmission is in progress.

"Hey, there is no logic problem with this", you might say. That's what I thought too:

Let's just flip around some lines in the code:

void SPI::write(uint8_t data)
{
    while(!(SPSR & 0x80));      // SPI Interrupt Flag  -EDIT: corrected this line
    dummy = SPDR;               // clear IF & WCOL
    SPDR = data;                // SPI Data Register
}

Well, this WON'T work: before the first transmission SPIF has never been set, so we end up waiting forever...

 

Ok, but then let's try this:

void SPI::write(uint8_t data)
{
    if (transmissionInProgress) // this var is set to 0 when SPI.init()
    {
        while(!(SPSR & 0x80));      // SPI Interrupt Flag
        dummy = SPDR;               // clear IF & WCOL
        transmissionInProgress = false;
    }
    SPDR = data;                    // SPI Data Register
    transmissionInProgress = true;  // should be AFTER SPDR = data; to save time
                                    // at low SPI clock (peanuts, but fastest)
}

The logic:

FIRST send the data out, leave the function, do some other stuff, and when transmitting the next byte, check whether the last transmission has completed.

This code doesn't work either :(( - somehow the while() loop ends up looping forever. That means the SPI interrupt flag is somehow reset by hardware although SPI interrupts are disabled. THAT'S STRANGE - I haven't figured out why.

 

Here's why the PIC16F is better at this. You can just write:

void SPI::write(uint8_t data)
{
    while (SSP0_STAT & BF_BIT); // wait as long as a transmission is in progress - also works for the very first transmission
                                // BF = Buffer Full
                                // there is no need to reset or set BF, it is
                                // controlled by hardware
    SSP0_BUF = data; // put data into the buffer, the transmission starts automatically, BF will be SET by hardware
}

 

But is there a workaround for AVR as well?? YES folks, there is!! =)=)=) (but dirty^^)

With the following code you can be much faster (I tried it e.g. with an ST7735 display at 8 MHz SPI clock; it seems to be twice as fast as the datasheet-recommended code)

void SPI::write(uint8_t data)
{
    do
    {
        SPDR = data;
    } while (WCOL_BIT);
    SPI::dummy = SPSR; // clear IF & WCOL - static var is faster
    //SPI::dummy = SPDR; // works as well
}

When you try to write data into the SPDR register while a transmission is in progress, the WCOL (write collision) flag is set, so it is a (dirty) way to detect an ongoing transmission.

Although the datasheet says otherwise, it does not seem necessary to reset WCOL "by hand" with "dummy = SPDR". If I do it as the datasheet says ("The WCOL bit (and the SPIF bit) are cleared by first reading the SPI status register with WCOL set, and then accessing the SPI data register."), the code doesn't work.

 

Here is the assembly code, compiled in Atmel Studio 7 for the ATmega328P with optimization -Os:

00000090 <_ZN3SPI5writeEh.isra.2>:
  90:	8e bd       	out	    0x2e, r24	; 46 write data to SPI data reg
  92:	0d b4       	in	    r0, 0x2d	; 45 read SPI status reg
  94:	06 fc       	sbrc	    r0, 6       ;    WCOL == 0?
  96:	fc cf       	rjmp	    .-8      	; 0x90 <_ZN3SPI5writeEh.isra.2>
  98:	8d b5       	in	    r24, 0x2d	; 45 read SPI status reg
  9a:	80 93 10 01 	sts	    0x0110, r24	; !MUST! be stored somewhere, otherwise it won't work
  9e:	08 95       	ret

 

!!! ATTENTION !!! - If you want to write and read at the same time (e.g. uint8_t write(uint8_t data)), you have to use the code shown first; otherwise it gets slow again.

 

Have fun :)

Last Edited: Fri. Nov 26, 2021 - 04:38 PM

PeterPan1337 wrote:
I tried it e.g. with an ST7735 display at 8 MHz SPI clock; it seems to be twice as fast as the datasheet-recommended code

There will be gaps in that transmission due to the waiting. To get a gapless SPI transmission, use the USART in SPI mode, because it is buffered.


Or, if the cycles in the blocking loop bother you, use an interrupt (apart from the overhead of getting into and out of an ISR!!)


OK, very strange things are happening here... it really seems to me to be a bug in the AVR. I don't know what else it could be; correct me if I am wrong.

The "solution" mentioned above is none. The code doesn't work as expected. The problem also isn't the SPIF bit; it does get set. It seems that it is forbidden to reuse the register that the SPI output buffer was loaded from.

The config is: 16 MHz CPU clock, 8 MHz SPI clock.

 

 

The following code produces no errors and works:

void write(uint8_t data)
{
    //while (!(SPIF_BIT)); // not needed, as the "nops" are long enough to shift all bits out
    SPDR = data;
    asm volatile ("nop \n\t");
    asm volatile ("nop \n\t");
    asm volatile ("nop \n\t");
    asm volatile ("nop \n\t");
    asm volatile ("nop \n\t");
    asm volatile ("nop \n\t");
}

corresponding assembly:

00000090 <_ZN3SPI5writeEh.isra.2>:
  90:    8e bd      out    0x2e, r24    ; 46 register r24 is the passed "data" variable
    ... ;<--- NOPs here
  9e:    08 95      ret

So, if we just need to wait, let's move the NOPs in front of the write and see what happens:

void write(uint8_t data)
{
    //while (!(SPIF_BIT)); // not needed, as the "nops" are long enough to shift all bits out
    asm volatile ("nop \n\t");
    asm volatile ("nop \n\t");
    asm volatile ("nop \n\t");
    asm volatile ("nop \n\t");
    asm volatile ("nop \n\t");
    asm volatile ("nop \n\t");
    SPDR = data;
}

corresponding assembly:

00000090 <_ZN3SPI5writeEh.isra.2>:
	...<--- NOPs here
  9c:	8e bd       	out	0x2e, r24	; 46
  9e:	08 95       	ret

This code doesn't work at all! A transmission is going on, as an LED on the clock pin indicates, but the display isn't driven correctly.

Here is part of one function calling write():

b6:	80 e0       	ldi	r24, 0x00	; 0
b8:	0e 94 48 00 	call	0x90	; 0x90 <_ZN3SPI5writeEh.isra.2>
bc:	80 e0       	ldi	r24, 0x00	; 0
be:	0e 94 48 00 	call	0x90	; 0x90 <_ZN3SPI5writeEh.isra.2>
c2:	80 e0       	ldi	r24, 0x00	; 0
c4:	0e 94 48 00 	call	0x90	; 0x90 <_ZN3SPI5writeEh.isra.2>

So it seems that when r24 has been written to SPDR and r24 is then reloaded while the SPI shift is in progress, the data transmission gets corrupted, so there is no real load into SPDR?! If that's true, well....

If I decrease the SPI clock, I have to add more NOPs to make it work again...

There are NO interrupt routines running, not a single one.

 

Any ideas?

Last Edited: Thu. Nov 25, 2021 - 09:26 PM

As others said, if you want to push it, use the USART in SPI mode, because it's buffered and easy to deal with.

Next, why don't you use this code:

void SPI_MasterTransmit(char cData)
{
    /* Start transmission */
    SPDR = cData;
    /* Wait for transmission complete */
    while(!(SPSR & (1<<SPIF)))
        ;
}

It's from the datasheet.

 


OK, it was all my fault... second post on this forum, second piece of trash... I feel like I'm trolling myself.

There was just another function that needs the SPI to be flushed first... that's all.

 

void SPI::write(uint8_t data)
{
    while(!(SPSR & 0x80));      // SPI Interrupt Flag  -EDIT: corrected this line
    SPDR = data;                // SPI Data Register
}

This works just fine...

This post can be deleted, please, as it was intended as a tutorial. Really sorry.


Hi sparrow2,

 

sparrow2 wrote:

 

As others said, if you want to push it, use the USART in SPI mode, because it's buffered and easy to deal with.

 

I don't want to use the USART. I wanted this tutorial to be about the SPI hardware, not about the fastest possible SPI transmission with the USART.

You are right, the buffered USART can be faster, but only if your code is too ;-)

Also, I can only use the USART for this if I don't need it for something else. If I need both SPI and USART, I want to get the fastest possible speed out of my SPI hardware.

 

 

sparrow2 wrote:

 

Next, why don't you use this code:

[...]

It's from the datasheet.

As I wrote, this code is too slow. While waiting for all bits to be shifted out, you cannot run other code.

Read the first post of this thread again; it is explained there.

The only reason to use the datasheet's recommended code is e.g. this: before deasserting the CS pin, you need to wait until all bits are sent, otherwise your slave will not receive the whole byte. But if you don't need that often, you could simply write a flush() function for it.
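
The "start now, wait later, flush() when needed" idea can be tried out on a PC before touching hardware. The following is only a sketch under assumptions: FakeSpiHw and DeferredSpi are hypothetical names, and FakeSpiHw merely pretends that a transfer has always finished by the time it is polled. On a real ATmega328P, start() would presumably be "SPDR = data;" and done() would be "SPSR & (1 << SPIF)".

```cpp
#include <cstdint>

// Hypothetical host-side stand-in for the SPI peripheral (not real AVR code).
struct FakeSpiHw {
    std::uint8_t last = 0;  // last byte handed to the "peripheral"
    int polls = 0;          // how often the completion flag was polled

    void start(std::uint8_t data) { last = data; }
    bool done() { ++polls; return true; }  // model: transfer is finished when we look
};

// Deferred-wait writer: the wait for byte N happens just before byte N+1 is started.
template <typename Hw>
class DeferredSpi {
public:
    explicit DeferredSpi(Hw& hw) : hw_(hw) {}

    // Wait for the previous byte (if any), start the next one, return immediately.
    void write(std::uint8_t data)
    {
        flush();
        hw_.start(data);
        pending_ = true;
    }

    // Block until the last started byte is out, e.g. before deasserting CS.
    void flush()
    {
        if (pending_) {
            while (!hw_.done()) { }
            pending_ = false;
        }
    }

    bool pending() const { return pending_; }

private:
    Hw& hw_;
    bool pending_ = false;  // true while a started byte has not been waited for
};
```

The software flag only stays correct if every access to the SPI goes through the same object. Any other function that touches the SPI has to call flush() first, which matches the fault that was found in this thread.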

 

 

Once all bugs are fixed, I'll edit the first post of mine.

 


PeterPan1337 wrote:
i'll edit the first post of mine.

please don't do that!

 

Existing replies relate to that post as it stands - so editing the original post will make a nonsense of the replies.

 

Add your updated code as another reply.


That means we are blocking the CPU for 40 µs / 0.0625 µs, a total of 640 single-cycle instructions! Wow, that's a big number.

Not sure what problem you are trying to solve. If the SPI clock is turned way down (maybe writing to a fish tank or toaster or something slow you don't care much about), then only check the flag occasionally: put the check/send inside a large code block (such as at the end of the block). By the time it gets checked after the previous send, the SPI will probably be ready for the next send (or will definitely be ready on the next block check), so you are sending slowly and not wasting much time on the checking. If you already have a timer (such as in an IRQ) doing things and it is slow enough, you might always assume the send is done, since it will always be ready by the next timer tick/IRQ; then NO checking is used (however, this can be risky, such as when the program gets changed later by someone).
