SAMD21 SPI transfer gaps


Hi,

I am trying to make SPI transfers and don't understand why there are such long gaps between individual byte transfers.

CPU frequency is 40 MHz; the SPI (SERCOM0) baud rate is 10,000,000.

I have checked that the SCLK period is 100 ns (correct for 10 MHz) and that the CPU runs at 40 MIPS.

 

I tried code like this:

 

// try to read as fast as possible
((Sercom *)SERCOM0)->SPI.DATA.reg = 0x01;           // start TX (=> also RX)
while (!((Sercom *)SERCOM0)->SPI.INTFLAG.bit.RXC);  // wait until RX completed
k += ((Sercom *)SERCOM0)->SPI.DATA.reg;             // read DATA to clear RXC flag

((Sercom *)SERCOM0)->SPI.DATA.reg = 0x02;           // start TX (=> also RX)
while (!((Sercom *)SERCOM0)->SPI.INTFLAG.bit.RXC);  // wait until RX completed
k += ((Sercom *)SERCOM0)->SPI.DATA.reg;             // read DATA to clear RXC flag

... repeated 10x

 

This writes a byte to the SERCOM0 DATA register, waits for RXC, reads the received byte, and repeats.

 

Why is the gap between individual bytes always about 700 ns long?

I expected a much shorter gap; waiting for RXC should take just a few instruction cycles (25 ns each).

 

Or could anybody show me a different method to make more efficient transfers (without gaps)?

Any ideas welcome.

 

I should note that I also tried the ASF function io_read(): sending 2 bytes takes about 15 us,

whereas I expected the optimum to be under 2 us.

 

-----

I configured the project via Atmel Start; the clock configuration is:

XOSC32K --> FDPLL96M (80 MHz) --> Generic Clock Generator 0 (DIV = 2, 40 MHz) --> CPU, all SERCOMs and all other peripherals

CPU clock prescaler 1, NVM wait states 2.

SERCOM0 is used as SPI master with SCLK at 10 MHz (a 1-byte transfer is 8 x SCLK = 800 ns).
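The figures in this post can be cross-checked with a few lines of C. This is only a back-of-envelope sketch: the name `spi_timing_calc` is made up, and it assumes 8 SCLK periods per byte and one instruction per CPU clock, as stated above.

```c
#include <assert.h>

/* Back-of-envelope SPI timing (hypothetical helper, not from any library). */
typedef struct {
    double cpu_cycle_ns;  /* one CPU clock period */
    double byte_ns;       /* 8 SCLK periods per byte */
    double gap_cycles;    /* measured inter-byte gap expressed in CPU cycles */
    double efficiency;    /* fraction of bus time spent actually shifting data */
} spi_timing;

static spi_timing spi_timing_calc(double cpu_hz, double sclk_hz, double gap_ns)
{
    spi_timing t;
    assert(cpu_hz > 0 && sclk_hz > 0 && gap_ns >= 0);
    t.cpu_cycle_ns = 1e9 / cpu_hz;
    t.byte_ns = 8.0 * 1e9 / sclk_hz;
    t.gap_cycles = gap_ns / t.cpu_cycle_ns;
    t.efficiency = t.byte_ns / (t.byte_ns + gap_ns);
    return t;
}
```

With 40 MHz, 10 MHz and a 700 ns gap this gives a 25 ns CPU cycle, an 800 ns byte, a gap of 28 CPU cycles and a bus efficiency of about 53 %.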

 


Not familiar with the UC3, but on a Mega using the USART as SPI master, TX is buffered, so if you are not interested in the RX data, poll the TX-buffer-empty flag instead of the RXC flag.

You should be able to fill the TX buffer with the next byte while the first one is being shifted out, with little if any gap at all.
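The write-only idea can be sketched on a host against a toy model of the peripheral, translated into the SAMD21 terms used in this thread. Everything here is hypothetical: `toy_spi` stands in for the SERCOM, `dre()` and `txc()` for polling the DRE and TXC (transmit complete) flags, and `write_data()` for a write to DATA. The model assumes one TX buffer feeding one shift register, charges one "step" per register access and 8 steps per byte; these numbers are illustrative, not SAMD21 timings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy host-side model of a single-buffered SPI transmitter (hypothetical,
   not the SAMD21 register map). Each register access costs one step;
   shifting a byte out takes BYTE_STEPS steps. */
#define BYTE_STEPS 8
#define LOG_MAX 64

typedef struct {
    bool tx_full;  uint8_t tx_buf;     /* TX buffer ("DATA" written by the CPU) */
    bool shifting; int left;           /* shift register busy, steps remaining */
    uint8_t log[LOG_MAX]; int logged;  /* bytes that actually went out on MOSI */
    long steps;
} toy_spi;

static void toy_step(toy_spi *s)
{
    s->steps++;
    if (s->shifting && --s->left == 0)
        s->shifting = false;                     /* byte finished */
    if (!s->shifting && s->tx_full) {            /* buffer -> shift register */
        assert(s->logged < LOG_MAX);
        s->log[s->logged++] = s->tx_buf;
        s->tx_full = false;
        s->shifting = true;
        s->left = BYTE_STEPS;
    }
}

static bool dre(toy_spi *s) { toy_step(s); return !s->tx_full; }
static bool txc(toy_spi *s) { toy_step(s); return !s->tx_full && !s->shifting; }

static void write_data(toy_spi *s, uint8_t v)
{
    toy_step(s);
    assert(!s->tx_full);   /* caller must have seen DRE first */
    s->tx_buf = v;
    s->tx_full = true;
}

/* Write-only transfer: refill the TX buffer as soon as DRE is set, then wait
   for TXC so the last byte has really left before chip select is released. */
static void spi_write_only(toy_spi *s, const uint8_t *tx, int n)
{
    for (int i = 0; i < n; i++) {
        while (!dre(s))
            ;
        write_data(s, tx[i]);
    }
    while (!txc(s))
        ;
}
```

In this model 10 bytes take 83 steps, i.e. 8 per byte plus a fixed 3-step overhead: the next byte is always waiting in the buffer when the shifter finishes, so there is no gap between bytes.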

 

Jim

 


Gosh.

Never looked into SAMD. Had not realised it is ARM.

SAMD21 seems to have DMA. Use it.

https://duckduckgo.com/html?q=SA...

 

So your "long" gaps are 700 ns.

Try blinking that fast...

The stuff people complain about these days ... :)

 

I see a bunch of pointer dereferencing and additions to "k" in each line.

How much time does that take?

Look at your listing output.

Paul van der Hoeven.
Bunch of old projects with AVR's:
http://www.hoevendesign.com

Last Edited: Mon. Jan 15, 2018 - 09:00 PM

There are buffers, so wait for Data Register Empty (DRE), then send the next byte before you read the current one:

send #1

wait DRE
send #2
wait RX
read #1

wait DRE
send #3
wait RX
read #2

...

wait DRE
send n
wait RX
read n-1

...

wait DRE
send last
wait RX
read last-1

wait RX
read last

I think this should work without gaps. It needs testing, of course.
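To see why this ordering removes the gaps, both orderings can be run against a toy host-side model of the peripheral. Everything below is hypothetical: `toy_spi` stands in for SERCOM0, `dre()`/`rxc()` for polling the DRE and RXC flags, and `write_data()`/`read_data()` for accesses to DATA. The model assumes a single-buffered transmitter and a 2-deep receiver (as confirmed from the datasheet later in the thread), loops MISO back to MOSI, and charges one step per register access and 8 steps per byte; those numbers are illustrative, not SAMD21 timings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy host-side model (hypothetical, not the SAMD21 register map):
   single-buffered transmitter, 2-deep receive buffer, MISO looped back
   to MOSI. Each register access costs one step; a byte takes BYTE_STEPS. */
#define BYTE_STEPS 8

typedef struct {
    bool tx_full;  uint8_t tx_buf;           /* TX buffer */
    bool shifting; uint8_t shift; int left;  /* shift register */
    uint8_t rx[2]; int rx_n;                 /* double-buffered receiver */
    bool overflow;                           /* a received byte was lost */
    long steps;
} toy_spi;

static void toy_step(toy_spi *s)
{
    s->steps++;
    if (s->shifting && --s->left == 0) {
        s->shifting = false;
        if (s->rx_n < 2) s->rx[s->rx_n++] = s->shift;  /* loopback */
        else s->overflow = true;
    }
    if (!s->shifting && s->tx_full) {
        s->shift = s->tx_buf; s->tx_full = false;
        s->shifting = true; s->left = BYTE_STEPS;
    }
}

static bool dre(toy_spi *s) { toy_step(s); return !s->tx_full; }
static bool rxc(toy_spi *s) { toy_step(s); return s->rx_n > 0; }

static void write_data(toy_spi *s, uint8_t v)
{
    toy_step(s); assert(!s->tx_full);
    s->tx_buf = v; s->tx_full = true;
}

static uint8_t read_data(toy_spi *s)
{
    toy_step(s); assert(s->rx_n > 0);
    uint8_t v = s->rx[0];
    s->rx[0] = s->rx[1]; s->rx_n--;
    return v;
}

/* Original approach: send one byte, wait for RXC, read it, repeat. */
static void xfer_naive(toy_spi *s, const uint8_t *tx, uint8_t *rx, int n)
{
    for (int i = 0; i < n; i++) {
        write_data(s, tx[i]);
        while (!rxc(s)) ;
        rx[i] = read_data(s);
    }
}

/* El Tangas's order: queue byte i, then collect byte i-1 while byte i shifts. */
static void xfer_pipelined(toy_spi *s, const uint8_t *tx, uint8_t *rx, int n)
{
    if (n <= 0) return;
    write_data(s, tx[0]);
    for (int i = 1; i < n; i++) {
        while (!dre(s)) ;
        write_data(s, tx[i]);
        while (!rxc(s)) ;
        rx[i - 1] = read_data(s);
    }
    while (!rxc(s)) ;
    rx[n - 1] = read_data(s);
}
```

In this model the one-byte-at-a-time loop needs 110 steps for 10 bytes and the pipelined one 83 (8 per byte plus a fixed 3): the shift register never sits idle between bytes.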


Thanks Jim,

you are right: it is possible to write to DATA, then wait for DRE (Data Register Empty) and write the next byte to DATA.

It works without any gaps, so it is fine for sending data out when no reading is needed.

But I need to read.

 


Thanks El Tangas,

this method works. I tried an 8-byte transfer and measured gaps of 0, 300 ns, 200 ns, 100 ns, 300 ns, 200 ns and 200 ns.

The problem is that it assumes you are always able to read the received byte RX[i-1] before TX[i] completes.

If reading is delayed, you would lose RX[i-1] because it is overwritten by RX[i].

So it would work correctly only if interrupts were disabled for the whole transfer.

I make long SPI transfers and also use the USB middleware, so I'd rather not play with interrupts.

 


You can work exactly the same way. Just keep track of the reads.
In other words: write if DRE, read if RXC. When you have sent all the TX bytes, wait for the last RXC.

Any SPI code that writes one TX byte and then waits for RXC will have gaps. But it is a simple strategy, used e.g. by Arduino Zero. Just inefficient.
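The flag-driven version can be sketched against the same kind of toy model (hypothetical names again: one step per register access, 8 steps per byte, a 2-deep receive buffer, MISO looped back to MOSI). The cap of two bytes in flight is an added assumption rather than something stated in the post: it stops the loop from queueing a third byte that a 2-deep receive buffer could not hold if the loop were delayed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy host-side model (hypothetical): single-buffered transmitter,
   2-deep receiver, loopback MISO = MOSI, one step per register access. */
#define BYTE_STEPS 8

typedef struct {
    bool tx_full;  uint8_t tx_buf;
    bool shifting; uint8_t shift; int left;
    uint8_t rx[2]; int rx_n;
    bool overflow;
    long steps;
} toy_spi;

static void toy_step(toy_spi *s)
{
    s->steps++;
    if (s->shifting && --s->left == 0) {
        s->shifting = false;
        if (s->rx_n < 2) s->rx[s->rx_n++] = s->shift;
        else s->overflow = true;
    }
    if (!s->shifting && s->tx_full) {
        s->shift = s->tx_buf; s->tx_full = false;
        s->shifting = true; s->left = BYTE_STEPS;
    }
}

static bool dre(toy_spi *s) { toy_step(s); return !s->tx_full; }
static bool rxc(toy_spi *s) { toy_step(s); return s->rx_n > 0; }

static void write_data(toy_spi *s, uint8_t v)
{
    toy_step(s); assert(!s->tx_full);
    s->tx_buf = v; s->tx_full = true;
}

static uint8_t read_data(toy_spi *s)
{
    toy_step(s); assert(s->rx_n > 0);
    uint8_t v = s->rx[0];
    s->rx[0] = s->rx[1]; s->rx_n--;
    return v;
}

/* "Write if DRE, read if RXC", with at most 2 bytes written but not yet
   read, so the 2-deep receive buffer can never overflow. */
static void xfer_flags(toy_spi *s, const uint8_t *tx, uint8_t *rx, int n)
{
    int sent = 0, recvd = 0;
    while (recvd < n) {
        if (sent < n && sent - recvd < 2 && dre(s))
            write_data(s, tx[sent++]);
        if (rxc(s))
            rx[recvd++] = read_data(s);
    }
}
```

Because reads and writes are decoupled, each byte is queued while the previous one is still shifting, so this loop is as gap-free in the model as the fixed sequence above.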


Thanks Paul,

I am afraid DMA is really the only clean solution, and I am going to try it.

 

The pointer dereferences are there because I wanted to use the provided constants and data structures to access the registers directly.

I have checked that the compiler generates exactly the same code for these two versions:

((Sercom *)SERCOM0)->SPI.DATA.reg = 0x01;

*(uint16_t *)((uint32_t)SERCOM0 + 0x28) = 0x01;

 

A small snippet of the generated code (write #2 to DATA, wait for RXC, read DATA; then the same for #3):

 

00004C16   movs r2, #2
00004C18   str r2, [r3, #40]
00004C1A   movs r2, r3
00004C1C   ldrb r3, [r2, #24]
00004C1E   lsls r3, r3, #29
00004C20   bpl #-8
00004C22   ldr r3, [pc, #424]
00004C24   ldr r2, [r3, #40]
00004C26   adds r1, r1, r2
00004C28   uxth r1, r1

00004C2A   movs r2, #3
00004C2C   str r2, [r3, #40]
00004C2E   movs r2, r3
00004C30   ldrb r3, [r2, #24]
00004C32   lsls r3, r3, #29
00004C34   bpl #-8
00004C36   ldr r3, [pc, #404]
00004C38   ldr r2, [r3, #40]
00004C3A   adds r1, r1, r2
00004C3C   uxth r1, r1

00004C3E   movs r2, #4
...

 

I guess the only point where there should be some cache penalty is the "bpl" instruction (branch if plus); otherwise each instruction should take just 25 ns (1 cycle).

A 700 ns gap corresponds to about 28 CPU cycles. I don't see that many instructions here.

I still don't understand why RXC is set so late after the transfer completes.
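For reference, the `lsls r3, r3, #29` / `bpl` pair in the listing is how the compiler tests a single INTFLAG bit (read by `ldrb r3, [r2, #24]`): shifting left by 29 moves bit 2, the bit the C source tests as RXC, into bit 31, `lsls` copies bit 31 into the N flag, and `bpl` (branch if N is clear) loops back while that bit is still 0. The sketch below, with made-up function names, checks that this is equivalent to an ordinary mask test for every byte value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit index implied by the listing: "lsls #29" targets bit 2 (RXC). */
#define RXC_BIT 2u
static_assert(RXC_BIT + 29u == 31u, "lsls #29 moves the RXC bit into bit 31");

/* Mirrors "lsls r3, r3, #29; bpl": after the shift, bit 31 (copied into
   the N flag) is the old RXC bit; bpl keeps looping while it is 0. */
static bool rxc_via_shift(uint8_t intflag)
{
    return (((uint32_t)intflag << 29) & 0x80000000u) != 0;
}

/* The same test written the usual way, as a mask on INTFLAG. */
static bool rxc_via_mask(uint8_t intflag)
{
    return (intflag & (1u << RXC_BIT)) != 0;
}
```

So the polling loop itself is only three instructions; whatever makes up the 28 cycles is not visible as extra instructions in this listing.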

 


david.prentice wrote:
You can work exactly the same way. Just keep track of the reads. In other words: write if DRE, read if RXC. When you have sent all the TX bytes, wait for the last RXC. Any SPI code that writes one TX byte and then waits for RXC will have gaps. But it is a simple strategy, used e.g. by Arduino Zero. Just inefficient.

 

Hi David,

are you saying that El Tangas's idea is correct and should work?

send #1

wait DRE
send #2
-------------<<< here
wait RX
read #1

What if an interrupt occurs "here"?

I suppose TX of #2 would then have enough time to complete, and DATA would contain the byte received during TX #2. So you would lose the data received during TX #1.

 

Last Edited: Tue. Jan 16, 2018 - 08:44 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Yes, El Tangas shows the strategy.

There are several typical uses of SPI:
1. Write N bytes.
2. Read N bytes.
3. Write N bytes from a buffer and read N bytes into a buffer.

If N is large, DMA is worth using.
Small N can use a simple loop.

Personally, I would have designed a universal int spi_transfer(void *txbuf, void *rxbuf, long N).
If txbuf is NULL, you just write(0). If rxbuf is NULL, you just ignore the read().
The SPI class would provide convenient methods that end up using this universal primitive.
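A minimal sketch of such a universal primitive, assuming a hypothetical byte-exchange callback `spi_xchg_fn` in place of real SERCOM access (on hardware it would be one of the DRE/RXC loops discussed in this thread, or DMA). The extra callback parameter, the `const` on `txbuf` and the 0 / -1 return convention are choices of this sketch, not part of the description above; `fake_xchg` is a stand-in used only to exercise the code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-level primitive: shift one byte out, return the byte
   clocked in at the same time. */
typedef uint8_t (*spi_xchg_fn)(uint8_t out);

/* Universal transfer: txbuf == NULL sends 0x00 filler bytes,
   rxbuf == NULL discards whatever comes back. Returns 0, or -1 if n < 0. */
static int spi_transfer(spi_xchg_fn xchg, const void *txbuf, void *rxbuf, long n)
{
    const uint8_t *tx = txbuf;
    uint8_t *rx = rxbuf;
    assert(xchg != NULL);
    if (n < 0)
        return -1;
    for (long i = 0; i < n; i++) {
        uint8_t in = xchg(tx ? tx[i] : 0x00);
        if (rx)
            rx[i] = in;
    }
    return 0;
}

/* Convenience wrappers in the spirit of the SPI class methods. */
static int spi_write(spi_xchg_fn x, const void *buf, long n) { return spi_transfer(x, buf, NULL, n); }
static int spi_read(spi_xchg_fn x, void *buf, long n)        { return spi_transfer(x, NULL, buf, n); }

/* Stand-in for the hardware: returns the inverted byte and counts calls. */
static long xchg_calls;
static uint8_t fake_xchg(uint8_t out) { xchg_calls++; return (uint8_t)~out; }
```

All three use cases (write only, read only, write and read) then go through one code path, so any speed-up such as pipelining or DMA only has to be written once.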
Blitting a TFT screen with a single colour involves up to 307200 SPI writes.
Blitting a TFT screen with a photo involves many writes from a buffer, 307200 in total.
Writing a single pixel involves 13 SPI bytes.
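To put the 700 ns gap from the start of the thread into perspective for a full-screen blit, here is a back-of-envelope estimate, assuming each of the 307200 writes is a single byte at 10 MHz (8 SCLK per byte); `spi_bus_ms` is a made-up helper name.

```c
#include <assert.h>

/* Bus time in ms for `bytes` one-byte SPI transfers at sclk_hz, with a fixed
   idle gap (in ns) after every byte. Assumes 8 SCLK periods per byte. */
static double spi_bus_ms(long bytes, double sclk_hz, double gap_ns)
{
    assert(bytes >= 0 && sclk_hz > 0 && gap_ns >= 0);
    double byte_ns = 8.0 * 1e9 / sclk_hz;
    return (double)bytes * (byte_ns + gap_ns) / 1e6;
}
```

For 307200 bytes this gives about 245.8 ms with no gaps and about 460.8 ms with a 700 ns gap after every byte.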
David.


Uprom wrote:

 

What if an interrupt occurs "here"?

I suppose TX of #2 would then have enough time to complete, and DATA would contain the byte received during TX #2. So you would lose the data received during TX #1.

 

 

I think it's OK because RX is double-buffered (according to the datasheet), so it can hold 2 data bytes. But you can test it.
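The scenario in question can be checked against a toy model with a 2-deep receive buffer. All names are hypothetical; the model charges one step per register access and 8 steps per byte, loops MISO back to MOSI, and uses a long run of idle steps to stand in for an interrupt. With two bytes outstanding the stall is harmless; only if a third byte had already been queued before the stall would the receiver overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy host-side model (hypothetical): single-buffered transmitter,
   2-deep receiver, loopback MISO = MOSI, one step per register access. */
#define BYTE_STEPS 8

typedef struct {
    bool tx_full;  uint8_t tx_buf;
    bool shifting; uint8_t shift; int left;
    uint8_t rx[2]; int rx_n;
    bool overflow;                 /* a received byte had nowhere to go */
} toy_spi;

static void toy_step(toy_spi *s)
{
    if (s->shifting && --s->left == 0) {
        s->shifting = false;
        if (s->rx_n < 2) s->rx[s->rx_n++] = s->shift;
        else s->overflow = true;
    }
    if (!s->shifting && s->tx_full) {
        s->shift = s->tx_buf; s->tx_full = false;
        s->shifting = true; s->left = BYTE_STEPS;
    }
}

static bool dre(toy_spi *s) { toy_step(s); return !s->tx_full; }
static bool rxc(toy_spi *s) { toy_step(s); return s->rx_n > 0; }

static void write_data(toy_spi *s, uint8_t v)
{
    toy_step(s); assert(!s->tx_full);
    s->tx_buf = v; s->tx_full = true;
}

static uint8_t read_data(toy_spi *s)
{
    toy_step(s); assert(s->rx_n > 0);
    uint8_t v = s->rx[0];
    s->rx[0] = s->rx[1]; s->rx_n--;
    return v;
}

/* send #1, wait DRE, send #2 (optionally also #3), then an "interrupt"
   stalls the CPU for `stall` steps before both reads.
   Returns true if no received byte was lost. */
static bool stalled_transfer(toy_spi *s, bool queue_third, long stall, uint8_t out[2])
{
    write_data(s, 0x01);
    while (!dre(s)) ;
    write_data(s, 0x02);
    if (queue_third) {
        while (!dre(s)) ;
        write_data(s, 0x03);
    }
    for (long i = 0; i < stall; i++)
        toy_step(s);               /* CPU busy in an ISR, peripheral keeps going */
    while (!rxc(s)) ;
    out[0] = read_data(s);
    while (!rxc(s)) ;
    out[1] = read_data(s);
    return !s->overflow;
}
```

In the model, the sequence from the question survives an arbitrarily long stall and reads back 0x01 then 0x02, which matches the double-buffered receiver argument; queueing a third byte before the stall is what would lose data.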


You are right; according to the datasheet: "Single-buffered transmitter, double-buffered receiver".

Then your procedure is OK.

 

I can't remember why I thought the reverse, "Double-buffered transmitter, single-buffered receiver"...

 

Thanks again.