ATMEGA4809 VPort for SS


Hi Guys,

 

I have been playing with some ideas to speed up SPI communication on the ATMega4809. As a test setup I have SPI connected to an ST7735 display module.

 

So one of the first steps I took was to avoid polling the transfer complete flag and just wait it out with NOPs before sending the next byte. This method is nicely explained by Nick Gammon (http://www.gammon.com.au/spi).

 

The second step was to use buffer mode, which let me load 2 bytes at a time. This worked great for transferring 16-bit color information to the display without a delay between the two bytes. So another win!

 

The third idea was to use a virtual I/O port for chip select. The idea being that it should be faster and win me some time between transfers. But this is not working as I expected. Whenever I replace SS (PORTA PIN7) with its mapped virtual port I get no response from the display. I can see on my oscilloscope that the SS pin goes low and high as expected, but there is no clock signal. This led me to believe that the SPI peripheral is stuck somewhere between host and client mode. I also tried setting SSD in CTRLB but it made no difference. Finally, I configured the normal PORTA PIN7 as an output and explicitly set SPI to host (SPI0.CTRLA |= SPI_MASTER_bm) before every transfer. Then it worked, but with the additional register update (setting to host) I don't really gain any performance.

 

Any ideas why I can't just replace the SS pin with its mapped VPORT?

And a second question: if it worked, how much time would I gain?

 

Here is the code (without the additional setting to master mode):

void SPI_Init(void)
{
    // Configure pins
    PORTA.DIRCLR = PIN5_bm;                          // MISO as input
    PORTA.DIRSET = PIN4_bm;                          // MOSI as output
    PORTA.DIRSET = PIN6_bm;                          // CLK as output
    //PORTA.DIRSET = PIN7_bm;                          // NSS as output ! Have to un-comment for VPort to work !
    VPORTA.DIR |= PIN7_bm;

    //PORTA.OUTSET = PIN7_bm;                          // Take NSS high
    VPORTA.OUT |= PIN7_bm;
}

void SPI_Begin(struct spi_t *instance)
{
    SPI0.CTRLB = (instance->mode & 0x03) | SPI_SSD_bm | SPI_BUFEN_bm;

    SPI0.CTRLA = (instance->dataOrder << SPI_DORD_bp) | 
            SPI_MASTER_bm | 
            SPI_CLK2X_bm | 
            instance->clockDivider;

    // Enable SPI
    SPI0.CTRLA |= SPI_ENABLE_bm;
}

uint8_t SPI_Transfer(struct spi_t *instance, uint8_t dataByte)
{
    // SPI0.CTRLA |= SPI_MASTER_bm;    // This works but gives no performance gain
    // Pull CS low
    //PORTA.OUTCLR = PIN7_bm;
    VPORTA.OUT &= ~PIN7_bm;

    // Load byte
    SPI0.DATA = dataByte;
    
    // Wait 16 cycles, faster than polling TXCIF
    asm volatile("nop    \n\t");
    asm volatile("nop    \n\t");
    asm volatile("nop    \n\t");
    asm volatile("nop    \n\t");
    asm volatile("nop    \n\t");
    asm volatile("nop    \n\t");
    asm volatile("nop    \n\t");
    asm volatile("nop    \n\t");
    asm volatile("nop    \n\t");
    asm volatile("nop    \n\t");
    asm volatile("nop    \n\t");
    asm volatile("nop    \n\t");
    asm volatile("nop    \n\t");
    asm volatile("nop    \n\t");
    asm volatile("nop    \n\t");
    asm volatile("nop    \n\t");

    // Clear transfer complete flag
    SPI0.INTFLAGS |= SPI_TXCIF_bm;

    // Release CS
    //PORTA.OUTSET = PIN7_bm;
    VPORTA.OUT |= PIN7_bm;

    // Return byte received
    return SPI0.DATA;
}

 


VPORT is quicker than PORT.  Both work.

 

Look at the SPI pins with a Logic Analyser.   If you wiggle the CS line before the SPI transfer has completed the SPI byte will be rejected by the ST7735.

 

Either use buffered mode with proper monitoring of the status bits.  Or wait the required number of NOPs.

 

You will get the best performance from your ST7735 by using the fastest SPI clock e.g. 10MHz with a 20MHz 4809.

And, most importantly, by using the appropriate ST7735 commands.

 

I have not read the Nick Gammon article.   With a legacy Mega SPI you can't write to SPI in less than 18 cycles.   However hard you try.

With legacy USART_MSPI or with a modern buffered SPI you should manage 16 cycles i.e. no gaps.
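
To put numbers on the 10MHz and 16-cycle figures, here is a rough host-side sketch. It assumes the SPI clock is the CPU clock divided by the prescaler and doubled when CLK2X is set; the helper names are made up for illustration and are not part of any Microchip header.

```c
#include <assert.h>
#include <stdint.h>

/* SCK frequency in Hz, assuming SCK = f_cpu / prescaler, doubled by CLK2X. */
static uint32_t spi_sck_hz(uint32_t f_cpu_hz, uint32_t prescaler, int clk2x)
{
    assert(prescaler != 0);
    return (clk2x ? 2u * f_cpu_hz : f_cpu_hz) / prescaler;
}

/* CPU cycles needed to shift out one 8-bit byte with no gaps. */
static uint32_t spi_cycles_per_byte(uint32_t prescaler, int clk2x)
{
    assert(prescaler != 0);
    uint32_t cycles = 8u * prescaler;   /* 8 bits, one SCK period each */
    return clk2x ? cycles / 2u : cycles;
}
```

With a 20MHz clock, prescaler 4 and CLK2X set, spi_sck_hz() gives 10MHz and spi_cycles_per_byte() gives 16, which is where the "16 cycles i.e. no gaps" target comes from.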

 

David.

 


david.prentice wrote:
... or with a modern buffered SPI you should manage 16 cycles i.e. no gaps.
Most XMEGA USART have DMA plus master SPI mode; likely moot due to XMEGA shortage.

FBAUD < FPER / 2

FPER <= 32 MHz

 

XMEGA AU Manual

[page 293]

23.14 DMA Support

DMA support is available on UART, USRT, and master SPI mode peripherals. ...


XMEGA Lead Time, Dec'20 | AVR Freaks

 

"Dare to be naïve." - Buckminster Fuller


Thanks for the insights!

 

david.prentice wrote:

VPORT is quicker than PORT.  Both work.

 

Look at the SPI pins with a Logic Analyser.   If you wiggle the CS line before the SPI transfer has completed the SPI byte will be rejected by the ST7735.

For me it's not working with VPORT. I did check with my oscilloscope, and although the CS line toggles, no data is transferred from the 4809 to the ST7735 (so it's not that the ST7735 rejects it; there is no data). Hence my assumption that something internal switches the SPI to client mode when I am using VPORT. In the meantime I also found this (section 2.2.1). It's not directly my problem, but maybe there are some additional silicon issues there.

 

david.prentice wrote:
Either use buffered mode with proper monitoring of the status bits.  Or wait the required number of NOPs.

That's what I have done already (in a separate function not shown in my previously posted code) and it shaved about 1 µs off each 16-bit transmission.

 

 

    //PORTA.DIRSET = PIN7_bm;                          // NSS as output ! Have to un-comment for VPort to work !
    VPORTA.DIR |= PIN7_bm;

    //PORTA.OUTSET = PIN7_bm;                          // Take NSS high
    VPORTA.OUT |= PIN7_bm;

What happens if you use PORTA.DIRSET to set the direction, and VPORTA.OUT high/low for SS?

 


I can assure you that both VPORT.DIR and PORTA.OUTSET will do the same thing.

 

You can try it in the Simulator.


westfw wrote:
What happens if you use PORTA.DIRSET to set the direction, and VPORTA.OUT high/low for SS?

Doing this I can then see some activity on the clock, data and CS lines but the display is blinking and does not seem happy.

While configured like this I also tried again to force master mode every time just before pulling CS low, and then it works. So it seems that using PORTA.DIRSET combined with VPORTA helps, but the SPI is kind of floating in and out of master mode.

Just to confirm it, I tried again using only VPORTA for setting the pin direction and pulling it high/low and then there was no clock or data activity from the SPI.

 

david.prentice wrote:
I can assure you that both VPORT.DIR and PORTA.OUTSET will do the same thing.

Normally I would also assume so, but something is different in this case.

 


swanepoeljan wrote:
... but the display is blinking and does not seem happy.

...

... but the SPI is kind of floating in and out of master mode.

Excessive ground bounce and/or overshoot will inject current through the ESD suppressors into the die's substrate, which may cause anomalies (worst case is CMOS latch-up). A bit of series termination will reduce the effect, especially with cables that have excessive impedance (it gives a closer impedance match between the AVR and the cable).

AVRxt has slew-rate limiting.

 

ATmega4808/4809 Data Sheet (search for SRL)

ESD and Transients | AVR040: EMC Design Considerations

...

It can be done by large series resistors, but that is not always an option. Large series resistors on input lines will increase the impedance of the ground path described above.

...

In the context of ESD, 'large' is 2 K ohm approximate (R-C-R with a time constant of 100 ns); an order of magnitude less by R-TVS-R.

 

"Dare to be naïve." - Buckminster Fuller

 SPI0.INTFLAGS |= SPI_TXCIF_bm;

If you are into cycle counting, why use |= instead of =? (Edit: or skip it altogether, since the flag is not being used at all.) Using = will also preserve the SSIF bit, so you can check it to see when the SPI is switching to slave, if that is what it's doing (I assume it is, from your description). You could also check CTRLA to see if/when the SPI is switching to slave. It's not hard to imagine they 'miswired' something internally, since the SPI pin configuration is a little complex. One would think these MCUs have been around long enough for all the bugs to be discovered, but it's possible there is one here (just like datasheet errors linger around for years).
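
To illustrate the |= versus = point, here is a small host-side model of a write-1-to-clear flag register. It is only a sketch: the w1c_* names are hypothetical, and the two bit masks are assumed to mirror TXCIF and SSIF in SPI0.INTFLAGS in buffer mode.

```c
#include <stdint.h>

/* Bit masks assumed to mirror SPI0.INTFLAGS in buffer mode. */
#define TXCIF_BM 0x40u
#define SSIF_BM  0x10u

/* Host-side model of a write-1-to-clear flag register. */
typedef struct { uint8_t flags; } w1c_reg_t;

/* A write clears every flag whose bit is 1 in the written value. */
static void w1c_write(w1c_reg_t *r, uint8_t value)
{
    r->flags &= (uint8_t)~value;
}

/* Equivalent of `REG |= mask;`: read, OR, write back. */
static void w1c_or_assign(w1c_reg_t *r, uint8_t mask)
{
    uint8_t readback = r->flags;            /* includes any other set flags */
    w1c_write(r, (uint8_t)(readback | mask));
}

/* Equivalent of `REG = mask;`: a single write of the mask only. */
static void w1c_assign(w1c_reg_t *r, uint8_t mask)
{
    w1c_write(r, mask);
}
```

Starting with both TXCIF and SSIF set, w1c_or_assign() with TXCIF_BM clears both flags, while w1c_assign() with TXCIF_BM leaves SSIF set for later inspection. The plain assignment is also one write instead of a read-modify-write.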

 

fyi, you can put assembler directives in asm statements-

 

// Wait 16 cycles, faster than polling TXCIF

asm(".rept 16\nnop\n.endr"); //repeat nop statement 16 times

 

 

Simple test I ran on a mega4809 curiosity board just using vport, and it works ok (viewing logic analyzer).

https://godbolt.org/z/vbxW968EM

 

#include <avr/io.h>
#include <stdbool.h>
//mega4809
static void spi_pins_init(){
    VPORTA.DIR |= 1<<4; // PA4 MOSI
    VPORTA.DIR |= 1<<6; // PA6 SCK
    VPORTA.DIR |= 1<<7; // PA7 /SS
    }
static void spi0_init(){
    spi_pins_init();
    SPI0.CTRLB = 4;     //SSD=1
    SPI0.CTRLA = 0x21;  //master, enable
    }

int main(void) {
    spi0_init();
    uint8_t n = 0;
    while (1) {
        VPORTA.OUT &= ~(1<<7); // ss=0
        SPI0.DATA = n++;
        while( ! SPI0.INTFLAGS ){} //non-buffer mode, IF (or WRCOL)
        VPORTA.OUT |= 1<<7;    // ss=1
        SPI0.DATA;             // read clears IF flag
        }
}

Last Edited: Sun. May 15, 2022 - 06:16 AM

curtvm wrote:

 SPI0.INTFLAGS |= SPI_TXCIF_bm;

If you are into cycle counting why use |=

I also have a 16-bit transfer function in which I don't count the cycles, as I am not exactly sure of the delay between the two 8-bit transfers in this mode. I could probably measure it and add the cycles... Anyway, to make sure that the flag is cleared before a 16-bit transfer I clear it manually there. But like you said, I could probably get away with only using =. This is the 16-bit transfer function, which takes advantage of buffer mode (before changing to VPORT):

uint16_t SPI_Transfer16(struct spi_t *instance, uint16_t data)
{
    // Pull CS low
    PORTA.OUTCLR = PIN7_bm;
    //VPORTA.OUT &= ~PIN7_bm;

    // Load first byte
    SPI0.DATA = (data >> 8);

    // Load second byte
    SPI0.DATA = data;
    
    // Wait for transmission to complete and clear flag
    while(!(SPI0.INTFLAGS & SPI_TXCIF_bm));
    SPI0.INTFLAGS |= SPI_TXCIF_bm;

    // Release CS
    PORTA.OUTSET = PIN7_bm;
    //VPORTA.OUT |= PIN7_bm;

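    // Read the two received bytes in a defined order (high byte first);
    // the operand evaluation order of | is unspecified in C
    uint8_t high = SPI0.DATA;
    uint8_t low = SPI0.DATA;
    return ((uint16_t)high << 8) | low;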
}

Looking at your code example I noticed that you also set MOSI and SCK using VPORT (in my code I only used it for SS), so I tried that and the signals came alive on the scope! So it seems that when using a peripheral I should not mix and match PORT and VPORT. But the display still did not respond. I was going to try your advice to monitor SSIF to check if it's switching between modes, when by chance I found that just adding a NOP before pulling CS low in the 16-bit transfer function fixed it. Since the signals look fine on the scope, I think this might be an ST7735 quirk...

 

curtvm wrote:

fyi, you can put assembler directives in asm statements-

 

// Wait 16 cycles, faster than polling TXCIF

asm(".rept 16\nnop\n.endr"); //repeat nop statement 16 times

That's neat, thanks for the tip!

 


My apologies.  Although PORTA.OUTSET and VPORTA.DIR will "do the same thing" it appears that you need a gap between the two styles.  e.g.

 

    PORTA.DIRSET = PIN6_bm;                          // CLK as output
    asm("nop");                                        //a 1-cycle gap
    VPORTA.DIR |= PIN7_bm;

 

It is fine to have a sequence of PORTA.DIRSET or VPORTA.DIR but you should not intertwine them.
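
One way to stay clear of the intertwining is to use a single style for every SPI pin. The sketch below uses VPORT only. It is a minimal example and not a confirmed fix: the function name is made up, and the #else branch is a hypothetical host-side stand-in (not the real device header) that is only there so the snippet can be compiled off-target.

```c
#include <stdint.h>

#if defined(__AVR__)
#include <avr/io.h>             /* real VPORTA and PINx_bm on the target */
#else
/* Hypothetical host-side stand-in so the sketch compiles off-target. */
typedef struct { volatile uint8_t DIR, OUT, IN, INTFLAGS; } VPORT_t;
static VPORT_t VPORTA;
#define PIN4_bm 0x10
#define PIN5_bm 0x20
#define PIN6_bm 0x40
#define PIN7_bm 0x80
#endif

/* Configure the SPI0 pins on port A with VPORT accesses only, so that
   no PORTA write is ever directly followed by a VPORTA write. */
static void spi_pins_init_vport_only(void)
{
    VPORTA.OUT |= PIN7_bm;              /* SS idles high before it is driven */
    VPORTA.DIR &= (uint8_t)~PIN5_bm;    /* MISO as input  */
    VPORTA.DIR |= PIN4_bm;              /* MOSI as output */
    VPORTA.DIR |= PIN6_bm;              /* SCK as output  */
    VPORTA.DIR |= PIN7_bm;              /* SS as output   */
}
```

Each line touches a single bit, so on the target these should compile to single SBI/CBI instructions.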

 

I had previously used PORTA.DIRSET and VPORT0.DIR on Xmega.   But probably not intertwined.

 

Your original code set bit4 and bit7 OK, but the bit6 direction was not set. You can see this in both the Simulator and EDBG.

    PORTA.DIRSET = PIN4_bm;                          // MOSI as output

    PORTA.DIRSET = PIN6_bm;                          // CLK as output
    VPORTA.DIR |= PIN7_bm;

 

I suggest that you concentrate on using the INTFLAGS with buffered mode.

 

I will connect a Logic Analyser and show the behaviour (later today)

 

David.

 

p.s. if you are in regular SPI mode, 16 NOPs do not work. You need at least 18 cycles. You can see this in both the Simulator and the Debugger.

i.e. m4809 SPI behaves like legacy Mega SPI.

Last Edited: Sun. May 15, 2022 - 12:24 PM

it appears that you need a gap between the two styles.  

I never noticed that before. I'm sure I have mixed them, but typically you use one or the other or if you do mix them they most likely are separated by at least an instruction.

 

I tested the same example with a mix of vport/port, and it does act odd if you mix up the port/vport back-to-back.

 

edit-

And I can see when debugging that the pins that should be getting their DIR bit set do not. Back-to-back port/vport accesses seem to have a problem, and it's odd that there is nothing mentioned anywhere about this (that I have seen).

Last Edited: Sun. May 15, 2022 - 06:13 PM

david.prentice wrote:
I had previously used PORTA.DIRSET and VPORT0.DIR on Xmega.   But probably not intertwined.

curtvm wrote:
I tested the same example with a mix of vport/port, and it does act odd if you mix up the port/vport back-to-back.

An interesting situation.  A sanity-check suggestion that came to mind:  In the "faulty" sequence, take a peek at the generated machine instructions, just in case the clever toolchain decided to do things a bit differently than expected.

You can put lipstick on a pig, but it is still a pig.

I've never met a pig I didn't like, as long as you have some salt and pepper.


The toolchain is doing as instructed, generating correct code as expected.

 

It seems to be that an access to vport directly after a port access invalidates/screws up the previous port setting (dir in this case)-

https://godbolt.org/z/x8cWT7ces

 

Not sure what else applies (out,in), but I guess could figure out if needed (edit- using out also applies the same).

 

I would also note there can be small 10ns glitches happening on the other pins when one of them is screwed up.

Last Edited: Mon. May 16, 2022 - 01:04 AM

I suppose that this is just something that normal human beings would never do.  i.e. mix PORTA.OUTSET and VPORTA.DIR

It probably does not happen with different ports e.g. PORTA.OUTSET and VPORTB.DIR

 

But I will investigate on the Xmega.   After all,  the Xmega is 15 years old now.   And since VPORTs are more valuable you often have less critical bits on PORTx

 

Regarding Health and Efficiency.   If you are setting a single bit it is STS on PORTA but SBI on VPORTA

Setting multiple bits at once is still STS on PORTA but becomes IN, ORI, OUT on VPORTA
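
As a rough feel for what that is worth on the SS line, here is a host-side sketch of the arithmetic. The cycle counts are assumptions about the AVRxt core (1 cycle each for LDI, SBI and CBI, 2 cycles for STS) and should be checked against the instruction set manual. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed AVRxt instruction timings, in CPU cycles. */
enum { CYC_LDI = 1, CYC_STS = 2, CYC_SBI = 1 };

/* Convert a cycle count to nanoseconds at a given CPU clock. */
static uint32_t cycles_to_ns(uint32_t cycles, uint32_t f_cpu_hz)
{
    assert(f_cpu_hz != 0);
    return (uint32_t)(((uint64_t)cycles * 1000000000ull) / f_cpu_hz);
}

/* Cycles saved per transfer (two SS edges) by VPORT over PORT.
   mask_preloaded: nonzero if the pin mask is already in a register,
   so the PORT version needs no LDI before its STS. */
static uint32_t ss_cycles_saved(int mask_preloaded)
{
    uint32_t port_edge  = CYC_STS + (mask_preloaded ? 0u : CYC_LDI);
    uint32_t vport_edge = CYC_SBI;      /* SBI or CBI, same assumed cost */
    return 2u * (port_edge - vport_edge);
}
```

Under those assumptions the saving is 2 to 4 cycles per transfer, i.e. 100 to 200 ns at 20MHz, against the 16 cycles (800 ns) the byte itself takes at 10MHz.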

 

David.

Last Edited: Mon. May 16, 2022 - 06:19 AM

Setting multiple bits at once is still STS on PORTA but becomes IN, ORI, OUT on VPORTA

In more than 95% of cases (close to 100% if the code is structured for speed) a register already holds the other bits, so only the OUT is needed.