What is the association of the three registers DIR, DIRSET, DIRCLR?


 

Hi,

 

I was reading the datasheet of the ATXMEGA16D4 MCU (XMEGA D [MANUAL], Atmel-8210G AVR XMEGA D 12/2014), and on page 110 it says:

"DIR This register sets the data direction for the individual pins of the port. If DIRn is written to one, pin n is configured as an output pin. If DIRn is written to zero, pin n is configured as an input pin."

"DIRSET This register can be used instead of a read-modify-write to set individual pins as output. Writing a one to a bit will set the corresponding bit in the DIR register. Reading this register will return the value of the DIR register."

"DIRCLR This register can be used instead of a read-modify-write to set individual pins as input. Writing a one to a bit will clear the corresponding bit in the DIR register. Reading this register will return the value of the DIR register."

These registers have different addresses. What is the principle of association and how is it mapped? Where can I find it?

 

I  notice that  

 

I found some information at https://www.avrfreaks.net/forum/... , but that is not the answer I want.

 

Last Edited: Tue. Jul 7, 2020 - 07:07 AM

DIR is read-write
DIRSET, DIRCLR are write-only
.
Write-only registers are faster and easier to use e.g. for OUTSET, OUTCLR output but make little difference for the direction.
.
You get fastest performance from the VPORT registers.
.
Hey-ho. Simplicity and clarity are more important for most applications. Use whatever you are happiest with.
.
David.


MaDuck wrote:
These registers have different addresses,

Of course they do: they couldn't all have the same address - could they?

 

What is the principle of association and how is it mapped?

What do you mean by that?

 

At the end of the day, the addresses are just arbitrary - they are whatever the designer chose them to be. There doesn't have to be any logic or formula to it.

 

In practice, designers do like to keep things consistent - and that also helps users.

 

 

Top Tips:

  1. How to properly post source code - see: https://www.avrfreaks.net/comment... - also how to properly include images/pictures
  2. "Garbage" characters on a serial terminal are (almost?) invariably due to wrong baud rate - see: https://learn.sparkfun.com/tutorials/serial-communication
  3. Wrong baud rate is usually due to not running at the speed you thought; check by blinking a LED to see if you get the speed you expected
  4. Difference between a crystal, and a crystal oscillator: https://www.avrfreaks.net/comment...
  5. When your question is resolved, mark the solution: https://www.avrfreaks.net/comment...
  6. Beginner's "Getting Started" tips: https://www.avrfreaks.net/comment...


MaDuck wrote:
What is the principle of association and how is it mapped?
Traditional Tiny/Mega AVR used just 3 registers to control a port:

DDRx - direction
PORTx - output
PINx - input

so if you had already set some DDRB bits (say) and now wanted to set bit 5 to output you would do:

DDRB |= (1 << 5);

The compiler has three possible ways to implement this operation. One would be to:

LDS someReg, DDRB_Ram_addr
ORI someReg, 0x20
STS DDRB_Ram_addr, someReg

but it would likely find that DDRB_Ram_addr was actually in the 0x20..0x5F address range so it could optimise this to be:

IN someReg, DDRB_IO_addr
ORI someReg, 0x20
OUT DDRB_IO_addr, someReg

If it further found that the register was actually in IO range 0x00..0x1F (RAM addr 0x20..0x3F) then it could optimise this again to be:

SBI DDRB_IO_addr, 5

but the fact is that, unless a single-bit change in the 0x00..0x1F range could be optimised to an SBI/CBI, the only way was to do some kind of read-modify-write, where "DDRB" was read into a register, bits changed and then written back. When Xmega came along, they added some facilities and renamed the registers to give a more coherent naming scheme, so for the 3 registers you have:

DDRB  -> PORTB_DIR
PORTB -> PORTB_OUT
PINB  -> PORTB_IN

so you could just use Xmega ports with those three alone and it would be just like the old tiny/mega AVR (except that some of those supported writing to PINx to toggle pins, and this behaviour is not found in Xmega PORTx_IN). But Atmel thought they would do you a favour by adding "quick access" register support to the writable ones of these 3 key registers. So as well as the core register itself, for each they added Set, Clear and Toggle registers with the suffixes SET, CLR, TGL. That is applied to PORTx_DIR and PORTx_OUT.

 

So for each of PORTB_DIR and PORTB_OUT you now have the addition of PORTB_DIRCLR, PORTB_DIRSET, PORTB_DIRTGL and PORTB_OUTCLR, PORTB_OUTSET, PORTB_OUTTGL. So now you have a choice. Instead of being forced to use

PORTB_DIR |= (1 << 5);

to ensure that bit 5 is set, possibly invoking a read-modify-write sequence (unless it is a "VPORT", it won't be in range of either IN/OUT or CBI/SBI), you can now use:

PORTB_DIRSET = (1 << 5);

and that will ONLY set bit 5 in the PORTB_DIR register but other bits will remain unchanged and it will be accomplished with a simple:

LDI someReg, 0x20
STS PORTB_DIRSET, someReg

On the one hand, the addition of these extra registers makes Xmega IO ports "more efficient" than tiny/mega, but the actual IO addressing in Xmega (PORTA registers are at 0x600, PORTB at 0x620 and so on) puts them way out of range of IN/OUT/CBI/SBI, which are the very most efficient opcodes. To try and overcome that (as David alluded to in his post), some of these can be remapped into the "VPORT" area:

 

 

To buy back some efficiency that was lost by the whole ports being located up at 0x600+. Note however that when mapped to VPORTs it does not map the whole register block (each VPORT area is just four 8 bit registers) so only the key registers are mapped to the VPORTs:

 

 

If you put INTFLAGS off to one side for a moment and just consider the use of the ports for "plain" IO then you are almost back to the original Tiny/Mega 3 registers per port scenario with just DIR/OUT/IN and all the fancy facilities like CLR/SET/TGL are left behind. Curious world !
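
As a rough illustration of the address-range reasoning in this post, here is a small host-side C sketch. It is not anything the compiler actually contains: the enum and function names are made up for this example, and the ranges are simply the classic tiny/mega data-space ranges quoted above (0x20..0x3F for SBI/CBI, 0x20..0x5F for IN/OUT). Anything else, such as the Xmega PORTA block at 0x600, falls through to LDS/STS.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
typedef enum {
    ACCESS_SBI_CBI,  /* single-bit set/clear, I/O addr 0x00..0x1F */
    ACCESS_IN_OUT,   /* IN/OUT, I/O addr 0x00..0x3F                */
    ACCESS_LDS_STS   /* plain load/store to data space            */
} access_kind;

/* Classify a data-space ("RAM") address by the cheapest opcode family
   that can reach it, using the tiny/mega ranges quoted in the post.
   Addresses below 0x20 (the CPU registers on classic AVRs) are not
   treated specially here. */
access_kind cheapest_access(uint16_t data_addr)
{
    if (data_addr >= 0x20 && data_addr <= 0x3F)
        return ACCESS_SBI_CBI;
    if (data_addr >= 0x40 && data_addr <= 0x5F)
        return ACCESS_IN_OUT;
    return ACCESS_LDS_STS;
}
```

Calling cheapest_access(0x0600) or cheapest_access(0x0620) gives ACCESS_LDS_STS, which is the point made above about the Xmega port blocks being out of range of the short opcodes.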


MaDuck wrote:
I found some information at https://www.avrfreaks.net/forum/... , but that is not the answer I want.

That thread refers to the SAM-D20, which is an ARM Cortex-M0+ - not an AVR.

However, the principles are the same.

 

In post #9 in that thread, Jacob wrote:

DIR is the actual register, while the DIRSET, DIRTGL and DIRCLR are "bit operator functions" that allow you to more efficiently modify content in the DIR register (by avoiding a read modify-write operation).

DIRSET corresponds to DIR = DIR | DIRSET

DIRCLR corresponds to DIR = DIR & ~DIRCLR

DIRTGL corresponds to DIR = DIR ^ DIRTGL

 

So if you did not have the set, clear and toggle registers you would need to read DIR, do a logic operation on the value, and write it back (aka read-modify-write). This is avoided by having these special registers.

Since it does not make sense to read the DIRSET, DIRCLR and DIRTGL registers, as they are just bit operator functions, reading them simply returns the value of DIR (the result of the operation).

In principle you could state that these are write-only registers - as Kartman says it doesn't make sense to read them back.

 

https://www.avrfreaks.net/commen...

 

That seems to be a pretty good summary - what's unclear?
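
To make the three "corresponds to" lines concrete, here is a minimal software model in C. It only mimics the behaviour described in the quoted post and in the datasheet text from #1; the struct and function names are invented for this sketch and are not part of any AVR or SAM header. On the real chip this logic is in hardware, not in a library.

```c
#include <stdint.h>

/* Software model of one port's direction logic (illustrative only). */
typedef struct { uint8_t dir; } port_model;

/* A write to DIR replaces the whole register. */
void write_dir(port_model *p, uint8_t v)    { p->dir = v; }

/* A write to DIRSET: DIR = DIR | value */
void write_dirset(port_model *p, uint8_t v) { p->dir |= v; }

/* A write to DIRCLR: DIR = DIR & ~value */
void write_dirclr(port_model *p, uint8_t v) { p->dir &= (uint8_t)~v; }

/* A write to DIRTGL: DIR = DIR ^ value */
void write_dirtgl(port_model *p, uint8_t v) { p->dir ^= v; }

/* Per the datasheet text quoted in #1, reading DIRSET or DIRCLR
   returns the current value of DIR. */
uint8_t read_any(const port_model *p)       { return p->dir; }
```

In each case the CPU only performs a single write; the OR, AND-NOT or XOR is applied to DIR on the peripheral side.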

Last Edited: Tue. Jul 7, 2020 - 08:53 AM

awneil wrote:
the principles are the same

A lot of chips have this arrangement of having a "main" register, and then some "extra" registers to allow you to set, clear, or toggle bits without a read-modify-write (RMW) operation.


Thanks,

My English is poor; I rely entirely on Google Translate.

For example:

PORTC_DIRSET |= 0x01; // PC.0 will be set to OUT (1).

// My understanding is that this is equivalent to PORTC_DIR |= 0x01. If so, how is this achieved?

// Is the conversion done by internal logic circuits, or by functions provided by the AVR library?

 

 

PORTC_DIRCLR = 0x01; // PC.0 will be set to IN (0).

// My understanding is that this is equivalent to PORTC_DIR &= ~0x01. If so, how is this achieved?

 

 

 


MaDuck wrote:
PORTC_DIRSET |= 0x01; // PC.0 will be set to OUT (1).
With SET/CLR/TGL the whole idea is that you do NOT use read-modify-write operations like |= or &=~, the hardware is doing the OR/AND operation for you (effectively) so the CPU does not have to. That is the whole point of these additional registers.
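
A small host-side sketch of why the compound operators are worse than merely wasteful. It assumes the hardware behaves exactly as the datasheet text in #1 says (a read of DIRCLR returns DIR, a write clears the bits that are one); the variable and function names are made up for the illustration.

```c
#include <stdint.h>

static uint8_t dir;  /* models PORTC_DIR */

/* Models of the bus accesses, based on the datasheet text quoted in #1. */
static uint8_t dirclr_read(void)       { return dir; }          /* read returns DIR   */
static void    dirclr_write(uint8_t v) { dir &= (uint8_t)~v; }  /* ones clear DIR bits */

/* What "PORTC_DIRCLR = mask;" does: a single write. */
uint8_t clear_with_assign(uint8_t start, uint8_t mask)
{
    dir = start;
    dirclr_write(mask);
    return dir;
}

/* What "PORTC_DIRCLR |= mask;" does: read, OR, write back. */
uint8_t clear_with_or_assign(uint8_t start, uint8_t mask)
{
    dir = start;
    dirclr_write((uint8_t)(dirclr_read() | mask));
    return dir;
}
```

Starting from DIR = 0xF1 with mask 0x01, the plain assignment leaves 0xF0 as intended, while the |= form reads back 0xF1, writes that to DIRCLR and so clears every pin that was an output, leaving 0x00. With DIRSET the |= form happens to give the same result as a plain assignment, but it still costs the extra read that these registers exist to avoid.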


MaDuck wrote:
PORTC_DIRCLR = 0x01; // PC.0 will be set to IN (0).

// My understanding is that this is equivalent to PORTC_DIR &= ~0x01. If so, how is this achieved?

 

Surely, that is well explained in that post in the other thread?

 

DIR is the actual register which controls the direction of the port pins.

 

The Set / Clear / Toggle registers just let you directly set / clear / toggle bits in the DIR register.

 

Taking the diagram from the SAM-D20 datasheet:

 

It's something  like this:

 

 

EDIT

 

diagram

 

 

EDIT 2

 

In the XMega datasheet, the DIR register bit for each pin is shown as a D-type latch:

 

 

 

But it is actually a D-Type with set and clear inputs:

 

So the bits of the DIRSET register drive the 'Set' and the bits of the DIRCLR register drive the 'Clear' (aka 'Reset') of each latch in the DIR register.

 

 

 

 

Last Edited: Fri. Oct 23, 2020 - 12:24 PM

MaDuck wrote:
PORTC_DIRSET |= 0x01; // PC.0 will be set to OUT (1).

clawson wrote:
With SET/CLR/TGL the whole idea is that you do NOT use read-modify-write operations

Indeed.

 

In fact, as David said right at the start, they are write-only - so you cannot do that!

 

Does the compiler flag this as an error ... ?

 

EDIT

 

correction: write only!

Last Edited: Tue. Jul 7, 2020 - 10:27 AM

I tend to think in human terms e.g. to set several port pins

 

Xmega with a read-modify-write access:   LDS  , ORI , STS i.e. 5 cycles

Xmega with write-only:  STS i.e. 2 cycles

 

On a Mega/Tiny most of the PORTs are in IN, OUT addressing range:  IN , ORI , OUT i.e. 3 cycles

 

If you are only setting a single bit on Mega, Tiny this can be done with SBI i.e. 2 cycles

 

Somehow,  they made Xmega SBI into a 1 cycle operation.   So setting a single bit on a VPORT is 1 cycle

 

There is a massive 5:2 advantage for Xmega to use OUTSET versus OUT for multiple bits

Or a 5:1 advantage to use VPORT versus OUT for a single port bit

 

I can't remember the exact ARM timing.   But using the write-only OUTSET on a SAM or STM32 chip is better than 5:2

 

I don't care how it is done in hardware.   I just look at the practical efficiency of OUTSET.

 

If you have understood the relevance of this for a port OUTPUT driver,   it is exactly the same for the DATA DIRECTION driver.

However,   you tend to use DIR, DIRSET as a one-off in setup() but OUT, OUTSET will be used millions and millions of times in loop()

 

David.

 

p.s. cycles typed from memory.    Check your datasheet or use the AS7.0 Simulator.


Writing to the outset/outclr registers is also atomic, whereas a read/or/write can be interrupted partway through and screw everything up.
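
The failure mode can be shown with a host-side simulation. This is only a sketch: the "interrupt" is an ordinary function call placed where a real interrupt could land, and all names are invented for the example.

```c
#include <stdint.h>

static uint8_t out;  /* models a port OUT register */

/* Hypothetical interrupt handler that sets bit 0 of the port. */
static void isr_sets_bit0(void) { out |= 0x01; }

/* Read-modify-write of bit 5, with the interrupt landing between
   the read and the write. */
uint8_t rmw_interrupted(void)
{
    out = 0x00;
    uint8_t tmp = out;   /* LDS: read OUT                        */
    isr_sets_bit0();     /* interrupt fires here                 */
    tmp |= 0x20;         /* ORI: modify the now stale copy       */
    out = tmp;           /* STS: write back, ISR's bit is lost   */
    return out;          /* 0x20 */
}

/* Same scenario with a single OUTSET-style write: the interrupt can
   only land before or after it, never in the middle. */
uint8_t outset_interrupted(void)
{
    out = 0x00;
    isr_sets_bit0();     /* interrupt fires before the write     */
    out |= 0x20;         /* models the OR done in hardware       */
    return out;          /* 0x21 */
}
```

In the first function the bit set by the handler disappears because the main code writes back a value it read earlier. In the second there is no stale copy to write back.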

 


Likewise,   SBI / CBI is atomic.   So if you are changing single bits in a port the Xmega VPORT is the best of all worlds.

 

For multiple bits in a port,  OUTSET, OUTCLR is appropriate.

 

And most importantly,   the OUTSET, OUTCLR ... style is what you would use in ARM targets.

 

David.



david.prentice wrote:
the OUTSET, OUTCLR ... style is what you would use in ARM targets.

It has nothing specifically to do with ARM (although it is common in ARM-based chips).

 

Other chips also do it; eg, the AVR32:

 

 

and note how they clearly show that it's just one register, but with Read/Write, Set, Clear, and Toggle accesses!

 


I can't remember the exact ARM timing.   But using the write-only OUTSET on a SAM or STM32 chip is better than 5:2

It depends on what you're counting.  ARM doesn't have special IO instructions, so you always have to put the 32bit address of the "port" into a register to start with (but that can be shared with other operations.)  After that it's load old/or constant/store new vs  load constant/store new, so only one "operation" different.  But then you run into CM0 vs CM3 differences - CM0 doesn't have an "Or Immediate", and doesn't have "load constant" for constants greater than 8 bits (and then it matters whether your IO locations are specified as word-wide or byte-wide, which depends on both what is possible and what is implemented in the .h files.  Sigh.)  That also consumes another register.   While the CM3 has both OR Immediate and "load many constants longer than 8bits", those are 32bit instructions, so while it may be faster, it doesn't necessarily get smaller.

 

 

Worst case: Cortex M0 or M0+, longish constant.  7-instructions, 20 bytes

 ;;;   reg32 |= 1<<23;
 
  10:	2380      	movs	r3, #128	; 0x80
  12:	4a03      	ldr	r2, [pc, #12]	; addr of register
  14:	041b      	lsls	r3, r3, #16 ; make 1<<23 from 0x80
  16:	6811      	ldr	r1, [r2, #0]    ; load
  18:	430b      	orrs	r3, r1      ;  or
  1a:	6013      	str	r3, [r2, #0]    ;   store
  1c:	4770      	bx	lr              ; return
  1e:	46c0      	nop
  20:	00000000 	.word	0x00000000  ;; literal pool: addr of register

 

CM3 for similar (5 instructions, 16 bytes):

    reg32 |= 1<<23;

  10:	4a02      	ldr	r2, [pc, #8]	; addr
  12:	6813      	ldr	r3, [r2, #0]    ; load
  14:	f443 0300 	orr.w	r3, r3, #8388608	; Or Immediate w 0x800000
  18:	6013      	str	r3, [r2, #0]    ; store
  1a:	4770      	bx	lr              ; return
  1c:	00000000 	.word	0x00000000

 

Best case: CM0 or CM3, small constants, "set."   4 instructions, 12 bytes.

;;;    reg32 = 1<<3;

  34:	2208      	movs	r2, #8    ; constant
  36:	4b01      	ldr	r3, [pc, #4]  ; address of register
  38:	601a      	str	r2, [r3, #0]  ; store
  3a:	4770      	bx	lr            ; return
  3c:	00000000 	.word	0x00000000

 

It's probably worth noting that the ARM code remains essentially the same regardless of whether the port and bit(s) are constants or variables, while AVR code bloats up pretty quickly for variable anything.

 


I ran a simple sketch on STM32F072 (CM0), F103 (CM3), L476 (CM4), F446 (CM4)

It shows the differences between the write-only OUTSET style and RMW or digitalWrite()

 

I had claimed that single-bit writes were 5:1 faster than RMW on Xmega VPORT

The ARM performance is similarly impressive on Cortex-M4.

 

I presume that I would get a similar result on SAMD21 (CM0),  SAM3X (CM3) and SAM4 (CM4)

 

I do 100 wiggles.   I measure the total time on Saleae Logic Analyser.    And calculate the machine cycles per edge.

The wiggles are too fast to see (except 48MHz CM0)

 

Note that pre-computing the Port address and Pin mask can give you good performance on the ARM.

Also note that the Arduino functions will give you the address of the ODR register.     Using Port addresses is much faster than digitalWrite()

You could pre-compute the address of the BSRR register (and the BRR register that is available on most STM32).    So you could use the write-only registers too.

 

You can achieve similar results with the ATmega4809 and other modern AVRs.   i.e. calculate addresses at runtime.

Obviously Ports and masks that are  known at compile-time will always generate the most efficient code.

 

Edit.  I have added the Port address approach to the example code.   And associated narrative.

//                 F072 @ 48MHz  F103 @ 64MHz  L476 @ 80MHz  F446 @ 180MHz  
//digitalWrite:     98.75us 47    101.6us 65     41.33us 33   18.5us 33cycle
//PortAddress:      14.83    7     17.29  11      7.67    6    3.96   7
//ReadModifyWrite:  10.58    5     14.21   9      7.67    6    2.83   5
//WriteOnly:         4.33    2      3.29   2      1.33    1    0.58   1

#define PIN_HIGH(port, pin)   (port)-> BSRR = (1<<(pin))
#define PIN_LOW(port, pin)    (port)-> BSRR = (1<<((pin)+16))

#define PIN_HIGHX(port, pin)   (port)-> ODR |= (1<<(pin))
#define PIN_LOWX(port, pin)    (port)-> ODR &= ~(1<<((pin)))

#define TGL_ARD { digitalWrite(8, HIGH); digitalWrite(8, LOW); }
//#define TGL_ADS { *d8Port |= d8PinSet; *d8Port &= ~d8PinSet; }
#define TGL_ADS { *d8Port |= d8PinSet; *d8Port &= d8PinClr; }
#define TGL_RMW { PIN_HIGHX(GPIOA, 9); PIN_LOWX(GPIOA, 9); }
#define TGL_WO  { PIN_HIGH(GPIOA, 9); PIN_LOW(GPIOA, 9); }

#define TGL  { TGL_ADS; }
#define TGL2 { TGL; TGL; }
#define TGL4 { TGL2; TGL2; }
#define TGL8 { TGL4; TGL4; }
#define TGL16 { TGL8; TGL8; }
#define TGL32 { TGL16; TGL16; }
#define TGL50 { TGL32; TGL16; TGL2; }

volatile uint32_t *d8Port;
uint32_t d8PinSet, d8PinClr;

void setup()
{
    Serial.begin(9600);
    Serial.print("toggle GPIO with OUTSET @ F_CPU = ");
    Serial.print(F_CPU / 1000000);
    Serial.println("MHz");
    pinMode(13, OUTPUT);
    pinMode(8, OUTPUT);  //toggle signal
    d8Port = portOutputRegister(digitalPinToPort(8));
    d8PinSet = digitalPinToBitMask(8);
    d8PinClr = ~d8PinSet;
    pinMode(9, OUTPUT);  //start, end signal
}

void loop()
{
    PIN_HIGH(GPIOC, 7);  //digital#9 PC7
    TGL50;   //100 edges digital#8 PA9 
    PIN_LOW(GPIOC, 7);
    digitalWrite(13, HIGH);
    delay(500);
    digitalWrite(13, LOW);
    delay(500);
}
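
For anyone wanting to check the "cycles" columns in the results table, the arithmetic is just total time multiplied by clock frequency, divided by the number of edges. A minimal sketch follows; the function names are made up, and rounding to the nearest whole cycle is an assumption that appears to match the table.

```c
/* Microseconds times MHz gives clock cycles, so
   cycles per edge = total_us * f_cpu_mhz / edges. */
double cycles_per_edge(double total_us, double f_cpu_mhz, unsigned edges)
{
    return total_us * f_cpu_mhz / (double)edges;
}

/* Round to the nearest whole cycle (assumed to be what the table shows). */
unsigned rounded_cycles(double total_us, double f_cpu_mhz, unsigned edges)
{
    return (unsigned)(cycles_per_edge(total_us, f_cpu_mhz, edges) + 0.5);
}
```

For example, the F072 WriteOnly row is 4.33us for 100 edges at 48MHz, i.e. about 2.08 cycles per edge, and the F072 digitalWrite row is 98.75us, i.e. about 47.4 cycles per edge.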

 

Last Edited: Fri. Jul 10, 2020 - 08:11 AM

I do 100 wiggles.   I measure the total time on Saleae Logic Analyser.    And calculate the machine cycles per edge.

Sort of an optimal case for exposing the differences, since the slow parts of ARM are factored out of the wiggles, and presumably you get a series of Store instructions, or load/or/store.

But it also points out another ARM complication that I left out - on many ARMs, the GPIO peripherals are on a "slower" bus, and may take multiple cycles to do a single read or write (even further benefiting the write-only version.)


david.prentice wrote:
I ran a simple sketch on STM32F072 (CM0), F103 (CM3), L476 (CM4), F446 (CM4) ... I presume that I would get a similar result on SAMD21 (CM0),  SAM3X (CM3) and SAM4 (CM4)

 

An important thing to remember is that the GPIO implementation is nothing to do with ARM;  so it's not necessarily going to be useful to compare different manufacturers' parts - even if they share the same core (CM0/3/4/whatever)

 

For example, as westfw wrote:
on many ARMs, the GPIO peripherals are on a "slower" bus 

and that's before you even look at any differences in the GPIOs themselves


westfw wrote:

I do 100 wiggles.   I measure the total time on Saleae Logic Analyser.    And calculate the machine cycles per edge.

Sort of an optimal case for exposing the differences, since the slow parts of ARM are factored out of the wiggles, and presumably you get a series of Store instructions, or load/or/store.

But it also points out another ARM complication that I left out - on many ARMs, the GPIO peripherals are on a "slower" bus, and may take multiple cycles to do a single read or write (even further benefiting the write-only version.)

 

Yes,  I was deliberately investigating "best possible" results from the "easiest" IDE.

I had a further play.

//                 F072 @ 48MHz  F103 @ 64MHz  L476 @ 80MHz  F446 @ 180MHz    SAM3X @ 84MHz  SAMD21 @ 48MHz  UNO @ 16MHz
//digitalWrite:     98.75us 47    101.6us 65     41.33us 33   18.5us 33cycle 207.2us 174      154.8us 74      356.5us  57
//PortAddress:      14.83    7     17.29  11      7.67    6    3.96   7       13.17   11       25.29  12       31.92    5
//ReadModifyWrite:  10.58    5     14.21   9      7.67    6    2.83   5       10.79    9       20.46  10       12.63    2
//WriteOnly:         4.33    2      3.29   2      1.33    1    0.58   1        2.38    2        8.96   4       12.63    2

and was surprised by the difference between the SAMD21 on my M0 Pro board versus the STM32F072 on my Nucleo board.   Both Cortex-M0 @ 48MHz.

 

So I looked at the generated code:

//SAMD21 TGL50 with OUTSET, OUTCLR
    2340        movs r3, #0x40
    493B        ldr r1, =0x41004418
    4A3C        ldr r2, =0x41004414
    600B        str r3, [r1]
    6013        str r3, [r2]
    600B        str r3, [r1]
    6013        str r3, [r2]

//F072 TGL50 with BSRR, BRR
    493B        ldr r1, =0x48000800
    05D2        lsls r2, r2, #23
    618B        str r3, [r1, #24]
    0064        lsls r4, r4, #1
    6193        str r3, [r2, #24]
    200D        movs r0, #13
    6293        str r3, [r2, #0x28]
    6193        str r3, [r2, #24]
    6293        str r3, [r2, #0x28]
    6193        str r3, [r2, #24]  

If I use BSRR register for both SET and RESET,   I get

    6199        str r1, [r3, #24]
    619A        str r2, [r3, #24]

Yes,  all end up with STR instructions to port addresses held in ARM registers.   But the STR on the SAMD21 takes 4 cycles compared with 2 cycles on the F072.

 

In practice,   I do care about the effective speed when writing millions of pixels to a large TFT display.

And the exercise was interesting to show the differences between digitalWrite(),  address write,   and compile-time RMW or WO

 

I was surprised by the SAM3X digitalWrite() performance on an Arduino Due board.

And pleasantly surprised by the runtime port address performance on all targets.

 

In simple terms.   You can use WO style for a library with GPIO known at compile-time e.g. Arduino Shield

Or you can use PortAddress style for a library that only knows the GPIO at run-time e.g. class constructor arguments.

 

David.

 

Last Edited: Sat. Jul 11, 2020 - 10:12 AM

But the STR on the SAMD21 takes 4 cycles

SAMD21 also has IOBUS for accessing at least some of the PORT registers in fewer cycles.  It's not documented particularly well (I'm not sure whether the OUTSET and OUTCLR registers work when accessed via IOBUS.)

 


The SAM3X datasheet does not have IOBUS.

The SAMD21 datasheet has IOBUS but I can't make head nor tail of how to use it.

 

Perhaps I will see how the NXP parts behave on a Teensy3.2 and Teensy4.0

 

David.


Note:

Bear in mind that when you get into very fast IOs (in the past FPGAs and DSPs, but now micros are getting there too), you often have to slow down the IOs so that the impedance matches the other end.

So fast IOs can often be programmed to be strong or weak (in steps).

 

And David, when you program "millions of pixels", what interface is that over? (SPI, bit-banged 8- or 16-bit bus, or real memory-mapped)


A 320x480 TFT has got 153k pixels.   i.e. 614k edges on the write strobes of an 8080-8 interface as well as writing the pixel colour to the data bus.

A 800x480 TFT has got 384k pixels.   i.e. 768k edges on the write strobes of an 8080-16 interface.

 

320x480 TFTs with SPI interface require 307k SPI bytes.   i.e. about 102ms @ 24MHz (204ms @ 12MHz)

800x480 TFTs with SPI interface require 768k SPI bytes.   i.e. about 256ms @ 24MHz (512ms @ 12MHz)

 

Those figures relate to filling the whole rectangular screen in one go.    Whether you use parallel or SPI interface for random pixels,   it is much more expensive to write the pixel address as well as the pixel colour.
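
The figures above can be reproduced with a few lines of C. This is a sketch under stated assumptions: 16-bit colour (2 bytes per pixel), one write-strobe pulse (2 edges) per bus write, and a raw SPI clock with no gaps between bytes. The function names are made up for the illustration.

```c
#include <stdint.h>

/* Pixels in a full-screen fill. */
uint32_t pixels(uint32_t w, uint32_t h) { return w * h; }

/* Write-strobe edges: each bus write is one strobe pulse = 2 edges.
   16-bit colour needs 2 writes on an 8-bit bus, 1 on a 16-bit bus. */
uint32_t strobe_edges(uint32_t w, uint32_t h, uint32_t bus_bits)
{
    uint32_t writes_per_pixel = 16u / bus_bits;
    return pixels(w, h) * writes_per_pixel * 2u;
}

/* Raw SPI transfer time in microseconds, ignoring gaps between bytes.
   A clock of N MHz moves N bits per microsecond. */
uint32_t spi_fill_us(uint32_t w, uint32_t h, uint32_t sck_mhz)
{
    uint32_t bits = pixels(w, h) * 16u;  /* 2 bytes per pixel */
    return bits / sck_mhz;
}
```

With these assumptions strobe_edges(320, 480, 8) is 614400 and strobe_edges(800, 480, 16) is 768000, matching the edge counts above. For SPI, spi_fill_us(320, 480, 24) is 102400us and spi_fill_us(800, 480, 24) is 256000us; the 204ms and 512ms figures correspond to a 12MHz clock.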

 

Yes,   Cortex-M4 is too fast.   But SAMD21 is 48MHz Cortex-M0.   It would be nice to compete with the ST CM0 chips.

Applies to SPI as well as GPIO writes.    48MHz SAMD21 is limited to 12MHz SPI.    48MHz F072 has 24MHz SPI.

 

Incidentally,   the Xmega has fast GPIO writes (with VPORT) as well as fast SPI (with USART_MSPI).

As noted in #2, DIRSET and DIRCLR are seldom critical, but OUTSET and OUTCLR performance is noticeable in some applications.

 

David.