Xmega virtual ports---definitely use them for manipulating single bits!


I was surprised (or had simply been blissfully ignorant) at how much more efficient the Xmega virtual ports are when dealing with single bits: the code is both much smaller and much faster.

This is because the virtual ports sit in I/O space, which allows the use of sbi, cbi, in and out.


#include <avr/io.h>

#define PORTA_Virt  VPORT0 // just a naming substitution for clarity
#define PORTB_Virt  VPORT1

int main(void){

 // VPorts are UNASSIGNED upon reset, so these assignments must come BEFORE any virtual pin change
 PORTCFG.VPCTRLA=PORTCFG_VP0MAP_PORTA_gc|PORTCFG_VP1MAP_PORTB_gc; // assign the virtual ports A==>VP0, B==>VP1
 PORTCFG.VPCTRLB=PORTCFG_VP2MAP_PORTC_gc|PORTCFG_VP3MAP_PORTD_gc; // assign the virtual ports C==>VP2, D==>VP3

 // NOTE the port directions (.DIR) have not been configured, but need to be for actual usage

 PORTA_Virt.OUT |=PIN4_bm; // set a specific pin using virtual port
 PORTA.OUTSET=PIN4_bm;  // set a pin using std port (same effect as the PORTA_Virt line, once mapped as above)
        // this line compiles to code 4x longer/slower (3x if Z is already set)!!

 PORTB_Virt.OUT &= ~PIN4_bm; // reset a pin using virtual port
 PORTB.OUTCLR=PIN4_bm;  // reset a pin using std port
}
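
Each VPCTRLx byte in the program above packs two port selections. As a rough sketch, assuming the lower-numbered virtual port takes the low nibble and ports are numbered A=0, B=1, C=2, D=3 (which is consistent with the raw 0x32 written to VPCTRLB later in this thread), the byte can be computed as below. The enum and the name vpctrl_value are made up for illustration; on a real target the PORTCFG_VPxMAP_PORTx_gc constants already do this job.

```c
#include <stdint.h>

/* Hypothetical port indices, assumed to mirror the PORTCFG_VPxMAP_PORTx_gc codes. */
enum port_index { PORT_IDX_A = 0, PORT_IDX_B = 1, PORT_IDX_C = 2, PORT_IDX_D = 3 };

/* Build one VPCTRLx byte: the low nibble maps the even virtual port (VP0 or VP2),
   the high nibble maps the odd one (VP1 or VP3). */
static inline uint8_t vpctrl_value(enum port_index even_vp, enum port_index odd_vp)
{
    return (uint8_t)(((unsigned)odd_vp << 4) | ((unsigned)even_vp & 0x0Fu));
}
```

With this, vpctrl_value(PORT_IDX_C, PORT_IDX_D) gives the same byte as PORTCFG_VP2MAP_PORTC_gc|PORTCFG_VP3MAP_PORTD_gc under the stated assumption.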

The compiled code shows a large improvement with the virtual ports:

 PORTA_Virt.OUT |=PIN4_bm; // set a specific pin using virtual port.....NICE AND QUICK
 184: 8c 9a        sbi 0x11, 4 ; 17
 
 PORTA.OUTSET=PIN4_bm;  // set a pin using std port (same as PORTA_Virt, if already configured as noted below)
 186: 80 e1        ldi r24, 0x10 ; 16
 188: e0 e0        ldi r30, 0x00 ; 0
 18a: f6 e0        ldi r31, 0x06 ; 6
 18c: 85 83        std Z+5, r24 ; 0x05   .......SING A SAD DOG SONG
        // this code will compile 4x longer/slower  (3x If Z already set)!!
        
 PORTB_Virt.OUT &= ~PIN4_bm; // reset a pin using virtual port
 18e: ac 98        cbi 0x15, 4 ; 21
 
 PORTB.OUTCLR=PIN4_bm;  // reset a pin using std port
 190: e0 e2        ldi r30, 0x20 ; 32
 192: f6 e0        ldi r31, 0x06 ; 6
 194: 86 83        std Z+6, r24 ; 0x06
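
The sbi 0x11 and cbi 0x15 operands in the listing follow from where the virtual ports live in I/O space. A small sketch of that arithmetic, assuming VPORT0 starts at I/O address 0x10 and each virtual port occupies four registers in the order DIR, OUT, IN, INTFLAGS (the helper names are hypothetical):

```c
#include <stdint.h>
#include <stdbool.h>

/* Register offsets inside one virtual port (assumed order: DIR, OUT, IN, INTFLAGS). */
enum vport_reg { VPORT_REG_DIR = 0, VPORT_REG_OUT = 1, VPORT_REG_IN = 2, VPORT_REG_INTFLAGS = 3 };

/* I/O address of a register in virtual port n (0..3); assumes VPORT0 sits at 0x10. */
static inline uint8_t vport_io_addr(uint8_t n, enum vport_reg reg)
{
    return (uint8_t)(0x10u + 4u * n + (unsigned)reg);
}

/* SBI/CBI can only address I/O locations 0x00..0x1F. */
static inline bool sbi_cbi_reachable(uint8_t io_addr)
{
    return io_addr <= 0x1Fu;
}
```

Under these assumptions all four virtual ports fit below 0x20, which is why every one of their bits is reachable with a single sbi or cbi, while the real PORTx registers at 0x0600 and up are not.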

 

It would be nice to have some sort of macro to allow  "virtual compatibility" (only for a single pin) with a statement such as:

 

PORTB_Virt.OUTCLR=PIN4_bm;    //same as PORTB_Virt.OUT &= ~PIN4_bm;

 

That shouldn't be impossible, I'll scratch my head on it 

 

Of course, the standard Xmega OUTCLR allows multiple bits to be cleared simultaneously, at about the same speed/code length as the mega/tiny read (IN), modify (AND/OR), write (OUT) dance.
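
One possible shape for the "virtual compatibility" macro wished for above, as a minimal sketch. The macro names VPORT_OUTSET and VPORT_OUTCLR are invented here, and the host-side VPORT_t stand-in exists only so the snippet compiles off-target. With a single-bit compile-time constant mask the compiler should be able to reduce each macro to one sbi or cbi, but that is an expectation, not something verified here.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>              /* real VPORT_t on an XMEGA target */
#else
/* Host stand-in so the sketch compiles off-target. */
typedef struct { volatile uint8_t DIR, OUT, IN, INTFLAGS; } VPORT_t;
#endif

/* Hypothetical helpers that mimic OUTSET/OUTCLR on a virtual port. */
#define VPORT_OUTSET(vp, mask) ((vp).OUT |= (uint8_t)(mask))
#define VPORT_OUTCLR(vp, mask) ((vp).OUT &= (uint8_t)~(mask))
```

Usage would look like VPORT_OUTCLR(PORTB_Virt, PIN4_bm); note that, unlike the real OUTSET/OUTCLR registers, a multi-bit mask here turns into a non-atomic read-modify-write.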

 

 

 

  


Last Edited: Sun. Jun 28, 2015 - 08:45 PM

That shouldn't be impossible, I'll scratch my head on it 

But why bother if "PORTB_Virt.OUT &= ~PIN4_bm;" achieves the same thing? Why not just type that?!? After all it should become a single CBI.


Some people don't like typing I guess.

 

I'm wondering if the compiler would be clever enough to reduce compile-time constants down to zero RAM and a single instruction. Say you had something like:

 

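#include <avr/io.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    PORT_t  *real_port;   // used when is_virtual is false
    VPORT_t *virt_port;   // used when is_virtual is true
    uint8_t  mask;
    bool     is_virtual;
} PORT_PIN_t;

static const PORT_PIN_t led_pin = {NULL, &VPORT0, (1<<5), true};

static inline void set_pin_high(const PORT_PIN_t *pin)
{
    if (!pin->is_virtual)
        pin->real_port->OUTSET = pin->mask;
    else
        pin->virt_port->OUT |= pin->mask;
}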

 

You could probably create a load of macros to generate everything you wanted and have a nice easy "Arduino" style interface if you wanted to.

 

But yes, I agree with clawson, just type faster :-)


Interesting.



It's also worth noting that virtual ports can be slower in some circumstances. cbi/sbi can only act on one bit, whereas the OUTSET/OUTCLR registers can operate on multiple bits.


Yeah but the compiler KNOWS that. If you write code to affect 2+ bits on any AVR that has IO in SBI/CBI range the compiler will switch to IN/<and/or>/OUT - it's still faster than it doing LD/<and/or>/ST.

 

On an Xmega you should always map your four most heavily used GPIO to the VPorts.

 

In a thread previously (several years ago) I opined the wasted opportunity in Xmega - there's some IO space that could have been used for more VPorts that is just wasted :-(
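
The single-bit versus multi-bit distinction described above can be put into a tiny model. This is only a sketch: the function names are hypothetical, and the instruction counts (1 for SBI/CBI, 3 for IN / ORI or ANDI / OUT) are a rough picture for a non-zero mask on a register in SBI/CBI range, not actual compiler output.

```c
#include <stdint.h>
#include <stdbool.h>

/* True when exactly one bit is set, i.e. the update can be a single SBI/CBI. */
static inline bool is_single_bit(uint8_t mask)
{
    return mask != 0 && (mask & (uint8_t)(mask - 1)) == 0;
}

/* Rough instruction count for "reg |= mask" (non-zero mask) on a register in
   SBI/CBI range: 1 for SBI, otherwise 3 for IN / ORI / OUT. */
static inline unsigned rmw_instruction_count(uint8_t mask)
{
    return is_single_bit(mask) ? 1u : 3u;
}
```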


Hey, the compiler did a good job :-) With optimization at -O1 and a real port:

 

static inline void pin_high(const PORT_PIN_t *pin)
{
	if (!pin->is_virtual)
		pin->real_port->OUTSET = pin->mask;
 22c:	81 e0       	ldi	r24, 0x01	; 1
 22e:	e0 e0       	ldi	r30, 0x00	; 0
 230:	f6 e0       	ldi	r31, 0x06	; 6
 232:	85 83       	std	Z+5, r24	; 0x05

 

And with a virtual port:

 

static inline void pin_high(const PORT_PIN_t *pin)
{
	if (!pin->is_virtual)
		pin->real_port->OUTSET = pin->mask;
	else
		pin->virtual_port->OUT |= pin->mask;
 22c:	88 9a       	sbi	0x11, 0	; 17

 


clawson wrote:

Yeah but the compiler KNOWS that. If you write code to affect 2+ bits on any AVR that has IO in SBI/CBI range the compiler will switch to IN/<and/or>/OUT - it's still faster than it doing LD/<and/or>/ST.

 

Yes, but with IN/OUT you have to do a read-modify-write, so three or four instructions (there is no immediate EOR for toggling). It's not atomic either.

 

Quote:
In a thread previously (several years ago) I opined the wasted opportunity in Xmega - there's some IO space that could have been used for more VPorts that is just wasted :-(

 

It would be nice, certainly, as would virtual SET/CLR/TGL registers. I imagine the limit is in silicon though, with MUXes being somewhat expensive.


Yes, but with IN/OUT you have to do a read-modify-write, so three or four instructions (there is no immediate EOR for toggling). It's not atomic either.

But this is nothing to do with Xmega/Vports. It's always been like this...

$ cat avr.c
#include <avr/io.h>

int main(void) {
	PORTB |= (1 << 3);
	PORTB &= ~(1 << 5);
	PORTB |= 0xAA;
	PORTB &= ~0xAA;
}
$ avr-gcc -mmcu=atmega16 -Os -g avr.c -o avr.elf
$ avr-objdump -S avr.elf

   [SNIP!]
   
0000006c <main>:
#include <avr/io.h>

int main(void) {
	PORTB |= (1 << 3);
  6c:	c3 9a       	sbi	0x18, 3	; 24
	PORTB &= ~(1 << 5);
  6e:	c5 98       	cbi	0x18, 5	; 24
	PORTB |= 0xAA;
  70:	88 b3       	in	r24, 0x18	; 24
  72:	8a 6a       	ori	r24, 0xAA	; 170
  74:	88 bb       	out	0x18, r24	; 24
	PORTB &= ~0xAA;
  76:	88 b3       	in	r24, 0x18	; 24
  78:	85 75       	andi	r24, 0x55	; 85
  7a:	88 bb       	out	0x18, r24	; 24
}
  7c:	08 95       	ret

0000007e <_exit>:
  7e:	f8 94       	cli

00000080 <__stop_program>:
  80:	ff cf       	rjmp	.-2      	; 0x80 <__stop_program>

That's mega16 code.

 

The point of this thread surely is: you can get GPIO performance on an Xmega as good as you can get on tiny/mega *IF* you map your 4 most used ports to the VPorts.
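
As a side note on the listing above: PORTB &= ~0xAA shows up as an andi with 0x55 because the AND mask is the 8-bit complement of the bits being cleared. A minimal sketch of that step (clear_mask is a hypothetical helper name); the cast matters because ~ is evaluated on a promoted int:

```c
#include <stdint.h>

/* Mask that ANDI needs in order to clear the bits given in `bits`. */
static inline uint8_t clear_mask(uint8_t bits)
{
    /* ~bits is computed as an int after promotion, so narrow it back to 8 bits. */
    return (uint8_t)~bits;
}
```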


Sure, my point is that the XMEGA ports have OUTSET, OUTCLR and OUTTGL, which can occasionally be faster if you need atomic updates, or if you want to generate a little square wave or something like that. It's a rare corner case but still...
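
To make the atomicity point concrete, here is a host-side simulation (plain C, no hardware involved) of an interrupt landing in the middle of an IN / ORI / OUT sequence. The function names and the single variable standing in for the port latch are assumptions made purely for illustration.

```c
#include <stdint.h>

/* Simulated port latch; the "ISR" runs between the main code's read and write. */
static uint8_t simulate_rmw_interrupted(uint8_t start, uint8_t main_mask, uint8_t isr_mask)
{
    uint8_t port = start;
    uint8_t tmp = port;            /* main code: IN  */
    port |= isr_mask;              /* interrupt fires here and sets its own bit */
    tmp |= main_mask;              /* main code: ORI */
    port = tmp;                    /* main code: OUT overwrites the ISR's change */
    return port;
}

/* Same scenario with write-only set operations, as OUTSET provides. */
static uint8_t simulate_outset_interrupted(uint8_t start, uint8_t main_mask, uint8_t isr_mask)
{
    uint8_t port = start;
    port |= isr_mask;              /* ISR: OUTSET = isr_mask */
    port |= main_mask;             /* main: OUTSET = main_mask, no stale copy involved */
    return port;
}
```

In the first function the bit set by the simulated ISR is lost; in the second both bits survive.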


mojo-chan wrote:

static inline void pin_high(const PORT_PIN_t *pin)
{
	if (!pin->is_virtual)
		pin->real_port->OUTSET = pin->mask;
 22c:	81 e0       	ldi	r24, 0x01	; 1
 22e:	e0 e0       	ldi	r30, 0x00	; 0
 230:	f6 e0       	ldi	r31, 0x06	; 6
 232:	85 83       	std	Z+5, r24	; 0x05

 

What I don't get is why the compiler bothers going through Z in the first place (when it is used as a constant, that is).

This would do the same, in fewer cycles and in less code space:

ldi  r24, 0x01
sts  0x0605, r24

I never bothered with virtual ports. I find outclr/outset of greater value to me, avoiding the problems with non-atomic read-modify-writes.

What I like about XMega and bit-addressing, is the gpio registers at address 0-15. They are great for boolean flags. Setting and testing bits there is fast and atomic.

 


It uses Z if the same (or other close) destination is accessed several times in the same function. It then saves code. Observe:

$ cat avr.c
#include <avr/io.h>

int main(void) {
	UDR0 = 0x55;
}
$ avr-gcc -mmcu=atmega168 -Os -g avr.c -o avr.elf
$ avr-objdump -S avr.elf | tail -n 16

00000080 <main>:
#include <avr/io.h>

int main(void) {
	UDR0 = 0x55;
  80:	85 e5       	ldi	r24, 0x55	; 85
  82:	80 93 c6 00 	sts	0x00C6, r24
}
  86:	08 95       	ret

00000088 <_exit>:
  88:	f8 94       	cli

0000008a <__stop_program>:
  8a:	ff cf       	rjmp	.-2      	; 0x8a <__stop_program>

but:

$ cat avr.c
#include <avr/io.h>

int main(void) {
	UDR0 = 0x55;
	UDR0 = 0xAA;
	UDR0 = 0xFF;
	UDR0 = 0x00;
}
$ avr-gcc -mmcu=atmega168 -Os -g avr.c -o avr.elf
$ avr-objdump -S avr.elf | tail -n 22
int main(void) {
	UDR0 = 0x55;
  80:	e6 ec       	ldi	r30, 0xC6	; 198
  82:	f0 e0       	ldi	r31, 0x00	; 0
  84:	85 e5       	ldi	r24, 0x55	; 85
  86:	80 83       	st	Z, r24
	UDR0 = 0xAA;
  88:	8a ea       	ldi	r24, 0xAA	; 170
  8a:	80 83       	st	Z, r24
	UDR0 = 0xFF;
  8c:	8f ef       	ldi	r24, 0xFF	; 255
  8e:	80 83       	st	Z, r24
	UDR0 = 0x00;
  90:	10 82       	st	Z, r1
}
  92:	08 95       	ret

00000094 <_exit>:
  94:	f8 94       	cli

00000096 <__stop_program>:
  96:	ff cf       	rjmp	.-2      	; 0x96 <__stop_program>
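
The trade-off in the two listings above can be approximated with a simple word-count model. This is a sketch under stated assumptions: LDI and ST are one flash word each, STS is two, every store needs its own LDI, and the r1 zero-register shortcut seen in the last store is ignored. The function names are hypothetical.

```c
/* Flash words for n constant stores to one address using LDI + STS
   (1 word + 2 words per store). */
static inline unsigned words_direct_sts(unsigned n)
{
    return n * 3u;
}

/* Same stores through Z: two LDIs to load Z once, then LDI + ST (1 + 1) per store. */
static inline unsigned words_via_z(unsigned n)
{
    return 2u + n * 2u;
}
```

In this model a single store is cheaper with STS (3 words against 4), the two approaches tie at two stores, and Z wins from three stores onward, which matches the direction of what the compiler chose in both cases.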

 


Since the ATmega328 needs 2 clocks to change a pin output with something like PINA=1, I was disappointed to see that the ATxmega16A4 needs 4 clocks for something like PORTA.OUTTGL=1.

 

My question is: what is the 2-clock code for the ATxmega to toggle an output pin?

 

I thought virtual ports with direct addressing might be faster, which is why I am asking here.


There is no 2-cycle toggle for an arbitrary IO pin on the XMEGA.

Your example of PORTA.OUTTGL=1 only takes 3 cycles.

3 cycles @ 32MHz is 94ns, which is less than 2 cycles @ 20MHz (100ns).

So I'd say the XMEGA is still faster than the MEGA.

 

If you truly need 2-cycle or better pulse timing on the XMEGA, start a new thread explaining what you're trying to achieve.

There are multiple tricks using optimizations & peripherals to achieve precise timings.
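
The cycle-to-time comparison in this post is plain arithmetic; a minimal helper (the name cycles_to_ns is made up here) makes it easy to redo for other clock speeds:

```c
/* Duration of `cycles` CPU cycles at clock `f_cpu_hz`, in nanoseconds. */
static inline double cycles_to_ns(unsigned cycles, double f_cpu_hz)
{
    return (double)cycles * 1e9 / f_cpu_hz;
}
```

For example, cycles_to_ns(3, 32e6) and cycles_to_ns(2, 20e6) give the two durations compared above.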


With the ATxmega16A4 and ATxmega128A1 I measure 8 MHz with this code:

PORTA.OUTTGL=1;PORTA.OUTTGL=1;PORTA.OUTTGL=1;................100 times

With my UNO (16 MHz crystal): PINA=1;PINA=1;PINA=1;........100 times

I also measure 8 MHz!

 

How can I check the XMEGA clock? It should be 32 MHz with this:

 

// Use 2Mhz internal RC with 16x PLL

  OSC.PLLCTRL = OSC_PLLSRC_RC2M_gc | 16 ;
  OSC.CTRL |= OSC_PLLEN_bm ; // enable the PLL...
  while( (OSC.STATUS & OSC_PLLRDY_bm) == 0 ){} // wait until it's stable

  // And now we can (finally) switch to the PLL as a clocksource.
  CCP = CCP_IOREG_gc;       // protected write follows   
  CLK.CTRL = CLK_SCLKSEL_PLL_gc;  // The System clock is now the PLL output (2MhzRC * 16)


Surely you'd be better off mapping to the VPORTs (the subject of this thread)? It will require an RMW, so that may sap some of the performance gained in IO space.


You can do everything in one cycle with VPORTs.  SET, CLR or TGL.

 

Xmega does SBI in one cycle.   Don't ask me how.   Mega takes 2 cycles.

 

David.

 


Are you sure VPORTs support the extra SET/CLR/TGL? I thought each VPORT location only duplicated 4 of the port's base registers.


No, you don't say PORTA.OUTSET = (1<<bit)

You say VPORT1.OUT |= (1<<bit) after assigning PORTA to VPORT1

 

The Compiler produces an SBI instruction because VPORTs are all in IN/OUT memory.

 

Xmegas have got lots of PORTs.  So you have to choose the "best" ones to access with the 4 possible VPORTs.

The XTinys have only got a few PORTs so you get VPORTs automagically.

 

David.


He's talking about TGL. I don't think the reduced VPORT group (certainly a subset of 4 in original Xmega) has TGL in the group. Maybe these newer ones do?


VPORT1.IN = (1<<bit) compiles as an OUT instruction.   Which is one cycle even on a Mega.


My measurement was right but my interpretation was wrong.

ATmega328P PINA needs 1 clock

Xmega PORTA.OUTTGL needs >= 2 clocks

So my question, following what David wrote, is: how do I get a 1-clock instruction to change a pin?

 

I was looking to the Xmega for more speed in a hardware application, but I now realize it offers no advantage!

 

Thanks for your answer

John
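
The two 8 MHz readings line up once you remember that a full square-wave period takes two toggles. A small sketch of that relationship (square_wave_hz is a hypothetical name, and the cycle counts per toggle are the ones quoted in this post, not independently measured):

```c
#include <stdint.h>

/* Frequency of the square wave produced by toggling a pin every
   `cycles_per_toggle` CPU cycles; one full period needs two toggles. */
static inline uint32_t square_wave_hz(uint32_t f_cpu_hz, uint32_t cycles_per_toggle)
{
    return f_cpu_hz / (2u * cycles_per_toggle);
}
```

A 16 MHz part toggling every cycle and a 32 MHz part toggling every 2 cycles both come out at 8 MHz; a 1-cycle toggle at 32 MHz would give 16 MHz.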

 

 


Go on.   Xmega is brilliant for GPIO.   One cycle for changing a single bit.    You can toggle 1-8 bits in one cycle with any AVR.   Since Xmega is clocked faster than a regular Mega you are always faster per cycle.

 

Yes,   you have to choose the best 4 PORTs to use with VPORT0-VPORT3 on an Xmega.

XTinys have VPORTA-VPORTD always available.

 

Xmega beats Cortex-M0 hands down.   And compares with Cortex-M3 and Cortex-M4 at faster clocks, at least for an external 8-bit bus, e.g. 8080-8 or 6800-8.

 

David.

Last Edited: Thu. Jun 11, 2020 - 07:38 PM

VPORT1.IN = (1<<bit) does in fact compile as an OUT instruction. But it does not do anything to the IO pin. The datasheet shows VPORTx.IN as R/W, which is apparently a typo. I just tested it on a 16A4U and the scope showed nothing.


The XMEGA has tremendous advantages over the MEGA. I'd say the most important are DMA, multi-level interrupts and the event system. Combining DMA with SPI you can get arbitrary signals with a resolution of 62.5ns; That's how I drive WS2812 LEDs.

 

As I suggested before, start a new thread describing what you're trying to achieve.


My apologies.   I had used VPORT.OUT and VPORT.DIR but VPORT.IN only for reading.   Writing to VPORT.IN does not toggle the pin.

 

#define F_CPU 2000000uL

#include <avr/io.h>
#include <util/delay.h>

#define MASK (1<<4)        // wiggle PC4 pin

int main(void)
{
    PORTCFG.VPCTRLB = 0x32; //VPORT2=PORTC, VPORT3=PORTD
    VPORT2.DIR |= MASK;
    while (1) 
    {
        VPORT2.OUT |= MASK;  // 1 cycle
        VPORT2.OUT &= ~MASK; // 1 cycle
        VPORT2.OUT |= MASK;
        VPORT2.OUT &= ~MASK;
        VPORT2.OUT |= MASK;
        VPORT2.OUT &= ~MASK;
        asm("nop");
        VPORT2.IN = MASK;    // 1 cycle but it does not work
        VPORT2.IN = MASK;
        VPORT2.IN = MASK;
        VPORT2.IN = MASK;
        asm("nop");
        PORTC.OUTSET = MASK; // 2 cycle
        PORTC.OUTCLR = MASK;
        PORTC.OUTSET = MASK;
        PORTC.OUTCLR = MASK;
        asm("nop");
        PORTC.OUTTGL = MASK; // 2 cycle
        PORTC.OUTTGL = MASK;
        PORTC.OUTTGL = MASK;
        PORTC.OUTTGL = MASK;
        asm("nop");
    }
}

In practice,  you would simply read the current state of the latch with VPORT.IN and use subsequent SBI, CBI instructions.

Since SBI, CBI are 1 cycle on an Xmega there is no advantage in a 1 cycle write to the IN register.

 

David.


Thanks David,

now I can toggle at 16 MHz with virtual ports.

 

I don't understand your VPORT2.IN=..; I think a read is written as x=VPORT2.IN.

I tested a virtual read with your sketch and it works:

#define readpin (1<<5)        //PC5 pin  connect with MASK

   

while(1)  //toggle Mask-Pin till Virtual readError    (Scope-Test)
    {
        VPORT2.OUT|= MASK;  asm("NOP"); asm("NOP");      //2 NOPS needed ! ! !
        if(VPORT2.IN & readpin) {VPORT2.OUT &=~MASK;  asm("NOP");  asm("NOP");  if(VPORT2.IN & readpin) break;} else break;
    }

/*
    while(1) //toggle Mask-Pin till readError    (Scope-Test)
    {
        PORTC.OUT|= MASK;asm("NOP");   //1NOP needed
        if(PORTC.IN & readpin) {PORTC.OUT &=~MASK;asm("NOP");if(PORTC.IN & readpin) break;} else break;
    }
*/

Do you know whether the NOPs can be reduced by configuring the read pin via PINnCTRL (Pin n Configuration register)?

(The read pin is connected only to the MASK pin.)

 

John


avrfreaks123123 wrote:

now I can toggle at 16 MHz with virtual ports.

Yes, an SBI / CBI pair will be 2 cycles, i.e. a 16MHz square wave when F_CPU = 32MHz.

But you will need to put all the statements in an unrolled loop to sustain a steady 16MHz.

 

You should use the hardware if you "want" 16MHz.

 

I don't understand your VPORT2.IN=..

 

On a Mega or Tiny you can toggle the port pins by writing to PINC e.g. PINC = (1<<5) will toggle PC5 pin.

I assumed erroneously that VPORT2.IN would work the same way.

 

Yes,   you probably need a NOP before reading with VPORT2.IN after writing with VPORT2.OUT

If you want to toggle pins you generally know the initial state of the pin.

 

David.
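
Since writing to VPORTx.IN does not toggle, one way to act on "you generally know the initial state" is to track the level in software and pick the set or clear operation accordingly. This is only a sketch: vport_toggle_known is a hypothetical helper, the host-side VPORT_t stand-in exists so it compiles off-target, and a single sbi/cbi can only be expected when the call is inlined with a constant port and a constant single-bit mask.

```c
#include <stdint.h>
#include <stdbool.h>

#ifdef __AVR__
#include <avr/io.h>              /* real VPORT_t on an XMEGA target */
#else
/* Host stand-in so the sketch compiles off-target. */
typedef struct { volatile uint8_t DIR, OUT, IN, INTFLAGS; } VPORT_t;
#endif

/* Hypothetical helper: flip a pin whose current level the caller tracks. */
static inline bool vport_toggle_known(VPORT_t *vp, uint8_t mask, bool is_high)
{
    if (is_high)
        vp->OUT &= (uint8_t)~mask;   /* was high: clear it */
    else
        vp->OUT |= mask;             /* was low: set it */
    return !is_high;                 /* new level, for the caller to remember */
}
```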

 

 

Last Edited: Sat. Jun 13, 2020 - 10:59 PM