Xmega virtual ports---definitely use them for manipulating single bits!


I was surprised by (or had simply been blissfully ignoring) how much more efficient the Xmega virtual ports are when dealing with single bits: the code is both smaller and faster.

This is because the virtual ports live in the low I/O space, which allows the compiler to use sbi & cbi & in & out


#include <avr/io.h>

#define PORTA_Virt  VPORT0 // just a naming substitution for clarity
#define PORTB_Virt  VPORT1

int main(void){

 // NOTE: the port directions (.DIR) have not been configured, but need to be for actual usage

 // Map the virtual ports BEFORE making any virtual pin changes; assign them explicitly rather than relying on the reset state
 PORTCFG.VPCTRLA=PORTCFG_VP0MAP_PORTA_gc|PORTCFG_VP1MAP_PORTB_gc; // assign the virtual ports A==>VP0, B==>VP1
 PORTCFG.VPCTRLB=PORTCFG_VP2MAP_PORTC_gc|PORTCFG_VP3MAP_PORTD_gc; // assign the virtual ports C==>VP2, D==>VP3

 PORTA_Virt.OUT |=PIN4_bm; // set a specific pin using virtual port
 PORTA.OUTSET=PIN4_bm;  // set the same pin using the std port (equivalent once VP0 is mapped to PORTA)
        // this compiles to 4x the code of the virtual port version (3x if Z is already set)!!

 PORTB_Virt.OUT &= ~PIN4_bm; // reset a pin using virtual port
 PORTB.OUTCLR=PIN4_bm;  // reset a pin using std port
}

The compiled code shows a large improvement with the virtual ports:

 PORTA_Virt.OUT |=PIN4_bm; // set a specific pin using virtual port.....NICE AND QUICK
 184: 8c 9a        sbi 0x11, 4 ; 17
 
 PORTA.OUTSET=PIN4_bm;  // set the same pin using the std port (equivalent once VP0 is mapped to PORTA)
 186: 80 e1        ldi r24, 0x10 ; 16
 188: e0 e0        ldi r30, 0x00 ; 0
 18a: f6 e0        ldi r31, 0x06 ; 6
 18c: 85 83        std Z+5, r24 ; 0x05   .......SING A SAD DOG SONG
        // this code will compile 4x longer/slower  (3x If Z already set)!!
        
 PORTB_Virt.OUT &= ~PIN4_bm; // reset a pin using virtual port
 18e: ac 98        cbi 0x15, 4 ; 21
 
 PORTB.OUTCLR=PIN4_bm;  // reset a pin using std port
 190: e0 e2        ldi r30, 0x20 ; 32
 192: f6 e0        ldi r31, 0x06 ; 6
 194: 86 83        std Z+6, r24 ; 0x06

 

It would be nice to have some sort of macro to allow  "virtual compatibility" (only for a single pin) with a statement such as:

 

PORTB_Virt.OUTCLR=PIN4_bm;    //same as PORTB_Virt.OUT &= ~PIN4_bm;

 

That shouldn't be impossible, I'll scratch my head on it 

 

Of course, the standard Xmega OUTCLR allows multiple bits to be cleared simultaneously, in about the same time/code length as the mega/tiny read (IN), modify (AND/OR), write (OUT) dance.
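
One way the single-pin "virtual compatibility" idea could look is a pair of function-like macros. This is only a sketch: the names VPORT_OUTSET and VPORT_OUTCLR are made up here (they are not part of avr-libc), and the non-AVR branch is just a stand-in so the macros can be tried on a PC. With a constant single-bit mask, avr-gcc should reduce each macro to one sbi or cbi, as in the listing above; with a multi-bit or run-time mask it would fall back to in/modify/out.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>            /* real VPORT0..VPORT3 and PINn_bm definitions */
#else
/* Host-side stand-in (assumption) so the sketch can be tried off-target. */
typedef struct { volatile uint8_t DIR, OUT, IN, INTFLAGS; } VPORT_t;
static VPORT_t VPORT1;
#define PIN4_bm 0x10
#endif

/* Hypothetical helper names, not part of avr-libc.
 * Intended for a single, compile-time constant pin mask. */
#define VPORT_OUTSET(vport, pin_bm) ((vport).OUT |= (uint8_t)(pin_bm))
#define VPORT_OUTCLR(vport, pin_bm) ((vport).OUT &= (uint8_t)~(pin_bm))
```

Usage would then read VPORT_OUTCLR(PORTB_Virt, PIN4_bm); which is not quite the PORTB_Virt.OUTCLR=PIN4_bm; syntax wished for above, but keeps the set/clear intent visible at the call site.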

 

 

 

  


Last Edited: Sun. Jun 28, 2015 - 08:45 PM

Quote:
That shouldn't be impossible, I'll scratch my head on it

But why bother if "PORTB_Virt.OUT &= ~PIN4_bm;" achieves the same thing? Why not just type that?!? After all it should become a single CBI.


Some people don't like typing I guess.

 

I'm wondering if the compiler would be clever enough to reduce compile-time constants down to zero RAM and a single instruction. Say you had something like:

 

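#include <avr/io.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    PORT_t  *real_port;     // full port (OUTSET/OUTCLR), used when is_virtual is false
    VPORT_t *virtual_port;  // virtual port in I/O space, used when is_virtual is true
    uint8_t  mask;          // bit mask of the pin
    bool     is_virtual;
} PORT_PIN_t;

static const PORT_PIN_t led_pin = {NULL, &VPORT0, (1<<5), true};

static inline void set_pin_high(const PORT_PIN_t *pin)
{
    if (!pin->is_virtual)
        pin->real_port->OUTSET = pin->mask;
    else
        pin->virtual_port->OUT |= pin->mask;
}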

 

You could probably create a load of macros to generate everything you wanted and have a nice easy "Arduino" style interface if you wanted to.

 

But yes, I agree with clawson, just type faster :-)


Interesting.


It's also worth noting that virtual ports can be slower in some circumstances. cbi/sbi can only act on one bit, whereas the OUTSET/OUTCLR registers can operate on multiple bits.


Yeah but the compiler KNOWS that. If you write code to affect 2+ bits on any AVR that has IO in SBI/CBI range the compiler will switch to IN/<and/or>/OUT - it's still faster than it doing LD/<and/or>/ST.

 

On an Xmega you should always map your four most heavily used GPIO to the VPorts.
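
As a rough illustration of choosing the mapping: each VPCTRL register holds two 4-bit port selectors, so the register value can be built from port indices. The helper name and enum below are hypothetical; the index values (PORTA = 0, PORTB = 1, ...) are assumed to match the PORTCFG_VPnMAP_PORTx_gc group configurations in the device header, so check them against your header before relying on them.

```c
#include <stdint.h>

/* Hypothetical port indices; assumed to match the 4-bit selector
 * values behind PORTCFG_VPnMAP_PORTx_gc (PORTA = 0, PORTB = 1, ...). */
enum vp_port { VP_PORTA = 0, VP_PORTB, VP_PORTC, VP_PORTD, VP_PORTE, VP_PORTF };

/* Pack two selectors into one VPCTRLA/VPCTRLB value:
 * low nibble  = even virtual port (VP0 or VP2),
 * high nibble = odd virtual port  (VP1 or VP3). */
static inline uint8_t vpctrl_value(enum vp_port even_vp, enum vp_port odd_vp)
{
    return (uint8_t)(((uint8_t)odd_vp << 4) | ((uint8_t)even_vp & 0x0F));
}
```

On the target this might be used as PORTCFG.VPCTRLA = vpctrl_value(VP_PORTC, VP_PORTE); to put PORTC on VPORT0 and PORTE on VPORT1, if those happen to be the busiest ports.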

 

In a previous thread (several years ago) I lamented the wasted opportunity in Xmega - there's some IO space that could have been used for more VPorts that is just wasted :-(


Hey, the compiler did a good job :-) With optimization at -O1 and a real port:

 

static inline void pin_high(const PORT_PIN_t *pin)
{
	if (!pin->is_virtual)
		pin->real_port->OUTSET = pin->mask;
 22c:	81 e0       	ldi	r24, 0x01	; 1
 22e:	e0 e0       	ldi	r30, 0x00	; 0
 230:	f6 e0       	ldi	r31, 0x06	; 6
 232:	85 83       	std	Z+5, r24	; 0x05

 

And with a virtual port:

 

static inline void pin_high(const PORT_PIN_t *pin)
{
	if (!pin->is_virtual)
		pin->real_port->OUTSET = pin->mask;
	else
		pin->virtual_port->OUT |= pin->mask;
 22c:	88 9a       	sbi	0x11, 0	; 17

 


clawson wrote:

Yeah but the compiler KNOWS that. If you write code to affect 2+ bits on any AVR that has IO in SBI/CBI range the compiler will switch to IN/<and/or>/OUT - it's still faster than it doing LD/<and/or>/ST.

 

Yes, but with IN/OUT you have to do a read-modify-write, so three or four instructions (there is no immediate EOR for toggling). It's not atomic either.
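
The atomicity point can be illustrated with a small host-side model (plain C, not AVR code). It assumes an interrupt fires between the IN and the OUT of a read-modify-write and sets its own bit in the same port; the function names are invented for this sketch. With a write-only OUTSET style register the main code never writes back a stale copy, so the interrupt's bit survives.

```c
#include <stdint.h>

/* Model of IN / ORI / OUT with an interrupt landing after the IN. */
static uint8_t rmw_set_with_interrupt(uint8_t port, uint8_t main_mask, uint8_t isr_mask)
{
    uint8_t tmp = port;          /* IN:  main code reads the port          */
    port |= isr_mask;            /* interrupt runs here and sets its bit   */
    tmp |= main_mask;            /* ORI: main code modifies its stale copy */
    port = tmp;                  /* OUT: stale copy overwrites the port    */
    return port;
}

/* Model of a single store to OUTSET: only bits written as 1 are changed. */
static uint8_t outset_with_interrupt(uint8_t port, uint8_t main_mask, uint8_t isr_mask)
{
    port |= isr_mask;            /* interrupt sets its bit                  */
    port |= main_mask;           /* one store to OUTSET sets the main bits  */
    return port;
}
```

In this model, starting from 0x00 with the main code setting 0x03 and the interrupt setting 0x80, the read-modify-write path ends at 0x03 (the interrupt's bit is lost) while the OUTSET path ends at 0x83.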

 

Quote:
In a previous thread (several years ago) I lamented the wasted opportunity in Xmega - there's some IO space that could have been used for more VPorts that is just wasted :-(

 

It would be nice, certainly, as would virtual SET/CLR/TGL registers. I imagine the limit is in silicon though, with MUXes being somewhat expensive.


Quote:
Yes, but with IN/OUT you have to do a read-modify-write, so three or four instructions (there is no immediate EOR for toggling). It's not atomic either.

But this is nothing to do with Xmega/Vports. It's always been like this...

$ cat avr.c
#include <avr/io.h>

int main(void) {
	PORTB |= (1 << 3);
	PORTB &= ~(1 << 5);
	PORTB |= 0xAA;
	PORTB &= ~0xAA;
}
$ avr-gcc -mmcu=atmega16 -Os -g avr.c -o avr.elf
$ avr-objdump -S avr.elf

   [SNIP!]
   
0000006c <main>:
#include <avr/io.h>

int main(void) {
	PORTB |= (1 << 3);
  6c:	c3 9a       	sbi	0x18, 3	; 24
	PORTB &= ~(1 << 5);
  6e:	c5 98       	cbi	0x18, 5	; 24
	PORTB |= 0xAA;
  70:	88 b3       	in	r24, 0x18	; 24
  72:	8a 6a       	ori	r24, 0xAA	; 170
  74:	88 bb       	out	0x18, r24	; 24
	PORTB &= ~0xAA;
  76:	88 b3       	in	r24, 0x18	; 24
  78:	85 75       	andi	r24, 0x55	; 85
  7a:	88 bb       	out	0x18, r24	; 24
}
  7c:	08 95       	ret

0000007e <_exit>:
  7e:	f8 94       	cli

00000080 <__stop_program>:
  80:	ff cf       	rjmp	.-2      	; 0x80 <__stop_program>

That's mega16 code.

 

The point of this thread surely is: you can get GPIO performance on an Xmega as good as you can get on tiny/mega *IF* you map your 4 most used ports to the VPorts.


Sure, my point is that the XMEGA ports have OUTSET, OUTCLR and OUTTGL, which can occasionally be faster if you need atomic updates, or if you want to generate a little square wave or something like that. It's a rare corner case but still...


mojo-chan wrote:

static inline void pin_high(const PORT_PIN_t *pin)
{
	if (!pin->is_virtual)
		pin->real_port->OUTSET = pin->mask;
 22c:	81 e0       	ldi	r24, 0x01	; 1
 22e:	e0 e0       	ldi	r30, 0x00	; 0
 230:	f6 e0       	ldi	r31, 0x06	; 6
 232:	85 83       	std	Z+5, r24	; 0x05

 

What I don't get is why the compiler bothers going through Z in the first place (when it is used as a constant, that is).

This would do the same, in fewer cycles and less code space:

ldi  r24, 0x01
sts  0x0605, r24

I never bothered with virtual ports. I find outclr/outset of greater value to me, avoiding the problems with non-atomic read-modify-writes.

What I like about XMega and bit-addressing, is the gpio registers at address 0-15. They are great for boolean flags. Setting and testing bits there is fast and atomic.
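
A minimal sketch of the flag idea, assuming the first general purpose register is exposed as GPIO_GPIOR0 (the exact name differs between XMEGA device headers, so treat it as an assumption). The flag names and helpers are invented for illustration, and the non-AVR branch only exists so the logic can be exercised on a PC. With constant single-bit masks the set and clear helpers should compile down to sbi and cbi.

```c
#include <stdint.h>
#include <stdbool.h>

#ifdef __AVR__
#include <avr/io.h>
#define FLAG_REG GPIO_GPIOR0          /* assumption: name varies by device header */
#else
static volatile uint8_t FLAG_REG;     /* host-side stand-in for the register */
#endif

/* Hypothetical application flags, one bit each. */
#define FLAG_RX_READY  (1u << 0)
#define FLAG_TICK      (1u << 1)

static inline void flag_set(uint8_t flag)    { FLAG_REG |= flag; }
static inline void flag_clear(uint8_t flag)  { FLAG_REG &= (uint8_t)~flag; }
static inline bool flag_is_set(uint8_t flag) { return (FLAG_REG & flag) != 0; }
```

An interrupt handler could call flag_set(FLAG_TICK); and the main loop could poll flag_is_set(FLAG_TICK) and then flag_clear(FLAG_TICK), without needing to disable interrupts around the single-bit updates.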

 


It uses Z if the same (or another nearby) destination is accessed several times in the same function, because that saves code. Observe:

$ cat avr.c
#include <avr/io.h>

int main(void) {
	UDR0 = 0x55;
}
$ avr-gcc -mmcu=atmega168 -Os -g avr.c -o avr.elf
$ avr-objdump -S avr.elf | tail -n 16

00000080 <main>:
#include <avr/io.h>

int main(void) {
	UDR0 = 0x55;
  80:	85 e5       	ldi	r24, 0x55	; 85
  82:	80 93 c6 00 	sts	0x00C6, r24
}
  86:	08 95       	ret

00000088 <_exit>:
  88:	f8 94       	cli

0000008a <__stop_program>:
  8a:	ff cf       	rjmp	.-2      	; 0x8a <__stop_program>

but:

$ cat avr.c
#include <avr/io.h>

int main(void) {
	UDR0 = 0x55;
	UDR0 = 0xAA;
	UDR0 = 0xFF;
	UDR0 = 0x00;
}
$ avr-gcc -mmcu=atmega168 -Os -g avr.c -o avr.elf
$ avr-objdump -S avr.elf | tail -n 22
int main(void) {
	UDR0 = 0x55;
  80:	e6 ec       	ldi	r30, 0xC6	; 198
  82:	f0 e0       	ldi	r31, 0x00	; 0
  84:	85 e5       	ldi	r24, 0x55	; 85
  86:	80 83       	st	Z, r24
	UDR0 = 0xAA;
  88:	8a ea       	ldi	r24, 0xAA	; 170
  8a:	80 83       	st	Z, r24
	UDR0 = 0xFF;
  8c:	8f ef       	ldi	r24, 0xFF	; 255
  8e:	80 83       	st	Z, r24
	UDR0 = 0x00;
  90:	10 82       	st	Z, r1
}
  92:	08 95       	ret

00000094 <_exit>:
  94:	f8 94       	cli

00000096 <__stop_program>:
  96:	ff cf       	rjmp	.-2      	; 0x96 <__stop_program>
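
The size trade-off in the two listings above can be put into a back-of-the-envelope model. The sketch below is a simplification under stated assumptions, not how GCC actually decides: it charges one word per ldi, two words per sts, one word per st Z, and two words to load Z once. The function names are made up for this example.

```c
/* Words of flash for n constant stores to one address via LDI + STS. */
static unsigned words_direct(unsigned n_stores)
{
    return n_stores * (1u + 2u);      /* ldi (1 word) + sts (2 words) each */
}

/* Words of flash for the same stores through a preloaded Z pointer. */
static unsigned words_via_z(unsigned n_stores)
{
    if (n_stores == 0)
        return 0;                     /* nothing to store, no Z setup needed */
    return 2u + n_stores * (1u + 1u); /* ldi r30 + ldi r31 once, then ldi + st Z each */
}
```

Under this model a single store costs 3 words directly versus 4 through Z, two stores tie at 6 words, and four stores cost 12 versus 10. The real four-store listing is 9 words because the final zero is stored straight from r1 without an ldi.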