USB performance on XMEGA

I've been experimenting with a minimal USB stack based on the nonolith/Kevin Mehall one. I made a few fixes for XMEGA and successfully implemented a bootloader.

 

In benchmarks sending a 64 byte buffer via a 64 byte bulk endpoint, the maximum transfer rate is limited to 1.9Mb/sec. That's pretty far from the 12Mb bus speed. Okay, 12Mb includes all the protocol overhead, coding and the like, but I'm a little disappointed with 2Mb.
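
For a rough idea of where the ceiling sits, here is a back-of-envelope sketch. It assumes the USB 2.0 specification figure of at most 19 bulk transactions of 64 bytes per 1 ms full-speed frame on an otherwise idle bus; the macro and function names are made up for illustration.

```c
#include <stdint.h>

/* Assumption: USB 2.0 full speed fits at most 19 bulk transactions
 * of 64 bytes into one 1 ms frame when nothing else uses the bus. */
#define FS_FRAMES_PER_SEC      1000UL
#define FS_MAX_BULK_PER_FRAME  19UL
#define FS_BULK_PACKET_SIZE    64UL

/* Best-case bulk payload rate in bits per second. */
uint32_t fs_bulk_max_bits_per_sec(void)
{
    return (uint32_t)(FS_FRAMES_PER_SEC * FS_MAX_BULK_PER_FRAME *
                      FS_BULK_PACKET_SIZE * 8UL);
}

/* Average number of full 64 byte packets per frame that a given
 * payload rate (in bits per second) corresponds to. */
double packets_per_frame(double bits_per_sec)
{
    return bits_per_sec / 8.0 / FS_BULK_PACKET_SIZE / FS_FRAMES_PER_SEC;
}
```

Under that assumption the best case is about 9.7Mb/sec of payload, and 1.9Mb/sec corresponds to fewer than 4 packets per frame out of a possible 19.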

 

I did a similar experiment with the ASF stack and got the same 1.9Mb/sec.

 

Has anyone managed to get over 2Mb/sec from an XMEGA? I'm thinking that this might be the limit for the chip. Maybe ARM can go faster.


mojo-chan wrote:
I'm thinking that this [1.9Mb/s] might be the limit for the chip.
That would be surprising, since a USB megaAVR manages approx. 8Mb/sec.

PJRC

C code for Teensy: USB Serial

USB: Virtual Serial Port

https://www.pjrc.com/teensy/usb_serial.html

(page end)

Transmit Bandwidth Benchmark

...

 

"Dare to be naïve." - Buckminster Fuller


Hmm... I must investigate, maybe it needs ping-pong mode or something. Or larger packets.


mojo-chan wrote:

 

Hmm... I must investigate, maybe it needs ping-pong mode or something. Or larger packets.

That ping-pong thing looks like just an extra complication to me. Sending bigger messages will increase the throughput. With 1023 byte messages you should get close to 8 Mb/s.

 

I'm assuming you are just sending data, not programming the chip. Writing and reading flash is the major speed limit. I can write 52kB of flash in 1.8 seconds. If I erased the entire application flash at once, it might be faster. I can wait 1.8 seconds though. Reading it back takes 0.31 seconds.

 

I use 256 byte blocks.  I know those morons at Microsoft won't guarantee to send an end of message indication, but I don't care.  I've been using this bootloader for a year and that stupid thing hasn't hit yet. 

 

I wish someone would fix the Microsoft USB CDC driver.    

 


Using multipacket transfer mode with a 1024 byte buffer I'm still only getting 6.5Mb/sec.

 

It's literally just the following loop:

 

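	uint8_t buffer[BUFFER_SIZE];	// BUFFER_SIZE is 1024 in this test
	for(;;)
	{
		e->DATAPTR = (uint16_t)buffer;	// start of the data to send
		e->AUXDATA = 0;	// for multi-packet: reset the count of bytes already sent
		e->CNT = BUFFER_SIZE;// | USB_EP_ZLP_bm;
		// clear the NACK, completion and overflow flags to start the transfer
		LACR16(&(e->STATUS), USB_EP_BUSNACK0_bm | USB_EP_TRNCOMPL0_bm | USB_EP_OVF_bm);
		// busy-wait until the whole buffer has gone out
		while((e->STATUS & USB_EP_TRNCOMPL0_bm) == 0)
			;
	}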

 

That's probably close to the maximum amount of data I can prepare to send anyway. 1GB of data in about 21 minutes, which means my test will take a day or two.
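
The 21 minute figure can be checked with a one-line calculation. This is just a sketch of the arithmetic, assuming 1GB means 10^9 bytes and that the link sustains 6.5Mb/sec throughout; `transfer_seconds` is a name made up for illustration.

```c
/* Seconds needed to move `bytes` of payload at `bits_per_sec`. */
double transfer_seconds(double bytes, double bits_per_sec)
{
    return bytes * 8.0 / bits_per_sec;
}
```

`transfer_seconds(1e9, 6.5e6)` comes out at roughly 1230 seconds, a little over 20 minutes, which agrees with the estimate.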


I think the maximum transfer byte count is 1023 bytes.

 

The byte count field is 10 bits.  You can specify any byte count from 0 to 1023.  Zero means no data and it is a legitimate and useful thing, also known as a ZLP.

 


Will your bootloader fit into 4k of flash?


I'll check on Monday... 1024 seemed to work while 1023 did not, at least without the ZLP. I didn't experiment any further because I was busy eating hog roast :)

 

The HID bootloader won't fit in 4k, the ASF is too large. The other one, the one that uses bulk endpoints and the Microsoft extended descriptors to avoid needing a driver (also Linux compatible) will fit in 4k though. It nearly goes in 2k, I might try to get it down to that level one day.


By the way, I need the extra speed for a different project, not the bootloader :-)


Interesting stuff. 

 

If you are using some kind of driver to feed data to USB, it may be smart enough to chop up data buffers that are bigger than 1023 bytes and send them in smaller chunks.

 

When I'm testing throughput, I don't usually use 1023.  I do often use 1024 - 64.

 

You may know this already but I'll mention it. The receiving side can recognize the end of a message by a short packet. That's a data packet that isn't full. In this case that would be a packet containing fewer than 64 bytes. The receiving end will see a short packet when you send 1023 bytes, but not when you send 1024 - 64, or any size that's a multiple of 64.
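
The short packet rule above can be sketched in a few lines. This assumes a 64 byte endpoint as in this thread; `EP_SIZE` and the function names are made up for illustration and don't come from any particular stack.

```c
#include <stdbool.h>
#include <stdint.h>

#define EP_SIZE 64u   /* assumed bulk endpoint size */

/* Number of data packets needed for `len` bytes, not counting any ZLP. */
uint16_t data_packets(uint16_t len)
{
    return (uint16_t)((len + EP_SIZE - 1u) / EP_SIZE);
}

/* True if the last data packet is not full, so the receiver
 * sees the end of the message without any extra help. */
bool ends_with_short_packet(uint16_t len)
{
    return (len % EP_SIZE) != 0u;
}

/* True if a zero length packet would be needed to mark the end,
 * i.e. the message is a non-empty exact multiple of EP_SIZE. */
bool needs_zlp(uint16_t len)
{
    return len != 0u && (len % EP_SIZE) == 0u;
}
```

So 1023 bytes goes out as 16 packets with the last one short, while 960 (1024 - 64) is exactly 15 full packets and only ends the message if a ZLP follows or the receiver already knows the length.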

 

You might find it useful to use isochronous transfers instead of bulk. I don't know, I've never used them.

 

I'm trying to guess what protocol you are using.  I guess it is USB CDC ACM.  


Okay, I used 960 byte chunks and got the same result, same speed. 1023 byte chunks dropped down to 5.7Mb/sec. I don't know why that is, I'm not sending a ZLP.

 

I'm not sure why 1024 worked. Maybe it's a bit like the DMA controller, where if you write zero it does the decrement before the test and you get the maximum buffer size.

 

I managed to get up to 7.8Mb/sec by increasing the buffer size on the PC end to 4096. Any higher makes no difference. This is libusb, so maybe there is some overhead there too. I could try WinUSB but I think it's fast enough now, and I want to keep it portable.


7.8 Mb/sec is good.  I think libusb is good, but I'm not an expert.  I was originally thinking of using that instead of WinUSB.  The problem was, there were various flavors of it that had been developed over the years and I didn't know which to use.  I had a lot of stuff to get working, so I thought, screw that.  I'll use WinUSB.

 

The advantage of libusb, as apparently you know, is it works on most operating systems.  

 

I'm guessing you did send a ZLP when you tried stuffing 1024 (binary 1 followed by 10 zeros) into a 10 bit field. That's how you send a ZLP. Just set the byte count to zero. The usual use of it is to send a "short packet" in the case where the message byte count is exactly a multiple of 64. Our hardware can do that automatically. WinUSB on the host, and I guess libusb too, can be told to do the same thing.
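
To illustrate the truncation idea, here is a sketch of what a 10 bit count field would hold if the hardware simply kept the low 10 bits of whatever is written. Whether the XMEGA really behaves this way is an assumption and not something confirmed in this thread; `CNT_FIELD_MASK` and `cnt_field` are names made up for illustration.

```c
#include <stdint.h>

#define CNT_FIELD_MASK 0x03FFu   /* low 10 bits: values 0 to 1023 */

/* Value left in a 10 bit byte count field after plain truncation. */
uint16_t cnt_field(uint16_t requested)
{
    return (uint16_t)(requested & CNT_FIELD_MASK);
}
```

With plain truncation a request of 1024 would leave 0 in the field, while 1023 fits unchanged.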

 

It's my understanding that using libusb does require installing a driver on Windows, but it's very easy to do. 


The different versions of libusb were an old thing with the driver side. I don't use the libusb driver, I just use the library to talk to USB devices. There is only one library.

 

I included the Microsoft extended USB descriptors in my firmware. They tell Windows to install the WinUSB driver automatically, no driver files or anything required. Linux ignores them and in both cases you can use libusb to talk to the device.

 

I don't think there would have been a ZLP. If I find the time I'll break out the logic analyzer and check.


Have you realised that USB is a shared bus and it is out of spec to pump it full of data?

 

Some time ago I bought some Logic Analyser boards and they have a Cypress CY7C68013.

It is based around the good old 8051 kernel (plenty of tools, I use SDCC on Linux) but it does not even remotely look like an XMEGA.

One of the fun parts though is its high-speed USB peripheral (480Mbit/s).

Paul van der Hoeven.
Bunch of old projects with AVR's:
http://www.hoevendesign.com


Paulvdh wrote:

Have you realised that USB is a shared bus and it is out of spec to pump it full of data?

 

On what do you base this claim?

 

In fact bulk packets are designed precisely for this purpose. They are the lowest priority, using spare bandwidth not needed for other endpoint types. Thus you are allowed to flood the bus with them, as long as you prioritize everything else.


Yeah, right. I don't think you can overload the bus normally. Apparently drivers on some OSs allow you to cheat, for example by using 64 byte data packets on low-speed devices.