XMega32C4 a good choice for LUFA?


I'm working on a project I call Orthrus. I've built a prototype with an ATMega32U2, and the problem is that my software AES implementation slows it down by a factor of 5. For this application I could sort of live with a hard disk that runs at 150 kB/sec, but 30 kB/sec is just too slow to be useful.

 

My thinking at the moment is that if I move to the XMega32C4, I could run the hardware AES module in parallel with the I/O so that the encryption penalty isn't so bad. I have some concerns, though. The XMega support in LUFA is described as experimental: is it deficient? Has anyone used it with this chip? Also, can one use ordinary AVR ISP programming with the XMega, as with the ATMega (using avrdude and a usbtiny)?


I think the usual choice for USB on Xmega is actually the dreaded "ASF".

 

http://asf.atmel.com/docs/latest...

 

As you say, Dean describes his LUFA support for Xmega (and UC3) as:

 

Having said that, an "experimental preview" from Dean could actually end up being better than the ASF stuff!

 

The two programming mechanisms for Xmega are JTAG and PDI. The latter was specifically invented for Xmega just to piss off people who already had a tiny/mega ISP programmer ;-) It's possible that the firmware of some "3rd party" programmers may have been updated to include PDI support.


Neat project.

 

The hardware RNG is nice, but consider that you can create good quality (for cryptographic purposes) random numbers using only the 128kHz oscillator in the AVR.  The idea is that this RC oscillator has (as all oscillators do) a bit of jitter.  By sampling the LSB of a timer running from the main clock (be it the 8 MHz RC, or a crystal, or...) in the WDT ISR, you can generate one bit of entropy for each overflow of the WDT (in interrupt-only mode). 

 

I've done this and it works well.  The gotcha is that you can only generate 1 bit every 15 ms (minimum period for the WDT interrupt), so 64 bits takes about one second.  I use it to seed an LFSR, so I've not needed it (or tested it) for generating large quantities of random numbers.  I did let one run for a day, generating about 750 kB.  I then applied standard tests against this dataset with good results.
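A minimal sketch of the seed-gathering step, written as portable C so it can be tested off-target. It assumes the WDT interrupt handler passes in the current count of a timer clocked from the main clock (on an ATmega, for example, TCNT1). Only the least significant bit is kept, and 64 such bits form the seed for a 64-bit Galois LFSR. The names entropy_pool, entropy_push, lfsr_seed and lfsr_next are hypothetical, and the tap mask (taps 64, 63, 61, 60) is one commonly tabulated maximal-length choice, not necessarily the one used in the post.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: collects one bit per WDT interrupt, LSB first. */
typedef struct {
    uint64_t bits;   /* bits gathered so far */
    uint8_t  count;  /* number of bits gathered (0..64) */
} entropy_pool;

/* Call from the WDT ISR with the current count of a timer that runs from
 * the main clock. Only the least significant bit is used, since it is the
 * bit most sensitive to small timing jitter in the 128 kHz WDT oscillator.
 * Returns true once 64 bits have been collected; further calls are ignored. */
static bool entropy_push(entropy_pool *p, uint16_t timer_count)
{
    if (p->count < 64) {
        p->bits |= (uint64_t)(timer_count & 1u) << p->count;
        p->count++;
    }
    return p->count == 64;
}

/* An LFSR must never start in the all-zero state, or it stays at zero. */
static uint64_t lfsr_seed(uint64_t raw)
{
    return raw != 0 ? raw : 1u;
}

/* One step of a 64-bit Galois LFSR (right-shifting form).
 * Mask 0xD800000000000000 corresponds to taps 64, 63, 61, 60. */
static uint64_t lfsr_next(uint64_t *state)
{
    uint64_t out = *state & 1u;
    *state >>= 1;
    if (out)
        *state ^= 0xD800000000000000ULL;
    return *state;
}
```

At one bit per ~15 ms WDT interrupt, 64 bits take 64 × 15 ms ≈ 0.96 s, in line with the "about one second" figure above.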

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"Read a lot.  Write a lot."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 


 

joeymorin wrote:

Neat project.

 

The hardware RNG is nice, but consider that you can create good quality (for cryptographic purposes) random numbers using only the 128kHz oscillator in the AVR.

 

I considered that, but I can get a conservative 10 kHz sampling rate (the actual "frequency" of the generator is in the high-ish hundreds of kHz) from the hardware, and it adds only around a dollar to the BOM. It's also excellent for marketing. :)

 

Seriously, though, have you run a sample from that technique through DieHarder to gauge the quality? I'd guess (obviously without having tried it) that the jitter would have some cyclicality in the short term. That said, similar DieHarder tests with my gizmo are pending (I'm missing a part on the prototype).

 

What's killing me right now, though, is the AES performance on an ATMega32u2 running at 16 MHz. It's pitiful. And the SPI clock being stuck at 8 MHz doesn't help either.


 

clawson wrote:

The two programming mechanisms for Xmega are JTAG and PDI. The latter was specifically invented for Xmega just to piss off people who already had a tiny/mega ISP programmer ;-) It's possible that the firmware of some "3rd party" programmers may have been updated to include PDI support.

 

Groan.

 

OK, what's the best option for someone developing on a Mac who has never used either of them before? The hardware is a blank slate: I can set it up for either or both (assuming it doesn't take up pins I need for something else). What I do *not* want is programming over USB. If you want to inject rogue firmware, you're gonna have to crack the case open. :)


PDI (Program and Debug Interface) works pretty well, and it uses fewer pins than JTAG.

 

Atmel-ICE, AVR Dragon, JTAGICE mkII and JTAGICE3 all work with PDI. But of course PDI, debugWIRE and aWIRE are Atmel-specific.

 

Xmega works pretty well, but you may want to consider an ARM Cortex-M3 or M4. Most of the tools and IDE software are available on a Mac when you go to ARM.

 

You can probably run Atmel's PC software in a virtual machine, but I don't know how painful that might be.

 

David.

Last Edited: Tue. Apr 18, 2017 - 06:16 PM

What's the PDI equivalent of avrdude? I'll check whether my avr-gcc accepts an -mmcu argument for xmega. If I can figure out how to get a .hex into the chip, I think I can manage the rest.


There should be a suitable dragon_pdi programmer (or equivalent). Just type avrdude -c ? to see the list of programmers it supports.

 

Of course there is atprogram.exe from Atmel.

 

David.


Well, it turns out the C4 doesn't have the AES subsystem :( The C3 does... maybe. The datasheet says "selected devices only" have it, but I don't see where it says which ones.


Seriously, though, have you run a sample from that technique through DieHarder to gauge the quality?

Yup.  Cryptographically sound at first glance, but not exhaustively investigated.

 

What's killing me right now, though, is the AES performance on an ATMega32u2 running at 16 MHz. It's pitiful.

You may want to consider less computationally expensive alternatives like elliptic curve encryption.


joeymorin wrote:

You may want to consider less computationally expensive alternatives like elliptic curve encryption.

EC is the wrong tool for the job. I need a symmetric algorithm for this.


clawson wrote:
I think the usual choice for USB on Xmega is actually the dreaded "ASF".

 

http://asf.atmel.com/docs/latest...

[ASF-3.34.1]

The follow-on to ASF3 is ASF4, but ASF4 does not cover XMEGA C (XMEGA AU does have ASF4).

I'm having difficulty getting ASF4 USB working on the XMEGA32A4U (USB device instance error); likely I'm doing something wrong.

clawson wrote:
It's possible that the firmware of some "3rd party" programmers may have been updated to include PDI support.
There's USBasp firmware that does PDI.

Dean's LUFA AVRISP2 is popular.

 


Atmel START

Atmel Software Framework 4 (ASF4)

Introduction

http://atmel-studio-doc.s3-website-us-east-1.amazonaws.com/webhelp/GUID-4E095027-601A-4343-844F-2034603B4C9C-en-US-1/index.html?GUID-D988E54F-D11D-4617-8C23-60B4CACFCB68

via http://start.atmel.com/

Jim's Projects

Cheap USBASP knockoff programmer

Posted on December 18, 2014 by Jim

http://jimlaurwilliams.org/wordpress/?p=4803

...

 

Update: PDI programmer and more cable stuff

...

http://www.fourwalledcubicle.com/AVRISP.php

 

Edit : Atmel START

 

"Dare to be naïve." - Buckminster Fuller

Last Edited: Tue. Apr 18, 2017 - 10:03 PM

nsayer wrote:
Ok, what's the best option for someone developing on a mac who's never done either of them before?
Other than via Atmel Studio, it's AVRDUDE:

  • CrossPack
  • Homebrew
  • Atmel Studio on Windows on Parallels

PDI has dedicated pins and is always present, whereas JTAG shares pins (the upper half of port B on the XMEGA A1U) and is present only on some XMEGA parts (the XMEGA A4U has PDI only).

 


https://www.obdev.at/products/crosspack/index.html

https://github.com/obdev/CrossPack-AVR

Homebrew — The missing package manager for macOS

https://brew.sh/

ka7ehk is a macOS user (there are at least a few more)

http://www.avrfreaks.net/forum/avr-studio-mac-linux by ka7ehk


Some of Atmel's Cortex-M7 offerings have fast crypto engines as well as fast SPI. These might be a better choice than the AVR offerings.


nsayer wrote:
The C3 does [AES]... maybe.
AES is in the XMEGA384C3 datasheet.

http://www.microchip.com/wwwproducts/en/atxmega384c3


nsayer wrote:
I need a symmetric algorithm for this.
Would a crypto authenticator chip connected to your preferred AVR work?

http://www.atmel.com/tools/CryptoAuthLib.aspx?tab=documents

XMEGA A3BU support was added to CryptoAuthLib in January 2016 (see README.md in the zip file).

 


http://start.atmel.com/#examples/cryp


Kartman wrote:
some of Atmel's cortex m7 offerings have got fast crypto engines as well as fast spi. These might be a better choice than the AVR offerings.
For reference, to inform the discussion:

AVR1318: Using the XMEGA built-in AES accelerator

http://ww1.microchip.com/downloads/en/AppNotes/doc8106.pdf

(page 1)

...

The AES uses 375 clock cycles to execute one encryption/decryption [128 bits] after the Key and State memory is loaded and the mode of operation is selected.

...

via

http://www.microchip.com//wwwAppNotes/AppNotes.aspx?appnote=en591666

via

http://www.microchip.com/wwwproducts/en/atxmega32a4u
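Taking the 375-cycle figure at face value, a rough ceiling for the AES module alone can be worked out. This sketch assumes a 32 MHz CPU clock (the XMEGA maximum) and 512-byte SD sectors, and it ignores the cycles spent loading key/state and reading results back, so real throughput will be lower. The function names are hypothetical.

```c
#include <stdint.h>

#define AES_BLOCK_BYTES      16u   /* AES block size: 128 bits */
#define AES_CYCLES_PER_BLOCK 375u  /* AVR1318 figure, after key/state are loaded */
#define SD_SECTOR_BYTES      512u  /* standard SD block length */

/* Cycles the AES module needs to process one 512-byte sector. */
static uint32_t aes_cycles_per_sector(void)
{
    return (SD_SECTOR_BYTES / AES_BLOCK_BYTES) * AES_CYCLES_PER_BLOCK;
}

/* Idealised AES module throughput in bytes/s at a given CPU clock,
 * ignoring the time spent loading key/state and reading results. */
static uint32_t aes_bytes_per_second(uint32_t f_cpu_hz)
{
    return (uint32_t)(((uint64_t)f_cpu_hz * AES_BLOCK_BYTES) / AES_CYCLES_PER_BLOCK);
}
```

Under those assumptions one sector costs 32 × 375 = 12000 cycles (375 µs at 32 MHz), and the module alone could sustain roughly 1.37 MB/s. That suggests the AES engine itself should not be the bottleneck at a few hundred kB/sec, provided the per-block load/unload overhead stays modest.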


I'm sort of starting to warm to the ATXMega384C3. It is on the LUFA supported chips list, and from what I can tell (I'm confirming this with @MicrochipTech now) it's the minimum unit that supports the AES subsystem. What I like is that I believe I can set up a combination of DMA and interrupt-driven machinery to precalculate an entire sector's worth of AES counter-mode XOR material in the background. If I start that first, it can likely get a head start while the SD card read/write command is being sent, and get far enough ahead of the actual I/O that I won't have to wait on the crypto at all. Doubling the SPI clock and not waiting on crypto suggests to me that this solution would yield >300 kB/sec, since the current version with equivalent null crypto on the 32u2 gets ~150 kB/sec. Not fantastic, but I think doing better would require using the more modern access modes on the SD cards, at which point it would also be time to consider super-speed USB.
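A host-runnable sketch of precalculating a sector's worth of counter-mode XOR material. toy_block_encrypt is a placeholder that is NOT AES and not secure; on the XMEGA that call would be replaced by feeding each counter block to the AES module, driven from an interrupt or DMA channel as described above. The counter layout (8-byte nonce, 32-bit sector number, 32-bit block index) and all function names are hypothetical. Whatever layout is used, a counter value must never repeat under the same key.

```c
#include <stdint.h>
#include <string.h>

#define BLK     16u             /* cipher block size in bytes */
#define SECTOR  512u            /* SD sector size in bytes */
#define NBLOCKS (SECTOR / BLK)  /* 32 blocks per sector */

/* Placeholder for the hardware AES module. NOT a real cipher. */
static void toy_block_encrypt(const uint8_t key[BLK], const uint8_t in[BLK], uint8_t out[BLK])
{
    for (unsigned i = 0; i < BLK; i++)
        out[i] = (uint8_t)((in[i] ^ key[i]) * 167u + 13u);
}

/* Build the keystream for one sector: E(key, nonce || sector || block). */
static void ctr_precompute_sector(const uint8_t key[BLK], const uint8_t nonce[8],
                                  uint32_t sector, uint8_t ks[SECTOR])
{
    uint8_t ctr[BLK];
    for (uint32_t b = 0; b < NBLOCKS; b++) {
        memcpy(ctr, nonce, 8);
        for (unsigned i = 0; i < 4; i++) {
            ctr[8 + i]  = (uint8_t)(sector >> (24 - 8 * i)); /* big-endian */
            ctr[12 + i] = (uint8_t)(b >> (24 - 8 * i));
        }
        toy_block_encrypt(key, ctr, &ks[b * BLK]);
    }
}

/* Encrypt or decrypt in place: in CTR mode both are the same XOR. */
static void ctr_apply(uint8_t data[SECTOR], const uint8_t ks[SECTOR])
{
    for (unsigned i = 0; i < SECTOR; i++)
        data[i] ^= ks[i];
}
```

Because CTR encryption and decryption are the same XOR, the keystream depends only on the key and the counter, not on the data, so it can be computed before the sector arrives from (or goes to) the SD card. That is what makes the background precomputation possible.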

 

This is my first XMega design. If you guys want to take a look and tell me all the ways it's wrong, that would likely be fun: https://cdn.hackaday.io/files/20...

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

nsayer wrote:
and from what I can tell (I'm confirming this with @MicrochipTech now)

C:\SysGCC\avr\avr\include\avr>grep "AES_t \*" *
iox128a1.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Crypto Module */
iox128a1u.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Crypto Module */
iox128a3.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Crypto Module */
iox128a3u.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Module */
iox128a4u.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Module */
iox128b1.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Module */
iox128b3.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Module */
iox16a4.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Crypto Module */
iox16a4u.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Module */
iox192a3.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Crypto Module */
iox192a3u.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Module */
iox256a3.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Crypto Module */
iox256a3b.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Crypto Module */
iox256a3bu.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Module */
iox256a3u.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Module */
iox32a4.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Crypto Module */
iox32a4u.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Module */
iox384c3.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Module */
iox64a1.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Crypto Module */
iox64a1u.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Crypto Module */
iox64a3.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Crypto Module */
iox64a3u.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Module */
iox64a4u.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Module */
iox64b1.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Module */
iox64b3.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Module */

I'm guessing it's a pretty fair bet the GCC headers would not define a location for a block of AES registers in a chip that does not have AES, so I think you can take it as read that all the above have it. Clearly that includes:

iox384c3.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Module */


Thanks - I think that's good confirmation.

 

If I have a misgiving, it's that the chip is outrageous overkill for what's going on (I'll be surprised if I use 1/50th of the flash)... except for the AES accelerator. I guess I can get past that, given that I've used Raspberry Pi Zeros as SPI masters for 7-segment clock displays. :)


Why then are you paying for 384K of flash when:

iox16a4.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Crypto Module */
iox16a4u.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Module */

seem to confirm that a couple of the 16K flash models have the AES unit too?

 

BTW, as moderator, can I ask you to modify your signature? This is a non-commercial site; the only place limited advertising is permitted is the Marketplace forum.


clawson wrote:

Why then are you paying for 384K of flash when:

iox16a4.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Crypto Module */
iox16a4u.h:#define AES    (*(AES_t *) 0x00C0)  /* AES Module */

seem to confirm that a couple of the 16K flash models have the AES unit too?

Huh. The A4U might work. I'll have to look into it.

 

EDIT: Amusingly, the 32A4U is a nickel less at Digikey. :)

 

Last Edited: Wed. Apr 19, 2017 - 05:52 PM

I've ordered prototype boards for the xxA4U and have some 32A4U parts on the way. I also bought an Atmel-ICE, and the plan is to use avrdude -c atmelice_pdi for programming. I haven't really started porting the firmware yet, but LUFA does at least seem to compile when properly set up for the A4U, and it looks like I can use the internal 32 MHz oscillator with USB SOF correction for the system clock and the PLL to turn that into 48 MHz for USB. Finally, it looks like I can eventually try USART0 in MSPI mode to improve the throughput. The eventual hope is to break through 500 kB/sec (the 32u2 with no crypto is doing 200 kB/sec now, but that's at 8 MHz SPI instead of 16 MHz, and without SPI double-buffering). It's not going to be a speed demon, but that will at least be minimally usable, I think. Schematic: https://cdn.hackaday.io/files/20772888709248/Orthrus%20v2.0.pdf
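A back-of-the-envelope bound on what the SPI link alone allows, assuming 512-byte sectors and ignoring SD command, token, CRC and inter-byte overhead, so these figures are best cases. spi_sector_time_us and spi_raw_bytes_per_second are hypothetical helper names.

```c
#include <stdint.h>

#define SD_SECTOR_BYTES 512u  /* standard SD block length */

/* Microseconds to clock one sector over SPI at f_spi_hz, ignoring command,
 * token, CRC and inter-byte gaps (so this is a lower bound). */
static uint32_t spi_sector_time_us(uint32_t f_spi_hz)
{
    return (uint32_t)(((uint64_t)SD_SECTOR_BYTES * 8u * 1000000u) / f_spi_hz);
}

/* Upper bound on the data rate in bytes/s for the same idealised link. */
static uint32_t spi_raw_bytes_per_second(uint32_t f_spi_hz)
{
    return f_spi_hz / 8u;
}
```

At 8 MHz the raw ceiling is 1,000,000 bytes/s and 512 µs per sector; at 16 MHz it is 2,000,000 bytes/s and 256 µs. The measured ~200 kB/sec at 8 MHz is well below the 8 MHz ceiling, which suggests that SD command latency and USB overhead, not the SPI bit rate alone, likely account for much of the gap.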

Last Edited: Fri. Apr 21, 2017 - 05:46 PM