I'm writing a bootloader for the XMega that loads the program from an FRAM chip. The checksums are compared as data is written to the FRAM, so there isn't much chance of corrupt data being saved in flash.
I decided to start by writing a UART bootloader, and so far I've got one working based on the AVR1605 and AVR1316 app notes. These contain the important driver "sp_driver.S", a UART "PC interface" to the driver, and some Windows software written in C++. Most of the logic that controls the bootloader actually lives in the PC software, so I moved it onto the 32E5, because my final bootloader won't be able to rely on PC software.
The idea with my UART bootloader is that it can receive a pure hex file sent from a terminal emulator such as Tera Term. I've got this working, but I'm having issues with data speed. My bootloader only works if I set transmit delays in Tera Term so that it inserts 1 ms between each character when sending a file. Unfortunately this makes transferring files very slow, and I'd like my bootloader to be more robust. I assumed the MCU was taking too long to process each character and missing the start of the next one, so I moved to interrupt-driven reception with a cyclic buffer.
I'm still having trouble when a full page (128 bytes) has been received and the data is written to flash. I've narrowed the problem down to the SPM instruction, which takes about 8 ms to do a page write, and while it does this the micro is dead to the world. I was not expecting this behavior, as the datasheet and app note make it sound like other things can be done while the write is in progress. Why else would it be possible to use interrupts to monitor the SPM instruction? It seems like I should be able to get the XMEGA to do other things while the SPM instruction is executing.
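For illustration, a cyclic receive buffer of the kind described above could look like the following sketch. The names (rx_ring_t, rx_ring_put, rx_ring_get, RX_BUF_SIZE) are hypothetical and not taken from AVR1605 or AVR1316; the assumption is that the UART receive interrupt calls rx_ring_put() and the main loop calls rx_ring_get(). It only helps if the receive interrupt actually gets to run while the page write is in progress.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical buffer size: 256 lets 8-bit indices wrap on their own,
   and single-byte index accesses are atomic on an 8-bit AVR. */
#define RX_BUF_SIZE 256u

typedef struct {
    volatile uint8_t data[RX_BUF_SIZE];
    volatile uint8_t head; /* written only by the producer (receive ISR) */
    volatile uint8_t tail; /* written only by the consumer (main loop)   */
} rx_ring_t;

/* Called from the receive interrupt. Returns false if the buffer is
   full, in which case the byte is dropped. */
static bool rx_ring_put(rx_ring_t *rb, uint8_t byte)
{
    uint8_t next = (uint8_t)(rb->head + 1u); /* wraps 255 -> 0 */
    if (next == rb->tail) {
        return false; /* full: one slot is always left empty */
    }
    rb->data[rb->head] = byte;
    rb->head = next;
    return true;
}

/* Called from the main loop. Returns false if no byte is waiting. */
static bool rx_ring_get(rx_ring_t *rb, uint8_t *out)
{
    if (rb->head == rb->tail) {
        return false; /* empty */
    }
    *out = rb->data[rb->tail];
    rb->tail = (uint8_t)(rb->tail + 1u); /* wraps 255 -> 0 */
    return true;
}
```

One slot is kept empty to tell "full" from "empty", so this buffer holds at most 255 bytes, which is less than the roughly 370 characters of one page of hex text.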
The app note code follows every call to an SPM routine with an "SP_WaitForSPM();". What's the point of this if the micro locks up while it's executing an SPM instruction anyway?
There are roughly 370 characters in the hex file for a 128-byte page. If the 8 ms page write were happening in the background, I could theoretically transmit the hex files at 460800 bps (not that I need to), but at the moment I'm stuck at 9600 with 1 ms transmit delays to account for the slow-as-molasses page write. I can avoid the terminal workaround by writing some software to transmit pages in chunks, but it's still going to have to wait 8 ms to let the XMega catch up.
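The arithmetic behind the 460800 bps figure can be checked with a short sketch. It assumes 8N1 framing (10 bits on the wire per character), which the post does not state; the function names are hypothetical.

```c
#include <stdint.h>

/* Time in microseconds to send n_chars characters at the given baud
   rate, assuming 8N1 framing (10 bits per character). Rounds down. */
static uint32_t uart_tx_time_us(uint32_t n_chars, uint32_t baud)
{
    return (uint32_t)(((uint64_t)n_chars * 10u * 1000000u) / baud);
}

/* Number of complete characters that arrive within window_us
   microseconds at the given baud rate, under the same 8N1 assumption. */
static uint32_t chars_in_window(uint32_t window_us, uint32_t baud)
{
    return (uint32_t)(((uint64_t)window_us * baud) / 10000000u);
}
```

With these, uart_tx_time_us(370, 460800) gives 8029, so one page of hex text takes about 8 ms at 460800 bps, matching the page write time. At 9600 bps the same page takes 385416 us, and chars_in_window(8000, 9600) gives 7, meaning about 7 characters arrive during a single 8 ms page write.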
Here is the assembly code where I've added an LED toggle to time the instruction:
movw ZL, r24 ; Load R25:R24 into Z.
ldi r17, 0x20 ; Toggle pin 5.
sts 0x0667, r17 ; Toggle LED
sts NVM_CMD, r20 ; Load prepared command into NVM Command register.
ldi r18, CCP_SPM_gc ; Prepare Protect SPM signature in R18
sts CCP, r18 ; Enable SPM operation (this disables interrupts for 4 cycles).
spm ; Self-program.
clr r1 ; Clear R1 for GCC _zero_reg_ to function properly.
out RAMPZ, r19 ; Restore RAMPZ register.
sts 0x0667, r17 ; Toggle LED
And here is the scope measurement:
Any help with this would be greatly appreciated. I'm hoping to make the UART bootloader open-source when I'm happy with it (if my boss permits it). I had trouble finding a simple bootloader that I could easily modify before starting this project, so I think it could be useful to others.