Running code from CPU-local SRAM = 1/5 speed of flash

I've got some time-critical assembler code that I want to run from the fastest memory on the AT32UC3A364, which I thought would be local SRAM (0x00000000 - 0x0000ffff).

What I did to get the code "booted" into SRAM was to just use a .section .data directive before the assembly code instead of .section .text. My assumption (which appears to be correct) was that the code would be in flash and then copied to SRAM by the startup code, just like any other initialized data. This seems to work fine: the code is indeed in SRAM, it's all there, and it seems to run perfectly. Except for one thing... It runs at about 1/5 the speed of the same code from flash. My measurement of the speed is to generate a pulse and measure it with a scope. When the code runs from flash the pulse is 180 ns wide. When run from SRAM, the pulse is 900 ns wide. The code that generates the pulse runs with ALL interrupts disabled and only touches local GPIOs (0x4000xxxx).
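For reference, the slowdown follows directly from the two scope readings. A minimal sketch in C; the helper names are hypothetical, and the CPU clock is not stated anywhere in this post, so any cpu_hz value passed in is purely an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Ratio of two pulse widths, scaled by 100 to stay in integer math
 * (a result of 500 means 5.00x). */
static uint32_t slowdown_x100(uint32_t slow_ns, uint32_t fast_ns)
{
    assert(fast_ns != 0); /* avoid dividing by zero */
    return (uint32_t)(((uint64_t)slow_ns * 100u) / fast_ns);
}

/* CPU cycles covered by a pulse, in thousandths of a cycle.
 * cpu_hz is an assumption supplied by the caller. */
static uint64_t pulse_millicycles(uint32_t width_ns, uint32_t cpu_hz)
{
    return ((uint64_t)width_ns * cpu_hz) / 1000000u;
}
```

With the readings above, slowdown_x100(900, 180) gives 500, i.e. a factor of 5. pulse_millicycles() only becomes meaningful once the real CPU clock is plugged in.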

Doesn't the SRAM "run" at the same speed as the CPU, i.e. with no peripherals or HSB accesses involved?

I appreciate any and all comments.

Thanks very much!

The SRAM runs at the same speed as the CPU, but the instructions are fetched via the HSB, so the HSB and the CPU have to fight over who gets access to the SRAM, which slows everything down.

I recommend using the HRAM for what you want to do. It’s connected to the HSB just like the Flash memory, and if you don’t access it from your program, the CPU instruction fetch will have it all to itself. I also recommend configuring the HSB to always reserve the HRAM for the CPU instruction fetch, which saves you another cycle per burst access. The HRAM can deliver one (or even two) instructions per cycle and thus is just as good as the Flash at speeds below 36 MHz. At greater speeds it's better than the Flash because it doesn’t need any wait-states. There are no faster memories available, and you wouldn’t need any because the CPU can’t process more than one instruction per cycle anyway.

For one of my projects I’ve been running the whole program from the HRAM to avoid the Flash wait-states and it works very well. This only works because my program is small enough to fit into the 64 KiB of the HRAM, of course, but that shouldn’t be a problem for you, as your MCU has only 64 KiB of Flash anyway.

A bit of trivia: 32 KiB of HRAM are faster and use less power than 256 KiB of Flash, according to my measurements. ;-)

Thanks very much for the info. I will definitely give it a try! I'll post the results.

Hi. I'm back to working on this. Did you modify your linker script and startup code to create a new initialized data section, or did you just manually copy the code from flash to HRAMcX in your initialization? I'm leaning towards the simple route myself, though the .lds & startup.s code solution is more elegant.

Thanks,
Wade

wdawson wrote:
Did you modify your linker script and startup code to create a new initialized data section, or did you just manually copy the code from flash to HRAMcX in your initialization?
I modified the linker script as described in the application note AVR32825: Executing code from external SDRAM, just with everything adjusted for the HRAM instead of external SDRAM. The Atmel website has some example code that goes with the application note.

I thought it came with some code to copy application code from internal Flash to SDRAM, but now I can’t find it. I think I modified the startup assembly file myself to copy the application code from Flash to HRAM just like it copies the .data section content from Flash to SRAM. The startup assembly file was called crt.x in the ASF 1.7. Maybe it’s called startup.s now.

The application note describes how to place certain functions in SDRAM, but I reversed that by fiddling with the linker script and placed everything in HRAM, except for a function to switch over to a 12 MHz crystal, which I run first to speed up the copying process.

Startup code from crt.x, loaded at 0x80002000:
1. Set up the stack pointer.
2. Call init_12MHz(), located in Flash, which enables the 12 MHz crystal and switches over to it. Directly writes to registers and doesn’t use any ASF functions because the HRAM application code isn’t populated yet.
3. Copy code to HRAM.
4. Copy .data section to SRAM.
5. Blank .bss section.
6. Call main().

Might’ve forgotten something because I don’t have the code here right now. Just don’t remove anything from crt.x and only add a loop to copy the code into HRAM.
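Steps 3 to 5 above boil down to two word copies and one clear. Here is a minimal sketch in C rather than in the startup assembly; it is not the actual crt.x code. The struct boot_layout and all function names are hypothetical, and on the target the pointers would come from linker-script symbols rather than from a struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy a word-aligned image from its load address (Flash) to its run
 * address (HRAM or SRAM). 'src_end' points one past the last word. */
static void copy_words(uint32_t *dst, const uint32_t *src,
                       const uint32_t *src_end)
{
    assert(dst != NULL && src != NULL && src <= src_end);
    while (src < src_end)
        *dst++ = *src++;
}

/* Clear a word-aligned region such as .bss. */
static void zero_words(uint32_t *start, const uint32_t *end)
{
    assert(start != NULL && start <= end);
    while (start < end)
        *start++ = 0;
}

/* Hypothetical description of the memory regions involved. */
struct boot_layout {
    const uint32_t *code_load, *code_load_end; /* code image in Flash */
    uint32_t *code_run;                        /* destination in HRAM */
    const uint32_t *data_load, *data_load_end; /* .data image in Flash */
    uint32_t *data_run;                        /* destination in SRAM */
    uint32_t *bss_start, *bss_end;             /* .bss in SRAM */
};

static void boot_copy(const struct boot_layout *l)
{
    copy_words(l->code_run, l->code_load, l->code_load_end); /* step 3 */
    copy_words(l->data_run, l->data_load, l->data_load_end); /* step 4 */
    zero_words(l->bss_start, l->bss_end);                    /* step 5 */
}
```

Anything called before boot_copy() finishes (such as the clock switch in step 2) has to live in Flash, since the HRAM copy of the code does not exist yet.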

Thanks very much for your help! I've managed to put my critical code in a HRAM0 segment and modified the startup.s file to copy the code from flash to HRAM using symbols from the linker script, basically like the .data section, as you mentioned.

Everything works fine until I try to put the EVBA in HRAM as well. I'm doing this because I have an NMI ISR. Basically, as soon as I enable the PLL, the code jumps to the "Bus Error Instruction Fetch" vector at EVBA + 0xC.

I would like to place ALL code including the small amount of ASF (mostly USB setup handlers) in HRAM also.

How did you get all ASF code into HRAM0? I can use a pragma for code in my files, but how do I redirect the rest?

wdawson wrote:
How did you get all ASF code into HRAM0? I can use a pragma for code in my files, but how do I redirect the rest?
The linker script has some instructions to “grab” all sections of a certain type to place them into one section. By default, the .text section takes all .text sections from your compiled code.

I renamed the original .text section in the linker script to .text_flash, to avoid any confusion, and called my section .text_hram. Then I set .text_flash to only “grab” .text.flash sections and .text_hram to “grab” .text sections. That way, all .text sections, including all ASF code, are placed into .text_hram by the linker. If I want a function to be placed in Flash, I need to give it the section .text.flash attribute.

This also works the same with the .rodata section.
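On the C side, keeping a single function in Flash then looks roughly like the sketch below. The FLASH_CODE macro and both function names are made up for illustration; only the .text.flash section name comes from my linker script as described above. The section attribute is a GCC extension, and the macro is guarded so the snippet still compiles on toolchains without ELF-style section names.

```c
/* Section attributes are a GCC/Clang extension and the dotted section
 * name is ELF-style, so the macro expands to nothing elsewhere. */
#if defined(__GNUC__) && defined(__ELF__)
#define FLASH_CODE __attribute__((section(".text.flash"), noinline))
#else
#define FLASH_CODE
#endif

/* Has to run before the HRAM image is populated, so it stays in Flash.
 * Hypothetical stand-in for a function like init_12MHz(). */
FLASH_CODE int early_init(void)
{
    return 12; /* placeholder value; a real version would write clock registers */
}

/* No attribute: by default this lands in a plain .text section, which
 * the modified linker script collects into .text_hram. */
int run_from_hram(int x)
{
    return x + early_init();
}
```

Everything without the attribute, including the ASF sources, needs no changes at all, which is the point of swapping the roles of the two sections in the linker script.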

I also moved the .exception section in the linker script right before the .text_hram section and renamed it to .exception_hram. It is used for the exception/interrupt handlers.