AVR-Dx accessing Flash in CPU Data Space


The AVR-Dx family supports mapping part of the flash into CPU data space, so it can be accessed like RAM using the LD instruction instead of LPM. In the AVR128DB datasheet this is discussed in section 11.3.2.
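As I understand section 11.3.2, on a 128 KB part the flash is split into four 32 KB sections, and the FLMAP field in NVMCTRL.CTRLB selects which one appears at data addresses 0x8000-0xFFFF. Here is a host-side C sketch of that rule. The helper name is made up and nothing here is an avr-libc API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAPPED_BASE  0x8000u   /* start of the flash window in data space */
#define SECTION_SIZE 0x8000ul  /* each mappable flash section is 32 KB */

/* Hypothetical helper: if the flash byte at 'flash_byte_addr' lies in the
 * section selected by 'flmap', store its data-space address and return true. */
static bool flash_to_data_addr(uint32_t flash_byte_addr, uint8_t flmap,
                               uint16_t *data_addr)
{
    assert(flmap < 4);                      /* 128 KB part: sections 0-3 */
    if (flash_byte_addr / SECTION_SIZE != flmap)
        return false;                       /* not in the mapped section */
    *data_addr = (uint16_t)(MAPPED_BASE + flash_byte_addr % SECTION_SIZE);
    return true;
}
```

So with section 3 selected, flash byte 0x18000 is read with LD at 0x8000, while the same byte is unreachable through LD if section 0 is selected.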

 

This sounds great, but is there any compiler support for this? When I declare a constant array using the __flash keyword, the compiler still generates code using LPM to load the data, so it's not taking advantage of this feature. I'm looking for some kind of __mapped_flash attribute that will put constant data into a region of flash that's mapped to CPU data space, and then automatically use LD with the correctly calculated address to load the data. I could probably accomplish the same thing myself by playing with linker settings and some address remapping macros, but I don't want to reinvent the wheel.


Only part of the flash (32KB) is mapped to the data space.
The section to be mapped is selected by software at run time, so the compiler or linker has no way of knowing the address.
Therefore, it is not possible to determine the data address using __flash, so the compiler has no choice but to use LPM.
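To make the problem concrete: where an object lands in flash is fixed at link time, but whether LD can reach it depends on the FLMAP value at run time, and an object that straddles a 32 KB boundary cannot be fully mapped by any single setting. A hypothetical helper (host-side C, not a real library function) that works out which section would have to be selected:

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SIZE 0x8000ul  /* each mappable flash section is 32 KB */

/* Hypothetical helper: returns the FLMAP section that must be selected before
 * an object of 'size' bytes at flash byte address 'addr' can be read with LD.
 * Returns -1 if the object crosses a 32 KB boundary, because no single FLMAP
 * setting exposes all of it. */
static int required_flmap(uint32_t addr, uint32_t size)
{
    assert(size > 0);
    uint32_t first = addr / SECTION_SIZE;
    uint32_t last  = (addr + size - 1) / SECTION_SIZE;
    return first == last ? (int)first : -1;
}
```

Firmware would then have to write that section number into FLMAP before reading through the pointer, which is the kind of bookkeeping a compiler can't do for a plain __flash object.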

I couldn't find any official implementation of that wheel.


Well, avr-gcc doesn't use that feature, that's for sure. It would take quite a lot of work (and probably be inefficient) to keep track of which 32K flash section contains a given variable, so they just stuck with the classic LPM method.

 

Or, keep a fixed flash section mapped, limiting the flash space for data to 32K. Also not ideal for a 128K MCU. I guess the problem here is that by default the top 32K of flash is mapped into data space, while I think the default linker script places constant data after the code, so it could end up in any flash section, depending on code size.

 

So, I'm guessing only the AVR-Dx parts with 32KB of flash (where the whole flash fits in the data-space window) will use LD instructions.

 

But I don't know the actual mechanism by which the compiler is told whether a given MCU should use LD or LPM. Is it some magic inside the linker script? Via section addresses or sizes?


That makes sense. Personally I'm using the AVR128DB, but I'm not really using the full 128KB of flash. If I could permanently map 32K of flash into the 32K data-space window, that would be valuable to me.

 

One question: is there any performance penalty when running code or accessing data (with LPM) in the upper 64K of flash on a 128K device? I've read there is an ELPM instruction for reading flash memory above 64KB, but I'm unclear why, since I thought flash addresses were word addresses and so 128Kbytes of flash could be addressed with a 16-bit pointer.

 

For anyone who's done this flash mapping dance, do you have some macros or linker scripts you could share, or an outline of the approach? I'm thinking I need to create a new linker section called ".mapped_flash" at flash address C000, which is C000 16-bit words or 96Kbytes from the start of flash. By default this section is mapped to data-space address 8000. I'll have to read up on how to edit linker scripts.

 

Then add __attribute__ ((section (".mapped_flash"))) to all the constant strings and tables I want to store in this region of flash.

 

Then whenever I reference one of these strings or arrays, I need to convert the address: subtract C000 from the flash address to get the offset in the mapped section, multiply by two to convert words to bytes, then add 8000, the data-space base of the mapped section. Something like:

 

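/* x: flash word address in 0xC000-0xFFFF; result: data-space pointer */
#define MAPPED_PTR(x) ((const void *)(uint16_t)(0x8000u + ((uint32_t)(x) - 0xC000u) * 2u))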
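One caveat on the arithmetic described above, as far as I know avr-gcc: data objects in flash are given byte addresses (the same addresses LPM's Z pointer takes); word addresses are only used for code, e.g. function pointers. So if you start from the address of a const table, the "multiply by two" step drops out and the conversion is just a rebase of the 32 KB section. A host-side sketch of both variants, with hypothetical function names and the default section 3 assumed:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default mapping on an AVR128DB: flash section 3 (byte addresses
 * 0x18000-0x1FFFF, word addresses 0xC000-0xFFFF) appears at 0x8000-0xFFFF. */

/* From a flash WORD address: subtract 0xC000, words -> bytes (*2), add 0x8000. */
static uint16_t mapped_from_word(uint32_t word_addr)
{
    assert(word_addr >= 0xC000ul && word_addr <= 0xFFFFul);
    return (uint16_t)(0x8000u + (word_addr - 0xC000ul) * 2u);
}

/* From a flash BYTE address (how avr-gcc addresses data in flash):
 * no factor of two, just rebase the 32 KB section onto 0x8000. */
static uint16_t mapped_from_byte(uint32_t byte_addr)
{
    assert(byte_addr >= 0x18000ul && byte_addr <= 0x1FFFFul);
    return (uint16_t)(0x8000u + (byte_addr - 0x18000ul));
}
```

Both give the same data-space address for the same location; they just start from different address units (word 0xC001 and byte 0x18002 both land at 0x8002).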

 

 


On a more careful review of the AVR instruction set manual, the whole motivation for doing this seems to have evaporated. I'd thought I could save one cycle on table lookups by mapping flash to data space and using LD (2 cycles) instead of LPM (3). But no. From the instruction set manual:

 

Cycle time for data memory access assumes internal RAM access, and are not valid for access to NVM. A minimum of one extra cycle must be added when accessing NVM.

So an LD from mapped flash costs at least 3 cycles, no faster than LPM. In that case, I don't see a whole lot of value in the flash remapping feature.


Here's the case that led me to use it:

I already had a function like this:
send_string(uint8_t *ptr);

ptr is a data-space address, so the function reads through it with LD.
But I put the strings in flash because there was a lot of string data.
Mapping flash into the data space let the same function read them directly, which greatly reduced my effort.
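A host-side simulation of that setup, to show why it saves effort: one string routine reading through a single data-space address works for both RAM and mapped flash. Everything here (the arrays, ld(), and this version of send_string, which takes a 16-bit address and writes into a buffer instead of a UART) is hypothetical, and FLMAP is assumed to be at section 3:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated 64 KB data space: 0x0000-0x7FFF stands in for RAM (and I/O),
 * 0x8000-0xFFFF is a window onto one 32 KB section of a 128 KB flash. */
static uint8_t ram[0x8000];
static uint8_t flash[4 * 0x8000];   /* byte-addressed flash image */
static uint8_t flmap = 3;           /* selected section (assumed default) */

/* Stand-in for the LD instruction. */
static uint8_t ld(uint16_t addr)
{
    if (addr < 0x8000u)
        return ram[addr];
    return flash[(uint32_t)flmap * 0x8000u + (addr - 0x8000u)];
}

/* One code path for any data-space string, RAM or mapped flash.
 * Copies into 'out' (at most cap-1 chars) instead of sending to a UART. */
static size_t send_string(uint16_t ptr, char *out, size_t cap)
{
    size_t n = 0;
    uint8_t c;
    assert(cap > 0);
    while (n + 1 < cap && (c = ld(ptr++)) != 0)
        out[n++] = (char)c;
    out[n] = '\0';
    return n;
}
```

A string placed at flash byte 0x18010 is then sent with send_string(0x8010, ...), exactly like one in RAM.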


I haven't used them myself, but I thought you simply need to use "const". That would put the items into ".rodata", which is mapped so it can be accessed using LD. Must try an experiment... (I'm sure curtvm has probably shown this somewhere else already with a link to Godbolt!)

 



bigmessowires wrote:
I've read there is an ELPM instruction for reading flash memory above 64KB, but I'm unclear why, since I thought flash addresses were word addresses and so 128Kbytes of flash could be addressed with a 16-bit pointer.

 

It's true that access to flash is via a 16 bit bus, so, at first glance, it would seem that you could access all 128K of data with a 16 bit pointer.

 

However, note that the destination register of LPM is only 8 bits wide, so how does LPM decide which of the 16 bits on the bus to copy into the destination register?

In practice this is achieved by using an extra bit to decide if the low byte or the high byte go into the destination register. This bit is bit0 of the "address" sent to LPM, so bit0 is not a "real" address bit.

But from the programmer's point of view, it is as if LPM addresses byte data instead of word data.

 

This is explained in detail in the AVR instruction manual -  https://ww1.microchip.com/downloads/en/DeviceDoc/AVR-Instruction-Set-Manual-DS40002198A.pdf
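A small host-side model of the byte selection, which also shows why ELPM is needed on a 128 KB part: LPM's Z register holds a 16-bit byte address, so it only reaches the first 64 KB, and ELPM adds RAMPZ as the upper address bits. The function names are made up; they just mimic what the instructions do:

```c
#include <assert.h>
#include <stdint.h>

/* Flash stored as 16-bit words, as on the real bus: 128 KB = 64 Ki words. */
#define FLASH_WORDS 0x10000ul
static uint16_t flash_words[FLASH_WORDS];

/* LPM model: bit 0 of Z picks the low (0) or high (1) byte of the word;
 * the other 15 bits pick the word, so only bytes 0x0000-0xFFFF are reachable. */
static uint8_t lpm(uint16_t z)
{
    uint16_t word = flash_words[z >> 1];
    return (z & 1u) ? (uint8_t)(word >> 8) : (uint8_t)(word & 0xFFu);
}

/* ELPM model: RAMPZ supplies byte-address bits 16-23, so RAMPZ:Z can
 * reach the upper 64 KB of a 128 KB part. */
static uint8_t elpm(uint8_t rampz, uint16_t z)
{
    uint32_t byte_addr = ((uint32_t)rampz << 16) | z;
    assert(byte_addr / 2 < FLASH_WORDS);
    uint16_t word = flash_words[byte_addr >> 1];
    return (byte_addr & 1u) ? (uint8_t)(word >> 8) : (uint8_t)(word & 0xFFu);
}
```

For example, word 0xC000 (byte address 0x18000) is only reachable as elpm(1, 0x8000) and elpm(1, 0x8001); no 16-bit Z value alone can name it.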

 

 

Now if the AVR had an "LPMW" instruction with a register pair as destination, aligned word data could be addressed, e.g.

LPMW r17:r16, Z

I think such an instruction would be useful in some situations: it would take the same execution time as LPM but load twice as much data. I think there are still some AVR opcode slots available, so I'll leave it as a proposal... (not gonna happen, sadly)