How to enforce alignment of array?


I have just finished a set of functions and I want to pass them on to other users of our hardware, preferably in the form of a linkable binary object or library.

However, for the sake of speed, it depends on an array address-aligned in RAM (i.e. ((int)array & (sizeof(array) - 1)) == 0 ).

It appears that there is no means to enforce this at the object/library level, so my current "policy" is to put it into the documentation: since it is the user who has control over the linker, it is also the user who has to declare the array in the application the library is linked into, and who has to ensure it is aligned properly.

However, this solution is not as "self-contained" as I would like it to be.

Any ideas and suggestions?

Thanks,

Jan Waclawek


Do a little pointer arithmetic and throw back an error if abused?

if ((unsigned int)p & A_HANDY_MASK) return BAD_ADDRESS_ERROR;

That should be pretty low overhead.
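A minimal sketch of such a check, written as plain C. The function name and the error codes are made up for illustration, and it assumes the required alignment is a power of two:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error codes; the names are illustrative only. */
enum { BUF_OK = 0, BUF_BAD_ADDRESS = -1 };

/* Returns BUF_OK when p sits on an align-byte boundary.
   align must be a power of two, so (align - 1) is a mask of the low bits. */
static int check_buffer_alignment(const void *p, size_t align)
{
    if (align == 0 || (align & (align - 1)) != 0)
        return BUF_BAD_ADDRESS;          /* not a power of two: reject */
    if (((uintptr_t)p & (align - 1)) != 0)
        return BUF_BAD_ADDRESS;          /* some low address bits are set */
    return BUF_OK;
}
```

The library entry point would call this once on the buffer it is handed and refuse to continue on BUF_BAD_ADDRESS.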


dbc wrote:
Do a little pointer arithmetic and throw back an error if abused?

Well, not quite a bad idea; but in truly embedded systems (a.k.a. blackbox) there's always the question, where exactly is that "back" where the error should be thrown upon? ;-)

I'd prefer the error at link time... any suggestions?

JW


Allocate the buffer one larger than necessary.

The receiver reads her own buffer address. If it is odd, she loads into buffer+1. A global pointer is set to the even address for subsequent use. Alternatively the even test is applied before every use.
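The same idea, generalized from "even address" to an arbitrary power-of-two boundary, might look roughly like this. The sizes and names are assumptions for the sake of the sketch; the cost is ALIGN-1 spare bytes plus the run-time adjustment:

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_SIZE  128u   /* assumed payload size */
#define BUF_ALIGN 0x40u  /* assumed alignment, must be a power of two */

/* Raw storage padded by BUF_ALIGN-1 bytes so an aligned window always fits. */
static uint8_t raw_storage[BUF_SIZE + BUF_ALIGN - 1u];

/* Round the start of raw_storage up to the next BUF_ALIGN boundary. */
static uint8_t *aligned_buffer(void)
{
    uintptr_t addr = (uintptr_t)raw_storage;
    uintptr_t pad = (BUF_ALIGN - (addr & (BUF_ALIGN - 1u))) & (BUF_ALIGN - 1u);
    return raw_storage + pad;
}
```

The returned pointer would be stored once in a global and used for all later accesses.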

David.

P.S. Why does a buffer have to be word-aligned? The assembler will keep all your code and data aligned in flash. SRAM has no such limitations.


Put the aligned array in an asm file-

//my_array.S
.section .data
.set ARRAY_SIZE,4

.global array
.balign ARRAY_SIZE
array:
//uint8_t array[]={0,1,2,3};
.dc.b 0,1,2,3

test-

#include <avr/io.h>

extern uint8_t array[];
uint8_t test2=8;
uint8_t test3=7;
uint8_t test4;

int main(){
    PORTB=test2; //0x100
    PORTB=array[0]; //0x0104
    PORTB=test3; //0x101
    PORTB=test4; //0x108 (bss)
}

(sorry, I read the question wrong, I will have to change this)

I'm not sure if 'an array address-aligned in RAM' means an array or just an int as shown.

plan b-

//my_array.S
.section .data
.set VAR_SIZE,2
.global array
.balign VAR_SIZE
array:
.ds.b VAR_SIZE

David,

this would work if I required alignment to a 2-byte boundary, and even then not too efficiently, as the address would need to be adjusted at run time (either in all cases or via a pointer). I am trying to squeeze the last cycle out of these routines, so this is a no-go. Not to mention that my alignment is currently 0x40, so that would mean "throwing away" an extra 0x3F bytes.

curtvm,

thanks, this inspired me to dig into as's :-P documentation. As I am still reluctant to resort to an external asm file, I simply put something similar into an inline asm snippet... OK, dirty, but it works. I tried two variants, preferring the latter:

".comm rs485buffer,128,0x40 \n\t"

and

    ".section .noinit9 \n\t"
    ".balign 0x40 \n\t"
    "rs485buffer: \n\t"  // let's "manually" declare the buffer, aligned to 0x40
    ".skip 0x40+0x40 \n\t"
    ".section .text \n\t"

Thanks to all for the ideas.

JW


Jan,

I have memories of ASM programming where storage was aligned to page boundaries, and all sorts of cycle optimisations employed.

Now I look at a C program and think: go with the flow.

You can operate on the structures wherever they are in memory. The AVR index registers have no restriction.

David.


The index registers allow only fixed-offset access in AVR - there is no LD Rx, address+Ry instruction.

Having 2^n-aligned buffers allows me to use OR to add the offset to the base, which is certainly faster than a 16-bit addition; besides, it allows me to use AND for wraparound of the offset after incrementing (for circular buffers), which is way faster than compare and jump.
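In C terms the two tricks come down to something like the following. This is only a sketch with made-up names; the real routines are hand-tuned, and the buffer size of 0x40 is an assumption matching the alignment mentioned above:

```c
#include <stdint.h>

#define RING_SIZE 0x40u               /* assumed buffer size, a power of two */
#define RING_MASK (RING_SIZE - 1u)

/* base must be a multiple of RING_SIZE, so its low bits are zero and OR
   gives the same result as addition, with no carry into the high byte. */
static uint16_t ring_slot_address(uint16_t base, uint8_t offset)
{
    return (uint16_t)(base | (offset & RING_MASK));
}

/* Advance an offset by one and wrap with AND instead of compare-and-branch. */
static uint8_t ring_next(uint8_t offset)
{
    return (uint8_t)((offset + 1u) & RING_MASK);
}
```

With a base of 0x0080, offset 0x05 maps to 0x0085, and stepping from offset 0x3F wraps back to 0x00.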

This is the price I have to pay for using the "cheap" silicon of AVR.

JW


wek wrote:
It appears that there is no means to enforce this at the object/library level, so my current "policy" is to put it into the documentation: since it is the user who has control over the linker, it is also the user who has to declare the array in the application the library is linked into, and who has to ensure it is aligned properly.

However, this solution is not as "self-contained" as I would like it to be.

You might be able to test the condition with a supplemental linker script.
To make sure it gets used, you might want to have it
define something referenced by the array definition.

Can one persuade ar to include and index a linker script?
If so, would ld use it or choke on it?

Iluvatar is the better part of Valar.


Quote:
where exactly is that "back" where the error should be thrown upon?

Always an issue.

As you say, it can't be known until link time, and the caller has responsibility for any section/alignment/linker-script tweakage. As long as the caller declares the array, that is... How about implementing a call in your library to allocate the array? Of course, that is only practical if your code doesn't end up being a full replacement for malloc(). But if you only need a few buffers, maybe that's a solution.


Whatever enforcement method you choose,
your documentation should probably include an
example of how to use the linker to make it work.
If sizeof(array) is a power of two up to 256,
array could be placed at either
the beginning or end of RAM.

Iluvatar is the better part of Valar.


But, gentlemen, thanks to curtvm, I have already implemented and posted the ultimate solution = aligned buffer definition/allocation directly in the module source... So no need for runtime allocation, enforcement, documentation etc...

But thanks anyway. I appreciate your help.

Jan


wek wrote:
But, gentlemen, thanks to curtvm, I have already implemented and posted the ultimate solution = aligned buffer definition/allocation directly in the module source... So no need for runtime allocation, enforcement, documentation etc...
Are you sure it works?
Don't you need something to ensure
that .noinit9 starts on a 0x40 boundary?
My recollection is that there are limits on how
stringently the linker will let you enforce alignment.
The Algn column in an lss file I'm looking at
has 2**1 for .text and 2**0 for the ram sections.
I tried to figure out where these numbers come from,
but couldn't find it.

If curtvm's method works on ram, will it work on flash?
If so, I could have saved myself a lot of trouble.

Iluvatar is the better part of Valar.


Quote:

Don't you need something to ensure
that .noinit9 starts on a 0x40 boundary?

Isn't the .balign ensuring that 'rs485buffer' is aligned on such a boundary (presumably it may be offset from the start of .noinit9 if that start is not on such a boundary)?


skeeve wrote:
Are you sure it works?
Don't you need something to ensure
that .noinit9 starts on a 0x40 boundary?

Let me remind you of the whole sequence (don't forget that this is simply inserted at the beginning of an inlined asm function, so it passes directly to the assembler):
".section .noinit9 \n\t" -- we tell the assembler that this will go to the .noinit section (which is linked after .bss)
".balign 0x40 \n\t" -- this is the alignment command - wherever noinit9's "current pointer" currently is, it is increased so that it's aligned on 0x40 boundary
"rs485buffer: \n\t" -- by this we create the "rs485buffer" symbol
".skip 0x40+0x40 \n\t" -- and here we reserve space for the buffer
".section .text \n\t" -- this is necessary so that subsequent commands in the inline asm go to .text (i.e. FLASH) and not to .noinit9

skeeve wrote:
My recollection is that there are limits on how
stringently the linker will let you enforce alignment.

All the components - compiler, assembler, linker - have their opinions on alignment... ;-) I was playing half of this morning with a gcc-specific variable attribute: __attribute__((__align__(0x40))); however, the compiler decided that the maximum alignment unit is 1... I guess, leaving the safe paths might be a dangerous thing in gcc.
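For reference, the spelling documented by GCC for this variable attribute is aligned (or __aligned__). Whether a given avr-gcc release honors a request as large as 0x40 is something to verify in the map file; the sketch below only shows the syntax plus a run-time sanity check, and the buffer name is hypothetical:

```c
#include <stdint.h>

#define RS485_ALIGN 0x40   /* alignment used in this thread */

/* Hypothetical buffer; 0x80 bytes matches the 0x40+0x40 reserved in the asm variant. */
static uint8_t rs485buffer_c[0x80] __attribute__((aligned(RS485_ALIGN)));

/* Nonzero when the toolchain actually placed the buffer on the boundary. */
static int rs485buffer_is_aligned(void)
{
    return ((uintptr_t)rs485buffer_c & (RS485_ALIGN - 1)) == 0;
}
```

If the check fails on the target, the assembler-level .balign approach remains the fallback.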

skeeve wrote:
If curtvm's method works on ram, will it work on flash?

Oh yes, certainly it does. The whole GNU suite has no idea about memory classes - FLASH, RAM, EEPROM - it's all a big flat memory space for gcc, as and ld. The trick in the AVR toolchain is to assign these memories far far away from each other, and then mask out the upper address bits.

For example, this:

void Test1(void) {
  static volatile unsigned char a;
  a = 1;
}


void Test2(void) {
  static volatile unsigned char a;
  a = 2;
}

void dummy_function(void) __attribute__((__naked__));
void dummy_function(void) {
  __asm__(
    ".balign 0x100 \n\t"
  );
}

void Test3(void) {
  static volatile unsigned char a;
  a = 2;
}


void main(void) {
  Test1();
  Test2();
  Test3();
}

compiles/assembles/links to this:

00000100 <Test1>:
 100:	cf 93       	push	r28
 102:	df 93       	push	r29
 104:	cd b7       	in	r28, 0x3d	; 61
 106:	de b7       	in	r29, 0x3e	; 62
 108:	81 e0       	ldi	r24, 0x01	; 1
 10a:	80 93 60 00 	sts	0x0060, r24
 10e:	df 91       	pop	r29
 110:	cf 91       	pop	r28
 112:	08 95       	ret

00000114 <Test2>:
 114:	cf 93       	push	r28
 116:	df 93       	push	r29
 118:	cd b7       	in	r28, 0x3d	; 61
 11a:	de b7       	in	r29, 0x3e	; 62
 11c:	82 e0       	ldi	r24, 0x02	; 2
 11e:	80 93 61 00 	sts	0x0061, r24
 122:	df 91       	pop	r29
 124:	cf 91       	pop	r28
 126:	08 95       	ret

00000128 <dummy_function>:
	...

00000200 <Test3>:
 200:	cf 93       	push	r28
 202:	df 93       	push	r29
 204:	cd b7       	in	r28, 0x3d	; 61
 206:	de b7       	in	r29, 0x3e	; 62
 208:	82 e0       	ldi	r24, 0x02	; 2
 20a:	80 93 62 00 	sts	0x0062, r24
 20e:	df 91       	pop	r29
 210:	cf 91       	pop	r28
 212:	08 95       	ret

00000214 <main>:
 214:	cf 93       	push	r28
 216:	df 93       	push	r29
 218:	cd b7       	in	r28, 0x3d	; 61
 21a:	de b7       	in	r29, 0x3e	; 62
 21c:	71 df       	rcall	.-286    	; 0x100
 21e:	7a df       	rcall	.-268    	; 0x114
 220:	ef df       	rcall	.-34     	; 0x200
 222:	df 91       	pop	r29
 224:	cf 91       	pop	r28
 226:	08 95       	ret
	...

JW


wek wrote:
skeeve wrote:
Are you sure it works?
Don't you need something to ensure
that .noinit9 starts on a 0x40 boundary?

Let me remind you of the whole sequence (don't forget that this is simply inserted at the beginning of an inlined asm function, so it passes directly to the assembler):
".section .noinit9 \n\t" -- we tell the assembler that this will go to the .noinit section (which is linked after .bss)
".balign 0x40 \n\t" -- this is the alignment command - wherever noinit9's "current pointer" currently is, it is increased so that it's aligned on 0x40 boundary
"rs485buffer: \n\t" -- by this we create the "rs485buffer" symbol
".skip 0x40+0x40 \n\t" -- and here we reserve space for the buffer
".section .text \n\t" -- this is necessary so that subsequent commands in the inline asm go to .text (i.e. FLASH) and not to .noinit9
What happens if the .noinit output section starts at 0x121?
Does the .balign cause the assembler to tell
the linker about the alignment requirement?
Quote:
skeeve wrote:
My recollection is that there are limits on how
stringently the linker will let you enforce alignment.
All the components - compiler, assembler, linker - have their opinions on alignment... ;-) I was playing half of this morning with a gcc-specific variable attribute: __attribute__((__align__(0x40))); however, the compiler decided that the maximum alignment unit is 1... I guess, leaving the safe paths might be a dangerous thing in gcc.
Ah, that is what I was remembering.
I think I'd been trying to reserve a page of SPMable flash.
That's why I didn't try .balign on the
code I mentioned in another thread.
What is the meaning of the Algn column in .lss files?
Quote:
skeeve wrote:
If curtvm's method works on ram, will it work on flash?

Oh yes, certainly it does. The whole GNU suite has no idea about memory classes - FLASH, RAM, EEPROM - it's all a big flat memory space for gcc, as and ld. The trick in the AVR toolchain is to assign these memories far far away from each other, and then mask out the upper address bits.

Iluvatar is the better part of Valar.


skeeve wrote:
What happens if the .noinit output section starts at 0x121?
Does the .balign cause the assembler to tell
the linker about the alignment requirement.

Yes; it would simply skip the 0x121-0x13F area and start at 0x140 (if 0x40 alignment is chosen).

The following is a piece of my .map file:

                0x00800073                uartRxHead
                0x00800074                uartRxHeadTmp
                0x00800075                PROVIDE (__bss_end, .)
                0x0000081e                __data_load_start = LOADADDR (.data)
                0x0000081e                __data_load_end = (__data_load_start + SIZEOF (.data))

.noinit         0x00800075       0x8b
                0x00800075                PROVIDE (__noinit_start, .)
 *(.noinit*)
 *fill*         0x00800075        0xb 00
 .noinit9       0x00800080       0x80 rs485.o
                0x00800100                PROVIDE (__noinit_end, .)
                0x00800100                _end = .
                0x00800100                PROVIDE (__heap_start, .)
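The numbers in the map can be checked with a little arithmetic. The helper names below are made up; the addresses are taken from the map excerpt above:

```c
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1u) & ~(align - 1u);
}

/* Number of fill bytes the linker has to insert before the aligned symbol. */
static uint32_t fill_bytes(uint32_t addr, uint32_t align)
{
    return align_up(addr, align) - addr;
}
```

Starting from 0x00800075 with 0x40 alignment this gives 0x00800080 and a fill of 0xb bytes, and adding the 0x80-byte buffer lands on 0x00800100, which agrees with the *fill* line and __noinit_end above.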

I must say, curtvm opened up new horizons for me, thanks... :-D

JW


So if .noinit9 happens to be based at 0x141 it'll *fill* 0x3F bytes to align it to 0x180?

Is it really a great idea to waste 63 bytes of precious SRAM in an embedded, limited resource, processor?

(on average I guess it'll waste ~32 bytes)
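A quick way to put numbers on that estimate, assuming every start offset within a 0x40 block is equally likely (an assumption of this sketch, not something the linker guarantees):

```c
#include <stdint.h>

/* Fill bytes needed to reach the next align boundary (align: power of two). */
static uint32_t padding_for(uint32_t addr, uint32_t align)
{
    return (align - (addr & (align - 1u))) & (align - 1u);
}

/* Sum of the padding over every possible start offset within one align block. */
static uint32_t total_padding(uint32_t align)
{
    uint32_t sum = 0;
    for (uint32_t start = 0; start < align; ++start)
        sum += padding_for(start, align);
    return sum;
}
```

For 0x40 the total is 2016 over 64 possible offsets, i.e. 31.5 bytes on average, with the worst case of 0x3F bytes at a start such as 0x141.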


Putting it into .noinit places the array after all other variables (provided there is nothing else in .noinit; unfortunately the linker puts un-numbered sections after numbered ones, but it is quite rare for anybody to put anything into .noinit). So it is quite unlikely that there is enough space for the buffer and at the same time it cannot be aligned.

In fact, it would be fine to link these buffers backwards from the top of the available memory. Or, even better, to mix them with variables so that they fill up every possible gap. Both of these would require a different, "better", "smarter" linker. I don't intend to write one, nor to patch the existing one.

JW


If one happened to be tight on SRAM, and 'needed' alignment that would normally cause too much waste, you could also create a section starting at the normal .data address (assuming it's 0x0100) and pass a linker option to move .data up by whatever you need. 0x100 should align with most anything one would want.


curtvm wrote:
If one happened to be tight on sram, and 'needed' alignment that normally would cause too much waste, you could also create a section starting at the normal .data address (assuming its 0x0100) and send a linker option to move .data up by whatever you need. 0x100 should align with most anything one would want.

Well, one of the real applications runs on an ATmega8, where SRAM starts at 0x60. However, I take the point, thanks.

I used the noinit section as it is already defined in the default linker script (or wherever it is, I did not investigate), so the module now is linkable without a need for the end user to play with the linker, which is usually something the users don't want to play with.

All sorts of games can be played with the linker in case of SRAM shortage; not to mention that the size of the buffers would have to be reconsidered in that case, too. Time will show if this is really necessary; I can then try to rewrite the said section so that it is placed conditionally at either the beginning or the end of SRAM.

JW