mega2560 AVR-GCC + EICALL


Hi,

 

I've written a function included in the bootloader at a known address.

 

My original aim was to simply call the function from the application code, as a normal C function, have the relevant stack frame values pushed to the stack, and return afterwards.

 

When compiling C code with AVR-GCC 4.9.2 for the mega2560, it is not possible to simply call a function in the upper half of the address space, because GCC uses 16-bit addresses.

 

As a workaround, I thought it would be possible to follow the AVR-GCC C calling convention and make the call manually using EICALL.

 

Both the application and the bootloader are compiled with the same AVR-GCC, and both use the -std=gnu99 and -Os (optimize for size) compiler flags.

 

The function in the bootloader is a simple verification function that takes two uint16_t parameters, a and b, and returns a + b as a uint16_t.

 

Below is the application code which calls the bootloader function:

#define offset 0x1f246
#define R1 *((volatile uint8_t *)0x01)
/* R2 - R30 not shown, but similar */
#define R31 *((volatile uint8_t *)0x1f)

uint16_t __attribute__ ((noinline)) do_sf(uint16_t a, uint16_t b);

uint16_t __attribute__((optimize("O0"))) do_sf(uint16_t a, uint16_t b) {
    uint16_t res = 0;
    __asm__ __volatile__ ("":::"memory");
    __asm__ __volatile__ ("push 0x1c"); // R28
    __asm__ __volatile__ ("push 0x1d"); // R29
    __asm__ __volatile__ ("":::"memory");
    R24 = (0xff & a);
    __asm__ __volatile__ ("":::"memory");
    R25 = (0xff & (a >> 8));
    __asm__ __volatile__ ("":::"memory");
    R22 = (0xff & b);
    __asm__ __volatile__ ("":::"memory");
    R23 = (0xff & (b >> 8));
    __asm__ __volatile__ ("":::"memory");
    EIND = (0xffff & (offset >> 16));
    __asm__ __volatile__ ("":::"memory");
    R30 = (0xff & offset);
    __asm__ __volatile__ ("":::"memory");
    R31 = (0xff & (offset >> 8));
    __asm__ __volatile__ ("":::"memory");
    __asm__ __volatile__ ("eicall");
    __asm__ __volatile__ ("":::"memory");
    EIND = (0x0);
    __asm__ __volatile__ ("":::"memory");
    res = ((uint16_t)R24) + (((uint16_t)R25) << 8);
    __asm__ __volatile__ ("":::"memory");
    __asm__ __volatile__ ("pop 0x1d"); // R29
    __asm__ __volatile__ ("pop 0x1c"); // R28
    __asm__ __volatile__ ("":::"memory");
    return res;
}

The __asm__ __volatile__ ("":::"memory") statement is meant to act as a compile-time barrier that prevents the compiler from reordering memory accesses across it. I'm not sure if I've used it correctly.

 

The function in the bootloader is declared:

uint16_t __attribute__((noinline)) program_functions(uint16_t a, uint16_t b);

uint16_t program_functions(uint16_t a, uint16_t b) {
    sendchar('s');
    sendchar('f');
    sendchar(':');
    sendchar(' ');
    sendchar('0');
    sendchar('x');
    PrintHexByte(0xff & (a >> 8));
    PrintHexByte(0xff & a);
    sendchar(' ');
    sendchar('0');
    sendchar('x');
    PrintHexByte(0xff & (b >> 8));
    PrintHexByte(0xff & b);
    sendchar('\n');
    return a + b;
}

With the following calling code:

Println("Test");
_delay_ms(10); // Allows time for application serial to settle
res = do_sf(temp1, temp2); // res, temp1 and temp2 are uint16_t
sprintf(buf, "do_sf(%u,%u): %u", temp1, temp2, res);
Println(buf);

Then the output on the serial line when executed:

Test
sf: 0x00F2 0x0000
do_sf(0,0): 6130

Test
sf: 0x00F2 0x0001
do_sf(0,1): 6131

Test
sf: 0x00F2 0x0002
do_sf(0,2): 6132

Test
sf: 0x00F2 0x0009
do_sf(0,9): 6139

Test
sf: 0x00F2 0x0000
do_sf(1,0): 6130

Test
sf: 0x00F2 0x0000
do_sf(2,0): 6130

Test
sf: 0x00F2 0x0000
do_sf(9,0): 6130

This leads me to believe that the function declared in the bootloader/NRWW section has been called correctly, but there appears to be another problem occurring during parameter passing and/or function execution.

 

I've disassembled the object files with avr-objdump -S and had a look at the resulting assembly. I considered that there might be an issue with call-clobbered registers, but that seems to be fine, as I push R28 and R29 onto the stack and restore them later.

 

The disassembled application EICALL function:

uint16_t do_sf(uint16_t a, uint16_t b) {
    uint16_t res = 0;
    __asm__ __volatile__ ("":::"memory");
    __asm__ __volatile__ ("push 0x1c"); // R28
   0:	cf 93       	push	r28
    __asm__ __volatile__ ("push 0x1d"); // R29
   2:	df 93       	push	r29
    __asm__ __volatile__ ("":::"memory");
    R24 = (0xff & a);
   4:	a8 e1       	ldi	r26, 0x18	; 24
   6:	b0 e0       	ldi	r27, 0x00	; 0
   8:	8c 93       	st	X, r24
    __asm__ __volatile__ ("":::"memory");
    R25 = (0xff & (a >> 8));
   a:	e9 e1       	ldi	r30, 0x19	; 25
   c:	f0 e0       	ldi	r31, 0x00	; 0
   e:	90 83       	st	Z, r25
    __asm__ __volatile__ ("":::"memory");
    R22 = (0xff & b);
  10:	60 93 16 00 	sts	0x0016, r22
    __asm__ __volatile__ ("":::"memory");
    R23 = (0xff & (b >> 8));
  14:	70 93 17 00 	sts	0x0017, r23
    __asm__ __volatile__ ("":::"memory");
    EIND = (0xffff & (offset >> 16));
  18:	81 e0       	ldi	r24, 0x01	; 1
  1a:	8c bf       	out	0x3c, r24	; 60
    __asm__ __volatile__ ("":::"memory");
    R30 = (0xff & offset);
  1c:	86 e4       	ldi	r24, 0x46	; 70
  1e:	80 93 1e 00 	sts	0x001E, r24
    __asm__ __volatile__ ("":::"memory");
    R31 = (0xff & (offset >> 8));
  22:	82 ef       	ldi	r24, 0xF2	; 242
  24:	80 93 1f 00 	sts	0x001F, r24
    __asm__ __volatile__ ("":::"memory");
    __asm__ __volatile__ ("eicall");
  28:	19 95       	eicall
    __asm__ __volatile__ ("":::"memory");
    EIND = (0x0);
  2a:	1c be       	out	0x3c, r1	; 60
    __asm__ __volatile__ ("":::"memory");
    res = ((uint16_t)R24) + (((uint16_t)R25) << 8);
  2c:	2c 91       	ld	r18, X
  2e:	80 81       	ld	r24, Z
    __asm__ __volatile__ ("":::"memory");
    __asm__ __volatile__ ("pop 0x1d"); // R29
  30:	df 91       	pop	r29
    __asm__ __volatile__ ("pop 0x1c"); // R28
  32:	cf 91       	pop	r28
    __asm__ __volatile__ ("":::"memory");
    __asm__ __volatile__ ("eicall");
    __asm__ __volatile__ ("":::"memory");
    EIND = (0x0);
    __asm__ __volatile__ ("":::"memory");
    res = ((uint16_t)R24) + (((uint16_t)R25) << 8);
  34:	90 e0       	ldi	r25, 0x00	; 0
  36:	98 2f       	mov	r25, r24
  38:	88 27       	eor	r24, r24
    __asm__ __volatile__ ("":::"memory");
    __asm__ __volatile__ ("pop 0x1d"); // R29
    __asm__ __volatile__ ("pop 0x1c"); // R28
    __asm__ __volatile__ ("":::"memory");
    return res;
}
  3a:	82 0f       	add	r24, r18
  3c:	91 1d       	adc	r25, r1
  3e:	08 95       	ret

As well as the disassembled bootloader function:

uint16_t __attribute__((optimize("O0"))) program_functions(uint16_t a, uint16_t b) {
 120:	cf 93       	push	r28
 122:	df 93       	push	r29
 124:	00 d0       	rcall	.+0      	; 0x126 <program_functions+0x6>
 126:	1f 92       	push	r1
 128:	cd b7       	in	r28, 0x3d	; 61
 12a:	de b7       	in	r29, 0x3e	; 62
 12c:	9a 83       	std	Y+2, r25	; 0x02
 12e:	89 83       	std	Y+1, r24	; 0x01
 130:	7c 83       	std	Y+4, r23	; 0x04
 132:	6b 83       	std	Y+3, r22	; 0x03
    sendchar('s');
 134:	83 e7       	ldi	r24, 0x73	; 115
 136:	0e 94 00 00 	call	0	; 0x0 <sendchar>
    sendchar('f');
 13a:	86 e6       	ldi	r24, 0x66	; 102
 13c:	0e 94 00 00 	call	0	; 0x0 <sendchar>
    sendchar(':');
 140:	8a e3       	ldi	r24, 0x3A	; 58
 142:	0e 94 00 00 	call	0	; 0x0 <sendchar>
    sendchar(' ');
 146:	80 e2       	ldi	r24, 0x20	; 32
 148:	0e 94 00 00 	call	0	; 0x0 <sendchar>
    sendchar('0');
 14c:	80 e3       	ldi	r24, 0x30	; 48
 14e:	0e 94 00 00 	call	0	; 0x0 <sendchar>
    sendchar('x');
 152:	88 e7       	ldi	r24, 0x78	; 120
 154:	0e 94 00 00 	call	0	; 0x0 <sendchar>
    PrintHexByte(0xff & (a >> 8));
 158:	89 81       	ldd	r24, Y+1	; 0x01
 15a:	9a 81       	ldd	r25, Y+2	; 0x02
 15c:	89 2f       	mov	r24, r25
 15e:	99 27       	eor	r25, r25
 160:	0e 94 00 00 	call	0	; 0x0 <sendchar>
    PrintHexByte(0xff & a);
 164:	89 81       	ldd	r24, Y+1	; 0x01
 166:	0e 94 00 00 	call	0	; 0x0 <sendchar>
    sendchar(' ');
 16a:	80 e2       	ldi	r24, 0x20	; 32
 16c:	0e 94 00 00 	call	0	; 0x0 <sendchar>
    sendchar('0');
 170:	80 e3       	ldi	r24, 0x30	; 48
 172:	0e 94 00 00 	call	0	; 0x0 <sendchar>
    sendchar('x');
 176:	88 e7       	ldi	r24, 0x78	; 120
 178:	0e 94 00 00 	call	0	; 0x0 <sendchar>
    PrintHexByte(0xff & (b >> 8));
 17c:	8b 81       	ldd	r24, Y+3	; 0x03
 17e:	9c 81       	ldd	r25, Y+4	; 0x04
 180:	89 2f       	mov	r24, r25
 182:	99 27       	eor	r25, r25
 184:	0e 94 00 00 	call	0	; 0x0 <sendchar>
    PrintHexByte(0xff & b);
 188:	8b 81       	ldd	r24, Y+3	; 0x03
 18a:	0e 94 00 00 	call	0	; 0x0 <sendchar>
    sendchar('\n');
 18e:	8a e0       	ldi	r24, 0x0A	; 10
 190:	0e 94 00 00 	call	0	; 0x0 <sendchar>
    return a + b;
 194:	29 81       	ldd	r18, Y+1	; 0x01
 196:	3a 81       	ldd	r19, Y+2	; 0x02
 198:	8b 81       	ldd	r24, Y+3	; 0x03
 19a:	9c 81       	ldd	r25, Y+4	; 0x04
 19c:	82 0f       	add	r24, r18
 19e:	93 1f       	adc	r25, r19
}
 1a0:	0f 90       	pop	r0
 1a2:	0f 90       	pop	r0
 1a4:	0f 90       	pop	r0
 1a6:	0f 90       	pop	r0
 1a8:	df 91       	pop	r29
 1aa:	cf 91       	pop	r28
 1ac:	08 95       	ret

My specific question is: am I following the C calling convention correctly, so that the parameters and return value are passed in a consistent and correct way?

 

Is there a more robust and/or pragmatic method?

 

Kind Regards,

Nox

This topic has a solution.
Last Edited: Sat. Jul 29, 2017 - 06:04 PM
This reply has been marked as the solution. 

I don't think you can store values into registers from C code the way you have done; you need to do all of that in ASM, based on knowing that the ABI puts the arguments into specific registers...

 

Noxissist wrote:

I've written a function included in the bootloader at a known address.

When compiling C code using AVR-GCC 4.9.2 for the mega2560 it is not possible to simply call a function in the upper half of the address space, unless EICALL is used.

Are you sure about that?  If you have a known (constant) address, I don't have any trouble with

int call_boot(int a, int b) __attribute__((noinline));
int call_boot(int a, int b)
{
    asm("jmp 0x12346");
}

(Note that for a function with two int arguments, they're all passed in registers.  This essentially creates a manually-formed "trampoline", I think.)
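
For reference, the register assignment that note relies on can be modelled on a host machine. The sketch below follows the avr-gcc rule for fixed-argument functions as it is usually described (start at r26, round each argument size up to an even number of bytes, count downwards, and fall back to the stack once the result would drop below r8). The type arg_regs_t and the function avr_assign_arg_regs are hypothetical names invented for this illustration; they are not part of any toolchain.

```c
#include <stddef.h>

/* Register range holding one argument; both fields are -1 if it goes on the stack. */
typedef struct {
    int low_reg;   /* register holding the least significant byte */
    int high_reg;  /* register holding the most significant byte  */
} arg_regs_t;

/*
 * Host-side model (an assumption-laden sketch, not toolchain code) of how
 * avr-gcc assigns registers to the arguments of a non-variadic function.
 * sizes[i] is the size in bytes of argument i.
 * Returns the number of arguments that end up in registers.
 */
size_t avr_assign_arg_regs(const size_t *sizes, size_t n, arg_regs_t *out)
{
    int next = 26;       /* allocation starts just above r25 and moves down */
    int on_stack = 0;    /* once one argument spills, all later ones do too */
    size_t in_regs = 0;

    for (size_t i = 0; i < n; i++) {
        int rounded = (int)((sizes[i] + 1u) & ~(size_t)1u); /* round up to even */

        if (!on_stack && sizes[i] != 0 && next - rounded >= 8) {
            next -= rounded;
            out[i].low_reg = next;
            out[i].high_reg = next + (int)sizes[i] - 1;
            in_regs++;
        } else {
            on_stack = 1;
            out[i].low_reg = -1;
            out[i].high_reg = -1;
        }
    }
    return in_regs;
}
```

With sizes {2, 2, 4} the model gives r25:r24 for the first argument, r23:r22 for the second and r21..r18 for a 32-bit third argument. Under the same rule a 16-bit return value comes back in r25:r24.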

 

Or, you can do something sneaky like:

void long_call(int a, int b, unsigned long daddr) __attribute__((noinline));
void long_call(int a, int b, unsigned long daddr)
{
    asm volatile (" push r20   ;; High byte of 22bit PC \n"
		  " push r19   ;;  middle byte \n"
		  " push r18   ;;   low byte \n"
	);
    return;        // "return" to the address we put on the stack
                   // Note that a and b are still in place...
}

(You might have to play with the byte order.  It wasn't exactly clear which way PCs are stacked for call/ret...)
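
On the byte-order question, here is a small host-side model of a descending stack, going by the instruction set manual's description of CALL and RET on devices with a 22-bit PC (low byte pushed first, high byte popped first). It is only a sketch and worth confirming on hardware or in a simulator; model_stack_t and the model_* functions are hypothetical names made up for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Tiny model of an AVR-style stack: PUSH stores at SP, then decrements SP. */
typedef struct {
    uint8_t mem[16];
    int sp;            /* index of the next free slot */
} model_stack_t;

void model_init(model_stack_t *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->sp = 15;        /* start at the top of the modelled memory */
}

void model_push(model_stack_t *s, uint8_t v) { s->mem[s->sp--] = v; }

uint8_t model_pop(model_stack_t *s) { return s->mem[++s->sp]; }

/* Lay out a return address the way CALL is described to: low, middle, high. */
void model_push_return_address(model_stack_t *s, uint32_t word_addr)
{
    model_push(s, (uint8_t)(word_addr & 0xFF));          /* low byte first  */
    model_push(s, (uint8_t)((word_addr >> 8) & 0xFF));   /* then the middle */
    model_push(s, (uint8_t)((word_addr >> 16) & 0xFF));  /* high byte last  */
}

/* RET pops in the opposite order: high byte first, low byte last. */
uint32_t model_ret(model_stack_t *s)
{
    uint32_t hi  = model_pop(s);
    uint32_t mid = model_pop(s);
    uint32_t lo  = model_pop(s);
    return (hi << 16) | (mid << 8) | lo;
}
```

If this model matches the hardware, the low byte of daddr (r18) would need to be pushed first and r20 last, and daddr would have to be a word address, since RET loads the popped value straight into the PC.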

 


Hi,

 

Brilliant, thanks for the response. I never thought about using a jump from within an empty function.

 

I tried the code in the first example, but the MCU reset immediately upon the function call.

 

I think you are on the right track though, so I built a variant on your example:

uint16_t do_sf(uint16_t a, uint16_t b) __attribute__ ((noinline)) __attribute__ ((naked));

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wreturn-type"
uint16_t do_sf (uint16_t a, uint16_t b) {
    asm("push r24 ; \n"
        "ldi r24, 0x01 ; 1 \n"
        "out 0x3c, r24 ; 60 \n"
        "pop r24 ; \n"
        "ldi r30, 0x46 ; \n"
        "ldi r31, 0xf2 ; \n"
        "eijmp ; Jump to 0x1f246 ");
}
#pragma GCC diagnostic pop

I stand to be corrected, but from what I can tell, on the mega2560 the extended addressing instructions are needed to reach addresses beyond the 16-bit limit (64K words, i.e. 128 KB) of their non-extended counterparts. (out 0x3c, r24 writes the most significant byte of the address to the EIND register via r24; the push and pop around it preserve r24's value.)
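
The constants in the asm above are just the three bytes of 0x1f246, taken as a word address. A minimal host-side sketch of that split follows; eind_z_t, split_word_address and join_word_address are hypothetical names for illustration, and treating 0x1f246 as a word address is an assumption based on the values used above.

```c
#include <stdint.h>

/* The three pieces EIJMP/EICALL combine into a target: EIND:ZH:ZL. */
typedef struct {
    uint8_t eind;  /* bits 16 and up of the word address */
    uint8_t zh;    /* bits 15..8, goes into r31          */
    uint8_t zl;    /* bits 7..0, goes into r30           */
} eind_z_t;

/* Split a flash word address into the values to load into EIND, r31 and r30. */
eind_z_t split_word_address(uint32_t word_addr)
{
    eind_z_t parts;
    parts.eind = (uint8_t)((word_addr >> 16) & 0xFF);
    parts.zh   = (uint8_t)((word_addr >> 8) & 0xFF);
    parts.zl   = (uint8_t)(word_addr & 0xFF);
    return parts;
}

/* Inverse operation: rebuild the word address from the three pieces. */
uint32_t join_word_address(eind_z_t parts)
{
    return ((uint32_t)parts.eind << 16) | ((uint32_t)parts.zh << 8) | parts.zl;
}
```

For 0x1f246 this yields EIND = 0x01, r31 = 0xf2 and r30 = 0x46, the same values as the ldi and out instructions in do_sf.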

 

The #pragma directives tell the compiler not to warn about the missing return statement (the correct return value is already in the registers).

 

To prevent an additional stack frame from being set up around the jump, and the MCU from returning twice, I used the naked attribute to strip the code for stack frame setup and teardown. As far as I understand, the function at the destination address has its own setup and teardown code, which runs in place.

 

But the device still resets after returning (the flashend value is only written during startup), although the parameters and return value appear to be passed backwards and forwards correctly (as the do_sf(...) output originates from within the application section):

flashend: 262143
Test
sf: 0x0000 0x0000
do_sf(0,0): 0

flashend: 262143
Test
sf: 0x0000 0x0001
do_sf(0,1): 1

flashend: 262143
Test
sf: 0x0000 0x0002
do_sf(0,2): 2

flashend: 262143
Test
sf: 0x0001 0x0000
do_sf(1,0): 1

flashend: 262143
Test
sf: 0x0002 0x0000
do_sf(2,0): 2

flashend: 262143
Test
sf: 0x0002 0x0002
do_sf(2,2): 4

flashend: 262143
Test
sf: 0x0200 0x0200
do_sf(512,512): 1024

flashend: 262143
Test
sf: 0x03E7 0x03E7
do_sf(999,999): 1998

flashend: 262143

I also tried code similar to that given above, but with eicall instead of eijmp; that reset the device even before returning from the BLS.

 

I've attached logs of the SRAM state before and after the function call to check whether there are any obvious problems, but it all seems in order (files attached: lines 31 and 33-35 differ).

 

Kind Regards,

 

Nox


Last Edited: Sat. Jul 29, 2017 - 04:49 PM

Noxissist wrote:

When compiling C code using AVR-GCC 4.9.2 for the mega2560 it is not possible to simply call a function in the upper half of the address space, due to the limitation of GCC with using 16 bit addresses.

 

The GCC manual explains EIND and its caveats in great detail:

 

http://gcc.gnu.org/onlinedocs/gc...

 

 

/* R2 - R30 not shown, but similar */

 

This is a hack and invokes undefined behaviour. If it works, then only by pure luck.

 



Hi,

 

I've worked through westfw's second example and am pleased to say it works!

 

It wasn't immediately obvious but using the return as a jump accepting a 3 byte parameter is a very clever way to go about it.

 

Thank you for the advice!

Nox

 

P.S. Sprinter:

WRT #define R1 *((volatile uint8_t *)0x01) - As I understand it the registers are mapped to the starting addresses of memory space and could be driven to a new state in the background, so treating registers directly as pointers to volatile bytes should always work.

WRT EIND EICALL & EIJMP - I tried that methodology already as shown in my previous post, but unfortunately it just crashed the device straight away. (I started setting up SimulAVR to delve deeper, but I'll leave that for a later date).

 

 

Last Edited: Sat. Jul 29, 2017 - 06:01 PM

Noxissist wrote:

WRT #define R1 *((volatile uint8_t *)0x01) - As I understand it the registers are mapped to the starting addresses of memory space and could be driven to a new state in the background, so treating registers directly as pointers to volatile bytes should always work.

Yeah, but there's no guarantee that the compiler won't be using those registers for something else in between the C statements (or as part of the C statements.)  I don't think the compiler has any idea that those pointers modify what it thinks are registers.
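
The disassembly of do_sf in the first post seems to show exactly this. Below is a host-side replay of the instructions leading up to the eicall, using a 32-byte array as the register file. It is a simplified model that assumes data-space addresses 0x00..0x1F alias r0..r31; callee_view_t, reg_store and simulate_do_sf_entry are hypothetical names made up for the illustration.

```c
#include <stdint.h>

/* What the callee would read from the argument registers at the eicall. */
typedef struct {
    uint16_t a_seen;  /* r25:r24 */
    uint16_t b_seen;  /* r23:r22 */
} callee_view_t;

/* Data-space write to an address in 0x00..0x1F, modelled as a register write. */
static void reg_store(uint8_t r[32], uint8_t addr, uint8_t value)
{
    r[addr & 0x1F] = value;
}

callee_view_t simulate_do_sf_entry(uint16_t a, uint16_t b)
{
    uint8_t r[32] = {0};
    callee_view_t v;

    /* State on entry to do_sf: a in r25:r24, b in r23:r22. */
    r[24] = (uint8_t)(a & 0xFF);
    r[25] = (uint8_t)(a >> 8);
    r[22] = (uint8_t)(b & 0xFF);
    r[23] = (uint8_t)(b >> 8);

    /* R24 = ...   ldi r26,0x18 ; ldi r27,0x00 ; st X, r24 */
    r[26] = 0x18; r[27] = 0x00;
    reg_store(r, 0x18, r[24]);
    /* R25 = ...   ldi r30,0x19 ; ldi r31,0x00 ; st Z, r25 */
    r[30] = 0x19; r[31] = 0x00;
    reg_store(r, 0x19, r[25]);
    /* R22, R23    sts 0x0016, r22 ; sts 0x0017, r23 */
    reg_store(r, 0x16, r[22]);
    reg_store(r, 0x17, r[23]);
    /* EIND = 1    ldi r24,0x01 ; out 0x3c, r24  (r24 reused as scratch) */
    r[24] = 0x01;
    /* R30 = 0x46  ldi r24,0x46 ; sts 0x001E, r24 */
    r[24] = 0x46;
    reg_store(r, 0x1E, r[24]);
    /* R31 = 0xF2  ldi r24,0xF2 ; sts 0x001F, r24 */
    r[24] = 0xF2;
    reg_store(r, 0x1F, r[24]);

    /* eicall: the callee now reads its arguments from the registers. */
    v.a_seen = (uint16_t)(((uint16_t)r[25] << 8) | r[24]);
    v.b_seen = (uint16_t)(((uint16_t)r[23] << 8) | r[22]);
    return v;
}
```

In this model the callee always sees 0xF2 in the low byte of a, whatever was passed in, while b arrives intact. That matches the "sf: 0x00F2 0x0009" lines in the first log, and the low byte of the printed results (6130 is 0x17F2, 6139 is 0x17FB) is 0xF2 + b. The odd high byte is not modelled here; it may come from the read-back going through X and Z after the call.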

 

 

Noxissist wrote:

From what I can tell on the mega2560 the extended addressing instructions are needed to access address ranges greater than the 16-bit max (64K words, i.e. 128 KB) of those extended instructions' counterparts.

That's true for the indirect addressing type calls (ELPM, EICALL, EIJMP) that have nominally 16-bit registers involved, but the way I read the instruction description, the JMP and CALL instructions can access the full 8MB of possible destinations, as long as they're constants (as required by the JMP/CALL instructions anyway). (Though I'm not sure whether the linker would handle symbolic labels...)

It's VERY possible that when using a constant integer in the JMP instruction, it needs to be either half or double the "expected" destination address, due to ambiguities in word vs byte addressing.
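
To make the half-or-double question concrete, here is a host-side sketch of the two address units. It assumes that the GNU assembler takes the operand of jmp as a byte address, that 0x1f246 from the first post is a word address, and that the boot section starts at byte address 0x3E000 (the largest boot size setting on the mega2560). The function names and the two constants are made up for the illustration.

```c
#include <stdint.h>

/* Assumptions for illustration: mega2560 with the largest boot section. */
#define FLASH_SIZE_BYTES 0x40000UL  /* flashend 262143 + 1            */
#define BOOT_START_BYTE  0x3E000UL  /* assumed start of boot section  */

/* A flash word is two bytes, so the two units differ by one bit shift. */
uint32_t flash_word_to_byte(uint32_t word_addr) { return word_addr << 1; }
uint32_t flash_byte_to_word(uint32_t byte_addr) { return byte_addr >> 1; }

/* Returns 1 if the byte address lies inside the assumed boot section. */
int in_boot_section(uint32_t byte_addr)
{
    return byte_addr >= BOOT_START_BYTE && byte_addr < FLASH_SIZE_BYTES;
}
```

Under these assumptions the word address 0x1f246 corresponds to byte address 0x3e48c, which lies in the boot section. Written directly as a jmp operand, 0x1f246 would be read as a byte address, that is word 0xf923, which is well below the boot section.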

 


Thanks, no worries. The register trick works in my case; I double-checked the assembly to make sure.

 

If you check the map file from the linker you'll see byte addresses, but if you take a function's address at run time it will be a word (2-byte) address.

 

Best wishes,

Nox