inline assembler - set/get a uint32_t to registers...

I have a bootloader that reads its serial number from the bootloader section, but I don't want the app to be able to read the bootloader section, so I will set the lock bits to prevent that.  Basically I want to pass a uint32_t parameter from the bootloader code to the application code.

 

I have done this through SRAM using a memcpy to an address I don't think is in use and it works, but what I'd really like to do is load the uint32_t into 4 registers that are not used in the gcc initialization code and then reacquire them in the application code when it begins.

 

Bootloader:

uint32_t sn;

savetoregs(sn); //macro to save sn to something like r20/r21/r22/r23, etc.

asm("jmp 0");

 

App:

uint32_t sn;

sn=loadfromregs(); //macro to load sn from something like r20/r21/r22/r23, etc.

 

I am just baffled by the inline asm examples I've seen, though I've tried to understand them.  It may be a lack of assembly knowledge that keeps me from fully getting it.  For example, I found the pgm_get_far_address macro, which looks like this:

#define pgm_get_far_address(var)                          \
({                                                    \
	uint_farptr_t tmp;                                \
                                                      \
	__asm__ __volatile__(                             \
                                                      \
			"ldi	%A0, lo8(%1)"           "\n\t"    \
			"ldi	%B0, hi8(%1)"           "\n\t"    \
			"ldi	%C0, hh8(%1)"           "\n\t"    \
			"clr	%D0"                    "\n\t"    \
		:                                             \
			"=d" (tmp)                                \
		:                                             \
			"p"  (&(var))                             \
	);                                                \
	tmp;                                              \
})

but after trying to modify it to do what I want, I am not getting it!

 

Can an inline asm macro do something like this?  Any ideas on how to do it?

 

I also got an error saying that only registers above 16 may be used. Why is that?


Maybe I'm getting somewhere:

#define test(var)                          \
({                                                    \
                                                      \
	__asm__ __volatile__(                             \
                                                      \
			"lds	R21, %0"           "\n\t"    \
			"lds	R22, %0+1"           "\n\t"    \
			"lds	R23, %0+2"           "\n\t"    \
			"lds  R24, %0+3"          "\n\t"    \
		:                                             \
		:                                             \
			"p"  (&(var))                             \
	);                                                \
})

 

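/* Copy the 4 bytes at address var into R21..R24 (var must be a link-time constant address). */
#define save4(var) \
  ({ __asm__ __volatile__( \
  "lds R21, %0\n" \
  "lds R22, %0+1\n" \
  "lds R23, %0+2\n" \
  "lds R24, %0+3\n" : : "p" (var) : "r21", "r22", "r23", "r24"); })

/* Copy R21..R24 back into the 4 bytes at address var. */
#define load4(var) \
  ({ __asm__ __volatile__( \
  "sts %0, R21\n" \
  "sts %0+1, R22\n" \
  "sts %0+2, R23\n" \
  "sts %0+3, R24\n" : : "p" (var) : "memory"); })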

Got it working!  Hopefully the right way!

 

Saving:

  //does the app have a tag and its crc is good
  if (findtag(0,BOOT_LOCATION,&ui1) && testcrc(0,ui1))
    {
      save4(&sn);
      asm("jmp 0");
    }

Loading:

int main(void)
{
  load4(&sn);

 

Last Edited: Wed. Apr 18, 2018 - 10:04 PM

What an extraordinarily convoluted way to achieve...

typedef void (*reset_fn_t)(uint32_t);

int main(void) {
    reset_fn_t reset;
    reset = (reset_fn_t) 0;
    reset(0xBABEFACE);
}

That does a CALL 0 with R25:R24:R23:R22 (as dictated by the GCC ABI) containing 0xBABEFACE. Obviously, if you preferred, you could pass the address of a uint32_t instead:

typedef void (*reset_fn_t)(uint32_t *);

uint32_t sn = 0xBABEFACE;

int main(void) {
    reset_fn_t reset;
    reset = (reset_fn_t) 0;
    reset(&sn);
}

In this case a 16-bit pointer would be passed in R25:R24.

 

But I guess it's true that in either case you need "something special" in the CRT (.init1?) to collect R25:R24 or R25:R24:R23:R22 before those registers start to be used by something else.

 

For me the easiest would be:

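#include <avr/io.h>
#include <stdint.h>

typedef void (*reset_fn_t)(void);

uint32_t sn = 0xBABEFACE;

int main(void) {
    reset_fn_t reset;
    reset = (reset_fn_t) 0;

    // pass the 16-bit address of sn, low byte first
    GPIOR1 = (uint16_t)&sn & 0xFF;
    GPIOR2 = (uint16_t)&sn >> 8;
    reset();
}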

then the receiving code doesn't need to fuss about with early R25... but simply in main:

uint32_t sn;
uint32_t * ptr;

int main(void) {
    // rebuild the 16-bit address from the two bytes left by the bootloader
    ptr = (uint32_t *)(GPIOR1 | ((uint16_t)GPIOR2 << 8));
    sn = *ptr;
}

I picked the (probably unused) GPIOR1/GPIOR2 rather than involving GPIOR0, as that might be in use right up to the last minute in the bootloader (it's an obvious place to hold binary flags). But I guess that once the decision is made to go to the app it becomes redundant, so you could easily use it as well.

 

BTW did you read the bootloader FAQ in the tutorial forum? I'm pretty sure I aired the idea of using these GPIORn registers to pass data between domains in that.

 

Another possible use is to pass a pointer to a dispatch table so that the bootloader can share execution entry points with the app. But I guess that if it's being locked so it cannot be read that table itself would need to be constructed in RAM before the dispatch.


clawson wrote:

But I guess it's true that in either case you need "something special" in the CRT (.init1?)...

 

Does GCC's CRT0 clear out or use all the registers?

 

I know on CVAVR it zeroes out R2-R14, and uses R22-R31. So you could, using CV, leave things in R15-R22 and they would still be intact at the top of main().

#1 This forum helps those that help themselves

#2 All grounds are not created equal

#3 How have you proved that your chip is running at xxMHz?

#4 "If you think you need floating point to solve the problem then you don't understand the problem. If you really do need floating point then you have a problem you do not understand." - Heater's ex-boss


Brian Fairchild wrote:
Does GCC's CRT0 clear out or use all the registers?
Here's a fairly typical CRT (I deliberately added some .data and .bss variables so the loops would be included):

 

00000000 <__vectors>:
   0:   0c 94 2a 00     jmp     0x54    ; 0x54 <__ctors_end>
   4:   0c 94 47 00     jmp     0x8e    ; 0x8e <__bad_interrupt>
   8:   0c 94 47 00     jmp     0x8e    ; 0x8e <__bad_interrupt>
   c:   0c 94 47 00     jmp     0x8e    ; 0x8e <__bad_interrupt>
  10:   0c 94 47 00     jmp     0x8e    ; 0x8e <__bad_interrupt>
  14:   0c 94 47 00     jmp     0x8e    ; 0x8e <__bad_interrupt>
  18:   0c 94 47 00     jmp     0x8e    ; 0x8e <__bad_interrupt>
  1c:   0c 94 47 00     jmp     0x8e    ; 0x8e <__bad_interrupt>
  20:   0c 94 47 00     jmp     0x8e    ; 0x8e <__bad_interrupt>
  24:   0c 94 47 00     jmp     0x8e    ; 0x8e <__bad_interrupt>
  28:   0c 94 47 00     jmp     0x8e    ; 0x8e <__bad_interrupt>
  2c:   0c 94 47 00     jmp     0x8e    ; 0x8e <__bad_interrupt>
  30:   0c 94 47 00     jmp     0x8e    ; 0x8e <__bad_interrupt>
  34:   0c 94 47 00     jmp     0x8e    ; 0x8e <__bad_interrupt>
  38:   0c 94 47 00     jmp     0x8e    ; 0x8e <__bad_interrupt>
  3c:   0c 94 47 00     jmp     0x8e    ; 0x8e <__bad_interrupt>
  40:   0c 94 47 00     jmp     0x8e    ; 0x8e <__bad_interrupt>
  44:   0c 94 47 00     jmp     0x8e    ; 0x8e <__bad_interrupt>
  48:   0c 94 47 00     jmp     0x8e    ; 0x8e <__bad_interrupt>
  4c:   0c 94 47 00     jmp     0x8e    ; 0x8e <__bad_interrupt>
  50:   0c 94 47 00     jmp     0x8e    ; 0x8e <__bad_interrupt>

00000054 <__ctors_end>:
  54:   11 24           eor     r1, r1
  56:   1f be           out     0x3f, r1        ; 63
  58:   cf e5           ldi     r28, 0x5F       ; 95
  5a:   d4 e0           ldi     r29, 0x04       ; 4
  5c:   de bf           out     0x3e, r29       ; 62
  5e:   cd bf           out     0x3d, r28       ; 61

00000060 <__do_copy_data>:
  60:   10 e0           ldi     r17, 0x00       ; 0
  62:   a0 e6           ldi     r26, 0x60       ; 96
  64:   b0 e0           ldi     r27, 0x00       ; 0
  66:   ec e9           ldi     r30, 0x9C       ; 156
  68:   f0 e0           ldi     r31, 0x00       ; 0
  6a:   02 c0           rjmp    .+4             ; 0x70 <__do_copy_data+0x10>
  6c:   05 90           lpm     r0, Z+
  6e:   0d 92           st      X+, r0
  70:   ac 36           cpi     r26, 0x6C       ; 108
  72:   b1 07           cpc     r27, r17
  74:   d9 f7           brne    .-10            ; 0x6c <__do_copy_data+0xc>

00000076 <__do_clear_bss>:
  76:   20 e0           ldi     r18, 0x00       ; 0
  78:   ac e6           ldi     r26, 0x6C       ; 108
  7a:   b0 e0           ldi     r27, 0x00       ; 0
  7c:   01 c0           rjmp    .+2             ; 0x80 <.do_clear_bss_start>

0000007e <.do_clear_bss_loop>:
  7e:   1d 92           st      X+, r1

00000080 <.do_clear_bss_start>:
  80:   a0 38           cpi     r26, 0x80       ; 128
  82:   b2 07           cpc     r27, r18
  84:   e1 f7           brne    .-8             ; 0x7e <.do_clear_bss_loop>
  86:   0e 94 49 00     call    0x92    ; 0x92 <main>
  8a:   0c 94 4c 00     jmp     0x98    ; 0x98 <_exit>

0000008e <__bad_interrupt>:
  8e:   0c 94 00 00     jmp     0       ; 0x0 <__vectors>

00000092 <main>:
  92:   80 e0           ldi     r24, 0x00       ; 0
  94:   90 e0           ldi     r25, 0x00       ; 0
  96:   08 95           ret

00000098 <_exit>:
  98:   f8 94           cli

0000009a <__stop_program>:
  9a:   ff cf           rjmp    .-2             ; 0x9a <__stop_program>

The vector table comes first (0x00 to 0x53) and main is at 0x92; the rest is the CRT. As you can see, the mega16 (in this example) has a JMP at 0 that hops over the vectors, then the early code clears R1 and SREG and sets the stack pointer to RAMEND (0x045F). After that come the loops that copy initial data from flash to .data in RAM and write 0x00 to the locations in .bss.

 

By the time main() is reached the following registers have been used: R1, R28, R29, R17, R26, R27, R30, R31, R0, R18

 

Now GCC has a way that code sequences can be "inserted" into the CRT. The initial stuff is in 10 sections (.init0 to .init9) and on the whole the existing CRT only uses the even numbered sections, so by directing code into odd numbered sections (.init3, say) you can "break in" at various stages in the above. But you still face the issue that even if you intercept R25..R22 "early" you have to find somewhere to put them until the complete setup is ready and they can be stored into allocated places. A possible candidate for this, in fact, is GPIOR0, GPIOR1 and GPIOR2 but, as I showed, if you are going to use those you might as well load them up in the bootloader and unload them in main() of the app.

Last Edited: Thu. Apr 19, 2018 - 12:09 PM

What I did with the save4/load4 macro does indeed work, but something more elegant would be better.

 

Are the GPIORx registers really do-nothing registers?  Connected to nothing?  I wish there was a GPIOR3!  Do most AVRs have the GPIORx registers?  What about TCNT0, do most/all AVRs have an 8-bit timer 0?  I could store a value in it, then retrieve it and zero TCNT0 afterwards...


I think everything from 2005/2006 onwards (so starting with mega48/88/168) has got GPIORn registers. The "mega"s tend to have three. I've always thought it a bit stupid that only one of those is in the range of SBI/CBI opcodes as it's a perfect candidate for holding 8 individually accessible bit state variables. The other two are only in IN/OUT range so just lend themselves to be used for holding special 8 bit quantities.

 

In Xmega they went overboard with GPIORn - I think there are 16 and I think they are all right at the bottom in SBI/CBI range.

 

When I used mega16 in the past the register I used to pick for my bit vars (because I wasn't using TWI) was the TWAR register as it has 8 fully writeable/readable bits and is in SBI/CBI range. As you say you could use things like TCNTn for this too. If it's whole 8 bits then it's simply preferable for them to be in IN/OUT range but if you want to use them as bit variables you need to find something in SBI/CBI range - so in the first 32 SFRs.

 

For the purposes of passing data from boot to app, the fact is that by the time you are about to make the app jump you should have got all the peripherals turned off (most bootloaders use a watchdog reset for this), so in theory just about any full read/write register in SFR range could be used. I'd guess most AVRs have at least 20 and quite possibly considerably more such registers.

 

EDIT: of course there's also the RAM. If the bootloader and the app agree on some location you could arrange to pass huge amounts of data just by having some fixed RAM address burned into both. The only thing is that you probably don't want the CRT to wake up and write or wipe large chunks, so it's best to keep it out of the way of the programs' own allocated variables (both programs).

Last Edited: Thu. Apr 19, 2018 - 12:39 PM

From bootloader to application, I usually return any changed peripheral registers to the default data sheet values if I have to.  I prefer to use the watchdog reset method you mention if the bootloader determines whether it can run the application early before the peripheral registers are changed.  Something like:

 

disable watchdog

test bootloader crc [halt if failed, this will be noticed at production]

if bootloader signal (such as button pressed) goto update

get serial to pass to application (serial is stored in bootloader flash, not to change with app code change)

test application and execute if good

update:

setup usart, timer, etc

update code

when finished, do a watchdog reset

 


Because of BOOTRST when you force a watchdog reset execution comes back to the entry of the bootloader. It's here you read MCUSR and if the WDRF is set you know this is a reset the bootloader forced upon itself (things get a bit more complex if the app itself might also use the watchdog and cause WDRF!). So when the bootloader starts with WDRF set you don't do very much else (apart from preparing to pass this serial number) and then you CALL 0.


Interesting - so power-on reset + some condition = bootloader, watchdog reset = application?  I guess I always think of WDRF as meaning something went wrong, not that I usually do anything with it except run again anyway.


Except that in this case you will be doing a deliberate:

   // ...everything else...

   // finally off to the newly programmed application...
   wdt_enable(WDTO_15MS);
   while(1);
}

which is "right" not wrong ;-)


Thanks clawson!