Writing a bootloader larger than device boot section!


Hello

I'd like to share my experience and success with (see topic :wink:) on AVR-GCC.

Note: I used AVRStudio here, so the configuration options are targeted toward it.

I came to a point where I needed to write a bootloader for an ATmega128, but which was too big to fit in the 4K words (8KB) maximum boot section size.

No matter how I searched, I could not find a specific answer for my problem.

As anyone who has spent some time writing bootloaders/applications on AVR-GCC probably knows, there are things that are wonderful about gcc, and things that are a pain in the ...

Ok, first, to load something to a specific flash location, you have to go through section relocation. So, for example, if you have a small bootloader, and you put everything in the '.text' section, you can simply tell the linker to load the '.text' section at the beginning of the AVR boot section (in AVRStudio, set memory settings 'Flash, .text, 0xF000', and it will pass the proper command to the toolchain, i.e. '-Wl,-section-start=.text=0x1e000'; AVRStudio takes a word address, the linker a byte address). Then program the AVR, set the fuse bits properly, and voila! That is the easy part...
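
To make the word/byte conversion above concrete, here is a minimal host-side sketch (plain C, built on the PC, not on the AVR). The helper names word_to_byte_addr and format_section_start are made up for illustration. The only facts it relies on are that an AVR flash word is two bytes and that the linker wants byte addresses. It prints the '--section-start' spelling of the option.

```c
#include <stdint.h>
#include <stdio.h>

/* AVRStudio and the datasheet quote flash addresses in words;
 * avr-ld wants bytes. One flash word is two bytes. */
static uint32_t word_to_byte_addr(uint32_t word_addr)
{
    return word_addr * 2u;
}

/* Build the linker option for a section placed at a given word address.
 * Returns the number of characters written, as snprintf does. */
static int format_section_start(char *buf, size_t len,
                                const char *section, uint32_t word_addr)
{
    return snprintf(buf, len, "-Wl,--section-start=%s=0x%lx",
                    section, (unsigned long)word_to_byte_addr(word_addr));
}
```

For example, format_section_start(buf, sizeof buf, ".text", 0xF000) fills buf with -Wl,--section-start=.text=0x1e000.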

Now, the above code will have the interrupt vectors at the beginning of the AVR boot section, and setting the IVSEL bit will make the AVR jump to the proper interrupt vectors.

One thing that must be understood is that those interrupt vectors are always placed at the beginning of the '.text' section. This is hardcoded in gcc and there's no way (that I found, anyway!) to modify this. Here, this causes no problem as long as you set the '.text' section to the beginning of the AVR application or boot section. I'll come back to this later...

Now, you've coded, and coded, and coded... up to a point where you can no longer fit your bootloader in the AVR boot section :( Ok... so, let's say we use more than 4Kwords (8KB). After all, why not! Think of it, you could just use 8Kwords (16KB) from 0xE000~0xFFFF (0x1c000~0x1ffff), and let the AVR boot at 0xF000 (0x1e000). We simply take into account that we use more space, and leave the rest for downloaded applications.

Problems start to arise when you realize that you can no longer simply put everything in the '.text' section. Remember from above that the interrupt vectors are placed at the beginning of the '.text' section. If '.text' now starts at 0xE000, the vectors are there too, so when the AVR first boots, it will jump into the middle of the '.text' section (0xF000), and not to the reset vector. And it is no use trying to move the interrupt vectors.

Another approach would be to create 2 sections: .text at 0xF000 and .something_else at 0xE000. This will work fine as long as the .text section does not grow past the end of the AVR program memory. So, this comes down to selecting functions and moving them to the .something_else section. However, this can be really tedious, as *every* function that you wish to move needs to be declared with __attribute__((section(".something_else"))). This involves looking at and modifying every source file :(

There must be a simple way.

Here is what I came up with after some careful thought.

I could write everything in the '.text' section, with pretty much all of my code, except for the bootloader-specific functions (flash page program/write/... functions) which need to be in the AVR boot section. Those I put in the '.bootloader' section with '__attribute__ ((section (".bootloader")))', or simply 'BOOTLOADER_SECTION' as defined in <avr/boot.h>.

Now, configure the linker to put '.text' at 0xE000 (0x1c000), and since the boot section is now pretty tiny, let's say we only use 1Kword (2KB), and put '.bootloader' at 0xFC00 (0x1f800). Argh, the interrupt vectors are wrong again :(! They start at the beginning of the .text section.

Ok, let's try something. If we could create an interrupt jump table, and put it at the beginning of the boot section, we could simply put a lot of 'jmp xxx' in it, one for each interrupt, each jumping to the matching interrupt vector at the beginning of the .text section. The jump table could be created as a simple function containing only inline assembler. This can be done like so:

void int_vect_jump(void)
     __attribute__((naked))
     __attribute__((section(".boot_int_vec")));
void int_vect_jump(void)
{
   // Adjust to the number of interrupts for your device
   __asm__ __volatile__ (
      "jmp 0x1c000  \n\t"
      "jmp 0x1c004  \n\t"
      "jmp 0x1c008  \n\t"
      /* ...snip... */
      "jmp 0x1c084  \n\t"
      "jmp 0x1c088  \n\t"
      ::
   );
}

The attribute 'naked' tells gcc not to emit a prologue (register saving, stack adjustment, ...) or an epilogue (register restoration, return instruction). It emits *only* the inline assembly code. And we bind it to a section called '.boot_int_vec', in which it is the only code.
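
Typing out one 'jmp' per vector by hand is error prone, so here is a small host-side sketch that prints the lines for pasting into the naked function. It assumes what the table above already shows: each vector slot is 4 bytes (one 2-word jmp) and the '.text' vectors start at byte address 0x1c000. The function names are made up for illustration, and the vector count is a parameter because it depends on the device (35 on the ATmega128, which matches the 0x1c000 to 0x1c088 range above).

```c
#include <stdint.h>
#include <stdio.h>

#define VECTOR_SIZE_BYTES 4u   /* each vector slot holds one 2-word jmp */

/* Byte address of vector slot `index` in a table that starts at `text_base`. */
static uint32_t vector_target(uint32_t text_base, unsigned index)
{
    return text_base + index * VECTOR_SIZE_BYTES;
}

/* Print one inline-assembly line per vector. Returns the number of lines. */
static unsigned print_jump_table(FILE *out, uint32_t text_base, unsigned count)
{
    unsigned i;

    for (i = 0; i < count; i++) {
        fprintf(out, "      \"jmp 0x%lx  \\n\\t\"\n",
                (unsigned long)vector_target(text_base, i));
    }
    return count;
}
```

Calling print_jump_table(stdout, 0x1c000, 35) prints 35 lines, starting with "jmp 0x1c000" and ending with "jmp 0x1c088".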

So, let's try it:

- .text, bulk of the code, starts at 0xE000 (0x1c000)

static void func_in_text_section(void)
{
   ...
}

int main ()
{
   ...
   func_in_text_section();
   ...
   if (!boot_program_page (...))
      ...
}

- .boot_int_vec, interrupt jumps, starts at the beginning of the boot section, e.g. 0xFC00 (0x1f800)

void int_vect_jump(void)
     __attribute__((naked))
     __attribute__((section(".boot_int_vec")));
void int_vect_jump(void)
{
   // Adjust to the number of interrupts for your device
   __asm__ __volatile__ (
      "jmp 0x1c000  \n\t"
      "jmp 0x1c004  \n\t"
      "jmp 0x1c008  \n\t"
      /* ...snip... */
      "jmp 0x1c084  \n\t"
      "jmp 0x1c088  \n\t"
      ::
   );
}

- .bootloader, code that *needs* to be in the boot section (flash programming code), starts somewhere *after* the interrupt jump section

uint8_t boot_program_page (...) BOOTLOADER_SECTION;

uint8_t boot_program_page (...)
{
   ...
}

Finally, set the linker options in AVRStudio for the section start addresses.
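
When picking those start addresses, the one constraint worth checking is that '.bootloader' does not land on top of the jump table. A rough host-side check, assuming 35 vectors of 4 bytes each (ATmega128) and 128KB of flash; the function names, and the 0x1f900 used in the example, are only illustrations and not values from my project:

```c
#include <stdint.h>

#define NUM_VECTORS        35u        /* ATmega128: 0x8c bytes / 4 */
#define VECTOR_SIZE_BYTES  4u
#define FLASH_END_BYTES    0x20000UL  /* 128KB of flash, byte addresses */

/* First byte address after the jump table in .boot_int_vec. */
static uint32_t jump_table_end(uint32_t boot_int_vec_start)
{
    return boot_int_vec_start + NUM_VECTORS * VECTOR_SIZE_BYTES;
}

/* Nonzero if .bootloader starts at or after the end of the jump table
 * and still inside the flash. */
static int bootloader_start_ok(uint32_t boot_int_vec_start,
                               uint32_t bootloader_start)
{
    return bootloader_start >= jump_table_end(boot_int_vec_start)
        && bootloader_start < FLASH_END_BYTES;
}
```

With '.boot_int_vec' at 0x1f800 the table ends at 0x1f88c, so a '.bootloader' start of, say, 0x1f900 passes the check, while 0x1f800 does not.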

And voila!

Regards

Big Boy

Keywords: moving interrupt vector, bootloader, remapping interruptions, boot.


What on earth are you doing in a bootloader that requires more than 4K? (you haven't implemented a subset of TCP/IP or something have you??)


Try to avoid using interrupts in the bootloader. This will save lots of flash. Usually a bootloader only bootloads, so polling instead of using ISRs may be an option.

Do you have details to share with us about your bootloader?
(8k is really *lots* of flash)


Wowza! It doesn't take too much effort to code up a bootloader that weighs in at less than 512 bytes.

Even the stock AVR109 bootloader, written entirely in C, can be compiled down to only a K or so.

Quote:
Try to avoid using interrupts in the bootloader. This will save lots of flash. Usually a bootloader only bootloads, so polling instead of using ISRs may be an option.

I think the point was that GCC will always produce an interrupt vector table for every program, regardless of whether or not that program actually uses interrupts.

So when you write a bootloader in GCC, unless you play some tricks, the interrupt vector table will be taking up space no matter what.

The OP's bootloader, being larger than the 8 kB allowed, was shifted down in Flash to a location below the lowest address that the reset vector can point to -- and therefore, the entry point to the bootloader (ie the reset vector) needed to be located *in the middle of the hex file* instead of at the beginning.


lfmorrison wrote:
So when you write a bootloader in GCC, unless you play some tricks, the interrupt vector table will be taking up space no matter what.

Luke,

For the purposes of writing a "tight" C bootloader (Bob's thread in main forum) I was contemplating how to defeat vector table generation. Can you think of a better way than simply editing the linker script so it doesn't link in the "*(.vectors)" ?

Cliff


Not only does the vector table take a lot of space, the ISRs themselves do too: they have to store all the context on the stack, and afterwards restore it again.... So a single ISR easily smashes the 100-byte line..


coldtobi wrote:
Not only does the vector table take a lot of space, the ISRs themselves do too: they have to store all the context on the stack, and afterwards restore it again.... So a single ISR easily smashes the 100-byte line..

I understand. It's just that Big Boy hasn't specifically said yet that there are actually any ISRs in use right now - so far we only know that he wanted to create a facility in which interrupts hypothetically *could* be handled properly in a bootloader, even though that bootloader turned out to be bigger than the chip's maximum bootloader block.
____________________

Of course, to be pedantic, not every ISR needs to save and restore the whole context -- I can easily write an ISR in C which might only need to save and restore 1 or 2 registers.

The compiler will only save/restore the WHOLE context if you write a very complicated algorithm in the ISR which needs to make use of lots of registers, or if there's the potential to make any non-inline function calls during the execution of the ISR.
_______________

clawson wrote:
For the purposes of writing a "tight" C bootloader (Bob's thread in main forum) I was contemplating how to defeat vector table generation. Can you think of a better way than simply editing the linker script so it doesn't link in the "*(.vectors)" ?

I have never needed to give it much consideration because I've never gone anywhere near outgrowing my ATmega's bootloader block.

But I think editing the linker script would probably be the first approach I'd take.


Quote:
clawson wrote:
For the purposes of writing a "tight" C bootloader (Bob's thread in main forum) I was contemplating how to defeat vector table generation. Can you think of a better way than simply editing the linker script so it doesn't link in the "*(.vectors)" ?

I have never needed to give it much consideration because I've never gone anywhere near outgrowing my ATmega's bootloader block.

But I think editing the linker script would probably be the first approach I'd take.

I did it already some weeks ago; but fiddling around with linker scripts is nasty. ;-)
http://blog.coldtobi.de/1_coldto...


Actually, I'm currently working on a project that's heavier than the standard simple embedded application. I hooked up the mega128 to an FPGA evaluation board (via the external memory interface), and I'm using the FPGA as an integrated SoC (less the micro-controller). In the FPGA, I'm designing a video controller, audio controller, PS2 interface, SRAM interface (which gives access to 1MB of SRAM, shared between CPU (via paging), video and audio), external IO interface (to turn the evaluation board LEDs/7-seg display on/off, and read the switches), ...

The AVR simply accesses the SoC via memory-mapped IO.
0x0000~0x10FF - Internal to AVR
0x1100~0x11FF - FPGA Info/Reset
0x1200~0x12FF - SRAM Banks config
0x1300~0x13FF - External IO
0x1400~0x14FF - PS2 Keyboard & Mouse
0x1500~0x2FFF - Unused (yet, still have to add goodies, like audio, 802.11 MAC, ...)
0x3000~0x3FFF - Video controller
0x4000~0x7FFF - 4 banks of 4KB, can access anywhere in the eval board SRAM. Used for extending memory of the AVR, Video character page/character ROM/4 tile planes/tiles ROM, audio samples, ...
0x8000~0xFFFF - First 32KB of eval board SRAM. Used as fixed external memory for my memory hungry AVR applications :)
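
For reference, the map above can be written down as a lookup table. This is only a host-side sketch of the address decode, not code from the project: the struct, the function names and the shortened region labels for the two SRAM ranges are made up for illustration, while the address ranges are the ones listed above.

```c
#include <stddef.h>
#include <stdint.h>

struct region {
    uint16_t first;     /* first address of the region */
    uint16_t last;      /* last address of the region (inclusive) */
    const char *name;
};

static const struct region memory_map[] = {
    { 0x0000, 0x10FF, "Internal to AVR" },
    { 0x1100, 0x11FF, "FPGA Info/Reset" },
    { 0x1200, 0x12FF, "SRAM Banks config" },
    { 0x1300, 0x13FF, "External IO" },
    { 0x1400, 0x14FF, "PS2 Keyboard & Mouse" },
    { 0x1500, 0x2FFF, "Unused" },
    { 0x3000, 0x3FFF, "Video controller" },
    { 0x4000, 0x7FFF, "Banked SRAM window" },
    { 0x8000, 0xFFFF, "Fixed external SRAM" },
};

/* Name of the region that a data-space address falls in. */
static const char *region_name(uint16_t addr)
{
    size_t i;

    for (i = 0; i < sizeof memory_map / sizeof memory_map[0]; i++) {
        if (addr >= memory_map[i].first && addr <= memory_map[i].last)
            return memory_map[i].name;
    }
    return "unmapped";
}

/* Which of the four 4KB bank windows (0..3) an address falls in,
 * or -1 if it is outside 0x4000~0x7FFF. */
static int bank_index(uint16_t addr)
{
    if (addr < 0x4000 || addr > 0x7FFF)
        return -1;
    return (addr - 0x4000) >> 12;
}
```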

I've done all of the above except for the audio (give me time!! :)).

Now, I hooked-up some other things directly to the AVR, like RTC, SD card, serial flash, ISP and JTAG connectors, and 1 UART level converter (actually, I use a UART level converter that's built-in on the eval board, so I simply route 2 wires to the FPGA from the TXD/RXD).

Now, what I wish to do is write a stub on the AVR that displays a menu of the applications on the SD card, loads the selected one (the user selects the application in the menu, via keyboard), and jumps to the application.

So, the loader stub does the following on boot:

- Initialize external memory interface.
- Initialize IO, and one timer (interrupt). The timer is used to give a 10ms call to the SD module, and yes, I do need at least this interrupt in the bootloader.
- Initialize video, load character set ROM and menu. The character set ROM and menu screen buffer are pre-stored on the serial dataflash, which I access like a normal drive using Chan's FatFs module, so this uses only minimal AVR program space.
- If video initialization fails, due to corrupted serial flash for example (AVR can't load character set ROM or screen buffer), look on the SD card and try to load and run a failsafe application.
- Handle the menu, which involves
---Listing the applications that are on the SD root directory
---Handling keyboard up/down arrow and enter key
---Updating/scrolling menu.
- Load and flash selected application to application space.
- Restore the select few registers I used in the stub, disable timer, disable interrupt and switch back interrupt vector to application space (IVSEL<-0).
- Jump to 0x0000
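
The "IVSEL<-0" step in the list above could look roughly like this. It is a sketch and not my actual code: the function name is made up, and it assumes the ATmega128 layout where IVSEL and IVCE live in MCUCR (bits 1 and 0). The other MCUCR bits are preserved because the external memory enable bit sits in the same register. The non-AVR branch only provides stand-ins so the bit logic can be built and checked on a PC; it does not model the hardware rule that IVSEL must be written within four cycles of setting IVCE, so check the generated code on the real target.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#include <avr/interrupt.h>
#else
/* Host stand-ins (assumption: ATmega128 bit positions). */
static volatile uint8_t MCUCR;
#define IVCE  0
#define IVSEL 1
#define cli() ((void)0)
#endif

/* Point the interrupt vectors back at the application section (IVSEL = 0)
 * while leaving every other MCUCR bit as it was. */
static void vectors_to_application(void)
{
    uint8_t mcucr;

    cli();                                   /* no interrupts during the switch */
    mcucr = MCUCR;
    MCUCR = mcucr | (uint8_t)(1u << IVCE);   /* step 1: enable the change */
    MCUCR = mcucr & (uint8_t)~((1u << IVSEL) | (1u << IVCE)); /* step 2: IVSEL=0, IVCE=0 */
}
```

The jump to 0x0000 itself is left out here, since it only makes sense on the target.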

So, in my case, I really need that stub to be independent from the application. As you can see, this is more than a simple bootloader. The space is consumed as follows:

Optimization: Os

Interrupt vectors: 0x8c
Serial flash module: 0x2c8
SD/MMC module: 0x480
Disk IO module (which dispatches calls to serial flash or SD/MMC): 0x94
FatFs (which handles FAT access): 0xcc4
spi module: 0xe4
Flash programming module: 0x186
main: what's left :roll:
plus about 0x100 of overhead for libc functions.
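
Adding those figures up shows how little room is left in an 8KB boot section. A small host-side sketch of the arithmetic, taking the sizes above as bytes (the 0x8c for the vectors is 35 vectors of 4 bytes) and treating the libc overhead as the approximate 0x100 quoted; the function names are made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Module sizes in bytes, copied from the list above. */
static const uint16_t module_sizes[] = {
    0x8c,   /* interrupt vectors */
    0x2c8,  /* serial flash module */
    0x480,  /* SD/MMC module */
    0x94,   /* disk IO module */
    0xcc4,  /* FatFs */
    0xe4,   /* spi module */
    0x186,  /* flash programming module */
    0x100   /* libc overhead (approximate) */
};

/* Sum of all listed modules, in bytes. */
static uint32_t total_module_bytes(void)
{
    uint32_t total = 0;
    size_t i;

    for (i = 0; i < sizeof module_sizes / sizeof module_sizes[0]; i++)
        total += module_sizes[i];
    return total;
}

/* Bytes left for main() in a region of the given size, 0 if already over. */
static uint32_t bytes_left_for_main(uint32_t region_bytes)
{
    uint32_t used = total_module_bytes();

    return used < region_bytes ? region_bytes - used : 0;
}
```

The modules add up to 0x1896 (6294) bytes, which leaves 1898 bytes for main in an 8KB boot section, against 10090 bytes if 16KB is used.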

As you can see, most of the space is taken by the drive access modules, and this is really the minimum, as I turned on the switches that use the least space possible (read-only, minimal function set, ...).

Now, with the method I posted above, I can use much more space :wink:.

Big Boy


It's actually pretty simple to generate the bootloader without interrupt vectors.

First, create a normal application. You'll need a simple main() so the compiler/linker doesn't complain.

Second, make sure that all of the routines for your bootloader are in a section of their own. For example:

void DownloadFirmware( void ) __attribute__ ((section (".boot")));

Third, make sure your bootloader entry point is "naked":

void Monitor(void) __attribute__ ((naked)) __attribute__ ((section (".boot")));

Fourth, make sure you initialize the universe in your entry point. The code below is for my ATmega2560:

void Monitor(void)
{
   long address;
   unsigned char val;
   uint8_t sreg;
   uint8_t badCRC = 0;

   // Make the universe sane
   asm volatile( "clr r1         \n\t"
                 "ldi r29, 0x02  \n\t"
                 "ldi r28, 0x00  \n\t"
               );

   // Disable interrupts.
   sreg = SREG;
   cli();

   SP = RAMEND;

   /* Force all pins to be inputs at 0.*/
   DDRA = 0;
   PORTA = 0;

   DDRB = 0;
   PORTB = 0;

   . . .

Fifth, add the appropriate calls to the compiler/linker to put your code at the proper place (the following is a mod to an MFile-generated makefile):

#---------------- Linker Options ----------------
. . .
LDFLAGS += -Wl,--section-start=.boot=0x3E000
. . .

Finally, once the code is generated and you have a .hex file, use the srec routines to snip out your boot. You can later use the same srec routines to put your boot with the real app, if you so desire:

%.hex: %.elf
	@echo
	@echo $(MSG_FLASH) $@
	mv $@ $(TARGET).org.hex
	srec_cat $(TARGET).org.hex -Intel -crop 0x03e000 0x040000 -o $(TARGET).boot.hex -Intel
. . .

There ya go! No interrupt vectors! Make sure to turn off interrupts as soon as possible in the boot entry point. Also, don't use avr-libc routines in your code - they won't be available, for the most part (macros defined in avr-libc of course will work).

The above is intended as a guide, not finished code. But it is possible and demonstrably works.

Stu



Lost again - why does the bootloader have to crank up a video system? Surely the job of a bootloader is:

1) see if new code is being sent and if so receive and program it, else
2) start the app.

Normally it can get to (2) within 1 second, say. So what entertaining picture is it that you need to show on the video display for that 1 second? (Else why can't the app read the SD, init the video and show the picture?)

On the occasion of (1) and some code being received (which presumably the user knows about because he instigated the bootloading operation) is it really that bad that for the seconds it takes for the device to receive and program the code that the user is without video?

The whole thing about bootloaders is that you want them to contain the MINIMUM amount of code possible because, while you may well update the app from time to time to fix bugs and add features, the boot code is set in stone on day one, so you want it small so that every part of it can be thoroughly tested. The last thing you want to do is put app functionality into the bootloader that does not need to be there!

I am really confused!


@Stu,

an interesting approach - "chopping" the generated code (I already have BOOTLOADER_SECTION code in my app so I guess I could use that as a template).

@coldtobi,
Ah ha! Thanks for the link, it was the -nostartfiles "magic" that I was looking for (turns out it's not as simple as just editing the .vectors out of the avr5.x!)

Cliff

PS Once I looked up -nostartfiles in the manual I then found -nostdlib which is even closer to what I was looking for as it omits the copy_data/clear_bss stuff too. In fact I just got this:

AVR Memory Usage
----------------
Device: atmega16

Program:       6 bytes (0.0% Full)
(.text + .data + .bootloader)

Data:          4 bytes (0.4% Full)
(.data + .bss + .noinit)

:lol: :lol: :lol:


clawson wrote:
Lost again - why does the bootloader have to crank up a video system? Surely the job of a bootloader is:

1) see if new code is being sent and if so receive and program it, else
2) start the app.

...

Actually, I intend to make a small computer/game platform. Remember the old days of the Commodore 64?! Now, every time I turn the power on (or hard reset), I want the user to be able to select which application he wants to load. The applications are on the SD card. Of course, I could restrict it to loading only one application per SD card (the first one it finds, for example), but personally, I do think that wasting 99.99% of that 1GB SD card as free space (having a single application on it) is more of a concern than spending 1 or 2 extra KB of the ATmega on a proper loader that gives the user a way to put 50, 100, 200... applications on a single SD, plus the comfort of having them on screen to select...

Remember, it's always harder/heavier to code when you have to think about simplicity of use (especially if your project is aimed at a typical user, who just wishes to put da card in, throw ze switch, and play da game! :lol:).

So, basically, what I have in mind is just a sophisticated, polished, loader for the casual user... :wink:

Of course, I thought about making a smaller bootloader that loads the menu selection application from the serial flash, which in turn selects and loads the SD card application, but here, I have 2 ways of doing it:
- Either load the menu stub application at 0x0000, and load the user application over it, every time the power is applied, but this will wear down the flash much quicker.
- Or load the stub at, let's say, 0xD000~0xEFFF, if it's not already there, and then load the user-selected application in the rest of the program memory, but this has the problem that the stub will duplicate all the disk access libraries (even though they may be compiled as a minimal version). This will take more of the precious 128KB of flash.

The easiest way (if we can call it that) is to simply make one bootloader application, with all the things required to load the user application, in the upper memory space, a little like the C=64 kernel (but which isn't really a kernel since it only loads applications), with the 'only' problem being the interrupt vectors.

Big Boy


Have you considered the 10,000 cycle write life of the AVR code flash? The AVRs, being Harvard, aren't really a processor for "dynamic code loading". But if you want to do dynamic program loading you probably should look at a byte code interpreter like p-code or JVM and having the programs on the SD written in byte-code, in which case the AVR application in the code flash would never change.

But I'm intrigued, why did you design a complex system like this around a completely inappropriate processor like an AVR? Why not an ARM? With that you'd have sufficient MIPS to do both 6502 and Z80 realtime emulation and you'd be able to run the actual games from not only the C64 but most of the other early 80's home computers.


clawson wrote:
Have you considered the 10,000 cycle write life of the AVR code flash?

This actually is not for mass market :) It's more of a home project (that I could classify in the 'serious' ones), to be used eventually, I don't know, maybe as a demo to get a job! :) At least to play for a while with AVR!

As for flash wear, I did think of that: since most of the time I load the same application 5~10 times in a row, the AVR does not re-program a sector that is unchanged from the data to be programmed.
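
The idea is simply to compare before erasing. A minimal host-side sketch, assuming the 256-byte (128-word) flash page of the ATmega128 and an image length that is a whole number of pages. The function names are made up for illustration, and on the real chip the "current" contents would have to be read back from program memory rather than from a plain RAM pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE_BYTES 256u   /* ATmega128 flash page: 128 words */

/* Nonzero if the page differs and must be erased and rewritten. */
static int page_needs_write(const uint8_t *current, const uint8_t *wanted)
{
    return memcmp(current, wanted, PAGE_SIZE_BYTES) != 0;
}

/* Count the pages of `image` that differ from what is already in `flash`.
 * Only whole pages are considered. */
static unsigned pages_to_write(const uint8_t *flash, const uint8_t *image,
                               size_t image_len)
{
    unsigned count = 0;
    size_t off;

    for (off = 0; off + PAGE_SIZE_BYTES <= image_len; off += PAGE_SIZE_BYTES) {
        if (page_needs_write(flash + off, image + off))
            count++;
    }
    return count;
}
```

Loading the same application twice in a row then costs zero erase cycles, and a changed application only wears the pages that actually differ.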

Quote:
But I'm intrigued, why did you design a complex system like this around a completely inappropriate processor like an AVR?

Of course, I could have chosen another processor, but I do like AVR! Plus, it's easy to connect to my evaluation board, comes with built-in flash that is easy to program with an ISP cable, is easy to debug via JTAG, and mostly because I already have a lot of experience with AVR and had several AVRs at hand (including an ATmega128 on a breadboard) :)

I did think for a while about other processors, but always came to a no because of different problems (need an extra PCB, or can't fit all of the memory signals onto the 30-some pins I have available to interface with the eval board, or require external flash which is not easy to program via ISP and which, again, requires another custom PCB, ...).

Big Boy!


I've been playing around with the need to eliminate the vectors from a standalone bootloader.

I looked at editing the linker script... but why? There's an avr-ld option for this.

Just create a file called discardvect.x, with this one line:

SECTIONS { /DISCARD/ : { *(.vectors) } }

Then include discardvect.x as if it were an object file when it's creating your .elf file... i.e., in your Makefile, add it to the OBJS macro or to the LDFLAGS macro, like this:

LDFLAGS = -Ttext=$(BASEADDR) $(EXTMEMOPTS) discardvect.x

This tells avr-ld to add the command in discardvect.x to whatever linker script it decides to use, and this script just tells it to discard the .vectors section. Done.

Works for me. Doesn't get rid of the rest of the startup code. I looked to see if any others could be removed, but save for 4 bytes of the __bad_interrupt ISR, the rest is setting up the stack, copying initialized data, and clearing uninitialized data to zero, all of which are necessary for ANSI C. What's left is 62 bytes of initialization code on an ATmega168, down from 162 when the interrupt vector table is included.