In a previous thread I asked about some peculiar behaviour I observed with dtostrf() in a project of mine that is very tight on FLASH space.
This same project has some tight timing constraints as well, and relies on a variety of interrupt service routines, most of which hook one or more functions, often in a cascading chain.
Specifically, when an interrupt fires:
1) The ISR runs and may do some work before...
2) ...a handler function is called via a pre-loaded function hook (a function pointer), which then re-assigns the same hook so that the next time the same interrupt fires...
3) ...a different handler will be called by the ISR...
4) Some condition is eventually met and the chain is terminated, usually by resetting the hook to some initial value and disabling the interrupt. (A rough sketch of this pattern in code follows below.)
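For illustration, the pattern boils down to something like this (the timer, vector, and handler names are made up for the example and assume an ATmega-style Timer1, not taken from my actual code):

#include <avr/io.h>
#include <avr/interrupt.h>

typedef void (*handler_t)(void);

static volatile handler_t timer_hook;       // pre-loaded function hook

static void step_two(void);

static void step_one(void)
{
    // ...first stage of work...
    timer_hook = step_two;                  // next firing of the same interrupt runs a different handler
}

static void step_two(void)
{
    // ...final stage; the terminating condition is met here...
    timer_hook = step_one;                  // reset the hook to its initial value
    TIMSK1 &= ~_BV(OCIE1A);                 // ...and disable the interrupt
}

static void start_chain(void)
{
    timer_hook = step_one;                  // initial hook value
    TIMSK1 |= _BV(OCIE1A);                  // enable the compare interrupt (timer itself assumed configured elsewhere)
}

ISR(TIMER1_COMPA_vect)
{
    timer_hook();                           // dispatch to whichever handler is currently hooked
}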
In many cases, multiple interrupt sources and ISRs can be involved, whereby a handler that was hooked by one interrupt's ISR might re-assign the hook of a different interrupt's ISR.
These 'handler chains' are a great way to tackle complex event-driven program flow problems using simple objects linked together via repeating interrupts. However, speed is always a concern when it comes to interrupt service routines. This is where the default behaviour of GCC is giving me grief.
Looking at the assembly generated by GCC, I have found that the compiler-generated prologues and epilogues are somewhat inefficient, both for the ISRs and for the handlers that get hooked. In many cases a dozen or more registers that are never used in the ISR are pushed and popped; at 2 cycles each for the push and 2 for the pop, that adds 48 cycles to a very tight cycle budget. It would appear that GCC has a standard list of registers to save/restore when generating prologues and epilogues. However, that doesn't explain all of it. Looking deeper, I noticed that not only were the registers used directly by an ISR being pushed by its prologue, but every register used by every handler that it could possibly hook was also being pushed by the ISR, even though many if not most of the invocations of that ISR would never result in the use of those registers.
In more detail, a stand-alone ISR that only changed SREG, r30, and r31 would actually push those AND r0 and r1. For an ISR that employs one or more hooks, and one or more handlers for each, I found that r18 through r27 were also being pushed by the ISR itself, and any remaining registers were pushed by the handlers that used them.
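If I'm reading it right, the indirect call is presumably what forces the ISR to treat the whole call-used register set as clobbered. A stripped-down illustration of both cases (names invented for the example; the register lists are what I believe current avr-gcc does, so correct me if I'm mis-remembering):

#include <avr/io.h>
#include <avr/interrupt.h>

typedef void (*handler_t)(void);
extern volatile handler_t timer_hook;       // the hook from the sketch above

static volatile uint8_t *flag_ptr;          // hypothetical pointer, just to give the ISR something to do

// Stand-alone case: the body only really needs Z (r30:r31), yet the generated
// prologue also saves SREG, r0 (the temp register used to stage SREG), and r1
// (the zero register, which every ISR has to re-clear).
ISR(PCINT0_vect)
{
    *flag_ptr = 0;
}

// Hooked case: because of the indirect call, the compiler presumably has to
// assume the callee clobbers every call-used register (r18-r27, r30-r31,
// r0/r1), so the ISR prologue pushes the whole set on every firing, whether
// or not the handler that actually runs touches any of them.
ISR(TIMER1_COMPA_vect)
{
    timer_hook();
}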
My desire is to control this behaviour, such that only the minimum subset of registers used directly by an object (ISR or handler) are pushed/pulled by that object. While the overall worst-case cost of pushing/pulling all of the required registers will not decrease, I would see huge benefits for some of the handlers.
Early in my project I addressed this by using ISR_NAKED and __attribute__ ((__naked__)), compiling, looking at the assembled output to determine what registers were being used, and hand-crafting prologues and epilogues with __asm__ __volatile__ ("") for each ISR and handler. As my project has grown, this approach has rapidly gotten out of hand. In truth, I know it's a bad idea for a lot of reasons (portability, maintainability, survivability in the face of new compiler versions, etc.), but I have not known of any way to exert fine control over the compiler in this regard.
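For reference, a stripped-down version of what one of those hand-rolled objects ends up looking like (the body and register list here are invented for the example; in the real project the list comes from eyeballing the generated assembly and has to be re-checked whenever the body or the compiler changes):

#include <avr/io.h>
#include <avr/interrupt.h>

static volatile uint8_t tick;

ISR(TIMER0_OVF_vect, ISR_NAKED)
{
    __asm__ __volatile__ (
        "push r24            \n\t"          // the only register the body below happens to use
        "in   r24, __SREG__  \n\t"
        "push r24            \n\t"
        ::: "memory");

    tick++;                                 // expected to compile to lds/subi/sts through r24 -- must be verified by hand

    __asm__ __volatile__ (
        "pop  r24            \n\t"
        "out  __SREG__, r24  \n\t"
        "pop  r24            \n\t"
        "reti                \n\t"
        ::: "memory");
}

Multiply that by every ISR and every handler in the project and it's obvious why this doesn't scale.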
If there's anyone who can make a suggestion, perhaps a compiler directive, a pragma, a command-line option, even a way to patch the source of GCC, or just a place to start looking in the source code of whatever part of the toolchain is responsible... I would appreciate it. Apart from, of course, "Why don't you just write it all in assembler and be done with it?" This project is already over 2000 lines of C/C++ code, excluding whitespace and comments, and I shudder at the thought of doing it entirely in assembler.
Again, many thanks.