Compiler-generated prologues and epilogues


Greetings listers,

In a previous thread I asked about some peculiar behaviour I observed with dtostrf() w.r.t. a project of mine that is very tight on FLASH space.

This same project has some tight timing constraints as well, and relies on a variety of interrupt service routines, most of which hook one or more functions, often in a cascading chain.

Specifically, when an interrupt fires:

1) The ISR runs and may do some stuff before...
2) ...a handler function is called via pre-loaded function hook (function
pointer), which then re-assigns the same hook so that the next time the
same interrupt fires...
3) ...a different handler will be called by the ISR...
4) ...etc...
...
5) Some condition is met and the chain is terminated, usually by resetting
the hook to some initial value, and disabling the interrupt

In many cases, multiple interrupt sources and ISRs can be involved, whereby a handler that was hooked by one interrupt's ISR might re-assign the hook of a different interrupt's ISR.
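
To illustrate, here's a minimal sketch of the pattern (hypothetical names, a single interrupt source, and the interrupt configuration omitted; this is not the actual project code):

#include <avr/io.h>
#include <avr/interrupt.h>

typedef void (*handler_t)(void);

static void first_handler(void);
static void second_handler(void);
static void idle_handler(void);

// Pre-loaded hook; re-assigned by the handlers themselves.
static volatile handler_t int0_hook = first_handler;

static void idle_handler(void)
{
    // Chain terminated; nothing to do until it is re-armed.
}

static void first_handler(void)
{
    // ...time-critical work for this stage...
    int0_hook = second_handler;   // next INT0 runs a different handler
}

static void second_handler(void)
{
    // ...work for the final stage...
    int0_hook = idle_handler;     // reset the hook to its initial value...
    EIMSK &= ~(1 << INT0);        // ...and disable the interrupt
}

ISR(INT0_vect)
{
    int0_hook();                  // dispatch via the pre-loaded hook
}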

These 'handler chains' are a great way to tackle complex event-driven program flow problems using simple objects linked together via repeating interrupts. However, speed is always a concern when it comes to interrupt service routines. This is where the default behaviour of GCC is giving me grief.

Looking at the assembly generated by GCC, I have found that the compiler-generated prologues and epilogues are somewhat inefficient, both those for the ISRs, and for the handlers that get hooked. In many cases a dozen or more registers that are never used in the ISR are pushed and popped. That adds 48 cycles to a very tight cycle budget. It would appear that GCC has a standard list of registers to save/restore when generating prologues and epilogues. However, that doesn't explain all of it. Looking deeper, I noticed that not only were the registers used directly by an ISR being pushed by its prologue, but every register used by every handler that it could possibly hook was also being pushed by the ISR, even though many if not most of the invocations of that ISR would never result in the use of those registers.

In more detail, a stand-alone ISR that only changed SREG, r30, and r31 would actually push those AND r0 and r1. For an ISR that employs one or more hooks, and one or more handlers for each, I found that r18 through r27 were also being pushed by the ISR itself, and any remaining registers were pushed by the handlers that used them.

My desire is to control this behaviour, such that only the minimum subset of registers used directly by an object (ISR or handler) are pushed/pulled by that object. While the overall worst-case cost of pushing/pulling all of the required registers will not decrease, I would see huge benefits for some of the handlers.

Early in my project I addressed this by using ISR_NAKED and __attribute__ ((__naked__)), compiling, looking at the assembled output to determine what registers were being used, and hand-crafting prologues and epilogues with __asm__ __volatile__ ("") for each ISR and handler. As my project has grown, this approach has rapidly gotten out of hand. In truth, I know it's a bad idea for a lot of reasons (portability, maintainability, survivability in the face of new compiler versions, etc.), but I have not known of any way to exert fine control over the compiler in this regard.
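
For example, a stripped-down sketch of that approach (hypothetical vector and body; the register list has to be re-verified against the generated assembly after every code or compiler change):

#include <avr/interrupt.h>

ISR(TIMER1_CAPT_vect, ISR_NAKED)
{
    // Hand-crafted prologue: save only what inspection of the assembly
    // showed this ISR actually touches (SREG, r30, r31).
    __asm__ __volatile__ (
        "push r30        \n\t"
        "in   r30, 0x3f  \n\t"   // 0x3f is SREG
        "push r30        \n\t"
        "push r31        \n\t"
    );

    // ...body known (by inspection) to use only r30/r31...

    // Hand-crafted epilogue, mirroring the prologue.
    __asm__ __volatile__ (
        "pop  r31        \n\t"
        "pop  r30        \n\t"
        "out  0x3f, r30  \n\t"
        "pop  r30        \n\t"
        "reti            \n\t"
    );
}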

If there's anyone who can make a suggestion, perhaps a compiler directive, a pragma, a command-line option, even a way to patch the source of GCC, or just a place to start looking in the source code of whatever part of the toolchain is responsible... I would appreciate it. Apart from, of course, "Why don't you just write it all in assembler and be done with it?" ... This project is already over 2000 lines of C/C++ code, excluding whitespace and comments. I shudder at the thought of doing it entirely in assembler.

Again, many thanks.

Cheers,
jj

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]


Quote:
Looking at the assembly generated by GCC, I have found that the compiler-generated prologues and epilogues are somewhat inefficient
Because you have set up a situation where the compiler can not know the context of the function call within the ISR. It must therefore save and restore any register that might be clobbered by the function call.
Quote:
If there's anyone who can make a suggestion
Don't make function calls within an ISR.

Regards,
Steve A.

The Board helps those that help themselves.


Koshchi wrote:
Because you have set up a situation where the compiler can not know the context of the function call within the ISR. It must therefore save and restore any register that might be clobbered by the function call.

Koshchi,

Actually it can, and is demonstrating that it does. It knows exactly what registers are used by each and every function. If the ISR that calls a function has already pushed a register which that function uses, the function doesn't do it again. If, however, a function uses a register that was not pushed by its calling ISR, it will push it itself.

The biggest issue for me is that the compiler appears to be aware of all of the registers used by all of the handlers that are ever called via the function hook. It then pushes most (sometimes all) of those registers right away at the top of the ISR, even if many of the handlers will never use the majority of them. This saves the need to push them later in each handler, but comes at the cost of added latency in running the actual code that services the interrupt. Some bits of code within the handler chain are less time-critical than others. I've placed the most critical in the ISR itself, where possible. But since the compiler is choosing to push ALL registers used in all the functions called by the ISR at the TOP of the ISR, that time-critical code is being delayed.

This is why I would like to tell the compiler to tighten up its prologue/epilogue generation to the absolute minimum at each step.

Quote:
Don't make function calls within an ISR.

Not helpful. And not possible, at least for this project. The time-critical, multiple-external-event-driven nature of the application demands it. I have no doubt that in theory it could be avoided, but it would be exceedingly difficult to implement and maintain, and would likely result in much more code both in the source and the final binary. I'm already struggling to squeeze a few dozen bytes out of 32K.

In any case, there's nothing wrong with function calls within an ISR. The TVout library for Arduino does it. In fact there's a facility within the Arduino IDE for attaching a function to an external interrupt. It's called attachInterrupt() and it does exactly what I'm doing.
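
For comparison, here is the Arduino 1.0 idiom I'm referring to, in sketch form (hypothetical handler; attachInterrupt() stores the function pointer and calls it from the INT0 ISR):

void onEdge();   // hypothetical user handler

void setup()
{
    attachInterrupt(0, onEdge, RISING);   // hook onEdge to external interrupt 0 (INT0)
}

void loop()
{
}

void onEdge()
{
    // ...runs in interrupt context, reached through the stored pointer...
}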

Cheers,
jj

joeymorin wrote:
Actually it can, and is demonstrating that it does. It knows exactly what registers are used by each and every function. If the ISR that calls a function has already pushed a register which that function uses, the function doesn't do it again. If, however, a function uses a register that was not pushed by its calling ISR, it will push it itself.
No. Those are simply two different sets of registers. One is called "call-clobbered" or "call-used", the other "call-saved". A function can use any call-clobbered register without preserving it. Because the ISR can't know which registers of this set are actually used by the function, all of them need to be pushed/popped. On the other hand, the ISR doesn't need to worry about the call-saved registers, because whenever a function uses one of them, the function itself has to preserve it.

http://www.nongnu.org/avr-libc/user-manual/FAQ.html#faq_reg_usage
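
To illustrate with a minimal (hypothetical) example: on avr-gcc, r18-r27 and r30/r31 (plus the temp/zero registers r0/r1) are call-used, while r2-r17 and r28/r29 are call-saved.

void uses_call_used(void)
{
    // r18 is call-used: this function may clobber it without saving it,
    // so no push/pop is generated for it here.
    __asm__ __volatile__ ("" ::: "r18");
}

void uses_call_saved(void)
{
    // r2 is call-saved: the compiler wraps the body in push r2 / pop r2.
    __asm__ __volatile__ ("" ::: "r2");
}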

Stefan Ernst


sternst wrote:
No. Those are simply two different sets of registers. One is called "call-clobbered" or "call-used", the other "call-saved". A function can use any call-clobbered register without preserving it. Because the ISR can't know which registers of this set are actually used by the function, all of them need to be pushed/popped. On the other hand, the ISR doesn't need to worry about the call-saved registers, because whenever a function uses one of them, the function itself has to preserve it.

sternst,

So how can I change the rules that govern what registers are in the call-clobbered list? I would like to reduce that list to zero, or as close to zero as possible. Ideally I would like to limit the scope of this change, i.e. reduce the call-clobbered list only for ISRs and handlers, but leave the compiler defaults intact for the remaining code.

Cheers,
jj

joeymorin wrote:
So how can I change the rules that govern what registers are in the call-clobbered list? I would like to reduce that list to zero, or as close to zero as possible.
By patching the compiler. And don't forget to recompile the AVRlibc afterwards. Good luck!
(and be prepared to see a significant increase of code size in the non-ISR code because of lots more push/pop there)

joeymorin wrote:
Ideally I would like to limit the scope of this change, i.e. reduce the call-clobbered list only for ISRs and handlers, but leave the compiler defaults intact for the remaining code.
Two different and incompatible ABIs within one program? Almost impossible.

Stefan Ernst


This is a maintenance nightmare design in progress.

Start over and do it right, like it should have been done the first time.

Sid

Life... is a state of mind


sternst wrote:
By patching the compiler. And don't forget to recompile the AVRlibc afterwards. Good luck!
(and be prepared to see a significant increase of code size in the non-ISR code because of lots more push/pop there)

As I suspected. I realize it's an unusual situation. I was just hoping for at least a tunable build parameter I could tickle and recompile GCC et al. Perhaps there is one; I'll just have to dig deeper. I recall looking into this a couple of months ago and stumbling across some discussions regarding register pools, allocations, and reservations. It didn't directly apply, but the poster basically had the same question I did: how to modify the register allocation behaviour in a specific instance. He never got a helpful answer. Of course, I can't find it now anyway... However, this (although for an older version of GCC) looks promising:

http://sunsite.ualberta.ca/Docum...

Quote:
Two different and incompatible ABIs within one program? Almost impossible.

I think that's overstating the matter. It would not require a completely new ABI, and not even a change in the way registers are allocated. I have no need to change that. I just want to defer a register push until it is needed.

In any case, even if a completely different ABI were necessary, it shouldn't pose a problem since I only wish to change the behaviour of the compiler for the ISRs and the functions they call. Under no circumstances do those functions ever get called anywhere but while servicing an interrupt. The only way in which the two code bases talk to each other is through the use of volatiles, so registers must be loaded from RAM anyway. I don't see why I couldn't compile those functions and the ISRs separately from the rest of the code and then link them together.

Cheers,
jj

ChaunceyGardiner wrote:
This is a maintenance nightmare design in progress.

As I mentioned clearly in my first post, I have already identified the challenges with this approach. What I seek is an improved process, not a complete re-design.

Quote:
Start over and do it right, like it should have been done the first time.

[sigh] Been there, done that. This is the 3rd incarnation, and it is the right approach for the application. The challenges are many and great, and I have given this quite a bit of thought. They are, as always, and in no particular order:

A) Interrupt responsiveness
B) Code size
C) Maintainability

Interrupt responsiveness is non-negotiable. In many cases I must service an interrupt in 64 cycles or less. Code size is always a challenge in embedded systems. Arguably I'm trying to shoehorn too much into one application, but it is what it is.

The alternative approaches are:

1) Hook/handler (current approach)
2) Polling-based servicing of interrupt conditions
3) Monolithic ISRs
4) Rewrite of ISR and handlers in assembler
5) Some combination of 1), 2), 3), and 4)

While some parts of the application can best be served by polling, most cannot. At any given time there are as many as 6 different internal and external interrupt sources that need not only to be serviced but also co-ordinated. The application currently has 9 separate modules needing interrupt support, each of which requires the co-ordination of most or all of the interrupt sources, but in different configurations. It would be very difficult to write polling routines that could handle all of the possible combinations.

Writing a single massive ISR (for each of the 6 sources) that could service all 9 configurations would be difficult. Adding to the complexity of such an approach is the requirement in most configurations that most of the 6 ISRs need to direct and modify the flow of control through the other 5, throughout a changing runtime environment. It is possible that such a monolithic ISR could be made responsive for some of the configurations, but not all of them. Code size and maintainability would not benefit from such an approach either. As it is, over 600 lines of the 2000+ lines in the application are for the ISRs and handlers.

There is another related approach possible, whereby lightweight ISRs communicate the needed flow control changes to the main code, and that code performs the actions necessary to make those changes, but that approach cannot address the extremely tight timing considerations (64 cycles!). Furthermore, since the main code must operate with interrupts enabled, it has proven difficult to guarantee that those changes happen in a timely and atomic fashion.
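
For reference, this is roughly what that lightweight-ISR approach looks like (a minimal sketch with a hypothetical flag and work item; it fails here on timing, not on principle):

#include <avr/io.h>
#include <avr/interrupt.h>
#include <stdint.h>

static volatile uint8_t reconfig_request;   // set by the ISR, consumed by main

ISR(USART_RX_vect)
{
    // Only record that a reconfiguration is needed; no heavy work here.
    reconfig_request = 1;
}

int main(void)
{
    sei();
    for (;;)
    {
        uint8_t pending;
        uint8_t sreg = SREG;        // save interrupt state
        cli();                      // read-and-clear without races
        pending = reconfig_request;
        reconfig_request = 0;
        SREG = sreg;                // restore interrupt state

        if (pending)
        {
            // ...re-hook handlers, enable/disable interrupts, etc. ...
        }
    }
}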

Assembler is always an option. Indeed, I used to have quite a facility with it (30 years ago). The tradeoffs between assembler and a higher-level language are clarity, maintainability, ease of development, interoperability with other code, versus speed and code size. Most of the handlers are required to do some kind of math (some modules are for statistical analysis), and then store the result via volatiles for use by the main code. I don't relish the idea of re-inventing the wheel. However, I have already used assembler to tighten up some of the simpler ISR/handler configurations. Converting the rest of them to assembler would be a much larger undertaking, probably one that this application will never see.

No single approach meets all of the challenges, so I have opted for option 5). But I could still benefit from the ability to tune the compiler's behaviour for the code that is best suited to the hook/handler approach, which is most of it. If that means re-compiling the toolchain, then I lose on maintainability. In this case I'd say my original approach of hand-crafted prologues and epilogues, along with detailed inline documentation, is the winner. Yes, it makes me groan. You're allowed to groan too. But I must admit I was hoping for more than just "You suck. Start over."

If there are no suggestions (apart from a re-build of the toolchain, a rewrite in assembler, or a career change) as to how to coax a just-in-time push, on a register-by-register basis, out of the AVR GCC toolchain, perhaps someone can suggest a better way to determine register usage by function. As I am relatively new to the challenges of embedded AVR GCC, I have been using avr-objdump -S on the .elf, washing the output through some filters (some automated, some by hand) to determine which registers get modified or are dependencies (like r1 == 0), and then hand-crafting prologues and epilogues to achieve the just-in-time-push (Groan, please, if you must). I expect there is a much simpler, complete, and less error-prone method to determine register usage, but I simply haven't had to do this before (why would any sane person?) and so I don't know how.

Cheers,
jj

Is it possible to just store the events to be done in the ISRs and handle their actions in the main loop?

Or just use 5 ISRs in assembler that just set a flag and then redirect to one big ISR that tests which IRQ source was responsible for the event. But then you still need to avoid the indirect call.

For the indirect call there are just a finite number of possible targets. Thus, instead of an indirect call you can consider an if-else that dispatches the calls directly, so that they can be inlined.
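
Something like this (hypothetical names):

#include <avr/interrupt.h>
#include <stdint.h>

static volatile uint8_t int0_stage;

static inline void stage0 (void) { /* ... */ }
static inline void stage1 (void) { /* ... */ }
static inline void stage2 (void) { /* ... */ }

ISR (INT0_vect)
{
    // A small state variable plus direct calls: the compiler sees the
    // complete call tree and can inline each path.
    if (int0_stage == 0)
        stage0();
    else if (int0_stage == 1)
        stage1();
    else
        stage2();
}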

In one of my applications I use the following approach to make ISR pro/epilogue faster:

The application is a game. main computes new frames in the main loop and supplies them in a buffer. An ISR displays a frame. Some IRQs have a high computation load and would need more than 100% of the ticks until the next IRQ. Because the game is hard real time (electrons in a cathode ray tube won't wait) the ISR is an infinite loop like so:

ISR (foo)
{
    disable foo;
    while (1)
    {
         // ~5000 lines of ISR code
         if (ticks_till_next_foo > 40)
             // enough time for a pro + epilogue
             break;

         // time is short, save one IRQ frame and
         // jump to top of foo
    } 
    enable foo;
}

This design works better than others that I tried. It offends the "ISR as small as possible" rule and even has a while(1) in an ISR, i.e. the code runs in the ISR and only returns to main and computes the next bits of the 3-buffered frame memory if there is enough time left.

avrfreaks does not support Opera. Profile inactive.


SprinterSB wrote:
Is it possible to just store the events to be done in the ISRs and handle their actions in the main loop?

SprinterSB,

In some cases, for some modules, yes. And where possible I do exactly that. There are however a number of scenarios that have extremely tight timing requirements, and the only thing the main loop can do is monitor 2 flags: 'done' and 'timed-out'. The overhead of ISR<=>main communication would be too high.

Quote:
Or just use 5 ISRs in assembler that just set a flag and then redirect to one big ISR that tests which IRQ source was responsible for the event. But then you still need to avoid the indirect call.

I'm not sure I follow. I'm using 6 interrupts: INT0, INT1, OCIE1A, TOIE1, WDIE, and RXCIE0. As far as I know, it's not possible to have more than one ISR per interrupt. An interrupt occurs, the PC is loaded with the address of the appropriate JMP instruction in the interrupt vector table, which leads to the ISR code itself. There is no way to change this at run time. The conventional way to conditionally run code from within an ISR is to use a function pointer to hook the conditional code, or to use simple flow control statements to conditionally execute blocks of code within the ISR. In both cases, only one ISR is involved.

As mentioned above, the main loop can look at flags that indicate the state of completion/failure of the interrupt handler chain for the given run, but it cannot do more. Most of the rest of the flaggable conditions require timely (inline to the ISR) attention. While the ISR for, say, RXCIE0 might invoke 3 out of a dozen or more USART_RX_vect handlers in any given configuration, the fastest way to do that is with a hook. Using a switch statement or other flow control mechanism to select a code block from so many options eats too many cycles, and right at the top of the ISR where it can least afford to lose them. Once the condition that requires switching handlers is met, the hook is re-assigned, and subsequent invocations of the ISR can quickly get to the correct handler.

To put it in perspective, I'm using ICP1 to timestamp an event. An interrupt is triggered at the same time. The next event may be as little as 64 cycles away. The interrupt response time is 7-10 cycles, and if the prologue takes 36 cycles (as it does when pushing 18 registers), as many as 46 of the 64 available cycles have passed before I even have a chance to store the value of the capture sitting in ICR1. If the ISR is deferred due to another interrupt (like TIMER1_OVF_vect), it is possible that the next event will have come and gone, overwriting ICR1. The initial event that triggered the ISR is lost. In the case of my ISR for TIMER1_OVF_vect, it doesn't ever need to call a hook, it is a fully self-contained and very tight ISR, hand-optimized with assembler. It takes 28 cycles to run. Add the interrupt overhead of 7-10 cycles, and that's 38 cycles. If it fires 1 cycle before an external event, that adds 37 cycles of latency to the ISR that services the external event. 46+37 = 83 cycles. 83 is bigger than 64. 19 cycles too late. In fact, if TOIE1 fires anywhere within 1-20 cycles before an event, the event will be lost. I can improve this situation by reducing the size of the prologue. With just-in-time-push, I can reduce from 36 to 7 cycles, an improvement of 29 cycles, eliminating the possibility of missing an event. Currently I have to do this by hand, and maintainability is a serious problem.

Quote:
For the indirect call there are just a finite number of possible targets. Thus, instead of an indirect call you can consider an if-else that dispatches the calls directly, so that they can be inlined.

Again, too time consuming. Add to that the fact that the compiler will push EVERY register used in this monolithic ISR right at the start, and I have no hope of executing the really time-sensitive code in time to latch a captured event.

Just to give you an idea, one of the modules needs to capture, timestamp, and tabulate between 0.25 and 1.25 million events in the span of about 5 seconds. Some of those events are as little as 64 cycles (4 uS) apart, and the captures need to be cycle-accurate, with between 16- and 32-bit accuracy, depending on the event.

Quote:
In one of my applications I use the following approach to make ISR pro/epilogue faster:

The application is a game. main computes new frames in the main loop and supplies them in a buffer. An ISR displays a frame. Some IRQs have a high computation load and would need more than 100% of the ticks until the next IRQ. Because the game is hard real time (electrons in a cathode ray tube won't wait) the ISR is an infinite loop like so:


This is the same approach used by the excellent library TVout by Myles Metzer, and was indeed the inspiration for my use of handler chains. In TVout, the HSync is generated by a timer configured for PWM, and an associated interrupt triggers an ISR that hooks a handler to output the appropriate line. There are several handlers, each of which does what it does, and then decides whether it's time to switch handlers for the next invocation of the ISR. There's a line output handler, a blanking handler, and a vertical sync handler (which tickles the PWM config of the timer), each of which hands off control to the next by re-assigning the hook that called it. Furthermore, there are several versions of the line output handler depending on F_CPU and the chosen screen resolution. In addition, there are two user-definable handlers separately hooked by the ISR. These allow the user to interleave their own functions, one synchronized with the horizontal refresh frequency, and one with the vertical refresh frequency.

I was encouraged that this approach would work for my application by the fact that the horizontal refresh time is 63.55 uS. My needs are as tight as 64 uS. I note that the TVout library makes possible such things as Conway's Game of Life, Tetris, Pacman, and other feats of strength for a wee 8-bit AVR, all the while leaving as much as 39% CPU available for the main line code. In my case, the margins are more like 1-10%, but it is doable, and it is working. It is somewhat more complex with 6 interacting ISRs and a couple of dozen handlers in total, but it's a fantastically flexible way to meet the needs of a highly complex 'hard real time' environment with a tiny piece of silicon.

Quote:
This design works better than others that I tried. It offends the "ISR as small as possible" rule and even has a while(1) in an ISR, i.e. the code runs in the ISR and only returns to main and computes the next bits of the 3-buffered frame memory if there is enough time left.

It's the opposite for me. The interrupt/ISR/handler chains gather, tabulate, and generate data, while the main code performs additional offline and interleaved analysis and reporting, but the gist is the same. The user initiates a function, the main code sets into motion a handler chain, the chain terminates (normally or by timing out), and then mainline code reports back to the user. For some modules, this is a one-shot operation, i.e. gather, analyse, report, wait for next user input. For others, it's an ongoing process, anywhere between 10 - 900 iterations per second.

For the record, the application is working already. Quite well, in fact. What I'm looking to do is fine-tune it with respect to the performance, functionality, and maintainability. The improvement in performance I'm looking for is for extreme edge cases where just a few extra cycles can make the difference. There will be no returning to the drawing board.

Cheers,
jj

Maybe a tad faster processor would help :)


jayjay1974 wrote:
Maybe a tad faster processor would help :)

And if my grandmother had wheels she'd be a wagon.

I wish I drove a T-Zero. But I don't. I drive a Ford Focus.

The hardware isn't going to change. It doesn't need to. As I've said, the application is working quite well. I am at the stage of tweaking. I would rather tweak with a compiler directive than by circumventing the compiler with __naked__ and hand-assembled code, but tweaking is all that's left.

Cheers,
jj

Rather than trying tricky compiler tweaks, maybe you should post a profile of flash memory usage, and the compile flags + compiler version you're using.

There may be a few simple places for you to save memory in non-time critical routines. But it's hard to know for sure without more information about your specific application.

- S


Unless you have made a typo, you say that your interrupts can occur as often as every 64us. And that you have 64 cycles to service them.

It seems rather foolish to constrain yourself to 1MHz. A regular Mega can be clocked at 20MHz. An Xmega at 32MHz. There are other families that can be clocked far faster, and / or have faster interrupt response.

If you really do have ISR()s that must take < 64 cycles, they will be fairly simple, e.g. incrementing counters, setting flags, filling a ring buffer.
So you will in fact know which functions are called and can inline them.

Note that you can easily service a second interrupt. Simply test the flag before you exit the ISR(CAPT_vect).

You quite often have a worst case where two edges occur in quick succession, followed by a longer interval. Three buses arriving at once is less likely in real life.
So you might have an INT0_vect that occurs very frequently and a USART_RXC_vect that can wait for a short while.

Note that polling for other interrupt flags inside an ISR() is very efficient (typically 3 cycles). Exiting the ISR() and starting a new ISR() adds a whole new epilogue and prologue.
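
For example (hypothetical bodies, ATmega328P register names):

#include <avr/io.h>
#include <avr/interrupt.h>
#include <stdint.h>

static volatile uint16_t last_capture;

ISR(TIMER1_CAPT_vect)
{
    last_capture = ICR1;            // latch the time-critical value first

    // Poll a second interrupt's flag and service it in place (a few cycles
    // to test), instead of paying for another epilogue and prologue.
    if (TIFR1 & (1 << OCF1A))
    {
        TIFR1 = (1 << OCF1A);       // writing a 1 clears the flag
        // ...do the work TIMER1_COMPA_vect would otherwise do...
    }
}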

David.


I see no statement of the MCU clock frequency... but events can come in at a rate of 64 cycles, not microseconds.

If you want the ultimate in speed for an ISR, code that particular handler in assembler and locate it right in the vector table. If adjacent vectors are unused you can use the free space. Saves a few cycles.

And I see no problem in handcrafted assembler code. Situations like this are among the few where it's still useful.

I wonder what the application exactly is.


david.prentice wrote:
Unless you have made a typo...

David,

I did indeed make a typo, my apologies. My previous post should have read:

"...the horizontal refresh time is 63.55 uS. My needs are as tight as 64 cycles, or 4 uS."

Quote:
It seems rather foolish to constrain yourself to 1MHz.

Typo. My fault. I am developing on a 16 MHz ATmega328P. Currently, the test bed is a plain old Arduino Uno Rev 3, but I will likely be deploying on an Arduino Pro Mini 5V/16MHz. I'm aware that there are other chips and other families. The choice was made partly because of availability, and partly because it is a nearly complete solution, neatly packaged and affordable, and because my final production size will likely be measured in the single digits.

Quote:
If you really do have ISR()s that must take < 64 cycles, they will be fairly simple, e.g. incrementing counters, setting flags, filling a ring buffer. So you will in fact know which functions are called and can inline them.

Some of them are, and I do exactly that. Some of them aren't.

Quote:
Note that you can easily service a second interrupt. Simply test the flag before you exit the ISR(CAPT_vect). You quite often have a worst case where two edges occur in quick succession, followed by a longer interval. Three buses arriving at once is less likely in real life. So you might have an INT0_vect that occurs very frequently and a USART_RXC_vect that can wait for a short while.

The bigger issue is losing a captured event to an overwrite because the ISR was deferred by a competing interrupt. I cannot test for that. Checking ICF1 simply tells me that one or more captures have occurred since the flag was cleared.

INT0 and INT1 always alternate with each other, but each is coincident with a capture on ICP1, sometimes as fast as every 64 cycles. RXCIE0 happens much less frequently, no more than every 704 cycles. However, the nature of the application has them firing all within 128 cycles of each other, at least in the edge cases, which is what I'm trying to address. With three prologues and two epilogues before the third interrupt actually gets down to business, that's a lot of cycles pushing/pulling registers that never get used. Throw in an unlucky TIMER1_OVF_vect in the mix, and that all but guarantees that I drop an event.

Quote:
Note that polling for other interrupt flags inside an ISR() is very efficient (typically 3 cycles). Exiting the ISR() and starting a new ISR() adds a whole new epilogue and prologue.

I have used this approach before in other applications. As I mentioned in a previous post, I use polling where it is appropriate in this application. For these edge cases, the difficulty is the interaction between 2, 3, or 4 competing interrupts. While I can check for INTF1 from within INT0_vect and then do the work of a different ISR, using this approach for the ISRs of several competing and interacting interrupts will rapidly increase ISR code size and overall code complexity. That's the opposite of what I am trying to do.

Cheers,
jj

jayjay1974 wrote:
I see no statement of the MCU clock frequency... but events can come in at a rate of 64 cycles, not microseconds.

Yup, my bad. Typo. See previous post.

Quote:
If you want the ultimate in speed for an ISR, code that particular handler in assembler and locate it right in the vector table. If adjacent vectors are unused you can use the free space. Saves a few cycles.

Interesting idea! I may be able to use it some day. For this case, I am using vectors 1 & 2 (INT0 & INT1), 7 (WDT), 12 & 14 (TIMER1_COMPA & TIMER1_OVF), and 19 (USART_RX). Even the tightest of my ISRs, hand assembled, is 10 words. Too big to shoehorn into the vector table. The USART_RX_vect ISR is of necessity rather larger.

Quote:
And I see no problem in handcrafted assembler code. Situations like this are among the few where it's still useful.

I agree. It has been some time since I've written any sizeable chunk of code in assembler. I have trimmed a few ISRs in this application with assembler. However, those are simple, stand-alone service routines that have little or no C in them.

The problem, as identified in my original post, and then again by a number of helpful respondents, is the sanity of an approach that relies on turning off the compiler-generated prologue/epilogue mechanism, examining the assembled output, and then jamming in a custom prologue/epilogue. In an optimized-compiler environment, the smallest change to the code might result in wildly different register usage in one or more of these teetering fusions of optimized C and __asm__, requiring a complete rewrite of the custom prologue/epilogues for each. So too with a version change in the compiler or other part of the toolchain. I have tried to use assembler macros to build in a bit of modularity to these custom pro/epilogues, and that has made things easier. However, my holy grail is still a 'just-in-time-push' behaviour from the compiler itself. No custom call-clobbered list, no new and incompatible ABI. Just smarter use of the stack. Then I can do away with the insanity altogether.

Quote:
I wonder what the application exactly is.

It's a diagnostic and troubleshooting aid, to provide live monitoring and statistical analysis of DMX. DMX is a lighting control standard in the entertainment and architectural industries.

Cheers,
jj

mnehpets wrote:
Rather than trying tricky compiler tweaks, maybe you should post a profile of flash memory usage, and the compile flags + compiler version you're using.

Currently building with the Arduino 1.0 IDE under Ubuntu 10.04 LTS with stock avr-gcc/avr-g++ 4.3.4 and avr-libc 1.6.7. Default hardwired compiler options include -c -g -Os -fno-exceptions -ffunction-sections -fdata-sections. The Arduino build process is well documented (start with http://arduino.cc/en/Hacking/Bui...) and generally geared towards code size.

At the moment I'm building in the Arduino IDE with the thought that I might eventually release the source, allowing anyone to download, build, and deploy their own widget using free tools and inexpensive, readily available hardware, with a minimum of required technical expertise. That's the target audience. I'll admit, custom pro/epilogues and __naked__ are a bit of a hurdle...

Quote:
There may be a few simple places for you to save memory in non-time critical routines. But it's hard to know for sure without more information about your specific application.

A fair chunk of FLASH is consumed by string literals, about 2.5K at the moment. I have reduced this a great deal already (used to be closer to 5) by using more concise messages, and re-using as many strings as I could. The next step is to develop my own string storage management system, perhaps with the use of tokens for often-used words and formatting string fragments. I'm also considering using EEPROM to store some of the string literals. However, the code for that would have to be substantially smaller than the savings, and of course readability and maintainability would suffer. Furthermore, the SRAM overhead involved in assembling strings from FLASH and EEPROM might become a problem. I am tight on SRAM already. Out of the available 2048 bytes, 1536 are statically defined buffers, another 198 bytes in global and global volatiles, and several more bytes scattered here and there. Total reported usage is currently 1764 bytes, leaving less than 300 bytes for stack. I'm already re-using the buffer space AND almost all the global and global/volatiles across all the modules, and have got it down to about as small an SRAM footprint as I can. I might be able to shave as many as 48 more bytes, but at the expense of more code costs in FLASH. These are the typical tradeoffs in embedded programming.

Some FLASH could be saved by migrating away from the Arduino environment altogether, but not much. I've already written my own USART routines and have avoided the use of any other Arduino specific library calls. Just about the only thing I could leave out of the binary is a bit of superfluous code for timer configuration and the 'void setup() / void loop()' construct. Less than 100 bytes. I could also do away with the bootloader (512 bytes) and use a serial programmer. Both approaches would take it out of the realm of the non-(or only-slightly-)technical target audience.

Most of the rest of the FLASH is consumed by the hefty floating point and string functions. Notably, the 64-bit integer routines (yes, I said 64-bit) are rather massive. Yes, I could write my own functions. As mentioned in a separate thread, I do not relish the thought of an adventure in re-inventing the wheel. Smells quite a bit more labour intensive than tweaking in assembler.

Cheers,
jj

If you want to reuse strings, you can use avr-gcc 4.7 which supports string merging with progmem strings, see PR43746.
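
For example (hypothetical strings; printf_P/PSTR are the usual avr-libc idiom for flash-resident literals):

#include <avr/pgmspace.h>
#include <stdio.h>

void report_ok (void)
{
    printf_P (PSTR ("DMX frame OK\r\n"));   // literal lives in flash
}

void report_again (void)
{
    printf_P (PSTR ("DMX frame OK\r\n"));   // identical literal: a merge candidate
}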

In case the Arduino (or any other) distribution comes with precompiled files (.o, .a), note that different avr-gcc versions might implement different ABIs. For example, the Atmel Tools use a different ABI, which might cause problems if an Arduino user uses a distribution that is not ABI-compliant with his tool chain.

avrfreaks does not support Opera. Profile inactive.


SprinterSB wrote:
If you want to reuse strings, you can use avr-gcc 4.7 which supports string merging with progmem strings, see PR43746.

Intriguing. I will definitely look into it. Could save a fair chunk of flash without losing on readability and maintainability. Thanks for the tip!

Quote:
In case the Arduino (or any other) distribution comes with precompiled files (.o, .a), note that different avr-gcc versions might implement different ABIs. For example, the Atmel Tools use a different ABI, which might cause problems if an Arduino user uses a distribution that is not ABI-compliant with his tool chain.

The Arduino IDE builds everything from source: core libraries, project libraries, and user code.

Cheers,
jj

From what I read, GCC's int64 is not that efficient.

You might like http://www.mshopf.de/proj/avr/uint64_ops.html depending on what operations you require.


joeymorin wrote:
However, my holy grail is still a 'just-in-time-push' behaviour from the compiler itself. No custom call-clobbered list, no new and incompatible ABI. Just smarter use of the stack.
From what you write it's not an ABI problem but just a missed optimization. Can you show a code snip that compiles?

Quote:
Then I can do away with the insanity altogether.
Yep. You are right with "insanity". Above you wrote "sanity" ;-)

In one application there is timer 1 overflow and input capture for time measurement. There is no upper bound for the edge which is caught by the overflow.

Because it's time critical, it uses an unoccupied IRQ vector (SPM) to save a bit of resources. Snip:

#if defined (__AVR_ATmega8__)
ISR (TIMER1_CAPT_vect)
#elif defined (__AVR_ATmega168__)
void __attribute__((naked))
TIMER1_CAPT_vect (void)
{
	sbi (GPIOR0, 0);
	rjmp (SPM_READY_vect);
}

ISR (SPM_READY_vect)
#endif
{
	hell.icr1 = ICR1;

	uint8_t state = HS_MESS_TIMEOUT;
	
	// if (TIFR & (1 << OCF1A))
#if defined (__AVR_ATmega8__)
	if (ACSR & (1 << ACIC))
#elif defined (__AVR_ATmega168__)
	if (GPIOR0 & (1 << 0))
#endif
		state = HS_MESS_OK;
	
	hell.state = state;
	
	SET (PORTC_AIN1); 
	MAKE_OUT (PORTC_AIN1); 
    
	TIMSK &= ~TIMSK_T1;
}


// Bit ACIS in ACSR records whether an overflow (OC1A) occurred,
// because OCF1A is cleared when the ISR is entered.
void __attribute__((naked))
TIMER1_COMPA_vect (void)
{
#if defined (__AVR_ATmega8__)
	cbi (ACSR, ACIC);
	rjmp (TIMER1_CAPT_vect);
#elif defined (__AVR_ATmega168__)
	cbi (GPIOR0, 0);
	rjmp (SPM_READY_vect);
#endif
}

As far as I understand your concern is not speed or code size but IRQ response time. To reduce it, you could re-enable IRQs in the RX handler or use attribute signal instead of interrupt.

jayjay1974 wrote:
From what I read, GCC's int64 is not that efficient.
That's no longer true these days. There is still room for improvement for the 64-bit arithmetic in 4.7+, but it's improved compared to older versions of the compiler.

Just try and compile some code snippets.

Quote:
You might like http://www.mshopf.de/proj/avr/uint64_ops.html depending on what operations you require.
This is obsolete with 4.7 up.

avrfreaks does not support Opera. Profile inactive.


Declare the handlers to be interrupt handlers.
That way they will save the registers they need to.
To use them from C:

#define invoke(handler)  asm ("  CALL " #handler " $ cli")

ISR(foo)
{
if(fred) invoke(fred_handler_vect);
if(hank) invoke(hank_handler_vect);
if(greg) invoke(greg_handler_vect);
}

From assembly:

  .global foo_vect
foo_vect:
  SBIC fred
  JMP fred_handler_vect
  SBIC hank
  JMP hank_handler_vect
  SBIC greg
  JMP greg_handler_vect
  RETI

Iluvatar is the better part of Valar.


jayjay1974 wrote:
From what I read, GCC's int64 is not that efficient.

You might like http://www.mshopf.de/proj/avr/uint64_ops.html depending on what operations you require.


jayjay1974,

This looked really promising.

There are 8 places in my code where I must multiply either two nearly full-scale uint32_t variables together to get a uint64_t result, or a uint64_t with a uint32_t. With AVR GCC and Libc, this means type promoting the operand(s) to uint64_t and then performing the multiply.
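
For reference, this is the kind of operation I mean (placeholder names, not the actual code):

#include <stdint.h>

// Promoting one operand forces the multiply to be carried out in 64 bits,
// which pulls in the 64-bit multiply support code.
uint64_t widen_mul(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

uint64_t scale(uint64_t acc, uint32_t factor)
{
    return acc * factor;   // the uint32_t operand is promoted to uint64_t
}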

I used the low-footprint 64-bit routines in place of those 8 operations, and recompiled. Total savings: 12 bytes. I tried again with #define USE_C 1 and the savings became a loss of 450 bytes.

I have done some simple performance benchmarking and found that the one loop that performs 512 64-bit multiply operations takes about 258,000 cycles with the normal routines, and 183,000 with the low-footprint routines. Not including loop overhead, that's ~506 cycles/multiply versus ~358 cycles/multiply, an excellent improvement.

However, the handlers that perform 64-bit math are not those that are suffering from the heavy pro/epilogue delays, and the loop in question is executed entirely offline at the end of a capture pass.

I may nevertheless include these low-footprint routines in my final build.

Many thanks.

Cheers,
jj

SprinterSB wrote:
From what you write it's not an ABI problem but just a missed optimization. Can you show a code snip that compiles?

Never said it was an ABI problem. Have always said it was an inefficiency in the compiler-generated prologue/epilogue.

After reviewing the reply by sternst (https://www.avrfreaks.net/index.php?name=PNphpBB2&file=viewtopic&p=977221&highlight=#977221), I whipped up some test code. Here is a relevant snip:

#include <avr/interrupt.h>

void empty_handler()
{
  __asm__ __volatile__ ("");
}

void (*int0_vect_hook)() = empty_handler;

ISR(INT0_vect)
{

  int0_vect_hook();
  
}

And the output from avr-objdump -S:

000000b8 <__vector_1>:
ISR(INT0_vect)
{
  b8:	1f 92       	push	r1
  ba:	0f 92       	push	r0
  bc:	0f b6       	in	r0, 0x3f	; 63
  be:	0f 92       	push	r0
  c0:	11 24       	eor	r1, r1
  c2:	2f 93       	push	r18
  c4:	3f 93       	push	r19
  c6:	4f 93       	push	r20
  c8:	5f 93       	push	r21
  ca:	6f 93       	push	r22
  cc:	7f 93       	push	r23
  ce:	8f 93       	push	r24
  d0:	9f 93       	push	r25
  d2:	af 93       	push	r26
  d4:	bf 93       	push	r27
  d6:	ef 93       	push	r30
  d8:	ff 93       	push	r31

  int0_vect_hook();
  da:	e0 91 00 01 	lds	r30, 0x0100
  de:	f0 91 01 01 	lds	r31, 0x0101
  e2:	09 95       	icall
  
}
  e4:	ff 91       	pop	r31
  e6:	ef 91       	pop	r30
  e8:	bf 91       	pop	r27
  ea:	af 91       	pop	r26
  ec:	9f 91       	pop	r25
  ee:	8f 91       	pop	r24
  f0:	7f 91       	pop	r23
  f2:	6f 91       	pop	r22
  f4:	5f 91       	pop	r21
  f6:	4f 91       	pop	r20
  f8:	3f 91       	pop	r19
  fa:	2f 91       	pop	r18
  fc:	0f 90       	pop	r0
  fe:	0f be       	out	0x3f, r0	; 63
 100:	0f 90       	pop	r0
 102:	1f 90       	pop	r1
 104:	18 95       	reti

It's clear, as sternst pointed out, that the compiler is generating a prologue that pushes all registers in the call-clobbered list, even if the handler that gets hooked doesn't use them. This is not the case if I call empty_handler() directly, without the function pointer:

000000b8 <__vector_1>:

ISR(INT0_vect)
{
  b8:	1f 92       	push	r1
  ba:	0f 92       	push	r0
  bc:	0f b6       	in	r0, 0x3f	; 63
  be:	0f 92       	push	r0
  c0:	11 24       	eor	r1, r1

  empty_handler();
  
}
  c2:	0f 90       	pop	r0
  c4:	0f be       	out	0x3f, r0	; 63
  c6:	0f 90       	pop	r0
  c8:	1f 90       	pop	r1
  ca:	18 95       	reti

I note that no matter what, __temp_reg__ (r0) and __zero_reg__ (r1) seem always to get pushed/pulled. I suppose that's not unreasonable, but it's not optimal either. The compiler should know when it is necessary, and skip it when it is not.

Quote:
Above you wrote "sanity" ;-)

Yep. I wrote 'sanity' because that is what many helpful respondents and I were calling into question. ;-)

Quote:
In one application there is timer 1 overflow and input capture for time measurement. There is no upper bound for the edge which is caught by the overflow.

Because it's time critical, it uses an unoccupied IRQ vector (SPM) to save a bit of resources. Snip:


It's an interesting construct, but I don't see how it saves you any cycles. Bytes, yes.

Quote:
As far as I understand your concern is not speed or code size but IRQ response time. To reduce it, you could re-enable IRQs in the RX handler or use attribute signal instead of interrupt.

Nested interrupts are even messier.

Using SIGNAL instead of interrupt would allow another interrupt to break in even during the prologue. More importantly, before the captured event gets latched. This could lead to a reversal of latching order. Then I need to code detection of this condition.

If I stick with ISR and call sei() at the top, then the latch AND any potential nested interrupt would still be stuck behind an unnecessarily long prologue.

In any case, it wouldn't solve my problem because each handler for each ISR must make runtime decisions about which interrupts to enable/disable, and which handlers to hook.

Remember that my application is working. I am trying to address edge cases where three events happen within 64 to 128 cycles. As edge cases, I could include code to detect them, but that adds to code size and response time. Cycles are short. It is best to handle the edge cases like any other case. Currently I support these edge cases by defeating the compiler pro/epilogues in a few ISRs with __naked__ and adding my own custom pro/epi, but this is only necessary for those edge cases.

Quote:
This is obsolete with 4.7 up.

Good to know. I'll likely try 4.7 anyway to benefit from string merging with PROGMEM string literals, as you pointed out in an earlier reply.

Cheers,
jj

I wonder if there are compilers that make the callee responsible for saving and restoring clobbered registers, instead of the caller. Here it seems mixed.

If you could set an attribute of a function to save and restore all clobbered registers, the problem would be gone.

I really wonder why the compiler pushes all registers while the called function is usually responsible for saving/restoring clobbered registers. Except for those that are allowed to be clobbered (that's where that special attribute would come in). The only downside would be that if such a function is called from regular code, the list of registers to save/restore is unnecessarily long.

Interesting stuff.

I guess this is one of those situations where having boatloads of working registers is a disadvantage. On a 6502 there are only 3 or 4 registers to save :)

Or multiple register banks a la 8051 or Z80.

Just thinking (gibberishly) aloud here :)


skeeve wrote:
Declare the handlers to be interrupt handlers.
That way they will save the registers they need to.

skeeve,

Intriguing. I assume you mean that I should employ unused vectors for the handlers?

I have 18 unused vectors (26 minus my 6, minus RESET, minus TIMER0_OVF used by Arduino core). I am currently employing 38 handlers. Those for TIMER1_COMPA don't really need a small prologue, so that leaves me with 27. I have already combined as many as I can without negative returns in either code size, speed, or interrupt responsiveness. This would leave at least 9 handlers without a home.

I could merge your technique with the use of function pointers:

void empty_handler() __attribute__ ((__signal__));
void empty_handler()
{
  __asm__ __volatile__ ("");
}

void (*int0_vect_hook)() = empty_handler;

#define invoke_int0_vect_hook_handler __asm__ ("ICALL\n" : : "z" (int0_vect_hook));

ISR(INT0_vect)
{
  __asm__ __volatile__ ("ICALL\n" : : "z" (int0_vect_hook));
}

This happily leads to:

000000aa <_Z13empty_handlerv>:

void empty_handler() __attribute__ ((__signal__));
void empty_handler()
{
  aa:	1f 92       	push	r1
  ac:	0f 92       	push	r0
  ae:	0f b6       	in	r0, 0x3f	; 63
  b0:	0f 92       	push	r0
  b2:	11 24       	eor	r1, r1
  __asm__ __volatile__ ("");
}
  b4:	0f 90       	pop	r0
  b6:	0f be       	out	0x3f, r0	; 63
  b8:	0f 90       	pop	r0
  ba:	1f 90       	pop	r1
  bc:	18 95       	reti

000000be <__vector_1>:

ISR(INT0_vect)
{
  be:	1f 92       	push	r1
  c0:	0f 92       	push	r0
  c2:	0f b6       	in	r0, 0x3f	; 63
  c4:	0f 92       	push	r0
  c6:	11 24       	eor	r1, r1
  c8:	ef 93       	push	r30
  ca:	ff 93       	push	r31
  __asm__ __volatile__ ("ICALL\n" : : "z" (int0_vect_hook));
  cc:	e0 91 00 01 	lds	r30, 0x0100
  d0:	f0 91 01 01 	lds	r31, 0x0101
  d4:	09 95       	icall
}
  d6:	ff 91       	pop	r31
  d8:	ef 91       	pop	r30
  da:	0f 90       	pop	r0
  dc:	0f be       	out	0x3f, r0	; 63
  de:	0f 90       	pop	r0
  e0:	1f 90       	pop	r1
  e2:	18 95       	reti

Again, it still pushes/pulls r0 and r1 and zeros r1, for both the ISR and the handler, but that's much better than the entire call-used list.

Trying again with a different handler that actually uses some registers, but not from the call-used list:

void call_saved_handler() __attribute__ ((__signal__));
void call_saved_handler()
{
  __asm__ __volatile__ (""
                      : : :
                      "r2", "r3", "r4", "r5", "r6", "r7");
}

void (*int0_vect_hook)() = call_saved_handler;

#define invoke_int0_vect_hook_handler __asm__ __volatile__ ("ICALL\n" : : "z" (int0_vect_hook));

ISR(INT0_vect)
{
  __asm__ __volatile__ ("ICALL\n" : : "z" (int0_vect_hook));
}

Yields:

000000aa <_Z18call_saved_handlerv>:

void call_saved_handler() __attribute__ ((__signal__));
void call_saved_handler()
{
  aa:	1f 92       	push	r1
  ac:	0f 92       	push	r0
  ae:	0f b6       	in	r0, 0x3f	; 63
  b0:	0f 92       	push	r0
  b2:	11 24       	eor	r1, r1
  b4:	2f 92       	push	r2
  b6:	3f 92       	push	r3
  b8:	4f 92       	push	r4
  ba:	5f 92       	push	r5
  bc:	6f 92       	push	r6
  be:	7f 92       	push	r7
  __asm__ __volatile__ (""
                      : : :
                      "r2", "r3", "r4", "r5", "r6", "r7");
}
  c0:	7f 90       	pop	r7
  c2:	6f 90       	pop	r6
  c4:	5f 90       	pop	r5
  c6:	4f 90       	pop	r4
  c8:	3f 90       	pop	r3
  ca:	2f 90       	pop	r2
  cc:	0f 90       	pop	r0
  ce:	0f be       	out	0x3f, r0	; 63
  d0:	0f 90       	pop	r0
  d2:	1f 90       	pop	r1
  d4:	18 95       	reti

000000d6 <__vector_1>:

ISR(INT0_vect)
{
  d6:	1f 92       	push	r1
  d8:	0f 92       	push	r0
  da:	0f b6       	in	r0, 0x3f	; 63
  dc:	0f 92       	push	r0
  de:	11 24       	eor	r1, r1
  e0:	ef 93       	push	r30
  e2:	ff 93       	push	r31
  __asm__ __volatile__ ("ICALL\n" : : "z" (int0_vect_hook));
  e4:	e0 91 00 01 	lds	r30, 0x0100
  e8:	f0 91 01 01 	lds	r31, 0x0101
  ec:	09 95       	icall
}
  ee:	ff 91       	pop	r31
  f0:	ef 91       	pop	r30
  f2:	0f 90       	pop	r0
  f4:	0f be       	out	0x3f, r0	; 63
  f6:	0f 90       	pop	r0
  f8:	1f 90       	pop	r1
  fa:	18 95       	reti

So far, so good. The call-used list is still not pushed...

Now with pure C:

void C_handler() __attribute__ ((__signal__));
void C_handler()
{
  for (uint32_t i=0xFFFFFFFF; i>0; i--)
    __asm__ __volatile__ ("");
}

void (*int0_vect_hook)() = C_handler;

#define invoke_int0_vect_hook_handler __asm__ __volatile__ ("ICALL\n" : : "z" (int0_vect_hook));

ISR(INT0_vect)
{
  __asm__ __volatile__ ("ICALL\n" : : "z" (int0_vect_hook));
}

We get:

000000aa <_Z9C_handlerv>:

void C_handler() __attribute__ ((__signal__));
void C_handler()
  aa:	1f 92       	push	r1
  ac:	0f 92       	push	r0
  ae:	0f b6       	in	r0, 0x3f	; 63
  b0:	0f 92       	push	r0
  b2:	11 24       	eor	r1, r1
  b4:	8f 93       	push	r24
  b6:	9f 93       	push	r25
  b8:	af 93       	push	r26
  ba:	bf 93       	push	r27
  bc:	8f ef       	ldi	r24, 0xFF	; 255
  be:	9f ef       	ldi	r25, 0xFF	; 255
  c0:	af ef       	ldi	r26, 0xFF	; 255
  c2:	bf ef       	ldi	r27, 0xFF	; 255
{
  for (uint32_t i=0xFFFFFFFF; i>0; i--)
  c4:	01 97       	sbiw	r24, 0x01	; 1
  c6:	a1 09       	sbc	r26, r1
  c8:	b1 09       	sbc	r27, r1
  ca:	e1 f7       	brne	.-8      	; 0xc4 <_Z9C_handlerv+0x1a>
    __asm__ __volatile__ ("");
}
  cc:	bf 91       	pop	r27
  ce:	af 91       	pop	r26
  d0:	9f 91       	pop	r25
  d2:	8f 91       	pop	r24
  d4:	0f 90       	pop	r0
  d6:	0f be       	out	0x3f, r0	; 63
  d8:	0f 90       	pop	r0
  da:	1f 90       	pop	r1
  dc:	18 95       	reti

000000de <__vector_1>:

ISR(INT0_vect)
  de:	1f 92       	push	r1
  e0:	0f 92       	push	r0
  e2:	0f b6       	in	r0, 0x3f	; 63
  e4:	0f 92       	push	r0
  e6:	11 24       	eor	r1, r1
  e8:	ef 93       	push	r30
  ea:	ff 93       	push	r31
{
  __asm__ __volatile__ ("ICALL\n" : : "z" (int0_vect_hook));
  ec:	e0 91 00 01 	lds	r30, 0x0100
  f0:	f0 91 01 01 	lds	r31, 0x0101
  f4:	09 95       	icall
}
  f6:	ff 91       	pop	r31
  f8:	ef 91       	pop	r30
  fa:	0f 90       	pop	r0
  fc:	0f be       	out	0x3f, r0	; 63
  fe:	0f 90       	pop	r0
 100:	1f 90       	pop	r1
 102:	18 95       	reti

Great! Only the portion of the call-used list actually used by the handler is pushed.

The one catch is that functions declared with __signal__ return with 'reti' instead of 'ret', so interrupts will be enabled before the calling ISR has returned. This will be a problem for multiple deferred interrupts, which are common for my application, not just an edge case. Even with this caveat, and the unnecessary __temp_reg__ and __zero_reg__ manipulation, this may be a viable approach if no simpler method is to be found. I still think it should be possible to patch the toolchain to implement some kind of just-in-time-push arrangement. This will necessarily increase code size, but if the scope can be controlled, code size can be kept under control.

Thank you, skeeve.

Cheers,
jj

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

Last Edited: Tue. Jul 24, 2012 - 04:04 AM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

jayjay1974 wrote:
I wonder if there are compilers that make the callee responsible for saving and restoring clobbered registers, instead of the caller. Here it seems mixed.

The mix is moderated by the call-used and call-saved lists. These lists appear to be static, which is what bugs me.
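To illustrate what those lists do in practice (a rough sketch with made-up names, not code from my project): because ext() below is an ordinary external function, the compiler must assume it clobbers every call-used register (r0, r1, r18-r27, r30-r31 in the avr-gcc ABI), so the ISR prologue saves all of them whether or not ext() actually touches them. A leaf function, by contrast, can scribble on call-used registers freely and push nothing of its own, because its callers are responsible for anything they need preserved across the call.

#include <avr/interrupt.h>
#include <stdint.h>

extern void ext(void);

ISR(INT1_vect)
{
  ext();          // drags the entire call-used set into the ISR prologue
}

uint8_t leaf(uint8_t x)
{
  return x + 1;   // uses only call-used registers, so nothing is pushed
}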

Quote:
If you could set an attribute of a function to save and restore all clobbered registers, the problem would be gone.

Precisely. I would also like an ISR attribute that does the opposite. Or at least one that tells the compiler not to assume the whole call-used list should be saved under circumstances where it normally would (like calling a function from within an ISR).

Quote:
I really wonder why the compiler pushes all registers when the called function is usually responsible for saving/restoring the registers it clobbers, except for those that are allowed to be clobbered (that's where that special attribute would come in). The only downside would be that if such a function is called from regular code, the list of registers to save/restore is unnecessarily long.

The compiler (or rather its designers) is playing it safe, which I don't mind. I approve. Under almost any circumstance it's the Right Thing To Do(TM). I get burned because I want to write in C, use function hooks, and I have extremely tight timing requirements. Judging from the responses I've gotten, this is A) not common, and B) generally a bad idea. I don't believe that A) follows from B), nor vice versa. I think both follow from the fact that there is no compiler support for this kind of approach, and I think that is because there are other ways to do it that more skilled assembly programmers than I have a facility with. I just want another option. I'll probably have to get my hands quite dirty to get it. I've never patched a compiler before, and a project to add dynamic call-used/call-saved list management seems a bit daunting...

Quote:
I guess this is one of those situations where having boatloads of working registers is a disadvantage. On a 6502 there are only 3 or 4 registers to save :)

The MC6809 had 2 8-bit accumulators. Of course, it only had 3 interrupt sources, and you couldn't control what got pushed. IRQ was an external interrupt that would push the whole state (a grand total of 12 bytes to push to the stack, including the PC, the condition codes, 3 16-bit index registers, and the page register), while FIRQ would only push PC and the condition codes. NMI was like IRQ but could not be disabled. And you could generate 3 separate software interrupts, 2 of which would leave hardware interrupts enabled.

Those were the days....

...the days of no built-in USART, no TWI, no built-in general purpose I/O pins, < 2MHz system clocks, no built-in timers, no SPI, no ADC, no PWM, no FLASH or EEPROM.... and no built-in SRAM...!

Ahhh....;-)

jj

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

joeymorin wrote:
The one catch is that functions declared with __signal__ return with 'reti' instead of 'ret', so interrupts will be enabled before the calling ISR has returned. This will be a problem for multiple deferred interrupts, which are common for my application, not just an edge case.

I knew I had previously read something in the data sheet that would be pertinent. From page 15 (8271D–AVR–05/11 http://www.atmel.com/Images/doc8271.pdf):
    7.7 Reset and Interrupt Handling
      ... When the AVR exits from an interrupt, it will always return to the main program and execute one more instruction before any pending interrupt is served.
      ...
      When using the CLI instruction to disable interrupts, the interrupts will be immediately disabled. No interrupt will be executed after the CLI instruction, even if it occurs simultaneously with the CLI instruction.
      ...
A tiny tweak:

void C_handler() __attribute__ ((__signal__));
void C_handler()
{
  for (uint32_t i=0xFFFFFFFF; i>0; i--)
    __asm__ __volatile__ ("");
}

void (*int0_vect_hook)() = C_handler;

#define invoke_int0_vect_hook_handler __asm__ __volatile__ ("ICALL\n" \
                                                            "CLI\n" \
                                                          : \
                                                          : "z" (int0_vect_hook))

ISR(INT0_vect)
{
  invoke_int0_vect_hook_handler;
}

Now gives us a tiny but important change:

000000de <__vector_1>:

ISR(INT0_vect)
  de:	1f 92       	push	r1
  e0:	0f 92       	push	r0
  e2:	0f b6       	in	r0, 0x3f	; 63
  e4:	0f 92       	push	r0
  e6:	11 24       	eor	r1, r1
  e8:	ef 93       	push	r30
  ea:	ff 93       	push	r31
{
  invoke_int0_vect_hook_handler;
  ec:	e0 91 00 01 	lds	r30, 0x0100
  f0:	f0 91 01 01 	lds	r31, 0x0101
  f4:	09 95       	icall
  f6:	f8 94       	cli
}
  f8:	ff 91       	pop	r31
  fa:	ef 91       	pop	r30
  fc:	0f 90       	pop	r0
  fe:	0f be       	out	0x3f, r0	; 63
 100:	0f 90       	pop	r0
 102:	1f 90       	pop	r1
 104:	18 95       	reti

Now even when a handler declared with __attribute__ ((__signal__)) returns with RETI instead of RET, the ISR should continue through its epilogue uninterrupted, even in the face of one or more pending interrupts. I don't know for certain if this is true. The data sheet isn't explicit about the behaviour with pending interrupts. I will contrive a test and report back.

Also, I noticed in my previous post (quoted above) that I included #define invoke_int0_vect_hook_handler but neglected to reference it in the ISR, instead using the discrete code. Sorry for the confusing typo.

Cheers,
jj

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

joeymorin wrote:
Now even when a handler declared with __attribute__ ((__signal__)) returns with RETI instead of RET, the ISR should continue through its epilogue uninterrupted, even in the face of one or more pending interrupts. I don't know for certain if this is true. The data sheet isn't explicit about the behaviour with pending interrupts. I will contrive a test and report back.

Here is some test code. Please excuse the Arduino sketch construct:

uint32_t static volatile timer1_count;
uint32_t static volatile timer2_count;

void handler() __attribute__ ((__signal__));
void handler()
{
  // track number of runs
  timer1_count++;
  // enable TIMER2 OVF, which has a higher priority than TIMER1 OVF
  TIMSK2 = _BV(TOIE2);
  // wait until a TIMER2 interrupt is pending
  while (!(TIFR2 & _BV(TOV2)));
  // wait until another TIMER1 interrupt is pending
  while (!(TIFR1 & _BV(TOV1)));
}

void (*int0_vect_hook)() = handler;

#define invoke_int0_vect_hook_handler __asm__ __volatile__ ("ICALL\n" \
                                                            "cli\n" \
                                                          : \
                                                          : "z" (int0_vect_hook))

ISR(TIMER1_OVF_vect)
{
  invoke_int0_vect_hook_handler;
  // disable both interrupts
  TIMSK1 &= ~_BV(TOIE1);
  TIMSK2 &= ~_BV(TOIE2);
}

ISR(TIMER2_OVF_vect)
{
  // track number of runs
  timer2_count++;
}

void setup()
{
  // set up serial comms
  Serial.begin(460800);
  // reset terminal
  Serial.print("\033c");

  // set TIMER1 normal mode with prescaler to 1024
  // rollover will be F_CPU/65536/1024 = every 4.2 seconds
  TCCR1A = 0;
  TCCR1B = _BV(CS12) | _BV(CS10);
  // reset TIMER1
  TCNT1 = 0;
  // clear pending TIMER1 OVF flag
  TIFR1 = _BV(TOV1);
  // enable TIMER1 OVF
  TIMSK1 |= _BV(TOIE1);

  // set TIMER2 normal mode with prescaler to 1
  // rollover will be F_CPU/256 = 62.5 KHz
  TCCR2A = 0;
  TCCR2B = _BV(CS20);
  // reset TIMER2
  TCNT2 = 0;
  // clear pending TIMER2 OVF flag
  TIFR2 = _BV(TOV2);
  // disable all TIMER2 interrupts
  TIMSK2 = 0;
}

void loop()
{
  // continuously report the number of passes through TIMER1_OVF_vect
  // handler and TIMER2_OVF_vect ISR
  Serial.print("timer1_count=");
  Serial.print(timer1_count);
  Serial.print("  timer2_count=");
  Serial.println(timer2_count);
  // slow it down
  delay(1000);
}

The test is constructed to determine the behaviour suggested by section 7.7 of the datasheet, specifically what happens in the case of A) an interrupt flag becomes set during the execution of code within an ISR or a function declared with __attribute__ ((__signal__)) while interrupts are disabled, then B) the ISR/function terminates with RETI, and C) the next instruction is CLI. There was some question in my mind whether an already-pending interrupt would be serviced before the CLI, or if interrupts would be immediately re-disabled.

I was also curious to see what would happen if a different, higher-priority interrupt were to become pending. Would the same behaviour apply?

It does look like inserting CLI immediately after the ICALL has the desired effect. The output of the test is as follows:

timer1_count=0  timer2_count=0
timer1_count=0  timer2_count=0
timer1_count=0  timer2_count=0
timer1_count=0  timer2_count=0
timer1_count=0  timer2_count=0
  
timer1_count=1  timer2_count=0
timer1_count=1  timer2_count=0
timer1_count=1  timer2_count=0
timer1_count=1  timer2_count=0
timer1_count=1  timer2_count=0
timer1_count=1  timer2_count=0
  

Neither TIMER1_OVF_vect nor TIMER2_OVF_vect ever run again.

Cheers,
jj

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Hi!

joeymorin wrote:
joeymorin wrote:
Now even when a handler declared with __attribute__ ((__signal__)) returns with RETI instead of RET, the ISR should continue through its epilogue uninterrupted, even in the face of one or more pending interrupts. I don't know for certain if this is true. The data sheet isn't explicit about the behaviour with pending interrupts. I will contrive a test and report back.

The test is constructed to determine the behaviour suggested by section 7.7 of the datasheet, specifically what happens in the case of A) an interrupt flag becomes set during the execution of code within an ISR or a function declared with __attribute__ ((__signal__)) while interrupts are disabled, then B) the ISR/function terminates with RETI, and C) the next instruction is CLI. There was some question in my mind whether an already-pending interrupt would be serviced before the CLI, or if interrupts would be immediately re-disabled.

I was also curious to see what would happen if a different, higher-priority interrupt were to become pending. Would the same behaviour apply?

It does look like inserting CLI immediately after the ICALL has the desired effect.


If interrupts were disabled, then a pending interrupt will be processed after the next instruction following a "sei/reti/out SREG, reg" instruction. And I enable interrupts with SEI right before the RETI instruction to reduce total interrupt latency (the CLI in the main code will not be executed before the pending interrupt is processed).

I have my own OS based upon this axiom.

And if you have enough space in the stack, then you don't need to disable interrupts in the epilogue.

Ilya

Sorry for my English.


501-q wrote:
A pending interrupt will be processed after the next instruction following a "sei/reti/out SREG, reg" instruction. And I enable interrupts with SEI right before the RETI instruction to reduce total interrupt latency (the CLI in the main code will not be executed before the pending interrupt is processed).

I have my own OS based upon this axiom.


Hi 501-q,

Interesting assertion. My tests definitely disagree with you. And the datasheet is quite clear. Refer to the excerpt in my previous post above.

I submit that your use of SEI right before the RETI is responsible for allowing the pending interrupt to be processed. From the same section in the datasheet:

    ...

    When using the SEI instruction to enable interrupts, the instruction following SEI will be executed before any pending interrupts, as shown in this example.

      Assembly Code Example
        sei   ; set Global Interrupt Enable
        sleep ; enter sleep, waiting for interrupt
        ; note: will enter sleep before any pending interrupt(s)
    ...

If I were to do the same:

SEI
RETI
.
.
.
CLI

Then according to the datasheet, SEI sets the I flag, any pending interrupts are deferred in favour of executing the next instruction which is RETI, and then a pending interrupt gets processed, and then CLI is executed, disabling interrupts.

Also, note that my use of CLI is not in the main code, rather in the ISR that calls a function which was declared with __attribute__ ((__signal__)), although that won't affect the relevant behaviour.

It's late, I'll write some test code tomorrow.

Quote:
And if you have enough space in the stack, then you don't need to disable interrupts in the epilogue.

I don't have a lot of stack space, and that's not my only issue. Nested interrupts could lead to out-of-order latching of captured events. See previous posts.

Cheers,
jj

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Hi!

joeymorin wrote:
501-q wrote:
A pending interrupt will be processed after the next instruction following a "sei/reti/out SREG, reg" instruction. And I enable interrupts with SEI right before the RETI instruction to reduce total interrupt latency (the CLI in the main code will not be executed before the pending interrupt is processed).

I have my own OS based upon this axiom.


Hi 501-q,

Interesting assertion. My tests definitely disagree with you.


Why?! Your test proves my assertion. If 'cli' is executed right after 'sei/reti/out SREG,reg', then the pending interrupt will not be processed (until the next 'sei/reti').
Quote:
And the datasheet is quite clear.

And you and I read the datasheet differently :-)

Quote:
Refer to the excerpt in my previous post above.

I submit that your use of SEI right before the RETI is responsible for allowing the pending interrupt to be processed. From the same section in the datasheet:

    ...

    When using the SEI instruction to enable interrupts, the instruction following SEI will be executed before any pending interrupts, as shown in this example.

      Assembly Code Example
        sei   ; set Global Interrupt Enable
        sleep ; enter sleep, waiting for interrupt
        ; note: will enter sleep before any pending interrupt(s)
    ...

If I were to do the same:

SEI
RETI
.
.
.
CLI

Then according to the datasheet, SEI sets the I flag, any pending interrupts are deferred in favour of executing the next instruction which is RETI, and then a pending interrupt gets processed,


Yes, the interrupt will be processed at that moment.
Quote:

and then CLI is executed, disabling interrupts.

If ISRs end with the 'sei/reti' sequence, then the 'CLI' in the main code will not be executed until all pending interrupts have been processed.
Quote:
Also, note that my use of CLI is not in the main code,

I am not talking about the 'CLI' that is after 'ICALL'; there may be another 'CLI' in the main code!
Quote:
rather in the ISR that calls a function which was declared with __attribute__ ((__signal__)), although that won't affect the relevant behaviour.

It's late, I'll write some test code tomorrow.

Quote:
And if you have enough space in the stack, then you don't need to disable interrupts in the epilogue.

I don't have a lot of stack space, and that's not my only issue.

I reserve 256 bytes for the ISR stack with nested interrupts (I use a separate stack for interrupts). It is more than enough.
Quote:
Nested interrupts could lead to out-of-order latching of captured events. See previous posts.

No. By the epilogue the event has already been processed.

I regularly use nested interrupts. I enable interrupts after taking an event and putting it into a ring buffer. I process the event after enabling interrupts, but still in the ISR (when it is necessary). To exclude nested processing I use a flag.

Ilya


Maybe you just want a small ISR that redirects to the hooks?

#include <avr/io.h>
#include <avr/interrupt.h>

void __attribute__((signal,used)) __vector_time1_compa_hook1 (void);
void __attribute__((signal,used)) __vector_time1_compa_hook2 (void);

void (*hook)(void) = __vector_time1_compa_hook1;

static void __attribute__((naked,used))
TIMER1_COMPA_vect (void)
{
    asm (" lds r2, %0   $ push r2"
        "$ lds r2, %0+1 $ push r2"
        "$ ret"
        :: "s" (&hook));
}

void __vector_time1_compa_hook1 (void)
{
    hook = __vector_time1_compa_hook2;
}

void __vector_time1_compa_hook2 (void)
{
    hook = __vector_time1_compa_hook1;
}

This assumes R2 is global, i.e. you compile everything with -ffixed-2 and know what you are doing [TM].

The following code holds the hook address in R4/R5 and is some ticks faster:

#define GLOBAL_REG 4

#define STRY2(X) #X
#define STRY(X) STRY2(X)

register void (*hook)(void) asm (STRY (GLOBAL_REG));

static void __attribute__((constructor,used))
init_timer1_compa (void)
{
    hook = __vector_time1_compa_hook1;
}

static void __attribute__((naked,used))
TIMER1_COMPA_vect (void)
{
    asm (" push %0"
        "$ push %0+1"
        "$ ret"
        :: "n" (GLOBAL_REG));
}

And if you are more comfortable with assembler for the hook jump pad, e.g. for the first case:

#include <avr/io.h>

.macro DEFUN name
.global \name
.func \name
\name:
.endm

.macro ENDF name
.size \name, .-\name
.endfunc
.endm

.text

DEFUN TIMER1_COMPA_vect
    lds r2, hook   $ push r2
    lds r2, hook+1 $ push r2
    ret
ENDF TIMER1_COMPA_vect


joeymorin wrote:
skeeve wrote:
Declare the handlers to be interrupt handlers.
That way they will save the registers they need to.

skeeve,

Intriguing. I assume you mean that I should employ unused vectors for the handlers?

Not necessarily. With avr-gcc, an ISR need not have an associated interrupt.
Its name does have to end in _vect to avoid a warning.
Quote:
I have 18 unused vectors (26 minus my 6, minus RESET, minus TIMER0_OVF used by Arduino core). I am currently employing 38 handlers. Those for TIMER1_COMPA don't really need a small prologue, so that leaves me with 27. I have already combined as many as I can without negative returns in either code size, speed, or interrupt responsiveness. This would leave at least 9 handlers without a home.

I could merge your technique with the use of function pointers:

How many different values will a function pointer need to have available? 38?
Do you have an IO register available in the SBIC range?
If so, you could do a binary search. 38 choices could require up to five tests.
A dedicated register would also work.
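Something along these lines, just a sketch (the use of GPIOR0 and the bit assignments are made up, and the handler bodies are placeholders):

#include <avr/io.h>
#include <avr/interrupt.h>

/* Bit tests on a low I/O register in the SBIC/SBIS range select one of
 * several in-line handler bodies: no ICALL, no pointer load from RAM.
 * Two levels are shown here; a third test on bit 0 would split each
 * pair further. */
ISR(TIMER1_COMPA_vect)
{
    if (GPIOR0 & _BV(2)) {
        if (GPIOR0 & _BV(1)) { /* states 6 and 7 */ }
        else                 { /* states 4 and 5 */ }
    } else {
        if (GPIOR0 & _BV(1)) { /* states 2 and 3 */ }
        else                 { /* states 0 and 1 */ }
    }
}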
Quote:
The one catch is that functions declared with __signal__ return with 'reti' instead of 'ret', so interrupts will be enabled before the calling ISR has returned. This will be a problem for multiple deferred interrupts, which are common for my application, not just an edge case. Even with this caveat, and the unnecessary __temp_reg__ and __zero_reg__ manipulation, this may be a viable approach if no simpler method is to be found. I still think it should be possible to patch the toolchain to implement some kind of just-in-time-push arrangement. This will necessarily increase code size, but if the scope can be controlled, code size can be kept under control.
That was the reason for the CLI in my macro.
If you are still nervous or just want to save a cycle,
you could post-process the assembly.
IIRC avr-gcc includes enough commentary to allow one to find the right reti's to change.


501-q wrote:
Why?! Your test proves my assertion. If 'cli' is executed right after 'sei/reti/out SREG,reg', then the pending interrupt will not be processed (until the next 'sei/reti').

Forgive me, perhaps I have misunderstood you:

501-q wrote:
If interrupts were disabled, then a pending interrupt will be processed after the next instruction following a "sei/reti/out SREG, reg" instruction. And I enable interrupts with SEI right before the RETI instruction to reduce total interrupt latency (the CLI in the main code will not be executed before the pending interrupt is processed).

It seemed to me you were suggesting that if A) interrupts are disabled (because we're in an ISR or otherwise) and B) an interrupt becomes pending, then C) we execute an SEI or a RETI or otherwise set the I flag with the use of OUT SREG, reg, followed by D) a CLI instruction, then the result will be that the pending interrupt will be processed before the CLI instruction.

This is false.

You seemed also to be saying that you execute SEI immediately before RETI in order to improve interrupt responsiveness. I submit it is this practice which permits a pending interrupt to execute immediately after the RETI.

If I interpreted your words incorrectly, I apologise.

Quote:
And you and I read the datasheet differently :-)

Apparently.

Quote:
Yes, interrupt will processed in this moment.

Then I won't bother writing any test code.

This may be fine for your application, but it is exactly what I don't want. Omitting the SEI before the RETI guarantees that the next instruction after the interrupts are re-enabled will be CLI, which immediately re-disables the interrupts before any pending interrupts can be processed, which is what I require.

Quote:
If ISRs end with the 'sei/reti' sequence, then the 'CLI' in the main code will not be executed until all pending interrupts have been processed.

If more than one interrupt is pending, then the CLI immediately after the SEI/RETI will execute after processing only the 1st. Quoting again:
    ...

    When the AVR exits from an interrupt, it will always return to the main program and execute one
    more instruction before any pending interrupt is served.

    ...

Unless of course you are employing the SEI/RETI sequence in all of your ISR code. If you are, then it is true that the instruction immediately following the one that was interrupted by the very first interrupt will not execute until all pending interrupts have been serviced.

Quote:
I do not talk about 'CLI' that is after 'ICALL', but there may be another 'CLI' in main code!

No need to shout. The distinction is irrelevant anyway, I was just trying to be clear about my specific application.

Quote:
I reserve 256 bytes for ISR's stack with nested interrupts (I use separate stack for interrupts). It is more than enough.

Good to know, but tests with my application are less favourable. In edge cases, I would fill up 256 bytes in about 1 mS. I cannot use nested interrupts here.

Quote:
Quote:
Nested interrupts could lead to out-of-order latching of captured events. See previous posts.

No. By the epilogue the event has already been processed.

Correct. I was, however, referring to a previous post:

joeymorin wrote:
I don't have a lot of stack space, and that's not my only issue. Nested interrupts could lead to out-of-order latching of captured events. See previous posts.

Which I now quote:

joeymorin wrote:
Using SIGNAL instead of interrupt would allow another interrupt to break in even during the prologue. More importantly, before the captured event gets latched. This could lead to a reversal of latching order. Then I need to code detection of this condition.

I mis-spoke slightly in the earlier post. I'd confused __attribute__ ((__interrupt__)) with __attribute__ ((__signal__)).

However, with the help of skeeve's suggestion I have arrived at a workable solution.

Quote:
I regularly use nested interrupts.

I have used them before as well, but not frequently. In any case, they are inappropriate for the application in question.

Cheers,
jj

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Quote:
It seemed to me you were suggesting that if A) interrupts are disabled (because we're in an ISR or otherwise) and B) an interrupt becomes pending, then C) we execute an SEI or a RETI or otherwise set the I flag with the use of OUT SREG, reg, followed by D) a CLI instruction, then the result will be that the pending interrupt will be processed before the CLI instruction.

This is false.

You seemed also to be saying that you execute SEI immediately before RETI in order to improve interrupt responsiveness. I submit it is this practice which permits a pending interrupt to execute immediately after the RETI.

Aren't you contradicting yourself here? If the CLI is indeed the next instruction after the RETI, then the pending interrupt would indeed be processed before the CLI as you submit.

Regards,
Steve A.


skeeve wrote:
Not necessarily. With avr-gcc, an ISR need not have an associated interrupt.
Its name does have to end in _vect to avoid a warning.

That would never have occurred to me:

ISR(foo)
{
  __asm__ __volatile__ ("");
}
00000f40 <foo>:

ISR(foo) {
     f40:	1f 92       	push	r1
     f42:	0f 92       	push	r0
     f44:	0f b6       	in	r0, 0x3f	; 63
     f46:	0f 92       	push	r0
     f48:	11 24       	eor	r1, r1
  __asm__ __volatile__ ("");
}
     f4a:	0f 90       	pop	r0
     f4c:	0f be       	out	0x3f, r0	; 63
     f4e:	0f 90       	pop	r0
     f50:	1f 90       	pop	r1
     f52:	18 95       	reti

If I read that correctly, doing so is basically the same as:

void foo() __attribute__ ((__signal__));
void foo()
{
  __asm__ __volatile__ ("");
}
00000f40 <_Z3foov>:

void foo() __attribute__ ((__signal__));
void foo() {
     f40:	1f 92       	push	r1
     f42:	0f 92       	push	r0
     f44:	0f b6       	in	r0, 0x3f	; 63
     f46:	0f 92       	push	r0
     f48:	11 24       	eor	r1, r1
  __asm__ __volatile__ ("");
}
     f4a:	0f 90       	pop	r0
     f4c:	0f be       	out	0x3f, r0	; 63
     f4e:	0f 90       	pop	r0
     f50:	1f 90       	pop	r1
     f52:	18 95       	reti

Yup.

Quote:
...you could do a binary search. 38 choices could require up to five tests.

I think the most a single ISR has to contend with is about a dozen, so I think I could pull it off with 3 tests. I think a 3-to-5-level binary search might be a few cycles faster than an ICALL. I may one day re-design with this approach, but I must admit that I prefer the hook/handler approach. It is to my eye clearer and easier to maintain, perhaps because it is what I have more experience with. Now that I've got a workable solution to address the superfluous push/pull activity, I'm inclined to stay the course.

Quote:
That was the reason for the CLI in my macro.

[sheepishly] I had missed that on first reading. [/sheepishly]

You'll note my lengthy previous post identifying the problem... an Arduino test sketch... a discussion with 501-q... and my conclusion about how to solve the problem... with a CLI!

Cheers,
jj

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Koshchi wrote:
Aren't you contradicting yourself here? If the CLI is indeed the next instruction after the RETI, then the pending interrupt would indeed be processed before the CLI as you submit.

Negative:

ISR()
{
        PUSH r30
        PUSH r31 
        LDS r30,  0x100 ; load the handler's address from the hook
        LDS r31,  0x101 ;
        ICALL           ; call the handler
        CLI             ; disable interrupts
        POP r31
        POP r30
        RETI
}
        
handler() __attribute__ ((__signal__));
handler()
{        
        ...
        ...
        RETI
}

Sequence:

    1) Interrupt occurs
    2) ISR begins to service the interrupt, disabling interrupts globally
    3) handler is called, interrupts still disabled globally
    4) additional (or different) interrupt occurs, while still in the handler, and while interrupts are still globally disabled, setting the appropriate interrupt flag
    5) handler returns with RETI, re-enabling interrupts
    6) control falls back to the next instruction after the ICALL that launched the handler, CLI
    7) as per the datasheet: "When the AVR exits from an interrupt, it will always return to the main program and execute one more instruction before any pending interrupt is served," so CLI is executed before the pending interrupt
    8) the ISR continues uninterrupted through its epilogue
    9) the ISR executes its RETI
    10) control returns to the instruction following the one that was interrupted by 1) above
    11) one more instruction is executed
    12) the highest-priority pending interrupt is serviced
I have put this assertion to the test and it proves to be true. See my previous post and the code to test it.

The debate was what would happen if the handler looked like this:

handler() __attribute__ ((__signal__));
handler()
{        
        ...
        ...
        SEI
        RETI
}

Here is a test program. Again, please excuse the Arduino sketch construct:

#define timer1_count GPIOR0
uint32_t static volatile timer2_count;

void handler() __attribute__ ((__naked__));
void handler()
{

  __asm__ __volatile__ ("push r24          \n"  /* prologue          */
                        "in   r24, 0x3f    \n"
                        "push r24          \n"
                        "in   r24, 0x1e    \n"  /* load timer1_count */
                        "inc  r24          \n"  /* increment         */
                        "out  0x1e, r24    \n"  /* save              */
                        "ldi  r24, 0x01    \n"  /* enable TOIE2      */
                        "sts  0x70, r24    \n"
                        "sbis 0x17, 0      \n"  /* wait for TOV2     */
                        "rjmp .-4          \n"
                        "sbis 0x16, 0      \n"  /* wait for TOV1     */
                        "rjmp .-4          \n"
                        "pop r24           \n"  /* epilogue          */
                        "out 0x3f, r24     \n"
                        "pop r24           \n"
                        "sei               \n"
                        "reti              \n"
                       );
}

void (*int0_vect_hook)() = handler;

#define invoke_int0_vect_hook_handler __asm__ __volatile__ ("ICALL\n" \
                                                            "cli\n" \
                                                          : \
                                                          : "z" (int0_vect_hook))

ISR(TIMER1_OVF_vect)
{
  invoke_int0_vect_hook_handler;
  // disable both interrupts
  TIMSK1 &= ~_BV(TOIE1);
  TIMSK2 &= ~_BV(TOIE2);
}

ISR(TIMER2_OVF_vect)
{
  // track number of runs
  timer2_count++;
}

void setup()
{
  // set up serial comms
  Serial.begin(460800);
  // reset terminal
  Serial.print("\033c");

  // clear timer1_count
  timer1_count = 0;
  
  // set TIMER1 normal mode with prescaler to 1024
  // rollover will be F_CPU/65536/1024 = every 4.2 seconds
  TCCR1A = 0;
  TCCR1B = _BV(CS12) | _BV(CS10);
  // reset TIMER1
  TCNT1 = 0;
  // clear pending TIMER1 OVF flag
  TIFR1 = _BV(TOV1);
  // enable TIMER1 OVF
  TIMSK1 |= _BV(TOIE1);

  // set TIMER2 normal mode with prescaler to 1
  // rollover will be F_CPU/256 = 62.5 KHz
  TCCR2A = 0;
  TCCR2B = _BV(CS20);
  // reset TIMER2
  TCNT2 = 0;
  // clear pending TIMER2 OVF flag
  TIFR2 = _BV(TOV2);
  // disable all TIMER2 interrupts
  TIMSK2 = 0;
}

void loop()
{
  // continuously report the number of passes through TIMER1_OVF_vect
  // handler and TIMER2_OVF_vect ISR
  Serial.print("timer1_count=");
  Serial.print(timer1_count);
  Serial.print("  timer2_count=");
  Serial.println(timer2_count);
  // slow it down
  delay(1000);
}

The output of which is:

timer1_count=0  timer2_count=0
timer1_count=0  timer2_count=0
timer1_count=0  timer2_count=0
timer1_count=0  timer2_count=0
timer1_count=0  timer2_count=0
  
timer1_count=1  timer2_count=1
timer1_count=1  timer2_count=1
timer1_count=1  timer2_count=1
timer1_count=1  timer2_count=1
timer1_count=1  timer2_count=1
timer1_count=1  timer2_count=1
   

This shows that the insertion of SEI immediately before RETI causes a pending interrupt to be serviced before CLI is executed.

Cheers,
jj

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

SprinterSB wrote:
Maybe you just want a small ISR that redirects to the hooks?

I must admit, I like the direct stack/PC manipulation. That is a trick I can get behind ;-)

I've been spending a great deal of time reading and writing posts in this thread, and almost no time actually working on my code! The next step I think is to implement some of what I've learned in the last couple of days and report back. I expect I will first try __attribute__ ((__signal__)) on all the relevant handlers, combined with tight ISR code declared ISR_NAKED. Capture latching and some other tasks are common to all the handlers for a particular ISR, and that code is currently executed before the handlers are hooked. That won't change, but I think I can combine/convert the required C with/to assembler, optimized to include some of the suggestions I have received.

Many thanks to all.

Cheers,
jj

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

skeeve wrote:
With avr-gcc, an ISR [...] name does have to end in _vect to avoid a warning.
No, it has to start with __vector

The _vect stuff is only macros from headers finally ending up in __vector_


Hi!

joeymorin wrote:
501-q wrote:
Why?! Your test proves my assertion. If 'cli' is executed right after 'sei/reti/out SREG,reg', then the pending interrupt will not be processed (until the next 'sei/reti').

Forgive me, perhaps I have misunderstood you:

No matter. English is not my native language. It is possible that I made errors...

Quote:

Quote:
And you and I read the datasheet differently :-)

Apparently.

There is a smile in my post. What I wrote is all I read from the datasheet. I made tests only for the 'OUT SREG,reg' case. And now I see that you and I understand (think) the same about interrupts.

Quote:

    ...

    When the AVR exits from an interrupt, it will always return to the main program and execute one
    more instruction before any pending interrupt is served.

    ...

Unless of course you are employing the SEI/RETI sequence in all of your ISR code. If you are, then it is true that the instruction immediately following the one that was interrupted by the very first interrupt will not execute until all pending interrupts have been serviced.


Yes. And it greatly reduces total interrupt latency in some cases.

Quote:

Quote:
I reserve 256 bytes for the ISR stack with nested interrupts (I use a separate stack for interrupts). It is more than enough.

Good to know, but tests with my application are less favourable. In edge cases, I would fill up 256 bytes in about 1 mS. I cannot use nested interrupts here.

In some cases I use the low registers to save working registers in an ISR. It saves time too. I write the start of the ISR in assembler and do the deeper processing in C.
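For example, something like this (just a sketch, not my real code), with r2 and r3 reserved for the whole program via -ffixed-2 -ffixed-3: the working registers are parked in the reserved low registers with MOV (1 cycle each way) instead of PUSH/POP (2 cycles each), and the heavier processing can then be handed off to C. Of course such an ISR must not itself be nested, or r2/r3 would be overwritten.

#include <avr/io.h>
#include <avr/interrupt.h>

ISR(PCINT0_vect, ISR_NAKED)
{
  __asm__ __volatile__ ("mov  r2, r24       \n"  /* save r24 without the stack */
                        "in   r24, %[sreg]  \n"
                        "mov  r3, r24       \n"  /* save SREG in r3            */
                        /* ... quick work using r24 here ... */
                        "mov  r24, r3       \n"
                        "out  %[sreg], r24  \n"  /* restore SREG               */
                        "mov  r24, r2       \n"  /* restore r24                */
                        "reti               \n"
                        :: [sreg] "I" (_SFR_IO_ADDR(SREG)));
}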
Quote:

Quote:
I regularly use nested interrupts.

I have used them before as well, but not frequently. In any case, they are inappropriate for the application in question.

Ok. I hope that our discussion was useful.

WBR, Ilya


Hi!

joeymorin wrote:
I expect I will first try __attribute__ ((__signal__)) on all the relevant handlers, combined with tight ISR code declared ISR_NAKED.

Be careful with the 'naked' attribute. The compiler does not generate a prologue, but may still use the stack for intermediate variables anyway.

Ilya


joeymorin wrote:
I expect I will first try __attribute__ ((__signal__)) on all the relevant handlers, combined with tight ISR code declared ISR_NAKED.
Using ISR_NAKED with ISR basically renders signal void ;-)

ISR will just add some attributes like "used", but with "naked" no pro-/epilogue will be there so that you then have "no ISR pro-/epilogue" instead of "no ordinary pro-/epilogue".

You just pick another flavor of nothing :lol:


501-q wrote:
I made tests only for the 'OUT SREG,reg' case. And now I see that you and I understand (think) the same about interrupts.

Yes, it does seem as though we were 'arguing' about the same thing ;-)

Quote:
Quote:
Unless of course you are employing the SEI/RETI sequence in all of your ISR code. If you are, then it is true that the instruction immediately following the one that was interrupted by the very first interrupt will not execute until all pending interrupts have been serviced.

Yes. And it greatly reduces total interrupt latency in some cases.

I may be able to use this for 2 of the 6 ISRs in my application, and I will experiment with the technique. However, there remains the concern of stack collision. In the edge cases where I could expect nested interrupts to occur, they are not merely 'burst' events. Rather, they would be ongoing.

In such a scenario interrupts would nest uncontrollably and the stack would quickly grow down to collide with static buffers and variables.

If I specified that my large (1536 bytes) global buffer should appear last in .bss, then the stack would grow down into it first. The advantage for me here is that this buffer is not used by the modules that could experience nested interrupts, so it wouldn't matter (the buffer is partitioned and initialized by the modules that use it). However, I would still see the stack grow down past the buffer into important variables in about 2 mS.
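For instance (a sketch, not my actual declaration): the default avr-libc linker scripts place the .noinit section after .bss and the startup code does not zero it, so declaring the buffer like this would put it where a runaway stack reaches it first, and leave its initialization to the modules that own it (as they already do):

#include <stdint.h>

uint8_t big_buffer[1536] __attribute__ ((section(".noinit")));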

Not counting any other stack needs, each invocation of an interrupt would (for my application) require pushing a minimum of 4 bytes (2 for PC, and 2 for prologue). Even if I had all of SRAM available to me, that's 512 nested interrupts before total failure. For the edge cases in question, I would need to tolerate ten times that before coming up for air. Interrupts could fire as often as every 4 uS. That's just a tiny bit more than 2 mS.

Quote:
In some cases I use the low registers to save working registers in an ISR. It saves time too. I write the start of the ISR in assembler and do the deeper processing in C.

I am already using virtually all of the low registers: GPIOR0/1/2, unused PORTs, PINs, and DDRs, unused timer registers, EEDR, EEARL, etc...

Quote:
Quote:
Quote:
I regularly use nested interrupts.
I have used them before as well, but not frequently. In any case, they are inappropriate for the application in question.
Ok. I hope that our discussion was usefull.

Indeed it has been. I will examine the possibility of employing limited pre-epilogue interrupt nesting in 2 ISRs. Thank you.

Cheers,
jj

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

Last Edited: Wed. Jul 25, 2012 - 04:49 PM
  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

501-q wrote:
Be careful with the 'naked' attribute. The compiler does not generate a prologue, but may still use the stack for intermediate variables anyway.

You are correct. Please see my original post at the very top of this long long page :)

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

SprinterSB wrote:
Using ISR_NAKED with ISR basically renders signal void ;-)

Nope. I am using ISR_NAKED for (some) ISRs, then __attribute__ ((__signal__)) on the handlers hooked by those ISRs. The ISR code is now mostly assembler, integrated with some C (where the compiled code has no or predictable register impact, such as SBI). The effect is that I tightly control what gets pushed/pulled by the ISR, and each handler becomes totally responsible for pushing/pulling its own weight.
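In outline, the arrangement looks something like this (a boiled-down sketch with placeholder names, not my real handlers):

#include <avr/io.h>
#include <avr/interrupt.h>

void handler_a() __attribute__ ((__signal__));
void handler_b() __attribute__ ((__signal__));

void (*timer1_compa_hook)() = handler_a;

// each handler saves only the registers it actually uses (plus SREG,
// r0, r1) and returns with RETI
void handler_a()
{
  timer1_compa_hook = handler_b;   // re-arm the chain for the next interrupt
}

void handler_b()
{
  TIMSK1 &= ~_BV(OCIE1A);          // terminate the chain
}

// hand-written ISR: push only what the ISR itself touches (Z), ICALL
// through the hook, then CLI so the handler's RETI can't let a pending
// interrupt break into the epilogue
ISR(TIMER1_COMPA_vect, ISR_NAKED)
{
  __asm__ __volatile__ ("push r30                       \n"
                        "push r31                       \n"
                        "lds  r30, timer1_compa_hook    \n"
                        "lds  r31, timer1_compa_hook+1  \n"
                        "icall                          \n"
                        "cli                            \n"
                        "pop  r31                       \n"
                        "pop  r30                       \n"
                        "reti                           \n");
}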

There is some overlap for some of the heavier handlers, i.e. they push registers already pushed by the assembler ISR, but I expect I will be able to keep most of that under control w.r.t. my edge case timing needs.

Quote:
ISR will just add some attributes like "used", but with "naked" no pro-/epilogue will be there so that you then have "no ISR pro-/epilogue" instead of "no ordinary pro-/epilogue".

I'm counting on it.

Quote:
You just pick an other flavor of nothing :lol:

Negative. I have coded this, tested it, and it works as expected.

This achieves what I was looking for in my original post: near-optimal (speed, not size) register push within ISRs and handlers, without risks arising from future compiler upgrades or other changes in the build environment.

Delicious.

Cheers,
jj

"Experience is what enables you to recognise a mistake the second time you make it."

"Good judgement comes from experience.  Experience comes from bad judgement."

"Wisdom is always wont to arrive late, and to be a little approximate on first possession."

"When you hear hoofbeats, think horses, not unicorns."

"Fast.  Cheap.  Good.  Pick two."

"We see a lot of arses on handlebars around here." - [J Ekdahl]

 

  • 1
  • 2
  • 3
  • 4
  • 5
Total votes: 0

Hi!

joeymorin wrote:
In such a scenario interrupts would nest uncontrollably and the stack would quickly grow down to collide with static buffers and variables.

Hmm... I may have misunderstood something...

Look at this. If at some moment there are 100 bytes on the stack after nested interrupts, then it took 200 cycles to put them on the stack some time ago. In this scenario interrupts could be missed even if the stack were empty.

Ilya


Hi!

501-q wrote:
If at some moment there are 100 bytes on the stack after nested interrupts, then it took 200 cycles to put them on the stack some time ago. In this scenario interrupts could be missed even if the stack were empty.

In more detail:

Say interrupts occur every 64 cycles. We want to process all of them.

If the stack grows uncontrollably because of nested interrupts when we enable interrupts in the epilogue, then there is already an error (I think so): if processing an interrupt takes less than 64 cycles, the stack will not grow. If processing an interrupt takes more than 64 cycles, then after 64 (or 32, 22, 16) interrupts we will miss one interrupt anyway.

If processing an interrupt takes less than 64 cycles (say 50), then we may enable interrupts in the epilogue. In this case we may catch many nested interrupts, but not an uncontrollable number (say, no more than 50 nested interrupts). If there are more than 50 nested interrupts, then we miss an interrupt anyway.

If I am mistaken, then show me.

Ilya


Hi again!

I control such cases by using ring buffers. Putting an event into the ring buffer (in a nested interrupt) costs very few cycles. In the first of the interrupts I process the ring buffer (until it becomes empty). I save cycles because the heavy context is set up only in the first of the interrupts. To prevent nested processing I use a bit flag in a register. So there are only 2 or 3 nested interrupts (3 if another interrupt interposes).

Sometimes processing the ring buffer lasts about 20 interrupts (I saw it on an oscilloscope). And these are regular interrupts, not sporadic. Even if the worst case takes 270 cycles (when interrupts occur every 256 cycles) it works fine.

Ilya
