RTOS - saving context

Without reopening the long and interesting discussion of whether RTOSes are a good idea, I have a question about context saving on an AVR.

The usual approach when saving context is to save all 32 registers plus the status register (SREG) plus a 2-byte return address = 35 bytes.

The problem with this approach is that a simple task might only be using a few registers but still the full 35 bytes are saved.

I have seen some discussion of the idea of scanning for non-zero registers and saving only the registers in use. This could be done by, say, saving registers in groups of 4 and saving one additional byte in which each bit set to 1 indicates that the corresponding group of 4 registers should be restored, e.g. b0 = r0-r3.
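As a rough illustration of the grouping scheme described above, here is a minimal host-side sketch in C. The names `pack_context` and `unpack_context` and the buffer layout (map byte first, then the saved groups) are assumptions made for this example; a real AVR implementation would do this in assembly on the task stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS   32
#define GROUP_SIZE 4
#define NUM_GROUPS (NUM_REGS / GROUP_SIZE)  /* 8 groups fit one 8-bit map */

/* Pack regs[] into out[]: one map byte, then every group of four registers
 * that contains at least one non-zero value. Returns the bytes written. */
static size_t pack_context(const uint8_t regs[NUM_REGS],
                           uint8_t out[1 + NUM_REGS])
{
    uint8_t map = 0;
    size_t n = 1;                           /* out[0] is reserved for the map */

    for (unsigned g = 0; g < NUM_GROUPS; g++) {
        const uint8_t *grp = &regs[g * GROUP_SIZE];
        if (grp[0] | grp[1] | grp[2] | grp[3]) {
            map |= (uint8_t)(1u << g);      /* bit g set = group g was saved */
            for (unsigned i = 0; i < GROUP_SIZE; i++)
                out[n++] = grp[i];
        }
    }
    out[0] = map;
    assert(n <= 1 + NUM_REGS);
    return n;
}

/* Inverse of pack_context(): groups whose bit is clear are restored as zero.
 * Returns the number of bytes consumed from in[]. */
static size_t unpack_context(const uint8_t *in, uint8_t regs[NUM_REGS])
{
    uint8_t map = in[0];
    size_t n = 1;

    for (unsigned g = 0; g < NUM_GROUPS; g++)
        for (unsigned i = 0; i < GROUP_SIZE; i++)
            regs[g * GROUP_SIZE + i] = (map & (1u << g)) ? in[n++] : 0;
    return n;
}
```

A register that is in use but happens to hold zero is still restored correctly, because skipped groups come back as zero. The worst case costs one byte more than saving all 32 registers.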

I was prompted to think of a much simpler idea when reading about register usage at http://www.nongnu.org/avr-libc/user-manual/FAQ.html#faq_reg_usage

I have not tested it yet, but it seems to me that if we are using GCC for AVR and the context saving is done in a subroutine, then we should only need to save the call-saved registers (r2-r17, r28-r29). The calling subroutine assumes that the call-used registers (r18-r27, r30-r31) are clobbered and saves them explicitly if they are in use. This would roughly halve stack usage for simple tasks and not increase it for complex tasks.
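To put numbers on the proposal, the following sketch just does the byte arithmetic. The register counts come from the two lists quoted above; the 1-byte SREG and 2-byte return address are assumptions consistent with the 35-byte figure, and `context_bytes` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

enum {
    AVR_ALL_REGS        = 32,      /* r0-r31 */
    AVR_CALL_SAVED_REGS = 16 + 2,  /* r2-r17 plus r28-r29 */
    AVR_SREG_BYTES      = 1,
    AVR_RET_ADDR_BYTES  = 2        /* assumption: 16-bit program counter */
};

/* Bytes stored per task when nregs registers are saved along with
 * SREG and the return address. */
static size_t context_bytes(size_t nregs)
{
    assert(nregs <= AVR_ALL_REGS);
    return nregs + AVR_SREG_BYTES + AVR_RET_ADDR_BYTES;
}
```

Under these assumptions `context_bytes(AVR_ALL_REGS)` is 35 and `context_bytes(AVR_CALL_SAVED_REGS)` is 21, so the saving is 14 bytes per task.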

Does this idea make sense?

Has anyone done some experimenting with this sort of idea?

regards
Greg

If you wanted to save RAM, then it might be a good idea, but the time spent determining what regs to save, saving the required ones and tagging which ones were saved seems more onerous than just stacking all the regs, in terms of both code size and execution time.
You also have the issue of an ISR causing a context switch to contend with. Since a timer interrupt, as opposed to a yield, will cause a task switch, your proposed method doesn't sound too enticing.
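The distinction between a yield and a timer-driven switch can be sketched as a small policy function. This is an illustration only: `switch_cause` and `regs_to_save` are hypothetical names, and the counts assume the avr-gcc register convention discussed in this thread.

```c
#include <assert.h>

enum switch_cause {
    SWITCH_YIELD,    /* task called the scheduler as an ordinary function */
    SWITCH_PREEMPT   /* timer ISR interrupted the task at an arbitrary point */
};

/* Number of general-purpose registers that must be preserved. */
static unsigned regs_to_save(enum switch_cause cause)
{
    assert(cause == SWITCH_YIELD || cause == SWITCH_PREEMPT);
    /* On a yield the compiler already treats the call-used registers as
     * clobbered, so only the 18 call-saved ones (r2-r17, r28-r29) matter.
     * An interrupt can arrive while any register is live, so all 32 are
     * needed. */
    return (cause == SWITCH_YIELD) ? 18u : 32u;
}
```

A kernel that supports both paths has to size each task's stack for the larger case.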

Kartman wrote:
If you wanted to save RAM, then it might be a good idea, but the time spent determining what regs to save, saving the required ones and tagging which ones were saved seems more onerous

Agreed. I actually wrote some code to do the checking... with the same conclusion as you.

That led to the idea of saving only the call-saved registers. I am wondering if I am missing something obvious.

regards
Greg

Surely you only need to save the registers if we're talking about pre-emption. And if we are talking about pre-emption then presumably the mechanism is that a timer interrupt says when to switch? If that is so then your assertion:

Quote:

and the context saving is done in a subroutine

won't be true, will it? You cannot know when the timer ISR will interrupt.

Also, surely the overhead of TCBs in a pre-emptive RTOS is not the 35 bytes for the register saving but the fact that each task has its own data stack, which needs to be large enough to accommodate that task's operation. That is a few hundred bytes per task, and that's why the AVR's RAM limits the number of tasks possible. So yeah, you might cut 35 bytes down to 20 and save a massive 15 or whatever - but does it really help?

IIRC, FemtoOS does indeed implement a register saving strategy, but I don't have details at hand. Have a look at femtoos.org, Features -> Register compression.

Einstein was right: "Two things are unlimited: the universe and the human stupidity. But i'm not quite sure about the former..."

Quote:
So yeah you might cut 35 bytes down to 20 and save a massive 15 or whatever - but does it really help?

Well, if you have 10 tasks that is 150 bytes!!

regards
Greg

Last Edited: Fri. Aug 23, 2013 - 09:24 AM

Quote:
IIRC, FemtoOS does indeed implement a register saving strategy
That's where I got the idea. If you use 4 registers to calculate the used registers and one byte as a "compression map", then with only 4 registers in use in your task a context would be 13 bytes, I think... but the compression takes a while to execute.

regards
Greg

Quote:
if you have 10 tasks that is 150 bytes!!

Cliff's argument still holds true, since per task you would need 35 vs. 20 bytes register context space, but hundreds of bytes per-task stack space. If you multiply by 10 tasks, it's still the same factor.
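The "same factor" point can be checked with a little arithmetic. This sketch is illustrative only: the function names are made up, and the per-task stack sizes used in the usage note are assumed figures, not measurements.

```c
#include <assert.h>

/* Total RAM for a set of identical tasks: context plus working stack each. */
static unsigned total_ram(unsigned tasks, unsigned ctx, unsigned stack)
{
    return tasks * (ctx + stack);
}

/* Saving from a smaller context, in parts per thousand of the original
 * total (integer division, so the result is rounded down). */
static unsigned saving_permille(unsigned tasks, unsigned ctx_full,
                                unsigned ctx_small, unsigned stack)
{
    assert(tasks > 0 && ctx_small <= ctx_full);
    unsigned before = total_ram(tasks, ctx_full, stack);
    unsigned after  = total_ram(tasks, ctx_small, stack);
    return (before - after) * 1000u / before;
}
```

With an assumed 200 bytes of working stack per task, going from 35 to 20 bytes of context saves about 6% whether there is 1 task or 10. The relative saving only becomes large when the working stack is small: with an assumed 40 bytes per task it is 20%.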

DO1THL wrote:
Quote:
if you have 10 tasks that is 150 bytes!!

Cliff's argument still holds true, since per task you would need 35 vs. 20 bytes register context space, but hundreds of bytes per-task stack space. If you multiply by 10 tasks, it's still the same factor.


Not necessarily.

Stack usage is made up of the initial task stack frame that is always there, plus the stack usage of functions, plus a context. In many applications I would guess that much of the stack usage is in the initial stack frame, and this data is present as globals or statics whether you use an RTOS or not.

An AVR is not really suited to applications that call many layers deep and allocate large stack frames at each level.

regards
Greg

The key thing in an RTOS in a limited RAM environment is for each task to be able to specify its needed stack size and for the authors of the tasks to make sensible choices. Just giving each task 200 bytes of stack is not a wise idea, but the task that requests 100 and uses 180 is also a problem! I'd have thought that was where the real saving was to be made: persuading the task authors to make sensible requests for stack space. I'm guessing too many say "oh, emm, 200 - 'just in case'" - then use 20.
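One common way to find out what a task really uses is to "paint" its stack with a known pattern at start-up and later count how much of the pattern survives. This is not something proposed in the thread itself. The sketch below simulates the idea on a plain byte array; `stack_paint`, `stack_high_water` and the 0xA5 fill value are assumptions chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5u  /* arbitrary paint pattern (assumption) */

/* Fill the whole stack area with the pattern before the task first runs. */
static void stack_paint(uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    for (size_t i = 0; i < size; i++)
        stack[i] = STACK_FILL;
}

/* AVR stacks grow downward, so untouched bytes remain at the low end of the
 * area. Returns the deepest usage seen so far, in bytes. */
static size_t stack_high_water(const uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    size_t untouched = 0;
    while (untouched < size && stack[untouched] == STACK_FILL)
        untouched++;
    return size - untouched;
}
```

The figure is a lower bound rather than a guarantee: a task that happens to write the fill value at its deepest point, or that reserves space without writing to it, will be under-counted.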

Quote:
An AVR is not really suited to applications that call many layers deep and allocate large stack frames at each level.

The same could be said for any processor with limited resources.