Preemptive vs. Cooperative Multitasking

#1
Cooperative vs. Preemptive Multitasking

 

I have been interested in multitasking, and working with it, since the old days of the Apple II.

 

Windows 3.1 used cooperative multitasking, and Win9x still did in parts.

From Windows NT on, fully preemptive multitasking was used.

 

Preemptive MT is a must for multiuser/multitasking operation and for "real time" operating systems, mainly to guarantee that a blocking task does not block the whole system.

 

But is this also true for microcontrollers?

Well, there are a lot of commercial and free preemptive RTOSes for microcontrollers available on the net. But when testing some of them I had problems getting real-world jobs done.

Here are some points:

 

1) Most tasks are short (toggle a pin) but should be called often

2) Some libraries do not tolerate being interrupted at arbitrary points in their code

3) Some cannot even run on an ATmega328 (32k flash, 2k RAM)

4) Most run with time slices of 1 ms, which is far too slow for my work

5) They are often not easy to use: you have to estimate a stack size in advance and do a complicated initialisation

 

There are some schedulers (cooperative MT), but most of them are not suitable for a real-world application.

 

I made a cooperative MTOS which is smaller and faster than most of the other solutions.

But maybe I am wrong. Maybe there is someone in this small world who can prove by example that you can do it better, smaller and faster with preemptive MT.

I don't want a debate, I want to see examples. No jihad ;)

 

For this I made a very simple demo (Arduino) which shows what I mean. It contains the stripped-down principles of my OS.

 

RunDemo1 does the following (taken from the appended source):

Theme: Shows an easy to use scheduler for multitasking

Tasks:

1) Toggle LED of Arduino every 20 ms

2) Toggle PIN2 every 1 ms

3) Write "Hello" + flag state 100 times per second

4) Receive characters from serial line (115200 baud)

    No interrupt, just polling, to show how fast it can be

    [ yes, performance with enabled interrupts will be better ;) ]

    Do not lose characters

    Filter command characters '0' and '1' and store them in the command queue (max 20 commands per line)

5) Execute commands: set PIN3: '0'=LOW, '1'=HIGH

 

Sketch uses 2,076 bytes (6%) of program storage space. Maximum is 32,256 bytes.

Global variables use 176 bytes (8%) of dynamic memory, leaving 1,872 bytes for local variables. Maximum is 2,048 bytes.

 

Here are 2 images to show how it works:

 

RunDemo1A

 

 

RunDemo1B

 

Here is the full code (no libraries are used):

 

/*
    RunDemo1  (here: for AtMega328)
    
    (C) 2015 Helmut Weber
    
    It's pure C
    With a different serial I/O it will even run on an Apple II, a PC
    or any microcontroller
    
    Theme: Shows an easy to use scheduler for multitasking
    
    Tasks:
    1) Toggle LED of Arduino every 20 ms
    2) Toggle PIN2 every 1 ms
    3) Write "Hello" + flag state 100 times per second
    4) Receive characters from serial line (115200 baud)
       No interrupt, just polling
       [  yes, performance with enabled interrupts will be better ;) ]
       Do not lose characters
       filter command characters '0' and '1' and store them into command queue
       (max 20 commands per line)
    5) Execute commands: set PIN3:  '0'=LOW, '1'=HIGH   

Sketch uses 2,076 bytes (6%) of program storage space. Maximum is 32,256 bytes.
Global variables use 176 bytes (8%) of dynamic memory, leaving 1,872 bytes for local variables. Maximum is 2,048 bytes.

*/

 

// UBRR0 values for each baud rate (assuming a 16 MHz clock and U2X0 = 1)
#define BAUD115200 (16)
#define BAUD76800 (25)
#define BAUD57600 (34)
#define BAUD38400 (51)
#define BAUD9600 (207)

 

// ===================== Start of operating system ======================
// just 3 functions !

#define             NUMRUNS            10
#define             MAXPRIORITY        0x0f
#define             MAXLEVEL           5
// Definition for states

// states
#define                STOPPED          0x80
#define                WAITING          0x40
#define                RUNNING          0x20
#define                WAITEVENT        0x10

#define                PRI_KERNEL       0x0
#define                PRI_DISP         0x1
#define                PRI_SYSTEM       0x2
#define                PRI_USER0        0x4
#define                PRI_USER1        0x5
#define                PRI_USER2        0x6
#define                PRI_USER        0x7

unsigned char          numruns;
void                   (*runfunction[NUMRUNS])(void);
unsigned long          runinterval[NUMRUNS];
unsigned long          lastrun[NUMRUNS];
unsigned char          priorities[NUMRUNS];


// Init a task
// can be called from setup() or from inside other functions
// returns the slot index of the new task, or NUMRUNS if the table is full
unsigned int run(void (*userfunction)(void), unsigned long interval, unsigned char priority) {
  unsigned char i, j;

  i = numruns;

  // reuse the slot of a STOPPED task if there is one
  for (j = 0; j < numruns; j++) {
    if (priorities[j] & STOPPED)  {
      i = j;
      break;
    }
  }

  if (i >= NUMRUNS) return NUMRUNS;   // no free slot left

  runfunction[i] = userfunction;
  runinterval[i] = interval;
  lastrun[i] = micros();
  priorities[i] = priority;
  if (i >= numruns) numruns = i + 1;
  return i;
}

char    level, irlevel, thisJob;

unsigned long m;
unsigned long m2;
unsigned int curtask;

 

// call this from loop: runner(MAXPRIORITY);
// does multitasking

void runner(unsigned char maxPriority) {
register unsigned char *p;

  if (level >= MAXLEVEL) return;

  level++;

  for (curtask = 0; curtask < numruns; curtask++) {
    //MARKER8;
    p = &priorities[curtask];
    if ((*p & 0xf0) == 0) { // excluding STOPPED and WAITING, RUNNING jobs
      if (*p <= maxPriority) {
        if ( (micros() - lastrun[curtask]) >= runinterval[curtask]) {
          lastrun[curtask] = micros();
          *p |= RUNNING;
          thisJob = curtask;
          (*runfunction[curtask])();
          *p &= ~RUNNING;
          //if (*p==PRI_KERNEL) { level--; return; }// prefer KERNEL tasks
        }
      }
    }
  }
  level--;
}

 

// call this from inside of long working functions
void internrunner(unsigned long dely, unsigned char maxPriority) {
  volatile unsigned long m;
  byte mycurtask;
  //MARKER8;
  if (dely > 65) dely -= 65; // CORRECT time for dely >= 100

  irlevel++;

  mycurtask = thisJob;

  m = micros();
  do {
    runner(maxPriority);
  } while ((micros() - m) < dely);
  irlevel--;
}

// ===================== End of operating system ======================

 

// Blink LED 20 ms
void Blink() {
  digitalWrite(13, !digitalRead(13));
}

//Toggle Pin2 1 ms
void Toggle() {
  digitalWrite(2, !digitalRead(2));
}

 

// Write a char to Serial line
void SerialWriteChar( char c) {
  // wait for transmitter empty
  while ( !( UCSR0A & (1<<UDRE0)) )     internrunner(40, PRI_USER);
  UDR0=c;
}

#define CMDBUFLEN 20

char Cmd[CMDBUFLEN], cmdHead, cmdTail;

void DoCmd() {
  if(cmdTail != cmdHead) {
    switch(Cmd[cmdTail]) {
      case '0': digitalWrite(3,LOW); break;
      case '1': digitalWrite(3,HIGH); break;
    }
    cmdTail++; if (cmdTail==CMDBUFLEN) cmdTail=0;
  }
}

// store only the command characters '0' and '1' in the queue
inline void TestCommand(char c) {
  if ((c == '0') || (c == '1')) {
    Cmd[cmdHead++]=c; if(cmdHead==CMDBUFLEN) cmdHead=0;
  }
}

void SerialWriteString (const char *st) {
  while (*st) { SerialWriteChar(*st++); internrunner(100,MAXPRIORITY); }
}

void SerialIO () {
char c;
  while (UCSR0A & (1<<RXC0)) {
    c=UDR0;
    TestCommand(c);
    SerialWriteChar(c);
  }
  internrunner(40,PRI_KERNEL);
}

bool flipFlop;

void DoHello() {
  flipFlop = !flipFlop;
  SerialWriteString("Hello ");
  if (flipFlop) SerialWriteString("true\n");
  else          SerialWriteString("false\n");
}

void setup() {
  pinMode(13, OUTPUT);
  pinMode(2, OUTPUT);
  pinMode(3, OUTPUT);

  // Init Serial
  cli();
  UCSR0A |= (1 << U2X0);     // double read / write speed
  UCSR0B |= (1 << RXEN0) |   // enable receiver
            (1 << TXEN0);    // enable transmitter
  UBRR0 = BAUD115200;
  sei(); // activate interrupts
  // Serial init done
 
  SerialWriteString("Start\n");
 
  // This is all you need to do to convert functions to tasks
  // Init Tasks
  run(Blink,   20000, PRI_USER);    // Toggle LED every 20 ms
  run(Toggle,   1000, PRI_KERNEL);  // Toggle PIN2 every 1 ms
  run(SerialIO,  100, PRI_KERNEL);  // Test RX, filter and save commands
  run(DoCmd,     100, PRI_USER);    // execute commands from buffer
  run(DoHello, 10000, PRI_USER);    // write 100 x Hello / s
}

// do multitasking
void loop() {
  runner(MAXPRIORITY);
}

 

 

Last Edited: Wed. Feb 4, 2015 - 09:52 AM
#2

Hmm, well, the small RAM size, and thus the small stack space, on AVR8 processors does not leave room for more than a few tasks, which would explain why you only see PE MT kernels for the larger ATmegas. As for "real world", I have used CO MT in many of my projects, as the stack space needed is much less and you have fewer problems with libraries that are not re-entrant. It's a matter of programming skill, I guess. I'm not sure why you need a tick of 1 ms or less; as a general rule, the faster the tick rate, the more overhead is incurred with task scheduling. I don't ever recall needing a tick faster than 10 ms in any of my real world projects. Please explain your need for speed, so I may understand your app better.

As with all things programming, use the tools you need for your app, so I'm not being critical of your efforts; they seem to work for you. As for PE or CO MT, use what is best for the app at hand. One is not "better" than the other; each is a tool to be used.

 

Jim

 

#3

Oh, you may want to see what others have done in the arduino for mt, check out ArdOS here: https://bitbucket.org/ctank/ardo...

You may find it interesting, as it can be configured as either PE or CO.

 

Jim

 

#4

I don't ever recall needing a tick faster than 10 ms in any of my real world projects.

Just to note that the "tick" in Windows and Linux used to be 10 ms (100 Hz), but as CPU speeds increased and the relative "cost" of task switching fell, they upped it (well, "jiffies" in Linux) to 1,000 Hz, i.e. 1 ms.

Please explain your need for speed, so I may understand your app better.

I could be reading it wrong but I'm guessing that if, for example, the OP wanted to generate a 38kHz IR signal he'd presumably want "ticks" at 38kHz at least so he could toggle the signal (actually presumably 76kHz?). This may not necessarily be the best way to generate such a signal though. Far better to have the hardware generate the 38kHz and then just use a relatively "slow" task to modulate it perhaps?
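To put a number on the hardware approach, here is a minimal sketch. It assumes a 16 MHz clock (as on an Arduino Uno) and a timer running in CTC mode with its output pin toggling on every compare match. The helper names are made up for illustration; the formula is the one the AVR datasheets give for CTC mode.

```c
#include <stdint.h>

/* Compare value for a timer in CTC mode that toggles its output pin on
   every compare match: f_out = f_clk / (2 * prescaler * (1 + ocr)).
   Assumes f_out is well below f_clk / (2 * prescaler). */
uint32_t ctc_compare_value(uint32_t f_clk, uint32_t prescaler, uint32_t f_out) {
    uint32_t divisor = 2u * prescaler * f_out;
    return (f_clk + divisor / 2u) / divisor - 1u;   /* divider rounded to nearest */
}

/* Frequency actually produced by a given compare value, in Hz (truncated). */
uint32_t ctc_output_hz(uint32_t f_clk, uint32_t prescaler, uint32_t ocr) {
    return f_clk / (2u * prescaler * (1u + ocr));
}
```

With a 16 MHz clock, prescaler 1 and a 38 kHz target this gives a compare value of 210, i.e. roughly 37.9 kHz, and the CPU spends no time on the toggling itself. A slow task then only has to switch the timer output on and off.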

 

I think when you are doing cooperative and perhaps even with preemptive there are going to be some things in the system that need to happen more often than the "OS tick" and for those I guess you use the traditional tools of the MCU like a timer ISR "owned" by one of the tasks or something?  (again Linux/Windows are examples of this - there are interrupting events happening in the kernel at far faster rates than the "apps" are being ticked).

#5

Your use of internrunner() may cause one task to block another for a longer time than desired. Here's some pseudocode to demonstrate the problem:

taskA:
    call internrunner until we can write to the UART
    write a byte to the UART
    return

taskB:
    call internrunner until we can read from the UART
    read a byte from the UART
    return

Now if we run taskA and taskB, if taskA runs first and it must call internrunner, then taskB will run. taskB will call internrunner until it can read from the UART (which may be a long time if nothing is sending data to the UART). Meanwhile, your scheduler sees taskA as "running", so it cannot run taskA again until taskB returns. Therefore, a task that reads data from the serial port can indefinitely block another task that writes data to the serial port. Needless to say, this is bad.

Multitasking can be tricky and difficult to get right. When I have to work within a small system, I like to create tasks that complete right away rather than block. Usually each task has to store its state somewhere besides the function's stack frame so it can resume where it left off the next time it runs. In the case of a task that reads/writes data to the serial port, if the serial port is not ready, the function must return right away to let other tasks run.
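A minimal sketch of such a run-to-completion task follows. It is not code from the demo above: the UART is replaced by hypothetical stand-ins (`uart_tx_ready`, `uart_put`) so that it also compiles on a PC, and all names are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the UART so the sketch also compiles on a PC. */
bool uart_tx_ready = false;     /* plays the role of the UDRE0 flag */
char uart_out[32];              /* everything "transmitted" so far */
size_t uart_out_len = 0;

void uart_put(char c) {
    if (uart_out_len < sizeof uart_out) uart_out[uart_out_len++] = c;
    uart_tx_ready = false;      /* transmitter is busy again */
}

/* The task keeps its progress in static storage, not on the stack. */
const char *hello_msg = "Hello\n";
size_t hello_pos = 0;

/* Sends at most one character per call and never waits.
   Returns true once the whole message has been handed to the UART. */
bool task_send_hello(void) {
    if (hello_msg[hello_pos] == '\0') return true;   /* already finished */
    if (!uart_tx_ready) return false;                /* not ready: return at once */
    uart_put(hello_msg[hello_pos++]);
    return hello_msg[hello_pos] == '\0';
}
```

The scheduler simply calls `task_send_hello()` on every pass. Because the function returns immediately when the transmitter is busy, no other task is ever held up by it, and no nested scheduler call is needed.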

Last Edited: Wed. Feb 4, 2015 - 03:10 PM
#6

There is definitely a place for multitasking, but it's not free or always better.

Most embedded devices work really well with a well written state machine. In fact using an OS does not eliminate the need for a state machine, it just makes it more complicated.

An RTOS becomes more useful when the project goes from small to medium, or is it medium to large? :)

Nah, large IMHO is millions of lines of code, so small to less small...

Inter-task communication is nothing to take lightly.

Keith Vasilakes

Firmware engineer

Minnesota

#7

ki0bk wrote:
Oh, you may want to see what others have done in the arduino for mt, ...
Another one :

Arduino Playground - QP framework

http://playground.arduino.cc/Code/QP

...

And you get a powerful multitasking support, without worrying about semaphores and other such low-level mechanisms typically found in RTOSes. Instead, you can work at a higher level of abstraction of events, state machines, and active objects.

"Dare to be naïve." - Buckminster Fuller

#8

I once set up an "always ready" interface on the original Macintoshes, the idea being to eliminate any watch cursor for the GUI interface. Long tasks could call CheckEvent a few times per second, which could start another task on top of the existing stack, and return only when that task completed. So you could start several lengthy image processing tasks, then continue to do image scaling and zooming in foreground while the background tasks would unwind the stack. It required software locks on data arrays and other sharable constructs and restoration of a few bytes of system globals on each process reentry (e.g. mouse down location, active window pointer), but with those done properly the multitasking was quite reliable. 

 

The 68000 context switch was pretty fast and it turned out to be useful for real-time data acquisition. Rather than an interrupt to transfer incoming data to a buffer and set a flag for a foreground process to handle it, calling the foreground process for polling a few times a second gave more consistent update speeds and even slightly more throughput.  Something to be considered for applications where foreground tasks need to be done as fast as possible but background tasks have no time constraints (and you can live with the weird last-in first-out process completions).

 

 

#9

Thanks for commenting on my post.

I'll try to reply.

 

#2 + #3

Low stack space is indeed one point. CO needs less stack space, which matters most for ATtinys. I sometimes use an ATtiny85.

I tried ArdOS, ChibiOS, NilOS and some others. For me they were not usable in PE mode, but yes, they do well in CO mode. But you have to define the stack space in advance.

So why not use my own, which uses less stack?

Timers do not grow on trees, so it is not possible to give every task its own timer, although that would of course work well.

 

#4

I always have projects where even 1 ms is too long. And: PE multitasking with 1 ms ticks does NOT mean that every task gets a time slice every ms! It just means that every ms the OS has to decide which task should run next.

I will give an example of a working project at the bottom of my post.

 

#5

TaskA waits for the transmit buffer to be free. At 115200 baud that is, worst case, one character time (10 bits, about 87 us) later, so it never blocks longer than that.

TaskB is a problem in all programs: never wait in a loop for an event (here: the receiver gets a byte), because that burns CPU time under ANY OS and in programs without an OS!

The above program works because it only tests whether there is a byte in the receiver buffer. I think that, when not using interrupts, this is the way to go.

A buffer overrun could occur if the time between tests is too long, but I never got one.

In fact I sometimes think that the 328 has a few bytes of buffer in the receiver, but I did not find anything in the docs.

The demo program tests every 50 us on average, up to maybe 120 us. That should be too slow, but it works, as you can see in the LA images.

You can test the demo yourself.
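Some rough numbers on this, as a sketch rather than a measurement. It assumes 8N1 framing (10 bits on the wire per character) and, as far as I recall from the ATmega328 datasheet, a two-level FIFO in the USART receiver in front of the shift register. The function names are hypothetical.

```c
#include <stdint.h>

/* Time to transfer one character, in tenths of a microsecond.
   bits_per_frame is 10 for 8N1: start bit + 8 data bits + stop bit. */
uint32_t frame_time_tenth_us(uint32_t baud, uint32_t bits_per_frame) {
    return (bits_per_frame * 10000000u + baud / 2u) / baud;
}

/* Rough upper bound for the gap between two polls, assuming the receiver
   can hold `buffered` complete characters and each poll drains them all. */
uint32_t max_poll_gap_tenth_us(uint32_t baud, uint32_t bits_per_frame,
                               uint32_t buffered) {
    return buffered * frame_time_tenth_us(baud, bits_per_frame);
}
```

At 115200 baud one character takes about 86.8 us. With two buffered characters, a polling gap of up to roughly 174 us is survivable as long as each poll empties the buffer, which would explain why polling every 50 to 120 us does not lose characters.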

 

In the real world I do use the receiver (and transmitter) with interrupts, buffering incoming and outgoing characters and setting a signal for the tasks which are waiting for it.

(WAITEVENT is declared in my stripped down demo, but not used)

 

#6

"In fact using an OS does not eliminate the need for a state machine ..."

In fact using a state machine does not eliminate the need for an MTOS ... is true as well :)

 

#7 I have not worked with the Quantum Leaps state machine until now. But there are different versions for Arduino and ARM. What about ATtinys?

But thanks for the link, I will test it sometime.

 

#8

ChibiOS was invented with the 68000 in mind. But task switches need time and space even if there is nothing to do.

Well, in Visual Basic you had DoEvents(), which was nothing other than a cooperative switch to the OS outside the normal task switches.

In fact every normal MT OS has a way to return to the scheduler without a timer tick.

But less space, the possibility to influence the switching, and the use of libraries not made for preemption are points for CO.

Another point is: CO implemented in pure C works on ALL processors - even without a timer.

 

Here is a working project:

A Morse encoder / decoder transmitting at a radio frequency of 1 MHz

All you need is an ATmega328 (Arduino) and a 10 cm wire

 

1. Part

The Morse encoder encodes text to Morse code. The Morse code (DITs, DAHs) is transmitted as an 800 Hz tone. The tone is made by switching 1 MHz pulse packets on and off at 800 Hz.

When I put a 10 cm wire on pin 2 of my Arduino I can hear the Morse tones on my radio (max distance = 2 m) tuned to 1.00 MHz.

 

2. Part of the program, running in the background:

Decoding Morse code coming in as TTL HIGH/LOW signals on pin 3 at any speed, possibly dramatically different from the speed of Part 1.

(With a microphone and an amplifier to convert the tone to TTL it is possible to decode Morse signals travelling around the world.)

 

3. Part

All sent and received signals are written to the serial line and, with DEBUG defined, much more information along with them.

 

4. Part

Receiving commands over the serial line to change the text to send, the tone frequency of 800 Hz, or the Morse speed, without stopping the sending and receiving parts.

 

All of that together works up to 120 words per minute (WpM), which is much faster than even professionals can copy in real time.

I am sure that with some more work it could be done with an ATtiny85.

 

I could post the source if somebody is interested.

 

What I mean is: show me a project (on a 328 or smaller) that you can do with PE and I cannot do with CO.

I think I can show you some projects which could not be done with PE and 1 ms ticks.

That's the challenge. The prize is one beer!

But: source code, demonstrations or explanations, not ideas or beliefs.

 

If you have such a project let me know.

I am really interested in such a project to see the limits of my OS.

 

 

 

Last Edited: Wed. Feb 4, 2015 - 09:20 PM
#10

heweb wrote:
#7 I have not worked with the Quantum Leaps state machine until now. But there are different versions for Arduino and ARM. What about ATtinys?
tinyAVR has one mention in the QP-nano manual.

Moving from an Arduino trademark board to a tinyAVR will reduce the license scope; that may or may not be acceptable.

QP has a specific increased license scope for the Arduino trademark, ARM mbed trademark, and the Raspberry Pi trademark.

 

Ref.

QP frameworks for AVR

http://www.state-machine.com/avr/index.php#megaAVR

http://www.state-machine.com/avr/QDKn_AVR-GNU.pdf (1.2MB, page 4)

...

However, the described port should be applicable to most AVRmega devices (such as ATtiny and ATmega families) big enough to accommodate QP-nano.

...

The QP-nano framework can manage up to 8 concurrently executing hierarchical state machines and requires only 1-2KB of code (ROM) and just several bytes of RAM.

Open Source Licensing

http://www.state-machine.com/licensing/open.php

"Dare to be naïve." - Buckminster Fuller

#11

heweb wrote:

Another point is: CO implemented in pure C works on ALL processors - even without a timer.

 

Just to be clear...your code in the first post doesn't work on all processors and does use a timer. The Arduino eco-system uses timer0 to deal with the 'micros()' library call.

'This forum helps those who help themselves.'

 

pragmatic  adjective dealing with things sensibly and realistically in a way that is based on practical rather than theoretical consideration.

#12

You could probably get the code smaller, faster and more reliable by not using function pointers.

Try getting FatFs to work in the cooperative context while maintaining 'real time'. That's a nasty example!

Sometimes I use both techniques - I wrote a cooperative based comms engine for a web back end that runs under a preemptive kernel. So both techniques have their uses. In a given application, one may be better than the other.

#13

==> #11

Well, I used micros(), but I had no problems porting the code to Linux and Windows. Both have a way to read microseconds.

But even if you do not have a timer, it will work with an incrementing counter in the scheduler. The times are then relative, but it works.

Which processor with ANSI C cannot run that program?

Which PE without any timer does it better?

PE without a timer?
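A minimal sketch of the counter idea, assuming nothing but plain C. This is not the RUN.H code: `add_task`, `scheduler_pass` and the two example tasks are invented names, and the intervals are counted in scheduler passes instead of microseconds.

```c
#include <stdint.h>

#define MAX_TASKS 4

typedef void (*task_fn)(void);

static task_fn  task_func[MAX_TASKS];
static uint32_t task_interval[MAX_TASKS];   /* in scheduler passes, not in us */
static uint32_t task_last[MAX_TASKS];
static uint8_t  task_count = 0;
static uint32_t pass_counter = 0;           /* takes the place of micros() */

/* Register a task; returns its index or -1 if the table is full. */
int add_task(task_fn fn, uint32_t interval_passes) {
    if (task_count >= MAX_TASKS) return -1;
    task_func[task_count] = fn;
    task_interval[task_count] = interval_passes;
    task_last[task_count] = pass_counter;
    return task_count++;
}

/* One pass over all tasks; call this in an endless loop. */
void scheduler_pass(void) {
    pass_counter++;
    for (uint8_t i = 0; i < task_count; i++) {
        /* unsigned subtraction stays correct when the counter wraps */
        if ((uint32_t)(pass_counter - task_last[i]) >= task_interval[i]) {
            task_last[i] = pass_counter;
            task_func[i]();
        }
    }
}

/* Two example tasks that only count how often they ran. */
unsigned fast_runs = 0, slow_runs = 0;
void fast_task(void) { fast_runs++; }
void slow_task(void) { slow_runs++; }
```

Registering `fast_task` with interval 1 and `slow_task` with interval 10 makes the first run on every pass and the second on every tenth. How long a pass takes in real time depends on what the tasks do, which is exactly the "times are relative" caveat.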

 

==> #12

Why do you think function pointers are unreliable, but pointers into the stack to switch the context (as PEs use) are reliable?
I am with you: "Sometimes I use both techniques - I wrote a cooperative based comms engine for a web back end that runs under a preemptive kernel. So both techniques have their uses. In a given application, one may be better than the other."

There is no doubt, I would prefer a PE, but on AVR it would have to be faster.

I can go down to a 50 us cycle time with a jitter of +/- 20 % at most.

 

digitalWrite() on Arduino needs about 5 us.

Shifting 8 bits into a 74164 by bit-banging needs 3-4 us for all 8 bits!

I prefer to use digitalWrite() because it is so clear, but when time is the critical factor it is unusable!

It's the same for me with the PEs I've tested on AVRs.

The 328 is able to do serial transfers at 1000000 baud.

I could not use an OS ticking in ms. That's the background.

 

==> #10

The Quantum Leaps framework for state machines seems to be a mighty tool! I will try it.

But:

They have an example for their framework: PEdestrian LIght CONtrolled (PELICAN) Crossing

                  http://www.state-machine.com/res...

 

"This Application Note describes the PEdestrian LIght CONtrolled (PELICAN) crossing as an example application for the QP state machine framework. The PELICAN crossing example demonstrates a non-trivial hierarchical state machine."

 

But, as so often, simple examples do not show the real potential of a tool.

It is of course an (THE) example for the use of state machines, but it is also possible to do it with a scheduler that includes an event dispatcher.

Then the state machine is built into the event timings.
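To illustrate what "built into the event timings" can look like, here is a minimal sketch of an event dispatcher. It is not the dispatcher from RUN.H (that source is not shown here); every name in it is hypothetical. Each handler is bound to one event bit and a delay, and a handler moves things along by clearing its own event and setting the next one.

```c
#include <stdint.h>

/* Hypothetical event bits, in the spirit of a traffic light */
#define EV_YELLOW 0
#define EV_RED    1

#define MAX_HANDLERS 4

typedef void (*handler_fn)(void);

uint8_t events = 0;                              /* one bit per pending event */

static handler_fn handler_func[MAX_HANDLERS];
static uint8_t    handler_event[MAX_HANDLERS];
static uint32_t   handler_delay[MAX_HANDLERS];   /* ticks the event must be pending */
static uint32_t   handler_since[MAX_HANDLERS];   /* tick at which it was first seen */
static uint8_t    handler_seen[MAX_HANDLERS];
static uint8_t    handler_count = 0;

void set_event(uint8_t bit)   { events |= (uint8_t)(1u << bit); }
void clear_event(uint8_t bit) { events &= (uint8_t)~(1u << bit); }

/* Bind a handler to an event bit; returns its index or -1 if the table is full. */
int on_event(uint8_t bit, uint32_t delay_ticks, handler_fn fn) {
    if (handler_count >= MAX_HANDLERS) return -1;
    handler_func[handler_count]  = fn;
    handler_event[handler_count] = bit;
    handler_delay[handler_count] = delay_ticks;
    handler_seen[handler_count]  = 0;
    return handler_count++;
}

/* Call once per tick: runs each handler whose event has been pending
   for at least its delay. */
void dispatch(uint32_t now) {
    for (uint8_t i = 0; i < handler_count; i++) {
        if (!(events & (1u << handler_event[i]))) { handler_seen[i] = 0; continue; }
        if (!handler_seen[i]) { handler_seen[i] = 1; handler_since[i] = now; }
        if ((uint32_t)(now - handler_since[i]) >= handler_delay[i]) {
            handler_seen[i] = 0;
            handler_func[i]();
        }
    }
}

/* Example: yellow comes on at once, red follows after the configured delay. */
uint8_t light = 0;                               /* 1 = yellow, 2 = red */
void yellow_handler(void) { light = 1; clear_event(EV_YELLOW); set_event(EV_RED); }
void red_handler(void)    { light = 2; clear_event(EV_RED); }
```

With `on_event(EV_YELLOW, 0, yellow_handler)` and `on_event(EV_RED, 3, red_handler)`, setting `EV_YELLOW` switches to yellow on the next dispatch and to red three ticks later. The sequence of states is encoded in which event each handler sets next and in the delays, with no explicit state variable.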

 

Is there anybody who can show me a better solution (clearer, faster, smaller)? Do you have some code for me, using any tool you like?

 

I added another rule to make it a bit more complicated:

When the operator switches OFF, the running cycle MUST complete, and only then is the light switched OFF. I think that's more like real life!

As you can see, it was easy to implement the rules without any state machine!

That is my answer to the question of whether a state machine with an RTOS is always the better solution:

Compare with QP's code!

                           NORMAL OPERATION
RED YEL GRN     STP GO SEC
-   -   X       X   -  0
PED SWITCH PRESSED     9
Ped Cycle activated    9
RED YEL GRN     STP GO SEC
-   X   -       X   -  9
X   -   -       X   -  12
X   -   -       -   X  13
X   -   -       X   -  17
X   -   -       -   -  18
X   -   -       X   -  18
X   -   -       -   -  19
X   -   -       X   -  19
X   -   -       -   -  20
X   -   -       X   -  20
X   -   -       -   -  21
X   -   -       X   -  21
X   -   -       -   -  22
X   -   -       X   -  22
X   -   -       -   -  23
X   -   -       X   -  23
X   -   -       -   -  24
X   -   -       X   -  24
X   -   -       -   -  25
X   -   -       X   -  25
X   -   -       -   -  26
X   -   -       X   -  26
X   -   -       -   -  27
                            NORMAL OPERATION
RED YEL GRN     STP GO SEC
-   -   X       X   -  27

Switching off the traffic light inside a crossing cycle:

                            NORMAL OPERATION
RED YEL GRN     STP GO SEC
-   -   X       X   -  0
PED SWITCH PRESSED     7
Ped Cycle activated    7
RED YEL GRN     STP GO SEC
-   X   -       X   -  7
>>>>>>>>>>>>>>>>>>>>>>>>>OFFLINE: Finishing PED Cycle
X   -   -       X   -  10
X   -   -       -   X  11
X   -   -       X   -  15
X   -   -       -   -  16
X   -   -       X   -  16
X   -   -       -   -  17
X   -   -       X   -  17
X   -   -       -   -  18
X   -   -       X   -  18
X   -   -       -   -  19
X   -   -       X   -  19
X   -   -       -   -  20
X   -   -       X   -  20
X   -   -       -   -  21
X   -   -       X   -  21
X   -   -       -   -  22
X   -   -       X   -  22
X   -   -       -   -  23
X   -   -       X   -  23
X   -   -       -   -  24
X   -   -       X   -  24
X   -   -       -   -  25
                            NORMAL OPERATION
RED YEL GRN     STP GO SEC
<<<<<<<<<<<<<<<<<<<<<<<<<PED Cycle finished  GO OFFLINE
                            OFFLINE
-   -   -       X   -  26
                            OFFLINE
X   -   -       -   -  26
-   -   -       X   -  27
                            OFFLINE 

 

Here is the code. RUN.H is the full OS here. It includes semaphores for resources (not used here) and an event dispatcher (the last parameter of run()).

I think it is easy to understand and it was easy to write (start reading at setup(), as I did when writing it):
 

// PELICAN
// (C) 2015 Helmut Weber
//
//
// The rules are from:
// http://www.state-machine.com/resources/AN_PELICAN.pdf page2
//
// I added following rule:
// switch OFFLINE is not allowed during PEDs crossing the street
// remember switch OFFLINE and execute when applicable


#include "runlocal.h"

// Show Event-Bits
//#define DEBUG

// Flags for EVENTS  
#define  PEDS_WAITING    0
#define  YELLOW          1
#define  RED             2
#define  WALK            3
#define  WALK_FL         4
#define  OFFLINE         5
#define  TRYONLINE       6

// BITS of TRAFFIC LIGHT
#define  LGO             0
#define  LSTOP           1
#define  LGREEN          2
#define  LYELLOW         3
#define  LRED            4

// Bits of light represent the different lights
unsigned char Light;

char tsk0, tsk1, tsk2, tsk3, tsk4, tsk5, tsk6, tsk7, tsk8, tsk9, tsk10, tsk11, tsk12;

// to remember the time of last state change
unsigned long lastCycle=millis();

// the traffic light is working normal
bool TrafficLightOnline=true;

// defines to simplify code
#define  SET_EV(X)  BIT_SET(&Event,X)
#define  CLR_EV(X)  BIT_CLEAR(&Event,X)
#define  SET_LI(X)  BIT_SET(& Light,X)
#define  CLR_LI(X)  BIT_CLEAR(& Light,X)

// Here are the functions called by events:

//----------------------------------------------------------------------------
// PED pressed button
void Ped_Switch() {
  Serial.print("Ped Cycle activated    ");   Serial.println(millis()/1000);

  if ((millis()-lastCycle)<7000) {
    Serial.println("PEDs have to wait some seconds");
    while ((millis()-lastCycle)<7000) internrunner(1000,PRI_USER);
  }  
  SET_EV( YELLOW);
  SET_LI(LYELLOW);
  CLR_EV(PEDS_WAITING);
}

//----------------------------------------------------------------------------
// switch light to RED
void Red() {
  CLR_EV(RED);          // Event is used only once
  CLR_LI(LYELLOW);      // Clear YELLOW light
  SET_LI(LRED);           // set   RED    light
}

//----------------------------------------------------------------------------
// switch light to YELLOW for 3 seconds
// then to RED and wait 1 second
// then set WALK
void Yellow() {
  Serial.println("RED YEL GRN     STP GO SEC");
  CLR_LI(LGREEN);      // Clear GREEN light
  SET_LI(LYELLOW);       // Set YELLOW light
  SET_EV( RED);          // set Event RED
  SET_EV( WALK);
  CLR_EV( YELLOW);
}

//----------------------------------------------------------------------------
// Signal WALK for 4 seconds
// then set DONT WALK FLASH
void Walk() {
  SET_LI(LGO);          // set WALK light
  CLR_LI(LSTOP);      // clear STOP light
  SET_EV( WALK_FL);     // set Event WALK_FL
  CLR_EV( WALK);      // clear Event WALK
}

//----------------------------------------------------------------------------
// flash DONT WALK for 10 seconds
// then normal state:  DONT WALK, GREEN
void Walk_Fl() {
  CLR_LI(LGO);        // clear GO light
  for(int i=0; i<10; i++) {   // flash DONT WALK light
    SET_LI(LSTOP);    // for 10 seconds
    internrunner(500000,PRI_USER);
    CLR_LI(LSTOP);
    internrunner(500000,PRI_USER);
  }
  lastCycle=millis();         // remember PEDs last crossing
  CLR_EV( WALK_FL); // stop Event WALK_FL
  CLR_LI(LGO);      // clear WALK light
  CLR_LI(LSTOP);    // clear DONT WALK light
    
  DoNormal();
  if (BIT_TEST(&Event,OFFLINE)) {
                Serial.println("<<<<<<<<<<<<<<<<<<<<<<<<<PED Cycle finished  GO OFFLINE");          
                CLR_LI(LGREEN);    // OFFLINE was pressed during PEDs crossing
                CLR_LI(LYELLOW);   // set DONT WALK and RED
                SET_LI(LRED);
                CLR_LI(LGO);
                SET_LI(LSTOP);
  }
}

//----------------------------------------------------------------------------
// traffic light OFFLINE: Blink DONT WALK and RED
void BlinkOffline() {    
    //if (level>1) {
    //  return;
    //}
    if (Event != (1<<OFFLINE)) {
      return;
    }
    Serial.println("                            OFFLINE");
    TrafficLightOnline=false;
    CLR_LI(LGREEN);
    CLR_LI(LGO);
    CLR_LI(LSTOP);
    SET_LI(LRED);                    // set RED light
    internrunner(500000,PRI_USER);           // wait 1/2 second
    SET_LI(LSTOP);                   // set DONT WALK light
    CLR_LI(LRED);                  // clear RED light
    internrunner(500000,PRI_USER);           // wait 1/2  second
}
 

//----------------------------------------------------------------------------
// normal operation: DONT WALK and GREEN
void DoNormal() {
  Serial.println("                            NORMAL OPERATION");
  Serial.println("RED YEL GRN     STP GO SEC");
  SET_LI(LSTOP);                    // set DONT WALK light
  CLR_LI(LGO);                      // clear WALK light
  CLR_LI(LRED);                     // clear RED light
  CLR_LI(LYELLOW);                  // clear YELLOW light
  SET_LI(LGREEN);                   // set GREEN light
}

//----------------------------------------------------------------------------
// get commands from serial line  
void GetCmds() {
  if (Serial.available()) {
    unsigned char c=Serial.read();
    
    // decode commands
    switch(c) {
      case '1':
                SET_EV(TRYONLINE);
                break;
                
      case '0': if (Event!=0) Serial.println(">>>>>>>>>>>>>>>>>>>>>>>>>OFFLINE: Finishing PED Cycle");           
                SET_EV(OFFLINE);
                break;
                
      default:  if (TrafficLightOnline) {
                  SET_EV( PEDS_WAITING);
                  Serial.print("PED SWITCH PRESSED     ");   Serial.println(millis()/1000);

                }
                break;
    }
    
    // clear serial buffer (discard anything that followed the command byte)
    while (Serial.available()) {
      Serial.read();
    }
  }
}

//----------------------------------------------------------------------------
// set traffic light ONLINE if it is not already
void TryOnline() {
  if (BIT_TEST(&Event,OFFLINE)) {        // is OFFLINE ?
    if (Event == ((1<<OFFLINE) | (1<<TRYONLINE) )) {
                                         // switch to ONLINE
      Serial.println("                            SWITCHED ONLINE");
      Event=0; DoNormal(); TrafficLightOnline=true;
    }
  }
}

unsigned char lastLight;


// show the state of the traffic light
// could be replaced by relays to control the lights
void ShowLight() {
 
#ifdef DEBUG  
  Debug();
#endif

  if (lastLight==Light) return;
 
  if (BIT_TEST(&Light,LRED))     Serial.print("X   ");
  //else                           Serial.print("off ");
  else                           Serial.print("-   ");
 
  if (BIT_TEST(&Light,LYELLOW))  Serial.print("X   ");
  //else                           Serial.print("off ");
  else                           Serial.print("-   ");
 
  if (BIT_TEST(&Light,LGREEN))   Serial.print("X       ");
  //else                           Serial.print("off     ");
  else                           Serial.print("-       ");
 
  if (BIT_TEST(&Light,LSTOP))    Serial.print("X   ");
  //else                           Serial.print("off ");
  else                           Serial.print("-   ");
 
  if (BIT_TEST(&Light,LGO))      Serial.print("X  ");
  //else                           Serial.println("off ");
  else                           Serial.print("-  ");
 
  Serial.println(millis()/1000);
  lastLight=Light;
}
 
unsigned char DebugFlag;

//----------------------------------------------------------------------------
void Debug() {
char buf[20];
char *pt;
char ev;
  if (Event != DebugFlag) {
    ev=Event;
    pt=buf;
    for (int i=0; i<7;i++) {
      if (ev & 0x40) *pt++='1';
      else           *pt++='0';
      *pt++=' ';
      ev<<=1;
    }
    *pt=0;
    //Serial.println("EVENT   PYRWFOT");
    Serial.println("EVENT   T O F W R Y P");
    Serial.println("        R F L A E E E");
    Serial.println("        Y F A L D L D");
    
    Serial.print  ("        "); Serial.println(buf);
    DebugFlag=Event;
  }
}

//----------------------------------------------------------------------------
void setup() {
 
  Serial.begin(115200);
 
  TrafficLightOnline=true;
  Light=0;
  DoNormal();

  // The tasks:
 
  // Get switches from serial line:
  // '0'   set traffic light offline
  // '1'   set traffic light online
  // any other: PED presses button
  tsk0=run(GetCmds,              100000, PRI_USER,          0,    0);            // get commands every 100 ms
 
  // PED button pressed
  tsk1=run(Ped_Switch,                1, PRI_USER,          0,    1<<PEDS_WAITING);

  // do YELLOW phase
  tsk2=run(Yellow,                    1, PRI_USER,          0,    1<<YELLOW);   // start immediately
 
  // do RED phase
  tsk3=run(Red,                 3000000, PRI_USER,          0,    1<<RED);      // wait 3 seconds when set
 
  // PED walks
  tsk4=run(Walk,                4000000, PRI_USER,          0,    1<<WALK);     // wait 4 seconds when set

  // PED is walking, flash DONT WALK light
  tsk5=run(Walk_Fl,             4000000, PRI_USER,          0,    1<<WALK_FL);  // wait 4 seconds when set
 
  // switches traffic light offline
  tsk6=run(BlinkOffline,              1, PRI_USER,          0,    1<<OFFLINE);  // start immediately
 
  // set traffic light ONLINE, if it is OFFLINE
  tsk7=run(TryOnline,                 1, PRI_USER,          0,    1<<TRYONLINE);// start immediately
 
  // show the state of the traffic lights 5 times a second
  tsk8=run(ShowLight,            200000, PRI_USER,          0,    0);           // every 200 ms
}

//----------------------------------------------------------------------------
// the same procedure as every year :)
void loop() {
  while(1) {
    runner(PRI_USER);
  }
}
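
The implementation of run() and runner() is not shown in this part of the thread. As a rough illustration only, a periodic cooperative runner could look like the sketch below. All names (PeriodicTask, runDue, toggleTask) are hypothetical, and the priority and event-mask parameters of the real run() are left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical task record; the real run() takes more parameters
// (priority, event mask) that are left out of this sketch.
struct PeriodicTask {
  void (*fn)();        // task body, must return quickly
  uint32_t periodUs;   // interval between calls in microseconds
  uint32_t nextDueUs;  // time at which the task is due next
};

// One pass of a cooperative runner: call every task that is due at nowUs.
// On an Arduino, nowUs would come from micros(). The signed difference keeps
// the comparison valid when the 32-bit microsecond counter wraps around.
uint8_t runDue(PeriodicTask* tasks, uint8_t count, uint32_t nowUs) {
  assert(tasks != nullptr || count == 0);
  uint8_t ran = 0;
  for (uint8_t i = 0; i < count; i++) {
    if ((int32_t)(nowUs - tasks[i].nextDueUs) >= 0) {
      tasks[i].nextDueUs += tasks[i].periodUs;  // schedule the next call
      tasks[i].fn();                            // runs to completion
      ran++;
    }
  }
  return ran;
}

// Demo task, used only to show that calls happen.
uint32_t g_toggles = 0;
void toggleTask() { g_toggles++; }
```

In a sketch like this, loop() would simply call runDue(tasks, count, micros()) over and over. No task has its own stack, which is where the RAM saving of the cooperative approach comes from.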

 


Function pointers, as well as the stack, are a source of unreliability. MISRA did outlaw the use of function pointers in the early days but rescinded that in later versions of the standard.
In an AVR context, most apps have fixed tasks so that would remove the need for function pointers.


There have been some very large commercial systems (200+ processes, 100+ users, etc) built on top of cooperative multitasking, BTW.  Cisco's IOS, for instance.

 


==> #14

Well, multitasking needs function pointers and/or pointers into the stack and/or pointers to free/used memory. It works for Windows, Linux, OS X - all operating systems since the first ones.

And it works for me.

Calling a function is nothing else but calling a routine at an address. These addresses are used by the compiler. The compiler could not work without pointers to functions. It uses tables of pointers to functions, of course. The linker does as well.

Do you think compilers are unreliable?

The 8-bit AVRs have the 16-bit pointer registers X, Y and Z to help the implementation of compilers. They are used as pointers!

Even the 6502 used pointers in the zero page and had special addressing modes for them.

In fact C uses a lot of pointers and yes, it's true, you can make a lot of mistakes and get into a lot of trouble with them. It was one of the reasons Wirth did not allow (OK, not directly) the use of pointers when creating Pascal. But history shows: C/C++ is the winner, only very few people use Pascal today.

I understand that you mean task switching - no matter which method you use - should be avoided. I just have a different opinion and want to use Windows and Linux and MT with AVRs.

 

MISRA:

MISRA C:1998 rule 104 reads "Non-constant pointers to functions shall not be used"

 

The pointers in my demo are constant, because they are installed just once with the RUN() command (in the full version, stopped tasks are replaced by new ones with RUN()) - no violation of MISRA rules at all!

Yes, I know that replacing stopped functions could get me into trouble, and that I would have to take care to avoid side effects when doing so - but again, I have never used it.

 

 

I find it amazing: billions of instructions inside more programs than I would like to count run for hours and days and weeks without a (visible) error.

It's like a miracle - but it shows: it is possible to use pointers to everything.

But I will be very impressed if you can write the PELICAN without any MT. So please impress me ... with some code.

And - seriously - I will invest hours to understand how you managed that.

 

 

 

 


Looking at the thread, you might think I believe that CO is better than PE.

I just want to provoke.

Please show me that you can do it FASTER with PE. I will put all my work in the trash and be thankful.

But do not argue - give me an example!

Whatever I tested, CO on AVRs was smaller and faster. But again: please give me an example to show how PE could be faster than what I am doing.

I am willing to learn - but give me something to learn.

"DO NOT DO THAT" or "EVERYBODY KNOWS THAT" or "MAYBE TRY THAT" is not really what will change my mind.

There MUST be someone who is willing to show: USE PE LIKE THIS (AVR code follows).

Is anybody out there willing to do that?

 

 

 

Last Edited: Fri. Feb 6, 2015 - 10:17 AM

Very few people here use CO or PE. They write real world MCU apps that don't generally use any kind of OS, so you may find yourself scratching about trying to find someone to "provoke" in your little debate here.

 

Those of us who have used CO and PE know the merits and demerits of both. It's a bit like asking "which is better: a screwdriver or a hammer?". The answer is "it depends".

 

If one was always a better solution than the other one of the options would surely have died out by now.


==> #15

Your post gives me an idea.

Maybe PE is the best option when you don't know which programs the user will start.

But when you have all running tasks under your control because you write them yourself maybe CO is sometimes the better solution?


 

Heweb, I think you misunderstand me. I'm aware of the implications. I try to avoid using function pointers, gotos and interrupts - that is not to say I don't use them, but hopefully I use them carefully. MISRA also suggests a number of things to avoid when coding. It also gives you an 'out' where you can't avoid them.

If you think my level of paranoia is bad, try doing space rated stuff. The likelihood of corruption due to high energy particles is much higher. Are compilers unreliable? There must be some uncertainty as for safety critical applications you need to use a validated one. Are the programmers unreliable? History would say so - otherwise we wouldn't have as many defects. Are processors unreliable? Yes. Even a billion to one anomaly can happen with a high rate of occurrence. Ever looked at the ECC logs of a server computer? Memory errors happen frequently. For the most part, these errors won't kill anyone.

What do you determine to be multitasking? The AVR is only executing an instruction at the most every clock, so there is only one thread of execution happening at a time. Like QP, my first thought was to use a finite state machine to implement the logic - ultimately, the logic is executed by a finite state machine anyway. Using a state machine you can formally verify the operation of the code thus minimising one part of the unreliability equation. Is using a state machine performing multitasking? If multitasking is a method of running a number of separate tasks at different times to give the illusion of operating in parallel, then even a finite state machine solution would qualify as multitasking.

Like my example in the tutorial section, your code is also a re-arranged super loop. So we're not covering any new ground. Fast? If I was counting cycles and bytes, I'd do it in assembler or an eprom and a latch! Diodes and gates? Cams and switches? Fluidics?

 

Seriously, I'd write my solution to PELICAN much the same as you did but I would've avoided function pointers. That might save a couple of bytes or cycles or it might cost more - without doing measurements we would only guess. Doing it as a finite state machine would be laborious by hand (in days gone by I used to do it by hand) and I gather QP does the dirty work for you - I've not looked at it in depth. Depending on the actual implementation of the finite state machine, it might be smaller but probably larger in terms of bytes but I'd expect the execution to be quicker - everything is solved beforehand.

Just remember, when you drive your car, you're putting your life into the hands of a number of microcontrollers. Hopefully they will always do the right thing. That is, unless you're driving a Trabant.


But when you have all running tasks under your control because you write them yourself maybe CO is sometimes the better solution?

Screwdriver or hammer? I wonder...

 

I actually worked on a 100 task system that was initially implemented CO (actually a model very like the original 16 bit Windows with one main message queue driving everything) and later we moved to Linux which is obviously PE. There were advantages/disadvantages to both approaches.

 

I suppose the main issue of CO is that you have to trust all your programmers! If you, for example, give a rule that "no task must block for more than 5ms and if it is required the work must be broken into a state machine" then you have to be pretty darned sure that all programmers in all circumstances adhere to this rule, otherwise your UI (for example) might go "unresponsive" because some greedy task blocks the CPU. Also there are still "fast tasks" in there but they hang off the interrupts and generally just post stuff into the message queue as and when they think other tasks might like to be triggered to handle events (like VM_KEY/VK_BUTTON messages in Windows).
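
To make the "break the work into a state machine" rule concrete, here is a minimal sketch of a long job split into bounded steps. The job (a byte checksum), the names ChecksumJob and checksumStep, and the per-call budget are all made up for illustration; the only point is that each call does a limited amount of work and then returns to the scheduler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical long-running job: checksum a buffer without hogging the CPU.
struct ChecksumJob {
  const uint8_t* data;  // buffer to process
  size_t length;        // total number of bytes
  size_t pos;           // next byte to process (this is the saved "state")
  uint32_t sum;         // running result
};

// One cooperative step: handle at most `budget` bytes, then return to the
// scheduler. Returns true once the whole job is finished.
bool checksumStep(ChecksumJob& job, size_t budget) {
  assert(job.pos <= job.length);
  size_t remaining = job.length - job.pos;
  size_t n = (budget < remaining) ? budget : remaining;
  for (size_t i = 0; i < n; i++) {
    job.sum += job.data[job.pos++];
  }
  return job.pos == job.length;
}
```

The budget would be chosen so that one step stays well below the agreed blocking limit. The scheduler keeps calling checksumStep() until it returns true.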

 

On the other hand you could argue that PE makes for lazy programmers. They all think they "own" the CPU!


==>#20

It would be amazing to meet and discuss. I have driven some Mercedes-Benz and BMW cars, but never a Trabant - so I cannot compare with your experiences on that point ;).

Yes, shit (errors) happens, and the best of us write programs that can live with that.

 

The finite state machine of QP uses MT:

"For these and other reasons experienced programmers turn to the long-known design strategy called event-driven programming, which requires a distinctly different way of thinking than conventional sequential programs. All event-driven programs are naturally divided into the application, which actually handles the events, and the supervisory event-driven infrastructure (framework), which waits for events and dispatches them to the application."

 

In qp_config you can choose:

qp_config.h

// enable preemptive QK kernel (cooperative kernel is used when not defined)
#define QK_PREEMPTIVE

You have the choice: PE or CO

You may NOT select: No multitasking

 

An event-driven paradigm without some kind of MT is not possible!

Event-driven programs are nothing else but CO: get events (for instance from interrupts) and schedule the appropriate task!

 

Yes, a state machine could give the illusion of MT. But when the machine is not a closed black box, when the machine has to react to external events, then you need a kind of multitasking as well.

 

So QP shows how to combine MT and state machines. They do NOT see the state machine as an alternative to MT!

It is an interesting approach to solve problems - I said that.

I am willing to learn - please show me your code of PELICAN.

 

QP without MT could not work !

 

 

Last Edited: Fri. Feb 6, 2015 - 11:48 AM

Oh I finally understood what your "==> #N" were about - you are referring to previous post numbers!?!?

 

You do know that if you hit the [Reply] button on post #20 that your message then says "this is a reply to #20" in the header don't you?

 

Anyway I'm just having fond memories of Ventura Publisher running on DR GEM (apart from Macs it was the "goto solution" for desktop publishing at one stage). I'm not sure if the fault was GEM itself or VP but I do know that when you told it to print a 200 page user manual the mouse cursor would switch to an hour glass for several hours while the UI was starved of any kind of service. I think something in there was based on a co-operative model and someone was not following the rules!


==> #21

I think we will find the most important point:

CO means  you should know and communicate with the programmers! ( talk with yourself ;) )

For unknown programmers/programs  I would prefer PE!

 


No, sorry, I didn't know that (ashes on my head).

I am not used to writing in forums.


Thanks, I've learned that.

Yes, the enemy of CO is a task that never ends!

That is what I mean: never wait for an event if you don't know when it will come.

Waiting for TX ready is OK. You know the time the transmitter needs.

Waiting for RX blocks if you don't know whether a character will come, and when.

You have to fine-tune your application when using CO - much more testing (logic analyzer) and thinking!

But done well, it can be very, very fast!

 


But the issue is often when you are waiting for some external/asynchronous event to occur and it doesn't happen when expected. A PE system can just sleep the task until something happens. A badly designed CO task might just block everything.


The special point of my thread is not which is better in general.

The point was: AVR microcontrollers, limited space, you program the whole stuff yourself - would it under these circumstances be possible to get smaller and faster code using PE compared with CO?

 


heweb wrote:

...under these circumstances be possible to get smaller and faster code using PE compared with CO ?

 

At the OS level then the answer has to be no; a CO will always be faster simply because a PE will always need a context switch which on an AVR is 72 instructions most of which are 2 cycle ones so around 140+ processor cycles on every task swap.
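
As a rough worked example of what that estimate means in time, the sketch below converts a cycle count into microseconds and into an upper bound on switches per second. The 16 MHz clock is an assumption (the usual Arduino Uno crystal); the figure of about 140 cycles is the estimate from this post, not a measurement. The function names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Time in microseconds needed for `cycles` CPU cycles at `cpuHz`.
double cyclesToMicros(uint32_t cycles, uint32_t cpuHz) {
  assert(cpuHz > 0);
  return cycles * 1e6 / cpuHz;
}

// Upper bound on context switches per second if the CPU did nothing else.
uint32_t maxSwitchesPerSecond(uint32_t cyclesPerSwitch, uint32_t cpuHz) {
  assert(cyclesPerSwitch > 0);
  return cpuHz / cyclesPerSwitch;
}
```

Under these assumptions, cyclesToMicros(140, 16000000) gives 8.75 us per switch, and maxSwitchesPerSecond(140, 16000000) gives a ceiling of roughly 114000 switches per second before any useful work is done.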

'This forum helps those who help themselves.'

 

pragmatic  adjective dealing with things sensibly and realistically in a way that is based on practical rather than theoretical consideration.


For that I use:

A system task (high priority) that checks whether the event happened and sets an event flag - or -

an ISR(...) which sets an event flag (I think that's much like QP) and lets the dispatcher call the task.

The event dispatcher has the highest priority in my OS and is created automatically when the first task is created with RUN().

It does not start the task but removes the waiting flag.

With naked ISRs (saving only the registers that are used), setting event flags like this can be as fast as (or faster than!) doing the job within the interrupt.
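
A minimal sketch of the flag-and-dispatch idea, runnable on a PC for illustration. It is not the poster's implementation: the names (Task, g_events, postEvent, dispatchOnce, demoTask) are hypothetical, and as a simplification the dispatcher here calls the task directly instead of only clearing a waiting flag.

```cpp
#include <cassert>
#include <cstdint>

typedef void (*TaskFn)();

struct Task {
  TaskFn fn;          // function to call when the event is pending
  uint8_t eventMask;  // event bit(s) this task waits for
};

volatile uint8_t g_events = 0;  // written by ISRs, read by the dispatcher

// What an ISR would do: only record that the event happened.
void postEvent(uint8_t bit) {
  assert(bit < 8);
  g_events = (uint8_t)(g_events | (1u << bit));
}

// One dispatcher pass: run every task whose event is pending and clear the
// flag first. On a real AVR the read-modify-write of g_events would have to
// happen with interrupts disabled.
uint8_t dispatchOnce(const Task* tasks, uint8_t count) {
  uint8_t ran = 0;
  for (uint8_t i = 0; i < count; i++) {
    uint8_t mask = tasks[i].eventMask;
    if (g_events & mask) {
      g_events = (uint8_t)(g_events & ~mask);
      tasks[i].fn();
      ran++;
    }
  }
  return ran;
}

// Demo task, used only to show that dispatching happens.
uint8_t g_demoRuns = 0;
void demoTask() { g_demoRuns++; }
```

The ISR stays as short as a single store, and the real work runs later at task level, where it cannot delay other interrupts.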

Yes again - blocking tasks and CO are enemies.

 

 

Last Edited: Sun. Feb 8, 2015 - 10:19 PM

Waiting for characters at RX:

They are received by interrupts and transferred to a (small) ring buffer.

A flag is set, and another task transfers (and maybe filters) the characters from the small ring buffer and distributes them to - for instance - different buffers to build commands for tasks.
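
The ring buffer between the receive interrupt and the task could be sketched as follows. This is an illustration under assumptions, not the poster's code: the names RxRing, rxPut and rxGet and the size of 16 bytes are made up, and the sketch assumes exactly one producer (the ISR) and one consumer (the task).

```cpp
#include <cassert>
#include <cstdint>

// Single-producer (ISR) / single-consumer (task) byte ring buffer.
// The size must be a power of two so the index can wrap with a mask.
const uint8_t RX_SIZE = 16;
static_assert((RX_SIZE & (RX_SIZE - 1)) == 0, "RX_SIZE must be a power of two");

struct RxRing {
  volatile uint8_t head;           // written only by the producer (ISR)
  volatile uint8_t tail;           // written only by the consumer (task)
  volatile uint8_t data[RX_SIZE];  // one slot always stays empty
};

// Called from the receive ISR. Returns false (byte dropped) if full.
bool rxPut(RxRing& rb, uint8_t c) {
  uint8_t next = (uint8_t)((rb.head + 1) & (RX_SIZE - 1));
  if (next == rb.tail) return false;
  rb.data[rb.head] = c;
  rb.head = next;
  return true;
}

// Called from the cooperative task. Never blocks: returns false if empty.
bool rxGet(RxRing& rb, uint8_t& out) {
  if (rb.head == rb.tail) return false;
  out = rb.data[rb.tail];
  rb.tail = (uint8_t)((rb.tail + 1) & (RX_SIZE - 1));
  return true;
}
```

Because rxGet() returns immediately when nothing has arrived, the task that drains the buffer never waits on RX.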

To be really fast I had to write my own versions of some routines like SPRINTF or LTOA (calling INTERNRUNNER at the time-consuming points) to guarantee fast round-robin times.

That would not be necessary with PE - but then you may be surprised that some library routines disable interrupts for longer than you expect, and you lose any aspect of a REAL TIME OS.

 

 


I am not so sure - otherwise I would not have started this thread.

328 @ 16 MHz: 140 2-cycle instructions need about 17.5 us.

And with PE the number of tasks does not influence the switching time very much.

And you can organize the tasks in arrays, in different queues etc.

Task switching with more than 10 tasks within 30 us / switch should be possible.

Creating 5 frequencies = 5 tasks toggling pins (without a timer ISR) with very little jitter would be a good example.

 

I think I am an experienced programmer, but I know there are (at least) some thousands around the world who are much better.

I really want to see a fast working example of PE (doing something in a real project) to compare, and I would be happy to find it.

Comparing examples - I think - would be much more instructive than discussions.

 


I for one would be interested in seeing your CW code machine q:-) 

PM me with a link if that is easier.

 

Jim

 


heweb wrote:
Task switching with more than10 tasks within 30 us / switch should be possible.
fyi, the creator of QP wrote this

EmbeddedGurus

State Space

Fast, Deterministic, and Portable Counting Leading Zeros

Monday, September 8th, 2014 by Miro Samek

http://embeddedgurus.com/state-space/2014/09/fast-deterministic-and-portable-counting-leading-zeros/

Counting leading zeros in an integer number is a critical operation in many DSP algorithms, such as normalization of samples in sound or video processing, as well as in real-time schedulers to quickly find the highest-priority task ready-to-run.
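
To show why this operation matters for a scheduler, here is a small sketch that uses a count-leading-zeros routine to pick the highest-priority ready task from a bit mask. It is not the code from the linked article; it is one common portable approach (binary search plus a 16-entry table), and the names clz32 and highestReady are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Count leading zeros of a 32-bit value with a binary search plus a
// 16-entry lookup table. The number of steps is the same for any input.
uint8_t clz32(uint32_t x) {
  static const uint8_t nibbleClz[16] = {4, 3, 2, 2, 1, 1, 1, 1,
                                        0, 0, 0, 0, 0, 0, 0, 0};
  uint8_t n = 0;
  if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
  if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
  if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }
  return (uint8_t)(n + nibbleClz[x >> 28]);
}

// Bit i set in readyMask means "the task with priority i is ready".
// Returns the highest ready priority; the mask must not be empty.
uint8_t highestReady(uint32_t readyMask) {
  assert(readyMask != 0);
  return (uint8_t)(31 - clz32(readyMask));
}
```

With the ready tasks kept in one mask, finding the next task to run takes a fixed number of steps regardless of how many tasks exist.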

...

NOTE: In case you wish to use the published code in your projects, the code is released under the “Do What The F*ck You Want To Public License” (WTFPL).

"Dare to be naïve." - Buckminster Fuller


jim,

I can send you the code, but:

All comments are in German because it is for a local CW club.

Do you want it anyway ?

Helmut


Maybe someone is interested in my investigation.

Here are the timings of an experiment.

First of all the good news:

 

8 Jobs running.

Scheduler tests if jobs are ready to run every 100us, MUCH

faster than a PE OS with 1 ms ticks.

 

(0) Timer1 t=400us

Set CTC Flag but Interrupts are not enabled. This is not a task,

but only inits Timer1

 

1) GetT1() Should look at the flag, increments counter T1 and resets the flag.

Without missing a flag: count 2500+/-1 per second

 

2) Timer2 t=200us

Interrupt is enabled. T2 is incremented inside the interrupt.

A flag is set.

 

3) GetT2() like GetT1, but should count 5000 flags / second.

It can be compared with the counter of the interrupt.

Must be equal +/-1

 

4)-7) 4 tasks, each of which increments its own counter

 

8) ShowResults() is called every second and shows the statistics.

The other tasks should go on and must not miss flags to count during output.

 

GetT1() and GetT2() do not loose the flags of the timers, even using

formatted printf (selfmade) and serial output (Interrupt driven, selfmade)

do disturb the counting of the timerflags.

 

Per second 35634 task calls are done (Interrupts of Timer2 are counted as task):

2500 Timer1 flags counted by GetT1()

5000 Timer2 interrupts

5000 GetT2() counting Timer2 flags

23134 calling 4 Counter Tasks (5773 5780 5787 5794)

35634

 

The program needs 432 bytes of SRAM and much less stack space than any RTOS.

GetT2() with PRI_KERNEL is called more than 10000 times per second to guarantee that no Timer2 flag set from the interrupt is missed.

 

In this special configuration it could be called a REAL TIME OS because it demonstrates how it is possible to react to external (or internal) events, here: interrupts, within a guaranteed time of less than 200us. I do not mean the interrupt itself, but calling the appropriate task in a multitasking system.

 

I am sure: that is much more than you can do with a timer ticked PE OS !

 

So – is CO OS the solution ? Yes, when it must be fast.

Yes, when you are using libraries which may not work properly with interruption at any point.

 

>>>>> But there are a lot of drawbacks:

* You need special routines for I/O

* You need a logic analyzer to test the system after every modification.

Even incrementing two numbers instead of one, or a long int instead of an int, or adding a task may change the timing.

In our example the system then starts to lose some flags and does not count to 5000.

 

It's more like handcrafting shoes to fit your customer's needs.

But despite all that it is amazing, how fast it can be.

 

Here are the results:

  T1     T2     T2        Counter
 cnt    cnt    IRQ      1     2     3     4
2499   5000   5001   5974  5981  5988  5995
2501  10001  10001   5777  5784  5791  5798
2500  15001  15001   5772  5779  5786  5793
2500  20001  20002   5770  5777  5784  5791
2500  25002  25002   5771  5778  5785  5792
2500  30002  30002   5768  5775  5782  5789
2501  35002  35003   5768  5775  5782  5789   T2cnt is sometimes 1 back
2500  40003  40004   5770  5777  5784  5791   … but synchronizes later
2500  45004  45004   5768  5775  5782  5789
2501  50004  50005   5772  5779  5786  5793
2499  55004  55005   5769  5776  5783  5790
2500  60005  60005   5772  5779  5786  5793

 

Last Edited: Sun. Feb 8, 2015 - 10:07 PM

In this special configuration it could be called a REAL TIME OS because it demonstrates how it is possible to react to external (or internal) events, here: interrupts, within a guaranteed time of less than 200us. I do not mean the interrupt itself, but calling the appropriate task in a multitasking system.

 

 

I would expect a preemptive task switch to happen much faster than that. 'Real Time' is whatever is determined to be fast enough for the system. For years, many small embedded projects were done using a 'super loop' - you've just unrolled it and made it a bit more generic and slower. You've traded generality for efficiency. It makes the code easier to write and maintain. Don't get me wrong - I'm not trying to put you down, but you've hardly found anything new. 

Doing a task switch with the AVR8 is relatively slow, on other architectures it can be significantly faster. I've seen figures of 400ns for a task switch on a Cortex M4. Obviously, if you have a 100+MHz micro controller with 128+kbytes of ram, the negatives of a preemptive system fade quickly. One project I did with such a micro controller I still used a cooperative tasker as going preemptive added no benefit. It worked on a 5ms tick.


GetT1() and GetT2() do not loose the flags of the timers, even using

formatted printf (selfmade) and serial output (Interrupt driven, selfmade)

do disturb the counting of the timerflags.

Helmut,

 

Is that a typo? (Missing "not" perhaps?)

 

Cheers,

 

Ross (enjoying this thread)

 

Ross McKenzie ValuSoft Melbourne Australia


Yes, Ross, you are right! It is a typo - thank you !

 

The demo pushes the 328 to its limits, and the timings were chosen so that I do NOT lose any flag.

Thanks again.

 


Preemptive systems certainly can run tasks with better granularity than the tick rate. If a task wakes up after waiting for some event the scheduler may run that task immediately, even if another task is currently running.

I've written a preemptive task scheduler based on the BFS scheduler (which automatically gives priority to waking tasks over running tasks). While experimenting with this scheduler, and to make sure waking tasks did run immediately, I set the preemption tick rate to one second (!), and yet the responsiveness of the system did not suffer. In my test I had a task waiting for user input, a task waiting to write audio samples to an audio FIFO buffer, and a task that busy-looped just for the sake of it. The audio didn't drop out (the audio buffer was only about 6ms long) and the user input was handled immediately. The busy loop task was spinning its wheels but didn't have any noticeable impact on the system.

Of course, preemption does have overhead in saving the entire processor context. But on my system (a 68k running at 12 MHz) that overhead would probably be less than 20% if there were a context switch every 100 microseconds, which isn't likely ever to happen in normal circumstances (high-speed tasks, such as the audio driver, are implemented as interrupt handlers).


christop wrote:
But on my system (a 68k running at 12 MHz) that overhead would probably be less than 20% if there were a context switch every 100 microseconds, which isn't likely ever to happen in normal circumstances (high-speed tasks, such as the audio driver, are implemented as interrupt handlers).
Though dated, here's a 10kHz to 11kHz digital control system :

Linux Journal

Real-Time Control of Magnetic Bearings Using RTLinux

From Issue #135
July 2005

May 26, 2005

By Harland Alpaugh

http://www.linuxjournal.com/article/8029

...

Conclusion

RTLinux is used to control a working rotor test rig at Tufts University. The controller is realized on a conventional Pentium III personal computer using the RTLinux extension of the Linux operating system. The control algorithm is implemented in C. Various control laws can be implemented and tried on an actual experiment.

An additional advantage is the elimination of a target computer, since the real-time OS operates on the same processor as the host computer. Most applications developed as digital control systems launch as a startup executable on a proprietary real-time target computer. The approach presented here differs; it does not target a RT controller based on a proprietary development system. It uses a Linux software environment developed for applications in control and data acquisition requiring hard real-time (deterministic) execution.

That article states the Pentium III is in a 1GHz PC; maybe the follow-on to Intel Galileo 2 will approach 1GHz.

http://arduino.cc/en/ArduinoCertified/Products

"Dare to be naïve." - Buckminster Fuller


Yes, I would prefer a PE system with faster processors / processors with more RAM.

I don't want to reinvent the wheel.

But sometimes it is useful to do some investigation to find the best solution.

 

There are reasons to use CO sometimes.

 

1) With Arduino IDE 1.5.x (I use 1.5.8) it is possible to use yield(), which is called inside delay().

unsigned long Count;

void yield() {
  Count++;
}

void setup() {
 Serial.begin(115200);
}

void loop() {
  delay(1000);
  Serial.println(Count);
}

Output:

189222
378433
567644
756854
946065
1135276
1324487

 

This could be used to start the "Super-Loop".

It is possible to use it with all programs for Arduino and  libraries without influencing the program timings!

With 1000000 baud  it is possible to send 100 bytes  in 1 ms (delay(1)) to the host to help debugging/analyzing

Not possible with PE.

 

For PE you need timer ticks - CO may work without.

There are times you have to react fast. Mostly you do it with interrupts.

But the AVR328 is not built for nested IRQs - there is not even an NMI, which the ancient 6502 had.

 

With this simple program you can demonstrate that the timer0 IRQ used for micros() and millis() has to be taken into account: every IRQ may disturb your timings:

void setup() {
  pinMode(2,OUTPUT);
}

void loop() {
  while(1) {
    asm(".equ   PORTD,0x0B");
    asm(" sbi   PORTD,2");        // set Marker at pin2
    asm(" cbi   PORTD,2");
  }
}

 

 

Without any other IRQ it will be possible to react within 5 us or faster inside your own IRQ, and CO may run without timers.

You are not able to dream of that using PE with timer ticks.

 

Again: I don't want to say that CO is better than PE, but there are still circumstances in which to use CO.

If task switch timing and RAM do not have to be considered, I will always prefer PE.

 

But sometimes it is possible to do jobs with CO which are not possible with PE.

This is an Arduino forum, and 68000s are very different machines.

 

What can I do with an Arduino and - if possible - with an AtTiny85?

 

35600 tasks per second, that means:

With an average of 28us per running task doing something (not counting the idle calls), it will be hard to implement PE so that it performs faster, uses less stack space and is as responsive - that's what I think.
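
The 28us figure can be checked against the per-second counts reported with the experiment earlier in the thread (2500 + 5000 + 5000 + 5773 + 5780 + 5787 + 5794 calls). The sketch below only does that arithmetic; the function names are made up, and the result is elapsed time divided by calls, so it includes scheduler overhead as well as task work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sum of the per-task call counts measured over one second.
uint32_t totalCalls(const uint32_t* perTask, size_t n) {
  uint32_t total = 0;
  for (size_t i = 0; i < n; i++) total += perTask[i];
  return total;
}

// Average wall-clock time per call in microseconds.
double averageMicrosPerCall(uint32_t callsPerSecond) {
  assert(callsPerSecond > 0);
  return 1e6 / callsPerSecond;
}
```

With those counts the total is 35634 calls per second, which works out to an average of about 28.06 us per call.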

 

But again: I don't want to win a fight - I want to learn to do it better.

If someone could give me some code for ARDUINO to do the examples better:

Please don't argue, give us your code and you will be my hero, seriously!

 

Yes, Kartman, its true: "You've traded generality for efficiency."

 

But we are not speaking about a Cortex M4 or 68000 or about Linux, or about implementing an OS for user programs; we are talking about these very small and cheap MCUs used for Arduino.

And about programmers who do the complete job, from the OS to all the tasks, for a special purpose.

 

It's not fair if we discuss how to fell a tree with a knife and you answer: buy a chainsaw.

 

"I would expect a preemptive task switch to happen much faster than that."

If you count the cycles for pushing and popping all registers, you can compute how many task switches you can

perform in one second without doing anything but task switching. Please give me a number based on counted cycles, not

on expectations.
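One way to arrive at such a number is to count instruction cycles for a tick-driven switch that saves and restores the full register file. The sketch below assumes ATmega328-class timings (push/pop 2 cycles, in/out 1 cycle, 4 cycles interrupt response plus a 3-cycle jmp in the vector table, 4 cycles for reti) and a 16 MHz clock. Swapping the stack pointer and all scheduler bookkeeping are left out, so the result is an upper bound on the switch rate, not a measurement.

```cpp
#include <cstdint>

constexpr uint32_t kCpuHz          = 16000000UL;  // assumed 16 MHz clock
constexpr uint32_t kRegisters      = 32;          // r0..r31
constexpr uint32_t kPushCycles     = 2;
constexpr uint32_t kPopCycles      = 2;
constexpr uint32_t kInOutCycles    = 1;           // reading or writing SREG
constexpr uint32_t kIrqEntryCycles = 4 + 3;       // interrupt response + jmp in vector table
constexpr uint32_t kRetiCycles     = 4;

// Save r0..r31, then read SREG and push it.
constexpr uint32_t save_cycles() {
  return kRegisters * kPushCycles + kInOutCycles + kPushCycles;
}

// Pop SREG and write it back, then restore r31..r0.
constexpr uint32_t restore_cycles() {
  return kPopCycles + kInOutCycles + kRegisters * kPopCycles;
}

// Bare cost of one preemptive switch, without any scheduler logic.
constexpr uint32_t bare_switch_cycles() {
  return kIrqEntryCycles + save_cycles() + restore_cycles() + kRetiCycles;
}

// Upper bound on switches per second if the CPU did nothing else.
constexpr uint32_t max_switches_per_second() {
  return kCpuHz / bare_switch_cycles();
}
```

With these assumptions the bare switch costs 145 cycles (about 9 us), which caps the rate at roughly 110000 switches per second before any scheduler code has run.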

 

Again: I am willing to learn and I would be happy to be convinced by an example - coded for an Arduino, that's our forum -

that does it faster, with less RAM and the same guarantee to fetch 7500 events/s from the outside AND do some other tasks like counting

and performing serial output.

Keep in mind: I created some tasks to simulate these events, which costs extra time and RAM.

 

Maybe there is someone out there who is willing to argue with an example comparable to mine?

Someone was once sure that there was a worldwide market for only 100 PCs ;)

Someone told me that PE is faster under all circumstances.

Someone said: "Believing is good, controlling is better".

 

Please make me believe.

Maybe one of the millions of Arduino programmers is willing to help?

 

 

Last Edited: Sun. Feb 8, 2015 - 09:31 PM

Here is the task switching sequence that is normally used.

It is "stolen" from nilOS, a stripped-down ChibiOS.

 

// Test task switch timing
// From nil.core.c of nilOS, a stripped-down ChibiOS
// NOT TO BE USED - ONLY FOR DEMONSTRATION

 

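// The original signature is shown commented out. Without arguments, r24:r25 holds no valid
// pointer, so the std instructions below write the stack pointer to an arbitrary address.
//void _port_switch(thread_t *ntp, thread_t *otp) {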
void _port_switch() {
  asm volatile ("push    r2");
  asm volatile ("push    r3");
  asm volatile ("push    r4");
  asm volatile ("push    r5");
  asm volatile ("push    r6");
  asm volatile ("push    r7");
  asm volatile ("push    r8");
  asm volatile ("push    r9");
  asm volatile ("push    r10");
  asm volatile ("push    r11");
  asm volatile ("push    r12");
  asm volatile ("push    r13");
  asm volatile ("push    r14");
  asm volatile ("push    r15");
  asm volatile ("push    r16");
  asm volatile ("push    r17");
  asm volatile ("push    r28");
  asm volatile ("push    r29");

  //asm volatile ("movw    r30, r22");
  asm volatile ("movw    r30, r24");
 
  asm volatile ("in      r0, 0x3d");
  asm volatile ("std     Z+0, r0");
  asm volatile ("in      r0, 0x3e");
  asm volatile ("std     Z+1, r0");

  asm volatile ("movw    r30, r24");
  asm volatile ("ldd     r0, Z+0");
  asm volatile ("out     0x3d, r0");
  asm volatile ("ldd     r0, Z+1");
  asm volatile ("out     0x3e, r0");

  asm volatile ("pop     r29");
  asm volatile ("pop     r28");
  asm volatile ("pop     r17");
  asm volatile ("pop     r16");
  asm volatile ("pop     r15");
  asm volatile ("pop     r14");
  asm volatile ("pop     r13");
  asm volatile ("pop     r12");
  asm volatile ("pop     r11");
  asm volatile ("pop     r10");
  asm volatile ("pop     r9");
  asm volatile ("pop     r8");
  asm volatile ("pop     r7");
  asm volatile ("pop     r6");
  asm volatile ("pop     r5");
  asm volatile ("pop     r4");
  asm volatile ("pop     r3");
  asm volatile ("pop     r2");
  //asm volatile ("ret");

}

void setup() {
  Serial.begin(115200);
  pinMode(2,OUTPUT);
}

void loop() {
  while(1) {
     asm(".equ   PORTD,0x0B");
     asm(" sbi   PORTD,2");        // set marker at pin 2 before each call to _port_switch
     asm(" cbi   PORTD,2");
     _port_switch();
  }

}

 

Of course some code is missing: maintaining the tasks and priorities and the different queues, preparing the task switch, some more lines to complete the task switch, ...

I think we can estimate the task switch time at 25 to 30 us, maybe more.

That is 30000 - 40000 switches/s using 100% of the CPU time!
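For comparison, the instructions in the listing above can be counted on their own. This is a sketch under assumptions: ATmega328-class cycle counts (push/pop/std/ldd 2 cycles, movw/in/out 1 cycle), a 4-cycle call plus a 4-cycle ret, a 16 MHz clock, and no extra code added by the compiler around the asm statements.

```cpp
#include <cstdint>

constexpr uint32_t kCpuHz = 16000000UL;  // assumed 16 MHz clock

// Instruction counts taken from the listing above.
constexpr uint32_t kPushes     = 18;     // r2..r17, r28, r29 (2 cycles each)
constexpr uint32_t kPops       = 18;     // 2 cycles each
constexpr uint32_t kMovw       = 2;      // 1 cycle each
constexpr uint32_t kIn         = 2;      // 1 cycle each
constexpr uint32_t kOut        = 2;      // 1 cycle each
constexpr uint32_t kStd        = 2;      // 2 cycles each
constexpr uint32_t kLdd        = 2;      // 2 cycles each
constexpr uint32_t kCallAndRet = 4 + 4;  // assumed call/ret cost

// Cycles for one call to _port_switch(), instructions only.
constexpr uint32_t port_switch_cycles() {
  return 2 * kPushes + 2 * kPops + kMovw + kIn + kOut
       + 2 * kStd + 2 * kLdd + kCallAndRet;
}

// The same in nanoseconds at the assumed clock.
constexpr uint32_t port_switch_ns() {
  return port_switch_cycles() * 1000UL / (kCpuHz / 1000000UL);
}

// Switches per second if every switch takes switch_us microseconds in total.
constexpr uint32_t switches_per_second(uint32_t switch_us) {
  return 1000000UL / switch_us;
}
```

Under these assumptions the register shuffling alone is 94 cycles, just under 6 us, so most of an estimated 25 to 30 us would go to the scheduler code that is missing here. The estimate itself corresponds to about 33000 to 40000 switches per second.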

 

More than 35600 task slices per second that each run some code with CO is not so bad - maybe someone can beat it with PE, but I am sure that's not easy.

I am not expecting miracles!

That's what we are talking about.

 

 

Last Edited: Sun. Feb 8, 2015 - 09:28 PM

I for one won't argue that preemptive is faster or better than cooperative in all cases, and almost certainly not in low-end systems like Arduino. I actually do believe in using simpler multithreading techniques where preemptive doesn't make sense.


In fact, in my preemptive system I make use of "tasklets", which are short-running tasks that run to completion and are not preempted (except possibly by an interrupt). They are basically "soft interrupts" because I handle the time-critical task (such as reading data from a hardware register) inside the interrupt handler and then let a tasklet handle whatever can be done later (such as processing the data and sending it to other tasks).
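The tasklet idea could be sketched roughly like this. All names (post_tasklet, run_pending_tasklets, process_sample) are hypothetical and this is not the poster's implementation. The queue assumes a single interrupt-side producer and a single consumer, and relies on 8-bit index writes being atomic, as they are on AVR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using Tasklet = void (*)(uint16_t data);

struct PendingTasklet {
  Tasklet  fn;
  uint16_t data;
};

constexpr size_t kQueueSize = 8;              // must be a power of two
static PendingTasklet tasklet_queue[kQueueSize];
static volatile uint8_t head = 0;             // written only by the interrupt side
static volatile uint8_t tail = 0;             // written only by the scheduler side

// Called from the interrupt handler: record the deferred work and return at once.
// Returns false when the queue is full so the caller can count overruns.
bool post_tasklet(Tasklet fn, uint16_t data) {
  assert(fn != nullptr);
  const uint8_t next = static_cast<uint8_t>((head + 1) & (kQueueSize - 1));
  if (next == tail) return false;
  tasklet_queue[head] = {fn, data};
  head = next;
  return true;
}

// Called from the scheduler: run every queued tasklet to completion.
// Returns the number of tasklets that were run.
size_t run_pending_tasklets() {
  size_t ran = 0;
  while (tail != head) {
    const PendingTasklet t = tasklet_queue[tail];
    tail = static_cast<uint8_t>((tail + 1) & (kQueueSize - 1));
    t.fn(t.data);
    ++ran;
  }
  return ran;
}

// Example tasklet: accumulate samples handed over by the interrupt handler.
static uint32_t sample_sum = 0;
void process_sample(uint16_t sample) { sample_sum += sample; }
```

The interrupt handler would only read the hardware register and call post_tasklet(process_sample, value). The slower processing then runs later, outside interrupt context, and is never preempted by another task.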


Can't you just say that a preemptive system will always have more overhead than a cooperative one, just because sometimes (timer ticks) a process will be preempted and incur context switch overhead, even when it would have been better to continue running the current process?  Ie, it's easier to get deterministic timing with PE, but that's a waste whenever you don't need deterministic timing...
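The size of that overhead is simple arithmetic. A minimal sketch, assuming the worst case of one forced context switch on every tick, and using the 30 us switch time and 1 ms tick mentioned earlier in the thread as example inputs:

```cpp
#include <cstdint>

// Share of CPU time, in permille, spent on one forced context switch per tick.
constexpr uint32_t tick_overhead_permille(uint32_t switch_us, uint32_t tick_us) {
  return switch_us * 1000UL / tick_us;
}
```

For a 30 us switch and a 1000 us tick this gives 30 permille (3%). Shortening the tick to 100 us would raise it to 300 permille (30%).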

 


Here are some tests with FreeRTOS, a well-known preemptive multitasking RTOS.

6 tasks:

1) Set semaphore

2) Wait for semaphore and react

3, 4, 5, 6) Set a marker (faster than incrementing a number)

 

1. Test without cooperation, just using preemption:

 

As you can see: every task is guaranteed to be called every 4000 us - longer with more tasks.

That's not something I could use in some of my applications.

 

But as some have said, it could be much better to yield a task at specific points, which is nothing else but BEING COOPERATIVE.

This 2nd example uses this option:

 

Timing is much better. You can guarantee that every task gets time to run every 486 us, when ALL tasks are cooperative like this.

It's not a preemptive OS any longer, because it never does a task switch on its own. Why not?

Because the cooperative taskYIELD() is called so often to improve speed.

Of course, adding more tasks or more code inside a task would change the timing, depending on how cooperative the tasks are.

No I/O is done here.

I come back to my position:

If I have to spread taskYIELD() through the program to be fast enough using COOPERATION, it is nothing else but CO.

In this case it is slower than my examples.

As far as I can see my examples are much faster.

(Here: NO counting at all, just reacting to about 2000 semaphores/s)

Remember:

 

2500 timer1 interrupts setting semaphores, which are counted in reaction, like task 1 (here there are 2000)

2500 semaphore flags counted in reaction, like task 2 (here there are 2000)

 

5000 timer2 interrupts incrementing a counter inside the IRQ (does not exist here)

another task to count those 5000 semaphores from that task (does not exist here)

incrementing 4 counters (here just markers are set)

doing serial I/O (does not exist here)

 

To be fast enough, all tasks MUST be cooperative here. You have to do the same as I have done:

Put taskYIELD() (my internrunner()) into the sprintf, printf and serial I/O routines and into ALL other tasks!

 

Maybe someone tells you: "But with an RTOS a controlling task could be added to control the system."

I would answer: "I have a free Timer1, which is not used for ticks. It could be used for controlling and may press the 'alert button' much faster!"

 

This example is based on frBlink of FreeRTOS for Arduino.

I can upload the code if someone wants me to do so.

But maybe this is enough:

  while (1) {
      asm(".equ   PORTD,0x0B");
      asm(" sbi   PORTD,2");        // set Marker at pin 2
      asm(" cbi   PORTD,2");
      taskYIELD();
  }

 

But again - I might be wrong!

Give me examples ;)

 

 

 

Last Edited: Mon. Feb 9, 2015 - 05:00 AM

heweb wrote:

Here is  the normally used task switching sequence.

 

But that code will only work with some compilers (gcc?) as it only saves a subset of the full register array.

'This forum helps those who help themselves.'

 

pragmatic  adjective dealing with things sensibly and realistically in a way that is based on practical rather than theoretical consideration.


Your example is rather synthetic. The outcome is what I'd expect. Can you show us how you would use FatFs in a cooperative environment, sample the ADC at 1 kHz and write the data to a file on an SD card?


Right, but the missing parts that make up the system need much more time. The code was presented to estimate the time of the bare task switch, to get a feeling for the time needed.

What I meant was: being much faster than my 28 us would be very hard for any PE system using the same processor.

The FreeRTOS timings are taken from the real world and show longer times.

As you can see in the demo above: roughly 100 us for all.

 

 

 


In my example of a cooperative tasker, my code doesn't technically do a task switch, as all tasks run to completion and there is only one thread of execution (apart from ISRs), so a 'task switch' is just a function call - 3 cycles.
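A run-to-completion tasker of this kind can be sketched in a few lines. The task names and counters are hypothetical stand-ins, not the poster's code. Each task is an ordinary function that does a little work and returns, so no registers or stacks have to be saved between tasks.

```cpp
#include <cassert>
#include <cstddef>

using Task = void (*)();

// Stand-in tasks: on real hardware these would toggle a pin or poll the UART.
static unsigned led_toggles  = 0;
static unsigned serial_polls = 0;
void toggle_led()  { ++led_toggles; }
void poll_serial() { ++serial_polls; }

// The task table; the order in the table is the order of execution.
static const Task kTasks[] = { toggle_led, poll_serial };

// One pass of the scheduler: every task runs to completion in turn.
// A "task switch" here is nothing more than an indirect function call.
void run_all_tasks_once() {
  for (Task task : kTasks) {
    assert(task != nullptr);
    task();
  }
}
```

On an Arduino, loop() would simply call run_all_tasks_once() over and over. The price is that a single slow task delays all the others.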


Sure, I'll brush up on my German. 

If you're on Winlink, you can send to my Winlink address, or use my arrl.net address.

Jim TNX

 

 
