Forum Menu




 


Log in Problems?
New User? Sign Up!
AVR Freaks Forum Index

Post new topic   Reply to topic
View previous topic Printable version Log in to check your private messages View next topic
Author Message
pruby
PostPosted: Jun 07, 2009 - 03:55 PM
Newbie


Joined: Oct 02, 2008
Posts: 11


This is my first tutorial here so please be nice. I'm pretty new to AVRs, not so new to programming. I've fiddled with AVRs a few times over the last couple of years but only recently got UART <-> RS232 communications up and running. As part of my first project doing that I had a need to interpret strings sent over the UART. Naturally, I did this the complicated way.

I guess the first question to answer is why you might need to do this. The obvious one is the example given above - receiving some form of commands in a text-based format via the UART. It's possibly overkill for simple command parsing but I really wanted to test and prove the feasibility of this technique.

There seems to be a common conception that small microcontrollers just aren't fast enough or have enough memory to handle a `proper' parser. A quick search seems to indicate that people either use basic string comparisons or write small hand parsers. While it's true that most parser generators assume the availability of many KB of memory and as a result don't suit micros, hand-made parsers can also be extremely difficult to change and debug and often turn out to be less efficient than machine-generated ones.

Techniques used frequently on PCs such as using the stack heavily (e.g. recursive descent parsers), pushing and popping tokens on and off possibly large stacks (e.g. Lex/YACC) and using large tables in memory to control the parser don't fit well at all with microcontroller resource limits.

A while ago I came across a state machine compiler and parser generator called Ragel ( http://www.complang.org/ragel/ ), which has much lower requirements to run. It's stable, well-documented and most importantly for us a couple of its output modes are great for low-memory systems. The disadvantage is that it can be very complicated at first and almost lost me several times. I've since learned to love it though, which is why I decided to write this to take people through its capabilities, quirks and potential stumbling points.

To illustrate the capabilities of the system I will go through from start to finish writing an interpreter for a tiny scripting language designed to run on an atmega8 chip. All examples are using avr-gcc but should be fairly easy to port - ragel adds no dependencies to the final code.

I assume readers have good experience with C programming and understand subjects such as pointers. I also assume readers will be able to install ragel and can enter commands on the command line. Windows users can obtain ragel via the Cygwin project installer and use it via the BASH shell that is installed with it. The details of input and output to the parser will not be covered - this sort of parser could be plugged in to any communication medium.

Note on versions of ragel - this article was written for ragel 6.0. Apparently some older versions may not support all of the features used here. In particular 5.3 has been identified as not supporting some options used.


Last edited by pruby on Sep 02, 2009 - 12:27 AM; edited 1 time in total
 
 View user's profile Send private message  
Reply with quote Back to top
pruby
PostPosted: Jun 07, 2009 - 03:56 PM
Newbie


Joined: Oct 02, 2008
Posts: 11


Firstly, let's go over our basic layout of the module for our parser (microscript.rl):
Code:

#include <avr/io.h>
#include <inttypes.h>

%%{
  machine microscript;
 
  main := "";
}%%

%% write data;

static uint8_t cs; /* The current parser state */

void init_microscript( void ) {
  %% write init;
}

void parse_microscript(const char* p, uint16_t len, uint8_t is_eof) {
  const char* pe = p + len; /* pe points to 1 byte beyond the end of this block of data */
  char* eof = is_eof ? pe : ((char*) 0); /* Indicates the end of all data, 0 if not in this block */
 
  %% write exec;
}


This is a lot to take in at once - I'll just cover the basic points. When we write Ragel parsers we embed special instructions inside the host language, in this case C. A Ragel parser requires a machine (parser) definition, a block of any constants and tables needed to operate the machine, an initialisation section and an execution section. The only variable Ragel itself needs to retain between calls to know where it is the "cs" variable - the current state of the parser.

The function init_microscript will need to be called once before using the interpreter. It can then be called again later to reset it. The parse_microscript function can be called repeatedly with chunks of text to be parsed, for example loaded in chunks from EEPROM or an external source. It's perfectly fine to split half way through a line or any other unit of text - ragel will keep its place in the cs variable and pick up where it left off. It can even be run with single characters at a time, though this suffers quite a large overhead.

At the moment our machine contains nothing but we can already build it.

Code:

ragel -G2 -o microscript.c microscript.rl


This will take the file above, microscript.rl, and produce a C source file in microscript.c . The C source file will contain all our instructions from the previous file, with generated code filling in the various blocks where we told it. This file will already look pretty unreadable. I highly recommend taking a look but don't worry if it all looks like line noise - it certainly does to me. There are various ways of generating the parser. For now the -G2 (fully inlined GOTO-driven parser) mode will be the best for our purpose.

So, let's actually allow this machine to parse something now. The easiest thing to start with is numbers:
Code:

%%{
  machine microscript;
 
  number = digit+;
  whitespace = space+;
 
  main := number (whitespace number)*;
}%%


Ragel machine definitions are written as a series of available expressions made up by combining other expressions and raw inputs using operators very similar to regular expressions. digit+ means "one or more of digit", where digit is a built-in definition covering the characters 0-9. space is a built in definition covering all forms of spacing characters - spaces, tabs, new lines, etc. Finally we declare a machine - a fully set up parser declared with ":=". If we build this now we will have a fully functional machine which matches a series of numbers.

A good way of visualising what's going on is to use Ragel's support for creating GraphViz-compatible files. To generate a graph of the parser created you can run (requires GraphViz):
Code:

ragel -Vp -o microscript.dot microscript.rl
dot -Tps -o microscript.ps microscript.dot


Which should produce something like:

Ragel expresses parsers as a finite state machine, with each character coming in driving a transition from one numbered state to another. The states of the machine are illustrated as circles and the transitions between them as arrows. The circle on the left with the IN arrow leading to it is where the machine starts. If in this state it sees a digit it would change to state 3. The transitions are all labelled with the characters that will cause it to take that path. If a character is found which doesn't match a path out of the current state it would signal an error.
 
 View user's profile Send private message  
Reply with quote Back to top
pruby
PostPosted: Jun 07, 2009 - 03:56 PM
Newbie


Joined: Oct 02, 2008
Posts: 11


We don't yet have any means of storing or using those numbers - the parser forgets about the digits as soon as it passes them. We need to add a method of storing them. Firstly let's create a global variable to store each number we come across.

Code:

static uint16_t currentNumber;


Now we need to actually record the numbers as we find them. Ragel lets us embed actions in the transitions in and out of parts of our machine. Let's take a look at how we would write those actions:

Code:

%%{
  machine microscript;
 
  action ClearNumber {
    currentNumber = 0;
  }
 
  action RecordDigit {
    uint8_t digit = (*p) - '0';
    currentNumber = (currentNumber * 10) + digit;
  }

  number = ((digit @RecordDigit)+) >ClearNumber;
  whitespace = space+;
 
  main := number (whitespace number)*;
}%%


This is entirely new so will need some explanation. The action blocks define blocks of C code to run when the action is triggered. We then embed indicators in the expressions of when these actions should be run. To make things as clear as possible I've bracketed off the bits of number where this is attached. The ">" character indicates an entering action. This means that when we encounter the first digit that starts a number this action is performed. We set the current number to 0 so we can start adding in the digits. The '@' character indicates a finishing action. That is it will be run when we have a complete "digit" expression - as we encounter each digit. This action shifts the current number up a digit and adds the current digit in the last place. We use the pointer p which ragel steps through the character block to access the current character.

Ragel usually gets the actions in the order that would be expected. As always, the best way to check is with GraphViz:



This is the same as before but with actions written on the transitions where they will be performed.

EXERCISE: For those keen to experiment, try copying and adjusting this technique so that we also count the number of spaces between numbers in a separate variable. Make sure to set it back to 0 at the start of each block of white space. Hint: instead of reading a digit you'll just be incrementing.


Last edited by pruby on Jun 08, 2009 - 01:49 AM; edited 1 time in total
 
 View user's profile Send private message  
Reply with quote Back to top
pruby
PostPosted: Jun 07, 2009 - 03:58 PM
Newbie


Joined: Oct 02, 2008
Posts: 11


Now let's start fleshing out the language a bit. We can read numbers and they can be separated by spaces but that isn't really much use to us. Let's add some variables to the system and actually store our numbers. To keep things simple I'm going to give the language 26 variables called the fixed names $a through $z. Let's have a look at the definition of variable:

Code:

static uint8_t currentVariable;
static uint16_t variableValues[26];


Code:

  action RecordVariable {
    currentVariable = (*p) - 'a';
  }
 
  var = ('$' [a-z] @RecordVariable);


We set up a place to record the variable as we see it and a table to store the values. Now that we have the definition, we can change the syntax to make our program a series of assignments to variables. It's been a while and quite a lot has changed so here's the complete machine + globals:

Code:

static uint16_t currentNumber;
static uint8_t currentVariable, assignmentVariable;
static uint16_t variableValues[26];

%%{
  machine microscript;
 
  action ClearNumber { currentNumber = 0; }
  action RecordVariable { currentVariable = (*p) - 'a'; }
  action ReadVariable { currentNumber = variableValues[currentVariable]; }
  action SetAssignmentVariable { assignmentVariable = currentVariable; }
  action AssignValue { variableValues[assignmentVariable] = currentVariable; }
 
  action RecordDigit {
    uint8_t digit = (*p) - '0';
    currentNumber = currentNumber * 10 + digit;
  }
 
  var = ('$' [a-z] @RecordVariable);
  number = ((digit @RecordDigit)+) >ClearNumber;
 
  # A value is a variable or a number. If a variable is given as a value, read the number it represents.
  value = (var @ReadVariable) | number;
 
  # An assignment looks like "$a = 3" or "$b=$c"
  assignment = (var @SetAssignmentVariable space* '=' space* value) @AssignValue;
 
  # We're going to use a semicolon on the end of commands to make the boundaries clear
  command = space* assignment space* ';';
 
  # A program can consist of any number of commands
  main := command* space*;
}%%


I won't explain all the C code here - I'll leave it to you to see how that side of it works and will stick to explaining the Ragelisms. The only unusual thing is that since the currentVariable value is overwritten each time we encounter one and we can use variables as values we need to save a copy of the variable to save to in its own location.

One thing to observe is the use of "|" to match one of two machines. This allows us to say either a variable or a number goes here. Based on the text it will try to match both of these and will execute actions for whichever it appears to be. There are a few things to be careful of when combining expressions like this as until it's sure which way to go it can run actions for both possibilities. This doesn't affect our syntax as it's clear whether we have a number or variable from the very first character, so we're safe here.


Last edited by pruby on Feb 10, 2012 - 05:50 AM; edited 6 times in total
 
 View user's profile Send private message  
Reply with quote Back to top
pruby
PostPosted: Jun 07, 2009 - 04:05 PM
Newbie


Joined: Oct 02, 2008
Posts: 11


The graphs of the machine are getting too big for the page now - take a look at http://www.goddard.net.nz/~tim/files/tut01/figure_assign.gif. Notice the whitespace-eating loops and the loop back to the start as it waits for another command. Also notice an intriguing quirk of this design - it doesn't wait for the number to complete before assigning and will keep updating the variable on each new digit. This is one of the major quirks and traps of Ragel - an expression such as a number is considered complete if what is collected so far makes a valid number and completion actions may be run multiple times. In this case it would do no damage but let's get rid of it anyway:
 
 View user's profile Send private message  
Reply with quote Back to top
pruby
PostPosted: Jun 07, 2009 - 04:34 PM
Newbie


Joined: Oct 02, 2008
Posts: 11


I've encountered a small forum issue with percent signs. Please substitute `/, for a percent sign in the code samples if you want to use them.

Code:

  assignment = (var @SetAssignmentVariable space* '=' space* value) `/,AssignValue;


We change the @ - the 'finished' action to a percent sign - the 'leaving' action. This will be run on the first character that follows the expression. In this case it ensures that assignment happens after the number is fully entered, not after each digit. The result is http://www.goddard.net.nz/~tim/files/tut01/figure_assign_once.gif.

So that's pretty boring - let's add IO.
 
 View user's profile Send private message  
Reply with quote Back to top
pruby
PostPosted: Jun 07, 2009 - 04:35 PM
Newbie


Joined: Oct 02, 2008
Posts: 11


Code:

static uint8_t currentPort, outPort;

Code:

  action RecordPort { currentPort = (*p) - 'A'; }
  action RecordOutPort { outPort = currentPort; }
  action SetPortValue {
    switch(outPort) {
      case 1:
        PORTB = (uint8_t) (currentNumber & 255);
        break;
      case 2:
        PORTC = (uint8_t) (currentNumber & 255);
        break;
      case 3:
        PORTD = (uint8_t) (currentNumber & 255);
        break;
    }
  }
  action ReadPortValue {
    switch(currentPort) {
      case 1:
        currentNumber = PINB;
        break;
      case 2:
        currentNumber = PINC;
        break;
      case 3:
        currentNumber = PIND;
        break;
    }
  }
  action SetPortDirection {
    switch(outPort) {
      case 1:
        DDRB = (uint8_t) (currentNumber & 255);
        break;
      case 2:
        DDRC = (uint8_t) (currentNumber & 255);
        break;
      case 3:
        DDRD = (uint8_t) (currentNumber & 255);
        break;
    }
  }
 
  port_in = "PIN" [B-D] @RecordPort;
  port_out = "PORT" [B-D] @RecordPort;
  port_direction = "DDR" [B-D] @RecordPort;
 
  value = (var @ReadVariable) | number | port_in @ReadPortValue;
 
  assignment = (var @SetAssignmentVariable space* '=' space* value) `/,AssignValue;
  set_out = (port_out @RecordOutPort space* '=' space* value) `/,SetPortValue;
  set_ddr = (port_direction @RecordOutPort space* '=' space* value) `/,SetPortDirection;
 
  command = space* (assignment | set_out | set_ddr) space* ';';


OK... that just grew a lot. We've added special variables for ports, both input and output. These are treated as separate possible commands and execute their own actions to write and read data from the AVR ports. I copied the naming used in the C code itself so this should look pretty familiar. We could now write things like this in our micro-language:

Code:

  $a = PINB;
  PORTC = $a;
  PORTD = PINC;
 
 View user's profile Send private message  
Reply with quote Back to top
pruby
PostPosted: Jun 07, 2009 - 04:35 PM
Newbie


Joined: Oct 02, 2008
Posts: 11


The state machine is now much larger. An image can be seen at http://www.goddard.net.nz/~tim/files/tut01/figure_io.gif. Hmm, this is actually starting to look like it could do something if not useful at least visible. To make it really useful we need some way to alter information. I know - let's try adding operators.

Code:

#define OP_ADD 0
#define OP_SUB 1
#define OP_MUL 2
#define OP_AND 3
#define OP_OR 4
#define OP_XOR 5
#define OP_SHIFTL 6
#define OP_SHIFTR 7

static uint8_t currentOp;
static uint16_t leftValue;

Code:

  action SaveLeftValue { leftValue = currentNumber; }
 
  action ApplyOperator {
    switch(currentOp) {
      case OP_ADD:
        currentNumber = leftValue + currentNumber;
        break;
      case OP_SUB:
        currentNumber = leftValue - currentNumber;
        break;
      case OP_MUL:
        currentNumber = leftValue * currentNumber;
        break;
      case OP_AND:
        currentNumber = leftValue & currentNumber;
        break;
      case OP_OR:
        currentNumber = leftValue | currentNumber;
        break;
      case OP_XOR:
        currentNumber = leftValue ^ currentNumber;
        break;
      case OP_SHIFTL:
        currentNumber = leftValue << currentNumber;
        break;
      case OP_SHIFTR:
        currentNumber = leftValue >> currentNumber;
        break;
    }
  }
 
  infix_op = (
    '+' @{currentOp = OP_ADD;}
    | '-' @{currentOp = OP_SUB;}
    | '*' @{currentOp = OP_MUL;}
    | '&' @{currentOp = OP_AND;}
    | '|' @{currentOp = OP_OR;}
    | '^' @{currentOp = OP_XOR;}
    | '<<' @{currentOp = OP_SHIFTL;}
    | '>>' @{currentOp = OP_SHIFTR;}
  );
 
  valueOrOp = value ((infix_op @SaveLeftValue value) `/,ApplyOperator)?;
 
  # An assignment looks like "$a = 3" or "$b=$c". Port output looks similar.
  assignment = (var @SetAssignmentVariable space* '=' space* valueOrOp) `/,AssignValue;
  set_out = (port_out @RecordOutPort space* '=' space* valueOrOp) @SetPortValue;
  set_ddr = (port_direction @RecordOutPort space* '=' space* valueOrOp) @SetPortDirection;


This is now a very minimal but working scripting language. It supports basic arithmetic and I/O. It doesn't support any form of conditional statements, functions or branching yet but if the source of the script supports seeking those could be added fairly easily. Incredibly the Ragel parser generated uses only 1 byte of memory in globals. Including the space for variables the interpreter uses 62 bytes of memory.

The parser compiled as above takes up just short of 4.5KB of flash. That amount of flash isn't really necessary though. Most of that is from repeating those actions throughout the code wherever they need to be run. We can change the -G2 option on the ragel command line to -G1 and it will consolidate them all in one place. This will reduce the flash requirements to under 3KB, at the cost of running very slightly slower. Sometimes this mode will also set up a small table in memory (it decided not to this time for me) - check the size of your compiled executables and decide which option to pick based on that. You could also let it generate the table then move that in to program memory or EEPROM if both space and memory are both at a premium. The other modes are largely unsuitable for machines with little memory.

A complete copy of the code developed through this tutorial has been attached. If you made it this far, congratulations. Please let me know if you notice any mistakes in this tutorial or ask for clarification on any points. Ragel can be very difficult to pick up but as you can see, the results can also be quite spectacular and it can make difficult things very easy.
 
 View user's profile Send private message  
Reply with quote Back to top
pruby
PostPosted: Jun 07, 2009 - 04:41 PM
Newbie


Joined: Oct 02, 2008
Posts: 11


I should say - the graph of the final state machine (large file - ~200K ) is http://goddard.net.nz/~tim/files/tut01/figure_ops.gif.
 
 View user's profile Send private message  
Reply with quote Back to top
thobbins
PostPosted: May 06, 2010 - 10:57 PM
Newbie


Joined: Jan 11, 2006
Posts: 4


Hi pruby,

Thanks to your tutorial I have discovered Ragel and reimplemented my hand-coded FSM code on my ATMEGA 32. Much easier!

Cheers
Tony
 
 View user's profile Send private message  
Reply with quote Back to top
coffee_fan
PostPosted: Jul 14, 2010 - 09:46 PM
Newbie


Joined: Jul 14, 2010
Posts: 1


Pruby, this is a great "first" tutorial, it helped me to easily understand ragel and it's potential.

Thanks.
 
 View user's profile Send private message  
Reply with quote Back to top
ron_sb
PostPosted: Nov 19, 2010 - 12:35 AM
Wannabe


Joined: May 08, 2010
Posts: 51
Location: CA USA

Prior to reading this tutorial I had not heard of Ragel before.

For our hardware I needed to implement a telnet and web interface for configuration and status read-out. Both of course require some form of parser. I had started writing my own but soon came to the point where things started to get real ugly.

I spent half a day playing with Ragel and reading its manual and I must say, it's fantastic! It made implementing both the web server and the telnet interface a snap without using much memory!

Highly recommended.
 
 View user's profile Send private message  
Reply with quote Back to top
vitaly_v_ch
PostPosted: Nov 25, 2010 - 11:55 AM
Newbie


Joined: Jul 02, 2008
Posts: 7


*BIG THANKS*
 
 View user's profile Send private message  
Reply with quote Back to top
Clarkson
PostPosted: Jan 05, 2012 - 09:35 AM
Newbie


Joined: Jan 05, 2012
Posts: 1


Thanks to the recommendation, I have discovered Ragel and reimplemented my hand-coded FSM code on my ATMEGA 32. Much easier! Cheers cell phone spyware Clarkson


Last edited by Clarkson on Mar 05, 2012 - 06:33 AM; edited 2 times in total
 
 View user's profile Send private message  
Reply with quote Back to top
Galina
PostPosted: Feb 09, 2012 - 05:57 PM
Newbie


Joined: Feb 09, 2012
Posts: 1


Very helpful, thank you!
 
 View user's profile Send private message  
Reply with quote Back to top
JohanEkdahl
PostPosted: Feb 09, 2012 - 08:59 PM
10k+ Postman


Joined: Mar 27, 2002
Posts: 22189
Location: Lund, Sweden

Very interesting. The weekend just got another activity on its tentative schedule! Although I have just skimmed the firt two part it seems to be well written!

Quote:
I should say - the graph of the final state machine (large file - ~200K ) is http://goddard.net.nz/~tim/files/tut01/figure_ops.gif

Quote:

I get a 404 for both of those...

EDIT: Doh! I didn't even notice that the tutorials are several years old. I thought I browsed the Tutorials often enough to know what is here, but here I failed miserably.


Last edited by JohanEkdahl on Feb 09, 2012 - 09:21 PM; edited 2 times in total
 
 View user's profile Send private message Visit poster's website 
Reply with quote Back to top
clawson
PostPosted: Feb 09, 2012 - 09:16 PM
10k+ Postman


Joined: Jul 18, 2005
Posts: 71689
Location: (using avr-gcc in) Finchingfield, Essex, England

Quote:

I get a 404 for both of those...

That's what happens when people host tutorials off-site. I suppose it was 3 years ago and I suppose no one would have noticed if Galina had not just resurrected the thread.

_________________
 
 View user's profile Send private message  
Reply with quote Back to top
RickB
PostPosted: Feb 10, 2012 - 03:43 AM
Posting Freak


Joined: Jan 30, 2005
Posts: 1030
Location: Junction City, OR USA

https://github.com/pruby
http://www.goddard.net.nz/

This is why ALL tutorials should be hosted in their entirety on AVR FREAKS.

He does not mention Ragel on his website.

Rick

I emailed him.

His reply.

Quote:
Thanks for pointing out. Files missing from new server - will look in backups when get home.

Cheers,

Tim

Sent from my phone
 
 View user's profile Send private message  
Reply with quote Back to top
pruby
PostPosted: Feb 10, 2012 - 05:30 AM
Newbie


Joined: Oct 02, 2008
Posts: 11


Thanks for email Rick - images have been restored. What's the best way to store images on avrfreaks for use in tutorials like this? If can be stored and served as images like that, would be happy to edit posts to load from there.

Cheers,

Tim
 
 View user's profile Send private message  
Reply with quote Back to top
clawson
PostPosted: Feb 10, 2012 - 09:23 AM
10k+ Postman


Joined: Jul 18, 2005
Posts: 71689
Location: (using avr-gcc in) Finchingfield, Essex, England

Quote:

What's the best way to store images on avrfreaks for use in tutorials like this?

Here's a bit of a trick.

Go to the private messages section (for example click the "you have no new messages" link above). Send a PM to yourself, when in the message editor use "Attachement" to add up to 3 images to the post. Once sent open the PM you just got and highlight each picture in turn and "Copy Image Location". Over where you want to embed the pictures (Freaks in a second tab) past each URL in turn, highlight it and click the [IMG] button. Eh voila!

_________________
 
 View user's profile Send private message  
Reply with quote Back to top
kulans
PostPosted: May 11, 2013 - 11:26 AM
Newbie


Joined: May 11, 2013
Posts: 1


Clawson, that sounds like a very cool trick. I would never had come up with it myself. Thanks for sharing
 
 View user's profile Send private message  
Reply with quote Back to top
Display posts from previous:     
Jump to:  
All times are GMT + 1 Hour
Post new topic   Reply to topic
View previous topic Printable version Log in to check your private messages View next topic
Powered by PNphpBB2 © 2003-2006 The PNphpBB Group
Credits