A Hello World for Ruby on Ragel 6.0
January 13th, 2008
This is an updated version of this tutorial. This updated version is compatible with Ruby 1.8 and Ruby 1.9, and Ragel 6.0. A version of this tutorial in Portuguese is available here.
By the end of this post, you’ll be able to turn a simple string “h” into the much longer and more interesting string “hello world!” using the magic of Ragel, all from the comfort of Ruby. Ragel is a very powerful state machine compiler and parser generator, which is at the heart of software like Mongrel and Hpricot. It’s able to generate C, C++, Objective-C, D, Java or Ruby code.
Ragel has excellent documentation provided by the author. My goal here is just to give you some context so that the documentation “sticks” when you read it, and to give you a working example which you can modify as you explore Ragel’s functionality. If you want to skip ahead, the full example is here.
The first step, of course, is installing Ragel. The Ragel home page has a Download section which lists ports for various platforms. If you already have Ragel installed, check that the version is 6.0 or higher. You can also compile and install Ragel from the source. Even if you don’t want to install from source it’s handy to have a copy of it to get some examples to play with. The subversion repository for Ragel is located here:
http://svn.complang.org/ragel/trunk/
As usual, the test/ directory is your friend; also check out the examples/ directory. As per this thread, try searching for “LANG: ruby”.
When writing Ragel code, you create a file with a .rl extension. The .rl file is written in the “host” language, in this case Ruby, and the Ragel machine specification is embedded within the Ruby code using special delimiters. There’s actually no obligation to specify a state machine, so a perfectly valid .rl file is:
puts "hello world"
Don’t worry, I’m going to do a better Hello World than that, but this is a good place to start. To convert this .rl file into an executable .rb file, use the “ragel” command with a -R flag to indicate that you want Ruby code.
ragel -R hello_world.rl
This will create a file entitled hello_world.rb with the following contents:
# line 1 "hello_world.rl"
puts "hello world"
I’ll, er, leave executing that file as an exercise for the diligent student.
Ragel actually does this conversion in 2 stages. First it creates an XML file, then converts the XML to Ruby. If you want to view this intermediate XML then you can pass the -x flag in addition to the -R flag.
ragel -R -x simple_state_machine.rl > simple_state_machine.xml
Now, let’s write some actual Ragel. Start a new .rl file or download the example and read along. We’re going to create a machine which prints “hello world!” when it’s passed the string “h”, and does nothing otherwise. To indicate to the ragel compiler that we are writing instructions for it, and not Ruby code, we need to place our Ragel code between the delimiters %%{ and }%%; a single-line instruction can instead be prefixed with %%. (See page 6 of the User Guide.) Here’s our state machine specification:
%%{
machine hello;
expr = "h";
main := expr @ { puts "hello world!" } ;
}%%
A quick overview of what’s happening here. The name of this machine is “hello” (Ragel makes us name it). It recognizes a single token, the string “h”. When it encounters that token, it performs (in Ruby) the action:
puts "hello world!"
Now, if you were to run the ragel command on this file it would compile, but you would basically end up with a blank Ruby file. We have only specified the machine; we also have to tell Ragel to actually translate this machine into Ruby code using Ragel’s write statements. The first write statement we need to add is
%% write data;
If you add this line after the state machine definition block, it will compile, as long as you remember to add a blank line afterwards. (After you’ve worked with parsers for a while you come to appreciate newlines in a whole new way.) After adding this line and compiling, you should have a rather significant Ruby file with lots of class << self statements all generated by Ragel. You don't need to study this code, at least not right now. It's pretty dull and ugly. And, if you run the ruby file at this point, you won't see any output.
There are 2 more write statements to add, and for convenience we're going to place them within a ruby method. The argument to this method is going to be the string we want to parse. Ragel expects to find a variable named "data" containing an array of ASCII codes, so we will need to convert our string to an array. This is done very easily in Ruby using the unpack method.
def run_machine(data)
data = data.unpack("c*") if data.is_a?(String)
%% write init;
%% write exec;
end
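If you want to see what that unpack call hands over to the parser, here is a quick sketch in plain Ruby (no Ragel involved). The sample string "hello" and the variable name codes are just my choices for illustration.

```ruby
# Convert a String into an array of integer character codes, which is the
# shape of input the run_machine method above prepares.
codes = "hello".unpack("c*")   # "c" = signed 8-bit integer, "*" = every byte
puts codes.inspect             # [104, 101, 108, 108, 111]

# pack is the inverse operation and rebuilds the original String.
puts codes.pack("c*")          # hello

# Caveat: "c" is signed, so bytes above 127 come back negative.
# "C*" would give unsigned values instead.
puts "\xFF".unpack("c*").inspect   # [-1]
puts "\xFF".unpack("C*").inspect   # [255]
```

For plain ASCII input like ours the two directives give the same result, so the tutorial's "c*" is fine here.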
write init tells Ragel that we want to generate initialization code for the state machine. The code Ragel generates here is:
begin
p ||= 0
pe ||= data.length
cs = hello_start
end
The variable p keeps track of which character in the data string we are currently parsing, starting at 0. pe is an upper limit for p. cs stores the current state of the state machine, and here it is initialized to the starting state of the state machine. These variables are discussed in the User Guide.
write exec tells Ragel to write the meat of the parser (finally!). The code generated here will actually take an input (the data argument) and determine what the state of the system should be based on that input, executing any actions which might be triggered along the way. Let's add some puts statements so we can follow the code execution.
def run_machine(data)
data = data.unpack("c*") if data.is_a?(String)
puts "Running the state machine with input #{data}..."
%% write init;
%% write exec;
puts "Finished. The state of the machine is: #{cs}"
puts "p: #{p} pe: #{pe}"
end
Just add 2 more lines at the end to call run_machine with various arguments and then we can actually compile and run our state machine.
run_machine "h"
run_machine "x"
And here we go...
Running the state machine with input 104...
hello world!
Finished. The state of the machine is: 2
p: 1 pe: 1
Running the state machine with input 120...
Finished. The state of the machine is: 0
p: 0 pe: 1
It worked! Now, to help us interpret the values of p, pe and cs let's take a look at the state chart of this state machine. Ragel has built-in Graphviz support to create state charts. We need to use the -V flag instead of -R.
ragel -V simple_state_machine.rl > simple_state_machine.dot
If you render the resulting simple_state_machine.dot file in Graphviz, you should get something like this:

We can see that the state machine has only one possible transition, from state 1 to state 2. When we passed "h" as the parameter to run_machine we did indeed end up with the variable cs (current state) equal to 2 at the end of our run. When "x" was passed, we ended up with cs = 0. 0 is the error state, indicating that an error occurred in the state machine. (You can tell that 0 is the error state by reading some of the variable assignments generated by write data, the code I said was dull and ugly.)
In the label 104/4:18 over the arrow transitioning from state 1 to state 2, the 104 corresponds to the ASCII code for the letter "h". (Type "?h" in irb under Ruby 1.8, or "h".ord under Ruby 1.9, where ?h returns a one-character string.) The / indicates that an action is being performed, and 4:18 tells us that the action starts at line 4, column 18 of the .rl file. Had we given our action a name, that would have appeared here instead of the file position.
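To make the roles of p, pe and cs a bit more concrete, here is a hand-written sketch of the same one-transition machine in plain Ruby. This is not the code Ragel generates: the constant names, the transition table and the simulate method are my own assumptions, and the state numbers are simply the ones seen in the output above.

```ruby
# Hand-written sketch of the "hello" machine. This is NOT what Ragel emits;
# the names and the table layout below are assumptions for illustration.
HELLO_START = 1        # state the machine begins in
HELLO_FIRST_FINAL = 2  # the only accepting state
HELLO_ERROR = 0        # state entered on unexpected input

# { current_state => { character_code => next_state } }
TRANSITIONS = { HELLO_START => { 104 => HELLO_FIRST_FINAL } }

def simulate(input)
  data = input.unpack("c*")
  p = 0               # index of the character being examined
  pe = data.length    # one past the last valid index
  cs = HELLO_START    # current state
  printed = []
  while p < pe
    target = (TRANSITIONS[cs] || {})[data[p]]
    if target.nil?
      cs = HELLO_ERROR   # no arrow for this character: error state
      break              # p is left pointing at the offending character
    end
    cs = target
    printed << "hello world!"   # the action attached to the only transition
    p += 1
  end
  [cs, p, pe, printed]
end

["h", "x", "hh"].each do |input|
  cs, pos, pe, printed = simulate(input)
  printed.each { |line| puts line }
  puts "#{input.inspect}: cs=#{cs} p=#{pos} pe=#{pe}"
end
```

With "h" the sketch ends in state 2 with p equal to pe, and with "x" it ends in state 0 with p still at 0, which matches the real run. With "hh" it stops at p = 1 because state 2 has no outgoing arrow.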
By the way, here's the (TextMate-specific) shell script I use to run all these steps quickly:
ragel -R simple_state_machine.rl
ragel -V simple_state_machine.rl > simple_state_machine.dot
dot -Tpng simple_state_machine.dot > simple_state_machine.png
open simple_state_machine.png
ruby simple_state_machine.rb > simple_state_machine.out
mate simple_state_machine.out
Now, try running this code:
run_machine "hh"
You should get:
Running the state machine with input 104104...
hello world!
Finished. The state of the machine is: 0
p: 1 pe: 2
You don't get "hello world!" twice. Sorry. Our state machine is only looking at the first character we pass. It knows we gave it two characters (the variable pe = 2), but after it evaluates the first character it's in a final state. There's no arrow coming out of the state 2 circle. So, passing additional input results in the system entering the error state. If we want the entire data string to be evaluated, we need to make a small change to our machine specification.
main := expr+ @ { puts "hello world!" } ;

(Try expr* instead of expr+ and see how the state chart is different.)
Now, try running this new state machine with inputs "hhh" and "hxh":
Running the state machine with input 104104104...
hello world!
hello world!
hello world!
Finished. The state of the machine is: 2
p: 3 pe: 3
Running the state machine with input 104120104...
hello world!
Finished. The state of the machine is: 0
p: 1 pe: 3
When we pass "hhh", we get a "hello world!" for each "h". When we pass "hxh", we get the first "hello world!", but when we hit the "x" we enter the error state, so the last "h" doesn't get evaluated.
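One way to picture the difference between expr and expr+ is as a single extra arrow: a loop from state 2 back to itself on "h". The sketch below is plain Ruby, not Ragel output. The run_table helper and the SINGLE and PLUS tables are hypothetical names of mine, and the state numbers are assumed from the output shown above.

```ruby
# Table-driven sketch comparing the two machines. Everything here is
# hand-written for illustration; Ragel's generated code looks different.
H = "h".unpack("c*").first            # 104

SINGLE = { 1 => { H => 2 } }                    # main := expr
PLUS   = { 1 => { H => 2 }, 2 => { H => 2 } }   # main := expr+ (adds a self-loop)

def run_table(transitions, input)
  data = input.unpack("c*")
  p = 0
  pe = data.length
  cs = 1            # start state
  fired = 0         # how many times the action would have run
  while p < pe
    target = (transitions[cs] || {})[data[p]]
    if target.nil?
      cs = 0        # error state; stop without consuming the character
      break
    end
    cs = target
    fired += 1
    p += 1
  end
  [cs, p, pe, fired]
end

[[SINGLE, "hh"], [PLUS, "hhh"], [PLUS, "hxh"]].each do |table, input|
  cs, pos, pe, fired = run_table(table, input)
  puts "#{input}: action fired #{fired} time(s), cs=#{cs} p=#{pos} pe=#{pe}"
end
```

The only change between the two tables is the extra entry for state 2, which is enough to turn one "hello world!" into one per "h".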
Here's one more example, this time without defining a run_machine method:
%%{
machine hello_and_welcome;
main := ( 'h' @ { puts "hello world!" }
| 'w' @ { puts "welcome" }
)*;
}%%
data = 'whwwwwhw'.unpack("c*")
%% write data;
%% write init;
%% write exec;

welcome
hello world!
welcome
welcome
welcome
welcome
hello world!
welcome
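For comparison, here is roughly what that machine does, written by hand in plain Ruby. It is only a sketch under my own assumptions (the ACTIONS table and the greet method are hypothetical names), and it stops at the first unrecognized character, which is how I'd expect the error state to behave.

```ruby
# Plain-Ruby approximation of the hello_and_welcome machine.
# Maps each recognized character code to the line its action prints.
ACTIONS = {
  "h".unpack("c*").first => "hello world!",
  "w".unpack("c*").first => "welcome"
}

def greet(input)
  lines = []
  input.unpack("c*").each do |code|
    message = ACTIONS[code]
    break if message.nil?   # unknown character: stop, like entering the error state
    lines << message
  end
  lines
end

greet('whwwwwhw').each { |line| puts line }
```

A hash lookup is enough here because the machine has a single looping state; anything with more structure is where Ragel starts to earn its keep.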
So, there you go. Hours of entertainment await you. We've only scratched the surface of Ragel's features here, but you should now be able to navigate through the User Guide without too much trouble. If you need a better reason than "fun" to play with Ragel, bear in mind that parsers are a great tool for constructing Domain Specific Languages (DSLs), and that state machines are magic code-shrinking machines for situations where you need to keep track of the, er, state of something and control the transitions between states (i.e. business logic). I highly recommend reading this article about Ragel, which inspired me to check it out. If you're into Rails, take a look at the acts_as_state_machine plugin, which might be more intuitive than Ragel at first. If the DSL angle is more your cup of tea, you might want to look at ANTLR instead, which has a different focus and feature set than Ragel.

September 10th, 2008 at 1:47 pm
Hi! i want to translate this to portuguese. there’s some problem on it?
May 20th, 2009 at 1:58 am
Hi, the images are broken as per 20th May 2009.
July 3rd, 2009 at 2:49 pm
Images fixed and links updated for the new complang.org home of Ragel.
September 13th, 2009 at 10:23 am
Similar to the ‘run_machine’ method, the later example without that method should have:
data = 'whwwwwhw'.unpack("c*")
This works for me in these environments:
ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-darwin9.7.0]
jruby 1.3.1 (ruby 1.8.6p287) (2009-06-15 2fd6c3d) (Java HotSpot(TM) Client VM 1.5.0_20) [i386-java]
Thanks, Ana, for the nice examples!
ml
July 15th, 2011 at 4:07 pm
Good work with the tutorial! Cheers!
August 6th, 2011 at 8:17 pm
very nice, thanks for the write-up!