A Hello World for Ruby on Ragel
December 13th, 2007
UPDATE: Please read the updated version of this tutorial, which is compatible with Ruby 1.8 and Ruby 1.9, and Ragel 6.0. The examples here are compatible with Ruby 1.8 and Ragel 5.24.
By the end of this post, you’ll be able to turn a simple string “h” into the much longer and more interesting string “hello world!” using the magic of Ragel, all from the comfort of Ruby. Ragel is a very powerful state machine compiler and parser generator, which is at the heart of software like Mongrel and Hpricot. It’s able to generate C, C++, Objective-C, D, Java or Ruby code.
Ragel has excellent documentation provided by the author. My goal here is just to give you some context so that the documentation “sticks” when you read it, and to give you a working example which you can modify as you explore Ragel’s functionality. If you want to skip ahead, the full example is here on pastie.
The first step, of course, is installing Ragel. The Ragel home page has a Download section which lists ports for various platforms. If you already have Ragel installed, check that the version is 5.24 or higher. There are significant differences in the Ruby code generation even between versions 5.22 and 5.24. You can also compile and install Ragel from the source. Even if you don’t want to install from source it’s handy to have a copy of it to get some examples to play with. The subversion repository for Ragel is located here:
svn://mambo.cs.queensu.ca/ragel/trunk/
As usual, the test/ directory is your friend; also check out the examples/ directory. As suggested in this thread, try searching for “LANG: ruby”.
When writing Ragel code, you create a file with a .rl extension. The .rl file is written in the “host” language, in this case Ruby, and the Ragel machine specification is embedded within the Ruby code using special delimiters. There’s actually no obligation to specify a state machine, so a perfectly valid .rl file is:
puts "hello world"
Don’t worry, I’m going to do a better Hello World than that, but this is a good place to start.
It’s a 2-step process to get from the .rl file to an executable .rb file. First, the “ragel” command creates some intermediate XML code. By default the command assumes you want to create a C file, so if you want Ruby code you need to add the -R flag:
ragel -R hello_world.rl
This will produce XML that wraps the original Ruby line:
puts "hello world"
The rlgen-ruby command then produces ruby code from this XML. So, either:
ragel -R hello_world.rl > hello_world.xml
rlgen-ruby hello_world.xml
or:
ragel -R hello_world.rl | rlgen-ruby
Either way, you get a file named hello_world.rb with the following contents:
# line 1 "hello_world.rl"
puts "hello world"
I’ll, er, leave executing that file as an exercise for the diligent student.
Now, let’s write some actual Ragel. Start a new .rl file or download the example and read along. We’re going to create a machine which prints “hello world!” when it’s passed the string “h”, and does nothing otherwise. To indicate to the Ragel compiler that we are writing instructions for it, and not Ruby code, we need to place our Ragel code between %%{ and }%% (double percent sign plus curly bracket). A single-line instruction only needs a leading %%. (See page 6 of the User Guide.) Here’s our state machine specification:
%%{
machine hello;
expr = "h";
main := expr @ { puts "hello world!" } ;
}%%
A quick overview of what’s happening here. The name of this machine is “hello” (Ragel makes us name it). It recognizes a single token, the string “h”. When it encounters that token, it performs (in Ruby) the action:
puts "hello world!"
Now, if you were to run the ragel command on this file it would compile, but you would basically end up with a blank Ruby file. So far we have only specified the machine. We also have to tell Ragel to actually translate this machine into Ruby code, using Ragel’s write statements. The first write statement we need to add is
%% write data;
If you add this line after the state machine definition block, it will compile, as long as you remember to add a blank line afterwards. (After you’ve worked with parsers for a while you come to appreciate newlines in a whole new way.) After adding this line and compiling, you should have a rather significant Ruby file with lots of class << self statements all generated by Ragel. You don't need to study this code, at least not right now. It’s pretty dull and ugly. And, if you run the ruby file at this point, you won’t see any output.
There are 2 more write statements to add, and for convenience we're going to place them within a Ruby method with an argument named "data". Ragel insists that the input to the state machine, the string to be parsed, is stored in a variable called "data", and a method argument is a natural way to define such a variable. The User Guide actually says that "data" should be an array. You can pass an array if you wish, but it must be an array of integers representing ASCII codes, not an array of characters.
def run_machine(data)
%% write init;
%% write exec;
end
write init tells Ragel that we want to generate initialization code for the state machine. The code Ragel generates here is:
begin
p ||= 0
pe ||= data.length
cs = hello_start
end
The variable p keeps track of which character in the data string we are currently parsing, starting at 0. pe is an upper limit for p. cs stores the current state of the state machine, and here it is initialized to the starting state of the state machine. These variables are discussed in the User Guide.
write exec tells Ragel to write the meat of the parser (finally!). The code generated here will actually take an input (the data argument) and determine what the state of the system should be based on that input, executing any actions which might be triggered along the way. Let’s add some puts statements so we can follow the code execution.
def run_machine(data)
puts "Running the state machine with input #{data}..."
%% write init;
%% write exec;
puts "Finished. The state of the machine is: #{cs}"
puts "p: #{p} pe: #{pe}"
end
Just add 2 more lines at the end to call run_machine with various arguments and then we can actually compile and run our state machine.
run_machine "h"
run_machine "x"
And here we go…
Running the state machine with input h...
hello world!
Finished. The state of the machine is: 2
p: 1 pe: 1
Running the state machine with input x...
Finished. The state of the machine is: 0
p: 0 pe: 1
It worked! Now, to help us interpret the values of p, pe and cs let’s take a look at the state chart of this state machine. Ragel has built-in Graphviz support to create state charts. We need to use the rlgen-dot command instead of rlgen-ruby.
ragel -R simple_state_machine.rl | rlgen-dot > simple_state_machine.dot
If you render the resulting simple_state_machine.dot file in Graphviz, you should get something like this:

We can see that the state machine has only one possible transition, from state 1 to state 2. When we passed “h” as the parameter to run_machine we did indeed end up with the variable cs (current state) equal to 2 at the end of our run. When “x” was passed, we ended up with cs = 0. 0 is the error state, indicating that an error occurred in the state machine. (You can tell that 0 is the error state by reading some of the variable assignments generated by write data, the code I said was dull and ugly.)
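If it helps to see the chart as code, here is a hand-written sketch of the same machine in plain Ruby. To be clear, this is not the code Ragel generates, just my approximation of the behaviour described above: the method name simulate_hello and the two constants are made up for illustration, while p, pe and cs play the same roles as in the generated code.

```ruby
# Hand-written approximation of the "hello" machine (NOT Ragel's output).
HELLO_START = 1   # start state, as drawn in the state chart
HELLO_ERROR = 0   # error state

def simulate_hello(data)
  bytes = data.unpack("c*")  # the input as integer character codes
  p  = 0                     # index of the character being examined
  pe = bytes.length          # upper limit for p
  cs = HELLO_START           # current state

  while p < pe
    if cs == 1 && bytes[p] == 104   # 104 is the code for "h"
      puts "hello world!"           # the action attached to the transition
      cs = 2
    else
      cs = HELLO_ERROR              # no arrow to follow from here
      break                         # p stays on the offending character
    end
    p += 1
  end

  [cs, p, pe]
end

puts simulate_hello("h").inspect   # prints "hello world!" and then [2, 1, 1]
puts simulate_hello("x").inspect   # prints [0, 0, 1]
```

The returned triples line up with the cs, p and pe values printed by the real machine for the same two inputs.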
In the label 104/4:18 over the arrow transitioning from state 1 to state 2, the 104 corresponds to the ASCII code for the letter “h”. (Type "h".unpack("c") in irb.) The / indicates that an action is being performed, and 4:18 tells us that the action starts at line 4, column 18 of the .rl file. Had we given our action a name, that would have appeared here instead of the file position.
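Here is a small sketch of that character-code conversion, which is also how you would build the array of integers that Ragel accepts as input. The variable names are just for illustration.

```ruby
codes = "h".unpack("c")        # one character as an integer code
word  = "hello".unpack("c*")   # every character of a string as integer codes
back  = word.pack("c*")        # and back to the original string

puts codes.inspect   # [104]
puts word.inspect    # [104, 101, 108, 108, 111]
puts back            # hello
```

So when you see a number on an arrow in a state chart, pack is a quick way to find out which character it stands for.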
By the way, here’s the (TextMate-specific) shell script I use to run all these steps quickly:
ragel -R simple_state_machine.rl | rlgen-ruby
ragel -R simple_state_machine.rl | rlgen-dot > simple_state_machine.dot
dot -Tpng simple_state_machine.dot > simple_state_machine.png
open simple_state_machine.png
ruby simple_state_machine.rb > simple_state_machine.out
mate simple_state_machine.out
Now, try running this code:
run_machine "hh"
You should get:
Running the state machine with input hh...
hello world!
Finished. The state of the machine is: 0
p: 1 pe: 2
You don’t get “hello world!” twice. Sorry. Our state machine only looks at the first character we pass. It knows we gave it two characters (pe = 2), but after it evaluates the first character it’s in a final state: there’s no arrow coming out of the state 2 circle. So, passing additional input results in the machine entering the error state. If we want the entire data string to be evaluated, we need to make a small change to our machine specification (pastie).
main := expr+ @ { puts "hello world!" } ;

(Try expr* instead of expr+ and see how the state chart is different.)
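If you know regular expressions, the + and * operators mean the same thing here as they do there. As a rough analogy only (Ruby's Regexp is not Ragel, and it has nothing like Ragel's embedded actions), this sketch compares the two on a few inputs. The constant and method names are mine.

```ruby
ONE_OR_MORE  = /\Ah+\z/   # analogous to expr+
ZERO_OR_MORE = /\Ah*\z/   # analogous to expr*

# True when the whole input matches the pattern.
def accepts?(pattern, input)
  !(input =~ pattern).nil?
end

["", "h", "hhh", "hxh"].each do |input|
  puts "#{input.inspect}: h+ #{accepts?(ONE_OR_MORE, input)}, h* #{accepts?(ZERO_OR_MORE, input)}"
end
```

The only input the two disagree on is the empty string, which is a hint about what changes in the state chart. The rest of this post sticks with the expr+ version.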
Now, try running this new state machine with inputs “hhh” and “hxh”:
Running the state machine with input hhh...
hello world!
hello world!
hello world!
Finished. The state of the machine is: 2
p: 3 pe: 3
Running the state machine with input hxh...
hello world!
Finished. The state of the machine is: 0
p: 1 pe: 3
When we pass “hhh”, we get a “hello world!” for each “h”. When we pass “hxh”, we get the first “hello world!”, but when we hit the “x” we enter the error state, so the last “h” doesn’t get evaluated.
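One way to picture the difference is as a lookup table of arrows. The sketch below is my own plain-Ruby approximation, not Ragel's generated code: TRANSITIONS and simulate_hello_plus are hypothetical names, and the table assumes the expr+ machine has the same two states as before plus a loop from state 2 back to itself.

```ruby
# Assumed transition table for expr+: state => { character code => next state }.
TRANSITIONS = {
  1 => { 104 => 2 },   # start state: "h" moves to state 2
  2 => { 104 => 2 }    # state 2 loops back to itself on "h"
}

def simulate_hello_plus(data)
  bytes  = data.unpack("c*")
  cs     = 1    # current state, starting at state 1
  index  = 0    # plays the role of p
  output = []   # collects what the action would have printed

  while index < bytes.length
    next_state = TRANSITIONS.fetch(cs, {})[bytes[index]]
    if next_state.nil?
      cs = 0    # no arrow for this character: error state
      break
    end
    output << "hello world!"
    cs = next_state
    index += 1
  end

  [output, cs, index]
end

puts simulate_hello_plus("hhh").inspect
puts simulate_hello_plus("hxh").inspect
```

With “hhh” the table always has an arrow to follow, so all three characters are consumed. With “hxh” the lookup for “x” comes back empty, so the loop stops with index still at 1.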
Here’s one more example (pastie), this time without defining a run_machine method:
%%{
machine hello_and_welcome;
main := ( 'h' @ { puts "hello world!" }
| 'w' @ { puts "welcome" }
)*;
}%%
data = 'whwwwwhw'
%% write data;
%% write init;
%% write exec;

welcome
hello world!
welcome
welcome
welcome
welcome
hello world!
welcome
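For comparison, here is roughly what that machine does, written by hand in plain Ruby. This is only a sketch under my own assumptions: the ACTIONS table and the greetings method are invented for illustration, and stopping at the first unknown character is my stand-in for the machine entering its error state.

```ruby
# Hypothetical lookup: input character => message its action would print.
ACTIONS = {
  "h" => "hello world!",
  "w" => "welcome"
}

def greetings(data)
  lines = []
  data.each_char do |char|
    message = ACTIONS[char]
    break if message.nil?   # unknown character: stop, like the error state
    lines << message
  end
  lines
end

puts greetings("whwwwwhw")
```

It prints the same eight lines as above. The Ragel version earns its keep once the patterns get more complicated than a one-character lookup.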
So, there you go. Hours of entertainment await you. We’ve only scratched the surface of Ragel’s features here, but you should now be able to navigate through the User Guide without too much trouble. If you need a better reason than “fun” to play with Ragel, then bear in mind that parsers are a great tool for constructing Domain Specific Languages (DSLs), and state machines are magic code shrinking machines for situations where you need to keep track of the, er, state of something and control the transitions between states (i.e. business logic). I highly recommend reading this article about Ragel, which inspired me to check it out. If you’re into Rails, then take a look at the acts_as_state_machine plugin, which might be more intuitive than Ragel at first. If the DSL angle is more your cup of tea then you might want to look at ANTLR instead, which has a different focus and feature set than Ragel.


January 3rd, 2008 at 2:08 pm
At the moment this example won’t work using Ruby 1.9 but it’s a pretty quick fix, check out this thread on the Ragel mailing list.
January 3rd, 2008 at 11:09 pm
Wow, I wish I had this four months ago when I was trying to play with Ragel/Ruby. I may give it another shot now, because it seems like a great idea.
Thanks for the writeup, the docs were not helping me wrap my tiny brain around Ragel. :)
January 4th, 2008 at 7:25 pm
Thanks for this article. Greatly helped me getting started with Ragel.
Btw., ?h gives you the ASCII of h in ruby too, with less typing ;-)
January 7th, 2008 at 2:39 pm
This is a great start on how to get the ball rolling with Ragel and Ruby. I was reading through the Ragel PDF the first time thinking the entire time… “cool! So how do I use it?”
This is a great jump-start to getting my hands dirty with Ragel. Let the fun commence!
January 9th, 2008 at 4:03 pm
Thanks! I’ve been looking for exactly this information. I appreciate your having taken the time to write it up!
January 13th, 2008 at 11:11 am
Thanks to everyone who commented for their feedback. apeiros, thanks for the ?h tip, I used it in the updated tutorial which fixes Ragel 6.0 and Ruby 1.9 compatibility issues.
March 6th, 2008 at 11:41 pm
big up - this article came in real handy..