Reporting from the CCC in Berlin

December 28th, 2007 by gloriajw

To my great joy, I ended up in Berlin for the Chaos Communication Congress this year. I am staying in a flat with four friends I met in Austria this summer. It is within walking distance of the event, and I’m getting to see a lot of Berlin.

The architecture is so vastly different between what used to be East and West Berlin that I feel as if I could trace the line where the wall used to be without much research. This is a great city, and a mind-blowing gathering of geeks, hackers and phreaks from around the globe.

It feels like coming home when I wander through the lounge, where people have obviously been camped out for a few days and nights already. The atmosphere is awe-inspiring, and the variety of the people here is simply beautiful. It is probably the only place on earth where the casual enthusiast and the hard-core geek can happily coexist.

On the second floor is a quantum encryption machine, which fires single photons across the room to a receiver, encrypting messages with these photons. There are RFID receivers everywhere. There is a wireless phone network set up by the event organizers, and people with compatible phones can acquire a phone number which will only work in the building.

There is no true beginning or end. Events happen around the clock, some official and some very unofficial. There is no reason to even leave the building. There is 24-hour coffee (yes!) and a never-closing, fully functional cafeteria, making pizzas and serving beers into the night. There is a Lego building station, where Lego gadgets are being built and programmed. Soldering, lock picking, heavy petting, programming, network packet sniffing, technical debates and child care are all happening around me as I stroll the corridors. It is genuinely beautiful. I could lose myself here. Good thing it’s temporary :)


A Hello World for Ruby on Ragel

December 13th, 2007 by Ana Nelson

UPDATE: Please read the updated version of this tutorial, which is compatible with Ruby 1.8 and Ruby 1.9, and Ragel 6.0. The examples here are compatible with Ruby 1.8 and Ragel 5.24.

By the end of this post, you’ll be able to turn a simple string “h” into the much longer and more interesting string “hello world!” using the magic of Ragel, all from the comfort of Ruby. Ragel is a very powerful state machine compiler and parser generator, which is at the heart of software like Mongrel and Hpricot. It’s able to generate C, C++, Objective-C, D, Java or Ruby code.

Ragel has excellent documentation provided by the author. My goal here is just to give you some context so that the documentation “sticks” when you read it, and to give you a working example which you can modify as you explore Ragel’s functionality. If you want to skip ahead, the full example is here on pastie.

The first step, of course, is installing Ragel. The Ragel home page has a Download section which lists ports for various platforms. If you already have Ragel installed, check that the version is 5.24 or higher. There are significant differences in the Ruby code generation even between versions 5.22 and 5.24. You can also compile and install Ragel from source. Even if you don’t want to install from source, it’s handy to have a copy of it to get some examples to play with. The subversion repository for Ragel is located here:

svn://mambo.cs.queensu.ca/ragel/trunk/

As usual, the test/ directory is your friend. Also check out the examples/ directory. As per this thread, try searching for “LANG: ruby”.

When writing Ragel code, you create a file with a .rl extension. The .rl file is written in the “host” language, in this case Ruby, and the Ragel machine specification is embedded within the Ruby code using special delimiters. There’s actually no obligation to specify a state machine, so a perfectly valid .rl file is:

puts "hello world"

Don’t worry, I’m going to do a better Hello World than that, but this is a good place to start.

It’s a 2-step process to get from the .rl file to an executable .rb file. First, the “ragel” command creates some intermediate XML code. By default the command assumes you want to create a C file, so if you want Ruby code you need to add the -R flag:

ragel -R hello_world.rl

This will produce XML as follows:


puts "hello world"

The rlgen-ruby command then produces ruby code from this XML. So, either:

ragel -R hello_world.rl > hello_world.xml
rlgen-ruby hello_world.xml

or:

ragel -R hello_world.rl | rlgen-ruby

Will create a file entitled hello_world.rb with the following contents:

# line 1 "hello_world.rl"
puts "hello world"

I’ll, er, leave executing that file as an exercise for the diligent student.

Now, let’s write some actual Ragel. Start a new .rl file or download the example and read along. We’re going to create a machine which prints “hello world!” when it’s passed the string “h”, and does nothing otherwise. To indicate to the ragel compiler that we are writing instructions for it, and not Ruby code, we need to place our Ragel code within double-percent-sign-curly-brackets %%{ and }%% , or you can enter a single line instruction by just typing %%. (See page 6 of the User Guide.) Here’s our state machine specification:

%%{
  machine hello;
  expr = "h";
  main := expr @ { puts "hello world!" } ;
}%%

A quick overview of what’s happening here. The name of this machine is “hello” (Ragel makes us name it). It recognizes a single token, the string “h”. When it encounters that token, it performs (in Ruby) the action:

puts "hello world!"

Now, if you were to run the ragel command on this file it would compile, but you would basically end up with a blank Ruby file. We have only specified the machine; we also have to tell Ragel to actually translate this machine into Ruby code using Ragel’s write statements. The first write statement we need to add is

  %% write data;

If you add this line after the state machine definition block, it will compile, as long as you remember to add a blank line afterwards. (After you’ve worked with parsers for a while you come to appreciate newlines in a whole new way.) After adding this line and compiling, you should have a rather significant Ruby file with lots of class << self statements all generated by Ragel. You don't need to study this code, at least not right now. It’s pretty dull and ugly. And, if you run the ruby file at this point, you won’t see any output.

There are 2 more write statements to add, and for convenience we're going to place them within a ruby method with a method argument named "data". Ragel insists that the input to the state machine, the string to be parsed, is stored in a variable called "data" and this is a natural way to define such a variable. Actually the User Guide says that "data" should be an array, and you can pass an array if you wish but it must be an array of integers representing ASCII codes, not an array of characters.

def run_machine(data)
  %% write init;
  %% write exec;
end
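As an aside on the array form of "data" mentioned above, here is a minimal sketch of converting a string to integer codes and back in plain Ruby. The helper name to_codes is made up for this illustration and is not part of Ragel.

```ruby
# Hypothetical helper (not part of Ragel): turn a string into the array of
# integer codes described in the User Guide as the input form for "data".
def to_codes(string)
  string.unpack("c*")   # "c" reads one signed 8-bit integer, "*" repeats it for every byte
end

codes = to_codes("hh")
puts codes.inspect      # prints [104, 104]
puts codes.pack("c*")   # prints hh (pack reverses unpack)
```

Either form should describe the same input; the string form is simply easier to read in examples.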

write init tells Ragel that we want to generate initialization code for the state machine. The code Ragel generates here is:

begin
  p ||= 0
  pe ||= data.length
  cs = hello_start
end

The variable p keeps track of which character in the data string we are currently parsing, starting at 0. pe is an upper limit for p. cs stores the current state of the state machine, and here it is initialized to the starting state of the state machine. These variables are discussed in the User Guide.

write exec tells Ragel to write the meat of the parser (finally!). The code generated here will actually take an input (the data argument) and determine what the state of the system should be based on that input, executing any actions which might be triggered along the way. Let's add some puts statements so we can follow the code execution.

def run_machine(data)
  puts "Running the state machine with input #{data}..."

  %% write init;
  %% write exec;

  puts "Finished. The state of the machine is: #{cs}"
  puts "p: #{p} pe: #{pe}"
end

Just add 2 more lines at the end to call run_machine with various arguments and then we can actually compile and run our state machine.

run_machine "h"
run_machine "x"

And here we go...

  Running the state machine with input h...
  hello world!
  Finished. The state of the machine is: 2
  p: 1 pe: 1
  Running the state machine with input x...
  Finished. The state of the machine is: 0
  p: 0 pe: 1

It worked! Now, to help us interpret the values of p, pe and cs let's take a look at the state chart of this state machine. Ragel has built-in Graphviz support to create state charts. We need to use the rlgen-dot command instead of rlgen-ruby.

ragel -R simple_state_machine.rl | rlgen-dot > simple_state_machine.dot

If you render the resulting simple_state_machine.dot file in Graphviz, you should get something like this:

State Chart for Simple State Machine

We can see that the state machine has only one possible transition, from state 1 to state 2. When we passed "h" as the parameter to run_machine we did indeed end up with the variable cs (current state) equal to 2 at the end of our run. When "x" was passed, we ended up with cs = 0. 0 is the error state, indicating that an error occurred in the state machine. (You can tell that 0 is the error state by reading some of the variable assignments generated by write data, the code I said was dull and ugly.)

In the label 104/4:18 over the arrow transitioning from state 1 to state 2, the 104 corresponds to the ASCII code for the letter "h". (Type "h".unpack("c") in irb.) The / indicates that an action is being performed, and 4:18 tells us that the action starts at line 4, column 18 of the .rl file. Had we given our action a name, that would have appeared here instead of the file position.
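To make the roles of p, pe and cs concrete, here is a hand-written sketch that walks the same state chart in plain Ruby. It is not the code Ragel generates; the constant names, the transition table and the simulate_hello method are assumptions made for this illustration only.

```ruby
# Hand-written stand-in for the state chart above; NOT Ragel's generated code.
HELLO_START = 1                       # starting state, as in the chart
HELLO_ERROR = 0                       # error state
TRANSITIONS = { 1 => { 104 => 2 } }   # state 1 moves to state 2 on code 104 ("h")

def simulate_hello(input)
  data = input.unpack("c*")   # integer codes of the input
  p = 0                       # index of the character being examined
  pe = data.length            # upper limit for p
  cs = HELLO_START            # current state
  printed = []                # what the embedded action would have printed
  while p < pe
    next_state = (TRANSITIONS[cs] || {})[data[p]]
    if next_state.nil?
      cs = HELLO_ERROR        # no arrow for this character: enter the error state
      break
    end
    printed << "hello world!" # the action attached to the transition
    cs = next_state
    p += 1
  end
  [cs, p, pe, printed]
end

["h", "x"].each do |input|
  cs, p, pe, printed = simulate_hello(input)
  puts "input #{input}: cs=#{cs} p=#{p} pe=#{pe} printed=#{printed.length}"
end
# prints:
#   input h: cs=2 p=1 pe=1 printed=1
#   input x: cs=0 p=0 pe=1 printed=0
```

The values of cs, p and pe line up with the run_machine output shown earlier, which is a handy sanity check when reading a state chart.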

By the way, here's the (textmate-specific) shell script I use to run all these steps quickly:

  ragel -R simple_state_machine.rl | rlgen-ruby
  ragel -R simple_state_machine.rl | rlgen-dot > simple_state_machine.dot
  dot -Tpng simple_state_machine.dot > simple_state_machine.png
  open simple_state_machine.png
  ruby simple_state_machine.rb > simple_state_machine.out
  mate simple_state_machine.out

Now, try running this code:

  run_machine "hh"

You should get:

  Running the state machine with input hh...
  hello world!
  Finished. The state of the machine is: 0
  p: 1 pe: 2

You don't get "hello world!" twice. Sorry. Our state machine is only looking at the first character we pass. It knows we gave it two characters (the variable pe = 2), but after it evaluates the first character it's in a final state. There's no arrow coming out of the state 2 circle. So, passing additional input results in the system entering the error state. If we want the entire data string to be evaluated, we need to make a small change to our machine specification (pastie).

main := expr+ @ { puts "hello world!" } ;

Endless Simple State Machine

(Try expr* instead of expr+ and see how the state chart is different.)

Now, try running this new state machine with inputs "hhh" and "hxh":

  Running the state machine with input hhh...
  hello world!
  hello world!
  hello world!
  Finished. The state of the machine is: 2
  p: 3 pe: 3
  Running the state machine with input hxh...
  hello world!
  Finished. The state of the machine is: 0
  p: 1 pe: 3

When we pass "hhh", we get a "hello world!" for each "h". When we pass "hxh", we get the first "hello world!", but when we hit the "x" we enter the error state, so the last "h" doesn't get evaluated.

Here's one more example (pastie), this time without defining a run_machine method:

  %%{
    machine hello_and_welcome;
    main := ( 'h' @ { puts "hello world!" }
            | 'w' @ { puts "welcome" }
            )*;
  }%%
  data = 'whwwwwhw'
  %% write data;
  %% write init;
  %% write exec;

Hello and Welcome State Machine

welcome
hello world!
welcome
welcome
welcome
welcome
hello world!
welcome
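For comparison, the same input can be traced by hand in plain Ruby. This is only a sketch of the machine's behaviour, not what Ragel generates; the ACTIONS table and the simulate_hello_and_welcome method are names invented for this illustration.

```ruby
# Hand-written sketch of the hello_and_welcome machine; NOT Ragel's generated code.
# Each character code maps to the text its embedded action prints.
ACTIONS = {
  "h".unpack("c").first => "hello world!",
  "w".unpack("c").first => "welcome"
}

def simulate_hello_and_welcome(input)
  lines = []
  input.unpack("c*").each do |code|
    action = ACTIONS[code]
    break if action.nil?   # any other character stops the run, like the error state
    lines << action
  end
  lines
end

puts simulate_hello_and_welcome("whwwwwhw")
```

Running it prints one line per character, in the same order as the output listed above.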

So, there you go. Hours of entertainment await you. We've only scratched the surface of Ragel's features here, but you should now be able to navigate through the User Guide without too much trouble. If you need a better reason than "fun" to play with Ragel, then bear in mind that parsers are a great tool for constructing Domain Specific Languages (DSLs), and state machines are magic code shrinking machines for situations where you need to keep track of the, er, state of something and control the transitions between states (e.g. business logic). I highly recommend reading this article about Ragel, which inspired me to check it out. If you're into Rails, then take a look at the acts_as_state_machine plugin, which might be more intuitive than Ragel at first. If the DSL angle is more your cup of tea, then you might want to look at ANTLR instead, which has a different focus and feature set than Ragel.


Repeating a Try-Catch block

December 3rd, 2007 by Kris

Here’s a tip for beginners. You know how a try/catch block works: place some code inside a try, and if it fails, catch it with the catch block. But what do you do if you want to try an operation a set number of times, and then throw an exception if it’s still failing? There’s no such animal that is native in C#, but you can quickly implement one yourself. With a counter and a ref parameter, this can be done quite simply. This brief example will show you how.

Suppose we want to do some FooBar operation that we know could fail due to some external circumstance, but that could also succeed after an attempt or two. Let’s write the code to make 3 tries, and only then throw the error:

int MaxTries = 3;   //this is the limiter value - set this to the number of tries you want to execute
int NumTries = 0;   //this is our counter that keeps track of how many tries we've executed

//Perform the FooBar operation (we try up to 3 times; only if the last try still fails does FooBar throw the exception)
while (NumTries < MaxTries)
    FooBar(foo, bar, ref NumTries);

private void FooBar(int foo, int bar, ref int numTries)
{
    try
    {
        //count the try
        numTries++;

        //set up and perform your operation
        Foo(bar);
        Bar(foo);
        DoFinalOp();

        //if we got here, there was no error so set counter to kick us out of loop
        numTries = MaxTries;
    }
    catch (Exception)
    {
        //only throw error if we've tried enough times
        //(a bare "throw;" rethrows the caught exception and keeps its original stack trace)
        if (numTries >= MaxTries)
            throw;
    }
}

Note that our FooBar method takes numTries as a ref parameter. When the method changes this value and then returns, the caller’s counter has changed too, ready to be re-tested in the while loop condition.

Let’s walk through this and see how it works:

  • After setting NumTries and MaxTries (to 0 and 3), we hit the while loop. This first time, NumTries is definitely less than MaxTries, so we execute FooBar().
  • When we enter FooBar(), the first thing we do is increment numTries, to count the attempt. Since this is a ref parameter, changing numTries inside the method also changes the caller’s variable NumTries.
  • Then we start executing whatever we have to do in FooBar(). If an error is encountered at this point, we’ll fall to the catch block.
  • In the catch block, we test whether or not to throw an error, based on the value of numTries. Since at this point numTries = 1, we don’t throw the error. FooBar() finishes executing and we return to the while loop.
  • Back in the while loop, we retest NumTries to see if it’s still less than MaxTries. Since we incremented it in the method, NumTries is now equal to 1, so the condition is still true and we call FooBar() again.
  • Once again inside FooBar(), we increment numTries by 1 and try our operation. Suppose this time it succeeds. In that case, we will reach the line numTries = MaxTries and then exit the method.
  • Now, back in the while loop, the test (NumTries < MaxTries) is false, so we don’t execute FooBar() again, and we’re done.

Voila! We’ve implemented try/catch/repeat functionality, allowing us multiple tries before throwing an error.
