I Love Python: ReSTful DB CRUD dispatching using CherryPy


CherryPy has been one of my favorite Python tools for several years. It should be mentioned here that a ReSTful dispatcher could easily be written in web.py or Pylons as well, and one even comes for free in the latest TurboGears implementation.

But if you’re looking for a small, easily manageable and extremely dynamic ReST dispatching solution without the heft of an entire web framework, I’m about to show you how CherryPy can help you in three different ways, depending on your model.

Assuming this mapping:

HTTP POST = DB update
HTTP PUT = DB insert

Let’s also standardize on one common method across all examples, for determining the HTTP request type, and matching it to the function of the same name. Here is the full code snippet for accomplishing this task:

methods = ('OPTIONS','GET','HEAD','POST','PUT','DELETE','TRACE','CONNECT')

if cherrypy.request.method not in methods:
    raise cherrypy.HTTPError(400,'Bad Request')

# If request method is HEAD, return the page handler
# for GET, and let CherryPy take care of dropping
# the response body
method = cherrypy.request.method

if method == "HEAD":
    method = "GET"

http_method = getattr(self,method)

#print "HTTP Method: %s" % method


In our examples, we’re going to shorten this to:

http_method = getattr(self, cherrypy.request.method)
return http_method(*args, **kwargs)

All of this essentially determines which HTTP method was used (GET/PUT/POST/DELETE), and calls the method in a class which exactly matches that name (self.GET(), self.PUT(), etc.).
When you see this code, know that it's just the HTTP method resolving code.
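Stripped of CherryPy, the resolution trick is plain getattr dispatch. Here is a minimal, self-contained sketch; the `Handler` class and `dispatch` entry point are illustrative, not part of the article's code, and the request method arrives as a plain string instead of `cherrypy.request.method`:

```python
class Handler(object):
    """Dispatch a request to the method named after the HTTP verb."""
    methods = ('OPTIONS', 'GET', 'HEAD', 'POST', 'PUT', 'DELETE')

    def GET(self):
        return "handled GET"

    def dispatch(self, request_method):
        if request_method not in self.methods:
            raise ValueError('Bad Request')
        # HEAD is served by the GET handler; the body is dropped later.
        if request_method == 'HEAD':
            request_method = 'GET'
        http_method = getattr(self, request_method)
        return http_method()
```

Calling `Handler().dispatch('HEAD')` resolves to `self.GET`, exactly the way the CherryPy snippet above does.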

Now for the fun. Let’s look at the dispatcher options we have.

Way 1: A hard-coded URL pointing to fixed resources:

CherryPy can establish a fixed URL, and corresponding resources, driven from predefined classes instantiated in the 'root' hierarchy:

import cherrypy

class ReSTPaths1:
	def index(self):
		http_method = getattr(self,cherrypy.request.method)
		return (http_method)()

	def GET(self):
		return "In GET 1.."

class ReSTPaths2:
	def index(self):
		http_method = getattr(self,cherrypy.request.method)
		return (http_method)()

	def GET(self):
		return "In GET 2.."

class ReSTPaths3:
	def index(self,client_id=None):
		http_method = getattr(self,cherrypy.request.method)
		return (http_method)(client_id)

	def GET(self,client_id=None):
		return "IN Get 3, your client_id is %s\n" % (client_id)


root.client = ReSTPaths2()
root.client.address = ReSTPaths3()

Once this is running, the URLs to invoke it look like this:

http://localhost:8081/
http://localhost:8081/client/
http://localhost:8081/client/address/
Output looks something like this:

In GET 1..
In GET 2..
IN Get 3, your client_id is None

If you’re new to CherryPy or Python in general, I’ll reiterate for you how we are calling the GET method in our class.

When we issue this request, we're issuing what HTTP calls a GET request:

http://localhost:8081/
The CherryPy service above, listening on port 8081, calls the index() method on the root class. The root class was set to:

root = ReSTPaths1()

at the bottom of that file. The index() method from the ReSTPaths1 class looks like this, at the top of that file:

	@cherrypy.expose
	def index(self):
		http_method = getattr(self,cherrypy.request.method)
		return (http_method)()

If we were to insert a print cherrypy.request.method statement before the return, we would see it set to “GET”.

getattr simply says: "get me the attribute of self matching the string 'GET'".
It returns a reference to self.GET, which is defined directly below index():

	def GET(self):
		return "In GET 1.."

Notice that the index() method has a @cherrypy.expose decorator above it. This makes the index method callable by the public. The GET method does not have it, which means we could never invoke the GET method by typing:

http://localhost:8081/GET

If you try this, you'll get a 404 Not Found error, because it's not visible through the CherryPy interface.

GET() has to be invoked through index(), which means GET can only be called if an HTTP GET request is issued. If we posted form data to this same URL from, say, a form entry asking people for data input, we would need to add a POST method to this ReSTPaths1() class, to receive the POST data entered in the form fields.

Now back to our example:

In this example, no part of the URL or associated resources are dynamic, in either initialization or run time. This is fine, and suits the needs of most ReSTful CRUD interfaces.

Way 2: URL paths and associated components dynamically set once, upon dispatcher init/startup:

Now let’s say we want to determine the contents of the root, and therefore the URLs and associated resources for our ReSTful interface, dynamically during initialization/startup.

We can assign the root setting by using a Python metaclass to generate classes in our CherryPy startup code, and set the root components to each generated class. This goes beyond the average needs for CRUD access, but it’s such a nice implementation that I must show it off:

import cherrypy

class MetaCRUD(type):
	@cherrypy.expose
	def index(cls):
		http_method = getattr(cls,cherrypy.request.method)
		return (http_method)()

	def GET(cls): return "In class %s, received a GET request." % cls.__name__

	def PUT(cls): return "In class %s, received a PUT request." % cls.__name__

	def POST(cls): return "In class %s, received a POST request." % cls.__name__

	def DELETE(cls): return "In class %s, received a DELETE request." % cls.__name__

baseCRUD = MetaCRUD('baseCRUD',(),{})
root = baseCRUD

dynamic_class = {}

for d in ['legacy_dbi','new_dbi','some_other_dbi']:
	dynamic_class[d] = MetaCRUD(d,(),{})
	setattr(root,d,dynamic_class[d])


Here we’re using a metaclass, with CherryPy exposed methods, to generate a dictionary of dynamic classes. We set the root.classname = the_new_class by using the setattr() method.
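Stripped of CherryPy, the class-generation pattern looks like this. `type` is itself the default metaclass, so calling our metaclass with a name, bases, and a dict builds a brand-new class at runtime, and setattr() hangs it off a plain root object (the `Root` class here is illustrative, standing in for the CherryPy root):

```python
class MetaCRUD(type):
    # Methods defined on the metaclass become callable on the
    # generated classes themselves -- no instances required.
    def GET(cls):
        return "In class %s, received a GET request." % cls.__name__

class Root(object):
    pass

root = Root()
dynamic_class = {}

for name in ['legacy_dbi', 'new_dbi', 'some_other_dbi']:
    # Each database name becomes a distinct, freshly generated class.
    dynamic_class[name] = MetaCRUD(name, (), {})
    setattr(root, name, dynamic_class[name])
```

After this loop, `root.legacy_dbi.GET()` returns "In class legacy_dbi, received a GET request." with no instance ever created.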

After initialization, URL components and resources are fixed in this model. But wow, the awesome power we have during initialization, in 28 lines really rocks. I wrote this in 30 minutes, and realized again why I am so head-over-heels in love with this language.

When we hit these URLs:

http://localhost:8081/
http://localhost:8081/legacy_dbi/
http://localhost:8081/new_dbi/
http://localhost:8081/some_other_dbi/
We see this output:

In class baseCRUD, received a GET request.
In class legacy_dbi, received a GET request.
In class new_dbi, received a GET request.
In class some_other_dbi, received a GET request.

Let’s issue a POST request via curl, on the command line. The response is returned:

[gloriajw@g-monster ~]$ curl http://localhost:8081/some_other_dbi/ -d ""
In class some_other_dbi, received a POST request.

This model could be used for, say, reading the list of databases from the Postgres template1 database or the MySQL 'show databases' command, and auto-generating a ReSTful CRUD interface for each. Access to each resource can be controlled via HTTP Auth methods. This is a great solution for providing, and restricting, legacy database access to new processes through a standard interface.

Way 3: Live, ever-dynamic determination of URL and associated component:

Some ReSTful URL models may need to be 'run-time dynamic', especially where databases are created dynamically and the associated resources per new database can vary. Here is a simple example of a dynamic URL and resource model:

import cherrypy
import pprint

class ReSTPaths:
	@cherrypy.expose
	def client(self,*args,**kwargs):
		return "Your HTTP method was %s. Your args are: %s and your kwargs are: %s\n" \
		% (cherrypy.request.method, pprint.pformat(args), pprint.pformat(kwargs))

	@cherrypy.expose
	def address(self,*args,**kwargs):
		return "Your HTTP method was %s. Your args are: %s and your kwargs are: %s\n" \
		% (cherrypy.request.method, pprint.pformat(args), pprint.pformat(kwargs))


This allows for dynamic URLs such as:

http://localhost:8080/client/address/34567
http://localhost:8080/client/address/?client_id=34567
http://localhost:8080/address/client/?client_id=34567
http://localhost:8080/address/client/34567
http://localhost:8080/address/anything/anything_else
The output from this code looks like this:

Your HTTP method was GET. Your args are: ('address', '34567') and your kwargs are: {}
Your HTTP method was GET. Your args are: ('address',) and your kwargs are: {'client_id': '34567'}
Your HTTP method was GET. Your args are: ('client',) and your kwargs are: {'client_id': '34567'}
Your HTTP method was GET. Your args are: ('client', '34567') and your kwargs are: {}
Your HTTP method was GET. Your args are: ('anything', 'anything_else') and your kwargs are: {}

Notice that we only have keyword args (kwargs) when we pass a named parameter, such as client_id=34567.
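That split is ordinary Python argument handling: extra path segments arrive positionally, query-string pairs arrive as keywords. A CherryPy-free sketch of the same signature (the function name and return format are mine, mimicking the handler above):

```python
def address(*args, **kwargs):
    # Path segments land in args; named parameters land in kwargs.
    return "args: %r, kwargs: %r" % (args, kwargs)
```

So `address('anything', 'anything_else')` fills only args, while `address('address', client_id='34567')` puts the named parameter into kwargs.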

Let’s try a POST request from curl, on the command line:

[gloriajw@g-monster ~]$ curl -d "something_else=whatever_i_want" http://localhost:8080/address/anything/anything_else
Your HTTP method was POST. Your args are: ('anything', 'anything_else') and your kwargs are: {'something_else': 'whatever_i_want'}

In this code, the sky is the limit. You can place whatever code you like in these methods, dynamically creating classes and resources as needed and letting them persist only until the result is returned. This may add some inefficiency, but in exchange offers more secure network resources.
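One way to picture that "persist only until the result is returned" idea: build the handler class inside the request function, use it once, and let it be garbage-collected. This is a hypothetical sketch, not the attached code:

```python
def handle_request(db_name, method='GET'):
    # The class exists only for the duration of this call; once we
    # return, nothing references it and it is reclaimed.
    handler_cls = type(db_name, (), {
        'GET': lambda self: "reading from %s" % db_name,
        'DELETE': lambda self: "dropping %s" % db_name,
    })
    return getattr(handler_cls(), method)()
```

Nothing about the generated class outlives the request, which is exactly why there is less persistent surface area to attack.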

Code is attached, Enjoy!





Hadoop Streaming and Python: No Jython necessary!


Here is another jewel of an article, written by Michael Noll:

As hardware costs plummet and greater bandwidth becomes more affordable, cloud computing becomes even more alluring and feasible for the hobbyist. Working with large data sets is an interesting problem set, now attainable by the average developer with a free LINUX hard drive partition and some spare time.

Hadoop is essentially a framework, written in Java, for accessing large data sets. For the average developer, it's usually used in conjunction with HDFS, which is a custom file system designed for storing large data sets on cheaper hardware. HDFS data is usually stored in units of entire web pages written to disk "as-is". Once stored, this large data set is accessible for parsing and analysis.

Indexing of this data is done via what is called a map-reduce algorithm. The easiest map-reduce algorithm to envision is the word count algorithm. Each word on a page is assigned the value "1" in the map portion, then all words are counted by sorting the words and adding their values. The result is a frequency count of each word in a page. This is the simplest of ten well known map-reduce algorithms, found in this white paper:
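In plain Python, the word-count algorithm just described fits in a few lines: the map step emits (word, 1) pairs, and the reduce step sorts by word and sums the ones. The function names here are mine, for illustration:

```python
from itertools import groupby

def map_words(lines):
    # Map: emit (word, 1) for every word on every line.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_counts(pairs):
    # Reduce: sort so equal words are adjacent, then sum each group.
    counts = {}
    for word, group in groupby(sorted(pairs), key=lambda p: p[0]):
        counts[word] = sum(n for _, n in group)
    return counts
```

Running `reduce_counts(map_words(["the cat sat", "the cat"]))` yields `{'the': 2, 'cat': 2, 'sat': 1}`, the per-word frequency count.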


The magic of Michael's article is in how to avoid the stringent, often painful Jython interface to Hadoop. The trick is to stream the data on the command line from his simpler, more Pythonic map-reduce implementation directly to Hadoop, bypassing the complex Jython interface entirely. Very clever indeed! It would be wonderful to see all of the relatively standard map-reduce algorithms written in Python.
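The streaming contract itself is simple: Hadoop pipes raw lines to the mapper's stdin, sorts the mapper's tab-separated output by key, and pipes that to the reducer. A sketch of both halves, written over iterables so they can be exercised without Hadoop (in a real streaming job each function would read `sys.stdin` and print its lines):

```python
def mapper(lines):
    # Emit "word<TAB>1" per word -- the Hadoop Streaming key/value format.
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word

def reducer(sorted_lines):
    # Input arrives sorted by key, so equal words are adjacent.
    current, total = None, 0
    for line in sorted_lines:
        word, count = line.rsplit("\t", 1)
        if word == current:
            total += int(count)
        else:
            if current is not None:
                yield "%s\t%d" % (current, total)
            current, total = word, int(count)
    if current is not None:
        yield "%s\t%d" % (current, total)
```

The shell equivalent of the framework's plumbing is just `cat input | mapper.py | sort | reducer.py`, which is why no Jython is needed.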

He has a couple of other great articles on exactly how to install and configure Hadoop on Ubuntu here:

and here:


On top of Hadoop is usually a metadata store, holding high level information about which page sets or data "chunks" can be found on which volumes, for efficiency. After attending a cloud computing conference in November, I realized that many companies have chosen to write their own metadata layer using HiveDB, or custom metadata tools, and many distributed MySQL databases, rather than settling for the expensive commercial solutions. This great work is a primary cause of the proliferation of easily accessible Open Source cloud computing tools.

Another project to watch is called Mahout, which is an attempt to make all of the well known map-reduce algorithms generally accessible across multicore Hadoop systems.

I run Fedora 10, and have a bit more work to do, but it is outstanding to see such an easily accessible installation for such a relatively complex set of tools. So many software tools and toys, so little time!


I love python: Zip code prefix web scrape and DB injection in 70 lines


Here’s a module I wrote (in an hour. Damn, Python is wonderful) which scrapes the US Postal service web site for three-digit zip code extensions (http://pe.usps.gov/text/dmm300/L002.htm). It creates a db table and injects zip code prefix, region and state info for each record found. It uses BeautifulSoup to parse the HTML, and SqlAlchemy to do the DB operations.

If you only need to check which region and state a particular zip code belongs to, this is for you. If anyone can point me to a free longitude/latitude/full zip code site, please post that info to a reply, and I'll rewrite this module.

import urllib2
import codecs
import re
import pdb
import os
import BeautifulSoup
import sqlalchemy

class GetZips:
	def __init__(self):
		self.zip_info = []

	def getZipPrefixes(self):
		zip_page = urllib2.urlopen("http://pe.usps.gov/text/dmm300/L002.htm").read()
		self.soup = BeautifulSoup.BeautifulSoup(zip_page)
		# match zip columns
		zips = self.soup.findAll(attrs={'class':re.compile('^trBodyRow*')})
		for i in zips:
			y = i.find(attrs={'class':re.compile('^pTblBodyLL pAlignLeft*')})
			# X is the symbol for an unused 3 digit zip prefix.
			if y.span and y.span.string == 'X':
				continue
			# last 3 digits
			zip_prefix_3 = y.a.next.next
			zip_prefix_3 = re.sub('[\n\r]+','',zip_prefix_3)

			# finding the first column will suffice.
			y = i.find(attrs={'class':re.compile('^pTblBodyLL pAlignRight*')})

			region_state = y.a.next.next.split()

			region = region_state[-3]
			state_abbrev = region_state[-2]

			if region_state[-1] != zip_prefix_3:
				print "There is a problem here: %s" % i

			print "Found %s %s %s" % (region,state_abbrev,zip_prefix_3)
			self.zip_info.append((region,state_abbrev,zip_prefix_3))

	def injectIntoDB(self):
		engine = sqlalchemy.create_engine('postgres://%s:%s@%s/%s' % ('postgresql','something','','zip_db'),strategy='threadlocal')
		# The sqlalchemy explicit scope is done for clarity. Of course
		# you can "from sqlalchemy import *" instead, and change the scope
		# of these calls.
		metadata = sqlalchemy.MetaData()
		metadata.bind = engine
		zip_table = sqlalchemy.Table('zip_abbrevs', metadata,
			sqlalchemy.Column('zip_abbrevs_id', sqlalchemy.Integer, primary_key=True),
			sqlalchemy.Column('three_digit_abbrev', sqlalchemy.String(4)),
			sqlalchemy.Column('region', sqlalchemy.VARCHAR(50)),
			sqlalchemy.Column('state_abbrev', sqlalchemy.String(3)))


		metadata.create_all(engine)

		for (region, state, zip) in self.zip_info:
			print "Injecting %s %s %s\n" % (region, state, zip)
			zip_table.insert().execute(three_digit_abbrev=zip, region=region, state_abbrev=state)

if __name__ == "__main__":
	gz = GetZips()
	gz.getZipPrefixes()
	gz.injectIntoDB()
# vim:ts=4: noet:

I am writing this code for the nonprofit called The Freelancer’s Union in NYC, which currently has a nationwide member drive: http://www.freelancersunion.org/advocacy/index.html.
I will shamelessly plug them in exchange for sharing this code with the world.

The more members they get in each US state, the better nationwide insurance plans they can offer. They offer E&O insurance for IT freelancers as well, so even if you freelance part-time, this could be for you. This organization rocks. I’ve been a member for three years, and now I proudly write code for them.


The unmodified code

Auto-creating MySql and Postgresql Databases with Powerful Python Tools


When designing and developing web framework software for clients, I first spend a bit of time focused on the client data. I try to determine the traits of the client data, asking these questions:

- Is it temporary?
- If so, for how long? Should it expire and disappear automatically or manually?
- Should it be archived?
- How many users/applications will be entering data?
- Is each user's data mutually exclusive? If not, what is the logical grouping of users and data?
- What level of control do users have over their own, or other user/app data?
- How is the data used by the clients? Are there "super users"?

The database is such an essential part of the design, and being able to automatically create, archive, purge and delete databases is tremendously handy in so many of these designs.

I wrote this example to generate dynamic databases for short courses, training sessions, and demos. I use the example itself to demonstrate the functionality of CherryPy and SqlAlchemy.

First of all, I start my CherryPy process on my local port 7001, allowing me to do this:

http://localhost:7001/dbi/create_my_db?db_type=postgres

or this:

http://localhost:7001/dbi/create_my_db?db_type=mysql
First, the small bits of quirkiness in Postgresql and MySql

Starting a Postgresql session on the command line, we see what happens before the URL is referenced:

[cwc2008@localhost user1]$ psql -d template1 -U my_super_user
Welcome to psql 8.2.6, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

template1=# SELECT datname FROM pg_database ;
   datname
-----------
 postgres
 template0
 template1
(3 rows)


and after (Note: I had to exit the command line session, run the URL, then re-enter the command line session):

template1=# SELECT datname FROM pg_database ;
   datname
--------------
 postgres
 template0
 template1
 cwcuser_7001
(4 rows)


The cwcuser_7001 name is auto-generated from the user’s ID and port number, making it useful for training sessions with many users.

my_super_user was created using the Postgresql createuser command, and, of course, must be a super user to do this operation. The template1 database is the Postgresql master database, from which all other databases in that instance are visible.

Caveat: In Postgresql, only one user can access template1 at any given time. Violating this rule gives you this error:

template1=# create database blahblah;
ERROR:  source database "template1" is being accessed by other users

Be sure to close your database descriptor when finished with template1. If this is not a production app, and/or you feel comfortable bouncing the instance, this lock can also be cleared by running:

-bash-3.2$ /usr/local/postgresql/8.2.6/bin/pg_ctl restart -m immediate -D/usr/local/postgresql/8.2.6/data

waiting for server to shut down... done
server stopped
server starting
-bash-3.2$ LOG:  database system was interrupted at 2008-02-14 15:07:47 EST
LOG:  checkpoint record is at 0/4B9054
LOG:  redo record is at 0/4B9054; undo record is at 0/0; shutdown FALSE
LOG:  next transaction ID: 0/2046; next OID: 57344
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  record with zero length at 0/4B909C
LOG:  redo is not required
LOG:  database system is ready

This is nasty. In live production environments, do this with great care.

MySql is a bit more forgiving when adding, removing or accessing new databases. It does not require the isolation level to be set, as in Postgresql.

We see what databases exist before:

/usr/local/mysql6/bin/mysql --socket=/tmp/mysql10.sock  -u my_super_user -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 170
Server version: 6.0.3-alpha-log Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| test               |
+--------------------+
3 rows in set (0.00 sec)


and after:

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| cwcuser_7001       |
| mysql              |
| test               |
+--------------------+
4 rows in set (0.00 sec)


MySql does not require my hopping in and out of the command line like Postgresql. It handles user scope much better, in my opinion. It does, however, require each new non-super user to be granted access to each database they need, but this is (1) trivial, and (2) outside the scope of this article.

The SqlAlchemy code to handle the quirky bits

This is invoked by CherryPy, but can be run in any other manner of Python execution. I have a class called DBInterface, whose method is called create_engine:

class DBInterface:
    def create_engine(self,choice,local_dbname=None):

        if choice == 'super_mysql':

            # Super user MySql commands do not require connection to a database.
            db = sqlalchemy.create_engine('mysql://%s:%s@%s' % ('my_super_user','some_password',''),strategy='threadlocal')
            # 'self' is not threadsafe! Don't use it!
            #self.show_databases = "show databases ;"
            cherrypy.session['show_databases'] = "show databases ;"

        elif choice == 'mysql':
            db = sqlalchemy.create_engine('mysql://%s:%s@%s/%s' % ('my_super_user','some_password','',local_dbname),strategy='threadlocal')

        elif choice == 'super_postgres':
            # Super user Postgresql commands require that you connect to the
            # built-in template1 database, and set the isolation level to 0.
            # This can't be done in SqlAlchemy, so we do it in Psycopg.
            DSN_string = "dbname=template1 port=2222 user=my_super_user  host= password=some_password"
            def connection():
                self.local_conn = psycopg.connect(DSN_string)
                return self.local_conn

            db = sqlalchemy.create_engine('postgres://',creator=connection, strategy='threadlocal')
            # NO!
            #self.show_databases = "SELECT datname FROM pg_database ;"
            cherrypy.session['show_databases'] = "SELECT datname FROM pg_database ;"

        elif choice == 'postgres':
            db = sqlalchemy.create_engine('postgres://%s:%s@%s/%s' % ('my_super_user','some_password','',local_dbname),strategy='threadlocal')

        else:
            db = None

        return db

The exposed method in the same class which invokes the create_engine method is called create_my_db, which we saw in the URL above (I will address the CherryPy URL functionality after this point):

    def create_my_db(self,db_type):

        # Creating a db is a super user act.
        cherrypy.session['db_type'] = 'super_' + db_type
        cherrypy.session['my_dbname'] = "cwcuser_%s" % (cherrypy.request.config['server.socket_port'])

        db = self.create_engine(cherrypy.session['db_type'])
        db.execute("create database %s ;" % cherrypy.session['my_dbname'])
        databases = db.execute(cherrypy.session['show_databases'])

        dblist = []
        for d in databases:
            dblist.append(d[0])

        print ', '.join(dblist)
        return ', '.join(dblist)

Because I clump together super user and non-super user database engine connection functionality, I distinguish the difference by modifying the user's db type to a "super_" setting in a CherryPy session variable. This way, only one method is called in various internal modes of operation. When in non-super user mode, the SqlAlchemy connect string is the same for both Postgresql and MySql.
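Reduced to its essentials, the trick is a key lookup: prefix the db type with "super_" and use the result to pick a connect string. A CherryPy-free sketch with placeholder DSNs (the templates, credentials, and function names here are hypothetical, not the article's configuration):

```python
# Hypothetical DSN templates; real ones come from configuration.
DSNS = {
    'super_mysql':    'mysql://super_user:pw@localhost',
    'mysql':          'mysql://super_user:pw@localhost/%(db)s',
    'super_postgres': 'postgres://super_user:pw@localhost/template1',
    'postgres':       'postgres://super_user:pw@localhost/%(db)s',
}

def engine_key(db_type, superuser=False):
    # Creating or dropping a database is a super user act.
    return ('super_' + db_type) if superuser else db_type

def dsn_for(db_type, local_dbname=None, superuser=False):
    # One lookup serves all four modes of operation.
    template = DSNS[engine_key(db_type, superuser)]
    return template % {'db': local_dbname} if local_dbname else template
```

Note how the non-super user templates are structurally identical for both databases, which is what lets the rest of the app stay database-agnostic.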

No more db quirks in this app: The rest is clean SqlAlchemy

All db operations after this point use the exact same SqlAlchemy code for both MySql and Postgresql. That is a profoundly beautiful thing.

The destroy_my_db method is generic:

    def destroy_my_db(self,db_type):

        # Destroying a db is a super user act.
        cherrypy.session['db_type'] = 'super_' + db_type

        if not cherrypy.session.has_key('my_dbname'):
            cherrypy.session['my_dbname'] = "cwcuser_%s" % (cherrypy.request.config['server.socket_port'])

        db = self.create_engine(cherrypy.session['db_type'])
        db.execute("drop database %s ;" % cherrypy.session['my_dbname'])
        databases = db.execute(cherrypy.session['show_databases'])

        dblist = []
        for d in databases:
            dblist.append(d[0])

        print ', '.join(dblist)
        return ', '.join(dblist)

As above, all other database operations from this point forward are generic SqlAlchemy code. Yaaay!

The CherryPy layer

My CherryPy main module, which sets up the needed objects and associated URLs, looks like this:

import cherrypy
import pdb
import sys,os,re,traceback
import user_data

#hostname = os.environ.get('HOSTNAME')
#hostname = re.sub('\..*$','',hostname)

# Add your app here.
import db_wrapper

# Only generic headers and footers go here.
# To use generic templates in your app, you may need to symbolically link
# them into your Templates dir, or reference them by absolute path.
TemplatePath = os.path.join(os.path.dirname(__file__), '/cherrypy/dev/templates')

# Add your class here. Note that the 'root' should be the authentication module.

# To test, go to http://grrlcamp.org:1112/run_test
root = db_wrapper.WrapperTest()

# http://grrlcamp.org:port/dbi/create_my_db, destroy_my_db, etc.
root.dbi = db_wrapper.DBInterface()

# http://grrlcamp.org:port/user/get_user_data
root.user = user_data.UserData()


try:
        cherrypy.quickstart(root)

except SystemExit:
        #smtpInterface.sendSysAdminEmail('cherrypy@xyz.com','sysadmin@xyz.com',"From %s: CherryPy is going down due to shutdown." % hostname,"")
        pass

except:
        print "ERROR: CherryPy crashed, error:"
        traceback.print_exc()

finally:
        #smtpInterface.sendSysAdminEmail('cherrypy@xyz.com','sysadmin@xyz.com',"From %s: CherryPy is going down (from finally clause, check for another error message after this one)." % hostname,"")
        pass
The CherryPy “root” determines what objects are accessible via URL. I mounted a simple test under the root, and nested the above database code under root.dbi, hence the /dbi/ reference in the URL.

root.dbi contains an instance of the DBInterface class we saw earlier. All exposed methods in the DBInterface class become accessible via the /dbi/method_name URL when CherryPy is started.
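That URL-to-object mapping is simple attribute traversal: each path segment below the root names an attribute, and the final segment names an exposed method. A CherryPy-free sketch of the resolution (the classes and the `resolve` helper are illustrative):

```python
class DBInterface(object):
    def create_my_db(self):
        return "created"

class Root(object):
    pass

root = Root()
root.dbi = DBInterface()

def resolve(root, path):
    # Walk '/dbi/create_my_db' one attribute per path segment.
    node = root
    for segment in path.strip('/').split('/'):
        node = getattr(node, segment)
    return node
```

So `resolve(root, '/dbi/create_my_db')` hands back the bound method, which CherryPy would then call with the query-string parameters.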

I’ve attached the complete code used for training, configured to run on localhost, and connect to both MySql and Postgresql daemons. It contains some explicit redundancy and commented code for learning purposes. Your configuration variables will vary slightly. Enjoy.


GrrlCamp NYC 2008


We are kicking off GrrlCamp 2008. There are two GrrlCamps being started from this list. This GrrlCamp starts now, runs until May 2008, and meets in-person in NYC in May of 2008. To join, you should be:

- Female (willing to voice verify. You’d be surprised that this is necessary, but at times, it is.)
- Programmer, or programmer wannabe.
- Able to meet online weekly for one hour.
- Able to dedicate a few hours a week to this project.
- Willing to read, try, do, roll up your sleeves and design and develop, even if you’ve never done this before.
- Able to travel to NYC in May for the software wrap-up and release (optional). Your expenses = your round-trip flight/ride, and minimal donations for food. Lodging is free, food will be prepared and cooked by participants.
- Willing to work in Python (If we get a Rails following, we may use both).
- Willing to learn Dojo, CSS and HTML for the user-facing interface.
- Willing to release the software to the public, free of charge.
- Willing to provide small bits of support after development is complete.

This is a large project, touching on everything from basic Linux admin, svn setup, LINUX shell, Apache configuration, threading, database schema design and access, process/content management, front end design and development, inter-language data transfer, and the list goes on. If you are willing, this is an outstanding learning opportunity.

If interested, reply to this post with a valid email address, and we will send you an invitation.

Please, no observers/reporters. This is strictly a development group, and all members must be willing to write code.