I Love Python: ReSTful DB CRUD dispatching using CherryPy

April 19th, 2009 by comment gloriajw

CherryPy has been one of my favorite Python tools for several years. It should be mentioned here that a ReSTful dispatcher could easily be written in web.py, or pylons as well, and even comes for free in the latest TurboGears implementation.

But if you’re looking for a small, easily manageable and extremely dynamic ReST dispatching solution without the heft of an entire web framework, I’m about to show you how CherryPy can help you in three different ways, depending on your model.

Assuming this mapping:

HTTP GET or HEAD = DB Read
HTTP POST = DB update
HTTP PUT = DB insert
HTTP DELETE = DB delete

Let’s also standardize on one common method across all examples, for determining the HTTP request type, and matching it to the function of the same name. Here is the full code snippet for accomplishing this task:

methods = ('OPTIONS','GET','HEAD','POST',
'PUT','DELETE','TRACE','CONNECT')

if cherrypy.request.method not in self.methods:
    raise cherrypy.HTTPError(400,'Bad Request')

# If request method is HEAD, return the page handler
# for GET, and let CherryPy take care of dropping
# the response body
method = cherrypy.request.method

if cherrypy.request.method == "HEAD":
    method = "GET"

http_method = getattr(self,method)

#print "HTTP Method: %s" % method

result=(http_method)(args,kwargs)

In our examples, we’re going to shorten this to:

http_method = getattr(self.m,cherrypy.request.method)
return (http_method)(args,kwargs)

All of this essentially determines how HTTP was called (GET/PUT/POST/DELETE), and calls the method in a class which exactly matches this name (self.GET(), self.PUT(), etc)
When you see this code, know that it’s just the HTTP method resolving code.

Now for the fun. Let’s look at the dispatcher options we have.

Way 1: A hard-coded URL pointing to fixed resources:

CherryPy can be used in a manner similar to this to establish a fixed URL, and corresponding resources, driven from predefined classes instantiated in the ‘root’ hierarchy:

import cherrypy

class ReSTPaths1:
	@cherrypy.expose
	def index(self):
		http_method = getattr(self,cherrypy.request.method)
		return (http_method)()

	def GET(self):
		return "In GET 1.."

class ReSTPaths2:
	@cherrypy.expose
	def index(self):
		http_method = getattr(self,cherrypy.request.method)
		return (http_method)()

	def GET(self):
		return "In GET 2.."

class ReSTPaths3:
	@cherrypy.expose
	def index(self,client_id=None):
		http_method = getattr(self,cherrypy.request.method)
		return (http_method)(client_id)

	def GET(self,client_id=None):
		return "IN Get 3, your client_id is %s\n" % (client_id)

cherrypy.server.socket_port=8081

root=ReSTPaths1()
root.client = ReSTPaths2()
root.client.address = ReSTPaths3()
cherrypy.quickstart(root)

Once this is running, the URL to invoke it looks like this:
http://localhost:8081/
http://localhost:8081/client/
http://localhost:8081/client/address/
http://localhost:8081/client/address/?client_id=34567
http://localhost:8081/client/address/34567

Output looks something like this:

In GET 1..
In GET 2..
IN Get 3, your client_id is None

If you’re new to CherryPy or Python in general, I’ll reiterate for you how we are calling the GET method in our class.

When we issue this request, we’re issuing what HTTP calls a GET request:

http://localhost:8081/

The CherryPy service above, listening on port 8081, calls the index() method on the root class. The root class was set to:

root=ReSTPaths1()

at the bottom of that file. The index() method from the ReSTPaths1 Class looks like this, at the top of that file:

	def index(self):
		http_method = getattr(self,cherrypy.request.method)
		return (http_method)()

If we were to insert a print cherrypy.request.method statement before the return, we would see it set to “GET”.

getattr simply says: “get me the function name in self, matching the string “GET”.
it returns a reference to self.GET(), which is set directly below the index:

	def GET(self):
		return "In GET 1.."

Notice that the index() method has a @cherrrypy.expose decorator above it. This makes the index method callable by the public. The GET method does not have it, which means we could never invoke the GET method by typing:

http://localhost:8081/GET

If you try this, you’ll get a 404 Not Found error, because it’s not visible through the CherryPy interface.

GET() has to be invoked through index(), which means GET can only be called if an HTTP GET request is issued. If we posted form data to this same URL from, say, a form entry asking people for data input, we would need to add a POST method to this ReSTPaths1() class, to receive the POST data entered in the form fields.

Now back to our example:

In this example, no part of the URL or associated resources are dynamic, in either initialization or run time. This is fine, and suits the needs of most ReSTful CRUD interfaces.

Way 2: URL paths and associated components dynamically set once, upon dispatcher init/startup:

Now let’s say we want to determine the contents of the root, and therefore the URLs and associated resources for our ReSTful interface, dynamically during initialization/startup.

We can assign the root setting by using a Python metaclass to generate classes in our CherryPy startup code, and set the root components to each generated class. This goes beyond the average needs for CRUD access, but it’s such a nice implementation that I must show it off:

import cherrypy

class MetaCRUD(type):
	@cherrypy.expose
	def index(cls):
		http_method = getattr(cls,cherrypy.request.method)
		return (http_method)()

	def GET(cls): return "In class ", cls.__name__, ', received a GET request.'

	def PUT(cls): return "In class ", cls.__name__, ', received a PUT request.'

	def POST(cls): return "In class ", cls.__name__, ', received a POST request.'

	def DELETE(cls): return "In class ", cls.__name__, ', received a DELETE request.'

baseCRUD = MetaCRUD('baseCRUD',(),{})
root = baseCRUD

dynamic_class = {}

for d in ['legacy_dbi','new_dbi','some_other_dbi']:
	dynamic_class[d] = MetaCRUD(d,(),{})
	setattr(root,d,dynamic_class[d])

cherrypy.server.socket_port=8081
cherrypy.quickstart(root)

Here we’re using a metaclass, with CherryPy exposed methods, to generate a dictionary of dynamic classes. We set the root.classname = the_new_class by using the setattr() method.

After initialization, URL components and resources are fixed in this model. But wow, the awesome power we have during initialization, in 28 lines really rocks. I wrote this in 30 minutes, and realized again why I am so head-over-heels in love with this language.

When we hit these URLs:

http://localhost:8081/
http://localhost:8081/legacy_dbi/
http://localhost:8081/new_dbi/
http://localhost:8081/some_other_dbi/

We see this output:

In class baseCRUD, received a GET request.
In class legacy_dbi, received a GET request.
In class new_dbi, received a GET request.
In class some_other_dbi, received a GET request.

Let’s issue a POST request via curl, on the command line. The response is returned:

[gloriajw@g-monster ~]$ curl http://localhost:8081/some_other_dbi/ -d ""
In class some_other_dbi, received a POST request.

This model could be used for, say, reading the contents of the Postgres template1 databases list or the mysql ’show databases’ command, and auto-generating a ReSTful CRUD interface for each. Access of each resources can be controlled via HTTP Auth methods. This is a great solution to providing, and restricting, legacy database access for new processes through a standard interface.

Way 3: Live, ever-dynamic determination of URL and associated component:

Some ReSTful URL models may need to be ‘run-time dynamic’, especially in the case where databases are dynamically created, and the associated resources per new database could vary. There is a simple example of a dynamic URL and resource model:

import cherrypy
import pprint

class ReSTPaths:
	@cherrypy.expose
	def __init__(self):
		pass

	@cherrypy.expose
	def client(self,*args,**kwargs):
		return "Your HTTP method was %s. Your args are: %s and your kwargs are: %s\n" \
		% (cherrypy.request.method, pprint.pformat(args), pprint.pformat(kwargs))

	@cherrypy.expose
	def address(self,*args,**kwargs):
		return "Your HTTP method was %s. Your args are: %s and your kwargs are: %s\n" \
		% (cherrypy.request.method, pprint.pformat(args), pprint.pformat(kwargs))

cherrypy.quickstart(ReSTPaths())

This allows for dynamic URLs such as:
http://localhost:8080/client/address/34567
http://localhost:8080/client/address?client_id=34567
http://localhost:8080/address/client?client_id=34567
http://localhost:8080/address/client/34567
http://localhost:8080/address/anything/anything_else

The output from this code looks like this:

Your HTTP method was GET. Your args are: ('address', '34567') and your kwargs are: {}
Your HTTP method was GET. Your args are: ('address',) and your kwargs are: {'client_id': '34567'}
Your HTTP method was GET. Your args are: ('client',) and your kwargs are: {'client_id': '34567'}
Your HTTP method was GET. Your args are: ('client', '34567') and your kwargs are: {}
Your HTTP method was GET. Your args are: ('anything', 'anything_else') and your kwargs are: {}

Notice that we only have keyword args (kwargs) when we pass a named parameter, such as client_id=34567

Let’s try a POST request from curl, on the command line:

[gloriajw@g-monster ~]$ curl -d "something_else=whatever_i_want" http://localhost:8080/address/anything/anything_else
Your HTTP method was POST. Your args are: ('anything', 'anything_else') and your kwargs are: {'something_else': 'whatever_i_want'}

In this code, the sky is the limit. You can place whatever code you like in these methods, dynamically creating classes and resources as needed, letting them only persist until the result is returned. This may add some inefficiency, but in exchange offer more secure network resources.

Code is attached, Enjoy!

Gloria

http://www.devchix.com/wp-content/uploads/2009/04/restfixedargs.py

http://www.devchix.com/wp-content/uploads/2009/04/restmeta.py

http://www.devchix.com/wp-content/uploads/2009/04/restvarargs.py

ˆ Back to top

Hadoop Streaming and Python: No Jython necessary!

December 25th, 2008 by comment gloriajw

Here is another jewel of an article, written by Michael Noll:
http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Python

As hardware costs plummet and greater bandwidth becomes more affordable, cloud computing becomes even more alluring and feasible for the hobbyist. Working with large data sets is an interesting problem set, now attainable by the average developer with a free LINUX hard drive partition and some spare time.

Hadoop is essentially a framework, written in Java, for accessing large data sets. For the average developer, it’s usually used in conjunction with HDFS which is a custom file system designed for stroing large data sets on cheaper hardware. HDFS data is usually stored in units of entire web pages written to disk “as-is”. Once stored, this large data set is accessible for parsing and analysis.

Indexing of this data is done via what is called a map-reduce algorithm. The easiest map-reduce algorithm to envision is the word count algorithm. Each word on a page is assigned the value “1″ in the map portion, then all words are counted by sorting the words and adding their values. The result is a frequency count of each word in a page. This is the simplest of ten well known map-reduce algorithms, found in this white paper:

http://www.cs.stanford.edu/people/ang//papers/nips06-mapreducemulticore.pdf

The magic of Michael’s article is in how to prevent the stringent, often painful Jython interface to Hadoop. This trick is by streaming this data on the command line from his simpler, more Pythonic map-reduce algorithm directly to Hadoop, thereby avoiding the complex Jython interface. Very clever indeed! It would be wonderful to see all of the relatively standard map-reduce algorithms written in Python.

He has a couple of other great articles on exactly how to install and configure Hadoop on Ubuntu here:
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)

and here:

http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)

On top of Hadoop is usually a metdata store, holding high level information about which page sets or data “chunks” can be found on which volumes, for efficiency. After attending a cloud computing conference in November, I realized that many companies have chosen to write their own metadata layer using HiveDB, or custom metadata tools, and many distributed MySQL databases, rather than settling for the expensive commercial solutions. This great work is the primary cause of the proliferation of easily accessible Open Source cloud computing tools.

Another project to watch is called Mahout, which is an attempt to make all of the well known map-reduce algorithms generally accessible across multicore Hadoop systems.

I run Fedora 10, and have a bit more work to do, but it is outstanding to see such an easily accessible installation for such a relatively complex set of tools. So many software tools and toys, so little time!

Gloria

ˆ Back to top