I Love Python: ReSTful DB CRUD dispatching using CherryPy

April 19th, 2009 by comment gloriajw

CherryPy has been one of my favorite Python tools for several years. It should be mentioned here that a ReSTful dispatcher could easily be written in web.py, or pylons as well, and even comes for free in the latest TurboGears implementation.

But if you’re looking for a small, easily manageable and extremely dynamic ReST dispatching solution without the heft of an entire web framework, I’m about to show you how CherryPy can help you in three different ways, depending on your model.

Assuming this mapping:

HTTP GET or HEAD = DB Read
HTTP POST = DB update
HTTP PUT = DB insert
HTTP DELETE = DB delete

Let’s also standardize on one common method across all examples, for determining the HTTP request type, and matching it to the function of the same name. Here is the full code snippet for accomplishing this task:

methods = ('OPTIONS','GET','HEAD','POST',
'PUT','DELETE','TRACE','CONNECT')

if cherrypy.request.method not in self.methods:
    raise cherrypy.HTTPError(400,'Bad Request')

# If request method is HEAD, return the page handler
# for GET, and let CherryPy take care of dropping
# the response body
method = cherrypy.request.method

if cherrypy.request.method == "HEAD":
    method = "GET"

http_method = getattr(self,method)

#print "HTTP Method: %s" % method

result=(http_method)(args,kwargs)

In our examples, we’re going to shorten this to:

http_method = getattr(self.m,cherrypy.request.method)
return (http_method)(args,kwargs)

All of this essentially determines how HTTP was called (GET/PUT/POST/DELETE), and calls the method in a class which exactly matches this name (self.GET(), self.PUT(), etc)
When you see this code, know that it’s just the HTTP method resolving code.

Now for the fun. Let’s look at the dispatcher options we have.

Way 1: A hard-coded URL pointing to fixed resources:

CherryPy can be used in a manner similar to this to establish a fixed URL, and corresponding resources, driven from predefined classes instantiated in the ‘root’ hierarchy:

import cherrypy

class ReSTPaths1:
	@cherrypy.expose
	def index(self):
		http_method = getattr(self,cherrypy.request.method)
		return (http_method)()

	def GET(self):
		return "In GET 1.."

class ReSTPaths2:
	@cherrypy.expose
	def index(self):
		http_method = getattr(self,cherrypy.request.method)
		return (http_method)()

	def GET(self):
		return "In GET 2.."

class ReSTPaths3:
	@cherrypy.expose
	def index(self,client_id=None):
		http_method = getattr(self,cherrypy.request.method)
		return (http_method)(client_id)

	def GET(self,client_id=None):
		return "IN Get 3, your client_id is %s\n" % (client_id)

cherrypy.server.socket_port=8081

root=ReSTPaths1()
root.client = ReSTPaths2()
root.client.address = ReSTPaths3()
cherrypy.quickstart(root)

Once this is running, the URL to invoke it looks like this:

http://localhost:8081/

http://localhost:8081/client/

http://localhost:8081/client/address/

http://localhost:8081/client/address/?client_id=34567

http://localhost:8081/client/address/34567

Output looks something like this:

In GET 1..
In GET 2..
IN Get 3, your client_id is None

If you’re new to CherryPy or Python in general, I’ll reiterate for you how we are calling the GET method in our class.

When we issue this request, we’re issuing what HTTP calls a GET request:

http://localhost:8081/

The CherryPy service above, listening on port 8081, calls the index() method on the root class. The root class was set to:

root=ReSTPaths1()

at the bottom of that file. The index() method from the ReSTPaths1 Class looks like this, at the top of that file:

	def index(self):
		http_method = getattr(self,cherrypy.request.method)
		return (http_method)()

If we were to insert a print cherrypy.request.method statement before the return, we would see it set to “GET”.

getattr simply says: “get me the function name in self, matching the string “GET”.
it returns a reference to self.GET(), which is set directly below the index:

	def GET(self):
		return "In GET 1.."

Notice that the index() method has a @cherrrypy.expose decorator above it. This makes the index method callable by the public. The GET method does not have it, which means we could never invoke the GET method by typing:

http://localhost:8081/GET

If you try this, you’ll get a 404 Not Found error, because it’s not visible through the CherryPy interface.

GET() has to be invoked through index(), which means GET can only be called if an HTTP GET request is issued. If we posted form data to this same URL from, say, a form entry asking people for data input, we would need to add a POST method to this ReSTPaths1() class, to receive the POST data entered in the form fields.

Now back to our example:

In this example, no part of the URL or associated resources are dynamic, in either initialization or run time. This is fine, and suits the needs of most ReSTful CRUD interfaces.

Way 2: URL paths and associated components dynamically set once, upon dispatcher init/startup:

Now let’s say we want to determine the contents of the root, and therefore the URLs and associated resources for our ReSTful interface, dynamically during initialization/startup.

We can assign the root setting by using a Python metaclass to generate classes in our CherryPy startup code, and set the root components to each generated class. This goes beyond the average needs for CRUD access, but it’s such a nice implementation that I must show it off:

import cherrypy

class MetaCRUD(type):
	@cherrypy.expose
	def index(cls):
		http_method = getattr(cls,cherrypy.request.method)
		return (http_method)()

	def GET(cls): return "In class ", cls.__name__, ', received a GET request.'

	def PUT(cls): return "In class ", cls.__name__, ', received a PUT request.'

	def POST(cls): return "In class ", cls.__name__, ', received a POST request.'

	def DELETE(cls): return "In class ", cls.__name__, ', received a DELETE request.'

baseCRUD = MetaCRUD('baseCRUD',(),{})
root = baseCRUD

dynamic_class = {}

for d in ['legacy_dbi','new_dbi','some_other_dbi']:
	dynamic_class[d] = MetaCRUD(d,(),{})
	setattr(root,d,dynamic_class[d])

cherrypy.server.socket_port=8081
cherrypy.quickstart(root)

Here we’re using a metaclass, with CherryPy exposed methods, to generate a dictionary of dynamic classes. We set the root.classname = the_new_class by using the setattr() method.

After initialization, URL components and resources are fixed in this model. But wow, the awesome power we have during initialization, in 28 lines really rocks. I wrote this in 30 minutes, and realized again why I am so head-over-heels in love with this language.

When we hit these URLs:


http://localhost:8081/

http://localhost:8081/legacy_dbi/

http://localhost:8081/new_dbi/

http://localhost:8081/some_other_dbi/

We see this output:

In class baseCRUD, received a GET request.
In class legacy_dbi, received a GET request.
In class new_dbi, received a GET request.
In class some_other_dbi, received a GET request.

Let’s issue a POST request via curl, on the command line. The response is returned:

[gloriajw@g-monster ~]$ curl http://localhost:8081/some_other_dbi/ -d ""
In class some_other_dbi, received a POST request.

This model could be used for, say, reading the contents of the Postgres template1 databases list or the mysql ‘show databases’ command, and auto-generating a ReSTful CRUD interface for each. Access of each resources can be controlled via HTTP Auth methods. This is a great solution to providing, and restricting, legacy database access for new processes through a standard interface.

Way 3: Live, ever-dynamic determination of URL and associated component:

Some ReSTful URL models may need to be ‘run-time dynamic’, especially in the case where databases are dynamically created, and the associated resources per new database could vary. There is a simple example of a dynamic URL and resource model:

import cherrypy
import pprint

class ReSTPaths:
	@cherrypy.expose
	def __init__(self):
		pass

	@cherrypy.expose
	def client(self,*args,**kwargs):
		return "Your HTTP method was %s. Your args are: %s and your kwargs are: %s\n" \
		% (cherrypy.request.method, pprint.pformat(args), pprint.pformat(kwargs))

	@cherrypy.expose
	def address(self,*args,**kwargs):
		return "Your HTTP method was %s. Your args are: %s and your kwargs are: %s\n" \
		% (cherrypy.request.method, pprint.pformat(args), pprint.pformat(kwargs))

cherrypy.quickstart(ReSTPaths())

This allows for dynamic URLs such as:

http://localhost:8080/client/address/34567

http://localhost:8080/client/address?client_id=34567

http://localhost:8080/address/client?client_id=34567

http://localhost:8080/address/client/34567

http://localhost:8080/address/anything/anything_else

The output from this code looks like this:

Your HTTP method was GET. Your args are: ('address', '34567') and your kwargs are: {}
Your HTTP method was GET. Your args are: ('address',) and your kwargs are: {'client_id': '34567'}
Your HTTP method was GET. Your args are: ('client',) and your kwargs are: {'client_id': '34567'}
Your HTTP method was GET. Your args are: ('client', '34567') and your kwargs are: {}
Your HTTP method was GET. Your args are: ('anything', 'anything_else') and your kwargs are: {}

Notice that we only have keyword args (kwargs) when we pass a named parameter, such as client_id=34567

Let’s try a POST request from curl, on the command line:

[gloriajw@g-monster ~]$ curl -d "something_else=whatever_i_want" http://localhost:8080/address/anything/anything_else
Your HTTP method was POST. Your args are: ('anything', 'anything_else') and your kwargs are: {'something_else': 'whatever_i_want'}

In this code, the sky is the limit. You can place whatever code you like in these methods, dynamically creating classes and resources as needed, letting them only persist until the result is returned. This may add some inefficiency, but in exchange offer more secure network resources.

Code is attached, Enjoy!

Gloria

http://www.devchix.com/wp-content/uploads/2009/04/restfixedargs.py

http://www.devchix.com/wp-content/uploads/2009/04/restmeta.py

http://www.devchix.com/wp-content/uploads/2009/04/restvarargs.py

ˆ Back to top

I love python: Zip code prefix web scrape and DB injection in 70 lines

April 17th, 2008 by comment gloriajw

Here’s a module I wrote (in an hour. Damn, Python is wonderful) which scrapes the US Postal service web site for three-digit zip code extensions (http://pe.usps.gov/text/dmm300/L002.htm). It creates a db table and injects zip code prefix, region and state info for each record found. It uses BeautifulSoup to parse the HTML, and SqlAlchemy to do the DB operations.

If you only need to check what region and state a particular zip code belongs, this is for you. If anyone can point me to a free longitude/latitude/full zip code site, please post that info to a reply, and I’ll rewrite this module.

import urllib2
import codecs
import re
import pdb
import os
import BeautifulSoup
import sqlalchemy

class GetZips:
	def __init__(self):
		self.zip_info = []

	def getZipPrefixes(self):
		zip_page = urllib2.urlopen("http://pe.usps.gov/text/dmm300/L002.htm").read()
		self.soup = BeautifulSoup.BeautifulSoup(zip_page)
		# match zip columns
		zips = self.soup.findAll(attrs={'class':re.compile('^trBodyRow*')})
		for i in zips:
			y = i.find(attrs={'class':re.compile('^pTblBodyLL pAlignLeft*')})
			'''
			X is the symbol for an unused 3 digit zip prefix.
			'''
			if y.span and y.span.string == 'X':
				continue

			# last 3 digits
			zip_prefix_3 = y.a.next.next
			zip_prefix_3 = re.sub('[\n\r]+','',zip_prefix_3)

			# finding the first column will suffice.
			y = i.find(attrs={'class':re.compile('^pTblBodyLL pAlignRight*')})

			region_state = y.a.next.next.split()

			region = region_state[-3]
			state_abbrev = region_state[-2]

			if region_state[-1] != zip_prefix_3:
				print "There is a problem here: %s" % i

			self.zip_info.append((region,state_abbrev,zip_prefix_3))
			print "Found %s %s %s" % (region,state_abbrev,zip_prefix_3)

	def injectIntoDB(self):
		engine = sqlalchemy.create_engine('postgres://%s:%s@%s/%s' % ('postgresql','something','127.0.0.1:5432','zip_db'),strategy='threadlocal')
		'''
		The sqlalchemy explicit scope is done for clarity. Of course
		you can "from sqlalchemy import *" instead, and change the scope
		of these calls.
		'''
		metadata = sqlalchemy.MetaData()
		metadata.bind = engine
		zip_table = sqlalchemy.Table('zip_abbrevs', metadata,
			sqlalchemy.Column('zip_abbrevs_id', sqlalchemy.Integer, primary_key=True),
			sqlalchemy.Column('three_digit_abbrev', sqlalchemy.String(4)),
			sqlalchemy.Column('region', sqlalchemy.VARCHAR(50)),
			sqlalchemy.Column('state_abbrev', sqlalchemy.String(3)))

		metadata.create_all(engine)

		for (region, state, zip) in self.zip_info:
			print "Injecting %s %s %s\n" % (region, state,zip)
			zip_table.insert(values={'region':region,'state_abbrev':state,'three_digit_abbrev':zip}).execute()

if __name__ == "__main__":
	x=GetZips()
	x.getZipPrefixes()
	x.injectIntoDB()

# vim:ts=4: noet:

I am writing this code for the nonprofit called The Freelancer’s Union in NYC, which currently has a nationwide member drive: http://www.freelancersunion.org/advocacy/index.html.
I will shamelessly plug them in exchange for sharing this code with the world.

The more members they get in each US state, the better nationwide insurance plans they can offer. They offer E&O insurance for IT freelancers as well, so even if you freelance part-time, this could be for you. This organization rocks. I’ve been a member for three years, and now I proudly write code for them.

Gloria

The unmodified code

ˆ Back to top

An Attachment Walked Into A Bar. Was That U, Fu?

August 22nd, 2007 by comment sarah g

I recently spent some time working on a Rails application that needed to have various kinds of attachments, such as PDFs and images, for various types of resources, such as Programs and Questions. The app in question lets you create programs and questionnaires for the purpose of granting continuing medical education (CME) credits to doctors — so they’re apprised of the bleeding edge techniques for sawing off your arm or tying your tooth to a doorknob. Not that this matters for the piece, but context — like sugar or salt — makes things go down easier.

If you can picture an office of harried administrators dealing with ever-changing AMA requirements — “This brochure needs to go with this program!”, “This xray picture needs to go with Question 7″, “This program needs to sing ‘Missing You’ while showing you diagrams of a root canal”.. (OK, maybe not the second item), you’ll see why I wanted to build a system that could expand easily. I did not want to find myself in the position of trying to look tough while shamefacedly muttering, “I’m sorry, PDFs can only go with *Programs*, as you initially specified” and watching these not-so-gentle administrative souls stare at me with fully warranted incredulity. Blaming the client is like [fill-in-the-blank]: weirdly fun for about 10 minutes before the hangover sets in.

This project turned into a tour of attachment_fu, single table inheritance, polymorphism and functional testing of uploads. Thanks to input from friends, blogs and the usual Internet suspects, I got it up and running in its first form and decided to write it up. Hopefully what I’ve learned can help someone else. “Link-outs”, the reclusive programmer’s form of the “Shout Out” (aka a list of great resources), can be found at the bottom of the article.

Goal

To implement a system that would allow for many types of files to be uploaded and attached to many types of resources.

Requirements

  1. Many types of assets (image, pdf, etc.)
  2. Assets belong to many types of resources (program, question, etc)
  3. Assets upload to the file system
  4. Assets validate in the context of their parent resource [you upload the pdf in the program form; the image in the question form, etc]
  5. All functional tests pass for uploads

Design

Rather than have database tables for each attachment type (PDFs and images, then eventually MP3s) I made one table called Assets. Listening to my requirements, a friend suggested setting up my Assets model using polymorphism and single table inheritance and that I look into attachment_fu for the uploads. For this exercise I will only be focusing on the PDF upload. Readers can extrapolate from there for image or other media types. This article does not address image resizing, or anything related to uploading images. There are some great links at the bottom that deal with these topics in depth.

Modeling the domain:

Assets (the table) has a field called type that is be [PDF, Image (etc)]; a field called resource_type which is the name of the class it belongs to, [Program, Question, etc.] and resource_id which is the id of the resource it belongs to.

The Asset class will inherit from ActiveRecord. PDF and Image (and any future file types we add to the system) inherit from Asset. Line items from Assets (the table) will look like this (leaving out the other fields for now)

type   | resource_type | resource_id

Pdf      | Program         | 82
Image  | Question         | 79

For my assets, I created the following models:
asset.rb, image.rb and pdf.rb. The latter two inherit from Asset.

# [Asset.rb]
class Asset < ActiveRecord::Base
  belongs_to :resource, :polymorphic=>true
end

# [Pdf.rb]
class Pdf < Asset
  belongs_to :program
  has_attachment  :content_type=>'application/pdf',
                         :storage=>:file_system,
                         :path_prefix=>'public/uploads/pdfs/',
                         :size => 1.megabyte..3.megabytes
  validates_presence_of   :content_type, :filename

   def validate
     if filename &&  /pdf$/.match(filename).nil?
         errors.add(:filename, "must be a PDF ")
     end
  end

end

Because I am using attachment_fu, I need to have a set of basic fields in my table that will handle image and file uploads. The migration I created looks like this:


t.column :parent_id,  :integer     # for the thumbnails of resized images, required for plugin
t.column :type, :string               # for STI on this table (curent types are image, PDF)
t.column :resource_type, :string  # for polymorphism on this table (current types are Program, Question)
t.column :resource_id, :integer    # for polymorphism, (program_id, question_id, etc)
t.column :content_type, :string    # Fields here [content_type] and below all attachment_fu requirements
t.column :filename, :string
t.column :thumbnail, :string
t.column :size, :integer
t.column :width, :integer
t.column :height, :integer

To create the relationship for my parent models (question and program) I set up the polymorphic relationship like so:

# [program.rb]
class Program < ActiveRecord::Base
   has_many    :pdfs, :as=>:resource,  :dependent => :destroy
end

# [question.rb]
class Question < ActiveRecord::Base
  has_many      :images, :as=>:resource, :dependent => :destroy
end

Note the difference between delete and destroy. Destroy will remove the entry from the database and the file from the filesystem. Delete will only delete the database entry.

Once this is set up, you can test it in the console and see the magic of Rails at work. Through the column names and relationships we've built, entries in Assets are automatically created. It's one of the things that's great about Rails. It doesn't simply describe different modeling patterns: it implements them if you set up your code correctly. It's a beautiful thing.

Uploading

Without moving files to the server, the system is not complete. So, onto attachment_fu. The attachment_fu tutorials I found online all made the assumption that the attachment you are uploading is an entity of its own, in its own form: a mugshot, for instance. My case was different: I needed to embed the attachment in the forms for their parent resources (pdf in the program form; image in the question form, etc) rather than independently. So I needed to validate a form for more than one model. In my _form.rhtml partial for program I have the pdf field like so:

# [views/program/_form.rhtml]

< label for="program_notes" >Notes
<%= f.text_area :notes %> # ... more fields ... < label for="program_status" >Add Brochure? [PDF Files only]< /label > <%= file_field("pdf", "uploaded_data") %>

This means that both a pdf object and a program object are being passed back to the program controller. Since we know that a program has_many pdfs, we can use program.pdfs.build in order to create -- then validate and ultimately save -- pdf objects.

Validating in context of parent resource:

I needed to check for errors with my program fields and my pdf field. This snippet is what I ended up with. The second line, validate_and_build_pdf is just a result of refactoring: it allows the validation methods to be run if you select a file through the upload file field (so if you try to upload a gif, the validation in Pdf.rb will squak) -- but not if you just leave that field empty.

#[programs_controller.rb]
def create
  @program = Program.new(params[:program])
   validate_and_build_pdf(params[:pdf],params[:pdf][:uploaded_data])
   if @program.save
      flash[:notice] = 'Program was successfully created.'
      redirect_to :action => 'list' and return
   else
      render :action => 'new'
   end
end

# this method stops program from saving the asset if the field is blank,
# and returns type-errors if relevant (files not PDF)
def validate_and_build_pdf(fileParams,file)
  if Asset.is_valid_file?(file)
     @pdf = @program.pdfs.build(fileParams)
  end
end

The reason for the Asset.is_valid_file? method was the same: if no file was uploaded an entry was still being made in the assets table due to the polymorphic relationship. So I had to make sure I was dealing with an actual fileobject, not an empty field. Since this would be used across the application for all types of file uploads, I made this a class method of Asset.

[Asset.rb]
# Checks that the object is one of the following types before running validation on it
def self.is_valid_file?(fileObj)
  if fileObj.is_a?(Tempfile)|| fileObj.is_a?(StringIO)
     || (defined?(ActionController::TestUploadedFile) && fileObj.instance_of?(ActionController::TestUploadedFile))
     true
  else
     false
  end
end

You may be wondering what's up with the ActionController::TestUploadedFile condition -- it's because when you run functional tests of uploaded assets, that is what they are -- they are not instances of fileObj.

Once it's determined that you're dealing with a valid file and you hit custom validation methods (such as informing the user that the file needs to be a pdf), you can include them with your program error messages so you only get one messages box at the top of your form, not two:

<%= error_messages_for :program, :pdf %>

Pass Functional Tests

This part is where you test all your uploads from your functional tests. To test uploaded files, you need to put files in your fixtures/files directory then build your tests.

# [programs_controller_test.rb]
def test_create_with_pdf
  fdata = fixture_file_upload('/files/semi.pdf', 'application/pdf')
  num_programs = Program.count
  post :create,   :multipart => true,
  :pdf=>{"uploaded_data"=>fdata},
  :program => {:name=>"Newest Created Program",
                    :is_ongoing=>1,
                    :status=>"Edit"}
  assert_response :redirect
  assert_redirected_to :action => 'list'
  assert_equal num_programs + 1, Program.count
end

I had a gotcha here -- I realized that I has to have the pdf array separate from the program array (since in the form it’s field_for pdf). I fixed it by submitting two objects, program and pdf, to the controller, seen above.

Finally, running the functional testing leaves files on the file system and they should be deleted at the end of all tests. So I added a destroy method in the teardown, which removes them.

def teardown
  Pdf.find(:all).each { |p| p.destroy }
end

Failing Upload Tests
Just to check that only Pdfs can be uploaded for programs, I added a handful of other file types to my directory and created an array of invalid file types in my setup:

# [programs_controller_test.rb]
def setup
 # other stuff....
 @invalid_files = [['lipsum.doc',"application/doc"],['lipsum.rtf',"application/rtf"],
             ['lipsum.txt',"text/plain"],['squiggle.gif','image/gif' ],
             ['squiggle.jpg','image/jpg' ], ['squiggle.png','image/png' ],
             ['squiggle.psd',"image/x-photoshop"],['squiggle.zip', 'application/zip']
             ]
end

and created a test to check these uploads will fail:

  def test_create_with_invalid_file_types
    @invalid_files.each do |name, type|
      fdata = fixture_file_upload('/files/' + name, type)
      post :create,   :multipart => true,
                      :pdf=>{"uploaded_data"=>fdata},
                      :program => {:name=>"Newerest Created Program",
                             :is_ongoing=>1,
                             :status=>"Edit",
                             :start_date=>Time.now.tomorrow,
                             :end_date=>''}
      assert_response 200
    end
  end

This gave me peace of mind that my files are being uploaded, only the right types are accepted, and all is good in the land of PDFs and Programs. And voila, we are done and on to the next task.


Link Outs

Here's a listing of various online resources on the topics covered in this post. This should be enough for a good start. Happy Uploading!

ˆ Back to top

Mysql gotcha

February 9th, 2007 by comment Liz

I fix bugs for a living. I found this doozy recently. Get out your sticky yellow pad and write this down:

Don’t do this:

select now() - max(my_timestamp) from my_table;

Go find a table with a TIMESTAMP field in it.

mysql> select my_timestamp from my_table limit 1;
my_timestamp: 2006-02-15 02:22:15
1 row in set (0.00 sec)

Looks nice. Now try:

mysql> select max(my_timestamp) from my_table;
max(my_timestamp): 2007-01-31 22:39:17
1 row in set (0.35 sec)

Now this:

mysql> select max(my_timestamp)+0 from my_table;
max(my_timestamp)+0: 2007
1 row in set (0.35 sec)

What happened? max(my_timestamp) was evaluated in a numeric context. The timestamp string was truncated at the first non-numeric character.

Something a bit different happens to now:

mysql> select now();
now(): 2007-02-06 22:47:25
1 row in set (0.00 sec)
mysql> select now()+0;
now()+0: 20070206224744
1 row in set (0.00 sec)

So the original exssion now() – max(my_timestamp) will yield

20070206224744 – 2007

which I doubt is what you had in mind.

If you need to calculate the difference in days, try:

to_days(now()) - max(to_days(my_timestamp))

For something a bit more fine-grained, try:

select (unix_timestamp(now())- unix_timestamp(max(stamp)))
ˆ Back to top