RESTful Thoughts on a Web 2.0 Python Project
August 16th, 2007
In September of last year, I was contracted by an independent investor to do the server-side implementation of a Web 2.0 application from the ground up. It was an exciting new project, one that kept me up late designing and developing, night after night. It had been a very long time since I had worked on a project so exciting that it replaced my need for sleep. It was a geek's dream come true, and great fun.
My first step was to get to know the front end Dojo guru who was working on the user interface. We were separated by thousands of miles and many time zones. We had never met in person (and still have not to this day). We were going to be spending long and occasionally stressful nights together in virtual space, so we spent time chatting to get to know each other. After the first month, we could both almost intuit when the other was tired, frustrated, or both. After a short time we worked so well together that communication was fast and efficient, and the work was not hindered by our physical distance. There were occasions where it would have been great to draw together on a real whiteboard. But we compensated for this loss fairly well. It was the first entirely virtual project I had worked on, and the social aspects of how we worked together were just as interesting as the technical aspects of the project.
We spent time evaluating tools that would meet our goals: rapid design and development, and excellent performance both in the browser and on the server. For the server development I chose CherryPy and SqlAlchemy, using Postgresql. For the front end we chose Dojo and JavaScript, using JSON to communicate between the browser and the server.
I started with the database schemas in Postgresql, then the wrappers that automatically generate one database per set of client data. The basics of how I do this in Postgresql are here:
This model of one database per data set per client was a natural fit for the client's particular data. Scripting dynamic archiving and recovery took under a day in Python and Psycopg, and worked like a charm.
Here is a peek at how Postgresql can be used from CherryPy (or any other web framework, for that matter) to snapshot live tables into backup tables, and subsequently to disk:
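The wrappers themselves are not shown here. As an illustration only, here is a sketch of the naming half of a one-database-per-client-data-set scheme. The `client_` prefix, the `client_template` template database, and the helper names `client_db_name` and `create_database_sql` are assumptions, not the original code:

```python
import re

# Hypothetical naming scheme: one PostgreSQL database per client data set.
# Only lowercase letters, digits and underscores survive, so the result is
# always a safe, unquoted identifier.
_UNSAFE = re.compile(r"[^a-z0-9_]+")


def client_db_name(client_id, dataset, prefix="client"):
    """Build a database name such as 'client_acme_corp_inventory'."""
    parts = [prefix, str(client_id), dataset]
    cleaned = [_UNSAFE.sub("_", p.lower()).strip("_") for p in parts]
    name = "_".join(p for p in cleaned if p)
    if not name or name[0].isdigit():
        raise ValueError("cannot build a valid database name from %r" % (parts,))
    # PostgreSQL silently truncates identifiers longer than 63 bytes by default,
    # so refuse them here rather than risk two clients sharing one database.
    if len(name) > 63:
        raise ValueError("database name too long: %s" % name)
    return name


def create_database_sql(name, template="client_template"):
    """SQL that clones a (hypothetical) pre-built template database."""
    return "CREATE DATABASE %s TEMPLATE %s;" % (name, template)
```

With psycopg2, `CREATE DATABASE` has to run on a connection in autocommit mode, because PostgreSQL refuses to create a database inside a transaction block. Cloning a template database is one way to give every new client the same schema.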
import datetime
import pytz

# config, TemplatePath, self.login and the db helper methods belong to the surrounding CherryPy application.
def take_snapshot(self):
    db = None
    try:
        # All timestamps are UTC.
        snaptime = datetime.datetime.now()
        snaptime = pytz.timezone(config.server_timezone).localize(snaptime)
        snaptime = snaptime.astimezone(pytz.timezone('UTC'))
        snaptime_str = snaptime.strftime("%Y%m%d_%H%M%S_%Z")
        db = self.login.open_session_db()
        # All user tables, skipping system tables and existing *_utc snapshot tables.
        query = "select tablename from pg_tables where tablename not like 'pg_%' and tablename not like 'sql_%' and tablename not like '%_utc';"
        our_tables = db.queryAndFetchAll(query)
        mod = ""
        for table in our_tables:
            table = table[0]
            # This table grows forever. Never recover it from snapshot.
            # (The trailing comma makes this a tuple; without it, 'in' does a substring test.)
            if table in ('user_change',):
                continue
            mod += "create table %s_%s as (select * from %s);\n" % (table, snaptime_str, table)
        print("Taking snapshot: %s" % mod)
        db.executeMod(mod)
    except Exception:
        # Any failure (for example, no valid session) sends the user back to the login page.
        if db is not None:
            db.close()
        ft = open(TemplatePath + '/login')
        showPage = ft.read()
        ft.close()
        showPage = "<P> Invalid User Name or Password.</P>" + showPage
        return showPage
    db.close()
    return snaptime_str
take_snapshot.exposed = True
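The snapshot names above depend on a detail worth spelling out. The `%Z` suffix is `UTC`, but PostgreSQL folds unquoted identifiers to lowercase, so a table created as `orders_..._UTC` is stored as `orders_..._utc`. That lowercase form is what the `not like '%_utc'` filter catches. The following is a minimal Python 3 sketch of the same naming step that keeps names lowercase from the start. It assumes the standard library's `datetime.timezone` in place of pytz, and the helper names `snapshot_suffix` and `snapshot_statements` are hypothetical:

```python
from datetime import datetime, timezone

# Tables that are never snapshotted (mirrors the 'user_change' exception above).
SKIP_TABLES = ("user_change",)


def snapshot_suffix(when=None):
    """Return a UTC suffix like '20070816_194500_utc'.

    `when` should be timezone-aware. The suffix is lowercased up front
    because PostgreSQL stores unquoted identifiers in lowercase, so the
    Python-side names then match what pg_tables reports.
    """
    when = when or datetime.now(timezone.utc)
    return when.astimezone(timezone.utc).strftime("%Y%m%d_%H%M%S") + "_utc"


def snapshot_statements(tables, suffix):
    """One CREATE TABLE ... AS statement per table, skipping SKIP_TABLES."""
    return [
        "create table %s_%s as (select * from %s);" % (table, suffix, table)
        for table in tables
        if table not in SKIP_TABLES
    ]
```

Generating the SQL separately from executing it also makes the statements easy to log or test before they touch the database.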
Short snippets showing how old snapshots are flushed to disk, and how the database is reverted to an earlier snapshot:
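import subprocess

def archive_to_disk(self, snaptime_str):
    [...]
    for archive_table in archive_tables:
        archive_table = archive_table[0]
        # --inserts dumps rows as INSERT statements; older pg_dump releases accepted -d for this,
        # but current ones treat -d as the database name.
        shmod += "mkdir -p %s; pg_dump -U tp -t %s --inserts -f %s/%s %s\n" % (archive_dir, archive_table, archive_dir, archive_table, dbname)
        mod += "drop table %s;\n" % archive_table
    print("Archiving old snapshot tables to disk: %s" % shmod)
    # Unlike os.popen, check_call waits for the dumps to finish and raises on failure;
    # "sh -e" stops at the first failing command, so the drop statements in mod only run afterwards.
    subprocess.check_call(["sh", "-ec", shmod])
    [...]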
import re

def revert_to_snapshot(self, snaptime_str):
    [...]
    for revert_table in revert_tables:
        # Move current tables to current snapshots.
        # Copy older snapshot tables to current tables.
        revert_table = revert_table[0]
        # PostgreSQL stores unquoted names in lowercase ('_UTC' becomes '_utc'),
        # so strip the snapshot suffix case-insensitively.
        table = re.sub('_%s$' % re.escape(snaptime_str), '', revert_table, flags=re.IGNORECASE)
        mod += "alter table %s rename to %s_%s; create table %s as (select * from %s);\n" % (table, table, curr_snaptime_str, table, revert_table)
    print("Taking snapshot, and reverting to old snapshot: %s" % mod)
    db.executeMod(mod)
    [...]
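Both snippets depend on recognising snapshot tables by name. The sketch below assumes the lowercase `<table>_<YYYYmmdd>_<HHMMSS>_utc` scheme that the snapshot code produces, and the helpers `group_snapshots` and `revert_statements` are hypothetical. It finds the available snapshots and builds the same rename-then-copy statements as `revert_to_snapshot`:

```python
import re
from collections import defaultdict

# Assumed naming scheme: <table>_<YYYYmmdd>_<HHMMSS>_utc, lowercase as PostgreSQL stores it.
SNAPSHOT_RE = re.compile(r"^(?P<table>.+)_(?P<stamp>\d{8}_\d{6}_utc)$")


def group_snapshots(table_names):
    """Map each snapshot suffix to the base tables captured in it."""
    groups = defaultdict(list)
    for name in table_names:
        match = SNAPSHOT_RE.match(name.lower())
        if match:  # live tables such as 'orders' simply do not match
            groups[match.group("stamp")].append(match.group("table"))
    return dict(groups)


def revert_statements(stamp, base_tables, current_stamp):
    """Park each live table under current_stamp, then copy the old snapshot back."""
    statements = []
    for table in base_tables:
        statements.append("alter table %s rename to %s_%s;" % (table, table, current_stamp))
        statements.append("create table %s as (select * from %s_%s);" % (table, table, stamp))
    return statements
```

PostgreSQL DDL is transactional, so executing the whole list inside one transaction makes a revert all-or-nothing: either every table is swapped, or none is.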
Why spend so much Web 2.0 article space talking about database concepts and client data handling? Because while designing Web 2.0 applications, I have found that all matters involving client data take up somewhere between one-third and one-half of the development time and energy. Client data is the very reason the application exists in the first place. The more time and energy spent making sure it is kept, stored and manipulated in the best possible way for the application, the fewer headaches and the less refactoring there will be as the product matures.
Once the database schema and wrappers were in place (most of those details are mercifully left out of this article), the business logic was next. When designing a RESTful application, a great deal of thought goes into the design of the state transactions between the browser and the server. For our needs, passing massive XML structures back and forth to communicate state seemed like overkill. Passing JSON dictionaries with well-defined fields back and forth seemed like a better solution, for speed and ease of customization.
Once the ‘how’ part is resolved, the next question is what will be passed back and forth between browser and server. In a REST implementation, the goal is to pass a complete transaction from browser to server, and a complete transaction from server back to browser. The definition of a transaction will vary based on the application. For RSS feeds, a transaction is only one-way, where the server constructs and sends a complete XML document to an RSS feed reader. This is probably the simplest REST implementation in existence right now. I’ll be addressing a more complex transaction model, where client and server pass each other data, and either side can manipulate the data and pass it back to the other side.
Achieving a RESTful state transaction model, in a nutshell, means passing enough data back and forth that a user can click on any feature or function of the application, in any order, and the server will (1) know what to do every time without fail, and (2) not ‘remember’ each client’s previous states. This implies that the server is ‘stateless’ with respect to specific client data. The server has a state of its own, and knows its own transaction state, of course. But it is only ‘aware’ of a client’s state when that client contacts it and reports its state.
It helps tremendously to work out the ‘chatter’ between the client and the server in human language before designing it. Here is an example of RESTful chatter between a client inventory application and its corresponding server application:
server’s current state: “My current state is fifty boxes of paper clips in inventory.”
client one: “Hi server. Last time I contacted you, you had seventy boxes of paper clips in inventory. I am placing an order for sixty. I will accept a smaller quantity if you have >= forty in inventory. Bye!”
client two: “Hi server. I need twenty boxes of paper clips. I have no clue how many you had in inventory last time I contacted you, and I don’t really care. Fulfill this exact order or cancel it, no exceptions. Smell ya later.”
server: “Received client two’s request. Hi client two, you are properly authenticated, so I’ll look at your state. You want exactly twenty boxes of paper clips. You will make no exceptions. I currently have fifty boxes. I don’t care how many you have, I only care about how many you need. Your order is fulfilled, and I have thirty left. Bye.”
server: “Received client one’s request. Hi client one, you are properly authenticated, so I’ll look at your state. You want sixty boxes of paper clips. You will settle for a minimum of forty. My current state is only thirty in inventory. Your order is not fulfilled, and I have thirty left. Bye.”
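The RESTful chatter translates almost directly into code. In this sketch, the function name `process_order` and its `minimum` parameter are illustrative, not from the original application. The only state is the server's own stock, and everything about the client arrives with the order:

```python
def process_order(stock, wanted, minimum=None):
    """Apply one complete order against the server's current stock.

    minimum=None means "this exact quantity or nothing" (client two).
    Otherwise the client accepts any quantity >= minimum (client one).
    Returns (quantity_shipped, stock_left). Nothing about the client is
    remembered between calls.
    """
    if wanted <= stock:
        shipped = wanted
    elif minimum is not None and stock >= minimum:
        shipped = stock  # partial fill, as the client explicitly allowed
    else:
        shipped = 0      # order not fulfilled; stock unchanged
    return shipped, stock - shipped


# Replaying the chatter: the server starts with fifty boxes.
stock = 50
shipped_two, stock = process_order(stock, 20)              # client two: exactly twenty
shipped_one, stock = process_order(stock, 60, minimum=40)  # client one: sixty, at least forty
```

The requests can arrive in any order. Each call sees only the server's current stock and the complete order in hand.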
Compare this to non-RESTful chatter between client and server:
client two: “Hi server.”
server: “Hi client two. You are authenticated, so I’ll continue to talk to you.” (server stores the state of talking to client two, properly authenticated.)
client one: “Hi server.”
server: “Hi client one. You are authenticated, so I’ll continue to talk to you.” (server stores the state of talking to client one, properly authenticated.)
client one:*BOOM* (crashed, blue screen of death, user restarts session once machine reboots)
client two: “server, hook me up with twenty boxes of paper clips.”
client two: *WHOMP* (browser crashes, user restarts browser)
server: “Done, client two… hey wait, I can’t respond to you. How strange.”
client one: “Hi server.”
server: “Client one, you just authenticated, and you’re trying to authenticate again? I have to reject your request. Bye.”
client two: “Hi server.”
server: “client two: you have an outstanding transaction, but now your session ID is different. Are you trying to trick me? Get out of here. Bye.”
client two: “Huh? I just loaded, and have no idea what you’re talking about.”
server: “You are in a messed up state. Your authentication is rejected. Call customer support at 1-800-….”
client two: “????”
The point of the chatter is to ensure that the client passes a complete transaction with every request, and that the server passes a complete transaction with every response. Like CRC cards in OO design, the chatter makes sure your states, transactions and interfaces are well defined.
Note that when the server crashes in a REST implementation, the client simply can’t reach the server with its complete transaction, so it waits and accumulates transactions, sending them in bulk when the server is available again. When a client crashes and the server cannot acknowledge the client’s transaction, the server can either drop it, knowing it will arrive again, or process it and ignore the duplicate that may arrive when the client is back up. There are no “partial states” to resolve: the transaction was either accepted and completed, or rejected. Recovery in a REST implementation becomes trivial because of this simplicity, and this saves a tremendous amount of development time and energy.
Once your chatter seems complete, it can be translated into header and data components. For this application, this means that the JSON structure coming from the client to the server has fixed fields with expected values, and data arrays/dictionaries with predefined formats.
The server receives the JSON data, decodes it into Python dictionaries and lists, and first evaluates the ‘header’, as defined by the developers. The header tells the server the client’s current state, as well as the server’s last known state when the client last contacted the server. (For our application, this level of detail was necessary. Not all applications need to know about the server state, and this needs to be decided while devising the chatter.)
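One way to get the "process it and ignore the duplicate" behaviour is to have every client transaction carry a unique ID. The sketch below assumes such an ID (a hypothetical `txn_id`) and a hypothetical `TransactionLog` class that keeps processed IDs in memory. A production server would store them in the database, in the same transaction as the work itself:

```python
class TransactionLog:
    """Remember processed transaction IDs so a resent transaction is a no-op.

    This is server state, not client session state: it records what the
    server has already done, keyed by an ID the client puts in every
    transaction it sends.
    """

    def __init__(self):
        self._results = {}

    def apply(self, txn_id, operation):
        """Run operation() at most once per txn_id.

        Returns (result, was_duplicate). If operation() raises, nothing is
        recorded, so the client can safely resend the same transaction.
        """
        if txn_id in self._results:
            return self._results[txn_id], True  # duplicate: do not re-apply
        result = operation()
        self._results[txn_id] = result
        return result, False
```

Because a repeat returns the stored result instead of re-running the work, the server can answer a resent transaction exactly as it answered the original.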
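Here is a minimal sketch of that first step. It assumes a hypothetical message shape of `{"header": {...}, "data": [...]}` with three required header fields (`client_id`, `txn_id`, `last_server_version`). The real field names come out of your own chatter:

```python
import json

# Hypothetical header contract: field name -> expected Python type.
REQUIRED_HEADER = {"client_id": str, "txn_id": str, "last_server_version": int}


def parse_request(raw):
    """Decode one JSON request into (header, data).

    Raises ValueError (json.JSONDecodeError is a subclass of it) when the
    request is not a complete transaction.
    """
    message = json.loads(raw)
    if not isinstance(message, dict) or not isinstance(message.get("header"), dict):
        raise ValueError("request has no header")
    header = message["header"]
    for field, kind in REQUIRED_HEADER.items():
        if not isinstance(header.get(field), kind):
            raise ValueError("header field %r missing or not %s" % (field, kind.__name__))
    data = message.get("data", [])
    if not isinstance(data, list):
        raise ValueError("data must be a list")
    return header, data


def client_is_behind(header, server_version):
    """True if the server's state changed since this client last heard from it."""
    return header["last_server_version"] < server_version
```

Rejecting incomplete requests up front keeps the rest of the server code free of "what if this field is missing" branches.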
Once the client’s full state is known, the client’s data can be processed against the server’s current state data.
The server constructs a reply header and data, as defined by the developers, made up of expected fields and proposed values. In the case of inventory, let’s say each client needs to be notified of the other clients’ transactions, because these are franchise stores and all inventory data is shared between them. The data set returned from the server to client one would be the acknowledgment header (processed_order=False) plus the data received from other clients (other_orders_since_you_last_contacted_me=client two placed this order at that date and time).
CherryPy turned out to be an excellent framework for this task. Its thin, fast, threaded architecture and clean session handling made the REST implementation quite easy. Every client calls the same methods to contact the server and pass across its data, and the server threads call the same method to respond to every client. The API between client and server is reduced to about five methods. Granted, the object that checks the client’s current state against the server’s current state is quite large. But the transaction recovery code is quite small, and the objects are extensible enough to handle new transaction types.
This article assumes a model in which user authentication is separate from transactions. Cookieless authentication won’t be addressed here, but it is an option to consider, depending on the application requirements.
This article also addresses only a Python implementation. Rails developers have a bit of an advantage, since Rails has a built-in REST methodology. I hope future articles will address Rails implementations.
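The reply side can be sketched the same way. Here `order_log` (a list of `(version, order)` pairs), the `server_version` counter and the function name `build_reply` are assumptions. `processed_order` and `other_orders_since_you_last_contacted_me` are the field names from the example above:

```python
import json


def build_reply(client_id, processed, server_version, order_log, last_seen_version):
    """Build one complete JSON reply for client_id.

    order_log is a hypothetical server-side list of (version, order) pairs.
    Only orders from *other* clients that are newer than the version this
    client last saw are included.
    """
    others = [
        order
        for version, order in order_log
        if version > last_seen_version and order.get("client_id") != client_id
    ]
    reply = {
        "header": {"processed_order": processed, "server_version": server_version},
        "data": {"other_orders_since_you_last_contacted_me": others},
    }
    return json.dumps(reply, sort_keys=True)
```

Sending the current `server_version` back in every reply gives the client the "last known server state" value it needs to put in its next request header.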

August 17th, 2007 at 7:45 pm
Why spend so much Web 2.0 article space talking about database concepts and client data handling? Because while designing Web 2.0 applications, all matters involving client data take up somewhere between one-third to 50% of the development time and energy.
Wondering if you considered using Zope 3 and ZODB.
August 18th, 2007 at 7:54 am
Hi Julian,
This is a great question. I’ve used and maintained projects on earlier versions of Zope. And several years ago I implemented a touch screen POS application using the ZODB separate from Zope.
Zope is not my favorite framework. I did not like the templating options. I also found many aspects of it non-intuitive and painful, such as skin changes, mapping objects to disk, and URL redirection.
However, I loved the ZODB, since it made database access so Pythonic that it looks like accessing a standard dictionary. The object access is stunning. But sorting by columns became very awkward very quickly. One advantage of a relational DB is that you can sort on any column, and join combinations of columns dynamically. In ZODB this is dictionary manipulation, which is not horrible, but is also too foreign for investors to understand.
There is a stigma around funding projects that do not use well-understood relational database concepts. So although I loved the ZODB, I could not ‘sell’ it, because data could not be ported out of it with SQL.
Since then I have settled on SqlAlchemy, which is an outstanding object-relational mapper, and gives investors what they expect with respect to database portability and reuse.
Irrespective of the tools, much thought does need to go into questions such as:
1- One monolithic db with all client data, or one db per client/data set?
2- How will the data be reused and sorted?
3- What foreign keys and views make the data reliable and yet still fully extensible?
I don’t think different tools make these questions go away. They are unavoidable aspects of the application design itself.
August 20th, 2007 at 11:51 am
Gloria,
I have read the previous comment and can appreciate that people think SQL when they think database; unfortunate but true. You should try http://www.schevo.org; IMHO it is a fantastic alternative to SQL, slightly like ZODB but better. I would be interested in your opinion of Schevo.
Additionally, I would also like to know how you narrowed down to CherryPy and not some other framework - I know this question is probably common but what specifically made you choose CherryPy. Finally, did you consider Pylons?
Nice blog entry. Thanks
August 20th, 2007 at 3:52 pm
Hi Keios,
I looked at Schevo, and it looks like an interesting hybrid between ZODB and SqlAlchemy. The use of ‘root’ as the database root is the same concept as in the ZODB. And the find and execute syntax looks almost exactly like the fetch and execute syntax of SqlAlchemy. The big exception is this syntax:
>>> actor = db.Actor.findone(name='Winona Ryder')
>>> castings = actor.m.movie_castings()
>>> movies = [casting.movie for casting in castings]
>>> for movie in movies:
...     print movie
A Scanner Darkly (2006)
Edward Scissorhands (1990)
Iterators to navigate database data! Perfect. It also addresses the many-to-many problem with a good solution called an intermediary extent, which translates to a Python iterator. Very nice indeed.
Back in 1999, after spending a year studying commercial and free OODBs for my own patent prototype, I fully supported the “movement” to get rid of SQL.
The problem of switching seems like more of a social problem than a technical one. Many people in many countries speaking all different human languages, and writing in different programming languages, have learned and understand SQL. It can be accessed from PHP, Python, C, C++, Perl, and the command line, just to name a few interfaces.
An equivalent OODB would need APIs from anywhere to it, even if it means you use SWIG to create language wrappers into the Python interface. I am not sure the world is ready for an OODB, but a hybrid like this is definitely worth considering.
To answer your second question, I came to CherryPy because, for my REST model, I was interfacing to Dojo. So in my web framework, I did not need a templating language. I had done my own database construct and SqlAlchemy interface, so I did not need the DBI. I needed a thin, fast, threaded framework, with simple URL mapping. I use Apache to do the SSL and complex URL routing, so this would sit behind Apache. I also had a strong Python preference, since I would be interfacing to some large bits of Python algorithm and data manipulation server-side code I had written.
CherryPy fit perfectly in this well-defined niche, satisfying all of my project needs, wants and requirements. I was, and still am, impressed with the support on the CherryPy user group, and I like the direction the product has taken over the past year or so. Pylons, about two years ago, did not seem to have the same forward momentum or support as CherryPy, which is why I glanced at it but did not seriously consider it.
Thanks for the great discussion,
Gloria
August 20th, 2007 at 9:04 pm
Gloria, Thanks for taking the time out to respond.
August 20th, 2007 at 9:41 pm
Hello GloriaJW,
Good post on designing transactions by defining “transactional” interactions. There’s always a trade-off between lousy (lazy?) interaction/protocol design and how complex your transaction management gets.
On a related note … Using django seems great for “simple” apps. Once you get to transactions that span multiple models (with creates/updates mixed up) django (and most other frameworks) give up or become very hard to work with.
For example … Let’s say you have a book library site :) You want to be able to enter people’s favourite lists as one transaction. They get to pick from AllTheBooks, and create a MyFavouriteList, which is a list of MyFavouriteItem. And MyFavouriteItem is essentially a foreign key to an existing book from AllTheBooks, along with a comment.
So you have
MyFavouriteList = {MyFavouriteItem}
MyFavouriteItem = { SomeBookId + comment/rating }
Say I want to wrap a transaction around the creation of MyFavouriteList _and_ its items, i.e. either I enter all my favourites and create a list, or I leave the database as is (no dangling MyFavouriteItem entries). Doing this is a pain in the **s with django. The problem is that the admin forms are weirdly cryptic at best, and modifying the admin code to deal with nested transactions is a royal pain.
Have you run into something similar? I can’t believe I’m the only person to hit this problem … but I’ve seen no blog-moans about it. Hence this moan!
Thanks,
RSK.
August 21st, 2007 at 9:36 am
Hi Krishnan,
I stopped using Django over a year ago. I agree, I have only seen simple monolithic DB models implemented in Django. A big turn-off for me was Django’s inability to open more than one DB per application. This immediately disqualified Django from most of my projects at the time.
The good news about Django, well at least a year ago, is that the support was fairly decent, and the response time on the google group was very good. You can either ask for help in the Django users group, or write your own DB module to handle this one transaction in SqlAlchemy.
Joins and transactions are really simple in SqlAlchemy, and Django is flexible enough to let you import your own DB module and call your own DB code at certain points. Maybe this is a lot easier than force-fitting your transaction into their model?
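To make the all-or-nothing idea concrete for the favourites example, here is a sketch that uses Python's built-in `sqlite3` module rather than SqlAlchemy, so it runs without any dependencies. The table names and the functions `open_db` and `create_favourites` are made up for illustration. The list and all of its items are written in one transaction, and a single bad item rolls the whole thing back:

```python
import sqlite3

SCHEMA = """
create table book (id integer primary key, title text not null);
create table favourite_list (id integer primary key, owner text not null);
create table favourite_item (
    id integer primary key,
    list_id integer not null references favourite_list(id),
    book_id integer not null references book(id),
    comment text
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("pragma foreign_keys = on")  # SQLite enforces foreign keys only when asked
    conn.executescript(SCHEMA)
    return conn


def create_favourites(conn, owner, items):
    """Insert a favourites list and all of its (book_id, comment) items atomically."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        cur = conn.execute("insert into favourite_list (owner) values (?)", (owner,))
        list_id = cur.lastrowid
        for book_id, comment in items:
            conn.execute(
                "insert into favourite_item (list_id, book_id, comment) values (?, ?, ?)",
                (list_id, book_id, comment),
            )
    return list_id
```

SqlAlchemy sessions and connections offer the same commit-or-rollback pattern around a unit of work, so the structure carries over directly.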
When I used Django, I found myself writing almost all of my DB transaction code in separate modules. In the end I simply dropped Django for CherryPy, because CherryPy had no DB interface of its own (that was one of many reasons).
With the advent of SqlAlchemy, writing even a complex schema and corresponding interface from beginning to end can take one to two weeks at most. I personally like the DB interface decoupled from the web framework, because it takes me so little time to crank out new DB schemas and interfaces these days.
But this is my personal preference, of course. No single web framework is the “silver bullet”, and each project varies enough to force me to redo this evaluation from time to time.
I ultimately hope the Django group can help you work out your transaction via their interface. However, if this is a recurring theme during your project development, I recommend DIY with SqlAlchemy.
Thanks for your comments,
Gloria
October 2nd, 2007 at 9:58 pm
Thank you for your post; the stateless REST material was especially helpful. I’m presently working on a multi-page form app in Rails that does loading (GET) and updating (PUT) in separate requests. While it seems to fit with Rails REST, I think what I need is something like you described. Otherwise I can imagine various concurrency issues cropping up (error handling is rather challenging when loading and saving simultaneously).
Just wanted to say thanks. Now to figure out how to approach it with Rails… hrm.
October 7th, 2007 at 7:18 pm
Nathan, so sorry for the delayed response. I had a busy week.
There are some very intelligent women here working with Rails/Merb and REST designs. I hope they have the time to give you some pointers. I will post your request to the list to let them know it’s here (looks like a busy time for us all).
Gloria