Auto-creating MySql and Postgresql Databases with Powerful Python Tools


CherryPy, MySQL, PostgreSQL, Python, Thoughts

When designing and developing web framework software for clients, I first spend a bit of time focused on the client data, trying to determine its traits by asking these questions:
Is it temporary?
If so, for how long? Should it expire and disappear automatically or manually?
Should it be archived?
How many users/applications will be entering data?
Is each user’s data mutually exclusive? If not, what is the logical grouping of users and data?
What level of control do users have over their own, or other user/app data?
How is the data used by the clients? Are there “super users”?

The database is such an essential part of the design, and being able to automatically create, archive, purge and delete databases is tremendously handy in so many of these designs.

I wrote this example to generate dynamic databases for short courses, training sessions, and demos. I use the example itself to demonstrate the functionality of CherryPy and SqlAlchemy.

First of all, I start my CherryPy process on my local port 7001, allowing me to do this:

http://grrlcamp.org:7001/dbi/create_my_db?db_type=postgres

or this:

http://grrlcamp.org:7001/dbi/create_my_db?db_type=mysql

First, the small bits of quirkiness in Postgresql and MySql

Starting a Postgresql session on the command line, we can see what databases exist before the URL is referenced:

[cwc2008@localhost user1]$ psql -d template1 -U my_super_user
Welcome to psql 8.2.6, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

template1=# SELECT datname FROM pg_database ;
  datname
-----------
 postgres
 template1
 template0
(3 rows)

template1=#

and after (Note: I had to exit the command line session, run the URL, then re-enter the command line session):

template1=# SELECT datname FROM pg_database ;
   datname
--------------
 postgres
 template1
 template0
 cwcuser_7001
(4 rows)

template1=#

The cwcuser_7001 name is auto-generated from the user’s ID and port number, making it useful for training sessions with many users.

my_super_user was created using the Postgresql createuser command, and, of course, must be a super user to do this operation. The template1 database is the Postgresql master database, from which all other databases in that instance are visible.

Caveat: In Postgresql, a database cannot be created from template1 while any other session is connected to it. Violating this rule gives you this error:

template1=# create database blahblah;
ERROR:  source database "template1" is being accessed by other users
template1=#

Be sure to close your database descriptor when finished with template1. If this is not a production app, and/or you feel comfortable bouncing the instance, this lock can also be cleared by running:

-bash-3.2$ /usr/local/postgresql/8.2.6/bin/pg_ctl restart -m immediate -D/usr/local/postgresql/8.2.6/data

waiting for server to shut down... done
server stopped
server starting
-bash-3.2$ LOG:  database system was interrupted at 2008-02-14 15:07:47 EST
LOG:  checkpoint record is at 0/4B9054
LOG:  redo record is at 0/4B9054; undo record is at 0/0; shutdown FALSE
LOG:  next transaction ID: 0/2046; next OID: 57344
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  record with zero length at 0/4B909C
LOG:  redo is not required
LOG:  database system is ready

This is nasty. In live production environments, do this with great care.
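Before going that far, it is usually enough to ask Postgresql who is still attached to template1 and close those sessions by hand. Here is a small, hypothetical check using Psycopg (the connection values match the demo settings used later in this article, your own session will show up in the list too, and on 8.2 the pid column is named procpid):

import psycopg

conn = psycopg.connect("dbname=template1 port=2222 user=my_super_user host=127.0.0.1 password=some_password")
cur = conn.cursor()
# List the backends currently connected to template1.
cur.execute("SELECT procpid, usename FROM pg_stat_activity WHERE datname = 'template1';")
for procpid, usename in cur.fetchall():
    print "template1 is held by backend %s (user %s)" % (procpid, usename)
conn.close()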

MySql is a bit more forgiving when adding, removing or accessing new databases. It does not require the isolation level to be set, as Postgresql does.

We see what databases exist before:

/usr/local/mysql6/bin/mysql --socket=/tmp/mysql10.sock  -u my_super_user -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 170
Server version: 6.0.3-alpha-log Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| test               |
+--------------------+
3 rows in set (0.00 sec)

mysql>

and after:

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| cwcuser_7001       |
| mysql              |
| test               |
+--------------------+
4 rows in set (0.00 sec)

mysql>

MySql does not require hopping in and out of the command line the way Postgresql does. It handles user scope much better, in my opinion. It does, however, require that each new non-super user be granted access to each database they need, but this is (1) trivial, and (2) outside the scope of this article.

The SqlAlchemy code to handle the quirky bits

This is invoked by CherryPy, but can be run in any other manner of Python execution. I have a class called DBInterface with a method called create_engine:

import cherrypy
import psycopg
import sqlalchemy

class DBInterface:
    #~~~~~~~~~~~~~~~~~~~~~~~
    def create_engine(self,choice,local_dbname=None):

        if choice == 'super_mysql':
            '''
            Super user MySql commands do not require connection to a database.
            '''
            db = sqlalchemy.create_engine('mysql://%s:%s@%s' % ('my_super_user','some_password','127.0.0.1:3333'),strategy='threadlocal')
            # 'self' is not threadsafe! Don't use it!
            #self.show_databases = "show databases ;"
            cherrypy.session['show_databases'] = "show databases ;"

        elif choice == 'mysql':
            db = sqlalchemy.create_engine('mysql://%s:%s@%s/%s' % ('my_super_user','some_password','127.0.0.1:3333',local_dbname),strategy='threadlocal')

        elif choice == 'super_postgres':
            '''
            Super user Postgresql commands require that you connect to the
            built-in template1 database, and set the isolation level to 0.
            This can't be done in SqlAlchemy, so we do it in Psycopg.
            '''
            DSN_string = "dbname=template1 port=2222 user=my_super_user  host=127.0.0.1 password=some_password"
            def connection():
                self.local_conn = psycopg.connect(DSN_string)
                self.local_conn.set_isolation_level(0)
                return self.local_conn

            db = sqlalchemy.create_engine('postgres://',creator=connection, strategy='threadlocal')
            # NO!
            #self.show_databases = "SELECT datname FROM pg_database ;"
            cherrypy.session['show_databases'] = "SELECT datname FROM pg_database ;"

        elif choice == 'postgres':
            db = sqlalchemy.create_engine('postgres://%s:%s@%s/%s' % ('my_super_user','some_password','127.0.0.1:2222',local_dbname),strategy='threadlocal')

        else:
            db = None

        return db

The exposed method in the same class that invokes create_engine is called create_my_db, which we saw in the URL above (I will address the CherryPy URL functionality after this point):

    @cherrypy.expose
    def create_my_db(self,db_type):

        '''
        Creating a db is a super user act.
        '''
        cherrypy.session['db_type'] = 'super_' + db_type
        cherrypy.session['my_dbname'] = "cwcuser_%s" % (cherrypy.request.config['server.socket_port'])

        db = self.create_engine(cherrypy.session['db_type'])
        db.execute("create database %s ;" % cherrypy.session['my_dbname'])
        databases = db.execute(cherrypy.session['show_databases'])

        dblist = []
        for d in databases:
            dblist.append(d[0])

        print ', '.join(dblist)
        self.dbclose(cherrypy.session['db_type'],databases)
        return ', '.join(dblist)

Because I clump together super user and non-super user database engine connection functionality, I distinguish the two by prepending “super_” to the user’s db type in a CherryPy session variable. This way, only one method is called in the various internal modes of operation. When in non-super user mode, the SqlAlchemy connect string takes the same form for both Postgresql and MySql.
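The dbclose call at the end of create_my_db is not shown in this article. Given the template1 caveat above, here is a minimal, hypothetical sketch of what it might do (it assumes the threadlocal engines and the self.local_conn attribute set in create_engine):

    #~~~~~~~~~~~~~~~~~~~~~~~
    def dbclose(self,db_type,result=None):
        '''
        Hypothetical sketch: close the result set, and for the super user
        Postgresql case also close the raw Psycopg connection opened in
        create_engine, so template1 is not left locked.
        '''
        if result is not None:
            result.close()
        if db_type == 'super_postgres' and getattr(self,'local_conn',None) is not None:
            self.local_conn.close()
            self.local_conn = None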

No more db quirks in this app: The rest is clean SqlAlchemy

All db operations after this point use the exact same SqlAlchemy code for both MySql and Postgresql. That is a profoundly beautiful thing.

The destroy_my_db method is generic:

    @cherrypy.expose
    def destroy_my_db(self,db_type):

        '''
        Destroying a db is a super user act.
        '''
        cherrypy.session['db_type'] = 'super_' + db_type

        if not cherrypy.session.has_key('my_dbname'):
            cherrypy.session['my_dbname'] = "cwcuser_%s" % (cherrypy.request.config['server.socket_port'])

        db = self.create_engine(cherrypy.session['db_type'])
        db.execute("drop database %s ;" % cherrypy.session['my_dbname'])
        databases = db.execute(cherrypy.session['show_databases'])

        dblist = []
        for d in databases:
            dblist.append(d[0])

        print ', '.join(dblist)
        self.dbclose(cherrypy.session['db_type'],databases)
        return ', '.join(dblist)

As above, all other database operations from this point forward are generic SqlAlchemy code. Yaaay!
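For illustration, here is a hypothetical exposed method (the notes table and its column are invented for this example, and it assumes create_my_db has already been run) that executes unchanged against either engine:

    @cherrypy.expose
    def make_notes(self,db_type):
        '''
        Hypothetical example only: once the per-user database exists, the same
        SqlAlchemy calls run unchanged against MySql or Postgresql.
        '''
        db = self.create_engine(db_type,cherrypy.session['my_dbname'])
        db.execute("create table notes (note varchar(255)) ;")
        db.execute("insert into notes (note) values ('hello') ;")
        rows = db.execute("select note from notes ;")
        return ', '.join([r[0] for r in rows])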

The CherryPy layer

My CherryPy main module, which sets up the needed objects and associated URLs, looks like this:

import cherrypy
import pdb
import sys,os,re,traceback
import user_data
sys.path.append('../')

#hostname = os.environ.get('HOSTNAME')
#hostname = re.sub('\..*$','',hostname)

# Add your app here.
import db_wrapper

# Only generic headers and footers go here.
# To use generic templates in your app, you may need to symbolically link
# them into your Templates dir, or reference them by absolute path.
TemplatePath = os.path.join(os.path.dirname(__file__), '/cherrypy/dev/templates')

# Add your class here. Note that the 'root' should be the authentication module.

# To test, go to http://grrlcamp.org:1112/run_test
root = db_wrapper.WrapperTest()

# http://grrlcamp.org:port/dbi/create_my_db, destroy_my_db, etc.
root.dbi = db_wrapper.DBInterface()

# http://grrlcamp.org:port/user/get_user_data
root.user = user_data.UserData()

try:
        cherrypy.tree.mount(root)
        cherrypy.config.update(os.path.join(os.path.dirname(__file__),'cp.conf'))
        cherrypy.server.quickstart()
        cherrypy.engine.start()

except SystemExit:
        #smtpInterface.sendSysAdminEmail('cherrypy@xyz.com','sysadmin@xyz.com',"From %s: CherryPy is going down due to shutdown." % hostname,"")
        sys.exit()

except:
        print "ERROR: CherryPy crashed, error:"
        raise

finally:
        #smtpInterface.sendSysAdminEmail('cherrypy@xyz.com','sysadmin@xyz.com',"From %s: CherryPy is going down (from finally clause, check for another error message after this one)." % hostname,"")
        pass

The CherryPy “root” determines what objects are accessible via URL. I mounted a simple test under the root, and nested the above database code under root.dbi, hence the /dbi/ reference in the URL.

root.dbi contains an instance of the DBInterface class we saw earlier. All exposed methods in the DBInterface class become accessible via the /dbi/method_name URL when CherryPy is started.
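The cp.conf file itself is not listed here. As a rough sketch, the handful of settings this demo depends on could just as well be passed straight to cherrypy.config.update; the port and session values below are assumptions, not the actual file:

import cherrypy

# Hypothetical equivalent of a minimal cp.conf for this demo: listen on the
# port used in the URLs above, and turn on the session tool that
# create_my_db/destroy_my_db rely on.
cherrypy.config.update({
    'server.socket_port': 7001,
    'server.thread_pool': 10,
    'tools.sessions.on': True,
    'tools.sessions.timeout': 60,
})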

I’ve attached the complete code used for training, configured to run on localhost, and connect to both MySql and Postgresql daemons. It contains some explicit redundancy and commented code for learning purposes. Your configuration variables will vary slightly. Enjoy.

auto-create-any-db.tar

GrrlCamp NYC 2008


Events, MySQL, PostgreSQL, Python, Thoughts

We are kicking off GrrlCamp 2008. There are two GrrlCamps being started from this list. This GrrlCamp starts now, runs until May 2008, and meets in-person in NYC in May of 2008.

Qualifications:
- Female (willing to voice verify. You’d be surprised that this is necessary, but at times, it is.)
- Programmer, or programmer wannabe.
- Able to meet online weekly for one hour.
- Able to dedicate a few hours a week to this project.
- Willing to read, try, do, roll up your sleeves and design and develop, even if you’ve never done this before.
- Able to travel to NYC in May for the software wrap-up and release (optional). Your expenses = your round-trip flight/ride, and minimal donations for food. Lodging is free, food will be prepared and cooked by participants.
- Willing to work in Python (If we get a Rails following, we may use both).
- Willing to learn Dojo, CSS and HTML for the user-facing interface.
- Willing to release the software to the public, free of charge.
- Willing to provide small bits of support after development is complete.

This is a large project, touching on everything from basic Linux admin, svn setup, Linux shell, and Apache configuration to threading, database schema design and access, process/content management, front end design and development, and inter-language data transfer, and the list goes on. If you are willing, this is an outstanding learning opportunity.

If interested, reply to this post with a valid email address, and we will send you an invitation.

Please, no observers/reporters. This is strictly a development group, and all members must be willing to write code.

RESTful Thoughts on a Web 2.0 Python Project


CherryPy, PostgreSQL, Python, Thoughts

In September of last year, I was contracted by an independent investor to do the server side implementation of a Web 2.0 application from the ground up. It was an exciting new project, one which kept me up late through many nights of design and development. It had been a very long time since I had worked on a project that I found so exciting that it replaced my need for sleep. It was a geek’s dream come true. It was great fun.

My first step was to get to know the front end Dojo guru who was working on the user interface. We were separated by thousands of miles and many time zones. We had never met in person (and still have not to this day). We were going to be spending long and occasionally stressful nights together in virtual space, so we spent time chatting to get to know each other. After the first month, we could both almost intuit when the other was tired, frustrated, or both. After a short time we worked so well together that communication was fast and efficient, and the work was not hindered by our physical distance. There were occasions where it would have been great to draw together on a real whiteboard. But we compensated for this loss fairly well. It was the first entirely virtual project I had worked on, and the social aspects of how we worked together were just as interesting as the technical aspects of the project.

We spent time evaluating tools which would achieve our goals of rapid design and development, and excellent performance both in the browser and on the server. For the server development I chose CherryPy and SqlAlchemy, using Postgresql. For the front end we chose Dojo and Javascript, using JSON to communicate between the browser and the server.

I started with the database schemas in Postgresql, then the wrappers which automatically generate one database per set of client data. The basics on how I do this in Postgresql are here:

http://www.devchix.com/2007/05/27/dynamic-web-content-what-you-can-do-in-postgresql-that-you-cannot-do-in-oracle/

This model of one database per data set per client worked so well with the client’s particular data that it was a natural fit. Scripting dynamic archiving and recovery took under one day in Python and Psycopg, and worked like a charm.

Here is a peek at how Postgresql can be used in CherryPy (or any other web framework, for that matter) to snapshot to backup tables, and subsequently to disk:

    def take_snapshot(self):
        try:
            '''
            All timestamps are UTC.
            '''
            snaptime = datetime.datetime.now()
            snaptime = pytz.timezone(config.server_timezone).localize(snaptime)
            snaptime = snaptime.astimezone(pytz.timezone('UTC'))
            snaptime_str = snaptime.strftime("%Y%m%d_%H%M%S_%Z")

            db = self.login.open_session_db()
            #query = 'select datname from pg_database'
            query = "select tablename from pg_tables where tablename not like 'pg_%' and tablename not like 'sql_%' and tablename not like '%_utc';"
            our_tables = db.queryAndFetchAll(query)
            mod = ""
            for table in our_tables:
                table = table[0]
                '''
                This table grows forever. Never recover it
                from snapshot.
                '''
                if table in ('user_change',):   # one-element tuple; a bare string here would do a substring match
                    continue

                mod += "create table %s_%s as (select * from %s);\n" % (table,snaptime_str,table)
            print "Taking snapshot: %s" % mod
            db.executeMod(mod)

        except:
            #raise   # uncomment to re-raise during debugging instead of returning the error page
            db.close()
            ft = open(TemplatePath + '/login')
            showPage = ft.read()
            ft.close()
            showPage = "<P> Invalid User Name or Password.</P>" + showPage
            return showPage

        db.close()
        return snaptime_str

    take_snapshot.exposed = True

Small snippets on the key concepts of flushing/retrieving snapshots from/to disk:

    def archive_to_disk(self,snaptime_str):
        [...]
        for archive_table in archive_tables:
            archive_table = archive_table[0]

            shmod += "mkdir -p %s; pg_dump -U tp -t %s -d -f %s/%s %s\n" % (archive_dir,archive_table,archive_dir,archive_table,dbname)
            mod += "drop table %s;\n" % archive_table

        print "Archiving old snapshot tables to disk: %s" % shmod
        os.popen(shmod)
        [...]

    def revert_to_snapshot(self,snaptime_str):
        [...]
        for revert_table in revert_tables:
            '''
            Move current tables to current snapshots.
            Copy older snapshot tables to current tables.
            '''
            revert_table = revert_table[0]
            table = re.sub('_%s' % snaptime_str,'',revert_table)
            mod += "alter table %s rename to %s_%s; create table %s as (select * from %s);\n" % (table,table,curr_snaptime_str,table,revert_table)

        print "Taking snapshot, and reverting to old snapshot: %s" % mod
        db.executeMod(mod)
        [...]

Why spend so much Web 2.0 article space talking about database concepts and client data handling? Because when designing Web 2.0 applications, all matters involving client data take up somewhere between one-third and one-half of the development time and energy. Client data is the very reason the application exists in the first place. The more time and energy you put into making sure it is kept, stored and manipulated in the best possible way for the application, the fewer headaches and less refactoring you will face as the product matures.

Once the database schema and wrappers were in place (much of those details were mercifully left out of this article), the business logic was next. When designing a RESTful application, a great bit of thought goes into the design of the state transactions between the browser and server. For our needs, passing massive XML structures back and forth to communicate state seemed like overkill. Passing JSON dictionaries with well defined fields back and forth seemed like a better solution, for speed and ease of customization.

Once the ‘how’ part is resolved, the next question is what will be passed back and forth between browser and server. In a REST implementation, the goal is to pass a complete transaction from browser to server, and a complete transaction from server back to browser. The definition of a transaction will vary based on the application. For RSS feeds, a transaction is only one-way, where the server constructs and sends a complete XML document to an RSS feed reader. This is probably the simplest REST implementation in existence right now. I’ll be addressing a more complex transaction model, where client and server pass each other data, and either side can manipulate the data and pass it back to the other side.

To achieve a RESTful state transaction model, in a nutshell, means passing enough data back and forth so that a user can click on any feature or function of the application, in any order, and the server will (1) know what to do every time without fail, and (2) not ‘remember’ each client’s previous states. This implies that the server is ‘stateless’ with respect to specific client data. The server has a state of its own, and knows its own transaction state, of course. But it is only ‘aware’ of each client’s state when it is contacted by the individual clients, and the client notifies the server of its state.

It helps tremendously to work out the ‘chatter’ between the client and the server in human language before designing it. Here is an example of RESTful chatter between a client inventory application and its corresponding server application:

server’s current state: “My current state is fifty boxes of paper clips in inventory.”

client one: “Hi server. Last time I contacted you, you had seventy boxes of paper clips in inventory. I am placing an order for sixty. I will accept a smaller quantity if you have >= forty in inventory. Bye!”

client two: “Hi server. I need twenty boxes of paper clips. I have no clue how many you had in inventory last time I contacted you, and I don’t really care. Fulfill this exact order or cancel it, no exceptions. Smell ya later.”

server: “Received client two’s request. Hi client two, you are properly authenticated, so I’ll look at your state. You want exactly twenty boxes of paper clips. You will make no exceptions. I currently have fifty boxes. I don’t care how many you have, I only care about how many you need. Your order is fulfilled, and I have thirty left. Bye.”

server: “Received client one’s request. Hi client one, you are properly authenticated, so I’ll look at your state. You want sixty boxes of paper clips. You will settle for a minimum of forty. My current state is only thirty in inventory. Your order is not fulfilled, and I have thirty left. Bye.”

Compare this to non-RESTful chatter between client and server:


client two: “Hi server.”

server: “Hi client two. You are authenticated, so I’ll continue to talk to you.” (server stores the state of talking to client two, properly authenticated.)

client one: “Hi server.”

server: “Hi client one. You are authenticated, so I’ll continue to talk to you.” (server stores the state of talking to client one, properly authenticated.)

client one:*BOOM* (crashed, blue screen of death, user restarts session once machine reboots)

client two: “server, hook me up with twenty boxes of paper clips.”

client two: *WHOMP* (browser crashes, user restarts browser)

server: “done, client two….hey wait, I can’t respond to you. How strange.”

client one: “Hi server.”

server: “Client one, you just authenticated, and you’re trying to authenticate again? I have to reject your request. Bye.”

client two: “Hi server.”

server: “client two: you have an outstanding transaction, but now your session ID is different. Are you trying to trick me? Get out of here. Bye.”

client two: “Huh? I just loaded, and have no idea what you’re talking about.”

server: “You are in a messed up state. Your authentication is rejected. Call customer support at 1-800-….”
client two: “????”

The point of the chatter is to ensure that the client is passing a complete transaction with every request, and that the server passes a complete transaction with every response. Similar to the CRC cards for OO design, the Chatter ensures that you have your states, transactions and interfaces well defined.

Note that a crash of the server in a REST implementation means that the client can’t reach the server with its complete transaction, so it waits and accumulates transactions, sending bulk transactions when the server is available once again. The crash of a client in a REST implementation means that if the server cannot acknowledge the client’s transaction, it can either drop it, knowing it will arrive again, or process it and ignore the duplicate transaction that may arrive again when the client is back up. There are no “partial states” which need to be resolved; the transaction was either accepted and completed, or rejected. Recovery in a REST implementation becomes trivial because of this simplicity, and this saves a tremendous amount of development time and energy.
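The browser side of this project was Dojo, but the accumulate-and-resend idea is language neutral. Here is a rough Python sketch of it, where send_to_server is a stand-in for whatever transport actually delivers the JSON transaction:

class TransactionOutbox:
    '''
    Rough sketch of the client-side recovery idea: every transaction is
    complete in itself, so unsent transactions simply queue up and are
    flushed in bulk once the server is reachable again.
    '''
    def __init__(self,send_to_server):
        self.send = send_to_server   # assumed callable; raises IOError on network failure
        self.pending = []

    def submit(self,transaction):
        self.pending.append(transaction)
        self.flush()

    def flush(self):
        while self.pending:
            try:
                self.send(self.pending[0])
            except IOError:
                return                # server unreachable; keep accumulating
            self.pending.pop(0)       # acknowledged; safe to discard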

Once your chatter seems complete, it can be translated into a header and data components. For this application, this means that the JSON structure coming from the client to the server has fixed fields with expected values, and data arrays/dictionaries with predefined formats.
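As a concrete (and entirely made-up) illustration using the paper clip chatter, the client-to-server structure might look like this in Python before it is JSON-encoded:

import json    # simplejson filled this role in the Python 2.5 era

# Hypothetical header + data layout; the field names are invented for illustration.
transaction = {
    'header': {
        'client_id': 'client_one',
        'client_txn_id': 42,
        'last_known_server_state': {'paper_clip_boxes': 70},
    },
    'data': {
        'order': {'item': 'paper_clip_boxes', 'quantity': 60, 'minimum_acceptable': 40},
    },
}
payload = json.dumps(transaction)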

The server receives the JSON data, decodes it into Python dictionaries and arrays, and first evaluates the ‘header’, as defined by the developers. The header tells the server the client’s current state, as well as the server’s last known state when the client last contacted the server. (For our application, this level of detail was necessary. Not all applications would need to know about the server state, and this needs to be decided while devising the chatter.)

Once the client’s full state is known, the client’s data can be processed against the server’s current state data.
The server constructs a reply header and data, as defined by the developers, composed of expected fields and proposed values. In the case of inventory, let’s say client one needs to be notified of client two’s transaction, because these are franchise stores, and all inventory data is shared between them. The returned data set from server to client one would be the acknowledgment header (processed_order=False) plus the data received from other clients (other_orders_since_you_last_contacted_me=client two placed this order at that date and time).
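On the server side, the reply described above might be assembled like this (again only a sketch, continuing the invented field names):

def build_reply(processed_order, current_inventory, other_orders):
    '''
    Sketch of the server's response: an acknowledgment header plus whatever
    has happened since this client last checked in.
    '''
    return {
        'header': {
            'processed_order': processed_order,
            'server_state': {'paper_clip_boxes': current_inventory},
        },
        'data': {
            'other_orders_since_you_last_contacted_me': other_orders,
        },
    }

# e.g. the reply to client one after client two's order was fulfilled:
reply = build_reply(False, 30, [{'client_id': 'client_two', 'quantity': 20}])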

CherryPy turned out to be an excellent framework for this task. Its thin, fast, threaded architecture and clean session handling made REST implementation quite easy. The same methods are called by every client to contact the server and pass across its data. The same method is called by the server threads to respond to every client. The API between client and server is reduced down to about five methods. Granted, the object which checks the client’s current state against the server’s current state is quite large. But the transaction recovery code is quite small, and the objects are extensible, to handle new transaction types.

This article assumes the model of user authentication separate from transactions. The subject of cookieless authentication won’t be addressed here, but it is an option to consider, based on the application requirements.

This article also only addressed a Python implementation. Rails developers have a bit of an advantage, having a built-in REST methodology. I hope some articles to come will address Rails implementations.

Dynamic Web Content: What you can do in Postgresql that you cannot do in Oracle.


PostgreSQL, Tips and Tricks

With the advent of dynamic online content comes the need to create, destroy and manage this content in more automated and efficient ways. I have discovered some very nifty ways to do this in Postgresql which cannot be done as well, as easily, or even at all in Oracle. As it turns out, Postgresql is very well suited for dynamic database generation, for any size data set.

Let’s say you have a web site where users create dynamic content. You have these basic choices when determining how to store your content:

1. Place everything in a monolithic database (one large database, many indexed fields), sorting by key/indexed fields. User content can be distinguished by a unique user id, or maybe there is no need to distinguish user content at all. It depends on the data being stored, and the applied use of that data.

2. Dynamically create a database from a fixed (or even from a dynamic) database schema. Store all user data in a separate database, and metadata in a common (master) database.

Option 1 is the most common. Everyone doing web content seems to want to default to this model, maybe because it is the easiest to roll out quickly, maybe because they think they have no other choice.

In some cases, it’s an appropriate model based on the data set. An example that comes to mind which makes this model appropriate would be weather reporting. In a weather database, it probably doesn’t matter who recorded last night’s rainfall around the world, as long as the person had legitimate access, and recorded the data correctly. It also probably makes no sense to break the data out into different databases by zip code, region, or state. The purpose of collecting weather data is not only to report local weather conditions, but to also measure and compare the data statewide, nationwide, or worldwide. This is a contrived but decent example of a data set best being served by a monolithic database.

An example of data which can/should be broken out into separate databases is an inventory application. Let’s say you provide online inventory information for many companies, where each company is unrelated to another. Companies may even have security concerns, or may not want other companies to have access to their hardware inventory data, for example, because it may reveal trade secrets regarding what proprietary hardware designs they have devised for their business needs. Companies may also want to separate and secure their inventory data because competitors can derive sales information from this data.

A monolithic database could be used in this model, but how do you guarantee one client that their data is secure, and will not be exposed to another client, even by accident? Complex selective encryption methods could be used on the monolithic model, but why bother? Breaking the data out into separate client databases is ideal for this data model for many reasons, including these:

1. Security: Data is inherently separate. Separate databases can easily be encrypted if necessary. Logins/passwords to each database can be isolated to the individual database.
2. Data Management: Individual client data can easily be archived, exported or purged to meet clients’ needs.
3. Local Database Support: The model inherently supports database servers owned and run by the client. If the client chooses to switch to a model where they run and maintain your software behind their own firewall, the client’s database can easily be exported, encrypted, sent to the client, decrypted, and imported on their server.

Of course, these issues can be dealt with in the monolithic model, with much wrestling and convoluted scripting/management, but again, why make the job of the developers and database admin more difficult?

Using the above example, let’s say we have an inventory database schema for Postgresql which looks something like this, in a file called client_db_schema:


CREATE SEQUENCE simple_inventory_id_seq  -- The sequence number is for DBAPI 2.0 compliance.
    INCREMENT BY 1
    MINVALUE 1
    NO MAXVALUE
    CACHE 1;


CREATE TABLE simple_inventory (
        simple_inventory_id integer unique DEFAULT nextval(('simple_inventory_id_seq'::text)::regclass) NOT NULL PRIMARY KEY,
        item_number integer unique NOT NULL,
        item_description character varying NOT NULL
        -- ...any other fields you can imagine
);

(more client-specific tables go here)

After installing Postgresql, run the createuser command:

createuser -s -P queen_bean

It will prompt you for a password.

The “queen_bean” super user can create or destroy databases, create other users, etc. She really is the queen.

To create a new inventory database from Python and psycopg, do this:

import os
import psycopg

# Notice that 'template1' is the actual name of the Postgresql internal master database.
# Connect to this to do database create/delete operations, but don't delete the template1 database itself.
super_db = psycopg.connect("dbname=template1 user=queen_bean password=whatever_the_password_is port=5432 host=localhost")
cursor = super_db.cursor()

super_db.set_isolation_level(0) # This tells Postgresql that we're about to do high level, 'narrow' admin operations.

for i in range(1,6):
    client_string = "client%d" % i
    #cursor.execute("drop database %s_inventory;" % client_string)  #optional
    cursor.execute("create database %s_inventory with encoding = 'UTF-8' ;" % client_string)
super_db.close()

To do this on the command line instead of in Python, you’d type this:

psql template1 queen_bean

psql> create database client1_inventory;
psql> create database client2_inventory;
etc...
psql> \q

This loop created five different, empty client databases. Now we need to add the tables from the client database schema to each database:

# This is the schema file we created above. It's in the current directory, but could be anywhere (change the  ./ component).
new_tables = os.path.abspath(os.path.join(os.path.dirname(__file__),"./client_db_schema"))

# Read this file into memory:
fo = open(new_tables)
commands = ' '.join(fo.readlines())
fo.close()

for i in range(1,6):
    client_string = "client%d" % i

    client_db = psycopg.connect("dbname=%s_inventory user=queen_bean password=whatever_the_password_is port=5432 host=localhost" % client_string)
    cursor = client_db.cursor()

    cursor.execute(commands)
    client_db.commit()  # the table creation runs in a transaction, so commit it
    client_db.close()

The command line equivalent would be:

psql client1_inventory queen_bean < ./client_db_schema
psql client2_inventory queen_bean < ./client_db_schema
etc...

The above code is an oversimplification of some db concepts. But it works, and it demonstrates how simple it is to dynamically generate Postgresql databases for client data management.

In Postgresql, all databases are managed under the same data area on the disk. They are all accessible through the same database instance, or daemon. This is why we only connected once to the template1 master database to create the client databases.

In Oracle, the terminology is slightly different. A ‘database’ is understood to be the running instance, or daemon. The ‘instance’ is an individual database as Postgresql would know it. Creating new Oracle ‘instances’, or databases, requires shutting the entire Oracle daemon down! This means that all users accessing all Oracle instances at that moment lose connectivity, because a new database is being dynamically generated.

In any database tool, a shutdown is not trivial. Connection cleanup, index cleanup, memory flush to disk, and garbage collection, amongst other things, happen during shutdown. This means that shutdown could take minutes to perform, making this model impossible to use under Oracle. Being experienced in both Oracle and Postgresql, I could list many other reasons why you would not want to use Oracle over Postgresql on any given day for any given project, but that is material for yet another article.

The above code implies some other neat features as well. Since the database schema comes from a file, that file could be generated in real time, from FORM data. So not only is the database dynamically created, but the schema can change in real time as well.
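Here is a rough sketch of that idea; the whitelist, function name and field layout are invented for illustration, and the FORM data is assumed to have been validated already:

# Hypothetical: build a schema definition from validated form data.
ALLOWED_TYPES = ('integer', 'character varying', 'date', 'boolean')

def schema_from_form(table_name, fields):
    '''
    fields is a list of (column_name, column_type) tuples taken from FORM
    data; only whitelisted types are accepted.
    '''
    columns = []
    for name, col_type in fields:
        if col_type not in ALLOWED_TYPES:
            raise ValueError("unexpected column type: %s" % col_type)
        columns.append("        %s %s" % (name, col_type))
    return "CREATE TABLE %s (\n%s\n);\n" % (table_name, ",\n".join(columns))

# The result can be written out like client_db_schema, or passed straight
# to cursor.execute() as in the loop above.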

Beware of bogus arguments against this model:

Gripe: “But a separate connection is required for each user, for each database. This means thousands of connections, which could cause problems”
Answer: ZZZTTTT! Wrong. Connections which are open but inactive take up very few, if any, database resources. Plus, in a dynamic web design, the connection is opened and closed immediately after use, so connections never linger, thanks to the threaded model of a web server. pg_pool also does a good job of managing connections if for some reason they are left open (usually programming bugs and crashes cause this). Unlike Oracle, Postgresql closes all of a process’s database connections when the process exits, so bouncing an HTTP server will forcibly cause db connection cleanup. In Oracle, connections are stored in the IPC table (inter-process communication table, a system resource), and they must be manually closed/cleared.

Gripe: “The multiple database model is inefficient and will negatively affect performance.”
Answer: BBBBBLLLAAP! Wrong. Testing has revealed that it is no more or less efficient than the monolithic database. This model is used very effectively in production web sites.

I hope this serves as a handy primer, and encourages you to play. Feel free to post questions.
~G~