Category Archives: Websites

Database Abstraction in Python

As I was recently working on trying out the Flask web framework for Python, I ended up wanting to access my MySQL database. Recently at work I have been using entity framework and I have gotten quite used to having a good database abstraction that allows programmatic creation of SQL. While such frameworks exist in Python, I thought it would interesting to try writing one. This is one great example of getting carried away with a seemingly simple task.

I aimed for these things:

  • Tables should be represented as objects which each instance of the object representing a row
  • These objects should be able to generate their own insert, select, and update queries
  • Querying the database should be accomplished by logical predicates, not by strings
  • Update queries should be optimized to only update those fields which have changed
  • The database objects should have support for “immutable” fields that are generated by the database

I also wanted to be able to do relations between tables with foreign keys, but I have decided to stop for now on that. I have a structure outlined, but it isn’t necessary enough at this point since all I wanted was a database abstraction for my simple Flask project. I will probably implement it later.

This can be found as a gist here: https://gist.github.com/kcuzner/5246020

Example

Before going into the code, here is an example of what this abstraction can do as it stands. It directly uses the DbObject and DbQuery-inheriting objects which are shown further down in this post.

This example first loads a user using a DbSelectQuery. The user is then modified and the DbObject-level function save() is used to save it. Next, a new user is created and saved using the same function. After saving, the primary key will have been populated and will be printed.

Change Tracking Columns

I started out with columns. I needed columns that track changes and have a mapping to an SQL column name. I came up with the following:

The Column class describes the column and is implemented as a descriptor. Each ColumnSet instance contains multiple columns and holds ColumnInstance objects which hold the individual column per-object properties, such as the value and whether it has been mutated or not. Each column type has a validation function to help screen invalid data from the columns. When a ColumnSet is initiated, it scans itself for columns and at that moment creates its ColumnInstances.

Generation of SQL using logical predicates

The next thing I had to create was the database querying structure. I decided that rather than actually using the ColumnInstance or Column objects, I would use a go-between object that can be assigned a “prefix”. A common thing to do in SQL queries is to rename the tables in the query so that you can reference the same table multiple times or use different tables with the same column names. So, for example if I had a table called posts and I also had a table called users and they both shared a column called ‘last_update’, I could assign a prefix ‘p’ to the post columns and a prefix ‘u’ to the user columns so that the final column name would be ‘p.last_update’ and ‘u.last_update’ for posts and users respectively.

Another thing I wanted to do was avoid the usage of SQL in constructing my queries. This is similar to the way that LINQ works for C#: A predicate is specified and later translated into an SQL query or a series of operations in memory depending on what is going on. So, in Python one of my queries looks like so:

This would print out a tuple (" WHERE x.column_1 = %s AND x.column_2 > %s", ["4", 5]). So, how does this work? I used operator overloading to create DbQueryExpression objects. The code is like so:

The __str__ function and arguments property return recursively generated expressions using the column prefixes (in the case of __str__) and the arguments (in the case of arguments). As can be seen, this supports parameterization of queries. To be honest, this part was the most fun since I was surprised it was so easy to make predicate expressions using a minimum of classes. One thing that I didn’t like, however, was the fact that the boolean and/or operators cannot be overloaded. For that reason I had to use the bitwise operators, so the expressions aren’t entirely correct when being read.

This DbQueryExpression is fed into my DbQuery object which actually does the translation to SQL. In the example above, we saw that I just passed a logical argument into my where function. This actually was a DbQueryExpression since my overloaded operators create DbQueryExpression objects when they are compared. The DbColumnSet object is an dynamically generated object containing the go-between column objects which is created from a DbObject. We will discuss the DbObject a little further down

The DbQuery objects are implemented as follows:

Each of the SELECT, INSERT, UPDATE, and DELETE query types inherits from a base DbQuery which does execution and such. I decided to make the DbQuery object take a PEP 249-style cursor object and execute the query itself. My hope is that this will make this a little more portable since, to my knowledge, I didn’t make the queries have any MySQL-specific constructions.

The different query types each implement a variety of statements corresponding to different parts of an SQL query: where(), limit(), orderby(), select(), from_table(), etc. These each take in either a DbQueryColumn (such as is the case with where(), orderby(), select(), etc) or a string to be appended to the query, such as is the case with limit(). I could easily have made limit take in two integers as well, but I was kind of rushing through because I wanted to see if this would even work. The query is built by creating the query object for the basic query type that is desired and then calling its member functions to add things on to the query.

Executing the queries can cause a callback “filter” function to be called which takes in the query and the cursor as arguments. I use this function to create new objects from the data or to update an object. It could probably be used for more clever things as well, but those two cases were my original intent in creating it. If no filter is specified, then the cursor is returned.

Table and row objects

At the highest level of this hierarchy is the DbObject. The DbObject definition actually represents a table in the database with a name and a single primary key column. Each instance represents a row. DbObjects also implement the methods for selecting records of their type and also updating themselves when they are changed. They inherit change tracking from the ColumnSet and use DbQueries to accomplish their querying goals. The code is as follows:

DbObjects require that the inheriting classes define two properties: dbo_tablename and primary_key. dbo_tablename is just a string giving the name of the table in the database and primary_key is a Column that will be used as the primary key.

To select records from the database, the select() function can be called from the class. This sets up a DbSelectQuery which will return an array of the DbObject that it is called for when the query is executed.

One fallacy of this structure is that at the moment it assumes that the primary key won’t be None if it has been set. In other words, the way I did it right now does not allow for null primary keys. The reason it does this is because it says that if the primary key hasn’t been set, it needs to generate a DbInsertQuery for the object when save() is called instead of a DbUpdateQuery. Both insert and update queries do not include every field. Immutable fields are always excluded and then later selected or inferred from the cursor object.

256Mb doesn’t do what it used to…

So, this last week I got an email from rackspace saying that my server was thrashing the hard drives and lowering performance of everyone else on that same machine. In consequence, they had rebooted my server for me.

I made a few mistakes in the setup of this first iteration of my server: I didn’t restart it after kernel updates, I ran folding@home on it while running nodejs, and I didn’t have backups turned on. I had it running for well over 200 days without a reboot while there had been a dozen or so kernel updates. When they hard rebooted my server, it wouldn’t respond at all to pings, ssh, or otherwise. In fact, it behaved like the firewalls were shut (hanging on the “waiting” step rather than saying “connection refused”). I ended up having to go into their handy rescue mode and copy out all the files. I only copied my www directory and the mysql binary table files, but as you can see, I was able to restore the server from those.

This gave me an excellent opportunity to actually set up my server correctly. I no longer have to be root to edit my website files (yay!), I have virtual hosts set up in a fashion that makes sense and actually works, and overall performance seems to be improved. From now on, I will be doing the updates less frequently and when I do I will be rebooting the machine. That should fix the problem with breaking everything if a hard reboot happens.

I do pay for the hosting for this, 1.5 cents per hour per 256Mb of RAM with extra for bandwidth. I only have 256Mb and since I don’t make any profit off this server whatsoever at the moment, I plan on keeping it that way for now. Considering that back in the day, 256Mb was a ton of memory, it clearly no longer suffices for running too much on my server (httpd + mysql + nodejs + folding@home = crash and burn).

Multiprocessing with the WebSocketServer

After spending way to much time thinking about exactly how to do it, I have got multiprocessing working with the WebSocketServer.

I have learned some interesting things about multiprocessing with Python:

  • Basically no complicated objects can be sent over queues.
  • Since pickle is used to send the objects, the object must also be serializable
  • Methods (i.e. callbacks) cannot be sent either
  • Processes can’t have new queues added after the initial creation since they aren’t picklable, so when the process is created, it has to be given its queues and that’s all the queues its ever going to get

So the new model for operation is as follows:

WebSocketServer Diagram

WebSocketServer Diagram

Sadly, because of the lack of ability to share methods between processes, there is a lot of polling going on here. A lot. The service has to poll its queues to see if any clients or packets have come in, the server has to poll all the client socket queues to see if anything came in from them and all the service queues to see if the services want to send anything to the clients, etc. I guess it was a sacrifice that had to be made to be able to use multiprocessing. The advantage gained now is that services have access to their own interpreter and so they aren’t all forced to share the same CPU. I would imagine this would improve performance with more intensive servers, but the bottleneck will still be with the actual sending and receiving to the clients since every client on the server still has to share the same thread.

So now, the plan to proceed is as follows:

  • Figure out a way to do a pre-forked server so that client polling can be done by more than one process
  • Extend the built in classes a bit to simplify service creation and to make it a bit more intuitive.

As always, the source is available at https://github.com/kcuzner/python-websocket-server

Making a WebSocket server

For the past few weeks I have been experimenting a bit with HTML5 WebSockets. I don’t normally focus only on software when building something, but this has been an interesting project and has allowed me to learn a lot more about the nitty gritty of sockets and such. I have created a github repository for it (it’s my first time using git and I’m loving it) which is here: https://github.com/kcuzner/python-websocket-server

The server I have runs on a port which is considered a dedicated port for WebSocket-based services. The server is written in python and defines a few base classes for implementing a service. The basic structure is as follows:

Super-basic flowchart

Each service has its own thread and inherits from a base class which is a thread plus a queue for accepting new clients. The clients are a socket object returned by socket.accept which are wrapped in a class that allows for communication to the socket via queues. The actual communication to sockets is managed by a separate thread that handles all the encoding and decoding to websocket frames. Since adding a client doesn’t produce much overhead, this structure potentially could be expanded very easily to handle many many clients.

A few things I plan on adding to the server eventually are:

  • Using processes instead of threads for the services. Due to the global interpreter lock, if this is run using CPython (which is what most people use as far as I know and also what comes installed by default on many systems) all the threads will be locked to use the same CPU since the python interpreter can’t be used by more than one thread at once (however, it can for processes). The difficult part of all this is that it is hard to pass actual objects between processes and I have to do some serious re-structuring for the code to work without needing to pass objects (such as sockets) to services.
  • Creating a better structure for web services (currently only two base classes are really available) including a generalized database binding that is thread safe so that a service could potentially split into many threads while not overwhelming the database connection.

Currently, the repository includes a demo chatroom service for which I should have the client side application done soon and uploaded. Currently it supports multiple chatrooms and multiple users, but there is no authentication really and there are a few features I would like to add (such as being able to see who is in the chatroom).

Not so temporary…

Well I have decided to place my actual website on this server. My home server has failed due to router issues and I will probably be redesigning the entire website in the near future anyway since it uses outdated layout techniques. This server is basically the smith-marsden.org website that I work on for my dad’s side of the family. I plan on downloading the contents of my server to here over the weekend so that I can try to clean things up a bit and hopefully get this site back on the map.

I will probably post progress on the new smith-marsden site here since that is currently my main development challenge.

Additional features of the new smith-marsden site:

  • Ability to invite other users given to everyone
  • RSS feeds on letter submissions
  • More information stored so that it is all in one place that is private rather than facebook or somethwere else which could be quite public
  • New state of the art layout techniques enhance the website
  • AJAX allows pages to submit forms without reloading, making page load times faster and looks really cool.
  • The entire thing is made using CakePHP, a robust framework.
  • More to come as I think of them and make them…