Working with the Django admin and legacy databases, pt.3

Posted by: barbara | Date: May 01, 2009 | Category: Django    

Part 1 | Part 2

After you've run inspectdb and done all your syncing and basic admin setup, take a peek at the models you generated - depending on the state of your legacy db, you've probably wound up with something that looks like this (column names have been changed to protect the innocent, although I'm not sure that the so-called architects who created these tables deserve anyone's protection):

    class MyUsers(models.Model):
        username = models.CharField(max_length=90, db_column='Username') # Field name made lowercase.
        password = models.CharField(max_length=450, db_column='Password') # Field name made lowercase.
        email = models.CharField(max_length=150, db_column='Email') # Field name made lowercase.
        first_name = models.CharField(max_length=300, db_column='First_Name') # Field name made lowercase.
        last_name = models.CharField(max_length=300, db_column='Last_Name') # Field name made lowercase.
        birthday = models.DateField(db_column='U_Birthday') # Field name made lowercase.
        company = models.CharField(max_length=450, db_column='Company') # Field name made lowercase.
        title = models.CharField(max_length=450, db_column='Title', blank=True) # Field name made lowercase.
        address = models.CharField(max_length=450, db_column='Address') # Field name made lowercase.
        address2 = models.CharField(max_length=450, db_column='Address2') # Field name made lowercase.
        city = models.CharField(max_length=225, db_column='City') # Field name made lowercase.
        state = models.CharField(max_length=225, db_column='State') # Field name made lowercase.
        zip = models.CharField(max_length=75, db_column='Zip') # Field name made lowercase.
        member_number = models.IntegerField(db_column='Member_Number') # Field name made lowercase.
        member_status = models.CharField(max_length=24, db_column='Member_status') # Field name made lowercase.
        display_name = models.CharField(max_length=765)
        active = models.CharField(max_length=30)
        couponcode = models.CharField(max_length=150, db_column='couponCode') # Field name made lowercase.
        last_modified = models.DateTimeField()
        subscribed_on = models.DateTimeField()
        created = models.DateTimeField()
        email_verify = models.CharField(max_length=30, blank=True)
        priority_group = models.CharField(max_length=3, blank=True)
        class Meta:
            db_table = u'My_Users'

For the record, my traversal produced upwards of 80 model classes, all in similar states of disarray - I had a lot of cleanup to do just to get the admin to stop barfing every time I fired it up. Here's a sort of informal checklist of the things you should look over:

Primary Keys

First things first - wherever you have a model with a primary key named "id", just delete it, e.g.:

    class NewsContent(models.Model):
        id = models.IntegerField()
        title = models.TextField(blank=True)
        ...

Django already assumes that your table has an "id" column and defaults to an automatic primary key field - you don't need to define it in the class.

    class NewsContent(models.Model):
        title = models.TextField(blank=True)
        ...

On the other hand, if you have a primary key field that's not named "id", you need to pass "primary_key=True" as a field option:

        member_number = models.IntegerField("user id", primary_key=True, db_column='Member_Number')

What if there's no primary key?

Here's something I hope none of you run into - a few tables without primary keys:

    class NewsPriority(models.Model):
        tags = models.ForeignKey(NewsTags)
        content = models.ForeignKey(NewsContent)
        rank = models.IntegerField()

I need to complete this project with as few db changes as possible - any change to one database means a change across several dozen more.

But in this case I don't have a choice:

    "Each model requires exactly one field to have primary_key=True."

In an ideal world, my primary key field for the model above would be the 'tags' and 'content' concatenated. Last week, I re-opened a discussion about the possibility of creating concatenated primary keys on the Django users group - if you have any ideas, that'd be a great place for feedback.

In the meantime, we're just going to add an "id" column to this table - and add a 'unique_together' to the model's Meta class:

        class Meta:
            db_table = u'news_priority'
            unique_together = (("tags", "content"),)

max_length

Here's something to be on the lookout for: I found that I had to adjust the max_length values on every CharField - they were set to values 3x the length set for the varchar columns in my tables. I don't know if that's a bug in the inspectdb script - I haven't looked - or if the Django developers are just trying to keep us on our toes. Just something to be aware of.

Changing field types

In a few cases, fields that could have been booleans output as other field types:

    # type = SMALLINT, inserting '0' or '1' programmatically
    approved = models.IntegerField(null=True, blank=True)
    active = models.IntegerField(null=True, blank=True)

    # type = enum, values of '0' or '1'
    approved = models.CharField(max_length=3, blank=True)
    active = models.CharField(max_length=3, blank=True)

    approved = models.BooleanField(default=False)
    active = models.BooleanField(default=False)

I also found a few places where the script gave me TextFields when what I really wanted were CharFields:

    title = models.TextField(blank=True)
    title = models.CharField(max_length=255)

The lesson here: Go through your model classes with a fine-toothed comb and make sure they're in line with what's actually in your table definitions.

ForeignKey relationships

ForeignKey (and other) relationships have to be defined - inspectdb won't do that for you. You might need to shuffle the order of your model classes - the model you're creating a relationship with needs to have been defined already.

    class MediaType(models.Model):
        name = models.CharField(max_length=135)
        class Meta:
            db_table = u'media_type'

    class MediaContent(models.Model):
        media_name = models.CharField(max_length=255)
        type = models.ForeignKey(MediaType)
        class Meta:
            db_table = u'media_content'

Stripping "_id" off of some of the field names:

    class NewsContentType(models.Model):
        name = models.CharField(max_length=50)
        class Meta:
            db_table = u'news_content_type'

    class NewsContent(models.Model):
        title = models.CharField(max_length=255)
        source = models.CharField(max_length=255)
        body = models.TextField()
        content_type_id = models.ForeignKey(NewsContentType)

becomes:

        content_type = models.ForeignKey(CmsContentType)

And in some cases you might want to have a field default to a relationship with Django's User model:

    from django.contrib.auth.models import User

from:

        author_id = models.IntegerField(null=True, blank=True)
        approver_id = models.IntegerField(null=True, blank=True)

to:

        author = models.ForeignKey(User, editable=False)
        approver = models.ForeignKey(User, editable=False)

Neatening things up for admin presentation

Add __unicode__() methods and, in some cases, verbose names to the Meta classes:

    class NewsContent(models.Model):
        title = models.CharField(max_length=255)
        source = models.CharField(max_length=255)
        body = models.TextField()

        def __unicode__(self):
            return u'%s | %s' % (self.title, self.source)

        class Meta:
            db_table = u'news_content'
            verbose_name = 'Content'
            verbose_name_plural = 'Content'

Other field options

Verbose field names (optional, but if you use them they have to be the first argument passed in - see the documentation here):

    birthday = models.DateField("date of birth", db_column='Birthday') 

Other db changes

As I mentioned, we're trying to keep db changes to a minimum. At the same time, we are looking at this as an opportunity to tighten things up wherever the impact will be negligible.

We have a lot of tables that include user identifiers - things like author_id and approver_id. We're removing those - maintaining those columns with their existing user id values would have meant a data migration headache (either creating new admin users and reassigning their id's to match old data, or changing the old data to match the new user ids). Besides, transaction records with user_id's are saved to django_admin_log - on the rare occasion that we need to look for the source of a content change, we can find it there.

On the database side, I did find some tables with id columns that were not set as primary keys and didn't autoincrement, so that had to change.

One additional thing, and this is just a personal preference: As I've found changes I know I'll need to make to tables, I've been leaving detailed notes in the docstrings for each model. I'm working on a set of test databases now, but eventually we will need to do some migration work before we move this into production - those notes'll come in handy then.

Next up: the really fiddly stuff

At this point, I've got a lovely demo working on top of just one database. I've created a few test groups and user accounts with specific permissions - the idea is to allow clients limited access to their site content.

But we have more than one client, and the idea is to be able to run a single instance of the admin for all of them. So now we have to solve the multitenancy problem. In a perfect world, we'd have one set of model classes (maybe not even abstracts) and their corresponding admin classes, one master db handling auth and site data for all the rest. We need multi db. Or we need to be able to change db connections at runtime (yeah, I know, no).

It's starting to sound like Django might not be the right solution for a project of this scope, as much as I'd like it to be. But I'm pretty happy with what we've got so far - it'll be useful for other projects going forward.

So I'll let you know how the multitenancy thing goes - and if you have any ideas for how to approach this, by all means, feel free to chime in.

The CharField length problem is what you'd get if there was a UTF-8 field being examined as ASCII data on MySQL (it sets aside 3 bytes for each character, at a maximum, and cannot handle UTF-8 sequences longer than that, which is a slight flaw). Unclear if that's a bug in inspectdb or an issue of character encoding mismatch somewhere, but I'd guess a bug in inspectdb.
Comment by Scot Hacker on May 01, 2009:
I'm going through a very similar conversion project, moving a pile of OSS and custom PHP code+data scattered across dozens of databases and hundreds of tables into a single Django project (intranet and public site for the UC Berkeley J-School). inspectdb has been invaluable so far. Gets you off to a great start, but a lot of post-tweaking necessary. The big hitch is that we're also taking the opportunity to rewrite a lot of models, moving things around, renaming fields, etc. So it's not a clean transfer. My approach has been: 1) Write the new model and get it perfect, testing with dummy data. 2) Use inspectdb + dumpdata/loaddata to bring the old data into the new project in its old form. 3) Now we can use the ORM to talk to both the old and new data. Write a python script that brings in the Django environment and loops through the old data, creating a new record in the new model for each row. Perform field transformations as needed (converting unix timestamps to DateTime fields, re-mapping relational fields that have changed, splitting single models into multiple models, etc.) 4) Drop the dummy data and run the script, using the admin and a db tool to inspect for correctness. When everything is looking good, drop the inspect-generated model from Django, and drop the old data from the db. It's detail-y work that takes more time than expected (like everything), but produces perfect results. Twitter: @shacker
Comment by JJ on May 01, 2009:
In some cases, especially for read-only access, you can use a server-provided (real or synthetic) column instead of modifying tables to add primary key columns. For example, Oracle provides a ROWID column which is unique across all rows in a table. Note that it is not necessarily unique across all rows in all tables in the database, and that the rowid for a particular record may change over time. Within the context of a query or sequence of queries, though, it should be good enough to appease Django's ORM without requiring database changes. For a legacy table with no primary key defined, you can add a field to your model like id=models.CharField(max_length=36, primary_key=True, db_column='ROWID') You'll of course run into problems if you try to save into that model, since you can't save to that column. Overriding the save() method is a good idea in these cases to avoid such problems. If you're willing to do evil, ugly things in the Django db backend, you can even support saving to these models.
Comment by Greg Corey on Jun 19, 2009:
Does anyone know why a seemingly error free inspectdb and models.py generation would create 2 models (out of 35, only needed those 2) where one works as expected and the other returns [] when doing model.objects.all(). The one that doesn't work is the main table and the one that works is its related table. This would be a good one to clear up for future blog readers.
Thanks! I've been trying to introspect Drupal 6. I found that the models do indeed max out the max_length value. Your fix worked. :)