Using Django ‘select_related’ to Improve Website Performance

Django Framework Overview

Django is a model-view-controller framework used for web backend development. Face recognition, photo sharing, following and crowd sharing in lists, and even tax filing are all good examples of backend-heavy applications you can build with Django.

A web application is at the core of most smartphone applications. Beyond the glitz and shine comes the work of engineering a good website, one that provides a seamless user experience across platforms while keeping user data secure.

Django is a full-stack web-application framework, and allows you to build marketplace web pages, application logic to drive smart phone apps, and desktop applications.

This blog post walks through a small web application and uses it to discuss data model representations and the ‘select_related’ query. You may also want to read our other blog post on learning Django.

Library Poll Web Application

Consider this scenario: your local library has asked you to develop a polling web application to keep track of the most popular authors and books each month. This is the web application in whose context we will discuss select_related query caching in Django.

The application has the following requirements:

  1. Develop a web form with the link emailed to the library members

  2. The form allows each member to select up to 3 authors and up to 3 books.

  3. The emailed URL ensures a one-time vote.

  4. At the end of each month, you tally this data and publish the results on the library front page.

Finally, you have to deliver the source code with unit tests, following standard testing practices for web apps.
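The one-time vote in requirement 3 can be sketched in plain Python, outside of Django. This is only an illustration under stated assumptions: the function names, the in-memory token store, and the example base URL are all hypothetical, and a real application would persist the tokens in the database.

```python
import secrets

# Hypothetical in-memory store; a real app would keep tokens in the database.
issued_tokens = set()

def issue_vote_url(base_url="https://library.example/poll/"):
    """Create a unique, hard-to-guess voting link for one library member."""
    token = secrets.token_urlsafe(16)
    issued_tokens.add(token)
    return base_url + token, token

def redeem(token):
    """Return True the first time a valid token is used, False afterwards."""
    if token in issued_tokens:
        issued_tokens.remove(token)  # consume it so the link works only once
        return True
    return False

url, token = issue_vote_url()
print(redeem(token))  # True: the first vote is accepted
print(redeem(token))  # False: the same link cannot vote again
```

Consuming the token on first use is what makes the emailed link single-use; an unknown token is rejected the same way as a used one.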

Setting up Django

If you are somewhat familiar with Django, and have it up and running, feel free to skip this section and the next.

Typically you can set up the Django framework for web application development on a Debian or Ubuntu Linux machine by typing,

$ sudo apt-get install python-django

Upon successful installation this should enable commands like,

$ django-admin.py

or importing the Django module from the Python interpreter,

$ python
>>> import django
>>> django.get_version()
'1.6.1'

Check that these commands execute correctly, with the version number matching your installation.

For a production deployment you will also want a web server such as Apache and a database backend such as MySQL. For development, Django's built-in development server and SQLite, which ships with Python, are enough to get started. Read more in the installation section of the Django project documentation.

Django Project

Following the sections of the Django documentation gets you started on various aspects of Django, but I’d highly recommend you get some help!

While we don’t cover any of the following topics in this blog post, here are some pointers.

Creating Your Application

Create a project and set up a SQLite database connection using the reference above. Then, within your project, set up an application named ‘library_poll’ by typing

$ django-admin.py startapp library_poll

Django Models

For the library web application, we may create models like the following, described in detail elsewhere, in the library_poll/models.py file of your Django application.

Library Application – Django models

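from django.db import models

class Poll(models.Model):
    question = models.CharField(max_length=200)  # the poll question text
    # null/blank make the date optional ('required' is a form option, not a model one)
    pub_date = models.DateTimeField('date published', null=True, blank=True)

# Explicit on_delete is mandatory from Django 2.0 and accepted by older releases.
class AuthorChoice(models.Model):
    poll = models.ForeignKey(Poll, on_delete=models.CASCADE)
    author_name = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)

class BookChoice(models.Model):
    poll = models.ForeignKey(Poll, on_delete=models.CASCADE)
    author = models.ForeignKey(AuthorChoice, on_delete=models.CASCADE)  # each book points at its author
    book_name = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)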

The logic behind the models is to have a book model and an author model for the respective favorites polls, where you use the Poll object to ask a question like,

  1. Who is your favorite author?, or

  2. What is your favorite book?

Then, since a book depends on its author, the BookChoice model has an extra foreign key to AuthorChoice in addition to the one to Poll.

The design of this foreign key mapping is what allows us to ask not just for the most popular book, but for the overall popularity of an author across books. Simply put, even if Gone Girl, by Gillian Flynn, was the most popular book of the month, the Hunger Games author Suzanne Collins, and not Flynn, may have been the most popular author. This is because the aggregate votes for an author across all of their books can outweigh those of a single-book author. It shows how author popularity votes do not equal book popularity votes.

Note that the foreign key is declared only on BookChoice, so the direct lookup goes one way, from a book to its author. To ask what books an author has written, you follow the relation in reverse through the bookchoice_set manager that Django adds to each AuthorChoice object.
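The difference between book popularity and author popularity can be checked with a few lines of plain Python. The vote counts below are made up purely for illustration, and the tuple layout is an assumption rather than the application's actual data format.

```python
from collections import Counter

# Hypothetical monthly tallies: (book, author, votes). The numbers are made up.
book_votes = [
    ("Gone Girl", "Gillian Flynn", 50),
    ("The Hunger Games", "Suzanne Collins", 40),
    ("Catching Fire", "Suzanne Collins", 30),
]

# Most popular book: the single row with the most votes.
top_book = max(book_votes, key=lambda row: row[2])

# Most popular author: sum the votes across all of that author's books.
author_votes = Counter()
for _book, author, votes in book_votes:
    author_votes[author] += votes
top_author, top_author_votes = author_votes.most_common(1)[0]

print(top_book[0])                   # Gone Girl
print(top_author, top_author_votes)  # Suzanne Collins 70
```

With these numbers the top book and the top author belong to different people, which is exactly the case the foreign key from book to author lets you detect.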

Object Relational Mapper (ORM)

Django automatically creates database operation wrappers to insert, query and delete objects and tables for the models you created in the previous section. You can run the following command to generate the object-relational mapping,

Generate ORM mapping

$ python manage.py syncdb

Then open the Django Python shell by typing the following, and populate your author and book database.


$ python manage.py shell

>>> from library_poll.models import Poll, AuthorChoice, BookChoice #ORM
>>> author_poll = Poll(question='who is your favorite author?')
>>> book_poll = Poll(question='what is your favorite book?')
>>> author_poll.save() #demo the ORM capability with database
>>> book_poll.save()
>>> suze = AuthorChoice(poll=author_poll,author_name='Suze Collins')
>>> suze.save()
>>>
>>> hunger_games = BookChoice(poll=book_poll, book_name='The Hunger Games 1', author=suze)
>>> hunger_games.save()

Note the comments on the snippets, demonstrating the use of ORM.

Accessing Foreign-Keys

Once you have set up the models, you can access the SQLite database from the Django shell and query the models via the QuerySet API built into each model.

For example, to retrieve the author who wrote the book The Hunger Games, you run the following query from Python,

Extract foreign-keys / traditional way


>>> # get() returns a single object; filter() would return a QuerySet
>>> book = BookChoice.objects.get(book_name__icontains='The Hunger Games')
>>> book.book_name
'The Hunger Games 1'
>>> book.author.author_name #Django makes another internal Database query
'Suze Collins'

Repeated database hits like this are a common cause of performance problems on websites, the Obamacare website being a very recent example, and they are not desirable if you want to scale your web application or service.
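To see how the hits add up, here is a minimal sketch that mimics the lazy foreign-key lookup using the standard sqlite3 module instead of Django. The simplified table names, the run() helper, and the three sample books are assumptions made for the illustration; the point is only the count of queries issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, author_name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, book_name TEXT,
                       author_id INTEGER REFERENCES author(id));
    INSERT INTO author VALUES (1, 'Suze Collins');
    INSERT INTO book VALUES (1, 'The Hunger Games 1', 1);
    INSERT INTO book VALUES (2, 'The Hunger Games 2', 1);
    INSERT INTO book VALUES (3, 'The Hunger Games 3', 1);
""")

query_count = 0

def run(sql, params=()):
    """Hypothetical helper: execute one query and count it."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()

# One query for the matching books...
books = run("SELECT book_name, author_id FROM book WHERE book_name LIKE ?",
            ("%The Hunger Games%",))

# ...then one more query per book to look up its author.
for book_name, author_id in books:
    (author_name,) = run("SELECT author_name FROM author WHERE id = ?",
                         (author_id,))[0]
    print(book_name, "by", author_name)

print(query_count)  # 4: one query for the books plus one per book
```

The number of queries grows with the number of matching records, which is why this pattern becomes expensive on large result sets.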

Accessing Foreign-Keys With ‘select_related’

This is where the Django ‘select_related’ queryset API comes in. When used as shown below, it follows the foreign keys of each matching database record and retrieves the related objects in the same query. The following snippet extracts the foreign keys using select_related,

>>> book = BookChoice.objects.select_related().get(book_name__icontains='The Hunger Games')
>>> book.book_name #query pulls in all the information including foreign-keys
'The Hunger Games 1'
>>> book.author.author_name #Django does not make extra queries here.
'Suze Collins'

Clearly select_related is useful when you are doing broad analysis of data and require access to the whole record and its relations. The initial query is somewhat larger and takes slightly longer to finish, but subsequent processing time is reduced because no further queries are needed.

The Django documentation has more to say on using select_related with query sets.
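Under the hood, following a foreign key in the same query amounts to a SQL join. The sketch below reproduces that idea with the standard sqlite3 module. The table names follow Django's default app_model naming, but the columns are trimmed and the SQL is only an approximation of what Django would generate, so treat it as an illustration rather than the exact statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE library_poll_authorchoice (
        id INTEGER PRIMARY KEY, author_name TEXT, votes INTEGER DEFAULT 0);
    CREATE TABLE library_poll_bookchoice (
        id INTEGER PRIMARY KEY, book_name TEXT, votes INTEGER DEFAULT 0,
        author_id INTEGER REFERENCES library_poll_authorchoice(id));
    INSERT INTO library_poll_authorchoice (id, author_name)
        VALUES (1, 'Suze Collins');
    INSERT INTO library_poll_bookchoice (id, book_name, author_id)
        VALUES (1, 'The Hunger Games 1', 1);
""")

# A single round trip returns the book together with its author.
rows = conn.execute("""
    SELECT b.book_name, a.author_name
    FROM library_poll_bookchoice AS b
    INNER JOIN library_poll_authorchoice AS a ON b.author_id = a.id
    WHERE b.book_name LIKE ?
""", ("%The Hunger Games%",)).fetchall()

print(rows)  # [('The Hunger Games 1', 'Suze Collins')]
```

Because the author columns arrive with the book row, reading the author afterwards needs no further database access.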

Summary

Django query sets support the ‘select_related’ API, which caches related objects so that the database is accessed once per query rather than once for each foreign key present in the matching records.

Caching layers and data stores such as Voldemort and MongoDB can vastly improve the performance of web pages, but Django’s own query optimization starts with ‘select_related’. Learn more about web programming with Django and optimizing backends, starting with the popular course Python+Django for Beginners.

Learn More Django

  1. The Django framework is hosted at http://djangoproject.com with current (latest) and past software releases available for download under the BSD license.

  2. Django API reference on QuerySet and select_related() can be found at the documentation website. 

  3. Various third-party plugins for Django, from creating models to optimizing web content delivery, can be found in the Django plugin project.

  4. Good references on Django include the classic ‘The Definitive Guide to Django,’ by Adrian Holovaty and Jacob Kaplan-Moss.

  5. Planning on hosting a medium to large website? You will want to know all about web programming in Python.

  6. Already an expert? Don’t despair, we have some selections for you 

  7. Did you know the Django project was named after Django Reinhardt, a great jazz guitarist active from the 1930s to the early 1950s?