Django, Python, web development and things (un)related

Django performance testing – a real world example

28 April 2008

About a week ago Andrew and I launched a new Django-powered site called Hey! Wall. It’s a social site along the lines of “the wall” on social networks and gives groups of friends a place to leave messages, share photos, videos and links.

We wanted to gauge performance and try some server config and code changes to see what steps we could take to improve it. We tested using httperf and doubled performance by making some optimisations.

Server and Client

The server is a Xen VPS from Slicehost with 256MB RAM running Debian Etch. It is located in the US Midwest.

For testing, the client is a Xen VPS from Xtraordinary Hosting, located in the UK. Our normal Internet access is via ADSL which makes it difficult to make enough requests to the server. Using a well-connected VPS as the client means we can really hammer the server.

Server spec caveats

It’s hard to say exactly what the server specs are. The VPS has 256MB RAM and is hosted with similar VPSes, probably on a quad core server with 16GB RAM. That’s a maximum of 64 VPSes on the physical server, assuming it is full of 256MB slices. If the four processors are 2.4GHz, that’s 9.6GHz total, divided by 64 gives a minimum of 150MHz of CPU.
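Here is the estimate above as a few lines of Python. The host figures (four 2.4GHz cores, 16GB of RAM, every slice 256MB) are the same guesses as in the text, not published specs. Treat the result as a rough floor rather than a measurement.

```python
# Back-of-envelope estimate of the guaranteed CPU share of one Xen slice.
# All host figures are guesses from the text, not published specs.
HOST_RAM_MB = 16 * 1024   # assumed physical RAM on the host
SLICE_RAM_MB = 256        # RAM allocated to each VPS
CORES = 4                 # assumed quad-core host
CORE_MHZ = 2400           # assumed clock speed per core

max_slices = HOST_RAM_MB // SLICE_RAM_MB   # slices if the host is full
total_mhz = CORES * CORE_MHZ               # CPU across all cores
min_share_mhz = total_mhz / max_slices     # guaranteed share per slice

print("Slices per host: %d" % max_slices)
print("Minimum CPU share: %.0f MHz" % min_share_mhz)
```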

On a Xen VPS, you get a fixed allocation of memory and CPU without contention, but usually any available CPU on the machine can be used. If other VPSes on the same box are idle, your VPS can make use of more of the CPU. This probably means more CPU was used during testing and perhaps more for some tests than for others.

Measuring performance with httperf

There are various web performance testing tools around including ab (from Apache), Flood and httperf. We went with httperf for no particular reason.

An httperf command looks something like:

httperf --hog --server=example.com --uri=/ --timeout=10 --num-conns=200 --rate=5

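In this example, we’re requesting http://example.com/ 200 times, opening new connections at a rate of 5 per second. The --timeout=10 option counts any request that takes longer than 10 seconds as an error, and --hog lets httperf use as many TCP ports as it needs.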

Testing Plan

Some tools support sessions and try to emulate users performing tasks on your site. We went with a simple brute-force test to get an idea of how many requests per second the site could handle.

The basic approach is to make a number of requests and see how the server responds: a status 200 is good, a status 500 is bad. Increase the rate (the number of requests made per second) and try again. When it starts returning lots of 500s, you’ve reached a limit.
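The ramp-up loop above boils down to finding the highest rate before errors take over. The sketch below assumes you have tallied the 200 and 500 responses from each httperf run into a dictionary. The function name and the 5% error threshold are arbitrary choices for illustration and are not part of httperf.

```python
def max_sustainable_rate(results, max_error_ratio=0.05):
    """Return the highest request rate whose error ratio stayed acceptable.

    results maps a tested rate (requests per second) to a tuple of
    (successful responses, failed responses) from one httperf run.
    Returns None if even the lowest rate failed.
    """
    best = None
    for rate in sorted(results):
        ok, errors = results[rate]
        total = ok + errors
        if total == 0:
            continue  # nothing recorded for this run
        if float(errors) / total <= max_error_ratio:
            best = rate
        else:
            # Assume that once one rate fails, higher rates fail too.
            break
    return best


# Made-up counts of 200 and 500 responses from four runs.
runs = {5: (200, 0), 10: (199, 1), 15: (170, 30), 20: (120, 80)}
print(max_sustainable_rate(runs))  # 10 with the default 5% threshold
```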

Monitoring server resources

The other side is knowing what the server is doing in terms of memory and CPU use. To track this, we run top and log the output to a file for later review. The top command is something like:

top -b -d 3 -U www-data > top.txt

In this example we’re logging information on processes running as user www-data every three seconds. To be more specific, instead of -U username you can use -p 1,2,3, where 1, 2 and 3 are the pids (process IDs) of the processes you want to watch.

The web server is Lighttpd with Python 2.5 running as FastCGI processes. We didn’t log information on the database process (PostgreSQL), though that could be useful.

Another useful tool is vmstat, particularly the swap columns which show how much memory is being swapped. Swapping means you don’t have enough memory and is a performance killer. To repeatedly run vmstat, specify the number of seconds between checks. e.g.

vmstat 2
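If the vmstat output is saved to a file or piped in, a short script can flag the samples where swapping happened. This sketch reads the positions of the si (swapped in) and so (swapped out) columns from vmstat’s header row, because the exact set of columns varies between versions. The sample lines used to check it are made up.

```python
def swap_activity(lines):
    """Yield (si, so) for each vmstat sample where memory was swapped.

    Column positions come from vmstat's header row rather than being
    hard-coded, since different vmstat versions print different columns.
    """
    si_col = so_col = None
    for line in lines:
        fields = line.split()
        if "si" in fields and "so" in fields:
            # Header row: remember where the swap columns are.
            si_col, so_col = fields.index("si"), fields.index("so")
            continue
        if si_col is None or not fields or not fields[0].isdigit():
            # Skip the "procs ... memory" banner and anything unexpected.
            continue
        si, so = int(fields[si_col]), int(fields[so_col])
        if si or so:
            yield si, so

# Usage sketch: iterate over sys.stdin in a hypothetical check_swap.py and
# run it as: vmstat 2 | python check_swap.py
```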

Authenticated requests with httperf

httperf makes simple GET requests to a URL and downloads the html (but not any of the media). Requesting public/anonymous pages is easy, but what if you want a page that requires login?

httperf can pass request headers. Django authentication (from django.contrib.auth) uses sessions which rely on a session id held in a cookie on the client. The client passes the cookie in a request header. You see where this is going.

Log in to the site and check your cookies. There should be one like sessionid=97d674a05b2614e98411553b28f909de. To pass this cookie using httperf, use the --add-header option. e.g.

httperf ... --add-header='Cookie: sessionid=97d674a05b2614e98411553b28f909de\n'

Note the \n after the header. If you miss it, you will probably get timeouts for every request.
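When you script several runs, the \n is easy to lose in shell quoting. The sketch below builds the httperf argument list in Python with the same options as the earlier command. The /wall/ URI is a placeholder and the session id is the example one above. The header ends in a literal backslash and n, exactly as typed in the shell command, and httperf turns that into a newline.

```python
import shlex


def httperf_command(server, uri, session_id, num_conns=200, rate=5, timeout=10):
    """Build the argument list for an authenticated httperf run.

    The Cookie header ends in a literal backslash followed by "n" (two
    characters, not a newline); httperf expands it into the newline that
    terminates the header.
    """
    return [
        "httperf", "--hog",
        "--server=%s" % server,
        "--uri=%s" % uri,
        "--timeout=%d" % timeout,
        "--num-conns=%d" % num_conns,
        "--rate=%d" % rate,
        "--add-header=Cookie: sessionid=%s\\n" % session_id,
    ]


# example.com, "/wall/" and the session id are placeholders.
args = httperf_command("example.com", "/wall/", "97d674a05b2614e98411553b28f909de")
# Show a shell-safe command line; subprocess.call(args) would run it directly.
print(" ".join(shlex.quote(arg) for arg in args))
```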

Which pages to test

With this in mind we tested two pages on the site:

home: anonymous request to the home page
wall: authenticated request to a “wall” which contains content retrieved from the database

Practically static versus highly dynamic

The home page is essentially static for anonymous users and just renders a template without needing any data from the database.

The wall page is very dynamic, with the main data retrieved from the database. The template is rendered specifically for the user with dates set to the user’s timezone, “remove” links on certain items, etc. The particular wall we tested has about 50 items on it and before optimisation made about 80 database queries.

For the first test we had two FastCGI backends running, able to accept requests for Django.

Home: 175 req/s (i.e. requests per second).
Wall: 8 req/s.

Compressed content

The first config optimisation was to enable gzip compression of the output using GZipMiddleware. Performance improved slightly, though not by much. It’s worth doing for the bandwidth savings in any case.

Home: 200 req/s.
Wall: 8 req/s.
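Enabling the middleware is a one-line settings change. This sketch assumes the MIDDLEWARE_CLASSES setting used by Django releases of this era; newer releases call it MIDDLEWARE. The other entries are typical defaults shown for context, not Hey! Wall’s actual list. Django’s documentation suggests putting GZipMiddleware first so that compression is the last thing done to the response.

```python
# settings.py (fragment): compress responses with gzip.
# GZipMiddleware is listed first so it handles the response last, after the
# other middleware have finished with it. The remaining entries are
# illustrative; keep whatever middleware your project already uses.
MIDDLEWARE_CLASSES = (
    'django.middleware.gzip.GZipMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
)
```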

More processes, shorter queues

Next we increased the number of FastCGI backends from two to five. This was an improvement with fewer 500 responses as more of the requests could be handled by the extra backends.

Home: 200 req/s.
Wall: 11 req/s.

Mo processes, mo problems

The increase from two to five was good, so we tried increasing FastCGI backends to ten. Performance decreased significantly.

Checking with vmstat on the server, I could see it was swapping. Too many processes, each using memory for Python, had caused the VPS to run out of memory and swap memory to and from disk.

Home: 150 req/s.
Wall: 7 req/s.

At this point we set the FastCGI backends back down to five for further tests.
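The swapping at ten backends comes down to simple arithmetic: each FastCGI backend is a separate Python process with its own memory. The sketch below sizes the pool from a few inputs. The reserved and per-process figures are placeholders you would read from top (the RES column), not numbers measured on this server.

```python
def max_backends(total_ram_mb, reserved_mb, per_backend_mb):
    """How many FastCGI backends fit in RAM without pushing the box into swap.

    reserved_mb is memory needed by everything else (OS, Lighttpd,
    PostgreSQL); per_backend_mb is the resident size of one Python process,
    e.g. the RES column in top.
    """
    available = total_ram_mb - reserved_mb
    if available <= 0:
        return 0
    return available // per_backend_mb


# Illustrative numbers only, not measurements from this server.
print(max_backends(256, 100, 25))  # 6
```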

Profiling – where does the time go

The wall page had disappointing performance, so we started to optimise. The first thing we did was profile the code to see where time was being spent.

Using some simple profiling middleware it was clear the time was being spent in database queries. The wall page had a lot of queries and they increased linearly with the number of items on the wall. On the test wall this caused around 80 queries. No wonder its performance was poor.
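The profiling middleware itself isn’t shown here. As a stand-in, this Python 3 sketch uses the standard library’s cProfile and pstats to profile a single call and report the most expensive functions by cumulative time. fake_view is a hypothetical placeholder. In old-style Django middleware, a process_view(self, request, view_func, view_args, view_kwargs) method could call profile_call(view_func, request, *view_args, **view_kwargs) and log the report.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Call func under cProfile; return (result, text report).

    The report lists the 10 entries with the highest cumulative time, which
    is where time spent waiting on the database tends to show up.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


def fake_view():
    # Hypothetical stand-in for a view; a real one would take a request.
    return sum(i * i for i in range(100000))


result, report = profile_call(fake_view)
print(report)
```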

Optimise this

By optimising how media attached to items is handled, we were able to drop one query per item straight away. This slightly reduced how long each request took and so increased the number of requests handled per second.

Wall: 12 req/s.

Another inefficiency was the way several filters were applied to the content of each item whenever the page was requested. We changed it so the html output from the filtered content was stored in the item, saving some processing each time the page was viewed. This gave another small increase.

Wall: 13 req/s.
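Storing filtered output is plain denormalisation: you do the work once when an item is saved instead of on every page view. The sketch below is plain Python, and render_content is a hypothetical stand-in for whatever filters the real site applies.

```python
import html


def render_content(text):
    """Hypothetical stand-in for the filters run on an item's content:
    escape HTML, then turn line breaks into <br> tags."""
    return html.escape(text).replace("\n", "<br>\n")


class Item(object):
    """Plain-Python sketch of an item that stores its rendered HTML.

    In a Django model, content_html would be a TextField filled in by
    save(), and templates would output it instead of re-running filters.
    """

    def __init__(self, content):
        self.content = content
        self.content_html = ""

    def save(self):
        # Run the filters once, when the item is written, not on every view.
        self.content_html = render_content(self.content)


item = Item("a < b\nsecond line")
item.save()
print(item.content_html)
```

One consequence is that existing items need re-saving to refresh their stored HTML whenever the filters change. In Django 1.0 and later, a template that outputs the stored HTML also needs the |safe filter, because autoescaping is on by default.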

Back to reducing database queries, we were able to eliminate one query per item by changing how user profiles were retrieved (used to show who posted the item to the wall). Another worthwhile increase came from this change.

Wall: 15 req/s.
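How the profile queries were eliminated isn’t described. One common approach is to fetch all the needed profiles in a single query and look them up in a dictionary. The sketch below compares that with one query per item, using a hypothetical in-memory table that counts queries. In Django the batched lookup would be something like Profile.objects.filter(user__in=user_ids), where Profile is a placeholder model name.

```python
class FakeProfileTable(object):
    """Hypothetical stand-in for the profiles table that counts queries."""

    def __init__(self, profiles):
        self.profiles = profiles  # maps user_id -> display name
        self.queries = 0

    def get(self, user_id):
        self.queries += 1  # like SELECT ... WHERE user_id = %s
        return self.profiles[user_id]

    def filter_in(self, user_ids):
        self.queries += 1  # like SELECT ... WHERE user_id IN (...)
        return dict((uid, self.profiles[uid]) for uid in user_ids)


def authors_one_by_one(items, table):
    # One query per item: grows linearly with the number of items on the wall.
    return [table.get(item["user_id"]) for item in items]


def authors_batched(items, table):
    # One query for all distinct users, then a dictionary lookup per item.
    profiles = table.filter_in(set(item["user_id"] for item in items))
    return [profiles[item["user_id"]] for item in items]


items = [{"user_id": uid} for uid in (1, 2, 1, 3, 2)]
```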

The final optimisation for this round of testing was to further reduce the queries needed to retrieve media attached to items. Again, we shed some queries and slightly increased performance.

Wall: 17 req/s.

Next step: caching

Having reduced queries as much as we can, the next step would be to do some caching. Retrieving cached data is usually much quicker than hitting the database, so we’d expect a good increase in performance.

Caching the output of complete pages is not useful because each page is heavily personalised to the user requesting it. It would only be a cache hit if the user requested the same page twice with nothing changing on it in the meantime.

Caching data such as lists of walls, items and users is more useful. The cached data could be used for multiple requests from a single user and shared to some degree across walls and different users. It’s not necessarily a huge win because each wall is likely to have a very small number of users, so the data would need to stay in cache long enough to be retrieved by others.

Our simplistic httperf tests would be very misleading in this case. Each request is made as the same user so cache hits would be practically 100% and performance would be great! This does not reflect real-world use of the site, so we’d need some better tests.

We haven’t made use of caching yet as the site can easily handle its current level of activity, but if Hey! Wall becomes popular, it will be our next step.
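Caching data in this way usually follows a “get from the cache, or compute and store” pattern. Below is a minimal in-process sketch with per-entry expiry; the class, function and key names are hypothetical. Django’s own cache API has the same shape (cache.get(key) and cache.set(key, value, timeout)), so get_or_compute could be pointed at it instead of TTLCache.

```python
import time


class TTLCache(object):
    """Tiny in-process cache with per-entry expiry, for illustration only."""

    def __init__(self, clock=time.time):
        self._data = {}
        self._clock = clock

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires = entry
        if self._clock() >= expires:
            del self._data[key]  # stale: drop it and report a miss
            return default
        return value

    def set(self, key, value, timeout):
        self._data[key] = (value, self._clock() + timeout)


def get_or_compute(cache, key, compute, timeout=300):
    """Return the cached value for key, computing and storing it on a miss."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, timeout)
    return value


calls = []


def load_wall_items():
    # Hypothetical expensive database query.
    calls.append(1)
    return ["item 1", "item 2"]


cache = TTLCache()
get_or_compute(cache, "wall:42:items", load_wall_items, timeout=60)
get_or_compute(cache, "wall:42:items", load_wall_items, timeout=60)
print(len(calls))  # 1: the second call was served from the cache
```

Because this sketch signals a cache miss with None, it can’t cache a None result. The same caveat applies to Django’s cache.get.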

How many users is 17 req/s?

Serving 17 req/s still seems fairly low, but it would be interesting to know how this translates to actual users of the site. Obviously, this figure doesn’t include serving any media such as images, CSS and JavaScript files. Media files are relatively large but should be served fast as they are handled directly by Lighttpd (not Django) and have Expires headers to allow the client to cache them. Still, it’s some work the server would be doing in addition to what we measured with our tests.

It’s too early to tell what the common usage pattern would be, so I can only speculate. Allow me to do that!

I’ll assume the average user has access to three walls and checks each of them in turn, pausing for 10 or 20 seconds on each to read new comments and perhaps view some photos or open links. The user does this three times per day.

Looking specifically at the wall page and ignoring media, that means our user is making 9 requests per day for wall pages. Each user only makes one request at a time, so up to 17 users can be doing that in any given second. Within a minute the user only makes three requests, so is only counted among the 17 concurrent users for 3 seconds out of 60 (or 1 in 20).

If the distribution of user requests over time was perfectly balanced (hint: it won’t be), that means 340 users (17 * 20) could be using the site each minute. To continue with this unrealistic example, we could say there are 1440 minutes in a day and each user is on the site for three minutes per day, so the site could handle about 163,000 users. That would be very good for a $20/month VPS!

To rein in those numbers a bit, let’s say we handle 200 concurrent users in a minute for 6 hours per day, 100 concurrent users for another 6 hours and 10 concurrent users for the remaining 12 hours. That’s 115,200 user-minutes, or around 38,000 users the site could handle in a day (at three minutes per user), given the maximum load of 17 requests per second.
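Here is the same estimate worked through in Python. The inputs are the assumptions from the text: 17 req/s, each user busy for one second in twenty, and three minutes on the site per user per day. Dividing the mixed-load total of 115,200 user-minutes by those three minutes gives about 38,400 users.

```python
MINUTES_PER_DAY = 24 * 60
MINUTES_PER_USER = 3        # assumed: three visits of about a minute each
USERS_PER_MINUTE = 17 * 20  # 17 req/s, each user busy 1 second in 20

# Perfectly even load around the clock.
even_users = USERS_PER_MINUTE * MINUTES_PER_DAY // MINUTES_PER_USER

# Uneven day: (users per minute, hours at that level).
load_profile = [(200, 6), (100, 6), (10, 12)]
user_minutes = sum(users * hours * 60 for users, hours in load_profile)
uneven_users = user_minutes // MINUTES_PER_USER

print(even_users)    # 163200
print(user_minutes)  # 115200
print(uneven_users)  # 38400
```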

I’m sure these numbers are somewhere between unrealistic and absurd. I’d be interested in comments on better ways to estimate or any real-world figures.

What we learned

To summarise:

Testing the performance of your website may yield surprising results
Having many database queries is bad for performance (duh)
Caching works better for some types of site than others
An inexpensive VPS may handle a lot more users than you’d think

Filed under: Django — Scott @ 2:58 pm

Serving websites from svn checkout considered harmful

22 April 2008

Serving from a working copy

A simple way to update sites is to serve them from Subversion working copies. Check out the code on the server, develop and commit changes, then run svn update on the server when you’re ready to release.

Security concerns

There’s a potential security problem with this. Subversion keeps track of meta-data and original versions of files by storing them in .svn directories in the working copy. If your web server allows requests that include these .svn directories, anything within them could be served to whoever requests it.

Requests would look like:

http://example.com/stuff/.svn/entries
http://example.com/stuff/.svn/text-base/page.php.svn-base
http://example.com/stuff/.svn/text-base/settings.py.svn-base

The first one would reveal some meta-data about your project, such as file paths, repository urls and usernames.

The second one may be interpreted as a PHP script, in which case there’s little risk. Or it may return the PHP source file, which is a much bigger risk.

The third one (assuming a Django project) should never be possible. Requests can only reach files within the web server’s document root, and Django code doesn’t need to be there; only media files do.
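To check whether a document root already exposes any working-copy metadata, a short script can walk the tree and list every .svn directory it finds. The sketch below uses only the standard library, and the path in the usage comment is hypothetical.

```python
import os


def find_svn_dirs(document_root):
    """Return the paths of all .svn directories under document_root.

    Anything listed here could be served to the public unless the web
    server is configured to refuse .svn URLs.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(document_root):
        if ".svn" in dirnames:
            found.append(os.path.join(dirpath, ".svn"))
            # No need to walk inside the .svn directory itself.
            dirnames.remove(".svn")
    return sorted(found)

# e.g. print(find_svn_dirs("/var/www/example.com"))  # hypothetical path
```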

Alternatives

Instead of serving sites from a working copy, you can use svn export to get a “clean” copy of the site which does not include .svn directories. If you svn export from the repository, you must export the complete site, rather than just update the changed files, which could be a lot more data.

However, you can svn export from a working copy on the server. It’s still a complete export, but you don’t have to trouble the repository, so it’s typically much quicker.

An alternative is to update a working copy which is stored on the server, but not in the web document root, then use rsync or some file copying to update the “clean” copy in the web document root. In this case, only changed files are affected.
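You can do the copy step with rsync’s --exclude option, or in Python with shutil.copytree and an ignore pattern, as sketched below. This version copies everything on every run rather than transferring only changed files the way rsync does. It also requires Python 3.8 or later for dirs_exist_ok. The function name is hypothetical.

```python
import shutil


def publish_clean_copy(working_copy, document_root):
    """Copy a working copy into the document root, skipping .svn directories.

    Unlike rsync, this copies every file on each run and never deletes files
    that were removed from the working copy. dirs_exist_ok needs Python 3.8+.
    """
    shutil.copytree(
        working_copy,
        document_root,
        ignore=shutil.ignore_patterns(".svn"),
        dirs_exist_ok=True,
    )
```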

Protection through web server config

If you do serve from working copies, you should configure the web server to block all requests which include .svn in the url. Here’s how to do it for some popular web servers:

Apache

<LocationMatch ".*\.svn.*">
    Order allow,deny
    Deny from all
</LocationMatch>

Lighttpd

$HTTP["url"] =~ ".*\.svn.*" {
  url.access-deny = ("")
}

Nginx

Use a location directive, which must appear within a server block.

server {

    location ~ \.svn { deny all; }

...

}
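Before deploying, it’s worth checking which URLs the pattern actually catches. Python’s re module isn’t the regex engine these servers use, but for a pattern this simple the result is the same. Note that the pattern also refuses any URL that merely contains “.svn”, such as the hypothetical /downloads/backup.svn.tar.gz below. That is usually an acceptable trade-off.

```python
import re

# The pattern from the Apache and Lighttpd examples; Nginx's \.svn behaves
# the same way because its location regex can match anywhere in the URL.
SVN_URL = re.compile(r".*\.svn.*")

urls = [
    "/stuff/.svn/entries",
    "/stuff/.svn/text-base/settings.py.svn-base",
    "/stuff/page.html",
    "/downloads/backup.svn.tar.gz",  # hypothetical file that is also blocked
]

for url in urls:
    verdict = "deny" if SVN_URL.search(url) else "allow"
    print(verdict, url)
```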

Filed under: Hosting,Security — Scott @ 9:48 pm

ImageField and edit_inline revisited

24 February 2008

A while back I wrote about using edit inline with image and file fields. Specifically, I suggested adding an uneditable BooleanField as the core field of the related model. This means you don’t have to set the ImageField or FileField to be core (which would cause confusing behaviour).

Removing the related model

The downside to having an uneditable core field is that you can’t remove the related model instance using admin. At the time, I wasn’t troubled by this, so I just left it. In a recent project I needed to associate photos with articles, use edit_inline for the photos and be able to remove them. So here’s an extended workaround.

As well as the uneditable BooleanField (“keep”) which keeps the ArticlePhoto from being deleted, we now have a “remove” BooleanField which the user can tick in admin to cause the ArticlePhoto to be deleted. The check for this is in the save() method.

class ArticlePhoto(models.Model):
    article = models.ForeignKey(Article, related_name='photos', edit_inline=models.TABULAR, min_num_in_admin=5)
    keep = models.BooleanField(core=True, default=True, editable=False)
    remove = models.BooleanField(default=False)
    image = CustomImageField()

    def save(self):
        if not self.id and not self.image:
            return
        if self.remove:
            self.delete()
        else:
            super(ArticlePhoto, self).save()

It’s a pretty easy way to work around the problem and gives a sensible looking “remove” checkbox in the admin interface. The database table will have a “remove” column that never gets used, but it’s a pretty small price to pay.

Filed under: Django — Scott @ 9:12 pm

getSelection() returns empty in Google Mail

21 February 2008

Getting selected text in a Firefox extension

I’m developing a Firefox extension for a client which does something with the currently selected text in the browser window.

The standard way to get the selection is with window.content.getSelection().

Selection is empty in Gmail

Some users reported that selected text in Gmail messages wasn’t being found by the extension. I suspect the issue is with content added using JavaScript, but I haven’t investigated.

An alternative way to getSelection

The standard Search Google for “whatever” contextual menu item does work in Gmail, so obviously it gets the selection another way.

I found a function getBrowserSelection() in the browser.js file in Firefox’s chrome. It is used by Firefox for the contextual menu search.

This is how it gets the selection:

var focusedWindow = document.commandDispatcher.focusedWindow;
var selection = focusedWindow.getSelection();

I don’t know what the difference is, but I am now using this code in my Firefox extension and it is working well.

Filed under: Uncategorized — Scott @ 12:23 pm

Which web hosting company is best

30 January 2008

Choosing the best web host

Sometimes friends and clients ask me to recommend a web hosting company. For the past couple of years I’ve done my own hosting on a VPS, so I don’t spend much time with shared web hosting accounts. But there are a few I’ve used or heard good things about, so here’s what I normally recommend.

Big web hosting companies

Some of the big boys are:

Hosting Facebook apps

I recently had a bad experience with DreamHost. I developed a Facebook app for a client and suggested DreamHost because I’d heard they were pretty good. The server was slow in responding at times which caused Facebook to show an error.

Facebook gives app servers about 10 seconds to respond and if they don’t, it tells the user there’s a problem. That seems fair enough; I like my websites to respond in about 1 second. But whereas a determined user can wait for a slow website to respond, they don’t get the option of waiting for a slow Facebook app. For Facebook apps, responsiveness counts.

The Facebook app I developed is now on a VPS and is much more responsive.

Overselling

Most web hosting companies oversell resources. This means they give customers lots of disk space and bandwidth on the assumption most won’t use anywhere near the amount. If everyone actually used that amount, there’s no way the host could deliver.

The poster child for overselling is probably DreamHost, currently offering 500 GB of disk space and 5 TB of monthly bandwidth for a fistful of dollars.

Overselling is part of the business and not much can be inferred from the numbers. DreamHost is probably no better or worse than another host that offers more or less disk space and bandwidth. There’s just no guarantee of what you are getting. Your account is lumped in with hundreds of others and the performance you get depends on what these neighbours are doing.

Smaller webhosts try harder

I would consider a smaller hosting company like:

I’ve heard good things about A Small Orange.

WebFaction has support for Rails and Django apps and generally seems a bit more savvy and flexible than the big boys.

VPS hosting for speed and flexibility

An alternative is to have your own VPS (Virtual Private Server). You have full control and usually very good performance, but need more geek skills.

Having a VPS is just like having your own dedicated server, but instead of your own machine, there are several virtual machines running on one physical server. Split the resources and split the cost.

Some VPS packages come with a control panel such as Webmin or CPanel. So if you know what you’re doing, but are not geek enough to do everything on the command line, a VPS may still be an option for you.

Dedicated resources with Xen

There are different virtualisation packages that allow hosting companies to split a physical server into multiple virtual machines. Two of the big ones are Xen and Virtuozzo.

I recommend going for a Xen VPS. With Xen, a fixed amount of memory is assigned to each virtual machine, so the hosting company can’t oversell memory. This effectively limits how many VPSes can be run on one physical server, which gives you a much better idea of the resources dedicated to you.

Here are some good VPS hosts:

I’ve had a UK based VPS from Xtraordinary Hosting for about 18 months and I’ve been delighted with it. Rock solid servers, very good performance and responsive and helpful technical staff. I highly recommend them if you need a VPS in the UK.

RimuHosting offers Xen VPSes mainly hosted in the US, but with an option to host in the UK or Australia. When DreamHost wasn’t delivering the goods for a Facebook app, I moved it to a VPS on RimuHosting. It too has been fast and reliable.

Slicehost has great prices and has been generating a lot of positive buzz. The servers are hosted in the US, so if you’re looking for a US based Xen VPS, consider Slicehost.

When good hosts go bad

Sometimes good web hosting companies start to suck. If that happens to your web host, you might need to jump ship. The golden rule is to always register your domain names yourself using a domain registrar and not get them as part of your web hosting package. That way, moving to another host is just a matter of re-pointing your domains.

Read the Reviews

It’s worth reading some reviews on WhoIsHostingThis.com to see what other customers say about a hosting company. But choose your reviews site carefully as some are full of shill reviews or are even operated by the hosting companies themselves!

Filed under: Hosting — Scott @ 11:30 pm

SysLogHandler not writing to syslog with Python logging

1 January 2008

Logging to syslog in Python

I was trying to use the standard Python logging module to write messages to syslog. The logging module has a SysLogHandler class which can log to a local or remote syslog daemon.

With no host specified, SysLogHandler uses localhost which is what I wanted. I tried to use SysLogHandler, but it just wouldn’t work. There was no error when I called the logging methods, but my messages didn’t show up in /var/log/syslog.

syslog module works

Python also has a standard syslog module. I tried it and it worked fine; my messages were written to the syslog file.

For example:

import syslog
syslog.syslog('test')

syslogd isn’t listening

After running Wireshark I found the SysLogHandler was correctly sending a UDP packet to localhost on port 514. I could also see an ICMP “port unreachable” response, meaning nothing was accepting UDP on that port. syslog wasn’t listening!

Use /dev/log

Instead of sending to localhost, I wanted SysLogHandler to pass the message to syslog on the local machine in the same way the syslog Python module was doing.

The solution is to pass /dev/log as the address parameter to SysLogHandler. It’s not well documented, but it works.

For example:

import logging
from logging.handlers import SysLogHandler

logger = logging.getLogger()
logger.setLevel(logging.INFO)
# Log via the local syslog socket instead of UDP to localhost:514.
handler = SysLogHandler(address='/dev/log')
formatter = logging.Formatter('%(name)s: %(levelname)s %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.info('test')

Easy when you know how.
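/dev/log is the usual socket on Linux, but it isn’t available everywhere; on Mac OS X, syslogd listens on /var/run/syslog instead. The sketch below picks whichever socket exists and otherwise falls back to UDP on localhost. As described above, that fallback only works if syslogd is listening on that port. The function name is hypothetical.

```python
import os
from logging.handlers import SYSLOG_UDP_PORT


def local_syslog_address():
    """Pick an address for SysLogHandler on the local machine.

    Tries the usual Unix socket paths first (/dev/log on most Linux systems,
    /var/run/syslog on Mac OS X), then falls back to UDP on localhost, which
    only works if syslogd accepts network messages.
    """
    for path in ("/dev/log", "/var/run/syslog"):
        if os.path.exists(path):
            return path
    return ("localhost", SYSLOG_UDP_PORT)


print(local_syslog_address())
# Usage: SysLogHandler(address=local_syslog_address())
```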

Filed under: Python — Scott @ 11:21 pm

Case-insensitive ordering with Django and PostgreSQL

20 November 2007

When the Django Gigs site first went live we noticed the ordering of developers by name was not right. Those starting with an uppercase letter were coming before those starting with a lowercase letter.

PostgreSQL and the locale

PostgreSQL has a locale setting which is configured when the cluster is created. Among other things, this affects the ordering of results when you use the SQL order by clause.

The locale on my server was set to “C”, which means it uses byte-level comparisons rather than following more complex rules for a given culture. Although this is apparently good for performance, it means order by will be case sensitive, e.g. “Zebra” comes before “apple”.
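For ASCII text, Python’s default string comparison behaves much like the “C” locale: it compares character codes, so every uppercase letter sorts before every lowercase one. Here is a quick illustration with made-up names, alongside the case-insensitive equivalent of ordering by lower(name):

```python
names = ["apple", "Zebra", "banana", "Aardvark"]

# Code-point comparison, like PostgreSQL's "C" locale for ASCII text:
# all uppercase letters come before all lowercase letters.
print(sorted(names))                 # ['Aardvark', 'Zebra', 'apple', 'banana']

# Case-insensitive ordering, the equivalent of ORDER BY lower(name).
print(sorted(names, key=str.lower))  # ['Aardvark', 'apple', 'banana', 'Zebra']
```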

Depending on how your system is set up, you may have locales such as en_GB. The locale can’t easily be changed in PostgreSQL because indexes and other data depend on it. To change locale, you need to start a new cluster and move databases to it.

Django and case-sensitivity

Django provides the order_by() function on QuerySets, but does not have an option for case insensitive ordering. Instead this is left to your database configuration.

When using SQL directly, you can sort case-insensitively using the PostgreSQL lower() function.

e.g.

select * from developer order by lower(name)

One way to do this in Django is to use extra to call the lower() function, creating a virtual column which you can then order by.

e.g.

Developer.objects.all().extra(
    select={'lower_name': 'lower(name)'}).order_by('lower_name')

Using SQL functions could tie you to a particular database, though in this case the lower() function is standard and should work with most databases. Some other databases do case-insensitive comparisons so wouldn’t need it.

Filed under: Django — Scott @ 8:38 pm

Django developers: We are the world

29 September 2007

An informal survey of the Django community

This week, Andrew and I launched the Django Gigs website to help employers find Django developers. Andrew wrote about it and thanks to the Django Community feed aggregator we had quite a few visitors in the first couple of days.

It’s clear that Django is catching on and growing in popularity. The djangoproject.com site is getting close to 8 million hits each month. I thought it would be interesting to analyse my logs and see what I could tell about the Django community, or at least the section of it that read the blog and visited the Django Gigs website.

Visitors

1280 unique IP addresses

The number of IP addresses seems a pretty good indication of how many unique visitors we had in about two days.
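How the logs were analysed isn’t described. The sketch below assumes Apache-style combined log format (Lighttpd’s default access log is similar) and collects the distinct client IP addresses and user agent strings. Distinct IPs only approximate visitors, because several people can share one address behind NAT and one person can appear from several.

```python
import re

# Combined log format: client, identity or vhost, user, [time], "request",
# status, size, "referer", "user agent". Only client and agent are captured.
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)


def unique_visitors(lines):
    """Return (set of client IPs, set of user agent strings) from log lines."""
    ips, agents = set(), set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # lines in other formats are skipped
            ips.add(match.group(1))
            agents.add(match.group(2))
    return ips, agents

# Usage sketch (the log path is hypothetical):
# with open("/var/log/lighttpd/access.log") as log:
#     ips, agents = unique_visitors(log)
```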

Platforms

510	Windows
373	Mac OS X (including 4 iPhones)
312	Linux
85	Other (mostly bots, feed aggregator sites, a handful of BSD)

The platforms are split fairly evenly among Windows, Mac and Linux, which, given the dominance of Windows on the desktop, suggests Django is disproportionately popular with Mac OS X and Linux users. I suspect this is the case with Python in general, but I don’t have any stats to back that up.
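How user agents were mapped to platforms isn’t shown either. Below is a rough substring-based sketch. Real user agent strings are messy, so treat its results as approximate. iPhone agents of the time contain “like Mac OS X”, so they land under Mac OS X, as in the table above.

```python
from collections import Counter


def platform_of(user_agent):
    """Very rough platform guess from a user agent string."""
    ua = user_agent.lower()
    if "windows" in ua:
        return "Windows"
    if "mac os x" in ua or "macintosh" in ua:
        return "Mac OS X"
    if "linux" in ua:
        return "Linux"
    # Bots, feed readers, BSD and anything unrecognised.
    return "Other"


def count_platforms(user_agents):
    """Tally platforms, given one user agent string per visitor."""
    return Counter(platform_of(ua) for ua in user_agents)
```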

Browsers

875	Firefox
148	Safari
40	IE
36	Camino
13	Konqueror
168	Other (mostly bots or feed readers like NetNewsWire)

No big surprise here: Firefox is the daddy.

One thing that surprised me was the number of different user agents. There were 408 unique user agent strings! Of course, most of them were from different versions of the same software. IE on Windows likes to report versions of the .NET framework and various browser extensions installed on the machine.

Countries

423	United States
133	France
126	United Kingdom
67	Germany
61	Canada
45	Russian Federation
42	Brazil
34	Australia
33	Netherlands
23	Italy
16	Belgium
16	China
16	Spain
15	Poland
15	Sweden
14	India
13	Norway
13	Singapore
13	Switzerland
13	Austria
13	Japan
9	Ireland
8	Ukraine
8	New Zealand
7	Finland
7	Portugal
6	Czech Republic
5	Saudi Arabia
5	Iceland

Honourable mentions (1-5 visitors): Slovenia, Denmark, Romania, Greece, Republic of Korea, Serbia and Montenegro, Indonesia, Hong Kong, Philippines, Israel, Croatia, Estonia, Colombia, Peru, Slovakia, Thailand, Turkey, Malaysia, Chile, Puerto Rico, Latvia, Hungary, Belarus, Mexico, Kenya, Kuwait, Nigeria, Lithuania, Argentina, Bolivia, Europe, Iran (Islamic Republic of), Dominican Republic, Moldova (Republic of), Bulgaria, Jamaica, Egypt, United Arab Emirates, Kazakhstan.

I used the free version of GeoIP from MaxMind to look up countries from IP addresses. It’s not totally accurate, but good enough.

It’s very easy to use from Python, assuming you have the library installed:

import GeoIP
geo = GeoIP.new(GeoIP.GEOIP_MEMORY_CACHE)
print(geo.country_name_by_addr('4.4.4.4'))

It’s not surprising that North America and Western Europe are well represented, but Russia, Brazil and Australia seem to have a good Django following also.

We are the world

Obviously this is just a sample of the Django community and may not be representative, but it does give an indication that Django developers are spread across the world and across the major platforms. That can only be a good thing for the continued growth and success of the framework.

Filed under: Django — Scott @ 1:03 pm

VMWare on Ubuntu Linux with bridged network to XP

23 August 2007

XP on Linux

I run Ubuntu on my main machine, but needed Windows for a project for one of my clients. I installed the free VMWare Server from the Ubuntu commercial repository and installed Windows XP Pro on a virtual machine.

VMWare networking modes

There are three different networking modes in VMWare to give the virtual machine network access:

Host-only: A private network between the host and VM. The VM can’t be accessed by other machines on the network.
NAT: The VM shares the IP address of the host.
Bridged: The VM has its own IP address and can be accessed by the host and other machines on the network as if it was a separate box.

I wanted to keep my virtual Windows box as isolated as possible for security reasons – Windows boxes get compromised so easily. I used bridged networking to give the virtual machine its own IP address and blocked outgoing Internet access for that machine on my router firewall.

Networking problems with Samba/SMB

I wanted to share files between the Linux host and the Windows virtual machine. I used Samba on Linux to share some directories then tried to connect to them from the Windows VM. It couldn’t connect and just timed out without a helpful error message.

After messing with Samba for a while and reading the VMWare Samba docs I was no further forward. I tried using IE on Windows to connect to the web server on my Linux box. No dice. It timed out as well.

It’s the network card settings

I read some discussion in the VMWare forums about similar problems using bridged networking, but working fine with NAT.

This led to the answer on Launchpad – the problem was the network card. Apparently some network cards optimise by discarding packets they have already seen. Because the networking is effectively between two machines on the same network card, some of the data was getting lost.

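The solution is to disable these features (scatter-gather, receive and transmit checksum offloading and, where supported, TCP segmentation offload) on the Ethernet card using:

ethtool -K eth0 sg off rx off tx off

or

ethtool -K eth0 sg off rx off tx off tso off

depending on the settings supported by your network card.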

I ran this command and it worked immediately. Note that when you reboot you will need to issue this command again. You could add it to /etc/rc.local or similar to have it issued automatically.

Filed under: Linux,VMWare — Scott @ 1:18 pm

Edit inline with ImageField or FileField in Django admin

22 August 2007

Django admin lets you edit related model objects “inline”. For example, when editing a Recipe you can add or edit a group of Ingredient models.

Core fields for edit_inline

The related model being edited inline must specify one or more “core” fields using core=True. If the core fields are filled in, the related model is added. If the core fields are empty, the related model is removed.

This works great for normal objects with CharFields, etc, but not so well if you want to have images or files uploaded using inline editing. If the only core field is a FileField or ImageField, you’ll get strange behaviour like the file/image being removed when you edit an existing model in the admin.

Using inline editing with ImageField or FileField

In a recent project I wanted to have an item with title and description and zero or more photos. The Photo model just has an ImageField. To make it easy to edit, I wanted the photos set to edit_inline.

Here’s my first attempt:

class Item(models.Model):
    title = models.CharField(max_length=100)
    description = models.TextField()

    class Admin:
        pass

class Photo(models.Model):
    item = models.ForeignKey(Item, related_name='photos', edit_inline=models.STACKED)
    image = models.ImageField(blank=False, upload_to='items', core=True)

Notice that the ImageField in Photo has core=True to make it a core field.

This worked ok in the Django admin interface for adding a Photo to a new Item, but if I edited that Item, the Photo would be deleted.

This is a known issue (see Ticket #2534), but it’s marked as “pending design decision” and may be ignored for now since the Django admin is being rewritten to use newforms.

In the meantime I needed a workaround.

Workaround using a different core field

Instead of having the ImageField as a core field, we need something else. If you’ve got some other natural data, such as a caption, that would work fine.

In my case, I didn’t want to add any other fields to the interface, so I went with a BooleanField that is not editable.

Here’s the revised Photo model:

class Photo(models.Model):
    item = models.ForeignKey(Item, related_name='photos', edit_inline=models.STACKED)
    image = models.ImageField(blank=False, upload_to='items')
    keep = models.BooleanField(default=True, editable=False, core=True)

    def save(self):
        # Don't save if there is no image (since core field is always set).
        if not self.id and not self.image:
            return
        super(Photo, self).save()

The keep field has been added and set to be core instead of the image field. Since there is a core field and it’s not empty, the main Item model can be edited without the Photo models being deleted.

Don’t save the empties

The core field always has a value which means the Photo model is told to save even when its ImageField is empty. To prevent creating these empty objects, the Photo model overrides save() and checks if an image was uploaded to the ImageField. If not, it returns without saving.

The remaining issue is that you can’t delete a Photo using the Django admin interface. You can replace the image, but will need some other method for deleting. For me this isn’t a big problem, so the workaround solved the problem for now.

Note that this is just a workaround to the problem which hopefully will be fixed in Django at some point in the future. Ideally, the admin interface would properly handle having an ImageField or FileField as the only core field of the related model and optionally put a “remove” checkbox in the UI to allow removing the image/file.

Filed under: Django — Scott @ 3:08 pm
