Django performance testing – a real world example

28 April 2008

About a week ago Andrew and I launched a new Django-powered site called Hey! Wall. It’s a social site along the lines of “the wall” on social networks and gives groups of friends a place to leave messages, share photos, videos and links.

We wanted to gauge performance and try some server config and code changes to see what steps we could take to improve it. We tested using httperf and doubled performance by making some optimisations.

Server and Client

The server is a Xen VPS from Slicehost with 256MB RAM running Debian Etch. It is located in the US Midwest.

For testing, the client is a Xen VPS from Xtraordinary Hosting, located in the UK. Our normal Internet access is via ADSL which makes it difficult to make enough requests to the server. Using a well-connected VPS as the client means we can really hammer the server.

Server spec caveats

It’s hard to say exactly what the server specs are. The VPS has 256MB RAM and is hosted with similar VPSes, probably on a quad core server with 16GB RAM. That’s a maximum of 64 VPSes on the physical server, assuming it is full of 256MB slices. If the four processors are 2.4GHz, that’s 9.6GHz total, divided by 64 gives a minimum of 150MHz of CPU.

On a Xen VPS, you get a fixed allocation of memory and CPU without contention, but usually any available CPU on the machine can be used. If other VPSes on the same box are idle, your VPS can make use of more of the CPU. This probably means more CPU was used during testing and perhaps more for some tests than for others.

Measuring performance with httperf

There are various web performance testing tools around including ab (from Apache), Flood and httperf. We went with httperf for no particular reason.

An httperf command looks something like:

httperf --hog --server=example.com --uri=/ --timeout=10 --num-conns=200 --rate=5

In this example, we’re requesting http://example.com/ 200 times, with up to 5 requests per second.

Testing Plan

Some tools support sessions and try to emulate users performing tasks on your site. We went with a simple brute-force test to get an idea of how many requests per second the site could handle.

The basic approach is to make a number of requests and see how the server responds: a status 200 is good, a status 500 is bad. Increase the rate (the number of requests made per second) and try again. When it starts returning lots of 500s, you’ve reached a limit.
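
The ramp-up described above can be scripted. What follows is a minimal sketch that only builds the httperf command line for each rate. The host name, the starting rate, the step and the idea of scaling --num-conns so each run lasts about 40 seconds are assumptions for illustration; deciding when the 500s mean "limit reached" is still left to whoever reads the reports.

```python
import shlex


def httperf_command(server, uri, rate, seconds=40, timeout=10):
    """Build one httperf invocation as an argument list.

    num-conns is scaled with the rate so every run lasts about `seconds`.
    """
    return [
        "httperf", "--hog",
        "--server=%s" % server,
        "--uri=%s" % uri,
        "--timeout=%d" % timeout,
        "--num-conns=%d" % (rate * seconds),
        "--rate=%d" % rate,
    ]


def rate_sweep(server, uri, start=5, stop=25, step=5):
    """Yield one command per rate, from `start` up to and including `stop`."""
    for rate in range(start, stop + 1, step):
        yield httperf_command(server, uri, rate)


if __name__ == "__main__":
    # example.com is a placeholder host. Each list can be handed to
    # subprocess.run() to actually execute the test.
    for cmd in rate_sweep("example.com", "/"):
        print(shlex.join(cmd))
```

With the defaults, the first command generated is the same as the example httperf line shown earlier (200 connections at a rate of 5).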

Monitoring server resources

The other side is knowing what the server is doing in terms of memory and CPU use. To track this, we run top and log the output to a file for later review. The top command is something like:

top -b -d 3 -U www-data > top.txt

In this example we’re logging information on processes running as user www-data every three seconds. If you want to be more specific, instead of -U username you can use -p 1,2,3 (comma-separated, no spaces) where 1, 2 and 3 are the pids (process ids) of the processes you want to watch.

The web server is Lighttpd with Python 2.5 running as FastCGI processes. We didn’t log information on the database process (PostgreSQL), though that could be useful.

Another useful tool is vmstat, particularly the swap columns which show how much memory is being swapped. Swapping means you don’t have enough memory and is a performance killer. To repeatedly run vmstat, specify the number of seconds between checks. e.g.

vmstat 2
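
If the vmstat output is saved to a file, the swap columns (si for swapped in, so for swapped out) can be pulled out with a few lines of Python. This is a rough sketch rather than a finished tool; the sample text is invented for illustration and is not output from the test server.

```python
def swap_activity(vmstat_output):
    """Return (si, so) pairs, one per sample line of vmstat output.

    Columns are located by name from the header row, so the function does
    not depend on how many columns this vmstat version prints.
    """
    columns = None
    samples = []
    for line in vmstat_output.splitlines():
        fields = line.split()
        if "si" in fields and "so" in fields:
            columns = fields  # header row holding the column names
        elif columns and len(fields) == len(columns) and fields[0].isdigit():
            row = dict(zip(columns, fields))
            samples.append((int(row["si"]), int(row["so"])))
    return samples


def is_swapping(samples):
    """True if any sample moved memory to or from swap."""
    return any(si > 0 or so > 0 for si, so in samples)


# Made-up sample: the second data row shows heavy swapping.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0  41200   9800  88000    0    0    12     8  110  240 35  5 60  0
 6  2  51200   3100   1200  21000  640  912   700   950  480  900 40 20  5 35
"""

if __name__ == "__main__":
    print(is_swapping(swap_activity(SAMPLE)))
```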

Authenticated requests with httperf

httperf makes simple GET requests to a URL and downloads the html (but not any of the media). Requesting public/anonymous pages is easy, but what if you want a page that requires login?

httperf can pass request headers. Django authentication (from django.contrib.auth) uses sessions which rely on a session id held in a cookie on the client. The client passes the cookie in a request header. You see where this is going.

Log in to the site and check your cookies. There should be one like sessionid=97d674a05b2614e98411553b28f909de. To pass this cookie using httperf, use the --add-header option. e.g.

httperf ... --add-header='Cookie: sessionid=97d674a05b2614e98411553b28f909de\n'

Note the \n after the header. If you miss it, you will probably get timeouts for every request.
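
When httperf is launched from a script instead of a shell, the same option can be built as a plain string. A small sketch, assuming the default Django cookie name sessionid; the function name is made up for this example.

```python
def session_header_option(session_id, cookie_name="sessionid"):
    """Return the --add-header option that sends a Django session cookie.

    The trailing backslash and n are passed to httperf as two literal
    characters, just as the single-quoted shell example does.
    """
    return "--add-header=Cookie: %s=%s\\n" % (cookie_name, session_id)


if __name__ == "__main__":
    option = session_header_option("97d674a05b2614e98411553b28f909de")
    # As one element of an argument list no shell quoting is needed.
    print(["httperf", "--server=example.com", "--uri=/", option])
```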

Which pages to test

With this in mind we tested two pages on the site:

home: anonymous request to the home page
wall: authenticated request to a “wall” which contains content retrieved from the database

Practically static versus highly dynamic

The home page is essentially static for anonymous users and just renders a template without needing any data from the database.

The wall page is very dynamic, with the main data retrieved from the database. The template is rendered specifically for the user with dates set to the user’s timezone, “remove” links on certain items, etc. The particular wall we tested has about 50 items on it and before optimisation made about 80 database queries.

For the first test we had two FastCGI backends running, able to accept requests for Django.

Home: 175 req/s (i.e. requests per second).
Wall: 8 req/s.

Compressed content

The first config optimisation was to enable gzip compression of the output using GZipMiddleware. Performance improved slightly on the home page and not at all on the wall page, but it is worth doing for the bandwidth savings in any case.

Home: 200 req/s.
Wall: 8 req/s.
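
For reference, enabling the middleware is a one-line settings change. The fragment below is a sketch: the two entries after GZipMiddleware are an assumption about a typical project, not the actual Hey! Wall settings. Django releases of this era use the MIDDLEWARE_CLASSES setting; newer releases call it MIDDLEWARE.

```python
# settings.py (fragment)
MIDDLEWARE_CLASSES = (
    # Listed first so it compresses the finished response on the way out.
    "django.middleware.gzip.GZipMiddleware",
    # Assumed, typical entries; a real project lists its own here.
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
)
```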

More processes, shorter queues

Next we increased the number of FastCGI backends from two to five. This was an improvement with fewer 500 responses as more of the requests could be handled by the extra backends.

Home: 200 req/s.
Wall: 11 req/s.

Mo processes, mo problems

The increase from two to five was good, so we tried increasing FastCGI backends to ten. Performance decreased significantly.

Checking with vmstat on the server, I could see it was swapping. Too many processes, each using memory for Python, had caused the VPS to run out of memory and swap memory to and from disk.

Home: 150 req/s.
Wall: 7 req/s.

At this point we set the FastCGI backends back down to five for further tests.

Profiling – where does the time go

The wall page had disappointing performance, so we started to optimise. The first thing we did was profile the code to see where time was being spent.

Using some simple profiling middleware it was clear the time was being spent in database queries. The wall page had a lot of queries and they increased linearly with the number of items on the wall. On the test wall this caused around 80 queries. No wonder its performance was poor.
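
The middleware used for this is not shown here. As a stand-in, here is a generic sketch of the idea: a WSGI wrapper (not Django-specific, and not the code used on the site) that runs each request under cProfile and keeps a report sorted by cumulative time, which is usually enough to see whether the time goes into database calls. The class name and the demo application are made up for this example.

```python
import cProfile
import io
import pstats


class ProfilingMiddleware:
    """WSGI wrapper that profiles each request and keeps the last report."""

    def __init__(self, app, top=10):
        self.app = app
        self.top = top
        self.last_report = ""

    def __call__(self, environ, start_response):
        profiler = cProfile.Profile()
        # list() forces the response body to be produced inside the profiler.
        body = profiler.runcall(
            lambda: list(self.app(environ, start_response)))
        out = io.StringIO()
        stats = pstats.Stats(profiler, stream=out)
        stats.sort_stats("cumulative").print_stats(self.top)
        self.last_report = out.getvalue()
        return body


def demo_app(environ, start_response):
    """Tiny WSGI application used only to exercise the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


if __name__ == "__main__":
    wrapped = ProfilingMiddleware(demo_app)
    wrapped({}, lambda status, headers: None)
    print(wrapped.last_report)
```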

Optimise this

By optimising how media attached to items is handled we were able to drop one query per item straight away. This slightly reduced how long each request took and so increased the number of requests handled per second.

Wall: 12 req/s.

Another inefficiency was the way several filters were applied to the content of each item whenever the page was requested. We changed it so the html output from the filtered content was stored in the item, saving some processing each time the page was viewed. This gave another small increase.

Wall: 13 req/s.
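
The idea of storing the filtered output can be sketched in plain Python. The filter below (escape HTML, then convert newlines) is only a stand-in for whatever filters the site applies, and the class and attribute names are hypothetical.

```python
import html


def render_body(text):
    """Stand-in for the site's filters: escape HTML, then turn newlines into <br>."""
    return html.escape(text).replace("\n", "<br>")


class Item:
    """Plain-Python stand-in for a model with a stored, pre-rendered body."""

    def __init__(self, body):
        self.body = body
        self.body_html = ""

    def save(self):
        # Render once when the item is written, not on every page view.
        self.body_html = render_body(self.body)


if __name__ == "__main__":
    item = Item("a < b\nsecond line")
    item.save()
    print(item.body_html)
```

The trade-off is that stored HTML goes stale if the filters change, so existing items would need to be saved again.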

Back to reducing database queries, we were able to eliminate one query per item by changing how user profiles were retrieved (used to show who posted the item to the wall). Another worthwhile increase came from this change.

Wall: 15 req/s.

The final optimisation for this round of testing was to further reduce the queries needed to retrieve media attached to items. Again, we shed some queries and slightly increased performance.

Wall: 17 req/s.
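
To make the effect of per-item queries concrete, here is a self-contained sketch using sqlite3 from the standard library. The schema, the data and the function names are invented for illustration and are much simpler than the real site, so the query counts will not match the figures above; only the pattern (queries growing with the number of items versus a fixed handful) is the point.

```python
import sqlite3
from collections import defaultdict


def build_db(n_items=50):
    """In-memory database with a made-up schema: users, items, media."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE user  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE item  (id INTEGER PRIMARY KEY, sender_id INTEGER, body TEXT);
        CREATE TABLE media (id INTEGER PRIMARY KEY, item_id INTEGER, url TEXT);
    """)
    ids = range(1, n_items + 1)
    db.executemany("INSERT INTO user VALUES (?, ?)", [(1, "ann"), (2, "bob")])
    db.executemany("INSERT INTO item VALUES (?, ?, ?)",
                   [(i, 1 + i % 2, "item %d" % i) for i in ids])
    db.executemany("INSERT INTO media VALUES (?, ?, ?)",
                   [(i, i, "photo%d.jpg" % i) for i in ids])
    return db


def wall_naive(db):
    """One query for the items, then two more queries per item."""
    wall = []
    items = db.execute("SELECT id, sender_id, body FROM item ORDER BY id").fetchall()
    for item_id, sender_id, body in items:
        sender = db.execute("SELECT name FROM user WHERE id = ?",
                            (sender_id,)).fetchone()[0]
        media = [url for (url,) in db.execute(
            "SELECT url FROM media WHERE item_id = ? ORDER BY id", (item_id,))]
        wall.append((body, sender, media))
    return wall


def wall_batched(db):
    """Three queries in total, stitched together in Python."""
    items = db.execute("SELECT id, sender_id, body FROM item ORDER BY id").fetchall()
    names = dict(db.execute("SELECT id, name FROM user"))
    media = defaultdict(list)
    for item_id, url in db.execute("SELECT item_id, url FROM media ORDER BY id"):
        media[item_id].append(url)
    return [(body, names[sender_id], media[item_id])
            for item_id, sender_id, body in items]


def count_queries(db, func):
    """Run func(db) and return (result, number of SELECT statements run)."""
    seen = []
    db.set_trace_callback(seen.append)
    try:
        result = func(db)
    finally:
        db.set_trace_callback(None)
    selects = [sql for sql in seen if sql.lstrip().upper().startswith("SELECT")]
    return result, len(selects)


if __name__ == "__main__":
    db = build_db()
    print(count_queries(db, wall_naive)[1], count_queries(db, wall_batched)[1])
```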

Next step: caching

Having reduced queries as much as we can, the next step would be to do some caching. Retrieving cached data is usually much quicker than hitting the database, so we’d expect a good increase in performance.

Caching the output of complete pages is not useful because each page is heavily personalised to the user requesting it. It would only be a cache hit if the user requested the same page twice with nothing changing on it in the meantime.

Caching data such as lists of walls, items and users is more useful. The cached data could be used for multiple requests from a single user and shared to some degree across walls and different users. It’s not necessarily a huge win because each wall is likely to have a very small number of users, so the data would need to stay in cache long enough to be retrieved by others.

Our simplistic httperf tests would be very misleading in this case. Each request is made as the same user so cache hits would be practically 100% and performance would be great! This does not reflect real-world use of the site, so we’d need some better tests.
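
A toy model shows how lopsided the numbers would be. In this sketch a dictionary stands in for a real cache backend, and the cache key (wall id plus user id) is an assumption chosen to mimic personalised data; none of it is code from the site.

```python
class CountingCache:
    """Dictionary-backed cache that records hits and misses."""

    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key, loader):
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[key] = loader(key)
        return self.store[key]


def hit_rate(user_ids):
    """Replay one wall request per entry in user_ids and return the hit rate."""
    cache = CountingCache()
    for user_id in user_ids:
        # The key includes the user because the data is personalised.
        cache.get_or_load(("wall", 1, user_id), lambda key: "wall data")
    return cache.hits / (cache.hits + cache.misses)


if __name__ == "__main__":
    print(hit_rate([1] * 200))        # one user, 200 requests
    print(hit_rate(range(1, 201)))    # 200 users, one request each
```

A benchmark that replays a single session measures the first case, while real traffic sits somewhere between the two.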

We haven’t made use of caching yet as the site can easily handle its current level of activity, but if Hey! Wall becomes popular, it will be our next step.

How many users is 17 req/s?

Serving 17 req/s still seems fairly low, but it would be interesting to know how this translates to actual users of the site. Obviously, this figure doesn’t include serving any media such as images, CSS and JavaScript files. Media files are relatively large but should be served fast as they are handled directly by Lighttpd (not Django) and have Expires headers to allow the client to cache them. Still, it’s some work the server would be doing in addition to what we measured with our tests.

It’s too early to tell what the common usage pattern would be, so I can only speculate. Allow me to do that!

I’ll assume the average user has access to three walls and checks each of them in turn, pausing for 10 or 20 seconds on each to read new comments and perhaps view some photos or open links. The user does this three times per day.

Looking specifically at the wall page and ignoring media, that means our user is making 9 requests per day for wall pages. Each user only makes one request at a time, so 17 users can be doing that at any second in time. Within a minute the user only makes three requests so is only counted within the 17 concurrent users for 3 seconds out of 60 (or 1 in 20).

If the distribution of user requests over time was perfectly balanced (hint: it won’t be), that means 340 users (17 * 20) could be using the site each minute. To continue with this unrealistic example, we could say there are 1440 minutes in a day and each user is on the site for three minutes per day, so the site could handle about 163,000 users. That would be very good for a $20/month VPS!
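
The arithmetic behind that estimate, written out as a short script. It bakes in the same simplifications as the text: each request is treated as occupying one of the 17 per-second slots for a full second, each visit takes one minute, and load is spread perfectly evenly.

```python
REQ_PER_SEC = 17          # measured wall throughput
REQUESTS_PER_VISIT = 3    # one request for each of three walls
VISITS_PER_DAY = 3        # each visit is assumed to occupy one minute
MINUTES_PER_DAY = 24 * 60

# A user making 3 one-second requests in a minute occupies a request slot
# for 3 of the 60 seconds, so each slot can be shared by 20 users.
users_per_slot = 60 // REQUESTS_PER_VISIT
users_per_minute = REQ_PER_SEC * users_per_slot
users_per_day = users_per_minute * MINUTES_PER_DAY // VISITS_PER_DAY

print(users_per_minute, users_per_day)  # 340 163200
```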

To rein in those numbers a bit, let’s say we handle 200 concurrent users in a minute for 6 hours per day, 100 concurrent users for another 6 hours and 10 concurrent users for the remaining 12 hours. That comes to 115,200 user-minutes, which at three minutes per user is still around 38,000 users the site could handle in a day given the maximum load of 17 requests per second.

I’m sure these numbers are somewhere between unrealistic and absurd. I’d be interested in comments on better ways to estimate or any real-world figures.

What we learned

To summarise:

Testing the performance of your website may yield surprising results
Having many database queries is bad for performance (duh)
Caching works better for some types of site than others
An inexpensive VPS may handle a lot more users than you’d think

Filed under: Django — Scott @ 2:58 pm

12 Comments »

Hi there,

200 req/s for a static homepage in Django sounds like a small number! Did you perchance forget to turn HTTP keepalives OFF for the Lighttpd instances that serve your dynamic content?

Should it be the case, in order to fix it, in your virtualhost section ($HTTP["host"] =~ "your\.server$"), add a line that says:

server.max-keep-alive-requests = 0

Comment by S. — 28 April 2008 @ 4:10 pm

“The wall page had a lot of queries and they increased linearly with the number of items on the wall.” ... eh, what? Why isn’t it ONE query, no matter how many items?

Are you retrieving items then doing a separate query for the details of each one? Why aren’t you just doing a join in the database?

Different types of items, stored in different tables? Do a union query with left joins. Or do a query for each type of item if you must, and sort them out in your web code.

Comment by jeo — 28 April 2008 @ 7:27 pm

That is insanely bad performance, though ultimately it may be “Good” enough. I often think people don’t realize the trade-offs they make when using frameworks like Django and Rails. Resource- and performance-wise it’s like moving back into the days of CGI. Forget RAM, forget concurrency, and learn to buy big boxes. Literally, if you want to see your performance increase 10x, drop the framework.

Personally I’ve taken to using a hybrid approach: use some of the framework and hand code a lot. For instance, lose newforms entirely, it’s an amazing waste of memory and CPU cycles (still can’t figure out why it exists); don’t use Models on any queries that happen a lot; if you do use Models use ‘select_all’. QuerySet is the devil when it comes to speed.

One of the most beautiful things about Django is that you can subvert Django when you need to and go straight to the underlying tools.

Comment by Vance Dubberly — 29 April 2008 @ 12:15 am

Really nice article walking through the performance evaluation of your site! Nice work, and thanks for posting it so others can do the same for themselves!

Comment by Joe — 29 April 2008 @ 3:25 am

Wow, that is bad performance even after tuning. The PyCon website can handle about 800 requests per second (all highly personalized with login and custom navbar). Sounds like you are not using select_related, values, and sub-template caching.

I also recommend using mod_wsgi in daemon mode instead of FastCGI, using file-based session caching, and memcached.

Looking at the site, I see no reason why you should not be able to do 1000 requests per second.

Comment by Doug Napoleone — 29 April 2008 @ 5:15 am

I’ve done some similar tests around Django, and the database ORM seems to be the bottleneck to me…

Comment by sean — 29 April 2008 @ 5:33 am

@S.

Thanks for the suggestion to disable keep-alive for the dynamic content. I tried it, but it seems to have made no difference.

The Lighttpd docs refer to disabling keep-alive as “the last resort”. Reducing the requests and idle time from the defaults didn’t seem to make a difference either.

@jeo

We’re old SQL hounds, so we know all about joins and unions. :)

A wall has multiple items. Each item is sent by a user and can have several pieces of media (e.g. photos).

The initial/naive approach was one query (through normal Django ORM) for the wall then in the template something like:

{% for item in wall.items.all %}
  …
  {% with item.sender as sender %}
    {{ sender.name }} etc…
  {% endwith %}
  …
  {% for media in item.media.all %}
    …
  {% endfor %}
  …
{% endfor %}

I would say this is pretty standard — it’s the easy, natural way to work with the ORM. Of course, it means we are doing one query for the items and then two queries for each item, which is bad news. In some cases, select_related() will help, but not in this case as we had some nullable foreign keys.

This can be done far more efficiently, as you say, by joining the tables, or doing a couple of queries and stitching the data together in Python. That’s what we did during optimisation.

@Vance

The point of newforms and ORM and all this stuff is to make it easier/quicker to develop and maintain. There’s obviously a big performance trade-off. I totally agree with you that Django makes it easy to drop down a level when you want to trade ease for performance.

@Doug

Thanks for sharing some numbers for the PyCon website. What’s the hardware spec of the server?

Hundreds of req/s is certainly more appealing. We are using values() now to avoid lots of queries, but no caching yet.

When you benchmarked at 800 req/s, did your test make requests as multiple users? It occurs to me that caching could be misleading when testing performance since one user making 800 requests is very different from 800 users making one request each!

I might have a play with mod_wsgi at some point — looks interesting.

Comment by Scott — 29 April 2008 @ 4:00 pm

First check if you can reduce the number of queries executed (using this snippet for example: http://www.djangosnippets.org/snippets/93/), then profile them using the SQL EXPLAIN statement (not sure if it’s a standard one though, but it works on MySQL). With this, you’ll be able to identify “greedy” queries and try to optimize them by, for example, setting indexes on the right columns.

And if your indexes are already good, cache the objects for which queries are the most expensive. You could also check your my.cnf values and try to tune it, but 256MB of RAM is a bit short…

Nice article though!

Comment by Sephi — 8 September 2008 @ 10:18 am

Could you please explain a bit more about ‘authenticated requests with httperf’? I am wondering about the format of my session file with the POST data. The site I am trying to log in to has one cookie. Two different URLs in the same session end up receiving two different values for the cookie, though I use --session-cookie with the --wsesslog option. Thx

Comment by SJS — 16 January 2009 @ 3:26 am

@SJS

Sorry, I don’t have any experience using this. The authentication I used was faked by setting the cookie header to a known value. In your case, httperf is handling sessions by itself – receiving a cookie and sending it on subsequent requests.

Comment by Scott — 17 January 2009 @ 11:06 pm

Oh, boy. Easily half a gig per httpd process. Has anyone seen this before? http://twitpic.com/zzem4 Great article, btw. Thanks for putting it together.

Comment by Jan Paricka — 27 January 2010 @ 1:36 pm

Very good post. I’m going to have a look at httperf, never heard about it before.

As for the vmstat command, I would recommend running it within a screen session so that it will still run even if the ssh connection to the server is closed. This might prove very useful for longer tests.

Also, adding -n will stop the vmstat headers from being reprinted:

screen vmstat -n 2

To leave the screen use Ctrl+a Ctrl+d and to return to it use screen -r

Comment by Maciej Zaleski — 14 April 2011 @ 1:24 pm

Flapping Head