Django performance tips
Django handles lots of traffic with ease; Django sites have survived slashdottings, farkings, and more. Here are some notes on how we tweak our servers to get that type of high performance.
Use a separate media server
Django deliberately doesn’t serve media for you, and it’s designed that way to save you from yourself. If you try to serve media from the same Apache instance that’s serving Django, you’re going to absolutely kill performance. Apache reuses processes between requests, so once a process has loaded all the code and libraries for Django, they stick around in memory. Whenever that process is serving a media file instead of a Django request, all that memory overhead is wasted.
So, set up all your media to be served by a different web server entirely. Ideally, this is a physically separate machine running a high-performance web server like lighttpd or tux. If you can’t afford the separate machine, at least have the media server be a separate process on the same machine.
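On the Django side, the split can be as small as pointing media URLs at the other server. Here is a minimal sketch, assuming a hypothetical media.example.com host run by the separate web server. MEDIA_URL is a real Django setting; the hostname and the media_url helper are made up for illustration.

```python
# settings.py (fragment)
# Assumption: media.example.com is a hypothetical host served by a separate
# web server (lighttpd, say), not by the Apache instance running Django.
MEDIA_URL = "http://media.example.com/"


def media_url(path):
    """Build the absolute URL for a media file on the separate server.

    Hypothetical helper for illustration; it is not part of Django.
    """
    # Normalize slashes so "img/a.png" and "/img/a.png" give the same URL.
    return MEDIA_URL.rstrip("/") + "/" + path.lstrip("/")
```

With this in place, every image, stylesheet, and script link in your templates goes to the media host, and the Apache processes only ever see Django requests.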
Use a separate database server
If you can afford it, stick your database server on a separate machine, too. All too often Apache and PostgreSQL (or MySQL or whatever) compete for system resources in a bad way. A separate DB server — ideally one with lots of RAM and fast (10k or better) drives — will seriously improve the number of hits you can dish out.
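Moving the database is mostly a matter of changing the host Django connects to. A rough sketch follows, using the DATABASES form from current Django releases (older releases spelled these settings differently). The host name, database name, and credentials are placeholders, not values from our setup.

```python
# settings.py (fragment)
# Assumptions: db.example.internal, the database name, and the credentials
# below are all placeholders; substitute your own.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mysite",
        "USER": "mysite",
        "PASSWORD": "change-me",
        "HOST": "db.example.internal",  # the separate database machine
        "PORT": "5432",  # PostgreSQL's default port
    }
}
```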
Use PostgreSQL
I’ll probably get lots of push-back from the MySQL community about this one, but in my experience PostgreSQL is much faster than MySQL in nearly every case.
There’s no such thing as too much RAM
Even really expensive RAM costs only about $200 per gigabyte. That’s SO much cheaper than the cost of programmer time it isn’t even funny. Buy as much RAM as you can possibly afford, and then buy a little bit more.
Faster processors really won’t improve performance all that much; most web servers spend up to 90% of their time waiting on IO! As soon as you start swapping, performance will just die. Faster disks might help slightly, but they’re so much more expensive than RAM that it doesn’t really matter.
If you’ve got multiple servers, the first place to put your RAM is in the database server. If you can afford it, get enough RAM to fit your entire DB. This shouldn’t be too hard; our database — including half a million articles dating back to 1989 — is only 1.5 gigs.
Next, max out the RAM on your web server. The ideal situation is one where neither machine ever swaps. If you get to that point, you should be able to withstand most normal traffic.
Turn off KeepAlive
I don’t totally understand how KeepAlive works, but turning it off on our Django servers increased performance by something like 50%. Of course, don’t do this if the same server is also serving media… but you’re not doing that, right?
Use memcached
Although Django has support for a number of cache backends, none of them performs even half as well as memcached does. If you find yourself needing the cache, do yourself a favor and don’t even play around with the other backends; go straight for memcached.
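For reference, here is roughly what going straight for memcached looks like. This is a sketch under assumptions: memcached is listening on 127.0.0.1:11211, the pymemcache client library is installed, and you are on a Django release that ships the PyMemcacheCache backend (3.2 or later). The get_or_compute helper is hypothetical and only illustrates the usual check-then-fill pattern.

```python
# settings.py (fragment)
# Assumptions: memcached listens on 127.0.0.1:11211 and the pymemcache
# client library is installed (backend available in Django 3.2+).
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    }
}


def get_or_compute(cache, key, compute, timeout=300):
    """Return the cached value for key, computing and storing it on a miss.

    Hypothetical helper: `cache` is any object with Django-style
    get(key) and set(key, value, timeout) methods, such as
    django.core.cache.cache.
    """
    value = cache.get(key)
    if value is None:  # a miss (a cached None also counts as a miss here)
        value = compute()
        cache.set(key, value, timeout)
    return value
```

In a view you would pass django.core.cache.cache as the first argument and a function that does the expensive work as the third. Django's cache object also has a built-in get_or_set method that covers the same ground.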
Tune, tune, tune
(With apologies to the Byrds.)
Chances are the defaults for your web server, database engine, or machine are not tuned as nicely as they could be. This is far from a comprehensive list, but below are some of the resources I used to make my stuff scream:
- Performance Tuning PostgreSQL by Frank Wiles (who also works for the Journal-World, albeit in another department).
- High Performance MySQL (if you’re the MySQL type).
- Power PostgreSQL performance tips.
- The annotated postgresql.conf.
- The postgresql-performance mailing list.
- ONLamp’s Introducing LAMP Tuning Techniques.
- Fixing an overloaded web server.
- The #apache IRC channel.
Again, far from comprehensive, but those should help anyone involved in tuning a Django site.
Future directions
Some simple benchmarks seem to suggest that Django under lighttpd and FastCGI outperforms Apache/mod_python. I need to play around with it some more, but there’s a good chance that Django just doesn’t need the overhead of Apache.
Also, for very large sites some sort of database replication/federation is going to be needed eventually. Nothing I’ve done has hit that point yet, but when it does, that will make things very interesting. Tools like Slony and/or pg_pool will likely come in handy at that point.