Jacob Kaplan-Moss

REST worst practices

I wrote this post in 2008, more than 16 years ago. It may be very out of date, partially or totally incorrect. I may even no longer agree with this, or might approach things differently if I wrote this post today. I rarely edit posts after writing them, but if I have there'll be a note at the bottom about what I changed and why. If something in this post is actively harmful or dangerous please get in touch and I'll fix it.

A few weeks ago, I sent the following in an email to a co-worker asking for input on designing REST APIs in Django. Since then, I’ve quoted myself a few times; I thought these thoughts would be worth a (slightly edited) public home.

I think the best way to dive in is in terms of mistakes to avoid. If you poke around you’ll find a couple-three different stabs at writing a generic REST API module for Django.

So, with no further ado, some REST “worst practices:”

Conflating models and resources

In the REST world, the resource is key, and it’s really tempting to simply look at a Django model and make a direct link between resources and models – one model, one resource. This fails, though, as soon as you need to provide any sort of aggregated resource, and it really fails with highly denormalized models. Think about a Superhero model: a single GET /heros/superman/ ought to return all his vital stats along with a list of related Power objects, a list of his related Friend objects, etc. So the data associated with a resource might actually come out of a bunch of models. Think select_related(), except arbitrary.

I’d solve this in a similar manner to the way forms work in Django: there’s a basic Form, and then a ModelForm; I’d have a Resource and a ModelResource.
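To make the Form/ModelForm analogy a bit more concrete, here's a rough sketch of how that split might look. Everything in it is hypothetical: the class names, the `get()` method, and the `fields` attribute are made up for illustration, and plain dicts stand in for Django models so the example runs on its own.

```python
class Resource:
    """Base class: a resource knows how to build its own representation."""

    def get(self, key):
        raise NotImplementedError


class ModelResource(Resource):
    """Resource backed by one data source, exposing a fixed list of fields."""

    fields = ()

    def __init__(self, rows):
        # `rows` is a dict of dicts standing in for a single Django model.
        self.rows = rows

    def get(self, key):
        row = self.rows[key]
        return {name: row[name] for name in self.fields}


class HeroStats(ModelResource):
    fields = ("name", "alias")


class SuperheroResource(Resource):
    """Aggregated resource: one representation drawn from several sources."""

    def __init__(self, stats, powers, friends):
        self.stats = stats      # a ModelResource
        self.powers = powers    # dict: hero key -> list of power names
        self.friends = friends  # dict: hero key -> list of friend names

    def get(self, key):
        representation = self.stats.get(key)
        representation["powers"] = list(self.powers.get(key, []))
        representation["friends"] = list(self.friends.get(key, []))
        return representation


superman = SuperheroResource(
    HeroStats({"superman": {"name": "Superman", "alias": "Clark Kent"}}),
    powers={"superman": ["flight", "heat vision"]},
    friends={"superman": ["Lois Lane", "Jimmy Olsen"]},
).get("superman")
```

The point is only that `SuperheroResource` isn't tied to any one model, while `ModelResource` covers the easy one-model case.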

Hardcoded auth

(Or depending on cookie-based auth.) Auth needs to be pluggable, and the auth object needs to determine which operations on which resources are allowed; that’s the only way you could get something complicated working. The use case I think of is one where you want anonymous read, read/write access to certain APIs with a developer token, and access to everything via HTTP auth and appropriate Django User/Permission settings.
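A minimal sketch of what "pluggable" could mean for that use case follows. The backend classes, the `is_allowed()` signature, and the dict-shaped request are all assumptions made up for this example; a real version would inspect an actual HTTP request and Django's User/Permission objects.

```python
class AnonymousReadOnly:
    """Anyone may perform safe, read-only methods."""

    def is_allowed(self, request, resource, method):
        return method in ("GET", "HEAD")


class DeveloperToken:
    """A known token grants read/write access to selected resources only."""

    def __init__(self, tokens, resources):
        self.tokens = set(tokens)
        self.resources = set(resources)

    def is_allowed(self, request, resource, method):
        return request.get("token") in self.tokens and resource in self.resources


class UserPermissions:
    """Stand-in for HTTP auth plus Django User/Permission checks."""

    def __init__(self, permissions):
        # Maps a username to a set of (resource, method) pairs.
        self.permissions = permissions

    def is_allowed(self, request, resource, method):
        granted = self.permissions.get(request.get("user"), set())
        return (resource, method) in granted


def authorize(backends, request, resource, method):
    """The operation is allowed if any configured backend allows it."""
    return any(b.is_allowed(request, resource, method) for b in backends)


backends = [
    AnonymousReadOnly(),
    DeveloperToken(tokens=["abc123"], resources=["heroes"]),
    UserPermissions({"alice": {("admin", "DELETE")}}),
]
```

Each backend answers the same question (may this request perform this method on this resource?), so swapping or stacking auth schemes doesn't touch the resources themselves.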

Resource-specific output formats

It’s really tempting to have each resource have a different representation (i.e. schema if using XML), but I think that’s a bad idea – it makes consumption code really hard. That’s the idea behind the serialization format in Django: there’s a single format for each item (well, until you get down into fields) and that makes parsing easy. Look also at the Yahoo APIs – the ResultSet/Result format is the same throughout. Atom (and/or GData) are obvious choices here.

The idea is that client code shouldn’t have to know how to parse all sorts of different formats.
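As an illustration, here's a sketch of one envelope shared by every resource, loosely in the spirit of the ResultSet/Result idea. The key names (`resource`, `count`, `results`, `pk`, `fields`) and both functions are invented for this example, not taken from any existing API.

```python
import json


def result_set(resource, items):
    """Wrap any list of (pk, fields) pairs in the same envelope."""
    return {
        "resource": resource,
        "count": len(items),
        "results": [{"pk": pk, "fields": fields} for pk, fields in items],
    }


def parse_result_set(payload):
    """Client side: one parser works for every resource."""
    document = json.loads(payload)
    return [(result["pk"], result["fields"]) for result in document["results"]]


heroes = json.dumps(result_set("hero", [("superman", {"alias": "Clark Kent"})]))
powers = json.dumps(result_set("power", [(1, {"name": "flight"})]))
```

Both payloads go through the same `parse_result_set()`; only the contents of `fields` differ per resource.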

Hardcoded output formats

Yes, JSON rocks and we should probably make it the only option at first. But at some point later on we might want to use a different format – AtomPub comes to mind for certain resources – and the system needs to support it.

Further, the output format ought to be determined heuristically from the HTTP Accept header; that lets clients select output formats without resorting to ugly ?format=xml crap.
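Here's a simplified sketch of picking a format from the Accept header. The function names and the fallback behaviour are my own assumptions, and it deliberately cuts corners: it honours `q` values and simple wildcards but ignores the finer precedence rules of real HTTP content negotiation.

```python
def parse_accept(header):
    """Return (media_type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        pieces = [piece.strip() for piece in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparseable q as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def choose_format(header, supported, default="application/json"):
    """Pick a supported media type, or None if nothing acceptable matches."""
    if not header:
        return default
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in supported:
            return media_type
        if media_type == "*/*":
            return default
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # e.g. "application/"
            for candidate in supported:
                if candidate.startswith(prefix):
                    return candidate
    return None  # the caller would answer 406 Not Acceptable


SUPPORTED = ["application/json", "application/atom+xml"]
```

With this, a client asking for `application/atom+xml, application/json;q=0.5` gets Atom, and one sending no Accept header at all gets the JSON default.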

Weak HTTP method support

Most systems I’ve seen map the Big Four methods (GET/POST/PUT/DELETE) to more CRUD-y methods (create/retrieve/update/delete). This is a bad idea: some resources might want to use the POST-as-create-subordinate-resource pattern, and others might want to use POST-as-update for compatibility with HTML forms. Either should be allowed.

Further, systems that do this don’t allow use of extended HTTP methods. WebDAV defines some useful methods, and HTTP PATCH is approaching draft status. Nothing about the REST model says that you have to limit yourself to the Big Four; it simply says that the methods you support need to be supported uniformly. PATCH, in particular, might be a very useful method.
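One way to avoid baking in a CRUD mapping is to dispatch on the method name itself, so each resource decides what POST means and can add methods like PATCH. This is only a sketch: `MethodResource`, the `handle_<method>` naming convention, and the `(status, headers, body)` return shape are assumptions made up for the example.

```python
class MethodResource:
    """Dispatches on the HTTP method name instead of fixed CRUD verbs."""

    def allowed_methods(self):
        return sorted(
            name[len("handle_"):].upper()
            for name in dir(self)
            if name.startswith("handle_")
        )

    def dispatch(self, method, body=None):
        handler = None
        if method.isalpha():
            handler = getattr(self, "handle_" + method.lower(), None)
        if handler is None:
            # Unsupported method: 405 plus the list of methods that do work.
            return 405, {"Allow": ", ".join(self.allowed_methods())}, None
        return handler(body)


class HeroCollection(MethodResource):
    def __init__(self):
        self.heroes = {}

    def handle_get(self, body):
        return 200, {}, sorted(self.heroes)

    def handle_post(self, body):
        # POST-as-create-subordinate-resource.
        slug = body["name"].lower()
        self.heroes[slug] = dict(body)
        return 201, {"Location": "/heros/%s/" % slug}, self.heroes[slug]


class HeroDetail(MethodResource):
    def __init__(self, hero):
        self.hero = hero

    def handle_get(self, body):
        return 200, {}, self.hero

    def handle_patch(self, body):
        # Partial update: merge the supplied fields into the hero.
        self.hero.update(body)
        return 200, {}, self.hero

    # POST-as-update, for compatibility with HTML forms.
    handle_post = handle_patch
```

Here `HeroCollection` and `HeroDetail` give POST two different meanings, and supporting another method is just a matter of adding another `handle_` function.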

Coupling the REST API to the application

Any big API is going to need to have dedicated servers that just serve API applications: the performance characteristics of large-scale APIs are so different from web apps in general that they almost always require separately-tuned servers.

On top of that, the API servers need to be able to crash and not affect the public web site.

From what I’ve heard, this is what really brought Twitter down: the API was so tightly coupled to the web site that API calls would bring down the public side. It was only solved by severing the links between the API code and the rest of the site.

Update (May 1, 2009):

I heard wrong; see Alex’s comment, below. Alex also disagrees that this is a performance issue at all, and given he clearly knows more than I, you probably should listen to him over me.

Django makes this kind of decoupling easy, but it’s still a trap that’s easy to fall into.