REST worst practices
A few weeks ago, I sent the following in an email to a co-worker asking for input on designing REST APIs in Django. Since then, I’ve quoted myself a few times; I thought these thoughts would be worth a (slightly edited) public home.
I think the best way to dive in is in terms of mistakes to avoid. If you poke around you’ll find a couple-three different stabs at writing a generic REST API module for Django.
So, with no further ado, some REST “worst practices:”
Conflating models and resources
In the REST world, the resource is key, and it’s really tempting to simply look at a Django model and make a direct link between resources and models – one model, one resource. This fails, though, as soon as you need to provide any sort of aggregated resource, and it really fails with highly denormalized models. Think about a Superhero model: a single GET /heros/superman/ ought to return all his vital stats along with a list of related Power objects, a list of his related Friend objects, etc. So the data associated with a resource might actually come out of a bunch of models. Think select_related(), except arbitrary.
I’d solve this in a similar manner to the way forms work in Django: there’s a basic Form, and then a ModelForm; I’d have a Resource and a ModelResource.
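Here’s a rough sketch of what that split might look like. It is plain Python rather than real Django: the dataclasses stand in for models, and the class names, the fields attribute, and the get() method are all assumptions made up for illustration, not an existing API.

```python
from dataclasses import dataclass, field


# Stand-ins for Django models; in a real project these would be ORM classes.
@dataclass
class Power:
    name: str


@dataclass
class Superhero:
    slug: str
    name: str
    powers: list = field(default_factory=list)
    friends: list = field(default_factory=list)


class Resource:
    """A resource builds a representation; it does not care where data lives."""

    def get(self, key):
        raise NotImplementedError


class ModelResource(Resource):
    """Shortcut for the common case: one resource backed by one model."""

    fields = ()

    def __init__(self, objects):
        # Hypothetical storage: a dict mapping lookup key -> model instance.
        self.objects = objects

    def get(self, key):
        obj = self.objects[key]
        return {name: getattr(obj, name) for name in self.fields}


class SuperheroResource(ModelResource):
    """Aggregates data from several models into a single representation."""

    fields = ("slug", "name")

    def get(self, key):
        hero = self.objects[key]
        data = super().get(key)
        data["powers"] = [power.name for power in hero.powers]
        data["friends"] = [friend.name for friend in hero.friends]
        return data
```

The point of the split is that SuperheroResource can pull in powers and friends without the base classes knowing anything about those relations.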
Hardcoded auth
(Or depending on cookie-based auth.) Auth needs to be pluggable, and the auth object needs to determine which operations on which resources are allowed; that’s the only way you could get something complicated working. The use case I think of is one where you want anonymous read, read/write access to certain APIs with a developer token, and access to everything via HTTP auth and appropriate Django User/Permission settings.
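A minimal sketch of what “pluggable” could mean here, assuming each auth policy is an object that answers one question: may this request perform this method on this resource? The Request class, the allows() method, and the X-Developer-Token header are all hypothetical names; a real version would hook into Django’s User/Permission machinery instead.

```python
from dataclasses import dataclass, field

SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


@dataclass
class Request:
    """Tiny stand-in for an HTTP request object."""
    method: str
    headers: dict = field(default_factory=dict)


class AnonymousReadOnly:
    """Anyone may read; nobody may write."""

    def allows(self, request, resource_name):
        return request.method in SAFE_METHODS


class DeveloperToken:
    """A known token grants full access, but only to selected resources."""

    def __init__(self, tokens, writable_resources):
        self.tokens = set(tokens)
        self.writable_resources = set(writable_resources)

    def allows(self, request, resource_name):
        # "X-Developer-Token" is a made-up header name for this sketch.
        token = request.headers.get("X-Developer-Token")
        return token in self.tokens and resource_name in self.writable_resources


class AnyOf:
    """Combine policies: access is granted if any one of them allows it."""

    def __init__(self, *policies):
        self.policies = policies

    def allows(self, request, resource_name):
        return any(p.allows(request, resource_name) for p in self.policies)
```

With this shape, the anonymous-read plus token-write case from above is just AnyOf(AnonymousReadOnly(), DeveloperToken(...)), and an HTTP-auth policy could be added to the chain without touching any resource code.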
Resource-specific output formats
It’s really tempting to have each resource have a different representation (i.e. schema if using XML), but I think that’s a bad idea – it makes consumption code really hard to write. That’s the idea behind the serialization format in Django: there’s a single format for each item (well, until you get down into fields) and that makes parsing easy. Look also at the Yahoo APIs – the ResultSet/Result format is the same throughout. Atom (and/or GData) are obvious choices here.
The idea is that client code shouldn’t have to know how to parse all sorts of different formats.
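As a small illustration, assume every resource wraps its items in the same envelope. The result_set, count, and results keys below are invented for this sketch (they are not Yahoo’s or Django’s actual field names); what matters is that one client-side parser works for every resource.

```python
import json


def envelope(items):
    """Server side: wrap any list of items in the one shared envelope."""
    items = list(items)
    return {"result_set": {"count": len(items), "results": items}}


def parse_results(body):
    """Client side: a single parser that works for every resource."""
    return json.loads(body)["result_set"]["results"]


# Two unrelated resources, one envelope, one parser.
photos_body = json.dumps(envelope([{"title": "Sunset"}, {"title": "Beach"}]))
heroes_body = json.dumps(envelope([{"name": "Superman"}]))

photo_titles = [item["title"] for item in parse_results(photos_body)]
hero_names = [item["name"] for item in parse_results(heroes_body)]
```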
Hardcoded output formats
Yes, JSON rocks and we should probably make it the only option at first. But at some point later on we might want to use a different format – AtomPub comes to mind for certain resources – and the system needs to support it.
Further, the output format ought to be determined heuristically from the HTTP Accept header; that lets clients select output formats without resorting to ugly ?format=xml crap.
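A simplified sketch of that negotiation follows. It honors q-values and wildcards such as application/* and */*, but it deliberately ignores the finer precedence rules of the HTTP spec (more specific media ranges beating less specific ones), so treat it as an illustration rather than a compliant implementation. The function names are my own.

```python
def parse_accept(header):
    """Return (media_type, q) pairs, highest preference first."""
    prefs = []
    for part in header.split(","):
        pieces = [piece.strip() for piece in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so ties keep the order the client sent them in.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_format(header, available, default="application/json"):
    """Pick an output format, or None if nothing acceptable is available."""
    if not header:
        return default
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue  # q=0 means "explicitly not acceptable"
        if media_type == "*/*":
            return default
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # "application/*" -> "application/"
            for candidate in available:
                if candidate.startswith(prefix):
                    return candidate
        elif media_type in available:
            return media_type
    return None  # the caller would answer 406 Not Acceptable
```

A client that wants Atom then just sends Accept: application/atom+xml, and one that doesn’t care gets the JSON default.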
Weak HTTP method support
Most systems I’ve seen map the Big Four methods (GET/POST/PUT/DELETE) to more CRUD-y methods (create/retrieve/update/delete). This is a bad idea: some resources might want to use the POST-as-create-subordinate-resource pattern, and others might want to use POST-as-update for compatibility with HTML forms. Either should be allowed.
Further, systems that do this don’t allow use of extended HTTP methods. WebDAV defines some useful methods, and HTTP PATCH is approaching draft status. Nothing about the REST model says that you have to limit yourself to the Big Four; it simply says that the methods you support need to be supported uniformly. PATCH, in particular, might be a very useful method.
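One way to avoid baking in the CRUD mapping is to dispatch on the HTTP method name itself and let each resource decide what a method means. The sketch below assumes a hypothetical handle_<method> naming convention; the class and method names are illustrative only.

```python
class MethodNotAllowed(Exception):
    """Raised for unsupported methods; carries the list for an Allow header."""

    def __init__(self, allowed):
        super().__init__("method not allowed")
        self.allowed = allowed


class Resource:
    def dispatch(self, method, *args, **kwargs):
        # Any HTTP method works (PATCH, WebDAV verbs, ...) as long as the
        # resource defines a matching handle_<method> attribute.
        handler = getattr(self, "handle_" + method.lower(), None)
        if handler is None:
            raise MethodNotAllowed(self.allowed_methods())
        return handler(*args, **kwargs)

    def allowed_methods(self):
        prefix = "handle_"
        return sorted(
            name[len(prefix):].upper()
            for name in dir(self)
            if name.startswith(prefix) and callable(getattr(self, name))
        )


class AlbumResource(Resource):
    def __init__(self):
        self.title = "untitled"
        self.photos = []

    def handle_get(self):
        return {"title": self.title, "photos": list(self.photos)}

    def handle_post(self, body):
        # Here POST means "create a subordinate resource" (a photo).
        self.photos.append(body)
        return body

    def handle_patch(self, body):
        # PATCH applies a partial update to the album itself.
        self.title = body.get("title", self.title)
        return self.handle_get()
```

Another resource could just as well define handle_post as an update for HTML-form compatibility; the dispatcher doesn’t care.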
Improper use of links
So you want some resource to reference another resource – say, a PhotoAlbum referencing some Photo objects. You might think to do it something like:
{'album': 'whatever', 'photos': [1, 2, 3, 4]}
Just include the other object IDs, right? Unfortunately, this means that the consumption code needs to “Just Know” how to construct URLs for Photo resources, which makes the coupling between client and server brittle and leads to all sorts of annoying library incompatibilities. Nearly every API on the planet makes this mistake, and it means that client libraries eventually go stale as URLs change on the server.
Instead, a web service ought to do exactly what the web does: use URLs:
{'photos': ['http://example.com/ph/1', ...]}
Alternatively, if the photo ID is important information that needs to be conveyed separately from the somewhat opaque URL, a URI-Template is just the ticket:
{
'photos': [1, 2, 3],
'photo_uri_template': 'http://example.com/ph/{id}'
}
Again, Resource != Model.
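On the client side, consuming such a template takes only a few lines. This sketch handles just plain {name} placeholders, not the full URI Template syntax, and the expand() helper is my own invention rather than part of any library.

```python
import re
from urllib.parse import quote


def expand(template, **values):
    """Fill simple {name} placeholders, percent-encoding each value."""

    def substitute(match):
        return quote(str(values[match.group(1)]), safe="")

    return re.sub(r"\{(\w+)\}", substitute, template)


# The response from the example above, as the client would receive it.
response = {
    "photos": [1, 2, 3],
    "photo_uri_template": "http://example.com/ph/{id}",
}

# The client never hardcodes the URL layout; it follows the server's lead.
photo_urls = [
    expand(response["photo_uri_template"], id=photo_id)
    for photo_id in response["photos"]
]
```

If the server later moves photos to a different path, only the template in the response changes and the client keeps working.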
Couple the REST API to the application
Any big API is going to need to have dedicated servers that just serve API applications: the performance characteristics of large-scale APIs are so different from web apps in general that they almost always require separately-tuned servers.
On top of that, the API servers need to be able to crash and not affect the public web site.
From what I’ve heard, this is what really brought Twitter down: the API was so tightly coupled to the web site that API calls would bring down the public side. It was only solved by severing the links between the API code and the rest of the site.
Update (May 1, 2009):
I heard wrong; see Alex’s comment, below. Alex also disagrees that this is a performance issue at all, and given he clearly knows more than I, you probably should listen to him over me.
Django makes this kind of decoupling easy, but it’s still a trap that’s easy to fall into.