A few weeks ago, I sent the following in an email to a co-worker asking for input on designing REST APIs in Django. Since then, I’ve quoted myself a few times; I thought these thoughts would be worth a (slightly edited) public home.
I think the best way to dive in is in terms of mistakes to avoid. If you poke around you’ll find a couple-three different stabs at writing a generic REST API module for Django.
So, with no further ado, some REST “worst practices:”
Conflating models and resources
In the REST world, the resource is key, and it’s really tempting to simply look at a Django model and make a direct link between resources and models — one model, one resource. This fails, though, as soon as you need to provide any sort of aggregated resource, and it really fails with highly denormalized models. Think about a Superhero model: a single GET /heros/superman/ ought to return all his vital stats along with a list of related Power objects, a list of his related Friend objects, etc. So the data associated with a resource might actually come out of a bunch of models. Think select_related(), except arbitrary.
I’d solve this in a similar manner to the way forms work in Django: there’s a basic Form, and then a ModelForm; I’d have a Resource and a ModelResource.
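A minimal sketch of that split, in plain Python rather than real Django code. The three dictionaries stand in for separate models, and every class name here (Resource, ModelResource, HeroResource, HeroStatsResource) is hypothetical, not an existing API:

```python
# Stand-ins for three separate Django models (hypothetical data).
HEROES = {"superman": {"name": "Superman", "alias": "Clark Kent"}}
POWERS = {"superman": ["flight", "heat vision"]}
FRIENDS = {"superman": ["Lois Lane", "Jimmy Olsen"]}


class Resource:
    """A resource decides what it exposes; it is not tied to any one model."""

    def get(self, key):
        raise NotImplementedError


class ModelResource(Resource):
    """The convenient special case: one resource backed by exactly one store."""

    store = None

    def get(self, key):
        return dict(self.store[key])


class HeroStatsResource(ModelResource):
    """One model, one resource: fine when that is all you need."""

    store = HEROES


class HeroResource(Resource):
    """Aggregated resource: one representation built from several stores."""

    def get(self, key):
        data = dict(HEROES[key])
        data["powers"] = list(POWERS.get(key, []))
        data["friends"] = list(FRIENDS.get(key, []))
        return data
```

Here HeroResource().get("superman") returns the vital stats plus the powers and friends lists in a single response, while HeroStatsResource covers the simple one-model case.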
Hardcoded auth
(Or depending on cookie-based auth.) Auth needs to be pluggable, and the auth object needs to determine which operations on which resources are allowed; that’s the only way you could get something complicated working. The use case I think of is one where you want anonymous read, read/write access to certain APIs with a developer token, and access to everything via HTTP auth and appropriate Django User/Permission settings.
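Roughly, I imagine something like the following. It is only a sketch: the Authorizer interface, the backend classes and the authorize() helper are all made-up names, and the third case (HTTP auth checked against Django User/Permission objects) is left out because it would need a real Django project behind it.

```python
class Authorizer:
    """Decides whether a request may perform `method` on `resource`."""

    def is_allowed(self, credentials, method, resource):
        raise NotImplementedError


class AnonymousReadOnly(Authorizer):
    """Anyone may read; nobody may write through this backend."""

    def is_allowed(self, credentials, method, resource):
        return method in ("GET", "HEAD")


class DeveloperToken(Authorizer):
    """Known tokens get read/write access, but only to selected resources."""

    def __init__(self, tokens, writable_resources):
        self.tokens = set(tokens)
        self.writable = set(writable_resources)

    def is_allowed(self, credentials, method, resource):
        return credentials.get("token") in self.tokens and resource in self.writable


def authorize(backends, credentials, method, resource):
    """A request is allowed if any configured backend allows it."""
    return any(b.is_allowed(credentials, method, resource) for b in backends)


backends = [AnonymousReadOnly(), DeveloperToken({"abc"}, {"photos"})]
```

With that configuration an anonymous GET passes, an anonymous POST does not, and a POST carrying the token "abc" passes only for the "photos" resource. Swapping the list of backends changes the policy without touching any resource code.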
Resource-specific output formats
It’s really tempting to have each resource have a different representation (i.e. schema if using XML), but I think that’s a bad idea — it makes consumption code really hard. That’s the idea behind the serialization format in Django: there’s a single format for each item (well, until you get down into fields) and that makes parsing easy. Look also at the Yahoo APIs — the ResultSet/Result format is the same throughout. Atom (and/or GData) are obvious choices here.
The idea is that client code shouldn’t have to know how to parse all sorts of different formats.
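To make that concrete, here is a small sketch of a shared envelope, loosely in the spirit of ResultSet/Result. The key names ("result_set", "type", "count", "results") are invented for illustration and are not taken from any real API:

```python
def envelope(kind, items):
    """Wrap any list of results in the same top-level structure."""
    items = list(items)
    return {
        "result_set": {
            "type": kind,
            "count": len(items),
            "results": items,
        }
    }


def parse(payload):
    """One client-side parser that works for every resource type."""
    result_set = payload["result_set"]
    return result_set["type"], result_set["results"]
```

A client that can call parse() on a photo listing can call it on a hero listing too; only the contents of each result differ.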
Hardcoded output formats
Yes, JSON rocks and we should probably make it the only option at first. But at some point later on we might want to use a different format — AtomPub comes to mind for certain resources — and the system needs to support it.
Further, the output format ought to be determined heuristically from the HTTP Accept header; that lets clients select output formats without resorting to ugly ?format=xml crap.
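A simplified sketch of picking a format from the Accept header. It honours q-values and wildcards but ignores the finer precedence rules of real content negotiation; parse_accept() and choose_format() are hypothetical helpers, not part of Django.

```python
def parse_accept(header):
    """Return the media types in an Accept header, most preferred first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        entries.append((q, position, media_type))
    # Highest q first; ties keep the order they had in the header.
    entries.sort(key=lambda entry: (-entry[0], entry[1]))
    return [media_type for q, _, media_type in entries if q > 0]


def choose_format(header, supported, default="application/json"):
    """Pick the first supported media type the client prefers, or None."""
    for wanted in parse_accept(header or "*/*"):
        if wanted == "*/*":
            return default
        if wanted.endswith("/*"):
            prefix = wanted[:-1]  # "application/*" -> "application/"
            for candidate in supported:
                if candidate.startswith(prefix):
                    return candidate
        elif wanted in supported:
            return wanted
    return None  # the caller would answer 406 Not Acceptable


SUPPORTED = ["application/json", "application/atom+xml"]
```

For example, choose_format("application/atom+xml, application/json;q=0.5", SUPPORTED) picks Atom, and a missing header falls back to the JSON default.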
Weak HTTP method support
Most systems I’ve seen map the Big Four methods (GET/POST/PUT/DELETE) to more CRUD-y methods (create/retrieve/update/delete). This is a bad idea: some resources might want to use the POST-as-create-subordinate-resource pattern, and others might want to use POST-as-update for compatibility with HTML forms. Either should be allowed.
Further, systems that do this don’t allow use of extended HTTP methods. WebDAV defines some useful methods, and HTTP PATCH is approaching draft status. Nothing about the REST model says that you have to limit yourself to the Big Four; it simply says that the methods you support need to be supported uniformly. PATCH, in particular, might be a very useful method.
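One way to avoid baking CRUD into the framework is to dispatch on the method name itself and let each resource decide what a method means. The sketch below assumes a made-up handle_<method> naming convention and made-up class names; it returns (status, headers, body) tuples instead of real HTTP responses.

```python
class DispatchingResource:
    """Routes any HTTP method to a handle_<method> function if one exists."""

    def dispatch(self, method, *args, **kwargs):
        handler = getattr(self, "handle_" + method.lower(), None)
        if handler is None:
            # Unknown method for this resource: 405 plus the list of allowed ones.
            return 405, {"Allow": ", ".join(self.allowed_methods())}, None
        return handler(*args, **kwargs)

    def allowed_methods(self):
        prefix = "handle_"
        return sorted(
            name[len(prefix):].upper() for name in dir(self) if name.startswith(prefix)
        )


class AlbumResource(DispatchingResource):
    def __init__(self):
        self.album = {"title": "Holiday", "photos": []}

    def handle_get(self, body=None):
        return 200, {}, dict(self.album)

    def handle_post(self, body):
        # POST-as-create-subordinate-resource: add a photo to this album.
        self.album["photos"].append(body)
        return 201, {}, body

    def handle_patch(self, body):
        # PATCH: apply a partial update to the album itself.
        self.album.update(body)
        return 200, {}, dict(self.album)
```

Another resource could define handle_post as an update for HTML form compatibility, and adding a new method is just a matter of adding another handler.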
Improper use of links
So you want some resource to reference another resource — say, a PhotoAlbum referencing some Photo objects. You might think to do it something like:
{'album': 'whatever', 'photos': [1, 2, 3, 4]}
Just include the other object IDs, right? Unfortunately, this means that the consumption code needs to “Just Know” how to construct URLs for Photo resources, and this means a brittleness between the client and server and leads to all sorts of annoying library incompatibilities. Nearly every API on the planet makes this mistake, and it means that client libraries eventually go stale as URLs change on the server.
Instead, a web service ought to do exactly what the web does: use URLs:
{'photos': ['http://example.com/ph/1', ...]}
Alternatively, if the photo ID is important information that needs to be conveyed separately from the somewhat opaque URL, a URI-Template is just the ticket:
{
'photos': [1, 2, 3],
'photo_uri_template': 'http://example.com/ph/{id}'
}
Again, Resource != Model.
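On the client side, consuming the template takes only a few lines. This sketch handles only simple {name} placeholders, not the full URI Template syntax, and expand() is a hypothetical helper:

```python
from urllib.parse import quote


def expand(template, **values):
    """Fill simple {name} placeholders, percent-encoding each value."""
    url = template
    for name, value in values.items():
        url = url.replace("{" + name + "}", quote(str(value), safe=""))
    return url


album = {
    "photos": [1, 2, 3],
    "photo_uri_template": "http://example.com/ph/{id}",
}

# The client never hardcodes the URL layout; it comes from the response.
photo_urls = [expand(album["photo_uri_template"], id=pid) for pid in album["photos"]]
```

If the server later moves photos to a different path, it changes the template it sends and existing clients keep working.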
Coupling the REST API to the application
Any big API is going to need to have dedicated servers that just serve API applications: the performance characteristics of large-scale APIs are so different from web apps in general that they almost always require separately-tuned servers.
On top of that, the API servers need to be able to crash and not affect the public web site.
Update (May 1, 2009):
I heard wrong; see Alex’s comment, below. Alex also disagrees that this is a performance issue at all, and given he clearly knows more than I, you probably should listen to him over me.
Django makes this kind of decoupling easy, but it’s still a trap that’s easy to fall into.
Comments:
"the output format ought to be determined hubristically from the HTTP Accept header; that lets clients select output formats without resorting to ugly ?format=xml crap."
As per the way Django does i18n, you need a ladder for conneg. That ugly crap saves people having to deal with parsing/generating accept headers and the associated matching algorithms. Another alternative is to put .xml at the end of the URL; it should be easier on caches than a query param.
"If you poke around you’ll find a couple-three different stabs at writing a generic REST API module for Django."
Did you count the admin application? You should - it just happens to be restricted to hardcoded html output and forms input :)
"So the data associated with a resource might actually come out of a bunch of models. Think select_related(), except arbitrary."
The follow on consequence is the cost of loading that resource out of a database. This costs goes up as the client is allowing to dynamically ask for subsets of the data in the entity.
"Auth needs to be pluggable"
Yes, and this can impact caching, especially if one is changing the entity depending on the permissions.
One suggestion though - stop calling these things APIs. And I really do think the place to start the REST examination of Django is the admin application.
What are your thoughts on versioning in REST APIs?
"This costs goes up as the client is allowing to dynamically ask for subsets of the data in the entity."
I think this is untrue if caching is fine-grained. David Cramer's django-orm-cache ( http://code.google.com/p/dj... ) provides a, uh, guide here.
To echo Bill de hÓra, conneg is unwieldy and I've found the .xml, .json, etc. at the end of the URL to work well enough. In case you're interested, this has been discussed on the rest-discuss list a couple of times with all the pros/cons, etc.
Thanks for interesting tips. But I didn't understand the "Conflating models and resources" paragraph. You mean that I should have the /heros/superman_power/ resource?
@Peter Harkins: I would think that an API version could easily be requested in the hostname.
I think too often people conflate "Web App == Specific Data and Business Logic == Domain Name." The URI is the API interface to a specific data / business logic combo. If the API changes (as in a new version), then the URI should change. I think it's a fallacy to believe that this URI change *must* be found in the path info. In general, I think the path info represents the resource within a system, but the domain name represents the system — and specifically the hostname represents the specific API.
We already do something similar with “www.example.com” being the HTTP interface, “mail.” (SMTP + POP), and “db1.” (SQL over TCP socket) and the like.
I've been reading some articles stating that you should use the Accept header to deal with versioning. For a more comprehensive explanation, check out http://barelyenough.org/blo... The main reason to do this is to keep the resource URL consistent. If you publish something as http://myapi/resource/item1, it should be valid regardless of the version.
@Jose @Ben
Media types are your contract. If you make breaking changes to your contract then you need to create a V2 contract.
I find many people are confused because they see the use of application/xml and application/json everywhere. The problem with these generic media types is they only work if you are doing code download, like JavaScript in a browser. If your server is delivering content to non-browser clients then you have to either stick with a standardized format, e.g. Atom, XHTML+Microformats, RDF, or you need to define your own media type in the vnd.mycompany.myformat space. Once you have your own format, versioning is pretty easy with the Accept header.
This is a great list, and I agree with everything on it. Having tried most of the pluggable API apps out there, I ended up writing my own API, basically because of one thing: none of them let me easily output a model instance method as part of the response. I can put database field values in my JSON 'till the cows come home, but it was damn-near impossible to include the results of get_absolute_url(), or any other model instance method.
This, of course, goes right along with your models != resources bit.
"From what I’ve heard, this is what really brought Twitter down: the API was so tightly coupled to the web site that API calls would bring down the public side. It was only solved by severing the links between the API code and the rest of the side."
This is speculation, and it's false. It would be great if you had asked; everyone on the engineering team at Twitter is pretty accessible, for future reference. Just trying to keep the facts straight, in case someone is looking to learn from our mistakes.
Our API traffic still goes through the same app servers that handle user-facing traffic. We're gradually separating them, but there were issues much deeper down the stack that caused our performance issues. Still the same code base, the same app servers, etc. etc. It's hardly optimal, but it does manage to serve hundreds of millions of requests per day. If I had to do it over again, though, I'd separate the two from day one mostly for maintainability, and less for performance.
In our case, API traffic wildly outstrips users-with-browsers traffic. Treating the API as a second class citizen and allowing it to "crash" would actually be penalizing the majority of our users. Your advice better suits services for which the API usage is in the minority, traffic-wise.
As it turns out, doing REST correctly and serving clients on many platforms at a high volume of requests means compromising a fair bit of this advice. Less commonly used HTTP methods go right out the window on Flash and mobile platforms. Links mean more requests to get data that could be delivered inline, which doesn't work for bandwidth- and resource-constrained environments. HTTP Accept headers (and indeed, use of many fancy headers) are beyond what most client developers will bother with; including the response format in the URL with a ".xml" or ".json" is natural and obvious.
In a perfect world, we'd do REST a bit more like what you've outlined here. If you'd like to reach a broader audience, in the meantime, you'll have to compromise. That's been my experience, anyway.
Is the REST API an important part of a web app?
Shoji does for JSON catalogs what AtomPub does for Atom feeds. See http://www.aminus.org/rbre/... if you're interested.
I agree with the URI Template thing, although I'm slightly nervous of something that (so far at least) has failed to be standardised. My response to that concern is to encapsulate it - both server-side and client-side. Done right, I think integration can be made very easy. http://positiveincline.com/... shows how I've approached this.
"the output format ought to be determined hubristically from the HTTP Accept header"
Did you mean 'heuristically'?
Good article and it can easily be renamed to "REST best practices"
In line with Alex Payne's comments, please don't use anything other than GET or POST if you're designing a REST API, at least if you want to make it usable on all the wacky platforms that are out there. Even doing a PUT with CURL/PHP involves jumping through hoops, and while Actionscript supports the standard methods, custom ones like WebDAV are not possible.
Alex: oof, I'm really sorry for getting that wrong. You're right; I should have contacted someone there to verify what I'd heard from a third party. I've struck out the incorrect passage above and pointed people to your comment. Thanks for the clarification, and for your explanation which goes far beyond the call of duty for a simple correction!