Writing great documentation: What to write
Jacob Kaplan-Moss
November 10, 2009
Part of the Writing great documentation series.
Tech docs can take a bunch of different forms, ranging from high-level overviews, to step-by-step walkthroughs, to auto-generated API documentation. Unfortunately, no single format works for all users; there are huge differences in the way that people learn, so a well-documented project needs to provide many different forms of documentation.
At a high level, you can break down the different types of documentation you need to provide into three different formats:
- step-by-step tutorials,
- overviews and topical guides to the various conceptual areas of your project, and
- low-level, deep-dive reference material.
Let’s look at each in turn.
Tutorials
Good tutorials are a must as they’re usually the first thing someone sees when trying out a new piece of tech. First impressions are incredibly important: that rush of success as you work through a good tutorial will likely color your future opinions about the project.
Django’s tutorial is frankly a bit musty at this point and is probably due for at least a light refresh, but it hits all the important points. A good tutorial should:
Be quick. At some conference or another I heard someone — I think it was Kathy Sierra — say that, as a rule of thumb, a new user should be able to experience success within thirty minutes. That’s a great rule: thirty minutes is nothing — think “half a lunch hour.” If your project can give new users the warm fuzzies that quickly, they’ll come away wondering about all the awesome successes a deeper dive might give.
Be easy. Remember: you want success to be the outcome of the tutorial. This means you need to playtest the tutorial under all sorts of different circumstances, making sure that it always works (even on Windows).
But not too easy. There’s always going to be a class of users who aren’t really qualified to use your project. Someone who’s never written any code before isn’t going to get very far with Django; those types of users should fail quickly. Don’t get them through the tutorial only to run into a wall later on.
Another similar anti-pattern is glossing over bad choices made in the interest of expediency. Django’s tutorial makes this mistake: we gloss over the project/app distinction in a way that bites users later on. (It’ll get fixed soon, I promise!)
The best way of thinking about a tutorial’s ease is that it’s the on-ramp onto your project’s learning curve. This means the slope can be more gradual than later tasks, but not so much so that things suddenly get much, much harder after the tutorial’s finished.
Demonstrate how your project “feels.” More than anything, people are using your tutorial to get a sense of how your project is going to “feel” in the long term. This means that it should be pretty cross-sectional; a good tutorial should show off most of the different areas of the project.
A couple of projects with really good tutorials to check out for inspiration are LOVE and Lamson.
Topical guides
This is the meat of your documentation. Once somebody’s learned (from a tutorial) the high-level concepts, they’re going to need to dive into the details of some area or another. Any documentation worth its salt is going to have a whole bunch of these — Django’s got about 35 different topical guides, covering each conceptual area (e.g. models, sessions, testing, etc.)
These don’t need to cover every single configuration option or function argument — that’s what reference material is for — but each guide (or section, or chapter, depending on how things are organized) needs to take a pretty deep dive into its respective area.
The main goal for topical coverage should be comprehensiveness. The reader ought to come away from a close read feeling very comfortable with the topic in question. They should feel that they know the vast majority of the possible options, and more importantly they should understand how all the concepts fit together.
Unfortunately there aren’t a lot of projects that do these very well. Most have reasonable tutorials, many have okay-to-good reference material, but most seem to leave the topical guides to books.
While it’s true that books shine in the “topical guide” area, they’re not really a great substitute for guides as part of the official documentation. Official docs, even when done poorly, are usually much more up-to-date; books, even when done well, are often out of date the day they hit the shelves.
Books-as-guides can be done well (the Subversion Book is a great example), but only when the book is continuously maintained and available for free.
There’s a particularly pernicious anti-pattern in documentation where tutorials are provided for free but the real documentation is only available for purchase. At best that’s lazy and sloppy; at worst it’s downright evil. Free software needs free documentation. If you do otherwise, you should be ashamed of yourself.
Reference
Finally, you need complete reference for all the public APIs your project provides. These should be designed for those who already know how to use some API, but need to look up the exact arguments some function takes, or how a particular setting influences behavior, etc.
It’s important to point out that reference material is not in any way a substitute for good tutorials and guides! Great reference material on the foo.baz package does readers no good whatsoever if they don’t know the name of the package they’re looking for.
Python’s documentation is a perfect case in point. The individual standard library modules tend to have incredibly good documentation, but there are no high-level overviews to help you discover which module you might actually want! Take for example the collections module: it’s great reference material, explaining exactly what’s in the module, how to use it, and what all the options are. But if you don’t know that Python ships with a deque implementation in collections.deque, you’ll probably end up missing the library entirely.
Think of guides and reference as partners: guides give you the “why,” and reference gives you the “how.” Following the deque example, some sort of “guide to data structures in Python” could give an overview of all the different types of data structures in Python (be they built-in or standard library), linking off to the documentation for each module and type for the complete details.
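To make the deque example concrete, here is a minimal sketch of the kind of snippet such a guide might carry before linking off to the collections reference. The helper name last_items is made up for illustration; deque and its maxlen argument come from the standard library.

```python
from collections import deque


def last_items(iterable, n=3):
    """Return the last n items of any iterable (hypothetical helper)."""
    # A deque created with maxlen=n drops its oldest item once it is full,
    # so at most n items are held in memory at any time.
    return list(deque(iterable, maxlen=n))


# A deque supports cheap appends and pops at both ends.
queue = deque(["b", "c"])
queue.append("d")        # add on the right
queue.appendleft("a")    # add on the left
first = queue.popleft()  # remove from the left
last = queue.pop()       # remove from the right
```

A guide would spend its words on when you might reach for this instead of a list; the reference would then spell out every method and argument.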
It’s really tempting to use an auto-documentation tool like Javadoc or RDoc for reference material.
Don’t.
Auto-generated documentation is almost worthless. At best it’s a slightly improved version of simply browsing through the source, but most of the time it’s easier just to read the source than to navigate the bullshit that these autodoc tools produce. About the only thing auto-generated documentation is good for is filling printed pages when contracts dictate delivery of a certain number of pages of documentation. I feel a particularly deep form of rage every time I click on a “documentation” link and see auto-generated documentation.
There’s no substitute for documentation written, organized, and edited by hand.
I’ll even go further and say that auto-generated documentation is worse than useless: it lets maintainers fool themselves into thinking they have documentation, thus putting off actually writing good reference by hand. If you don’t have documentation just admit to it. Maybe a volunteer will offer to write some! But don’t lie and give me that auto-documentation crap.
What’s next
Now that I’ve covered what to write, I’ll move into how to write. Tomorrow I’ll start going into the actual mechanics of writing good, readable technical prose.
Comments:
Django definitely has some of the best documentation I've ever seen and all of the contributors to it should be proud. I can say unequivocally that I came to Django because of the docs. Before I even knew about Django I looked at Rails and a few PHP frameworks, at that time *none* of them had any sort of introduction or tutorial that got you started using them. By contrast when I found out about Django I read all of the documentation and the drafts of the DjangoBook in one night. Knowledge is power!
You should be proud of the Django documentation, it is the best documentation I have ever found for a project this size. With virtually no background with Python, and armed with nothing other than the docs, I was able to complete the tutorial, and go on to write a production site. To be fair, I'm not new to programming languages, and the site is no beauty, but I did get the job done.
Great work on a great resource.
I agree that Django has good documentation. However, one little thing I've missed, having come from a PHP background, is that with PHP you can type in http://php.net/foo in your browser and the PHP website will almost always find the reference you're searching for. I don't miss this functionality on the Django documentation website as much as on the Python documentation website.
Definitely agree with all your points. I think Scrapy (a screen scraping framework - http://scrapy.org) is another project with pretty decent documentation. They even claim (in their FAQ) to have "stolen" some ideas from Django, and the good documentation is fortunately one of them.
Brent: I also came from PHP but I have a very different point of view on its documentation. Much like a bank which only loans you money when you don't need it, PHP's documentation only gives you answers when you already know them in advance, because you have to know the name of the function you're looking for in order to find it.
Of course, if you're already an experienced PHP programmer and you're just trying to remember the argument signature or return value for a particular function, that has its uses, but as a tool for learning the language it's crap. Sure, it has a big list of functions, but what it doesn't have is the sorts of high-level guides which let you know what PHP offers and why you'll want to care about all the stuff that's built in.
(and for the sorts of things I *would* use php.net for in the past, these days I just type 'pydoc some_name' and read what comes back; for any well-written module the output will be exactly what I'm looking for)
Amen. My Software Engineering professor was trying to convince my team to use Swing for our prototype (of a web app), because it has "excellent documentation". Yeah.
It's unfortunate that proper documentation is an afterthought in many open-source projects. No matter how compelling an open-source project looks, I'll likely bounce if there's no (recent) documentation.
I don't entirely agree that auto-generated API documentation is useless. API documentation often feels easier to search/scan/navigate than source code. Also, the added verbosity in explaining the purpose of a class/function is sometimes more helpful than just staring at someone else's source code, particularly when the docs explain how that class/function relates to other components in the framework.
Auto-generated API docs are not a great way to dive in and learn a new technology, but they can be super efficient as a reference source.
Of course, I'm talking about auto-generated docs which also contain the additional human-written comments; documentation generated from uncommented code *is* useless.
Yep the django documentation is fairly good, however I found myself joining #django on freenode to get questions answered here and there.
www.djangosnippets.org has been useful also
Your framework of having 3 different 'angles of approach' for documentation (tutorial, topical guide, reference) is pretty awesome. I think it deserves a lot of attention so I urge you to name it and introduce it to the FOSS community.
Simply by having an easy name, ideas can gain a lot of popularity. Take MVC, DRY, RAII or 'The Rule of Three'(C++) as examples. None are particularly interesting over similar proposals, but simply by having good names, these concepts are practically known everywhere.
I look forward to seeing documentation that links to your blog as reference :D
I agree that django has some of the best documentation out there.
I do have a small counterpoint though: django certainly isn't making the best use of Sphinx out there. When writing an SQLAlchemy app earlier this year I was able to put something like :meth:`~sqlalchemy.orm.scoping.ScopedSession.query_property` in my docs (or source code) and produce a link that others can follow. (This is using the intersphinx_mapping setting in conf.py). However in django even the most basic parts of the framework aren't marked up with .. autoclass:: and therefore you lose that ability.
That said I don't believe that auto generated documentation is ever sufficient.
Thanks for a thought-provoking article. In particular, I think your distinction between tutorials and topical guides is a useful one.
However, I found myself agreeing with Kyle Fox about the auto-generated documentation: it's useful when it includes hand-written comments but not when it's generated from uncommented code. I would go further and say that using auto-generated documentation can encourage developers to add more detailed comments to their code (for example, explaining any special cases), which must be a good thing.
I look forward to the next installment. One of the many things that attracted me to Django was just how good the documentation is. The tutorial might be a bit musty (I haven't really used it since the 0.96 days) but it got me up-and-running very quickly and was a great way to get started.
Great article - I think the advice for tutorials to 'be quick' and deliver a feeling of success within half a lunch hour is dead right. That's exactly what I've tried to do with my introductory Puppet tutorial:
http://bitfieldconsulting.c...
as it takes the shortest possible path to getting the user to a successful run of Puppet that does something useful. "Be easy... but not too easy" is also great advice.
Excellent Article! I think I speak for every Django noob when I say "Thanks!" for the Django docs. It's definitely a huge asset to the django community, and a model for success for any other open source project.
Editor's note: in your 3 paragraph, "...to any for of technical documentation." should be "...to any form of technical documentation."
Thanks for this... As someone who's working on writing, by hand, a complete documentation system for an open source project, this advice affirms the things we've been doing right and provides useful guidelines for continued writing and improvement.
From my Hacker News comment:
Under Tutorials > Demonstrate how your project “feels.” bullet, the second sentence is:
This means you that it should be pretty cross-sectional; a good tutorial should show off most of the different areas of the project.
I think it should be:
This means, for your purposes, that it should be pretty cross-sectional; a good tutorial should show off most of the different areas of the project.
I'd also just strike out the "pretty" there and replace "should show" with "shows". Maybe replace "most of the different areas" with "the most important areas". Be forceful in your writing!
You overstate your point about autogenerated documentation. It can be good or bad depending on the language in question, effort that's put into it, specific situation, method of delivery, searchability, etc. I've seen handcrafted manuals that look real pretty but are total crap too.
Good documentation is hard work and a fairly rare skill that is often independent of coding ability. Also there's the native speaker of (some reasonably standard dialect of) English thing.
Ok, I read your post more carefully and I see what you're confused about. I think "autogenerated documentation" is a misleading phrase. There's really no such thing. Autodoc tools are really just that, tools that take the tedious part out of doc production. Some of these tools produce better output than others, but none actually "generate" documentation. The documentation still has to be written. It's just more efficient and logical for that documentation to live with the code that it documents.
Your article couldn't have come at a better time. We've always been quite happy, as have our users, with the documentation on the jQuery Project. But this weekend we're getting together to take it to the next level. We have some good tutorials and reference. Your post quickly pointed out our lacking in the overview and guides area.
Looking forward to the rest in the series. Thanks!
As someone who is beginning to document internal systems, this is a great read. I look forward to the future installments!
To have better documentation, a better documenting system is also necessary. And Sphinx serves really well for Python documentation. I even used it for C# software documentation. I wish Sphinx had autodoc for C family languages also.
Better documentation is very much necessary for any kind of project. Be it OSS or proprietary.
Mechanically-generated documentation can be useful (though it isn't always). Specifically, it can tell the reader what is related to what, provide rapid navigation, etc. That said, it is NO substitute for good human-generated documentation. Ideally, IMHO, human- and machine-generated information should be combined in a customizable, editable format. But nobody does this, AFAIK.
My problem with the Django documentation is that they spend way too much time 'building you up'.
Basically you read something like <here's how to do X> so you get all excited then start doing that while reading along, and then they're like <but that's not great so do it Y way instead> so you're like "Oh, ok, cool...".
Then they go <but that's not good either so here's Z> and then I start to get pissed.
It makes it impossible to get through sanely and I feel like I have to read the whole 700 pages before I can start anything (which of course is never a good strategy for enabling quick and easy success).
And when I do start stuff and reference the doc, I can never remember if what I'm reading / trying is the "final answer" of rightness....
yea, it's frustrating to say the least... but definitely well written.
I agree auto-generated documentation is crap. Too many projects have this issue esp in the C++ world. I'll be honest pre-1.0 I can't say I was happy with Django documentation or the state of the project, it was too fragile and the docs didn't match the code. I've sworn off it. However, I believe your advice here is sound. You should add though MAKE SURE YOUR DOCUMENTATION MATCHES YOUR CODE. nothing worse than asking for help after RTFM because the Manual is F-ing WRONG.
The Django tutorial is very good, but, at least when used by readers new to Web application development, it puts too much emphasis on tailoring the admin pages, and not enough emphasis on the core URLconf-models-views-templates pattern.
Django has some of the best documentation I've seen, and it seems to be a cultural value in the Python community to be concerned with quality documentation. I miss things like doctest when I use other languages. Although it falls within the "auto-documentation" category, I like the guarantee that the docs and the code always agree. Of course, doctest is not enough. I agree there should be hand written docs. What I find interesting is the influence of "agile" programming methodology on valuing documentation. Some think "agile" is about working code first, and documentation as an afterthought. Others think that documentation encourages involvement, and since agile is about people, this encouragement shouldn't be underrated. What do you think about this?
This is a very well done piece of information. I have never done API documentation so this is a first for me. I enjoyed reading your work.
Typo? "but only when the book is continuously maintained available for free." Missing an article "*and* available"?
Nice article. Re the autodoc issue, I agree with a previous commenter that there's a distinction between "using" an autodoc tool to (1) simply run over undocumented source code to produce a nice fat stack of paper or (2) to generate useful, intelligently written documentation that's embedded in the source code for the purpose of streamlining maintenance of a continually changing product. Not an auto-doc booster here--I've wasted my share of hours wrestling with the tools and not always winning. But this is a critical distinction missing in this excellent overview.
I'd also like to see specifically *what* you add of value to an API ref written & organized by hand. (Not elaborating here weakens your argument, which I heartily agree with.) Some of the things I might add are: sequence info--if you need to call one function before another, comparison info--when more than one function does a similar but slightly different task, error messages, boundary values for parameters, critical ramifications of using certain values (if relevant), clearer description of what gets returned, etc. What else?
I hope that every "dictator", benevolent or not, will read your post. At one of my jobs, we used a pretty large CMS (similar to Plone) which had documentation for the APIs, but no topical guides. I had to ask my colleagues for directions all the time. I was feeling bad for wasting their time with so many questions, not to mention the fact that they did not have too much time for explanations. After a while, I got sick of this and I quit. After two weeks another colleague did the same thing.
Thank you, Jacob, for writing such an insightful, compassionate, and balanced article on software documentation. I look forward to reading your entire series on the topic!
While I don't have anything new to add to the conversation, I would like to point all my fellow Rubyists to a wonderfully refreshing documentation tool, called YARD. Though writing good documentation is never easy, having the right tool will help to make it worth your time. It's waaaayy more useful (and much better documented) than Rdoc.
http://yardoc.org/
Thanks for another great article. I agree with the commenters who think you're too hard on autodoc. For large and fast-evolving APIs I don't see how else you can keep documentation up to date. I wish there was a better or more natural way to present auto-generated documentation on the web, but that's a problem with the available software, not the concept of auto-generated docs from code comments.
I've just written a proposal for a standard documentation format for open source projects:
http://code.alexreisner.com...
I think we agree on a lot of points, but I'd be curious to get your reaction to it.