Buildbot, the venerable Python continuous integration server, has the reputation of being complex and difficult to set up.
After spending a couple of weeks deep in Buildbot land, I’ve come to the conclusion that this reputation, while true, is only partially deserved. That is, Buildbot is complex, but only if you’re trying to view it as an out-of-the-box CI solution. Buildbot suddenly starts to make much more sense if you view it as a framework for creating your own CI solution, not a CI server in its own right.
You won’t find this revelation anywhere in the Buildbot docs, nor in any of the books or online material that cover the tool. There are some good tutorials out there showing how to set up a simple Buildbot instance — Jeff Younker’s Foundations of Agile Python Development has the best one I’ve run across — but none of these examples make much sense when setting up a complex buildfarm with complicated requirements.
So I’m here to fill that gap. In this series of posts (I think I’m looking at five parts) I’ll explain this “Buildbot is a CI framework” view, delve into Buildbot’s architecture, and then walk through the complicated-but-worth-the-effort CI server I’ve built for Django.
By way of disclaimer I should mention I’m anything but a Buildbot expert. I’m almost certainly Doing Things Entirely Wrong. I may or may not be using public APIs as I’ve simply trolled through Buildbot’s source until I found something that did what I wanted. However, what I’ve got here on the other side makes me pretty damn happy, and I want to show it off.
Here, then, is
Part 1: Background
We’ve been looking for a CI solution for Django for quite some time. Over the years we’ve tried a bunch of different tools: Buildbot, CruiseControl, Hudson, and even some home-grown solutions.
Nothing’s worked out. That is, nothing’s been able to provide the “continuous” part: builds only keep working as long as there’s someone around dedicated to babysitting the system. This sucks: it’s meant that at times Django’s been broken on supported platforms simply because nobody’s been bothering to run the tests.
A few weeks ago a few of us started banging on this problem again, determined to get it right this time. Eric set up a new Hudson instance (modeled after the one he’d been using at work), and I dove headlong into Buildbot again. I’m not really going to talk much about Hudson here, but I’ll note that it’s actually been really instructive working on two different systems in parallel. It’s forced us to really think through and formalize our CI needs.
This led me to my first big CI revelation: CI is hard. There’s any number of “simple” CI tools out there… and they appear to work for exactly two projects (the project the tool was built to test, and the CI server itself, natch). The general purpose tools — Hudson, Buildbot, CruiseControl, etc. — are big, complicated, and heavily opinionated. This is a clear sign that we’re in a space where even the basic tenets of the problem can’t be agreed upon by all parties. CI is one of those problems that’s hard because there really isn’t a good core set of needs to be abstracted. Nearly every project has very different CI needs.
[This is part of what makes Buildbot so complicated: I think it’s actually trying pretty hard to be completely agnostic and allow any kind of continuous integration system you could think up. If Buildbot was more opinionated it could drop some of the layers of abstraction, but because it’s trying so hard to be everything to everyone it ends up being crazy complex. I’ve not decided if this is admirable or crazy. Both, perhaps.]
So what are Django’s needs? What makes CI hard for us?
Django’s big. The test suite is around 40,000 lines of code in something like 3,000 individual tests. We work constantly to speed up the test suite, but best case it still takes about 5 minutes to run.
This means that our CI absolutely needs to be distributed — a single test server won’t cut it.
Our test suite isn’t just unit tests; in fact, it’s mostly integration tests. We run most tests against real databases and attempt to simulate as much of the HTTP request/response cycle as we can.
This means that our build system needs to be heterogeneous: since we test against real databases, we need to have lots of different ones to test against. We can’t just run a farm of Linux buildslaves running Python 2.6 and SQLite. Since slaves are heterogeneous, the build system needs to be highly targeted. We can’t treat every build slave identically; instead, we’ll need to target certain types of tests to the slaves that support ‘em.
We’re ambitious in what we support: Django supports four versions of Python (2.4, 2.5, 2.6, and 2.7), three Python implementations (CPython, Jython, and PyPy), four database engines (PostgreSQL, MySQL, SQLite, and Oracle), multiple versions of each database (for example, we support six versions of PostgreSQL: 8.0, 8.1, 8.2, 8.3, 8.4, and 9.0), and a bunch of OSes (Mac OS X, Windows, and most Linux and BSD flavors).
We need the capability to run all sorts of crazy combinations. In an ideal world, we’d actually be able to test against every single unique python/db/os combination.
This means that our build system needs to be capable of getting really big, potentially spanning dozens or even hundreds of machines. We’re clearly talking cloud computing here: there’s no way a bunch of volunteers can afford the money and time to keep a rack of dozens of heterogeneous machines all running smoothly.
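To get a feel for how quickly that matrix grows, here’s a rough sketch that simply multiplies out the lists above. It rests on a few assumptions of mine: the OS list is collapsed to four entries, only PostgreSQL’s versions are enumerated (they’re the only ones spelled out here), and no attempt is made to filter out combinations that don’t exist in practice. The names `build_matrix` and `DATABASES` are made up for illustration.

```python
from itertools import product

# Values come from the lists above. Assumptions: "most Linux and BSD
# flavors" is collapsed to one entry each, and engines other than
# PostgreSQL get a single placeholder version (None).
PYTHON_VERSIONS = ["2.4", "2.5", "2.6", "2.7"]
IMPLEMENTATIONS = ["CPython", "Jython", "PyPy"]
DATABASES = {
    "PostgreSQL": ["8.0", "8.1", "8.2", "8.3", "8.4", "9.0"],
    "MySQL": [None],
    "SQLite": [None],
    "Oracle": [None],
}
OSES = ["Mac OS X", "Windows", "Linux", "BSD"]


def build_matrix():
    """Return every (python, implementation, (engine, version), os) tuple."""
    db_targets = [
        (engine, version)
        for engine, versions in DATABASES.items()
        for version in versions
    ]
    return list(product(PYTHON_VERSIONS, IMPLEMENTATIONS, db_targets, OSES))


matrix = build_matrix()
print(len(matrix), "naive combinations")
```

Even with these simplifications the naive product runs into the hundreds, which is why a single box (or a handful of them) doesn’t get us far.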
As I mentioned, we’re all volunteers. Nobody gets paid to babysit the CI server, which means it needs to be highly autonomous. Builds need to happen without any intervention. Most critically, build servers can’t disappear, go “stale” or break because /tmp gets full.
After a bunch of playing with these requirements, I sketched out a dream system that looked something like this:
- We’ve got a bunch of (dormant) VM images for a cloud computing service or platform.
- Each image “knows” which kinds of configs it can build. For example, one image might have Python 2.4 and SQLite, while another might have Python 2.7 and PostgreSQL 9.0.
- When new build requests come in, the build master spins up some VMs and hands them build jobs (based on the types of builds each VM can support).
- When no more builds are in the queue for a particular VM, the build master shuts down the image and saves us money.
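The workflow above can be sketched as a toy simulation. This is plain Python, not Buildbot code: `Config`, `Image`, and `Master` are hypothetical names, and “booting” a VM is just flipping a flag and writing a log entry. It only illustrates the matching and shutdown logic, under the assumption that each build request names exactly one config.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Config:
    """One build configuration, e.g. Python 2.4 + SQLite (hypothetical type)."""
    python: str
    database: str


@dataclass
class Image:
    """A dormant VM image that "knows" which configs it can build."""
    name: str
    supports: frozenset
    running: bool = False


@dataclass
class Master:
    images: list
    queue: deque = field(default_factory=deque)
    log: list = field(default_factory=list)

    def submit(self, config):
        self.queue.append(config)

    def run(self):
        while self.queue:
            config = self.queue.popleft()
            # Pick the first image able to build this config.
            image = next((i for i in self.images if config in i.supports), None)
            if image is None:
                self.log.append(("unsupported", config))
                continue
            if not image.running:
                image.running = True
                self.log.append(("boot", image.name))
            self.log.append(("build", image.name, config))
            # Shut the VM down once nothing left in the queue needs it.
            if not any(c in image.supports for c in self.queue):
                image.running = False
                self.log.append(("shutdown", image.name))


py24_sqlite = Config("2.4", "SQLite")
py27_pg90 = Config("2.7", "PostgreSQL 9.0")
master = Master(images=[
    Image("img-a", frozenset({py24_sqlite})),
    Image("img-b", frozenset({py27_pg90})),
])
for cfg in (py24_sqlite, py27_pg90, py24_sqlite):
    master.submit(cfg)
master.run()
for entry in master.log:
    print(entry)
```

In this run `img-a` stays up across both of its builds because a second matching request is still queued, while `img-b` is shut down right after its single build.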
Every out-of-the-box CI system I examined failed to give me that workflow. Most failed on the “heterogeneous” requirement. That includes Buildbot. I knew, however, that a few projects (PyPy, Chrome, and Python itself) were using Buildbot with some success against similar issues, and I knew that Buildbot had recently gained the ability to deal with cloud computing. Finally, since Buildbot’s written in Python I was fairly confident in my own ability to hack it to pieces if necessary.
Well, a couple of weeks later, I’m there: I have a Buildbot-based system that’s doing exactly what I described above. I’m still not 100% sure this is the solution, but it’s a solution, and it’s working.
The rest of this series will dive into the code. Next time, we’ll look at an overview of Buildbot’s architecture and configuration, and I’ll explain my Buildbot-is-a-framework revelation in more detail.
Comments:
s/tenants/tenets/ and delete me... This isn't being "that guy", is it?
Hey Jacob, can you use pypy.org as an entry point to pypy?
Cheers,
fijal
Simeon, fijal: fixed; thanks for the edits.
s/tenats/tenets/
Yours, That Guy
Javascript has a distributed continuous integration tool available as a service at http://testswarm.com (currently down). John Resig of jQuery fame worked on it and has a blog post explaining it at http://ejohn.org/blog/test-...
Probably, this idea could be used as an inspiration???
Thejaswi: so pony-build (https://github.com/ctb/pony...) is basically that same idea. We (Eric and I) have given it some work -- Eric built http://devmason.com/ around it -- and while I think it's a good idea there's still a long way to go. For one, triggering builds upon commit is still hard in a decentralized, distributed system.
I do think, though, that something like testswarm or pony-build is the best solution for the future. But I wanted something that works *now*, and since I've got Buildbot cowed into submission I thought I'd document it.
<<Each image “knows” which kinds of configs it can build. For example, one image might have Python 2.4 and SQLite, while another might have Python 2.7 and PostgreSQL 9.0.>>
This sort of absolute separation isn't necessary; you can cut down on the number of images with separate installation roots and database servers.
Python versions: have a bunch of prefixes in /home/buildbot/pythons. For example, CPython 2.6 would be installed to /home/buildbot/pythons/cpython-2.6. Then you just use the buildbot environment configuration to set $PATH and $PYTHONPATH appropriately for whichever Python version you want to test against.
A similar improvement can be made for databases. For databases with remote connection support, you really only need one image per version. Eg, postgres80.local, postgres81.local, mysql52.local, etc. Again, set this per builder in the BuildBot configuration.
Once the number of images is reduced, you can probably run most of them off a single server (or even a reasonably powerful desktop). Linux's KVM system can virtualise Linux, *BSD, and Windows -- take the money you would have thrown away in the Cloud and buy a secondhand Mac off Craigslist to round out the OS selection.
John: that's certainly true, and I'm doing something similar with my config. A couple of things, though, make it less simple than you'd think:
* Time: I didn't mention this ('cause I assumed it was obvious), but I'd like builds to finish in a reasonable amount of time. So while I *could* actually run all the builds for a single OS on a single machine it'd end up taking a *really* long time. MySQL builds can take over an hour, so a full test run for every revision would take over 24 hours on a single machine. Given that we check in a couple to a dozen revisions a day, well, you can see why more machines in parallel is a good idea.
* Even if we had fewer images we'd still want that metadata about the builds to be available. I want to be able to glance at a dashboard and see that builds are failing for MySQL 6 on Windows. So the configuration matrix metadata's important regardless of how it's used by the slaves themselves.
* As I alluded to above, real machines like you're proposing need real admins to keep 'em in shape. We're a community of volunteers, and we can't count on always having someone around to babysit a server if it breaks. Cloud resources help a lot here: the build server is restored to a "known good" state regularly, and nobody ever has to worry about replacing a blown power supply.
* Finally, since cloud machines only have to be online *when builds are running* I'm hardly throwing away money when I compare the costs of running a server 24/7. I'm estimating that this system will cost roughly 5 cents per commit. Django's had about 15k commits, so if we'd been running this system since day one we'd be out around $750 over the last five years. That's $150/yr or $12.50/mo. That's a mighty small price to pay, I think.
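Spelled out, that estimate is just the following. The only inputs are the figures quoted above (5 cents per commit, about 15k commits, five years); the variable names are mine, and the per-commit cost is itself a rough guess, so treat the result as a ballpark.

```python
# Work in integer cents to avoid floating-point noise.
COST_PER_COMMIT_CENTS = 5   # rough estimate from above
TOTAL_COMMITS = 15_000      # "about 15k commits"
YEARS = 5

total_usd = COST_PER_COMMIT_CENTS * TOTAL_COMMITS / 100
per_year_usd = total_usd / YEARS
per_month_usd = per_year_usd / 12

print(f"total: ${total_usd:.2f}")
print(f"per year: ${per_year_usd:.2f}")
print(f"per month: ${per_month_usd:.2f}")
```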
I'm interested to see where this goes, Jacob. For a project that _uses_ django (and so can count on a single production environment) I've found Hudson to be perfectly adequate, despite not being Python. I did a talk about my Django automated test setup (including CI, but also unit, integration and system tests) at BrightonPy recently, which you can see here http://18dex.com/blog/2010/...
Nice post. I've actually spent about a year working on an internal testing framework that included getting pretty familiar with BuildBot in order to get things rolling (familiar meaning I committed features and bug fixes to buildbot).
I can say it was pretty hard to understand what is going on, and that after that year I came to the conclusion I could have replaced most of my work with out-of-the-box hudson and a few plugins that already exist. I simply see no reason to spend so much time chasing around trying to make buildbot play nice. Indeed, if you have a very complex system (I had a very complex one, and even it didn't really need buildbot) you might try buildbot because it can do everything (meaning, you can code your way to do everything, and buildbot will still 'be there'). But generally, I see no reason for all this effort
In my experience CI definitely needs some work to keep running. I have mainly used Hudson, more because of the momentum than for any particular reason. I have also started this little CI tool with Django and virtualenv:
https://github.com/batiste/...
It's quite a simple piece of code, quite similar to pony-build.
why is the rss feed of this blog producing a br0ken title in http://www.djangoproject.co... ???
Nice work!
As for Buildbot being "a framework for creating your own CI solution" -- YES. This is the direction I'm trying to push Buildbot, but it's not easy.
I'm jealous of Django and its careful design and elegant syntax. Buildbot has neither of those things, and also has not historically done a good job of documenting its APIs, because it has historically been presented as a CI tool whose configuration language happens to be Turing-complete. As a result, users have read the source code and made whatever assumptions seemed reasonable at the time, which doesn't leave us much room to change things without breaking existing implementations. I'm willing to do a bit of that, but it's naturally slow work.
I'd like to add this series to the Buildbot 'press' page, and incorporate it into the documentation itself. Do you think we can make that work?
Hello! Very interesting read. I was just looking over your Buildbot instance and noticed that you have Gracefully Shutdown and other admin-ey features available to the world. You may want to consider passing allowForce=False to your WebStatus module, otherwise anyone can force builds, shut down slaves, or even the master!