Configuration and architecture

I wrote this post in 2010, more than 14 years ago. It may be very out of date, partially or totally incorrect. I may even no longer agree with this, or might approach things differently if I wrote this post today. I rarely edit posts after writing them, but if I have there'll be a note at the bottom about what I changed and why. If something in this post is actively harmful or dangerous please get in touch and I'll fix it.

This is the second part in my series about building a build farm for Django with Buildbot. Part 1 covered some background, including the specific problems facing a CI system for Django’s core development.

Starting in this part I’ll be looking at the actual code I wrote to solve these problems. It’s running now at buildbot.djangoproject.com, and the code’s on GitHub. Please feel free to take a look (and fork/reuse if you want) but keep in mind it’s very much a work-in-progress. You’ll probably notice me updating this stuff a bunch over the next few days/weeks.

If you’re not already a bit familiar with Buildbot you might have a bit of trouble following what’s coming next, particularly because I’m going to deviate quite a bit from the documented way of doing things. If you’d like to learn the “party line” – and I suggest that you do, since I’m quite possibly doing something a bit insane here – then I’d recommend either the official documentation (which is quite good, if a bit rough) or the relevant chapters from Jeff Younker’s Foundations of Agile Python Development.

The first thing you’ll notice if you look at my code is that it doesn’t look anything at all like the example Buildbot config. If you start a new master following the documentation you’ll get this sample config file – 200 lines of (mostly commented-out) Python code.

At a first quick glance you might think – as I did – that this config file is a bit like Django’s settings file. That is, a file that happens to be in Python for ease of use, but that really only contains a series of simple constants and basic data types (strings, lists, dicts, etc.). You can use the config file in this way, and it’ll work fairly well for simple build needs. However, as the builds get more complex, things will rapidly get out of hand. I kept ending up with something that was confusing, hard to follow, and brittle as hell, with repeated, hardcoded configuration scattered all over the place. My first stab at a Django build farm ran to almost 1,000 impossible-to-follow lines. Ugh.

Then, as I was casting around for examples of bigger configs, I found Buildbot’s own Buildbot config. And, bing, the lightbulb came on: this is just Python, so why don’t I write this config like it’s a program, not a config file!?

Once I started treating my config like a program everything became much easier. Now, my master.cfg isn’t anything like my settings.py files. It’s more like one of my mod_wsgi .wsgi files: a bit of bootstrapping code that calls into my own app. The rest of my Buildbot config follows this idea: I’m treating Buildbot as a CI framework, not a CI server that I’ve configured. Instead of just tweaking and tuning things, I’m subclassing liberally, overriding the parts that I don’t want and adding extra bits that I do.

And it’s working brilliantly.

So let’s dive in and look at the master.cfg. First I’ll show the annotated source from my master.cfg, and then I’ll give an overview of the build process to explain how all the bits in the config fit together.

`master.cfg`

master.cfg is the config file that Buildbot loads when starting up. As I mentioned above, I’m treating master.cfg as essentially an entry point to the CI server. It’s going to set up some environment stuff, load a bit of config, and then pass over control to the main “app” in the djangobotcfg module.

First, I import some stuff:

import json
import djangobotcfg
from buildbot.manhole import AuthorizedKeysManhole
from unipath import FSPath as Path

Not very exciting, but a couple of interesting things show up from the very beginning:

I’m using some third-party packages – unipath, here. I’m going to run Buildbot in a venv like a good boy, so I’m going to take full advantage of being able to install stuff. I’ve even got a pip requirements file.
Check out that import djangobotcfg. Yup, I’m putting 99% of the config into a separate module. It’s even got a couple of tests. Just like a real program.

Next, I make sure that things that might change are easy enough to tweak:

SVN = 'http://code.djangoproject.com/svn/django'
BRANCHES = {'trunk': SVN + '/trunk',
            '1.2.X': SVN + '/branches/releases/1.2.X'}

# Load some secrets to pass onto the various bits that need it.
with open(Path('~/master/secrets.json').expand()) as secrets_file:
    SECRETS = json.load(secrets_file)

I guess the SVN repo probably won’t change [1], but as we push releases we’ll need to change which branches we test against. We currently build against trunk and the latest bugfix branch (1.2.X), and I should probably think about adding the security-fix branch (1.1.X) to this list, too. I’ll show a bit later on how I make sure that builds only run when code actually gets committed to a particular branch.
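The real mechanism comes in a later part, but the core idea is small. Here’s a minimal sketch, not the code from djangobotcfg: a hypothetical split_branch helper that maps a changed path back to one of the BRANCHES keys. It assumes the poller reports paths relative to the repository root (for example branches/releases/1.2.X/docs/index.txt); anything outside a branch we build is ignored.

```python
SVN = 'http://code.djangoproject.com/svn/django'
BRANCHES = {'trunk': SVN + '/trunk',
            '1.2.X': SVN + '/branches/releases/1.2.X'}


def split_branch(path, branches=BRANCHES, root=SVN):
    """Hypothetical helper: map a repo-relative path to (branch, filename).

    Returns None when the path isn't on any branch we build.
    """
    for name, url in branches.items():
        # Turn each branch URL into a prefix relative to the repo root,
        # e.g. 'trunk/' or 'branches/releases/1.2.X/'.
        prefix = url[len(root):].strip('/') + '/'
        if path.startswith(prefix):
            return name, path[len(prefix):]
    return None
```

So split_branch('branches/releases/1.2.X/docs/index.txt') gives ('1.2.X', 'docs/index.txt'), while a commit to the 1.1.X branch gives None and triggers nothing, at least until 1.1.X is added to BRANCHES.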

Oh, and I also stuffed all the various “secrets” – passwords, mostly – into a secrets.json on the server. That way I don’t have to share all my secrets with the world [2].

In the long run I should probably move more constants up here (or into a JSON file) for easy editing. You’ll see later that there’s more hardcoded stuff – slave names, config info, etc. – that might be best moved into a central location that’s easy to edit. It’s on the ol’ TODO list.
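As a sketch of where that TODO might go, the constants could live in a JSON file next to secrets.json and be turned back into the same SVN and BRANCHES values. The file name ~/master/config.json, its layout, and load_constants are all hypothetical; nothing like this is in the repo yet.

```python
import json
import os


def load_constants(path):
    """Load editable constants (hypothetical layout) from a JSON file.

    Assumed layout:
        {"svn": "http://code.djangoproject.com/svn/django",
         "branches": {"trunk": "/trunk",
                      "1.2.X": "/branches/releases/1.2.X"}}
    """
    with open(os.path.expanduser(path)) as fp:
        data = json.load(fp)
    svn = data['svn'].rstrip('/')
    # Branch paths are stored relative to the repository root, so adding a
    # new release branch becomes a one-line edit to the JSON file.
    branches = dict((name, svn + '/' + rel.lstrip('/'))
                    for name, rel in data['branches'].items())
    return svn, branches


# In master.cfg (hypothetical file name):
# SVN, BRANCHES = load_constants('~/master/config.json')
```

Normalizing slashes on both sides means a stray trailing or leading "/" in the JSON file doesn’t produce a broken URL.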

Moving on, check this awesomeness out:

# Bootstrap Django so that authentication against Django's database works.
# Since buildbot sometimes reloads this module we've got to be careful.

from django.conf import settings
if not settings.configured:
    settings.configure(
        INSTALLED_APPS = ['django.contrib.auth'],
        DATABASES = {
            'default': {
                'ENGINE': 'django.db.backends.postgresql_psycopg2',
                'NAME': 'djangoproject',
                'USER': 'djangoproject'
            }
        }
    )

Hey, you got Buildbot in my Django! You got Django in my Buildbot!

That’s right: I’m going to be using Django in my Buildbot configuration file. If this were just a config file like settings.py I’d think this was sick, but once again: I’m treating this config as a program, and I’m going to take full advantage of every tool I’ve got in my Python toolbox. In particular, I’m going to make Buildbot authenticate against Django’s auth database [3].
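The details belong to a later part, so here’s only a hedged sketch of the idea. DjangoAuth and require_staff are hypothetical names, and the authenticate(user, passwd) / errmsg() pair is modeled on the shape of Buildbot 0.8’s web-status auth objects, which is an assumption here. By default check is django.contrib.auth.authenticate, which only works once the settings.configure() call above has run; passing a different check makes the class testable without a database.

```python
class DjangoAuth(object):
    """Hypothetical web-status authenticator backed by Django's auth DB."""

    def __init__(self, check=None, require_staff=False):
        if check is None:
            # Imported lazily so this module loads before Django is set up.
            from django.contrib.auth import authenticate as check
        self.check = check
        self.require_staff = require_staff
        self._errmsg = ''

    def authenticate(self, user, passwd):
        # Django's authenticate() returns a User object, or None on failure.
        account = self.check(username=user, password=passwd)
        if account is None:
            self._errmsg = 'Invalid username or password.'
            return False
        if not account.is_active:
            self._errmsg = 'Account is disabled.'
            return False
        if self.require_staff and not account.is_staff:
            self._errmsg = 'Staff access required.'
            return False
        self._errmsg = ''
        return True

    def errmsg(self):
        # Message for the most recent failed authenticate() call.
        return self._errmsg
```

The point isn’t this particular class; it’s that once Django is configured, checking a committer’s password is an ordinary function call from inside the Buildbot config.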

Next, it’s time to generate the main parts of the config:

slaves = djangobotcfg.slaves.get_slaves(SECRETS)
status = djangobotcfg.status.get_status(SECRETS)
builders = djangobotcfg.builders.get_builders(BRANCHES, slaves)
schedulers = djangobotcfg.schedulers.get_schedulers(BRANCHES, builders)
changesource = djangobotcfg.changesource.get_change_source(SVN, BRANCHES)

This is calling into my djangobotcfg module, using functions to generate the various parts of config that Buildbot’s going to want. Instead of creating config at module level I’m calling functions to generate that config. This means I can pass in arguments, and it even means I can unit test my config [4].
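Here’s a minimal sketch of why that matters. It uses a hypothetical get_builder_names function and plain strings rather than real Buildbot builder objects, and the Python and database lists are illustrative, not the actual build matrix. A function that expands branches and platforms into one builder per combination is something you can write ordinary unit tests against:

```python
import itertools
import unittest


def get_builder_names(branches, pythons, databases):
    """Hypothetical: one builder name per branch/Python/database combination."""
    return ['%s-py%s-%s' % combo
            for combo in itertools.product(sorted(branches), pythons, databases)]


class BuilderConfigTests(unittest.TestCase):
    def test_one_builder_per_combination(self):
        names = get_builder_names({'trunk': None, '1.2.X': None},
                                  ['2.6', '2.7'], ['sqlite', 'postgres'])
        self.assertEqual(len(names), 8)
        self.assertIn('1.2.X-py2.6-postgres', names)

    def test_names_are_unique(self):
        names = get_builder_names({'trunk': None}, ['2.6'], ['sqlite', 'mysql'])
        self.assertEqual(len(names), len(set(names)))
```

Run it with python -m unittest like any other test module. A module-level config can’t be exercised this way without actually importing (and thus executing) the whole thing.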

Finally, we come to the important part. The master.cfg “contract” is that it has to provide a BuildmasterConfig dict. Buildbot imports master.cfg, looks for this BuildmasterConfig dict, and draws all the config from it. So this dict is basically the “table of contents” for the buildbot instance.
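To make that contract concrete, here’s a simplified sketch of the loading side. It is not Buildbot’s actual loader, which does a lot more validation; load_master_config is a hypothetical name. It just shows the shape of the idea: execute the file in a fresh namespace, then pull out the one name Buildbot cares about.

```python
def load_master_config(source, filename='master.cfg'):
    """Simplified sketch: run config source, return its BuildmasterConfig."""
    namespace = {'__file__': filename}
    exec(compile(source, filename, 'exec'), namespace)
    try:
        return namespace['BuildmasterConfig']
    except KeyError:
        raise ValueError('%s does not define BuildmasterConfig' % filename)
```

Everything else the file defines, helper imports like djangobotcfg included, is just scaffolding as far as the loader is concerned. That’s what makes the “entry point” approach possible.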

In most Buildbot examples I could find – including the out-of-the-box sample config – this dict is built up over the course of many dozens of lines of code. In mine, it’s pretty simple since all the complex parts are buried away in the djangobotcfg module:

BuildmasterConfig = {
    'slaves': slaves,
    'change_source': changesource,
    'schedulers': schedulers,
    'builders': builders,
    'status': status,
    'slavePortnum': 9989,
    'projectName': 'Django',
    'projectURL': 'http://code.djangoproject.com/',
    'buildbotURL': 'http://buildbot.djangoproject.com/',
    'db_url': 'sqlite:///state.sqlite',
    'manhole': AuthorizedKeysManhole(9990, Path('~/.ssh/authorized_keys').expand()),
}

Let’s go over this dict key by key and talk about each part. We’ll look at each piece in more detail in later parts of this series, but for now I’d like to focus on what each key represents and how all the parts fit together.

The first ones are the really important parts:

"slaves" Slaves are the nodes that actually run builds. This key is a list of slave objects; each corresponds to a specific instance of the Buildbot slave process, usually running on a remote machine.
In my config these are actually “latent” slaves – nodes that are booted on-demand and shut down as needed – but in many (most?) configs they’re physical machines that are up all the time. I’ll have a whole part about latent slaves towards the end of this series.
"change_source" The change source is where a build typically starts. Think of the change source as a “collector” for the changes you make to your source code.
There are several kinds of change source: a general-purpose remote trigger using a proprietary Buildbot API, a REST-based remote API, tools for picking up new commits from GitHub’s and Bitbucket’s post-commit webhooks, and more.
I’m using a tool that watches Django’s SVN repository and picks up new commits. It’s fairly simple – just polls SVN every five minutes – but it’s good enough for my purposes.
"schedulers" Schedulers decide when to trigger a build based on some arbitrary criteria, including information passed in from a change source.
There are tools to trigger builds on a timed basis, which are useful for typical compiled projects where you’d like to provide a daily or nightly build. In the future we might add a process that builds a nightly Django tarball for installation, for example.
Right now, I’m using schedulers which watch the changes from the change source and trigger new builds based on which branch those commits touched. Thus, if someone commits to the 1.2.X branch we only trigger the tests for that branch, not all branches.
"builders" Builders are the workhorses of the whole system, and we’ll spend a lot of time looking at them in the next installment. Right now I’ll just say that a builder provides a set of instructions to a slave about how to build a particular “thing”.
These builds, in our case, are the steps required to run the unit tests on a particular platform. As I went over in part 1, we’ve got a whole bunch of configuration combinations we want to test, so each builder represents one specific combination. For example, a “1.2.X / Python 2.6 / PostgreSQL 8.4 / Linux” builder or a “Trunk / Python 2.7 / MySQL 6.0 / Mac OS X” builder. Each builder needs to know which slaves can successfully run that build, which is why djangobotcfg.builders.get_builders receives the list of slaves as an argument.
"status" Buildbot separates the actual process of running the tests from the mechanism used to deliver the results. The status targets in the "status" key are the bits that report on builds. In my case, I’ve got two: the web interface and an IRC bot that reports when builders transition from success to failure or vice versa.
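The IRC bot’s “only tell me when something changes” behaviour is easy to picture as a tiny state machine. The sketch below is plain Python with hypothetical names, not Buildbot’s IRC status code: it remembers each builder’s last result and only sends a message when that result flips.

```python
class TransitionNotifier(object):
    """Hypothetical: report a builder only when its result changes."""

    def __init__(self, send):
        self.send = send          # e.g. a function that posts to IRC
        self.last_result = {}     # builder name -> 'success' or 'failure'

    def build_finished(self, builder, result):
        previous = self.last_result.get(builder)
        self.last_result[builder] = result
        # The first result seen for a builder only sets the baseline.
        if previous is not None and previous != result:
            self.send('%s: %s -> %s' % (builder, previous, result))
```

Treating the first result as a silent baseline is a design choice; you might prefer to announce a builder’s very first failure too.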

The next group of keys is just simple configuration:

"slavePortnum" The port that Buildbot listens for slaves on. You’ll need to know this when setting up slaves, but it’s completely arbitrary.
"projectName", "projectURL", and "buildbotURL" These are used mostly in the web interface and in other status reporting pieces to provide the name of the project, a link to it, and the main URL of the Buildbot instance.
"db_url" Recent versions of Buildbot have started moving from storing state in on-disk pickles to a bona fide database. I’m just using SQLite here, but if the database started being a bottleneck (unlikely) I could always switch.
"manhole" This is pretty neat: the “manhole” is an interactive Python interpreter embedded in the Buildbot master. Connect over SSH to the manhole port and you’ll see the familiar >>> prompt! It’s not all that useful for a stable server, but I found it really useful for poking at bits I didn’t quite understand. I’m piggy-backing on my SSH keys for access control, which is also sweet.

Build process overview

In practice, here’s how all the above fits together:

Somebody commits to Django.
The SVN poller – the "change_source" entry – wakes up a few moments later, checks SVN, and notices there’s a new commit. It parses some info out of the commit, and passes the branch name and list of changed files onto the schedulers.
Each scheduler in the "schedulers" list gets a crack at the changeset. In my config there’s one scheduler for each branch, and that scheduler triggers a build if the branch in the changeset matches the scheduler. However, schedulers can really use any technique for deciding whether a change is “interesting”. For example, in the future I’ll probably have a documentation scheduler that triggers a rebuild of the docs whenever a commit is made that involves documentation.
The scheduler gets to trigger any number of "builders". Again, in my config it’s a simple “which branch?” question, but this could be as arbitrary as needed.
Each builder knows which "slaves" it can build on, and picks one of those slaves to handle the build. Builders can queue up builds that can’t be handled right now, and I’m using this to make sure that each slave only runs one build at a time (because I’m running on very small instances with only 256 MB of RAM and if they start swapping the tests take for-fucking-ever.)
Once a builder has a free slave it generates a set of build steps for that slave. These steps can do pretty much anything as we’ll see in the next part of the series.
The slave executes the build steps, reporting success, failure, and progress back to the master.
The "status" entries get notified as stuff happens, triggering updates to the web page, notifications over IRC or email, etc.
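The whole flow can be modeled in a few dozen lines of plain Python. This is a toy sketch with hypothetical class names, not Buildbot internals: per-branch schedulers pick builders, each builder queues requests, and each slave runs at most one build at a time, which is the constraint those 256 MB instances need.

```python
from collections import deque


class Slave(object):
    def __init__(self, name):
        self.name = name
        self.busy = False


class Builder(object):
    def __init__(self, name, slaves):
        self.name = name
        self.slaves = slaves      # slaves that can run this build
        self.queue = deque()      # pending build requests (changes)

    def start_builds(self, log):
        # Hand queued requests to idle slaves; anything left stays queued.
        for slave in self.slaves:
            if not self.queue:
                break
            if not slave.busy:
                slave.busy = True
                change = self.queue.popleft()
                log.append((self.name, slave.name, change['revision']))


class BranchScheduler(object):
    def __init__(self, branch, builders):
        self.branch = branch
        self.builders = builders

    def add_change(self, change):
        # Only changes on "our" branch are interesting.
        if change['branch'] == self.branch:
            for builder in self.builders:
                builder.queue.append(change)


def dispatch(change, schedulers, builders, log):
    # A new change from the change source: let every scheduler look at it.
    for scheduler in schedulers:
        scheduler.add_change(change)
    for builder in builders:
        builder.start_builds(log)


def finish_build(slave, builders, log):
    # A slave reports it's done; queued builds can now claim it.
    slave.busy = False
    for builder in builders:
        builder.start_builds(log)
```

The model leaves out the build steps themselves and the status notifications; the point is just the hand-offs between change source, schedulers, builders, and slaves.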

What’s next?

In the next part I’ll dive into my particular implementation of these various bits – the code for my change sources, schedulers, slaves, builders, status, etc.


[1]	Until we switch over Mergitizarcskeeper, of course.


[2]	Don’t worry, it’s in a repository, just not a public one.


[3]	We’re already authorizing SVN against this same database, so I can easily piggy-back off that information and allow any of Django’s committers full access to Buildbot’s web interface. Neat, eh?


[4]	I’m not doing this as much as I’d like, sadly. I have a couple of tests, but doing more is waiting on figuring out how to mock more parts of Buildbot’s APIs.