Jacob Kaplan-Moss

The sorry state of database journalism

I wrote this post in 2007, more than 16 years ago. It may be very out of date, partially or totally incorrect. I may even no longer agree with this, or might approach things differently if I wrote this post today. I rarely edit posts after writing them, but if I have there'll be a note at the bottom about what I changed and why. If something in this post is actively harmful or dangerous please get in touch and I'll fix it.

I’ve been following with interest as Derek Willis explores Caspio, a sort of hosted data-driven web app tool for journalists. The following started out as a comment on his blog, but soon ballooned, so I’m posting it over here where it’ll have more space to breathe.

Of course, for this to make any sense, you should read Derek’s articles first:

  1. Outsourcing Database Development, or the Caspio Issue (be sure to read Caspio CEO David Milliron’s comments, too).
  2. On Trials, Software and Otherwise

Like Derek, after reading David Milliron’s reply, I went to go poke at Caspio, and had a very similar experience to Derek, who writes:

I went to caspio.com to see about a free 14-day trial in order to test things out. Then I read the Terms of Service, which contains this sentence: “In addition, you may not access the Service for purposes of monitoring its availability, performance or functionality, or for any other benchmarking or competitive purposes.”

Like Derek, I was stopped cold by that.

I missed the part about using the demo for “competitive purposes,” too. I’m pretty sure that my role as the lead developer of both commercial and non-commercial competition would get snagged under that bit, as well.

As far as I’m concerned, that right there is the mark of bad software. Good software makes its developers proud: they want to show it off to everyone and shout its praises from the mountaintops. Bad software makes its developers embarrassed and scared: they know any competent programmer could improve on their functionality in no time. Those TOSes that forbid competitors and benchmarking seem like coded messages from the developers saying “our software sucks!”

But Derek’s right on about the appeal of tools like Caspio – I’ve done my share of fighting with IT, and I can actually speak their language. I shudder to picture an editor trying to get a site from IT done on deadline. In fact, if I’ve got my Journal-World history correct, the online news division actually hired its own programmers to route around corporate IT.

Still, in my mind it all comes down to shortsighted management. You don’t need a lot of resources to roll your own awesome online coverage, and those minimal resources are a necessary investment in the future of the news industry.

In the past couple of years the web development landscape has changed dramatically. The Third Age – the age of the web framework – is upon us, and that means that developers can do a lot more with limited resources.

A single example I’m currently crushing on is PolitiFact: it was developed at the St. Petersburg Times – not a large organization at all – by a single programmer, Matt Waite.

I think even the smallest newspaper should be able to afford a single on-staff programmer. Let’s compare costs, using PolitiFact as a rough starting point.

I don’t know what Matt makes, but an average programmer’s yearly salary, according to the US Bureau of Labor Statistics, is $61,000 (the BLS data is only available through POSTed forms, which means I can’t link you directly to that data; sorry). In practice I know that newspapers pay far below average, but let’s use $61k as a strawman anyway.

So what would PolitiFact cost if it was built with Caspio (assuming that was even possible)? Well, it’s hard to figure out given Caspio’s pricing – Caspio has a number of different price points depending on how long archives are maintained, level of service, etc. However, given that only the highest level plan offers 24/7 support and an SLA (both of which you get for “free” by calling your full-time developer), I think we’re talking about the top plan, which runs $12 per “DataPage” per month.

Now, I’m not sure what a “DataPage” is, but it looks like it’s basically any database-driven “page” presented as part of your app. Google tells me that there are 205 pages under politifact.com; that’s $2,460 per month, or almost $30k per year – half the average programmer’s salary.

I don’t know how long it took Matt to build PolitiFact, but I doubt it was a full six months. Even if it was, though, an on-staff programmer costs a fixed amount regardless of how many sites you launch. LJWorld.com has dozens of data-driven projects, most written in a couple of days at most, that are mostly self-maintaining – reporters and producers can feed them with updated data at negligible cost.

If you’re using something like Caspio, however, you’d need to keep paying that $30,000 every year you want to keep the site up. Once you’re up to four sites of that size you’re looking at a $120,000 yearly ransom just to keep your data from 404ing.
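For anyone who wants to check my math, here is a rough sketch of the comparison in Python. The three constants are the figures quoted above; the function names are made up for illustration. It also assumes that every page Google counts is billed as a “DataPage”, which is only my guess at how the pricing works. The exact yearly figure comes out a little under the rounded $30k I’ve been using.

```python
# Figures quoted in the post; the billing model itself is an assumption.
PRICE_PER_DATAPAGE_PER_MONTH = 12  # top plan, dollars per "DataPage" per month
PAGES_PER_SITE = 205               # Google's page count for politifact.com
PROGRAMMER_SALARY = 61_000         # BLS average yearly programmer salary


def hosted_cost_per_year(sites, pages_per_site=PAGES_PER_SITE):
    """Yearly hosted cost, assuming every page is billed as a DataPage."""
    return sites * pages_per_site * PRICE_PER_DATAPAGE_PER_MONTH * 12


def break_even_sites(salary=PROGRAMMER_SALARY):
    """Smallest number of PolitiFact-sized sites whose yearly hosted
    cost meets or exceeds one programmer's yearly salary."""
    sites = 1
    while hosted_cost_per_year(sites) < salary:
        sites += 1
    return sites


if __name__ == "__main__":
    # Hosted cost grows with every site; the salary stays flat.
    for n in range(1, 5):
        print(f"{n} site(s): ${hosted_cost_per_year(n):,} hosted "
              f"vs ${PROGRAMMER_SALARY:,} salary")
    print("Break-even at", break_even_sites(), "sites")
```

Under those assumptions the hosted bill passes the average salary at the third site, and it keeps climbing with every site after that while the programmer’s cost stays put.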

Of course, none of this changes the sad fact that many news organizations are so shortsighted that they think hiring a programmer is an extravagant expense. At least tools like Caspio let these organizations start learning the techniques they’ll need to stay in business, but in an ideal online media landscape there’d be no need for silly tools like Caspio.