Jacob Kaplan-Moss

CouchDB first impressions

I wrote this post in 2007, more than 14 years ago. It may be very out of date, partially or totally incorrect. I may even no longer agree with this, or might approach things differently if I wrote this post today. I rarely edit posts after writing them, but if I have there'll be a note at the bottom about what I changed and why. If something in this post is actively harmful or dangerous please get in touch and I'll fix it.

I’m playing with CouchDB tonight. Some first thoughts, as they occur to me:

  • The build process was very easy. I already had Erlang and all the standard automake/autoconf crap installed, so it was just a matter of installing something called ICU and going from there. Something like 10 minutes from svn checkout to relaxing.

  • Anything I can poke at with curl is pretty damn cool.

  • Wow, I can just chuck arbitrary JSON objects up at this thing and it’ll store them. No setup, no schemas, no nothing. This is relaxing…

  • I hadn’t expected CouchDB to look this polished so soon in the game. The web interface is truly awesome, and naturally implemented directly against the regular API.

  • Nice, there’s already a couchdb-python (built with httplib2, natch). The latest release installed with easy_install doesn’t seem to work, but SVN trunk does.

  • Good lord, inserting data is slow slow slow.

    My standard stress-test for databasy things is to chuck in the half-million stories from ljworld.com. This is gonna take a long time:

    ...
    finshed #21000 (elapsed=35.921 per-story=0.036)
    finshed #22000 (elapsed=34.100 per-story=0.034)
    finshed #23000 (elapsed=38.658 per-story=0.039)
    ...
    

    Looks like it’ll take something on the order of five hours to import everything. Painful.

  • The good news is that the import speed doesn’t seem to change as I write/remove views, run ad-hoc queries, etc.; it holds steady at about 25 [1] documents/second.

  • I’d probably get better behavior if I made my importer multi-threaded or select-based; there’s a fair amount of iowait even over loopback.

  • I wonder if there’s a bulk insert mode? Can’t seem to find one…

  • If I really can only do about 25 documents/second, I don’t think I could use this in actual write-heavy production environments. Most of the sites I work on are fairly read-heavy, so CouchDB might be OK.

    I think in practice I’d pair CouchDB up with a message queue so I could post documents asynchronously.

    Or maybe there’s an asynchronous POST option I’m missing?

  • Views written in Python are dramatically slower than those written in JavaScript, something like an order of magnitude slower. Wonder what’s up with that.

  • I wonder what it would take to make CouchDB into a backend for Django models? Seems there’s a much lower impedance mismatch between a document database and an object one – a model instance maps much better to a document than to a tuple.

    Joins, of course, are simply not possible… but in the right situations you wouldn’t need ’em.
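The curl and arbitrary-JSON points above come down to plain HTTP. Here is a minimal Python sketch of storing one document with a PUT, using only the standard library. None of these details come from this post, so treat them as assumptions. The server is taken to be at `http://localhost:5984`, the default port of later CouchDB releases. The database name `stories` is hypothetical. Documents are created with `PUT /{db}/{doc_id}`, as later CouchDB releases document it. The request is built but not sent, so the sketch runs without a server.

```python
import json
import urllib.parse
import urllib.request

# Assumption: default port of later CouchDB releases, on a local install.
COUCH_URL = "http://localhost:5984"


def build_put_request(db: str, doc_id: str, doc: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT that stores `doc` as document `doc_id` in `db`."""
    body = json.dumps(doc).encode("utf-8")
    url = f"{COUCH_URL}/{urllib.parse.quote(db, safe='')}/{urllib.parse.quote(doc_id, safe='')}"
    return urllib.request.Request(
        url=url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# "stories" and "story-1" are hypothetical names for illustration.
req = build_put_request("stories", "story-1", {"headline": "Hello", "tags": ["news"]})
# urllib.request.urlopen(req) would actually send it to a running server.
```

The rough curl equivalent is `curl -X PUT http://localhost:5984/stories/story-1 -H 'Content-Type: application/json' -d '{"headline": "Hello"}'`, under the same assumptions.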
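The five-hour estimate is just arithmetic on the importer's log. Below is a small sketch that parses the three log lines shown above and extrapolates to the half-million stories. The log text is copied as the script printed it, typo included. The regex is written only for that exact log format.

```python
import re

LOG = """\
finshed #21000 (elapsed=35.921 per-story=0.036)
finshed #22000 (elapsed=34.100 per-story=0.034)
finshed #23000 (elapsed=38.658 per-story=0.039)
"""

# Matches e.g. "#21000 (elapsed=35.921 per-story=0.036)".
LINE = re.compile(r"#(\d+) \(elapsed=([\d.]+) per-story=([\d.]+)\)")


def import_estimate(log: str, total_docs: int) -> tuple[float, float]:
    """Return (documents per second, hours to import `total_docs`) from the log."""
    per_story = [float(m.group(3)) for m in LINE.finditer(log)]
    if not per_story:
        raise ValueError("no progress lines found in log")
    mean = sum(per_story) / len(per_story)  # seconds per document
    return 1.0 / mean, total_docs * mean / 3600


rate, hours = import_estimate(LOG, 500_000)
print(f"{rate:.1f} docs/sec, ~{hours:.1f} hours")  # prints: 27.5 docs/sec, ~5.0 hours
```

That works out to a bit over 25 documents/second and roughly five hours, consistent with the figures in the post.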
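On the multi-threaded importer idea, here is a sketch using Python's `concurrent.futures` to keep several requests in flight so the iowait overlaps. `post` is a hypothetical callable that sends one document. It is not part of couchdb-python or any other library. Whether this speeds things up depends on whether the server itself is the bottleneck, which this post doesn't establish. The worker count of 8 is an arbitrary choice.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def import_concurrently(
    docs: Iterable[dict], post: Callable[[dict], object], workers: int = 8
) -> int:
    """Call `post` on every doc using up to `workers` threads; return how many were sent.

    Note: Executor.map submits every doc up front, so for a very large import
    you would want to feed it in chunks instead.
    """
    sent = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Iterating the results re-raises any exception a `post` call raised.
        for _ in pool.map(post, docs):
            sent += 1
    return sent


# Usage with a fake `post` that sleeps to simulate a network round-trip.
posted: list[str] = []
lock = threading.Lock()


def fake_post(doc: dict) -> None:
    time.sleep(0.01)
    with lock:
        posted.append(doc["_id"])


count = import_concurrently(({"_id": f"story-{i}"} for i in range(50)), fake_post)
```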
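On the bulk-insert question, later CouchDB releases document a `_bulk_docs` endpoint that takes a POST with a body of the form `{"docs": [...]}`. This post doesn't establish whether the 2007 trunk build had it. Treat the following as a sketch against that later API. It only builds the JSON payloads, one per batch. The default batch size of 1000 is an arbitrary assumption.

```python
import json
from itertools import islice
from typing import Iterable, Iterator


def bulk_payloads(docs: Iterable[dict], batch_size: int = 1000) -> Iterator[bytes]:
    """Yield one JSON body of the form {"docs": [...]} per batch of documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(docs)
    while True:
        batch = list(islice(it, batch_size))  # take up to batch_size docs
        if not batch:
            return
        yield json.dumps({"docs": batch}).encode("utf-8")


# 2,500 hypothetical documents become three request bodies: 1000, 1000, 500.
payloads = list(bulk_payloads(({"_id": str(i)} for i in range(2500)), batch_size=1000))
```

Each payload would be POSTed to `/{db}/_bulk_docs`, trading one round-trip per document for one per batch.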
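The message-queue idea can be sketched in-process with Python's `queue.Queue` and a single background writer thread. Callers enqueue a document and return immediately, and the writer posts documents one at a time, in order. A real deployment would use an actual message queue. `post` is a hypothetical callable standing in for the HTTP request.

```python
import queue
import threading
from typing import Callable

STOP = object()  # sentinel that tells the writer thread to exit


def start_writer(post: Callable[[dict], object]) -> tuple[queue.Queue, threading.Thread]:
    """Start a daemon thread that calls `post` on each queued doc, in FIFO order."""
    q: queue.Queue = queue.Queue()

    def drain() -> None:
        while True:
            doc = q.get()
            if doc is STOP:
                break
            post(doc)  # an exception here ends the thread; a real writer would log and retry

    writer = threading.Thread(target=drain, daemon=True)
    writer.start()
    return q, writer


# Usage: enqueueing returns immediately; shut down by sending STOP and joining.
written: list[dict] = []
q, writer = start_writer(written.append)
for i in range(10):
    q.put({"_id": f"story-{i}"})
q.put(STOP)
writer.join(timeout=5)
```

The trade-off is that a caller no longer learns whether its write succeeded. That matters less on the read-heavy sites the post has in mind.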
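The impedance-mismatch point about Django models is easy to see with a plain dataclass standing in for a model. `Story` and `to_document` are hypothetical, not Django or couchdb-python APIs. Adding a `type` field is an assumed convention for telling document kinds apart. The list of tags stays inside the document, where a relational schema would need a separate tags table and a join.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Story:
    """Hypothetical stand-in for a Django model; not Django's API."""
    headline: str
    body: str
    tags: list[str] = field(default_factory=list)


def to_document(obj, doc_id: str) -> dict:
    """Turn a dataclass instance into a JSON-ready dict with a CouchDB-style `_id`."""
    doc = asdict(obj)  # nested lists and dicts are copied as-is
    doc["_id"] = doc_id
    doc["type"] = type(obj).__name__.lower()  # assumed convention for document kinds
    return doc


doc = to_document(Story("Hello", "Body text", ["news", "kansas"]), "story-1")
```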

That’s all for now. I may post more if I find more to say.

[1] This originally read “2.5” because I suck at math. Thanks for catching my mistake, folks.