Jacob Kaplan-Moss

CouchDB first impressions

I’m playing with CouchDB tonight. Some first thoughts, as they occur to me:

  • The build process was very easy. I already had Erlang and all the standard automake/autoconf crap installed, so it was just a matter of installing something called icu and going from there. Something like 10 minutes from svn checkout to relaxing.

  • Anything I can poke at with curl is pretty damn cool.

  • Wow, I can just chuck arbitrary JSON objects up at this thing and it’ll store it. No setup, no schemas, no nothing. This is relaxing…

  • I hadn’t expected CouchDB to look this polished so soon in the game. The web interface is truly awesome, and naturally implemented directly against the regular API.

  • Nice, there’s already a couchdb-python (built with httplib2, natch). The latest release installed with easy_install doesn’t seem to work, but SVN trunk does.

  • Good lord, inserting data is slow slow slow.

    My standard stress-test for databasy things is to chuck in the half-million stories from ljworld.com. This is gonna take a long time:

    finshed #21000 (elapsed=35.921 per-story=0.036)
    finshed #22000 (elapsed=34.100 per-story=0.034)
    finshed #23000 (elapsed=38.658 per-story=0.039)

    Looks like it’ll take something on the order of five hours to import everything. Painful.

  • The good news is that the import speed doesn’t seem to change as I write/remove views, run ad-hoc queries, etc.; it holds steady at about 25 [1] documents/second.

  • I’d probably get better behavior if I made my importer multi-threaded or select-based; there’s a fair amount of iowait even over loopback.

  • I wonder if there’s a bulk insert mode? Can’t seem to find one…

  • If I really can only do 25 documents/second, I don’t think I could use this in actual write-heavy production environments. Most of the sites I work on are fairly read-heavy, so CouchDB might be OK.

    I think in practice I’d pair CouchDB up with a message queue so I could post documents asynchronously.

    Or maybe there’s an asynchronous POST option I’m missing?

  • Views written in Python are dramatically slower than those written in JavaScript. Like an order of magnitude slower. Wonder what’s up with that.

  • I wonder what it would take to make CouchDB into a backend for Django models? Seems there’s a much lower impedance mismatch between a document database and an object model than between a relational database and an object model: a model instance maps much better to a document than to a tuple.

    Joins, of course, are simply not possible… but in the right situations you wouldn’t need ‘em.
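
The multi-threaded importer idea from the list above could be sketched roughly like this. It is only an illustration, not the importer that produced the timings: post_doc is a hypothetical callable standing in for whatever actually sends one document to CouchDB, and the worker count is an arbitrary guess.

```python
from concurrent.futures import ThreadPoolExecutor


def import_concurrently(docs, post_doc, workers=8):
    """Post every document using a pool of threads; return results in input order.

    post_doc is a hypothetical callable (doc -> response) that performs the
    actual HTTP request. It is an assumption for this sketch, not part of any
    CouchDB client library.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() lets the blocking calls overlap while preserving input order,
        # so time spent waiting on one request is used to send others.
        return list(pool.map(post_doc, docs))
```

Whether this helps depends on where the time really goes: if the server itself is the bottleneck rather than the client sitting in iowait, more threads won't buy much.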

That’s all for now. I may post more if I find more to say.

[1] This originally read “2.5” because I suck at math. Thanks for catching my mistake, folks.
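
For anyone who wants to check the math, here is the arithmetic as a quick sketch. The per-story times come from the three log lines above; the total of 500,000 stories is an assumption that takes "half-million" literally.

```python
# Per-story times (seconds) copied from the three log lines in the post.
per_story = [0.036, 0.034, 0.039]

avg = sum(per_story) / len(per_story)  # average seconds per document
docs_per_second = 1 / avg              # throughput implied by that average
total_docs = 500_000                   # assumption: "half-million" taken literally
hours = total_docs * avg / 3600        # projected time for the full import

print(f"{docs_per_second:.1f} docs/sec, {hours:.1f} hours")
```

That works out to a bit under 30 documents/second and roughly five hours, in line with the "about 25" and "five hours" figures above.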