CouchDB first impressions
I’m playing with CouchDB tonight. Some first thoughts, as they occur to me:
The build process was very easy. I already had Erlang and all the standard automake/autoconf crap installed, so it was just a matter of installing something called icu and going from there. Something like 10 minutes from svn checkout to relaxing.
Anything I can poke at with curl is pretty damn cool.
Wow, I can just chuck arbitrary JSON objects up at this thing and it’ll store them. No setup, no schemas, no nothing. This is relaxing…
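For readers who'd rather do the curl-style poking from Python, here is a minimal sketch using only the standard library. It assumes a local server on CouchDB's default port 5984 and a database named `stories` (both assumptions), and it uses `PUT /{db}/{doc_id}` to store a document under a chosen id. The helper `make_put_request` is hypothetical. It only builds the request; the commented lines show how it would be sent.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local CouchDB on its default port.
COUCH = "http://localhost:5984"

def make_put_request(db, doc_id, doc):
    """Build (but don't send) a PUT that stores `doc` as `doc_id` in `db`."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url=f"{COUCH}/{db}/{urllib.parse.quote(doc_id, safe='')}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

# No schema: any JSON-serializable dict will do.
req = make_put_request("stories", "story-1", {"headline": "Hello", "tags": ["news"]})

# Against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

This is roughly equivalent to `curl -X PUT http://localhost:5984/stories/story-1 -d '{"headline": "Hello"}'`.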
I hadn’t expected CouchDB to look this polished so soon in the game. The web interface is truly awesome, and naturally implemented directly against the regular API.
Nice, there’s already a couchdb-python (built with httplib2, natch). The latest release installed with easy_install doesn’t seem to work, but SVN trunk does.
Good lord, inserting data is slow slow slow.
My standard stress-test for databasy things is to chuck in the half-million stories from ljworld.com. This is gonna take a long time:
...
finshed #21000 (elapsed=35.921 per-story=0.036)
finshed #22000 (elapsed=34.100 per-story=0.034)
finshed #23000 (elapsed=38.658 per-story=0.039)
...
Looks like it’ll take something on the order of five hours to import everything. Painful.
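The five-hour figure is plain arithmetic: about 500,000 stories at roughly 0.036 seconds each comes to about 18,000 seconds. The sketch below parses the progress lines above, averages the `per-story` timings, and extrapolates. The helper `estimate_hours` is hypothetical.

```python
import re

# Progress lines as printed by the import script above.
LOG = """\
finshed #21000 (elapsed=35.921 per-story=0.036)
finshed #22000 (elapsed=34.100 per-story=0.034)
finshed #23000 (elapsed=38.658 per-story=0.039)
"""

PER_STORY = re.compile(r"per-story=([\d.]+)")

def estimate_hours(log_text, total_docs):
    """Average per-story timings and extrapolate to total_docs.

    Returns (estimated hours, documents per second).
    """
    timings = [float(t) for t in PER_STORY.findall(log_text)]
    avg = sum(timings) / len(timings)
    return total_docs * avg / 3600.0, 1.0 / avg

hours, rate = estimate_hours(LOG, 500_000)
print(f"~{hours:.1f} hours at ~{rate:.0f} docs/second")
```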
The good news is that the import speed doesn’t seem to change as I write/remove views, run ad-hoc queries, etc.; it holds steady at about 25 [1] documents/second.
I’d probably get better behavior if I made my importer multi-threaded or select-based; there’s a fair amount of iowait even over loopback.
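A multi-threaded importer could look something like the sketch below. It assumes a hypothetical `post_document` callable that stores one document over HTTP. Because most of the time goes to waiting on the socket, a small thread pool lets other requests proceed while one is blocked. Here a fake poster that sleeps stands in for the real HTTP call.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def import_concurrently(docs, post_document, workers=8):
    """Store every doc via post_document using a pool of worker threads.

    post_document is a hypothetical callable (e.g. a wrapper around an
    HTTP PUT) that stores one document and returns its id.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input docs.
        return list(pool.map(post_document, docs))

# Stand-in for the real HTTP call: sleeps to simulate I/O wait.
stored = []
lock = threading.Lock()

def fake_post(doc):
    time.sleep(0.01)  # pretend to wait on the server
    with lock:
        stored.append(doc["_id"])
    return doc["_id"]

docs = [{"_id": f"story-{i}"} for i in range(40)]
ids = import_concurrently(docs, fake_post)
```

A select-based importer would get the same overlap without threads, but the thread pool is the shorter sketch.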
I wonder if there’s a bulk insert mode? Can’t seem to find one…
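Current CouchDB documentation describes a `POST /{db}/_bulk_docs` endpoint that accepts a body of the form `{"docs": [...]}`. Whether the build tried here exposes it is unverified. The sketch below assumes that endpoint and the default local port, and splits a large import into batches of 1,000 documents per request. The helpers `batched` and `bulk_requests` are hypothetical and only build the requests.

```python
import json
import urllib.request

COUCH = "http://localhost:5984"  # assumption: local server, default port

def batched(items, size):
    """Yield successive lists of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def bulk_requests(db, docs, batch_size=1000):
    """Build one POST to /{db}/_bulk_docs per batch of documents."""
    for batch in batched(docs, batch_size):
        yield urllib.request.Request(
            url=f"{COUCH}/{db}/_bulk_docs",
            data=json.dumps({"docs": batch}).encode("utf-8"),
            method="POST",
            headers={"Content-Type": "application/json"},
        )

docs = [{"_id": f"story-{i}", "headline": f"Story {i}"} for i in range(2500)]
reqs = list(bulk_requests("stories", docs))
```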
If I really can only do 25 documents/second, I don’t think I could use this in actual write-heavy production environments. Most of the sites I work on are fairly read-heavy, so CouchDB might be OK.
I think in practice I’d pair CouchDB up with a message queue so I could post documents asynchronously.
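The queue idea can be sketched in-process. In the sketch below, `queue.Queue` stands in for a real message queue, since none is named here, and a background thread does the slow writes. The request-handling side only enqueues and returns immediately. The `post_document` callable is hypothetical, and `None` is used as a shutdown signal.

```python
import queue
import threading

def start_writer(post_document, q):
    """Start a background thread that drains q and stores each document.

    post_document is a hypothetical callable that performs the slow write.
    A None item tells the thread to stop.
    """
    def worker():
        while True:
            doc = q.get()
            try:
                if doc is None:
                    return
                post_document(doc)
            finally:
                q.task_done()  # runs even on the shutdown path

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t

written = []
q = queue.Queue()
writer = start_writer(written.append, q)

# The "web request" side just enqueues and moves on.
for i in range(5):
    q.put({"_id": f"story-{i}"})

q.put(None)    # stop once the backlog is drained
writer.join()
```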
Or maybe there’s an asynchronous POST option I’m missing?
Views written in Python are dramatically slower than those written in JavaScript, something like an order of magnitude slower. Wonder what’s up with that.
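The sketch below shows what a Python view's map step looks like. It is modeled on the `def fun(doc): yield key, value` style used by couchdb-python's view server. A local `run_map` helper (hypothetical) applies the function to every document and sorts the emitted rows by key, the way a view index is ordered. It is simplified: it ignores CouchDB's collation rules and doc-id tie-breaking, and it says nothing about why the Python view server is slower. The JavaScript equivalent would be `function(doc) { if (doc.pub_date) emit(doc.pub_date, doc.headline); }`.

```python
def fun(doc):
    # Map step: emit (key, value) for documents that have a pub_date.
    if "pub_date" in doc:
        yield doc["pub_date"], doc["headline"]

def run_map(map_fn, docs):
    """Apply map_fn to every doc and sort the emitted rows by key."""
    rows = [(key, value) for doc in docs for key, value in map_fn(doc)]
    return sorted(rows, key=lambda row: row[0])

docs = [
    {"_id": "a", "pub_date": "2007-08-02", "headline": "Second"},
    {"_id": "b", "pub_date": "2007-08-01", "headline": "First"},
    {"_id": "c", "headline": "No date"},
]
print(run_map(fun, docs))
# [('2007-08-01', 'First'), ('2007-08-02', 'Second')]
```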
I wonder what it would take to make CouchDB into a backend for Django models? There seems to be a much lower impedance mismatch between a document database and objects: a model instance maps much better to a document than to a tuple.
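To illustrate the instance-to-document mapping, the sketch below uses a hypothetical `Story` dataclass in place of a Django model and flattens it into one JSON-ready document. The list-valued `tags` field stays inside the document instead of needing a separate table and a join. The `_id` scheme and the `type` field are illustrative assumptions, not anything CouchDB or Django prescribes.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class Story:
    # Hypothetical model; the fields are illustrative only.
    slug: str
    headline: str
    body: str
    tags: list = field(default_factory=list)

def to_document(instance, doc_type):
    """Turn a model instance into one self-contained document."""
    doc = asdict(instance)
    doc["_id"] = f"{doc_type}:{instance.slug}"  # assumed id scheme
    doc["type"] = doc_type                      # lets views filter by kind
    return doc

story = Story("couchdb", "CouchDB first impressions",
              "Playing with CouchDB.", ["couchdb", "databases"])
doc = to_document(story, "story")
print(json.dumps(doc, sort_keys=True))
```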
Joins, of course, are simply not possible… but in the right situations you wouldn’t need ’em.
That’s all for now. I may post more if I find more to say.
[1] This originally read “2.5” because I suck at math. Thanks for catching my mistake, folks.