CouchDB first impressions

Jacob Kaplan-Moss

October 18, 2007

I’m playing with CouchDB tonight. Some first thoughts, as they occur to me:

  • The build process was very easy. I already had Erlang and all the standard automake/autoconf crap installed, so it was just a matter of installing something called icu and going from there. Something like 10 minutes from svn checkout to relaxing.

  • Anything I can poke at with curl is pretty damn cool.

  • Wow, I can just chuck arbitrary JSON objects up at this thing and it’ll store it. No setup, no schemas, no nothing. This is relaxing…

  • I hadn’t expected CouchDB to look this polished so soon in the game. The web interface is truly awesome, and naturally implemented directly against the regular API.

  • Nice, there’s already a couchdb-python (built with httplib2, natch). The latest release installed with easy_install doesn’t seem to work, but SVN trunk does.

  • Good lord, inserting data is slow slow slow.

    My standard stress-test for databasy things is to chuck in the half-million stories from ljworld.com. This is gonna take a long time:

    ...
    finshed #21000 (elapsed=35.921 per-story=0.036)
    finshed #22000 (elapsed=34.100 per-story=0.034)
    finshed #23000 (elapsed=38.658 per-story=0.039)
    ...
    

    Looks like it’ll take something on the order of five hours to import everything. Painful.

  • The good news is that the import speed doesn’t seem to be changing as I write/remove views, run ad-hoc queries, etc., and the import speed remains at about 25 documents/second [1].

  • I’d probably get better behavior if I made my importer multi-threaded or select-based; there’s a fair amount of iowait even over loopback.

  • I wonder if there’s a bulk insert mode? Can’t seem to find one…

  • If I really only can do 25 documents/second, I don’t think I could use this in actual write-heavy production environments. Most of the sites I work on are fairly read-heavy, so CouchDB might be OK.

    I think in practice I’d pair CouchDB up with a message queue so I could post documents asynchronously.

    Or maybe there’s an asynchronous POST option I’m missing?

  • Views written in Python are amazingly slower than those written in JavaScript. Like an order of magnitude slower. Wonder what’s up with that.

  • I wonder what it would take to make CouchDB into a backend for Django models? Seems there’s a much lower impedance mismatch between a document database and an object one — a model instance maps much better to a document than to a tuple.

    Joins, of course, are simply not possible… but in the right situations you wouldn’t need ‘em.
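
The import figures above can be sanity-checked from the timings in the log. This is only a rough sketch: it assumes each `elapsed` value covers a batch of 1,000 stories (which matches the printed per-story numbers) and that the archive holds about 500,000 stories ("half-million"). The variable names are made up for illustration.

```python
# Elapsed seconds per batch, copied from the three log lines shown above.
elapsed_per_batch = [35.921, 34.100, 38.658]

BATCH_SIZE = 1000        # assumption: one log line per 1,000 stories
TOTAL_STORIES = 500_000  # assumption: "half-million stories"

# Average time to insert a single story across the sampled batches.
seconds_per_story = sum(elapsed_per_batch) / (len(elapsed_per_batch) * BATCH_SIZE)

# Throughput is the inverse of the per-story time.
stories_per_second = 1 / seconds_per_story

# Projected wall-clock time for the whole import, in hours.
total_hours = TOTAL_STORIES * seconds_per_story / 3600

print(f"{seconds_per_story:.3f} s/story")
print(f"{stories_per_second:.1f} stories/s")
print(f"{total_hours:.1f} hours for the full import")
```

With these three samples it works out to roughly 0.036 seconds per story, a bit under 30 stories per second, and about five hours for the full import, which lines up with the estimate in the post.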

That’s all for now. I may post more if I find more to say.

[1] This originally read “2.5” because I suck at math. Thanks for catching my mistake, folks.

Comments:

Christopher Lenz:

Since the first release of couchdb-python, CouchDB has been a somewhat moving target, so I'm waiting for the 0.7 release of CouchDB itself until pushing out a new version :P

There's a batch saving feature, but it was undocumented. I've just added a basic description to the Wiki: http://www.couchdbwiki.com/...

About the speed of Python views, I have no idea what that might be about. Although I did implement the Python view server, I don't actually use Python views myself, as I'm perfectly happy with using Javascript there.

Christopher Lenz:

Okay, two more things:

About insert performance, how large/complex are the documents you're inserting?

About the Django/ORM idea, check out the couchdb.schema module, which does some experimental but nifty mapping between raw JSON and Python objects.

roberthahn:

Question about the math. I would guess the per-story times are in seconds? If so, wouldn't that be 25 docs per second?

Jacob Kaplan-Moss:

@roberthahn: You're right, of course; my math stinks.

@cmlenz: Sweet, thanks for the documentation of the multi-doc API! I'll benchmark and compare the speed later (think I've got enough for a "CouchDB second impressions" article tonight).

Dan Sickles:

CouchDB would be a good fit for Django, which emerged from the newspaper publishing business, but I don't know that it would work best just backing the existing style of models. I think the model could change to be more direct. I've always said that you don't need an ORM if there's no R.

sandro:

I really like the stream of consciousness style of this post, sounds familiar to anyone who's spent a night hacking on something new...nice.

mike:

what's an order of magnitude among friends??

thanks for the write-up, Jacob.

Randy:

@mike what's an order of magnitude among friends??

Nice. Best comment I've read in a long time.

Svovl:

Perhaps a stupid question... I've been toying around with python-couchdb, but haven't been able to figure out an easy way to convert the results from the database in the Row data structure back to JSON. Anyone have an idea for that?

Ikke:

I started some work on combining Django and CouchDB, see http://eikke.com/couchdb-wi... and http://eikke.com/filesystem... (and most likely some posts later on too).

Matt Good:

I'm not sure what state CouchDB was in at the time of this post, but there is a bulk-insert/update feature at this time:
http://www.couchdbwiki.com/...

If you're using the Python lib it's db.update([doc1, doc2, ...])

Thomas:

There's another "2.5" that you missed farther on in the article...

Nick Fitzgerald:

FYI, as of Nov. 4th, 2009 it seems that python views are almost twice as fast as JS views now.

http://www.mikealrogers.com...

http://github.com/mikeal/co...

Mikeal also implemented WSGI on top of CouchDB's external process handler so you can run Django directly on Couch!

http://www.mikealrogers.com...
