I’m playing with CouchDB tonight. Some first thoughts, as they occur to me:
The build process was very easy. I already had Erlang and all the standard automake/autoconf crap installed, so it was just a matter of installing something called icu and going from there. Something like 10 minutes from svn checkout to relaxing.
Anything I can poke at with curl is pretty damn cool.
Wow, I can just chuck arbitrary JSON objects up at this thing and it’ll store them. No setup, no schemas, no nothing. This is relaxing…
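Here is roughly what "chucking JSON at it" looks like using nothing but Python's standard library. This is a minimal sketch under assumptions: the server address, the `stories` database, the document ID, and the `build_put_request` helper are all hypothetical, and the request shape (a PUT of a JSON body to `/database/doc_id`) follows the current CouchDB HTTP API, which may not match the 2007 pre-release exactly.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local server on the default port used by current CouchDB releases.
BASE_URL = "http://localhost:5984"


def build_put_request(db: str, doc_id: str, doc: dict) -> urllib.request.Request:
    """Build, but do not send, a PUT that stores `doc` as `doc_id` in database `db`."""
    url = f"{BASE_URL}/{urllib.parse.quote(db, safe='')}/{urllib.parse.quote(doc_id, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Any JSON-serializable dict will do; there is no schema to declare first.
story = {"headline": "Example headline", "tags": ["news", "local"], "word_count": 412}
req = build_put_request("stories", "story-1", story)
# Against a live server, urllib.request.urlopen(req) would actually send it.
```

Sending the request is the moral equivalent of `curl -X PUT` with the same URL and body, which is why poking at it with curl feels so natural.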
I hadn’t expected CouchDB to look this polished so soon in the game. The web interface is truly awesome, and naturally implemented directly against the regular API.
Nice, there’s already a couchdb-python (built with httplib2, natch). The latest release installed with easy_install doesn’t seem to work, but SVN trunk does.
Good lord, inserting data is slow slow slow.
My standard stress-test for databasy things is to chuck in the half-million stories from ljworld.com. This is gonna take a long time:
...
finshed #21000 (elapsed=35.921 per-story=0.036)
finshed #22000 (elapsed=34.100 per-story=0.034)
finshed #23000 (elapsed=38.658 per-story=0.039)
...
Looks like it’ll take something on the order of five hours to import everything. Painful.
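The estimate is simple arithmetic on the logged per-story times (seconds per document): average them, invert to get throughput, and multiply by the number of stories. A small sketch of that calculation, using the three log lines above and the half-million-story total; the `import_estimate` helper is just for illustration.

```python
def import_estimate(per_story_seconds: list[float], total_docs: int) -> tuple[float, float]:
    """Return (documents per second, total hours) from sampled per-document times."""
    avg = sum(per_story_seconds) / len(per_story_seconds)
    docs_per_second = 1 / avg
    hours = total_docs * avg / 3600
    return docs_per_second, hours


# Per-story times taken from the log output above.
rate, hours = import_estimate([0.036, 0.034, 0.039], 500_000)
print(f"{rate:.1f} docs/s, {hours:.1f} hours")  # prints "27.5 docs/s, 5.0 hours"
```

That works out to a few dozen documents per second, not a few per second, and roughly five hours for the whole import.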
The good news is that the import speed doesn’t seem to change as I write/remove views, run ad-hoc queries, etc.; it holds steady at about 25 [1] documents/second.
I’d probably get better behavior if I made my importer multi-threaded or select-based; there’s a fair amount of iowait even over loopback.
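A multi-threaded importer could look something like the sketch below. It assumes a hypothetical `save_doc` callable that performs one HTTP write per document; the point is that while one thread sits in iowait, the others keep requests in flight. The worker count of 8 is an arbitrary guess, not a measured optimum.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def import_concurrently(
    docs: Iterable[dict],
    save_doc: Callable[[dict], object],
    workers: int = 8,
) -> int:
    """Save every document using a pool of threads; return how many were saved.

    `save_doc` is a hypothetical function that writes a single document
    (for example, one HTTP PUT or POST to the database).
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() submits all documents up front; consuming the results
        # re-raises the first exception any worker hit.
        results = list(pool.map(save_doc, docs))
    return len(results)
```

Because `map()` submits everything at once, half a million stories would mean half a million pending futures in memory; for an import that size, feeding the pool in chunks would be the safer design.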
I wonder if there’s a bulk insert mode? Can’t seem to find one…
If I really can only do 25 [1] documents/second, I don’t think I could use this in a write-heavy production environment. Most of the sites I work on are fairly read-heavy, though, so CouchDB might be OK.
I think in practice I’d pair CouchDB up with a message queue so I could post documents asynchronously.
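The queue idea, reduced to its core: the web request hands the document to a queue and returns at once, while a separate writer drains the queue into the database. In this sketch an in-process `queue.Queue` stands in for a real message queue, and `save_doc` is a hypothetical callable that does the actual write; there is no retry or error handling.

```python
import queue
import threading
from typing import Callable

_STOP = object()  # sentinel telling the writer thread to exit


class AsyncWriter:
    """Accept documents immediately; persist them on a background thread."""

    def __init__(self, save_doc: Callable[[dict], object]) -> None:
        self._save_doc = save_doc
        self._queue: "queue.Queue[object]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def post(self, doc: dict) -> None:
        # Returns right away; the caller never waits on the database.
        self._queue.put(doc)

    def close(self) -> None:
        # Let the writer finish everything already queued, then stop it.
        self._queue.put(_STOP)
        self._thread.join()

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is _STOP:
                break
            self._save_doc(item)
```

With a single writer thread, documents are saved in the order they were posted; a real broker would add durability, so queued documents survive a crash of the web process.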
Or maybe there’s an asynchronous POST option I’m missing?
Views written in Python are dramatically slower than those written in JavaScript. Like an order of magnitude slower. Wonder what’s up with that.
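For reference, a Python map function is a plain function that yields `(key, value)` pairs where the JavaScript version would call `emit(key, value)`. The sketch below assumes the convention of current couchdb-python's view server (a single function, by convention named `fun`); the 2007 view server may have differed. The `run_map` harness is hypothetical and only mimics a view locally; its Python sort is an approximation of CouchDB's key collation.

```python
def fun(doc):
    # Map function: one row per tag, keyed by tag, with the headline as the value.
    if doc.get("type") == "story":
        for tag in doc.get("tags", []):
            yield tag, doc.get("headline")


def run_map(map_fun, docs):
    """Hypothetical local harness: apply a map function and sort rows by key."""
    rows = []
    for doc in docs:
        for key, value in map_fun(doc):
            rows.append({"id": doc["_id"], "key": key, "value": value})
    # CouchDB returns view rows ordered by key (then document ID).
    rows.sort(key=lambda r: (r["key"], r["id"]))
    return rows
```

Running the same logic locally like this is also a cheap way to separate the cost of the map function itself from the cost of shipping documents to and from the view server.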
I wonder what it would take to make CouchDB into a backend for Django models? Seems there’s a much lower impedance mismatch between a document database and an object one — a model instance maps much better to a document than to a tuple.
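To make the "instance maps to a document" point concrete, here is a sketch with a plain dataclass. The `Story` model and its `to_doc`/`from_doc` methods are hypothetical; this is not Django's model API or the `couchdb.schema` API mentioned in the comments below. Note that the list of tags simply stays inside the document, where a relational schema would want a separate table.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import List, Optional


@dataclass
class Story:
    """A hypothetical model; each instance maps to one JSON document."""

    headline: str
    body: str = ""
    tags: List[str] = field(default_factory=list)
    id: Optional[str] = None  # stored as the document's "_id"

    def to_doc(self) -> dict:
        doc = asdict(self)
        doc_id = doc.pop("id")
        if doc_id is not None:
            doc["_id"] = doc_id
        doc["type"] = "story"  # lets views tell different models apart
        return doc

    @classmethod
    def from_doc(cls, doc: dict) -> "Story":
        # Ignore bookkeeping keys such as "_rev" and "type".
        known = {f.name for f in fields(cls)} - {"id"}
        kwargs = {k: v for k, v in doc.items() if k in known}
        return cls(id=doc.get("_id"), **kwargs)
```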
Joins, of course, are simply not possible… but in the right situations you wouldn’t need ‘em.
That’s all for now. I may post more if I find more to say.
[1] This originally read “2.5” because I suck at math. Thanks for catching my mistake, folks.
Christopher Lenz
Oct. 19th, 2007
2:51 a.m.
Since the first release of couchdb-python, CouchDB has been a somewhat moving target, so I'm waiting for the 0.7 release of CouchDB itself before pushing out a new version :P
There's a batch saving feature, but it was undocumented. I've just added a basic description to the Wiki: http://www.couchdbwiki.com/index.php?title=HTTP_Doc_API#Modify_Multiple_Documents_with_a_Single_Requ...
About the speed of Python views, I have no idea what that might be about. Although I did implement the Python view server, I don't actually use Python views myself, as I'm perfectly happy with using Javascript there.
Christopher Lenz
Oct. 19th, 2007
3:52 a.m.
Okay, two more things:
About insert performance, how large/complex are the documents you're inserting?
About the Django/ORM idea, check out the couchdb.schema module, which does some experimental but nifty mapping between raw JSON and Python objects.
roberthahn
Oct. 19th, 2007
8:39 a.m.
Question about the math: I would guess the per-story times are in seconds? If so, wouldn't that be 25 docs per second?
Jacob
Oct. 19th, 2007
9 a.m.
@roberthahn: You're right, of course; my math stinks.
@cmlenz: Sweet, thanks for the documentation of the multi-doc API! I'll benchmark and compare the speed later (think I've got enough for a "CouchDB second impressions" article tonight).
Dan Sickles
Oct. 19th, 2007
6:22 p.m.
CouchDB would be a good fit for Django, which emerged from the newspaper publishing business, but I don't know that it would work best just backing the existing style of models. I think the model could change to be more direct. I've always said that you don't need an ORM if there's no R.
sandro
Oct. 21st, 2007
2:14 p.m.
I really like the stream of consciousness style of this post, sounds familiar to anyone who's spent a night hacking on something new...nice.
mike
Oct. 22nd, 2007
1:07 p.m.
what's an order of magnitude among friends??
thanks for the write-up, Jacob.
Randy
Oct. 22nd, 2007
5:09 p.m.
@mike what's an order of magnitude among friends??
Nice. Best comment I've read in a long time.
Svovl
Dec. 19th, 2007
9:02 a.m.
Perhaps a stupid question... I've been toying around with python-couchdb, but haven't been able to figure out an easy way to convert the results from the database in the Row data structure back to JSON. Does anyone have an idea for that?
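One possible approach, sketched under an assumption: view rows are either dict-like (in later couchdb-python releases `Row` subclasses `dict`) or expose `id`, `key`, and `value` attributes. The `row_to_dict` and `rows_to_json` helpers are hypothetical, not part of the library.

```python
import json
from typing import Any, Iterable


def row_to_dict(row: Any) -> dict:
    """Convert one view-result row into a plain dict."""
    if isinstance(row, dict):
        return dict(row)
    # Fallback for row objects that only expose attributes.
    return {"id": row.id, "key": row.key, "value": row.value}


def rows_to_json(rows: Iterable[Any]) -> str:
    """Serialize a sequence of view rows as a JSON array."""
    return json.dumps([row_to_dict(r) for r in rows])
```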
Ikke
Dec. 29th, 2007
7:52 p.m.
I started some work on combining Django and CouchDB, see http://eikke.com/couchdb-with-python/ and http://eikke.com/filesystem-issues-and-django-couchdb-work/ (and most likely some posts later on too).
Matt Good
Jan. 16th, 2008
11:25 p.m.
I'm not sure what state CouchDB was in at the time of this post, but there is a bulk-insert/update feature at this time:
http://www.couchdbwiki.com/index.php?title=HTTP_Doc_API#Modify_Multiple_Documents_with_a_Single_Requ...
If you're using the Python lib, it's db.update([doc1, doc2, ...])
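Applied to the import above, that means sending documents in chunks rather than one request per story. A sketch, assuming `db` is a couchdb-python `Database` (or anything with an `update(list_of_docs)` method); the `batches` and `bulk_import` helpers and the batch size of 1000 are hypothetical and untuned, and the return value of `update()` is ignored because its shape has varied between versions.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_import(db, docs: Iterable[dict], size: int = 1000) -> int:
    """Write documents with db.update(...) in chunks; return the number of requests made."""
    requests = 0
    for batch in batches(docs, size):
        db.update(batch)  # one HTTP round trip per batch instead of per document
        requests += 1
    return requests
```

For half a million stories at 1000 per batch, that is 500 requests instead of 500,000.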
Thomas
April 10th, 2008
7:34 a.m.
There's another "2.5" that you missed farther on in the article...