Jacob Kaplan-Moss

Merquery

I wrote this post in 2006, more than 18 years ago. It may be very out of date, partially or totally incorrect. I may even no longer agree with this, or might approach things differently if I wrote this post today. I rarely edit posts after writing them, but if I have, there'll be a note at the bottom about what I changed and why. If something in this post is actively harmful or dangerous, please get in touch and I'll fix it.

Brian Beck just announced that he’s beginning work on Merquery, a full-text indexer and search engine specifically designed for developers using RAD frameworks like Django.

I’m so excited about this I can barely contain myself. Right now Ellington ships with a search engine built on top of Swish-E. It’s pretty cool, and I’ve been debating cleaning it up and rolling it into django.contrib. However, it has a number of major flaws that limit its usefulness:

  • Swish-E’s Python bindings don’t have any way to return results ordered by date (vital when you’re searching for news stories), so we use a patched version of the bindings. This makes installation super annoying.
  • Swish-E doesn’t have any sort of incremental indexing, so we have to re-index the entire contents of the database every time. We’ve got nearly half a million stories in the database, so indexing takes over two hours, meaning we can really only run it once a day. Thus, breaking news stories aren’t in the index. Argh.
  • The search query syntax is braindead and buggy. By braindead I mean there’s only “A or B” type searches – no phrase searches or cool operators. As for buggy… let’s just say there are certain groups of characters you can search for which will throw the search engine into an infinite loop that consumes nearly 100% CPU. We filter them out at the view level, but sheesh.
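
To make "incremental indexing" and "results ordered by date" concrete, here's a toy sketch in pure Python. It is not Merquery's design or Swish-E's API; the `TinyIndex` class and its method names are made up for illustration, and it assumes whitespace tokenization and an index that fits in memory. The point is only that adding one story touches that story's terms and nothing else.

```python
from collections import defaultdict
from datetime import date


class TinyIndex:
    """Toy in-memory inverted index (hypothetical; not Merquery or Swish-E)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.dates = {}                   # document id -> publication date
        self.terms = {}                   # document id -> terms indexed for it

    def add(self, doc_id, text, pub_date):
        """Index (or re-index) one document without rebuilding anything else."""
        self.remove(doc_id)
        terms = set(text.lower().split())
        for term in terms:
            self.postings[term].add(doc_id)
        self.terms[doc_id] = terms
        self.dates[doc_id] = pub_date

    def remove(self, doc_id):
        """Drop one document from the index; a no-op if it was never added."""
        for term in self.terms.pop(doc_id, ()):
            self.postings[term].discard(doc_id)
        self.dates.pop(doc_id, None)

    def search(self, query):
        """Return ids of documents containing every query term, newest first."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits, key=lambda d: self.dates[d], reverse=True)


index = TinyIndex()
index.add(1, "City council approves budget", date(2006, 3, 1))
index.add(2, "Council delays budget vote", date(2006, 3, 5))
# A breaking story arrives later and is searchable immediately.
index.add(3, "Breaking: council budget vetoed", date(2006, 3, 6))
print(index.search("council budget"))  # newest first: [3, 2, 1]
```

A real engine would also need persistent storage, proper tokenization, and phrase queries; none of that is attempted here.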

So, yeah, I’m super-excited about a modern, pure-Python search engine and indexer.

Brian, if you’re “listening”, I’d be thrilled to help you out with this project in any way I can.