I’m so excited about this I can barely contain myself. Right now Ellington ships with a search engine built on top of Swish-E. It’s pretty cool, and I’ve been debating cleaning it up and rolling it into django.contrib. However, it has a number of major flaws that limit its usefulness:
- Swish-E’s Python bindings don’t have any way to return results ordered by date (vital when you’re searching for news stories), so we use a patched version of the bindings. This makes installation super annoying.
- Swish-E doesn’t have any sort of incremental indexing, so we have to re-index the entire contents of the database every time. We’ve got nearly half a million stories in the database, so indexing takes over two hours, meaning we can really only do it once a day. Thus, breaking news stories aren’t in the index. Argh.
- The search query syntax is braindead and buggy. By braindead I mean there’s only “A or B” type searches: no phrase searches or cool operators. As for buggy… let’s just say there are certain groups of characters you can search for which will throw the search engine into an infinite loop that consumes nearly 100% CPU. We filter them out at the view level, but sheesh.
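To make the first two complaints concrete, here’s a toy sketch of what I mean by incremental indexing and date-ordered results. This is not how Swish-E or Ellington actually work; the `TinyIndex` class, its method names, and the sample stories are all made up for illustration, and a real indexer would need persistence, stemming, and a lot more.

```python
from collections import defaultdict
from datetime import date


class TinyIndex:
    """Toy in-memory inverted index (hypothetical, for illustration only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of story ids
        self.dates = {}                   # story id -> publication date

    def add(self, story_id, text, pub_date):
        # Incremental: indexing one new story only touches its own words,
        # so nothing already in the index has to be rebuilt.
        self.dates[story_id] = pub_date
        for word in text.lower().split():
            self.postings[word].add(story_id)

    def search(self, query):
        # Require every query word to match, then sort newest first.
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits, key=lambda sid: self.dates[sid], reverse=True)


# Made-up sample stories.
index = TinyIndex()
index.add(1, "City council approves budget", date(2006, 1, 3))
index.add(2, "Budget vote delayed", date(2006, 1, 5))
index.add(3, "Breaking: council budget vetoed", date(2006, 1, 6))

print(index.search("budget"))          # [3, 2, 1]
print(index.search("council budget"))  # [3, 1]
```

The point is just that a breaking story becomes searchable the moment `add` is called, and results come back newest first without patching any bindings.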
So, yeah, I’m super-excited about a modern, pure-Python search engine and indexer.
Brian, if you’re “listening”, I’d be thrilled to help you out with this project in any way I can.