An in-progress custom site search project

I’m writing about a project that’s not finished, might never be and is probably of little interest to most people. This post is basically journaling a milestone, but you can see the project in action right now. I’ll show you in a minute.

I’ve been pondering for a while how to best surface my older content. Most of what I write is “evergreen,” with little relation to anything else happening at the time and relevant for long periods. Blogs aren’t built that way, though, and anyone who’s ever tried digging through archives knows it’s a pain. So we rely on Google (and NerdQuery), which is pretty good, but I really wanted to come up with a way to help people dig into my own content once they got here.

I’ve been through a few permutations of a tag-based site search, but nothing that really seemed to add much value. My latest experiment seems to be my best attempt so far, though it still has a way to go.

First, go check out the latest incarnation in the Archives section. Type a topic (markdown, applescript, service tools, etc.) into the search box and get instant results on the page. The results are based on a TextMate-style string match of the title, tags and a distillation of keywords in the content using some homebrew semantic analysis.
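For anyone unfamiliar with TextMate-style matching, here is a minimal sketch of the general idea in Python. It assumes "TextMate-style" means the typed characters only need to appear in the target in the same order, not next to each other. The function name is made up for illustration, and this is not the code running on the Archives page.

```python
def fuzzy_match(query: str, text: str) -> bool:
    """Return True if every character of query appears in text, in order.

    Matching is case-insensitive and ignores whitespace in the query.
    The characters do not need to be adjacent in text.
    """
    remaining = iter(text.lower())
    # "ch in remaining" advances the iterator until it finds ch, so each
    # later character has to be found after the previous match.
    return all(ch in remaining for ch in query.lower() if not ch.isspace())


print(fuzzy_match("mkdn", "Markdown"))  # characters appear in order
print(fuzzy_match("nwd", "Markdown"))   # right letters, wrong order
```

With this kind of matching, "mkdn" finds "Markdown" but "nwd" does not, because the letters are present in a different order.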

The “database” is a JSON file I generate with a few custom Jekyll filters and a template when my site builds. The search and sorting are all client-side right now, meaning they happen in your browser instead of on the server. This is not a good idea. I’m reworking the same concept in a CGI, which will allow more advanced analysis and sorting without choking the browser.
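To make the shape of that concrete, here is a rough sketch of searching and ranking a small JSON index, written in Python as a server-side script might be. Everything specific in it is an assumption for illustration: the field names (title, url, tags, keywords), the sample entries, the weights and the function names are all hypothetical, since the real file and scoring aren't shown here.

```python
import json

# Hypothetical index data; the real file's fields and contents may differ.
INDEX_JSON = """
[
  {"title": "Markdown Service Tools", "url": "/a/",
   "tags": ["markdown", "services"], "keywords": ["textmate", "automation"]},
  {"title": "AppleScript for Finder", "url": "/b/",
   "tags": ["applescript"], "keywords": ["markdown", "finder"]},
  {"title": "Keyboard Maestro Tips", "url": "/c/",
   "tags": ["automation"], "keywords": ["macros"]}
]
"""

# Assumed weights: a title hit counts more than a tag hit, which counts
# more than a hit on a distilled keyword.
WEIGHTS = {"title": 3, "tags": 2, "keywords": 1}


def in_order(query, text):
    """True if the characters of query appear in text in the same order."""
    remaining = iter(text.lower())
    return all(ch in remaining for ch in query.lower() if not ch.isspace())


def score(entry, query):
    """Add up the weight of every field group the query matches."""
    total = 0
    if in_order(query, entry["title"]):
        total += WEIGHTS["title"]
    if any(in_order(query, tag) for tag in entry["tags"]):
        total += WEIGHTS["tags"]
    if any(in_order(query, word) for word in entry["keywords"]):
        total += WEIGHTS["keywords"]
    return total


def search(index, query):
    """Return titles of matching entries, best score first."""
    scored = [(score(entry, query), entry) for entry in index]
    hits = [(s, entry) for s, entry in scored if s > 0]
    # sort() is stable, so entries with equal scores keep their index order.
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [entry["title"] for _, entry in hits]


index = json.loads(INDEX_JSON)
print(search(index, "markdown"))
print(search(index, "auto"))
```

Scoring by field is only one way to sort results. The point is that once the index is plain JSON, the same data can be ranked in the browser or behind a CGI without changing how it's generated.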

In essence, I’m just building my own API that I can pull from locally, adding the information I consider relevant and presenting the results however I please. The current implementation creates a div on the Archives page, but the script could be used on any page. Any time you typed in the search field, the page would just turn into search results, or a popup would appear, filtering as you type.

Searchpath uses a similar implementation. I really should just give up and use it instead, but I’m stubbornly working on this as a pet project. I’m hoping that once I have the mechanics worked out, doing some more advanced fuzzy matching and topic relationships will be an easy step. I’ll also be adding autocomplete, but I need to think about the most useful implementation for that. I’d love an autocomplete that — beyond just completing known words — would offer semantically related suggestions from the site’s content. So much to do.

Since you’re still here, I need a name for this feature that will instantly indicate what it is. “Archives” is definitely out; nobody reads those. “Search” isn’t quite accurate; that’s what NerdQuery does for me from the sidebar. It’s more of a “sort” than anything. A “sifter” or “sieve,” maybe. Any ideas?
