Auto-Tagging Jekyll posts with Zemanta


Mostly to associate related posts and build my custom search engine, but also for SEO, I’ve been adding semantic keywords to my Jekyll posts. The result is similar to my old AutoTag bundle for the TextMate blogging bundle. In addition to my curated tags, each post gets a keyword block containing top-level topics. These can be used in Open Graph tags and the keyword meta tag, and for search and related-post association during site generation.

I keep my keyword YAML separate from “tags” because I use it in different ways under different circumstances. In your templates you can easily choose to combine them or use them separately, so there’s no harm in having the extra header.
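If you do want to combine the two lists, a small helper is enough. The following is a minimal sketch, not something from my own setup: the name combined_terms is hypothetical, and it assumes both headers hold plain arrays of strings.

```ruby
# Merge curated tags and generated keywords into one list, dropping
# case-insensitive duplicates while keeping the first spelling seen.
# Array() lets either argument be nil (a post without that header).
def combined_terms(tags, keywords)
  (Array(tags) + Array(keywords)).uniq { |term| term.to_s.downcase }
end
```

Dropped into a Jekyll::Filters module, a method like this could be called from a template in the same way as any other filter.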

This post details the process of adding keywords to new posts. I also used the same technique to back-catalog all of my previous posts.

I use a service called Zemanta to analyze my content and determine the appropriate tags. It’s very good, but sometimes still requires a bit of manual editing after I run it. It’s still faster than doing it by hand.

To get started you’ll need an API key. Don’t worry, for your purposes this is entirely free. Create an account at Zemanta, then register an application to get the API key.

Next, install the “zemanta” gem (gem install zemanta) and require it at the top of your Rakefile, after the hashbang:

require 'rubygems'
require 'zemanta'

Now you can pass your post content to Zemanta and get back an easy-to-parse array. I run this as part of my “publish” task, which moves a post from source/_draft into source/_posts and adds this kind of metadata to the YAML. The script below illustrates this step: it extracts the YAML headers from the post, adds the keywords and sticks the headers back in.

Insert your Zemanta API key at line 7 where the object is created.

To test, you can point Rake at a post and add keywords by running rake add_keywords[path_to_post].
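The YAML handling described above can be sketched roughly as follows. This is a minimal illustration rather than the original script: the helper names split_front_matter and add_keywords_to_post are made up for the example, and the Zemanta request is replaced by a hypothetical callable (fetch_keywords) so that no particular gem API is assumed. It also assumes the post starts with a front-matter block delimited by "---" lines.

```ruby
require 'yaml'
require 'date'

# Split a post into its YAML front matter (as a Hash) and its body.
# Returns an empty Hash and the untouched text if no front matter is found.
def split_front_matter(text)
  match = text.match(/\A---\s*\n(.*?)\n---\s*\n(.*)\z/m)
  return [{}, text] unless match
  # Date and Time are permitted because Jekyll posts often carry a date header.
  headers = YAML.safe_load(match[1], permitted_classes: [Date, Time]) || {}
  [headers, match[2]]
end

# Add a "keywords" list to the headers and return the rebuilt post text.
# fetch_keywords is a hypothetical stand-in for the Zemanta call: any
# callable that takes the post body and returns an array of strings.
def add_keywords_to_post(text, fetch_keywords)
  headers, body = split_front_matter(text)
  headers['keywords'] = fetch_keywords.call(body).uniq
  # Hash#to_yaml already emits the opening "---" line.
  "#{headers.to_yaml}---\n#{body}"
end
```

Inside a Rake task you would read the post with File.read, pass in a lambda that wraps your own Zemanta request, and write the result back with File.write. Note that round-tripping through a Hash can change how some values (timestamps in particular) are formatted in the header.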

Now you can use the keywords payload however you like. I use them, for example, in my keyword meta tag. In head.html I have a line:

{% if page.keywords %}<meta name="keywords" content="{{ page.keywords | keyword_string }}">{% endif %}

If the page has keywords, they pass through this filter from my plugins folder:

module Jekyll
  module Filters
    # Join the keyword array into a single string for the meta tag.
    def keyword_string(keywords)
      keywords.join(" ")
    end
  end
end

I also include them in the Open Graph tags for a post, also in head.html:

{% if page.keywords %}{{ page.keywords | og_tags }}{% endif %}

which calls:

module Jekyll
  module Filters
    # Emit one article:tag meta element per keyword.
    def og_tags(tags)
      tags.map {|tag| %Q{<meta property="article:tag" content="#{tag}">} }.join("\n")
    end
  end
end
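Generated keywords can contain characters such as ampersands or quotes, which would break the attribute if written out raw. One way to guard against that is to escape each value first. The following is a standalone sketch written outside of Jekyll so it can be tried in irb; the name og_tag_lines is hypothetical, and the escaping is an addition for illustration rather than part of the filter shown above.

```ruby
require 'erb'

# Build one article:tag meta element per keyword, HTML-escaping each
# value so characters like & and " are safe inside the attribute.
# Array() lets the helper accept nil for posts without keywords.
def og_tag_lines(tags)
  Array(tags).map { |tag|
    %Q{<meta property="article:tag" content="#{ERB::Util.html_escape(tag.to_s)}">}
  }.join("\n")
end
```

Calling og_tag_lines(["Jekyll", "Q&A"]) yields two meta elements, the second with its content written as Q&amp;amp;A.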

I’ll be covering my Open Graph system for Jekyll soon.

Lastly, I include them in the JSON file I use for my site search (still in progress).
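As a rough idea of what a search entry with keywords might look like, here is a minimal sketch. The field names (title, url, keywords) and the helper names are assumptions for illustration only, not the actual format of my search file.

```ruby
require 'json'

# Build one search-index entry for a post. The field names here are
# hypothetical; use whatever your search script expects.
def search_entry(title, url, keywords)
  { 'title' => title, 'url' => url, 'keywords' => Array(keywords) }
end

# Serialize a list of entries into the JSON text a search script could load.
def search_index_json(entries)
  JSON.pretty_generate(entries)
end
```

Keeping keywords as an array in the index leaves it up to the search code whether to match them exactly or weight them differently from the post title.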

Hopefully some Jekyll users will find this useful. Note that the tags returned by Zemanta are generally about 90% correct, with a couple of superfluous tags that won’t hurt but could be removed.