Mostly to associate related posts and build my custom search engine, but also for SEO, I’ve been adding semantic keywords to my Jekyll posts. The result is similar to my old AutoTag bundle for the TextMate blogging bundle. In addition to my curated tags, each post gets a keyword block containing top-level topics. Those keywords can be used in Open Graph tags and keyword meta, and for search and related-post association during site generation.

I keep my keyword YAML separate from “tags” because I use it in different ways under different circumstances. In your templates you can easily choose to combine them or use them separately, so there’s no harm in having the extra header.
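
If you do want a combined list, a minimal sketch might look like the following. The helper name combined_terms and the sample front matter are hypothetical, not something from my own plugins; it simply merges both arrays, normalizes case and drops duplicates.

```ruby
# Hypothetical helper: merge curated tags and generated keywords into one
# de-duplicated list, ignoring case and tolerating a missing (nil) header.
def combined_terms(tags, keywords)
  (Array(tags) + Array(keywords))
    .map { |term| term.to_s.strip.downcase }
    .reject(&:empty?)
    .uniq
end

# Invented sample data standing in for a post's YAML headers.
front_matter = {
  'tags'     => ['Jekyll', 'blogging'],
  'keywords' => ['jekyll', 'search engine optimization']
}

puts combined_terms(front_matter['tags'], front_matter['keywords']).inspect
# prints ["jekyll", "blogging", "search engine optimization"]
```

The same method could be placed inside a Jekyll::Filters module to make it available as a Liquid filter, assuming you want the merge to happen at template time.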

This post details the process of adding keywords to new posts. I also used the same technique to back-catalog all of my previous posts.

I use a service called Zemanta to analyze my content and determine the appropriate tags. It’s very good, though the results sometimes need a bit of manual editing. Even so, it’s faster than tagging by hand.

To get started you’ll need an API key. Don’t worry, for your purposes this is entirely free. Create an account at Zemanta, then register an application to get the API key.

Next, install the “zemanta” gem (gem install zemanta) and require it at the top of your Rakefile, after the hashbang:

require 'rubygems'
require 'zemanta'

Now you can pass your post content to Zemanta and get back an easy-to-parse array. I run this as part of my “publish” task, which moves a post from source/_draft into source/_posts and adds this kind of metadata to the YAML. The script below covers just the keyword step: it extracts the YAML headers from the post, adds the keywords and puts the headers back in.

Insert your Zemanta API key at line 7 where the Zemanta.new object is created.

zemanta.rb
require 'yaml'
require 'rubygems'
require 'zemanta' # gem install zemanta

def get_zemanta_terms(content)
  $stderr.puts "Querying Zemanta..."
  zemanta = Zemanta.new "xxxxxxxxxxxxxxxxxxxxxxxx"
  suggests = zemanta.suggest(content)
  res = []
  suggests['keywords'].each {|k|
    res << k['name'].downcase.gsub(/\s*\(.*?\)/,'').strip if k['confidence'] > 0.02
  }
  res
end

desc "Add Zemanta keywords to post YAML"
task :add_keywords, :post do |t, args|
  file = args.post
  if File.exist?(file)
    # Split on the first two --- lines to extract the YAML headers; the
    # limit of 3 keeps any later --- (a horizontal rule, say) in the body
    contents = IO.read(file).split(/^---\s*$/, 3)
    headers = YAML::load("---\n"+contents[1])
    content = contents[2].strip
    # skip adding keywords if it's already been done
    unless headers['keywords'] && headers['keywords'] != []
      begin
        $stderr.puts "getting terms for #{file}"
        # retrieve the suggested keywords
        keywords = get_zemanta_terms(content)
        # insert them in the YAML array
        headers['keywords'] = keywords
        # Dump the headers and contents back to the post
        File.open(file,'w') {|f| f.puts YAML::dump(headers) + "---\n" + content + "\n"}
      rescue => e
        $stderr.puts "ERROR: #{file} (#{e.message})"
      end
    else
      puts "Skipped: post already has keywords header"
    end
  else
    puts "No such file."
  end
end
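
To see what the cleanup step in get_zemanta_terms does without calling the API, here is a small sketch that runs the same filter over sample data. The sample names and confidence values are invented, and clean_terms is a hypothetical stand-in; only the 'name' and 'confidence' keys and the 0.02 threshold come from the script above.

```ruby
# Invented sample shaped like the 'keywords' array the task reads.
SAMPLE_KEYWORDS = [
  { 'name' => 'Jekyll (software)',          'confidence' => 0.31 },
  { 'name' => 'Search Engine Optimization', 'confidence' => 0.12 },
  { 'name' => 'Ruby',                       'confidence' => 0.01 }
]

# Same cleanup as get_zemanta_terms, minus the network call: discard
# low-confidence entries, lowercase, and drop parenthetical qualifiers.
def clean_terms(keywords, threshold = 0.02)
  keywords
    .select { |k| k['confidence'] > threshold }
    .map { |k| k['name'].downcase.gsub(/\s*\(.*?\)/, '').strip }
end

puts clean_terms(SAMPLE_KEYWORDS).inspect
# prints ["jekyll", "search engine optimization"]
```

Raising the threshold trims the list further, which may reduce the manual editing needed afterward.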

To test, you can point Rake at a post and add keywords by running rake add_keywords[path_to_post].
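
For back-cataloging older posts, one possible approach is to list every post that still lacks keywords and feed those paths to the task. The sketch below is an assumption about how that could be done, not the exact script I ran; the helper names, the source/_posts default and the file extensions are hypothetical. It uses YAML.safe_load with Date and Time permitted, since front matter often carries a date.

```ruby
require 'yaml'
require 'date'

# Return true when a post's YAML front matter has no usable keywords list.
# Assumes the post starts with a front matter block delimited by --- lines.
def needs_keywords?(text)
  parts = text.split(/^---\s*$/, 3)
  return false if parts.length < 3 # no front matter, leave the file alone
  headers = YAML.safe_load(parts[1], permitted_classes: [Date, Time]) || {}
  Array(headers['keywords']).empty?
end

# Hypothetical posts directory and extensions; adjust to your own layout.
def posts_missing_keywords(dir = 'source/_posts')
  Dir.glob(File.join(dir, '*.{md,markdown}')).sort.select do |path|
    needs_keywords?(File.read(path))
  end
end

# Print one rake invocation per post that still needs keywords.
posts_missing_keywords.each do |path|
  puts "rake add_keywords[#{path}]"
end
```

Printing the commands rather than running them leaves room to review the list first. Some shells (zsh, for one) need the bracketed argument quoted.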

Now you can use the “keywords” payload in whatever way you like. I use them, for example, in my keyword meta tag. In head.html I have a line:

{% if page.keywords %}<meta name="keywords" content="{{ page.keywords | keyword_string }}">{% endif %}

So, if the page has keywords, it runs this filter from my plugins folder:

module Jekyll
  module Filters
    def keyword_string(keywords)
      keywords.join(", ") # comma-separated, so multi-word keywords stay distinct
    end
  end
end

I include them in the Open Graph tags for a post as well, again in head.html:

{% if page.keywords %}{{ page.keywords | og_tags }}{% endif %}

which calls:

module Jekyll
  module Filters
    def og_tags(tags)
      tags.map {|tag|
        %Q{<meta property="article:tag" content="#{tag}">}
      }.join("\n")
    end
  end
end
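
One caveat: a keyword containing a quote or an ampersand would end up unescaped inside the content attribute. A hedged variant that escapes each tag first could look like this; escaped_og_tags is a hypothetical name, and it uses ERB::Util.html_escape from Ruby's standard library rather than anything Jekyll-specific.

```ruby
require 'erb'

# Hypothetical variant of og_tags that HTML-escapes each keyword before
# placing it in the content attribute.
def escaped_og_tags(tags)
  Array(tags).map { |tag|
    %Q{<meta property="article:tag" content="#{ERB::Util.html_escape(tag.to_s)}">}
  }.join("\n")
end

puts escaped_og_tags(['q&a', 'jekyll'])
```

To use it as a Liquid filter, the method would go inside a Jekyll::Filters module just like og_tags.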

I’ll be covering my Open Graph system for Jekyll soon.

Lastly, I include them in the JSON file I use for my site search (still in progress).
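
Since the search index is still in progress, the following is only a rough sketch of how keywords might be written into a JSON file. The search_entry helper, the field names and the sample post are all invented for illustration and are not the format my site uses.

```ruby
require 'json'

# Hypothetical index entry built from a post's title, URL and YAML headers.
def search_entry(title, url, headers)
  {
    'title'    => title,
    'url'      => url,
    'tags'     => Array(headers['tags']),
    'keywords' => Array(headers['keywords'])
  }
end

# Invented sample post standing in for real site data.
entries = [
  search_entry('Sample post', '/2013/01/sample-post/',
               { 'tags' => ['jekyll'], 'keywords' => ['static site generator'] })
]

puts JSON.pretty_generate(entries)
```

Keeping tags and keywords as separate fields lets the search code weight them differently, which fits the reason for storing them in separate headers.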

Hopefully some Jekyll users will find this useful. Note that the tags Zemanta returns are generally about 90% correct; the remainder are a couple of superfluous tags that won’t hurt but could be removed.