As I move along with my Jekyll/Octopress transition, I’m working to make the move as clean as possible. I’m importing my WordPress database rather than starting fresh, and I’ll be sharing tidbits of discoveries as I go. These posts will only be of interest to people making similar transitions, but they’ll also serve as notes for myself and as Google search results for people up against the same conundrums.

I have heavily extended the Octopress Rakefile and built an almost entirely new WordPress import module. The importer now…

  • converts my Download Monitor shortcodes to actual download links, respecting the format parameters and including descriptions, versions and titles
  • generates an .htaccess file with redirects from my old permalink structure for every post it imports
  • replaces shortcodes from my gist plugin with Octopress formatting
  • replaces [caption] and <img> tags, maintaining classes, alt and title attributes and alignment settings
  • replaces YouTube shortcodes with YouTube embed code
  • updates multiple formats of code blocks to standard fenced code, with a language specifier where it finds one
  • replaces video and audio shortcodes with Octopress format and HTML5 embed, respectively
  • strips out some extra markup I used to compensate for elements of my WordPress theme
  • gathers slug, redirect alias, tags, categories and custom “series” plugin data as YAML front matter
  • locates WordPress gallery shortcodes and replaces them with all of the included attachments as an unordered list of thumbnails linked to their full-size images

It’s that last item that I’ll share today. The input is any content that includes a [gallery] shortcode (with optional extra parameters). The output is Markdown with some extra Kramdown syntax. I think it might also work with Maruku, but you may have to adjust it for your chosen Jekyll Markdown interpreter.
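To make that output concrete, here's a rough sketch of the list item and reference definitions generated for each image. The helper name `gallery_markdown` and the sample paths are made up for illustration, and it assumes the image and thumbnail paths have already been worked out. The `{: ...}` and `{:.gallery}` bits are the Kramdown attribute syntax, so other interpreters may need something different there.

```ruby
# Hypothetical helper (not part of the importer): build the gallery
# Markdown from already-resolved image data. Each entry is a hash with
# :title, :image (full-size path) and :thumb (150x150 path).
def gallery_markdown(entries)
  items = []
  refs = []
  entries.each_with_index do |entry, i|
    n = i + 1
    # thumbnail (with Kramdown size attributes) linked to the full-size image
    items << %Q{* [![#{entry[:title]}][img#{n}thumb]{: width="150" height="150"}][img#{n}]}
    refs << "[img#{n}thumb]: #{entry[:thumb]}"
    refs << "[img#{n}]: #{entry[:image]} '#{entry[:title]}'"
  end
  # {:.gallery} is a Kramdown attribute line meant to put class="gallery" on the list
  (items + ["{:.gallery}", ""] + refs).join("\n") + "\n"
end

puts gallery_markdown([
  { title: "Sunset",
    image: "/uploads/2012/01/sunset.jpg",
    thumb: "/uploads/2012/01/sunset-150x150.jpg" }
])
```

Each image becomes one bullet, and all of the link targets are collected as reference definitions after the list, which keeps the list itself readable.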

Basically, if it detects [gallery] codes in the post, it runs a query for all “attachment” posts with the current post as the parent. It passes those to a function that replaces the [gallery] code with a full Markdown list of the images, using WordPress’ automatically generated thumbnails as the visible image and linking them to the full-size upload. If the 150x150 thumbnail doesn’t exist, it uses sips (the command-line image tool that ships with OS X) to create it. If you want to alter the thumbnail process, see the sips commands in the thumbnail_image function.
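The path handling is the fiddly part, so here's a minimal sketch of it in isolation. `gallery_paths` is a hypothetical name, `http://example.com` is a placeholder domain, and it assumes your uploads were copied into the Jekyll source with the `/wp-content` prefix removed.

```ruby
# Hypothetical helper: map an attachment URL to the site-relative paths
# of the full-size image and its 150x150 thumbnail.
# `domain` is an assumed value such as "http://example.com".
def gallery_paths(url, domain)
  # drop the domain and /wp-content prefix, plus any -WxH size suffix
  image = url.sub(/^#{Regexp.escape(domain)}\/wp-content/, '')
             .sub(/-\d+x\d+(\.\w+)$/, '\1')
  # the thumbnail name is the image name with -150x150 before the extension
  thumb = image.sub(/(\.\w+)$/, '-150x150\1')
  [image, thumb]
end

image, thumb = gallery_paths(
  "http://example.com/wp-content/uploads/2012/01/sunset-300x200.jpg",
  "http://example.com"
)
puts image # /uploads/2012/01/sunset.jpg
puts thumb # /uploads/2012/01/sunset-150x150.jpg
```

Stripping the size suffix first means it doesn't matter whether the database points at the original upload or at one of the resized copies.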

This snippet is added as part of the WordPress module in lib/jekyll/migrators/wordpress.rb. I’ve rebuilt it completely in a new module. Because so much of the code is specific to my own plugins and content, I probably won’t post the entire file, but I’ll pull out the useful bits for incorporation into your own.

To kick it off, here’s the code for the [gallery] replacer:

galleryimport.rb
require 'fileutils'
require 'shellwords'

# create a 150x150 thumbnail of the passed image using `sips`
# img is an absolute path to the base image file
def thumbnail_image(img)
  img = File.expand_path(img.strip)
  src = Shellwords.escape(img) # quote the path so spaces don't break the shell command
  # `sips -g` prints the file path, then a line like "pixelWidth: 800"
  width = %x{sips -g pixelWidth #{src}}[/pixelWidth: (\d+)/, 1].to_i
  height = %x{sips -g pixelHeight #{src}}[/pixelHeight: (\d+)/, 1].to_i
  thumb = img.sub(/^(.*?)(\..{3,4})$/, "\\1-150x150\\2")
  FileUtils.cp(img, thumb)
  # scale the shorter side to 150px, then crop the copy to a 150x150 square
  type = width > height ? '--resampleHeight' : '--resampleWidth'
  dest = Shellwords.escape(thumb)
  %x{sips #{type} 150 #{dest} && sips -c 150 150 #{dest}}
  thumb # absolute path of the new thumbnail
end

# `content` is a passed string containing post_content
# `attachments` is an array of hashes containing 'title' and 'url' for each attachment on the post
# replace_galleries uses kramdown syntax for attributes and classes, adjust as needed
def replace_galleries(content, attachments)
  images = "\n" # unordered list of thumbnails linked to images as references
  imagerefs = "\n" # block of reference definitions
  counter = 1
  content.gsub!(/\[gallery.*?\]/) do |gall|
    attachments.each do |att|
      image = att['url'].gsub(/^#{@domain}\/wp-content/,'').sub(/-\d+x\d+\./,'.')
      thumb = image.sub(/(.*?)(\..{3,4})$/,'\\1-150x150\\2')
      assets_dir = Dir.pwd+"/source"
      # bail out, leaving content untouched, if the image directory isn't in source
      return content unless File.directory?(assets_dir+File.dirname(thumb))
      unless File.exist?(assets_dir+thumb)
        # create the file on disk, but keep `thumb` as the site-relative path for the link
        puts thumbnail_image(assets_dir+image)
      end
      title = att['title']

      images += %Q{* [![#{title}][img#{counter}thumb]{: width="150" height="150"}][img#{counter}]\n}
      imagerefs += %Q{[img#{counter}thumb]: #{thumb}\n}
      imagerefs += %Q{[img#{counter}]: #{image} '#{title}'\n}
      counter += 1
    end
    images + "{:.gallery}\n" +imagerefs + "\n"
  end
  content
end

# `content` is the post_content field for the row of the WordPress database query being looped
# `post` is that same row, so post[:ID] is the ID of the post being imported
# `px` is a variable containing the table prefix in the WordPress database
# `db` is a Sequel.mysql object
if content =~ /\[gallery/
  attachments = []

  gquery =
  "SELECT
    posts.guid AS `attachment`,
    posts.post_title AS `title`
  FROM
    #{px}posts AS `posts`
  WHERE
    posts.post_parent = '#{post[:ID]}' AND
    posts.post_type = 'attachment'"

  db[gquery].each do |a|
    attachments << { 'url' => a[:attachment], 'title' => a[:title] }
  end

  replace_galleries(content, attachments) unless attachments.empty?
end
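
If you do want a different thumbnail size, one option is to build the `sips` commands from a size parameter instead of hard-coding 150. This is only a sketch: `sips_thumbnail_commands` is a hypothetical helper, it uses nothing beyond the flags already in thumbnail_image above, and it just builds the argument lists, so actually running them still requires a Mac.

```ruby
# Hypothetical helper: build the two `sips` invocations that thumbnail_image
# runs, for an arbitrary square size. Returns argument arrays suitable for
# Kernel#system, which skips the shell, so paths with spaces need no quoting.
def sips_thumbnail_commands(width, height, thumb, size = 150)
  # resample the shorter side to `size` so the longer side still covers the crop
  resample = width > height ? '--resampleHeight' : '--resampleWidth'
  [
    ['sips', resample, size.to_s, thumb],
    ['sips', '-c', size.to_s, size.to_s, thumb]
  ]
end

cmds = sips_thumbnail_commands(800, 600, '/tmp/my photo-200x200.jpg', 200)
cmds.each { |cmd| puts cmd.inspect }
# On a Mac you could run them with: cmds.all? { |cmd| system(*cmd) }
```

Remember that the `-150x150` suffix in the file names and the width and height attributes in replace_galleries would need to change to match.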