wp_mangler, scripted post modification for WordPress

Here’s a quick script I was playing around with this morning. I want to do some extensive string handling on all of the posts in my WordPress database. It made the most sense to me to pull each post’s content from the db, do the string mangling with a shell script and then replace the post_content field as I go. This script is destructive and any mistakes you introduce could kill your entire blog. Should you have a need for this script, back up first!

The script itself is really simple: it just pulls all of the post content and ids from the blog database, runs each one through your processing script and then sticks the result back in the post. The handler is empty here, it’s assumed that you’ll edit the function to call your own script to suit your needs.

wp_mangler:

wp_mangler.rbraw
#!/usr/bin/env ruby
# Requires 'sequel' and 'mysql' rubygems

require 'rubygems'
require 'sequel'
require 'cgi'

def e_sql(str)
  str.to_s.gsub(/(?=[\\])/, "\\")
end

def process_post(input)
  # do your thing
  input
end

def process_db(db_name = '', user='', pass='', host = 'localhost', db_prefix = '')

  db = Sequel.mysql(db_name, :user => user, :password => pass, :host => host)

  db["select post_content, ID from wp_#{db_prefix}posts where post_status = 'publish' and post_type = 'post'"].each do |post|
    content = post[:post_content]
    content = e_sql(process_post(content)) # run process_post on post_content, escape result
    update_ds = db["UPDATE wp_#{db_prefix}posts SET post_content = ? WHERE ID = ?",content,post[:ID]]
    update_ds.update # replace the existing content with the new content
  end
end

# process_db (dbname, username, password, host, dbprefix)
process_db('database','user','pass','localhost','tableprefix_')

You’ll need to edit the process_db call in the last line with your mySQL database credentials and host info.

As an example, here’s a function you can put into the above script that will find all your inline links and turn them into reference links at the bottom of the post. Just for fun.

wp_mangler_inline-to-ref.rbraw
def e_sh(str)
  str.to_s.gsub(/(?=[^a-zA-Z0-9_.\/\-\x7F-\xFF\n])/n, '\\').gsub(/\n/, "'\n'").sub(/^$/, "''")
end

def process_post(input)
  links = input.scan(/\((https?:\/\/([^\)]+))\)/)
  refs = input.scan(/^\[([^^][^\]]+)\]: (.*)$/)
  lines = input.split("\n")

  bottom = lines[0..-1].join("\n").gsub(/^\[([^^][^\]]+)\]: .*\n?/,'')

  norepeat = []
  norepeatlinks = []
  output = []

  refs.each {|ref|
    name = ref[0]
    next if norepeatlinks.include? ref[1]
    while norepeat.include? name
      if name =~ / ?[0-9]$/
        name.next!
      else
        name = name + " 2"
      end
    end
    output << {'orig' => ref[0], 'title' => name, 'link' => ref[1]}
    norepeat.push name
    norepeatlinks.push ref[1]
  }

  links.each {|url|
      next if norepeatlinks.include? url[0]
      domain = url[0].match(/https?:\/\/([^\/]+)/)
      parts = domain[1].split('.')
      name = case parts.length
        when 1 then parts[0]
        when 2 then parts[0]
        else parts[1]
      end
    while norepeat.include? name
      if name =~ / ?[0-9]$/
        name.next!
      else
        name = name + " 2"
      end
    end
    output << {'orig' => url[0], 'title' => name, 'link' => url[0] }
    norepeat.push name
    norepeatlinks.push url[0]
  }
  output = output.sort {|a,b| a['title'] <=> b['title']}
  o = []

  output.each_with_index { |x,i|
    o.push("[#{x['title']}]: #{x['link']}")
    bottom = bottom.gsub(/\((#{e_sh x['orig']}|#{e_sh x['link']})\)/,"[#{x['title']}]").gsub(/\[#{e_sh x['orig']}\]/,"[#{x['title']}]")
  }
  bottom + "\n#{o.join("\n")}\n"
end

The script requires the ‘sequel’ and ‘mysql’ Ruby gems to run. I was able to install these on a Dreamhost server with no problem, but I’d recommend just dumping your database to a local mysql server, running the script and testing the result, then uploading it. Assuming you have a local testing environment with a MySQL server…

Anyway, doubt it’s of much use to most of you, but figured I’d put it out there.

Brett Terpstra

Brett is a writer and developer living in Minnesota, USA. You can follow him as ttscoff on Twitter, GitHub, and Mastodon. Keep up with this blog by subscribing in your favorite news reader.

This content is supported by readers like you.

Join the conversation