A warm, fuzzy, file-finding feeling

I write a lot of scripts. I often want to edit and tweak those scripts. I sometimes forget exactly where a script is in my path, though, so I end up using “subl $(which scriptname)” to do it (yes, I have an alias for that). I didn’t know I needed a smarter, faster way until I built this, but now I’m quite enamored with it.

The idea started as a bash function that ran a series of searches in a sequence designed to find the file I most likely wanted to edit. That worked well, and looked like this:

editscript.rbraw
#!/usr/bin/env ruby
# encoding: utf-8
# == Synopsis
#   Proof of concept using Fuzzy File Finder to locate a script to edit
#   Searches a set of predefined locations for a fuzzy string
#   e.g. "mwp" matches both "myweatherprogram" and "mowthelawnplease"
#   ................on "(m)y(w)eather(p)rogram" and "(m)o(w)thelawn(p)lease"
#
#   Results are ranked and a menu is displayed with the most likely
#   match at the top. Editor to be launched and directories to search
#   specified in CONFIG below.
#
# == Examples
#   Search through configured directories for "mwp"
#     editscript mwp
#
# == Usage
#   editscript [options] "search string"
#
#   For help use: editscript -h
#
# == Options
#   -s, --show         Show results without executing
#   -n, --no-menu      No menu interaction. Executes the highest ranked result
#                      or, with '-s', returns a plain list of results
#   -a, --show-all     Show all results, otherwise limited to top 10
#       --scores       Show match scoring with results
#   -d, --debug        Verbose debugging
#   -h, --help         help
#
# == Author
#   Brett Terpstra
#
# == Copyright
#   Public Domain
#

# This script includes Fuzzy File Finder by Jamis Buck, no external dependencies
# Fuzzy File Finder is available as a gem as well if you want to play further
# `gem install –source=gems.github.com jamis-fuzzy_file_finder`

## CONFIG
#
# The CLI tool you want to launch with the file
editor = 'subl'

# A list of directories to search in (recursive)
search_paths = ['~/scripts','/usr/bin/local','~/bin','~/.bash_it']

##################################################################
## ADVANCED (OPTIONAL) CONFIG
## You do not need to edit the $options below, but
## you can modify default behavior if you really want to.
## All of the following can be specified via command line arguments
## Use `editscript -h` to see the available $options
$options = {
  :show => false, # Show results without executing
  :menu => true, # Don't display a menu, execute/show the highest ranked result
  :showall => false, # Show all results, otherwise limited to top 10
  :showscores => false, # Show match scoring with results
  :debug => false, # Verbose debugging
}
## End CONFIG #####################################################

require 'optparse'
require 'readline'

#--
# Fuzzy File Finder <https://github.com/jamis/fuzzy_file_finder>
# ==================================================================
# Author: Jamis Buck (jamis@jamisbuck.org)
# Date: 2008-10-09
#
# This file is in the public domain. Usage, modification, and
# redistribution of this file are unrestricted.
# ==================================================================
#++
class FuzzyFileFinder
  module Version
    MAJOR = 1
    MINOR = 0
    TINY  = 4
    STRING = [MAJOR, MINOR, TINY].join(".")
  end
  class TooManyEntries < RuntimeError; end
  class CharacterRun < Struct.new(:string, :inside) #:nodoc:
    def to_s
      if inside
        "(#{string})"
      else
        string
      end
    end
  end
  class FileSystemEntry #:nodoc:
    attr_reader :parent
    attr_reader :name
    def initialize(parent, name)
      @parent = parent
      @name = name
    end
    def path
      File.join(parent.name, name)
    end
  end
  class Directory #:nodoc:
    attr_reader :name
    def initialize(name, is_root=false)
      @name = name
      @is_root = is_root
    end
    def root?
      is_root
    end
  end
  attr_reader :roots
  attr_reader :files
  attr_reader :ceiling
  attr_reader :shared_prefix
  attr_reader :ignores
  def initialize(directories=['.'], ceiling=10_000, ignores=nil)
    directories = Array(directories)
    directories << "." if directories.empty?
    root_dirnames = directories.map { |d| File.expand_path(d) }.select { |d| File.directory?(d) }.uniq
    @roots = root_dirnames.map { |d| Directory.new(d, true) }
    @shared_prefix = determine_shared_prefix
    @shared_prefix_re = Regexp.new("^#{Regexp.escape(shared_prefix)}" + (shared_prefix.empty? ? "" : "/"))
    @files = []
    @ceiling = ceiling
    @ignores = Array(ignores)
    rescan!
  end
  def rescan!
    @files.clear
    roots.each { |root| follow_tree(root) }
  end
  def search(pattern, &block)
    pattern.gsub!(" ", "")
    path_parts = pattern.split("/")
    path_parts.push "" if pattern[-1,1] == "/"
    file_name_part = path_parts.pop || ""
    if path_parts.any?
      path_regex_raw = "^(.*?)" + path_parts.map { |part| make_pattern(part) }.join("(.*?/.*?)") + "(.*?)$"
      path_regex = Regexp.new(path_regex_raw, Regexp::IGNORECASE)
    end
    file_regex_raw = "^(.*?)" << make_pattern(file_name_part) << "(.*)$"
    file_regex = Regexp.new(file_regex_raw, Regexp::IGNORECASE)
    path_matches = {}
    files.each do |file|
      path_match = match_path(file.parent, path_matches, path_regex, path_parts.length)
      next if path_match[:missed]
      match_file(file, file_regex, path_match, &block)
    end
  end
  def find(pattern, max=nil)
    results = []
    search(pattern) do |match|
      results << match
      break if max && results.length >= max
    end
    return results
  end
  def inspect #:nodoc:
    "#<%s:0x%x roots=%s, files=%d>" % [self.class.name, object_id, roots.map { |r| r.name.inspect }.join(", "), files.length]
  end
  private
    def follow_tree(directory)
      Dir.entries(directory.name).each do |entry|
        next if entry[0,1] == "."
        next if ignore?(directory.name) # Ignore whole directory hierarchies
        raise TooManyEntries if files.length > ceiling
        full = File.join(directory.name, entry)
        if File.directory?(full)
          follow_tree(Directory.new(full))
        elsif !ignore?(full.sub(@shared_prefix_re, ""))
          files.push(FileSystemEntry.new(directory, entry))
        end
      end
    end
    def ignore?(name)
      ignores.any? { |pattern| File.fnmatch(pattern, name) }
    end
    def make_pattern(pattern)
      pattern = pattern.split(//)
      pattern << "" if pattern.empty?
      pattern.inject("") do |regex, character|
        regex << "([^/]*?)" if regex.length > 0
        regex << "(" << Regexp.escape(character) << ")"
      end
    end
    def build_match_result(match, inside_segments)
      runs = []
      inside_chars = total_chars = 0
      match.captures.each_with_index do |capture, index|
        if capture.length > 0
          inside = index % 2 != 0
          total_chars += capture.gsub(%r(/), "").length # ignore '/' delimiters
          inside_chars += capture.length if inside
          if runs.last && runs.last.inside == inside
            runs.last.string << capture
          else
            runs << CharacterRun.new(capture, inside)
          end
        end
      end
      inside_runs = runs.select { |r| r.inside }
      run_ratio = inside_runs.length.zero? ? 1 : inside_segments / inside_runs.length.to_f
      char_ratio = total_chars.zero? ? 1 : inside_chars.to_f / total_chars
      score = run_ratio * char_ratio
      return { :score => score, :result => runs.join }
    end
    def match_path(path, path_matches, path_regex, path_segments)
      return path_matches[path] if path_matches.key?(path)
      name_with_slash = path.name + "/" # add a trailing slash for matching the prefix
      matchable_name = name_with_slash.sub(@shared_prefix_re, "")
      matchable_name.chop! # kill the trailing slash
      if path_regex
        match = matchable_name.match(path_regex)
        path_matches[path] =
          match && build_match_result(match, path_segments) ||
          { :score => 1, :result => matchable_name, :missed => true }
      else
        path_matches[path] = { :score => 1, :result => matchable_name }
      end
    end
    def match_file(file, file_regex, path_match, &block)
      if file_match = file.name.match(file_regex)
        match_result = build_match_result(file_match, 1)
        full_match_result = path_match[:result].empty? ? match_result[:result] : File.join(path_match[:result], match_result[:result])
        shortened_path = path_match[:result].gsub(/[^\/]+/) { |m| m.index("(") ? m : m[0,1] }
        abbr = shortened_path.empty? ? match_result[:result] : File.join(shortened_path, match_result[:result])
        result = { :path => file.path,
                   :abbr => abbr,
                   :directory => file.parent.name,
                   :name => file.name,
                   :highlighted_directory => path_match[:result],
                   :highlighted_name => match_result[:result],
                   :highlighted_path => full_match_result,
                   :score => path_match[:score] * match_result[:score] }
        yield result
      end
    end
    def determine_shared_prefix
      return roots.first.name if roots.length == 1
      split_roots = roots.map { |root| root.name.split(%r{/}) }
      segments = split_roots.map { |root| root.length }.max
      master = split_roots.pop
      segments.times do |segment|
        if !split_roots.all? { |root| root[segment] == master[segment] }
          return master[0,segment].join("/")
        end
      end
      return roots.first.name
    end
end

################################### editscript

opt_parser = OptionParser.new do |opt|

  opt.banner = "Usage: #{File.basename(__FILE__)} [$options] 'search terms'"
  opt.separator  ""
  opt.separator  "Options:"

  opt.on("-s","--show","Show results without executing") do |environment|
    $options[:show] = true
  end

  opt.on("-n","--no-menu","No menu interaction. Executes the highest ranked result or, with '-s', returns a plain list of results") do |menu|
    $options[:menu] = false
  end

  opt.on("-a","--show-all","Show all results, otherwise limited to top 10") do
    $options[:showall] = true
  end

  opt.on("--scores","Show match scoring with results") do
    $options[:showscores] = true
  end

  opt.on("-d","--debug","Verbose debugging") do
    $options[:debug] = true
  end

  opt.on("-h","--help","help") do
    puts opt_parser
    exit
  end
end

opt_parser.parse!

if ARGV.empty?
  puts "No search term given. Use '#{File.basename(__FILE__)} -h' for help."
  exit
else
  search_terms = ARGV.join(' ')
end



puts "Searching #{search_terms}" if $options[:debug]

finder = FuzzyFileFinder.new(search_paths)

res = finder.find(search_terms).delete_if { |file|
  %x{file "#{file[:path]}"}.chomp !~ /text/
}

if res.length == 0
  puts "No matching files"
  exit
elsif res.length > 1
  res = res.sort {|a,b|
    a[:score] <=> b[:score]
  }.reverse
end

res.each do |match|
  printf("[%09.4f]",match[:score]) if $options[:showscores]
  puts match[:path]
end if $options[:show]

def results_menu(res)
  counter = 1
  puts
  res.each do |match|
    display = $options[:debug] ? match[:highlighted_path] : match[:path]
    if $options[:showscores]
      printf("%2d ) [%09.4f] %s\n",counter, match[:score], display)
    else
      printf("%2d ) %s\n", counter, display)
    end
    counter += 1
  end
  puts
end

unless $options[:show]
  unless $options[:menu] # Just execute the top result
    %x{#{editor} #{res[0][:path]}}
  else # Show a menu of results
    res = res[0..9] unless $options[:showall] # limit to top 10 results
    stty_save = `stty -g`.chomp
    trap('INT') { system('stty', stty_save); exit }
    results_menu(res)
    begin
      printf("Type 'q' or hit return to cancel",res.length)
      while line = Readline.readline(": ", true)
        if line =~ /^[a-z]/i || line == ''
          system('stty', stty_save) # Restore
          exit
        end
        if line.to_i > 0 && line.to_i <= res.length
          puts res[line.to_i - 1][:path] if $options[:debug]
          %x{#{editor} #{res[line.to_i - 1][:path]}}
          break
        else
          puts "Out of range"
          results_menu(res)
        end
      end
    rescue Interrupt => e
      system('stty', stty_save)
      exit
    end
  end
end

It required you to get at least part of the string right, though. How annoying.

So I was playing around with fuzzy string matching (again), and found a smart little Ruby gem called Fuzzy File Finder. It’s essentially TextMate/Quicksilver style matching for a directory tree, allowing you to use queries that aren’t contiguous but are in left to right sequence and ranking the results. If you haven’t used this kind of searching before, think of it as typing your initials to match your whole name. Typing “brmc” would match “Black Rebel Motorcycle Club,” but so would “bblmccb.”

I built it into a proof-of-concept script that turned out to be really useful. It’s already become second nature for me after just a couple of days. The script is designed to quickly find a text/source file I want to edit within a group of my standard script directories, but it could easily be modified to handle an array of file searching tasks. The algorithm for Fuzzy Finder itself is portable to more than just filenames, too1. More on that another time.

This script has Fuzzy File Finder built in, so there are no dependencies and it should run on any stock Ruby install. To run it, save the Gist to a folder in your path as editscript and make it executable (chmod a+x editscript).

See the comments at the top of the script for a description, and run editscript -h for options. There are a couple of items to configure at below the top documentation and before the script. It comes set up to search in “~/scripts”, “~/bin” and “/usr/local/bin”, but you can adjust as needed. It searches recursively, so don’t set a root folder as a search directory or you’ll be in for a long wait. Also set up the command for your preferred editor (mvim, vim, mate, subl, etc.) in that section.

editscript: locate a text file by name using fuzzy string matching
Usage: editscript [$options] 'search terms'

Options:
    -s, --show                       Show results without executing
    -n, --no-menu                    No menu interaction. Executes the highest ranked result or, with '-s', returns a plain list of results
    -a, --show-all                   Show all results, otherwise limited to top 10
    --scores                     	 Show match scoring with results
    -c, --command COMMAND            Run an arbitrary command instead of the defined editor
    -d, --debug                      Verbose debugging
    -h, --help                       help

By default it returns up to 10 of the most likely matches for your query as a numbered menu for you to select from. You can tell it to skip the menu and just open the top match with -n. You can also have it return a raw list of matches for use in other scripts using -s. You can prevent truncation of the list to the top 10 items and return all matches using -a.

For verbose debugging use -d (though there are very few messages… one, I think) and use --scores to include the match rankings in the output (both -s and default menu output). You can also run a command in place of the default editor with the -c option.

The script is a Gist on Github. I hope you find it as handy as I have.

  1. I frequently use amatch for text searching, but I wanted to play with some options.

Ryan Irelan has produced a series of shell trick videos based on BrettTerpstra.com posts. Readers can get 10% off using the coupon code TERPSTRA.

Brett Terpstra

Brett is a writer and developer living in Minnesota, USA. You can follow him as ttscoff on Twitter, GitHub, and Mastodon. Keep up with this blog by subscribing in your favorite news reader.

This content is supported by readers like you.