Shell Tricks: list files with most text matches


Here’s a Bash function for searching all text files in the current directory for a pattern, then listing the files containing matches in ascending order by number of matches. It’s mostly a proof of concept, but a useful companion to a basic grep search.

The meat of the function is an array declaration. It first runs grep -lIi -E "$patt" * 2> /dev/null to list the files that contain the pattern (case-insensitive), skipping binary files. The error redirect at the end discards the errors grep throws for directories. Each filename in that list is then passed to a second grep command, grep -Hi -c -E "$patt", which outputs the file's name and match count. Note that grep -c counts matching lines rather than individual occurrences, so a line containing the pattern twice adds only one to the count. The results are sorted by count and saved to the array.
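
To see the two grep stages in isolation, here's a minimal sketch that runs them in a scratch directory. The file names, their contents, and the demo_dir, pipeline_out, and occurrences variables are all made up for illustration; only the grep and sort invocations come from the function, and a plain pipe stands in for its process substitution.

```shell
# Scratch directory with hypothetical sample files (not from the original post)
demo_dir=$(mktemp -d)
printf 'jekyll and more jekyll\njekyll again\nplain line\n' > "$demo_dir/a.txt"
printf 'Jekyll once\nnothing here\n' > "$demo_dir/b.txt"
printf 'no match at all\n' > "$demo_dir/c.txt"
mkdir "$demo_dir/subdir"   # grep complains about directories, hence 2> /dev/null

patt="jekyll"

# Stage 1 lists text files with at least one match; stage 2 prints
# "filename:count" for each of them; sort orders the lines by count
pipeline_out=$(
	cd "$demo_dir" || exit 1
	grep -lIi -E "$patt" * 2> /dev/null | while IFS= read -r line; do
		grep -Hi -c -E "$patt" "$line"
	done | sort -t: -n -k 2
)
echo "$pipeline_out"

# a.txt has three occurrences on two lines: grep -c reports 2, grep -o finds 3
occurrences=$(grep -oi -E "$patt" "$demo_dir/a.txt" | wc -l | tr -d ' ')
echo "occurrences in a.txt: $occurrences"

rm -rf "$demo_dir"
```

If counting every occurrence matters more than counting lines, the grep -o variant at the end is one way to get there, at the cost of a second pass over each file.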

After including the function in a sourced file (e.g. ~/.bash_profile), running matches -h will show the available flags and switches:

$ matches -h
Find files in the current directory containing the most occurrences 
of a pattern
	-c Include occurrence counts in output
	-r Reverse sort order (default ascending)
	-m COUNT Minimum number of matches required
	-h Display this help screen

 Example:
	# search for files containing at least 3 occurrences
	# of the word "jekyll", display filenames with counts

	$ matches -c -m 3 jekyll
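
The switches above are handled with getopts. As a rough sketch of that pattern on its own, here's a stripped-down function; parse_demo is a hypothetical name, and it only echoes what it parsed instead of searching anything.

```shell
# parse_demo is a hypothetical stand-in for the option handling in matches
parse_demo () {
	local counts=false minmatches=1 reverse="" opt

	# getopts tracks its position in OPTIND between calls, so a function
	# that gets called repeatedly in the same shell has to reset it first
	OPTIND=1
	while getopts "crm:" opt; do
		case $opt in
			c) counts=true ;;
			r) reverse="r" ;;
			m) minmatches=$OPTARG ;;   # the colon after m means it takes a value
			*) return 1 ;;
		esac
	done
	shift $((OPTIND-1))   # drop the parsed options, leaving the pattern in $1

	echo "counts=$counts reverse=$reverse min=$minmatches pattern=$1"
}

parse_demo -c -m 3 jekyll
parse_demo -r jekyll
```

Without the OPTIND reset, the second call in an interactive shell would start parsing where the first one left off and skip its options.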

Here’s the function for pasting into ~/.bash_profile (or other sourced file):

# Find files in the current directory containing the most occurrences of a pattern
# switch -c: turn on display of occurrence counts
# switch -r: reverse sort order (default ascending)
# flag -m COUNT: minimum number of occurrences required to include file in results
# param 1: (required) search pattern (regex allowed, case insensitive)
#
# Results are output in ascending order by occurrence count
matches () {

	local counts=false minmatches=1 patt width=1 reverse=""
	local helpstring="Find files in the current directory containing the most occurrences of a pattern\n\t-c         Include occurrence counts in output\n\t-r         Reverse sort order (default ascending)\n\t-m COUNT   Minimum number of matches required\n\t-h         Display this help screen\n\n	Example:\n\t# search for files containing at least 3 occurrences\n\t# of the word \"jekyll\", display filenames with counts\n\n\t$ matches -c -m 3 jekyll"

	OPTIND=1
	while getopts "crm:h" opt; do
		case $opt in
			c) counts=true ;;
			r) reverse="r" ;;
			m) minmatches=$OPTARG ;;
			h) echo -e $helpstring; return;;
			*) return 1;;
		esac
	done
	shift $((OPTIND-1))

	if [ $# -ne 1 ]; then
		echo -e $helpstring
		return 1
	fi

	patt=$1; shift

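	# Split on newlines only so filenames containing spaces stay intact;
	# a local IFS reverts to the caller's value when the function returns
	local IFS=$'\n'
	local mtch count line

	# Inner grep lists text files with at least one match; the loop asks grep
	# for "filename:count" per file, and sort orders those lines by count.
	# The parentheses make this a real array, one element per line.
	local -a matches=( $(while read -r line; do
	                grep -Hi -c -E "$patt" "$line"
	              done < <(grep -lIi -E "$patt" * 2> /dev/null) \
	              | sort -t: -${reverse}n -k 2) )

	# Pad counts to the width of the longest one, whichever end it sorts to
	for mtch in "${matches[@]}"; do
		count=${mtch##*:}
		[ ${#count} -gt $width ] && width=${#count}
	done

	for mtch in "${matches[@]}"; do
		if [ "${mtch##*:}" -ge "$minmatches" ]; then
			if $counts; then
				printf "%${width}d: %s\n" "${mtch##*:}" "${mtch%:*}"
			else
				echo "${mtch%:*}"
			fi
		fi
	done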
}
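
The output loop leans on two parameter expansions to take each filename:count line apart. Here's a small sketch of just that step; split_demo is a hypothetical helper, and the sample lines are invented.

```shell
# split_demo is a hypothetical helper showing how one "filename:count" line
# from grep -H -c is split back into its two parts
split_demo () {
	local mtch="$1"
	local count="${mtch##*:}"   # strip the longest prefix ending in ":" -> count
	local name="${mtch%:*}"     # strip the shortest suffix from the last ":" -> name
	printf '%3d: %s\n' "$count" "$name"
}

split_demo "notes.md:12"
split_demo "2014-01-05: draft.txt:3"
```

Because both expansions key off the last colon, a filename that itself contains colons still splits correctly here. The sort step is less forgiving: sort -t: -k 2 treats everything after the first colon as the key, so such files may end up out of order.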
