Finding the Files You Want
Problem
You want to locate all the files in a directory hierarchy that match some criteria. For instance, you might want to find all the empty files, all the MP3 files, or all the files named "README."
Solution
Use the Find.find method to walk the directory structure and accumulate a list of matching files.
Pass in a block to the following method and it'll walk a directory tree, testing each file against the code block you provide. It returns an array of all files for which the value of the block is true.
    require 'find'

    module Find
      def match(*paths)
        matched = []
        find(*paths) { |path| matched << path if yield path }
        return matched
      end
      module_function :match
    end
Here's what Find.match might return if you used it on a typical disorganized home directory:
    Find.match("./") { |p| File.lstat(p).size == 0 }
    # => ["./Music/cancelled_download.MP3", "./tmp/empty2", "./tmp/empty1"]

    Find.match("./") { |p| ext = p[-4...p.size]; ext && ext.downcase == ".mp3" }
    # => ["./Music/The Snails - Red Rocket.mp3",
    # =>  "./Music/The Snails - Moonfall.mp3", "./Music/cancelled_download.MP3"]

    Find.match("./") { |p| File.split(p)[1] == "README" }
    # => ["./rubyprog-0.1/README", "./tmp/README"]
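To try Find.match without pointing it at a real home directory, the following sketch builds a throwaway tree with Dir.mktmpdir and runs the empty-file search against it. The file names and the helper empty_files_in_sandbox are made up for illustration. The File.file? test is an addition, since some filesystems may report a size of 0 for directories, and the result is sorted because traversal order can differ between Ruby versions.

```ruby
require 'find'
require 'tmpdir'
require 'fileutils'

module Find
  # Find.match as defined in the Solution, repeated so this example
  # runs on its own.
  def match(*paths)
    matched = []
    find(*paths) { |path| matched << path if yield path }
    matched
  end
  module_function :match
end

# Hypothetical helper: builds a small scratch tree and returns the
# empty regular files in it, as paths relative to the tree's root.
def empty_files_in_sandbox
  Dir.mktmpdir do |root|
    FileUtils.mkdir_p(File.join(root, "tmp"))
    FileUtils.touch(File.join(root, "tmp", "empty1"))        # zero bytes
    FileUtils.touch(File.join(root, "tmp", "empty2"))        # zero bytes
    File.write(File.join(root, "tmp", "README"), "hello\n")  # six bytes

    found = Find.match(root) do |path|
      File.file?(path) && File.lstat(path).size == 0
    end

    # Strip the temporary root so the output is stable between runs.
    found.map { |path| path[(root.size + 1)..-1] }.sort
  end
end

p empty_files_in_sandbox
# => ["tmp/empty1", "tmp/empty2"]
```

The scratch directory is removed automatically when the Dir.mktmpdir block returns.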
Discussion
This is an especially useful chunk of code for system administration tasks. It gives you functionality at least as powerful as the Unix find command, but you can write your search criteria in Ruby and you won't have to remember the arcane syntax of find.
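As an illustration of writing the criteria in Ruby, here is a minimal sketch of a search that GNU find would spell roughly as `find . -iname '*.mp3' -size +1M`. The helper name large_files_named and its parameters are hypothetical; it relies only on File.fnmatch? and File.size from the standard File class, and it assumes you want regular files only.

```ruby
require 'find'

module Find
  # Find.match as defined in the Solution, repeated so this example
  # runs on its own.
  def match(*paths)
    matched = []
    find(*paths) { |path| matched << path if yield path }
    matched
  end
  module_function :match
end

# Hypothetical helper: a case-insensitive glob on the file name plus a
# minimum size, combined with ordinary Ruby boolean logic.
def large_files_named(pattern, min_bytes, *paths)
  Find.match(*paths) do |path|
    File.file?(path) &&
      File.fnmatch?(pattern, File.basename(path), File::FNM_CASEFOLD) &&
      File.size(path) > min_bytes
  end
end
```

Calling large_files_named("*.mp3", 1024 * 1024, "./") would then return the MP3 files larger than one mebibyte under the current directory. Adding another criterion is a matter of adding another `&&` clause.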
As with Find.find itself, you can stop Find.match from descending into a directory by calling Find.prune:
    Find.match("./") do |p|
      Find.prune if p == "./tmp"
      File.split(p)[1] == "README"
    end
    # => ["./rubyprog-0.1/README"]
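Pruning by exact path only skips one directory. A variation you might find useful is to skip every directory with a given name, wherever it appears in the tree. The sketch below assumes a hypothetical helper called find_named; only Find.match and Find.prune come from the recipe and the standard library.

```ruby
require 'find'

module Find
  # Find.match as defined in the Solution, repeated so this example
  # runs on its own.
  def match(*paths)
    matched = []
    find(*paths) { |path| matched << path if yield path }
    matched
  end
  module_function :match
end

# Hypothetical helper: finds entries with the given name while skipping
# every directory whose own name appears in skip_dirs.
def find_named(name, skip_dirs, *paths)
  Find.match(*paths) do |path|
    base = File.basename(path)
    # Find.prune leaves the block immediately, so the rest of the block
    # does not run for this path and the directory is never entered.
    Find.prune if File.directory?(path) && skip_dirs.include?(base)
    base == name
  end
end
```

With this in place, find_named("README", ["tmp", ".git"], "./") would ignore anything under a tmp or .git directory at any depth. A pruned directory is never added to the results, because the block exits before it returns a value.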
You can even look inside each file to see whether you want it:
    # Find all files that start with a particular phrase.
    must_start_with = "This Ruby program"
    Find.match("./") do |p|
      if File.file? p
        # File.open never treats the path as a command, unlike Kernel#open.
        File.open(p) { |f| f.read(must_start_with.size) == must_start_with }
      else
        false
      end
    end
    # => ["./rubyprog-0.1/README"]
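When a search opens every file it visits, a single unreadable file can abort the whole walk with an exception. One way to guard against that, sketched here under the assumption that unreadable files should simply count as non-matches, is to move the content test into a small predicate that rescues system errors. The name starts_with? is hypothetical.

```ruby
require 'find'

module Find
  # Find.match as defined in the Solution, repeated so this example
  # runs on its own.
  def match(*paths)
    matched = []
    find(*paths) { |path| matched << path if yield path }
    matched
  end
  module_function :match
end

# Hypothetical helper: true if the regular file at path begins with
# phrase. Missing or unreadable files count as non-matches.
def starts_with?(path, phrase)
  return false unless File.file?(path)
  # Read in binary mode and compare raw bytes, so a phrase containing
  # non-ASCII characters is compared correctly.
  File.open(path, "rb") { |f| f.read(phrase.bytesize) == phrase.b }
rescue SystemCallError
  # Covers errors such as Errno::EACCES and Errno::ENOENT.
  false
end
```

The search itself then shrinks to Find.match("./") { |p| starts_with?(p, "This Ruby program") }.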
A few other useful things to search for using this function:
    # Finds files that were probably left behind by emacs sessions.
    def emacs_droppings(*paths)
      Find.match(*paths) do |p|
        # Test the file name alone; the full path usually starts with "./".
        name = File.basename(p)
        (name[-1] == ?~ and name[0] != ?~) or (name[0] == ?# and name[-1] == ?#)
      end
    end

    # Finds all files that are larger than a certain threshold. Use this to find
    # the files hogging space on your filesystem.
    def bigger_than(bytes, *paths)
      Find.match(*paths) { |p| File.lstat(p).size > bytes }
    end

    # Finds all files modified more recently than a certain number of seconds ago.
    def modified_recently(seconds, *paths)
      time = Time.now - seconds
      Find.match(*paths) { |p| File.lstat(p).mtime > time }
    end

    # Finds all files that haven't been accessed since they were last modified.
    def possibly_abandoned(*paths)
      Find.match(*paths) { |p| f = File.lstat(p); f.mtime == f.atime }
    end
See Also
- Recipe 6.12, "Walking a Directory Tree"