Deleting Files That Match a Regular Expression

Credit: Matthew Palmer

Problem

You have a directory full of files and you need to remove some of them. The patterns you want to match are too complex to represent as file globs, but you can represent them as a regular expression.

Solution

The Dir.entries method gives you an array of all files in a directory, and you can iterate over this array with #each. A method to delete the files matching a regular expression might look like this:

    def delete_matching_regexp(dir, regex)
      Dir.entries(dir).each do |name|
        path = File.join(dir, name)
        if name =~ regex
          ftype = File.directory?(path) ? Dir : File
          begin
            ftype.delete(path)
          rescue SystemCallError => e
            $stderr.puts e.message
          end
        end
      end
    end

Here's an example. Let's create a bunch of files and directories beneath a temporary directory:

    require 'fileutils'

    tmp_dir = 'tmp_buncha_files'
    files = ['A', 'A.txt', 'A.html', 'p.html', 'A.html.bak']
    directories = ['text.dir', 'Directory.for.html']

    Dir.mkdir(tmp_dir) unless File.directory? tmp_dir
    files.each { |f| FileUtils.touch(File.join(tmp_dir, f)) }
    directories.each { |d| Dir.mkdir(File.join(tmp_dir, d)) }

Now let's delete some of those files and directories. We'll delete a file or directory if its name starts with a capital letter, and if its extension (the string after its last period) is at least four characters long. This corresponds to the regular expression /^[A-Z].*\.[^.]{4,}$/:

    Dir.entries(tmp_dir)
    # => [".", "..", "A", "A.txt", "A.html", "p.html", "A.html.bak",
    #     "text.dir", "Directory.for.html"]

    delete_matching_regexp(tmp_dir, /^[A-Z].*\.[^.]{4,}$/)

    Dir.entries(tmp_dir)
    # => [".", "..", "A", "A.txt", "p.html", "A.html.bak", "text.dir"]
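To see why only A.html and Directory.for.html were removed, the pattern can be tested against each name on its own, without touching the filesystem. This is a minimal sketch: it assumes the period before the extension is escaped (\.), and the names array is a hypothetical stand-in that mirrors the listing above.

```ruby
# Hypothetical list mirroring the directory listing above.
names = ['A', 'A.txt', 'A.html', 'p.html', 'A.html.bak',
         'text.dir', 'Directory.for.html']

# Capital first letter, then a literal period followed by an
# extension of four or more non-period characters.
pattern = /^[A-Z].*\.[^.]{4,}$/

matching = names.select { |name| name =~ pattern }
matching.each { |name| puts name }
```

A.html.bak survives because its extension, bak, is only three characters long, and A.txt survives for the same reason.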

Discussion

Dir.entries returns an array containing the name of every file and subdirectory in the directory and, like most good things in Ruby, that array's each method takes a code block. Our particular code block uses the regular expression match operator =~ to test every entry name (files and subdirectories alike) against the regular expression, and calls File.delete or Dir.delete to remove the offending entries.
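One detail worth keeping in mind: Dir.entries includes the special entries "." and "..", as the listing in the Solution shows. A regular expression loose enough to match them would make the method try to delete the directory itself or its parent. As a possible alternative, Dir.children (available in Ruby 2.5 and later) returns the same names without those two entries. The sketch below uses a throwaway directory from Dir.mktmpdir, and the file name is purely illustrative.

```ruby
require 'tmpdir'
require 'fileutils'

entries = nil
children = nil
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, 'A.html'))
  entries = Dir.entries(dir).sort    # includes "." and ".."
  children = Dir.children(dir).sort  # only the real directory contents
end

p entries
p children
```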

File.delete won't delete directories; for that, you need Dir.delete. So delete_matching_regexp uses the File.directory? predicate to check whether an entry is a directory. We also have error reporting, to report cases when we don't have permission to delete a file, or a directory isn't empty.
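The two failure cases can be reproduced in isolation. The following sketch, which works inside a temporary directory with illustrative names, tries File.delete on a directory and Dir.delete on a non-empty directory. The exact Errno class raised varies by platform, which is one reason to rescue the common parent class SystemCallError.

```ruby
require 'tmpdir'
require 'fileutils'

errors = []
Dir.mktmpdir do |root|
  subdir = File.join(root, 'not_empty')
  Dir.mkdir(subdir)
  FileUtils.touch(File.join(subdir, 'file.txt'))

  # File.delete refuses to remove a directory.
  begin
    File.delete(subdir)
  rescue SystemCallError => e
    errors << e
  end

  # Dir.delete refuses to remove a directory that still has contents.
  begin
    Dir.delete(subdir)
  rescue SystemCallError => e
    errors << e
  end
end

errors.each { |e| $stderr.puts "#{e.class}: #{e.message}" }
```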

Of course, once we've got this basic "find matching files" thing going, there's no reason why we have to limit ourselves to deleting the matched files. We can move them to somewhere new:

    def move_matching_regexp(src, dest, regex)
      # Iterate over the source directory (the original referred to an undefined dir).
      Dir.entries(src).each do |name|
        File.rename(File.join(src, name), File.join(dest, name)) if name =~ regex
      end
    end

Or we can append a suffix to them:

    def append_matching_regexp(dir, suffix, regex)
      Dir.entries(dir).each do |name|
        if name =~ regex
          File.rename(File.join(dir, name), File.join(dir, name+suffix))
        end
      end
    end

Note the common code in both of those implementations. We can factor it out into yet another method that takes a block:

    def each_matching_regexp(dir, regex)
      Dir.entries(dir).each { |name| yield name if name =~ regex }
    end

We no longer have to repeat the matching logic every time we walk through Dir.entries; we just need to tell each_matching_regexp what to do with the entries that match:

    def append_matching_regexp(dir, suffix, regex)
      each_matching_regexp(dir, regex) do |name|
        File.rename(File.join(dir, name), File.join(dir, name+suffix))
      end
    end
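Because each_matching_regexp only yields names, the block does not have to change the filesystem at all. As an illustration, here is a sketch of a hypothetical names_matching_regexp helper (not part of the recipe) that simply collects the matching names. The helper from the text is repeated so that the example runs on its own.

```ruby
require 'tmpdir'
require 'fileutils'

# Same helper as in the text: yields each entry name that matches.
def each_matching_regexp(dir, regex)
  Dir.entries(dir).each { |name| yield name if name =~ regex }
end

# Hypothetical helper: gather the matching names instead of acting on them.
def names_matching_regexp(dir, regex)
  names = []
  each_matching_regexp(dir, regex) { |name| names << name }
  names.sort
end

html_names = nil
Dir.mktmpdir do |dir|
  %w[A.html p.html A.txt].each { |f| FileUtils.touch(File.join(dir, f)) }
  html_names = names_matching_regexp(dir, /\.html\z/)
end

p html_names
```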

This is all well and good, but these methods only manipulate files directly beneath the directory you specify. "I've got a whole tree full of files I want to get rid of!" I hear you cry. For that, you should use Find.find (from the standard find library) instead of Dir.entries. Apart from that change, the implementation is nearly identical to delete_matching_regexp:

    require 'find'

    def delete_matching_regexp_recursively(dir, regex)
      Find.find(dir) do |path|
        # Match against the last path component only, without shadowing dir.
        name = File.basename(path)
        if name =~ regex
          ftype = File.directory?(path) ? Dir : File
          begin
            ftype.delete(path)
          rescue SystemCallError => e
            $stderr.puts e.message
          end
        end
      end
    end

If you want to recursively delete the contents of directories that match the regular expression (even if the contents themselves don't match), use FileUtils.rm_rf instead of Dir.delete.
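One way this might look is sketched below. The method name delete_matching_trees is hypothetical, and the sketch makes two assumptions beyond the text: it never removes the starting directory itself, and it calls Find.prune after removing a directory so that Find.find does not try to descend into a subtree that no longer exists.

```ruby
require 'find'
require 'fileutils'
require 'tmpdir'

# Hypothetical variant: a matching directory is removed along with
# everything inside it, whether or not the contents match.
def delete_matching_trees(dir, regex)
  Find.find(dir) do |path|
    next if path == dir                        # leave the starting directory alone
    next unless File.basename(path) =~ regex
    if File.directory?(path)
      FileUtils.rm_rf(path)                    # remove the whole subtree
      Find.prune                               # skip the subtree that was just removed
    else
      begin
        File.delete(path)
      rescue SystemCallError => e
        $stderr.puts e.message
      end
    end
  end
end

survivors = nil
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'Build.cache'))
  FileUtils.touch(File.join(root, 'Build.cache', 'inner.txt'))
  FileUtils.touch(File.join(root, 'keep.txt'))
  delete_matching_trees(root, /\.cache\z/)
  survivors = Dir.children(root).sort
end

p survivors
```

Note that FileUtils.rm_rf suppresses errors rather than raising them, so a subtree that cannot be removed fails silently in this sketch.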

See Also
