Deleting Files That Match a Regular Expression
Credit: Matthew Palmer
Problem
You have a directory full of files and you need to remove some of them. The patterns you want to match are too complex to represent as file globs, but you can represent them as a regular expression.
Solution
The Dir.entries method gives you an array of all files in a directory, and you can iterate over this array with #each. A method to delete the files matching a regular expression might look like this:
def delete_matching_regexp(dir, regex)
  Dir.entries(dir).each do |name|
    path = File.join(dir, name)
    if name =~ regex
      # Directories must be removed with Dir.delete, files with File.delete.
      ftype = File.directory?(path) ? Dir : File
      begin
        ftype.delete(path)
      rescue SystemCallError => e
        $stderr.puts e.message
      end
    end
  end
end
Here's an example. Let's create a bunch of files and directories beneath a temporary directory:
require 'fileutils'

tmp_dir = 'tmp_buncha_files'
files = ['A', 'A.txt', 'A.html', 'p.html', 'A.html.bak']
directories = ['text.dir', 'Directory.for.html']

Dir.mkdir(tmp_dir) unless File.directory? tmp_dir
files.each { |f| FileUtils.touch(File.join(tmp_dir, f)) }
directories.each { |d| Dir.mkdir(File.join(tmp_dir, d)) }
Now let's delete some of those files and directories. We'll delete a file or directory if its name starts with a capital letter, and if its extension (the string after its last period) is at least four characters long. This corresponds to the regular expression /^[A-Z].*\.[^.]{4,}$/:
Dir.entries(tmp_dir)
# => [".", "..", "A", "A.txt", "A.html", "p.html", "A.html.bak",
#     "text.dir", "Directory.for.html"]

delete_matching_regexp(tmp_dir, /^[A-Z].*\.[^.]{4,}$/)

Dir.entries(tmp_dir)
# => [".", "..", "A", "A.txt", "p.html", "A.html.bak", "text.dir"]
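Before deleting anything, it can be worth checking which names a pattern actually selects. Here is a minimal sketch that runs Array#grep over the sample names from this example, assuming the period before the extension is escaped in the pattern. It touches nothing on disk, and the variable names are only illustrative:

```ruby
# The sample names from the example; no files are created or deleted.
names = ['A', 'A.txt', 'A.html', 'p.html', 'A.html.bak',
         'text.dir', 'Directory.for.html']
regex = /^[A-Z].*\.[^.]{4,}$/

# Array#grep keeps the elements for which regex === element is true.
matched = names.grep(regex)
p matched   # => ["A.html", "Directory.for.html"]
```

The two names it prints are exactly the entries missing from the second Dir.entries listing.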
Discussion
Dir.entries doesn't take a code block itself: it returns an array containing the name of every file and subdirectory in the directory (including "." and ".."), and Array#each yields each name to our code block. The block uses the regular expression match operator =~ to test each name against the regular expression, and deletes the entries that match.
File.delete won't delete directories; for that, you need Dir.delete. So delete_matching_regexp uses the File.directory? predicate to check whether an entry is a directory, and picks the right class accordingly. We also have error reporting, to report cases when we don't have permission to delete a file, or a directory isn't empty.
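To see why the rescue clause catches SystemCallError, here is a small sketch that provokes both failures on a scratch directory created with Dir.mktmpdir. The directory and file names are made up for illustration, and the exact Errno subclass raised depends on the platform, so the code only relies on the common superclass:

```ruby
require 'tmpdir'
require 'fileutils'

errors = []
Dir.mktmpdir do |tmp|
  full_dir = File.join(tmp, 'full')   # hypothetical name
  Dir.mkdir(full_dir)
  FileUtils.touch(File.join(full_dir, 'file.txt'))

  # File.delete refuses to remove a directory, and Dir.delete refuses
  # to remove a directory that still has something in it.
  [File, Dir].each do |klass|
    begin
      klass.delete(full_dir)
    rescue SystemCallError => e
      errors << e
      $stderr.puts "#{klass}.delete failed: #{e.message}"
    end
  end
end
```

Both calls fail, and both exceptions are Errno classes descended from SystemCallError, which is why a single rescue clause covers them.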
Of course, once we've got this basic "find matching files" thing going, there's no reason why we have to limit ourselves to deleting the matched files. We can move them to somewhere new:
def move_matching_regexp(src, dest, regex)
  # List the source directory (the original listed an undefined "dir").
  Dir.entries(src).each do |name|
    File.rename(File.join(src, name), File.join(dest, name)) if name =~ regex
  end
end
Or we can append a suffix to them:
def append_matching_regexp(dir, suffix, regex)
  Dir.entries(dir).each do |name|
    if name =~ regex
      File.rename(File.join(dir, name), File.join(dir, name + suffix))
    end
  end
end
Note the common code in both of those implementations. We can factor it out into yet another method that takes a block:
def each_matching_regexp(dir, regex)
  Dir.entries(dir).each { |name| yield name if name =~ regex }
end
We no longer have to spell out how to find the files we want in every method; we just need to tell each_matching_regexp what to do with them:
def append_matching_regexp(dir, suffix, regex)
  each_matching_regexp(dir, regex) do |name|
    File.rename(File.join(dir, name), File.join(dir, name + suffix))
  end
end
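The move operation can be rebuilt on the same helper. The following is a sketch rather than the recipe's own code: each_matching_regexp is repeated so the example runs on its own, and the scratch directories and file names are invented for the demonstration:

```ruby
require 'tmpdir'
require 'fileutils'

# Same helper as in the text, repeated so this example is self-contained.
def each_matching_regexp(dir, regex)
  Dir.entries(dir).each { |name| yield name if name =~ regex }
end

def move_matching_regexp(src, dest, regex)
  each_matching_regexp(src, regex) do |name|
    File.rename(File.join(src, name), File.join(dest, name))
  end
end

moved = nil
left = nil
Dir.mktmpdir do |tmp|
  src  = File.join(tmp, 'src')
  dest = File.join(tmp, 'dest')
  [src, dest].each { |d| Dir.mkdir(d) }
  %w[A.html p.html notes.txt].each { |f| FileUtils.touch(File.join(src, f)) }

  move_matching_regexp(src, dest, /\.html$/)

  # Drop the "." and ".." entries before comparing.
  moved = Dir.entries(dest).sort - %w[. ..]
  left  = Dir.entries(src).sort - %w[. ..]
end
p moved   # => ["A.html", "p.html"]
p left    # => ["notes.txt"]
```

Note that File.rename only works when the source and destination are on the same filesystem; for moves across filesystems, FileUtils.mv is the usual alternative.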
This is all well and good, but these methods only manipulate files directly beneath the directory you specify. "I've got a whole tree full of files I want to get rid of!" I hear you cry. For that, you should use Find.find (from the find library) instead of Dir.entries. Apart from that change, the implementation is nearly identical to delete_matching_regexp:
require 'find'

def delete_matching_regexp_recursively(dir, regex)
  Find.find(dir) do |path|
    # Match against the last path component only, as Dir.entries did.
    name = File.basename(path)
    if name =~ regex
      ftype = File.directory?(path) ? Dir : File
      begin
        ftype.delete(path)
      rescue SystemCallError => e
        $stderr.puts e.message
      end
    end
  end
end
If you want to recursively delete the contents of directories that match the regular expression (even if the contents themselves don't match), use FileUtils.rm_rf instead of Dir.delete.
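One way that variant might look is sketched below. The method name delete_tree_matching_regexp is hypothetical, as are the scratch paths; the sketch also calls Find.prune after each removal so that Find.find doesn't try to walk into a tree that no longer exists:

```ruby
require 'find'
require 'fileutils'
require 'tmpdir'

# Hypothetical name; a sketch of the rm_rf variant, not the recipe's code.
def delete_tree_matching_regexp(dir, regex)
  Find.find(dir) do |path|
    next unless File.basename(path) =~ regex
    FileUtils.rm_rf(path)   # removes a file or an entire directory tree
    Find.prune              # skip the rest of what was just removed
  end
end

remaining = nil
Dir.mktmpdir do |tmp|
  root = File.join(tmp, 'root')
  FileUtils.mkdir_p(File.join(root, 'keep', 'Old.backup'))
  FileUtils.touch(File.join(root, 'keep', 'Old.backup', 'data.txt'))
  FileUtils.touch(File.join(root, 'keep', 'notes.txt'))

  delete_tree_matching_regexp(root, /^[A-Z].*\.[^.]{4,}$/)

  # List what is left, as paths relative to root.
  remaining = Dir.glob(File.join(root, '**', '*')).
              map { |path| path[(root.size + 1)..-1] }.sort
end
p remaining   # => ["keep", "keep/notes.txt"]
```

Here data.txt doesn't match the pattern itself, but it disappears along with its matching parent directory Old.backup. FileUtils.rm_rf ignores errors silently, so this version trades the error reporting of the earlier methods for brevity.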
See Also
- Dir.delete will only remove an empty directory; see Recipe 6.18 for information on how to remove one that's not empty
- Recipe 6.20, "Finding the Files You Want"