Deleting Directory Trees
Both of the copy scripts in the last section work as planned, but they aren't very forgiving of existing directory trees. That is, they implicitly assume that the "to" target directory is either empty or doesn't exist at all, and fail badly if that isn't the case. Presumably, you will first somehow delete the target directory on your machine. For my purposes, that was a reasonable assumption to make.
The copiers could be changed to work with existing "to" directories too (e.g., by ignoring os.mkdir exceptions), but I prefer to start from scratch when copying trees; you never know what old garbage might be lying around in the "to" directory. So when testing the copies above, I was careful to run an rm -rf cpexamples command line to recursively delete the entire cpexamples directory tree before copying another tree to that name.
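For comparison, here is the alternative the author decided against: a copier that tolerates an existing "to" directory by ignoring the exception os.mkdir raises for it. This is a minimal sketch that assumes Python 3.3 or later (for FileExistsError). make_dir_tolerant is a hypothetical helper name and is not part of the book's copy scripts. It also shows why starting from scratch is safer: any stale files already in the directory would stay there.

```python
import os

def make_dir_tolerant(path):
    """Create path as a directory, quietly accepting one that already exists.

    Hypothetical helper: returns True if the directory was newly created,
    False if a directory by that name was already present.
    """
    try:
        os.mkdir(path)
        return True              # directory was newly created
    except FileExistsError:
        if not os.path.isdir(path):
            raise                # a *file* with that name is a real problem
        return False             # directory was already there (old contents kept)
```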
Unfortunately, the rm command used to clear the target directory is really a Unix utility that I installed on my PC from a commercial package; it probably won't work on your computer. There are other platform-specific ways to delete directory trees (e.g., deleting a folder's icon in the Windows Explorer GUI), but why not do it once in Python for every platform? Example 5-22 deletes every file and directory at and below a passed-in directory's name. Because its logic is packaged as a function, it is also an importable utility that can be run from other scripts. Because it is pure Python code, it is a cross-platform solution for tree removal.
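For comparison, the standard library's shutil module also offers a portable tree remover, shutil.rmtree. It deletes a directory's contents before the directory itself, just as the script below does, but it does not report how many items it removed. The following is a minimal sketch. clear_tree is a hypothetical wrapper name, and treating a missing path as a no-op is an assumption made for convenience.

```python
import os
import shutil

def clear_tree(path):
    """Delete path and everything below it, if it exists as a directory.

    Hypothetical wrapper around shutil.rmtree; a missing path is ignored.
    """
    if os.path.isdir(path):
        shutil.rmtree(path)      # removes children first, then path itself
```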
Example 5-22. PP2E\System\Filetools\rmall.py
#!/usr/bin/python
################################################################
# Use: "python rmall.py directoryPath directoryPath..."
# recursive directory tree deletion: removes all files and
# directories at and below directoryPaths; recurs into subdirs
# and removes parent dir last, because os.rmdir requires that
# directory is empty; like a Unix "rm -rf directoryPath"
################################################################

import sys, os
fcount = dcount = 0

def rmall(dirPath):                          # delete dirPath and below
    global fcount, dcount
    namesHere = os.listdir(dirPath)
    for name in namesHere:                   # remove all contents first
        path = os.path.join(dirPath, name)
        if not os.path.isdir(path):          # remove simple files
            os.remove(path)
            fcount = fcount + 1
        else:                                # recur to remove subdirs
            rmall(path)
    os.rmdir(dirPath)                        # remove now-empty dirPath
    dcount = dcount + 1

if __name__ == '__main__':
    import time
    start = time.time()
    for dname in sys.argv[1:]:
        rmall(dname)
    tottime = time.time() - start
    print('Removed %d files and %d dirs in %s secs' % (fcount, dcount, tottime))
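The same postorder logic can also be written without module-level counters. This is a minimal sketch that assumes Python 3. Having rmall return a (files, dirs) tuple is this adaptation's own design choice, not part of the original listing. Returning the counts lets callers that import the function tally results without reading globals.

```python
import os
import sys
import time

def rmall(dir_path):
    """Delete dir_path and everything below it; return (files, dirs) removed."""
    files = dirs = 0
    for name in os.listdir(dir_path):        # remove all contents first
        path = os.path.join(dir_path, name)
        if not os.path.isdir(path):          # simple file: delete it now
            os.remove(path)
            files += 1
        else:                                # subdirectory: recur into it
            sub_files, sub_dirs = rmall(path)
            files += sub_files
            dirs += sub_dirs
    os.rmdir(dir_path)                       # dir_path is empty by now
    return files, dirs + 1                   # count dir_path itself too

if __name__ == '__main__':
    start = time.time()
    fcount = dcount = 0
    for dname in sys.argv[1:]:
        f, d = rmall(dname)
        fcount += f
        dcount += d
    print('Removed %d files and %d dirs in %s secs'
          % (fcount, dcount, time.time() - start))
```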
The great thing about coding this sort of tool in Python is that it can be run with the same command-line interface on any machine where Python is installed. If you don't have an rm -rf type command available on your Windows, Unix, or Macintosh computer, simply run the Python rmall script instead:
C:\temp>python %X%\System\Filetools\cpall.py examples cpexamples
Note: dirTo was created
Copying...
Copied 1379 files, 121 directories in 2.68999993801 seconds

C:\temp>python %X%\System\Filetools\rmall.py cpexamples
Removed 1379 files and 122 dirs in 0.549999952316 secs

C:\temp>ls cpexamples
ls: File or directory "cpexamples" is not found
Here, the script traverses and deletes a tree of 1379 files and 122 directories in about half a second. That is impressive for a noncompiled programming language, and roughly equivalent to the commercial rm -rf program I purchased and installed on my PC.
One subtlety here: this script must be careful to delete the contents of a directory before deleting the directory itself, because the os.rmdir call requires directories to be empty when deleted (and raises an exception if they are not). Because of that, the recursive calls on subdirectories need to happen before the os.rmdir call. Computer scientists would recognize this as a postorder, depth-first tree traversal, since we process parent directories after their children. This also rules out any traversal based on os.path.walk: we need to return to a parent directory to delete it after visiting its descendants.
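Later Python versions added a different tool, os.walk, which is a generator and not the callback-based os.path.walk discussed here. Python 3 removed os.path.walk entirely. os.walk accepts topdown=False, which yields each directory only after all of its subdirectories, and that is exactly the postorder visiting a deletion needs. The following is a minimal sketch assuming Python 3. rmtree_bottom_up is a hypothetical name, and the sketch does not handle symbolic links specially.

```python
import os

def rmtree_bottom_up(top):
    """Delete top and everything below it; return (files, dirs) removed.

    With topdown=False, a directory is yielded only after its subdirectories,
    so by the time a parent removes its children, they are already empty.
    """
    files = dirs = 0
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        for name in filenames:               # files in this directory
            os.remove(os.path.join(dirpath, name))
            files += 1
        for name in dirnames:                # children already emptied
            os.rmdir(os.path.join(dirpath, name))
            dirs += 1
    os.rmdir(top)                            # finally the root itself
    return files, dirs + 1
```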
To illustrate, let's run interactive os.remove and os.rmdir calls on a cpexamples directory containing files or nested directories:
>>> os.path.isdir('cpexamples')
1
>>> os.remove('cpexamples')
Traceback (innermost last):
  File "<stdin>", line 1, in ?
OSError: [Errno 2] No such file or directory: 'cpexamples'
>>> os.rmdir('cpexamples')
Traceback (innermost last):
  File "<stdin>", line 1, in ?
OSError: [Errno 13] Permission denied: 'cpexamples'
Both calls always fail if the directory is not empty. But now, delete the contents of cpexamples in another window and try again:
>>> os.path.isdir('cpexamples')
1
>>> os.remove('cpexamples')
Traceback (innermost last):
  File "<stdin>", line 1, in ?
OSError: [Errno 2] No such file or directory: 'cpexamples'
>>> os.rmdir('cpexamples')
>>> os.path.exists('cpexamples')
0
The os.remove call still fails (it's only meant for deleting nondirectory items), but os.rmdir now works because the directory is empty. The upshot is that a tree deletion traversal must generally remove directories "on the way out."
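The session above relies on emptying the directory by hand in another window. The same before-and-after behavior can also be scripted. This is a minimal sketch assuming Python 3, and rmdir_outcome is a hypothetical helper. The exact error number and message vary by platform: the Windows session above shows Errno 13, while POSIX systems typically report "Directory not empty." For that reason, the sketch catches the general OSError.

```python
import os
import tempfile

def rmdir_outcome(path):
    """Try os.rmdir on path; return None on success, or the OSError raised."""
    try:
        os.rmdir(path)
    except OSError as exc:       # error number for non-empty dirs varies by OS
        return exc
    return None

# Scratch tree for the demo: one directory holding one file
top = tempfile.mkdtemp()
victim = os.path.join(top, 'cpexamples')
os.mkdir(victim)
leaf = os.path.join(victim, 'spam.txt')
with open(leaf, 'w') as f:
    f.write('spam')

first = rmdir_outcome(victim)    # fails: the directory still holds a file
os.remove(leaf)                  # delete the contents first...
second = rmdir_outcome(victim)   # ...then rmdir succeeds "on the way out"
print(first is not None, second is None, os.path.exists(victim))  # True True False
os.rmdir(top)                    # clean up the now-empty scratch directory
```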
5.7.1 Recoding Deletions for Generality
As coded, the rmall script only processes directory names and fails if given names of simple files, but it's trivial to generalize the script to eliminate that restriction. The recoding in Example 5-23 accepts an arbitrary command-line list of file and directory names, deletes simple files, and recursively deletes directories.
Example 5-23. PP2E\System\Filetools\rmall2.py
#!/usr/bin/python
################################################################
# Use: "python rmall2.py fileOrDirPath fileOrDirPath..."
# like rmall.py, alternative coding, files okay on cmd line
################################################################

import sys, os
fcount = dcount = 0

def rmone(pathName):
    global fcount, dcount
    if not os.path.isdir(pathName):          # remove simple files
        os.remove(pathName)
        fcount = fcount + 1
    else:                                    # recur to remove contents
        for name in os.listdir(pathName):
            rmone(os.path.join(pathName, name))
        os.rmdir(pathName)                   # remove now-empty pathName
        dcount = dcount + 1

if __name__ == '__main__':
    import time
    start = time.time()
    for name in sys.argv[1:]:
        rmone(name)
    tottime = time.time() - start
    print('Removed %d files and %d dirs in %s secs' % (fcount, dcount, tottime))
This shorter version runs the same, and just as fast, as the original:
C:\temp>python %X%\System\Filetools\cpall.py examples cpexamples
Note: dirTo was created
Copying...
Copied 1379 files, 121 directories in 2.52999997139 seconds

C:\temp>python %X%\System\Filetools\rmall2.py cpexamples
Removed 1379 files and 122 dirs in 0.550000071526 secs

C:\temp>ls cpexamples
ls: File or directory "cpexamples" is not found
but can also be used to delete simple files:
C:\temp>python %X%\System\Filetools\rmall2.py spam.txt eggs.txt
Removed 2 files and 0 dirs in 0.0600000619888 secs

C:\temp>python %X%\System\Filetools\rmall2.py spam.txt eggs.txt cpexamples
Removed 1381 files and 122 dirs in 0.630000042915 secs
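Both scripts share one caution on systems with symbolic links. os.path.isdir follows links, so a link that points to a directory would be recursed into, deleting the files in the directory it refers to, and then os.rmdir would typically fail on the link itself. The following is a minimal sketch assuming Python 3 on a POSIX system. It tests os.path.islink first so that links are removed as links. Returning (files, dirs) counts is this sketch's own choice, and Windows directory links and junctions would need separate handling that is not shown here.

```python
import os

def rmone(path_name):
    """Delete a file, symlink, or directory tree; return (files, dirs) removed.

    The islink test runs first, so a link to a directory is deleted as a
    link and the directory it points to is left untouched (POSIX assumed).
    """
    if os.path.islink(path_name) or not os.path.isdir(path_name):
        os.remove(path_name)                 # plain file or link itself
        return 1, 0
    files = dirs = 0
    for name in os.listdir(path_name):       # real directory: empty it first
        f, d = rmone(os.path.join(path_name, name))
        files += f
        dirs += d
    os.rmdir(path_name)                      # now empty, so rmdir works
    return files, dirs + 1
```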
As usual, there is more than one way to do it in Python (though you'll have to try harder to find many spurious ways). Notice that these scripts trap no exceptions; in programs designed to blindly delete an entire directory tree, exceptions are all likely to denote truly bad things. We could get fancier and support filename patterns by using the standard fnmatch module along the way, but that was beyond the scope of these scripts' goals (for pointers on matching, see Example 5-17, and also find.py in Chapter 2).
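To show what such pattern support might look like, here is a minimal sketch assuming Python 3. rm_matching is a hypothetical name. The sketch walks the tree with os.walk and deletes only files whose base names match a shell-style pattern, using fnmatch.filter, which applies os.path.normcase and is therefore case-insensitive on Windows. Directories are searched but never removed, which is a deliberate narrowing compared with the scripts above.

```python
import fnmatch
import os

def rm_matching(root, pattern):
    """Delete files below root whose names match pattern; return their paths.

    Hypothetical helper: only files are removed, directories are kept.
    """
    removed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):   # e.g. '*.pyc'
            path = os.path.join(dirpath, name)
            os.remove(path)
            removed.append(path)
    return removed
```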