Fixing DOS Filenames
The heart of the prior script was findFiles, a function that knows how to portably collect matching file and directory names in an entire tree, given a list of filename patterns. It doesn't do much more than the built-in find.find call, but can be augmented for our own purposes. Because this logic was bundled up in a function, though, it automatically becomes a reusable tool.
For example, the next script imports and applies findFiles, to collect all file names in a directory tree, by using the filename pattern * (it matches everything). I use this script to fix a legacy problem in the book's examples tree. The names of some files created under MS-DOS were made all uppercase; for example, spam.py became SPAM.PY somewhere along the way. Because case is significant both in Python and on some platforms, an import statement like "import spam" will sometimes fail for uppercase filenames.
To repair the damage everywhere in the thousand-file examples tree, I wrote and ran Example 5-6. It works like this: For every filename in the tree, it checks to see if the name is all uppercase, and asks the console user whether the file should be renamed with the os.rename call. To make this easy, it also comes up with a reasonable default for most new names -- the old one in all-lowercase form.
Example 5-6. PP2E\PyTools\fixnames_all.py
#########################################################
# Use: "python ..\..\PyTools\fixnames_all.py".
# find all files with all upper-case names at and below
# the current directory ('.'); for each, ask the user for
# a new name to rename the file to; used to catch old
# uppercase file names created on MS-DOS (case matters on
# some platforms, when importing Python module files);
# caveats: this may fail on case-sensitive machines if
# directory names are converted before their contents--the
# original dir name in the paths returned by find may no
# longer exist; the allUpper heuristic also fails for
# odd filenames that are all non-alphabetic (ex: '.');
#########################################################

import os, string
listonly = 0

def allUpper(name):
    for char in name:
        if char in string.lowercase:    # any lowercase letter disqualifies
            return 0                    # else all upper, digit, or special
    return 1

def convertOne(fname):
    fpath, oldfname = os.path.split(fname)
    if allUpper(oldfname):
        prompt = 'Convert dir=%s file=%s? (y|Y)' % (fpath, oldfname)
        if raw_input(prompt) in ['Y', 'y']:
            default  = string.lower(oldfname)
            newfname = raw_input('Type new file name (enter=%s): ' % default)
            newfname = newfname or default
            newfpath = os.path.join(fpath, newfname)
            os.rename(fname, newfpath)
            print 'Renamed: ', fname
            print 'to: ', str(newfpath)
            raw_input('Press enter to continue')
            return 1
    return 0

if __name__ == '__main__':
    patts = "*"                              # inspect all file names
    from fixeoln_all import findFiles        # reuse finder function
    matches = findFiles(patts)

    ccount = vcount = 0
    for matchlist in matches:                # list of lists, one per pattern
        for fname in matchlist:              # fnames are full directory paths
            print vcount+1, '=>', fname      # includes names of directories
            if not listonly:
                ccount = ccount + convertOne(fname)
            vcount = vcount + 1
    print 'Converted %d files, visited %d' % (ccount, vcount)
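The allUpper test above accepts any name that has no lowercase letter, which is why the header comment warns about names such as '.'. As a rough sketch only, here is how the same heuristic and a stricter alternative might look in present-day Python 3, where string.lowercase is no longer available. The names all_upper and strictly_upper are hypothetical and not part of the book's code, and string.ascii_lowercase covers ASCII letters only:

```python
import string

def all_upper(name):
    """Roughly the allUpper heuristic: no lowercase ASCII letter anywhere."""
    return not any(char in string.ascii_lowercase for char in name)

def strictly_upper(name):
    """Stricter variant: at least one letter, and every letter uppercase."""
    return name.isupper()

# Compare the two tests on ordinary and odd names.
for name in ['SPAM.PY', 'spam.py', 'Spam.py', 'README.1ST', '.', '123']:
    print(repr(name), all_upper(name), strictly_upper(name))
```

The two tests differ only on names with no letters at all, which the first accepts and the second rejects.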
As before, the findFiles function returns a list of simple filename lists, representing the expansion of all patterns passed in (here, just one result list, for the wildcard pattern * ).[5] For each file and directory name in the result, this script's convertOne function prompts for name changes; an os.path.split and an os.path.join call combination portably tacks the new filename onto the old directory name. Here is a renaming session in progress on Windows:
[5] Interestingly, using string '*' for the patterns list works the same as using list ['*'] here, only because a single-character string is a sequence that contains itself; compare the results of map(find.find, '*') with map(find.find, ['*']) interactively to verify.
C:\temp\examples>python %X%\PyTools\fixnames_all.py
Using Python find
1 => .\.cshrc
2 => .\LaunchBrowser.out.txt
3 => .\LaunchBrowser.py
 ...
 ...more deleted...
 ...
218 => .\Ai
219 => .\Ai\ExpertSystem
220 => .\Ai\ExpertSystem\TODO
Convert dir=.\Ai\ExpertSystem file=TODO? (y|Y)n
221 => .\Ai\ExpertSystem\__init__.py
222 => .\Ai\ExpertSystem\holmes
223 => .\Ai\ExpertSystem\holmes\README.1ST
Convert dir=.\Ai\ExpertSystem\holmes file=README.1ST? (y|Y)y
Type new file name (enter=readme.1st):
Renamed: .\Ai\ExpertSystem\holmes\README.1st
to: .\Ai\ExpertSystem\holmes\readme.1st
Press enter to continue
224 => .\Ai\ExpertSystem\holmes\README.2ND
Convert dir=.\Ai\ExpertSystem\holmes file=README.2ND? (y|Y)y
Type new file name (enter=readme.2nd): readme-more
Renamed: .\Ai\ExpertSystem\holmes\README.2nd
to: .\Ai\ExpertSystem\holmes\readme-more
Press enter to continue
 ...
 ...more deleted...
 ...
1471 => .\todos.py
1472 => .\tounix.py
1473 => .\xferall.linux.csh
Converted 2 files, visited 1473
This script could simply convert every all-uppercase name to an all-lowercase equivalent automatically, but that's potentially dangerous (some names might require mixed-case). Instead, it asks for input during the traversal, and shows the results of each renaming operation along the way.
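The split-and-join step and the lowercase default can be tried without renaming anything. The following is a minimal Python 3 sketch, not the book's script: propose_rename is a hypothetical helper that only computes the path the script would offer as its default, leaving the decision to the caller.

```python
import os

def propose_rename(fname):
    """Return the default new path for an all-uppercase name, else None."""
    fpath, oldfname = os.path.split(fname)      # directory part, final name
    if any(c.islower() for c in oldfname):      # any lowercase disqualifies
        return None
    # Tack the lowercased name back onto the unchanged directory part.
    return os.path.join(fpath, oldfname.lower())

old = os.path.join('Ai', 'ExpertSystem', 'holmes', 'README.1ST')
print(old, '=>', propose_rename(old))
```

Because only the last path component is lowercased, mixed-case directory names earlier in the path are left alone.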
5.3.1 Rewriting with os.path.walk
Notice, though, that the pattern-matching power of the find.find call goes completely unused in this script. Because it always must visit every file in the tree, the os.path.walk interface we studied in Chapter 2 would work just as well, and avoids any initial pause while a filename list is being collected (that pause is negligible here, but may be significant for larger trees). Example 5-7 is an equivalent version of this script that does its tree traversal with walk's callback-based model.
Example 5-7. PP2E\PyTools\fixnames_all2.py
###############################################################
# Use: "python ..\..\PyTools\fixnames_all2.py".
# same, but use the os.path.walk interface, not find.find;
# to make this work like the simple find version, puts off
# visiting directories until just before visiting their
# contents (find.find lists dir names before their contents);
# renaming dirs here can fail on case-sensitive platforms
# too--walk keeps extending paths containing old dir names;
###############################################################

import os
listonly = 0
from fixnames_all import convertOne

def visitname(fname):
    global ccount, vcount
    print vcount+1, '=>', fname
    if not listonly:
        ccount = ccount + convertOne(fname)
    vcount = vcount + 1

def visitor(myData, directoryName, filesInDirectory):  # called for each dir
    visitname(directoryName)                           # do dir we're in now,
    for fname in filesInDirectory:                     # and non-dir files here
        fpath = os.path.join(directoryName, fname)     # fnames have no dirpath
        if not os.path.isdir(fpath):
            visitname(fpath)

ccount = vcount = 0
os.path.walk('.', visitor, None)
print 'Converted %d files, visited %d' % (ccount, vcount)
This version does the same job, but visits one extra file (the topmost root directory), and may visit directories in a different order (os.listdir results are unordered). Both versions run in under a dozen seconds for the example directory tree on my computer.[6] We'll revisit this script, as well as the fixeoln line-end fixer, in the context of a general tree-walker class hierarchy later in this chapter.
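In Python 3 the os.path.walk call is gone, and os.walk is the usual replacement. As a hedged sketch only, the visiting order of the walker version (each directory first, then its non-directory entries) might be reproduced as follows. The name visit_names is hypothetical, the renaming step is left out, and the demo runs on a throwaway temporary tree rather than a real examples directory:

```python
import os
import tempfile

def visit_names(root='.'):
    """Yield each directory path, then the paths of the files inside it."""
    for dirpath, dirnames, filenames in os.walk(root):
        yield dirpath                           # the directory we're in now
        for fname in filenames:                 # non-directory entries only
            yield os.path.join(dirpath, fname)

# Demo on a temporary tree so nothing real is touched.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'Ai', 'HOLMES'))
    open(os.path.join(root, 'Ai', 'HOLMES', 'README.1ST'), 'w').close()
    for vcount, fname in enumerate(visit_names(root), start=1):
        print(vcount, '=>', os.path.relpath(fname, root))
```

As with the walker version, the root directory itself is the first name visited.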
[6] Very subtle thing: both versions of this script might fail on platforms where case matters, if they rename directories along the way. If a directory is renamed before the contents of that directory have been visited (e.g., a directory SPAM renamed to spam), then any later reference to the directory's contents using the old name (e.g., SPAM/filename) will no longer be valid on case-sensitive platforms. This can happen in the find.find version, because directories can and do show up in the result list before their contents. It's also a potential problem with the os.path.walk version, because the prior directory path (with original directory names) keeps being extended at each level of the tree. I only use this script on Windows (DOS), so I haven't been bitten by this in practice. Workarounds -- ordering find result lists, walking trees in a bottom-up fashion, making two distinct passes for files and directories, queuing up directory names on a list to be renamed later, or simply not renaming directories at all -- are all complex enough to be delegated to the realm of reader experiments. As a rule of thumb, changing a tree's names or structure while it is being walked is a risky venture.
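One of the workarounds mentioned in this note, walking the tree bottom-up, can be sketched in Python 3 with the topdown=False option of os.walk. This is an illustration under stated assumptions rather than a replacement for the book's scripts: lower_names_bottom_up is a hypothetical name, it renames without prompting, and it skips a name when a different file already occupies the lowercase target.

```python
import os
import tempfile

def lower_names_bottom_up(root):
    """Rename all-uppercase names to lowercase, deepest entries first."""
    renamed = []
    # topdown=False yields a directory only after all of its subdirectories,
    # so every path built here still uses names that exist on disk.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            if any(c.islower() for c in name):    # any lowercase disqualifies
                continue
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, name.lower())
            if new == old:                        # no letters in the name
                continue
            if os.path.exists(new) and not os.path.samefile(old, new):
                continue                          # would clobber another file
            os.rename(old, new)
            renamed.append((old, new))
    return renamed

# Demo on a temporary tree so nothing real is touched.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'SPAM'))
    open(os.path.join(root, 'SPAM', 'EGGS.PY'), 'w').close()
    for old, new in lower_names_bottom_up(root):
        print(os.path.relpath(old, root), '=>', os.path.relpath(new, root))
```

The file inside SPAM is renamed before SPAM itself, so no path ever refers to a directory name that has already changed.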