Visitor: Walking Trees Generically
Armed with the portable search_all script from Example 5-10, I was able to better pinpoint files to be edited, every time I changed the book examples tree structure. At least initially, I ran search_all to pick out suspicious files in one window, and edited each along the way by hand in another window.
Pretty soon, though, this became tedious too. Manually typing filenames into editor commands is no fun, especially when the number of files to edit is large. The search for "Part2" shown earlier returned 74 files, for instance. Since there are at least occasionally better things to do than manually start 74 editor sessions, I looked for a way to automatically run an editor on each suspicious file.
Unfortunately, search_all simply prints results to the screen. Although that text could be intercepted and parsed, a more direct approach that spawns edit sessions during the search may be easier, but may require major changes to the tree search script as currently coded. At this point, two thoughts came to mind.
First, I knew it would be easier in the long run to be able to add features to a general directory searcher as external components, not by changing the original script. Because editing files was just one possible extension (what about automating text replacements too?), a more generic, customizable, and reusable search component seemed the way to go.
Second, after writing a few directory walking utilities, it became clear that I was rewriting the same sort of code over and over again. Traversals could be even further simplified by wrapping common details for easier reuse. The os.path.walk tool helps, but its use tends to foster redundant operations (e.g., directory name joins), and its function-object-based interface doesn't quite lend itself to customization the way a class can.
Of course, both goals point to using an OO framework for traversals and searching. Example 5-11 is one concrete realization of these goals. It exports a general FileVisitor class that mostly just wraps os.path.walk for easier use and extension, as well as a generic SearchVisitor class that generalizes the notion of directory searches. By itself, SearchVisitor simply does what search_all did, but it also opens up the search process to customization -- bits of its behavior can be modified by overloading its methods in subclasses. Moreover, its core search logic can be reused everywhere we need to search; simply define a subclass that adds search-specific extensions.
Example 5-11. PP2E\PyTools\visitor.py

    #############################################################
    # Test: "python ....\PyTools\visitor.py testmask [string]".
    # Uses OOP, classes, and subclasses to wrap some of the
    # details of using os.path.walk to walk and search; testmask
    # is an integer bitmask with 1 bit per available selftest;
    # see also: visitor_edit/replace/find/fix*/.py subclasses,
    # and the fixsitename.py client script in Internet\Cgi-Web;
    #############################################################

    import os, sys, string
    listonly = 0

    class FileVisitor:
        """
        visits all non-directory files below startDir;
        override visitfile to provide a file handler
        """
        def __init__(self, data=None, listonly=0):
            self.context  = data
            self.fcount   = 0
            self.dcount   = 0
            self.listonly = listonly
        def run(self, startDir=os.curdir):                  # default start='.'
            os.path.walk(startDir, self.visitor, None)
        def visitor(self, data, dirName, filesInDir):       # called for each dir
            self.visitdir(dirName)                          # do this dir first
            for fname in filesInDir:                        # do non-dir files
                fpath = os.path.join(dirName, fname)        # fnames have no path
                if not os.path.isdir(fpath):
                    self.visitfile(fpath)
        def visitdir(self, dirpath):                        # called for each dir
            self.dcount = self.dcount + 1                   # override or extend me
            print dirpath, '...'
        def visitfile(self, filepath):                      # called for each file
            self.fcount = self.fcount + 1                   # override or extend me
            print self.fcount, '=>', filepath               # default: print name

    class SearchVisitor(FileVisitor):
        """
        search files at and below startDir for a string
        """
        skipexts = ['.gif', '.exe', '.pyc', '.o', '.a']     # skip binary files
        def __init__(self, key, listonly=0):
            FileVisitor.__init__(self, key, listonly)
            self.scount = 0
        def visitfile(self, fname):                         # test for a match
            FileVisitor.visitfile(self, fname)
            if not self.listonly:
                if os.path.splitext(fname)[1] in self.skipexts:
                    print 'Skipping', fname
                else:
                    text = open(fname).read()
                    if string.find(text, self.context) != -1:
                        self.visitmatch(fname, text)
                        self.scount = self.scount + 1
        def visitmatch(self, fname, text):                  # process a match
            raw_input('%s has %s' % (fname, self.context))  # override me lower

    # self-test logic
    dolist   = 1
    dosearch = 2    # 3=do list and search
    donext   = 4    # when next test added

    def selftest(testmask):
        if testmask & dolist:
            visitor = FileVisitor()
            visitor.run('.')
            print 'Visited %d files and %d dirs' % (visitor.fcount, visitor.dcount)
        if testmask & dosearch:
            visitor = SearchVisitor(sys.argv[2], listonly)
            visitor.run('.')
            print 'Found in %d files, visited %d' % (visitor.scount, visitor.fcount)

    if __name__ == '__main__':
        selftest(int(sys.argv[1]))    # e.g., 3 = dolist | dosearch
This module primarily serves to export classes for external use, but it does something useful when run standalone too. If you invoke it as a script with a single argument "1", it makes and runs a FileVisitor object, and prints an exhaustive listing of every file and directory at and below the place you are at when the script is invoked (i.e., ".", the current working directory):
    C:\temp>python %X%\PyTools\visitor.py 1
    . ...
    1 => .\autoexec.bat
    2 => .\cleanall.csh
    3 => .\echoEnvironment.pyw
    4 => .\Launcher.py
    5 => .\Launcher.pyc
    6 => .\Launch_PyGadgets.py
    7 => .\Launch_PyDemos.pyw
    ...more deleted...
    479 => .\Gui\Clock\plotterGui.py
    480 => .\Gui\Clock\plotterText.py
    481 => .\Gui\Clock\plotterText1.py
    482 => .\Gui\Clock\__init__.py
    .\Gui\gifs ...
    483 => .\Gui\gifs\frank.gif
    484 => .\Gui\gifs\frank.note
    485 => .\Gui\gifs\gilligan.gif
    486 => .\Gui\gifs\gilligan.note
    ...more deleted...
    1352 => .\PyTools\visitor_fixnames.py
    1353 => .\PyTools\visitor_find_quiet2.py
    1354 => .\PyTools\visitor_find.pyc
    1355 => .\PyTools\visitor_find_quiet1.py
    1356 => .\PyTools\fixeoln_one.doc.txt
    Visited 1356 files and 119 dirs
If you instead invoke this script with a "2" as its first argument, it makes and runs a SearchVisitor object, using the second argument as the search key. This form is equivalent to running the search_all.py script we met earlier; it pauses for an Enter key press after each matching file is reported (the lines ending in "has Part3" here):
    C:\temp\examples>python %X%\PyTools\visitor.py 2 Part3
    . ...
    1 => .\autoexec.bat
    2 => .\cleanall.csh
    .\cleanall.csh has Part3
    3 => .\echoEnvironment.pyw
    4 => .\Launcher.py
    .\Launcher.py has Part3
    5 => .\Launcher.pyc
    Skipping .\Launcher.pyc
    6 => .\Launch_PyGadgets.py
    7 => .\Launch_PyDemos.pyw
    8 => .\LaunchBrowser.out.txt
    9 => .\LaunchBrowser.py
    10 => .\Launch_PyGadgets_bar.pyw
    11 => .\makeall.csh
    .\makeall.csh has Part3
    ...
    ...more deleted
    ...
    1353 => .\PyTools\visitor_find_quiet2.py
    1354 => .\PyTools\visitor_find.pyc
    Skipping .\PyTools\visitor_find.pyc
    1355 => .\PyTools\visitor_find_quiet1.py
    1356 => .\PyTools\fixeoln_one.doc.txt
    Found in 49 files, visited 1356
Technically, passing this script a first argument "3" runs both a FileVisitor and a SearchVisitor (two separate traversals are performed). The first argument is really used as a bitmask to select one or more supported self-tests -- if a test's bit is on in the binary value of the argument, the test will be run. Because 3 is 011 in binary, it selects both a search (010) and a listing (001). In a more user-friendly system we might want to be more symbolic about that (e.g., check for "-search" and "-list" arguments), but bitmasks work just as well for this script's scope.
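To make the bitmask idea concrete, here is a minimal sketch, written for Python 3 rather than the Python 2 of the listing. The names DOLIST, DOSEARCH, selected_tests, and mask_from_flags are hypothetical and not part of visitor.py; the second function shows one way the more symbolic "-list" and "-search" arguments could map onto the same mask.

```python
# Hypothetical names; the values mirror dolist and dosearch in the listing.
DOLIST = 1      # binary 001
DOSEARCH = 2    # binary 010

def selected_tests(testmask):
    """Return the names of the self-tests whose bits are on in testmask."""
    names = []
    if testmask & DOLIST:
        names.append('list')
    if testmask & DOSEARCH:
        names.append('search')
    return names

def mask_from_flags(args):
    """Build the same bitmask from symbolic command-line flags."""
    mask = 0
    if '-list' in args:
        mask |= DOLIST
    if '-search' in args:
        mask |= DOSEARCH
    return mask
```

With these definitions, selected_tests(3) selects both tests, and mask_from_flags(['-search', '-list']) produces the value 3 again.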
5.5.1 Editing Files in Directory Trees
Now, after genericizing tree traversals and searches, it's an easy step to add automatic file editing in a brand-new, separate component. Example 5-12 defines a new EditVisitor class that simply customizes the visitmatch method of the SearchVisitor class, to open a text editor on the matched file. Yes, this is the complete program -- it needs to do something special only when visiting matched files, and so need provide only that behavior; the rest of the traversal and search logic is unchanged and inherited.
Example 5-12. PP2E\PyTools\visitor_edit.py

    ###############################################################
    # Use: "python PyTools\visitor_edit.py string".
    # add auto-editor start up to SearchVisitor in an external
    # component (subclass), not in-place changes; this version
    # automatically pops up an editor on each file containing the
    # string as it traverses; you can also use editor='edit' or
    # 'notepad' on windows; 'vi' and 'edit' run in console window;
    # editor=r'python Gui\TextEditor\textEditor.pyw' may work too;
    # caveat: we might be able to make this smarter by sending
    # a search command to go to the first match in some editors;
    ###############################################################

    import os, sys, string
    from visitor import SearchVisitor
    listonly = 0

    class EditVisitor(SearchVisitor):
        """
        edit files at and below startDir having string
        """
        editor = 'vi'  # ymmv
        def visitmatch(self, fname, text):
            os.system('%s %s' % (self.editor, fname))

    if __name__ == '__main__':
        visitor = EditVisitor(sys.argv[1], listonly)
        visitor.run('.')
        print 'Edited %d files, visited %d' % (visitor.scount, visitor.fcount)
When we make and run an EditVisitor, a text editor is started with the os.system command-line spawn call, which usually blocks its caller until the spawned program finishes. On my machines, each time this script finds a matched file during the traversal, it starts up the vi text editor within the console window where the script was started; exiting the editor resumes the tree walk.
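The blocking behavior is easy to see in isolation. The following is a Python 3 sketch, not the book's code: it uses subprocess.run in place of os.system, and a short-lived child Python process stands in for the editor. The run_and_wait name is hypothetical.

```python
import subprocess
import sys

def run_and_wait(argv):
    """Start a program and block until it exits; return its exit status."""
    completed = subprocess.run(argv)      # returns only after the child ends
    return completed.returncode

# A stand-in for an editor: a child Python process that exits at once.
status = run_and_wait([sys.executable, '-c', 'import sys; sys.exit(0)'])
```

Passing the command as a list rather than a single formatted string also sidesteps shell quoting problems for filenames that contain spaces.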
Let's find and edit some files. When run as a script, we pass this program the search string as a command argument (here, the string "-exec" is the search key, not an option flag). The root directory is always passed to the run method as ".", the current run directory. Traversal status messages show up in the console as before, but each matched file now automatically pops up in a text editor along the way. Here, the editor is started eight times:
    C:\...\PP2E>python PyTools\visitor_edit.py -exec
    1 => .\autoexec.bat
    2 => .\cleanall.csh
    3 => .\echoEnvironment.pyw
    4 => .\Launcher.py
    5 => .\Launcher.pyc
    Skipping .\Launcher.pyc
    ...more deleted...
    1340 => .\old_Part2\Basics\unpack2.py
    1341 => .\old_Part2\Basics\unpack2b.py
    1342 => .\old_Part2\Basics\unpack3.py
    1343 => .\old_Part2\Basics\__init__.py
    Edited 8 files, visited 1343
This, finally, is the exact tool I was looking for to simplify global book examples tree maintenance. After major changes to things like shared modules and file and directory names, I run this script on the examples root directory with an appropriate search string, and edit any files it pops up as needed. I still need to change files by hand in the editor, but that's often safer than blind global replacements.
5.5.2 Global Replacements in Directory Trees
But since I brought it up: given a general tree traversal class, it's easy to code a global search-and-replace subclass too. The SearchVisitor subclass in Example 5-13, ReplaceVisitor, customizes the visitmatch method to globally replace any appearances of one string with another, in all text files at and below a root directory. It also collects the names of all files that were changed in a list, just in case you wish to go through and verify the automatic edits applied (a text editor could be automatically popped up on each changed file, for instance).
Example 5-13. PP2E\PyTools\visitor_replace.py

    ################################################################
    # Use: "python PyTools\visitor_replace.py fromStr toStr".
    # does global search-and-replace in all files in a directory
    # tree--replaces fromStr with toStr in all text files; this
    # is powerful but dangerous!! visitor_edit.py runs an editor
    # for you to verify and make changes, and so is much safer;
    # use CollectVisitor to simply collect a list of matched files;
    ################################################################

    import os, sys, string
    from visitor import SearchVisitor
    listonly = 0

    class ReplaceVisitor(SearchVisitor):
        """
        change fromStr to toStr in files at and below startDir;
        files changed available in obj.changed list after a run
        """
        def __init__(self, fromStr, toStr, listonly=0):
            self.changed = []
            self.toStr   = toStr
            SearchVisitor.__init__(self, fromStr, listonly)
        def visitmatch(self, fname, text):
            fromStr, toStr = self.context, self.toStr
            text = string.replace(text, fromStr, toStr)
            open(fname, 'w').write(text)
            self.changed.append(fname)

    if __name__ == '__main__':
        if raw_input('Are you sure?') == 'y':
            visitor = ReplaceVisitor(sys.argv[1], sys.argv[2], listonly)
            visitor.run(startDir='.')
            print 'Visited %d files'  % visitor.fcount
            print 'Changed %d files:' % len(visitor.changed)
            for fname in visitor.changed: print fname
To run this script over a directory tree, go to the directory to be changed and run the following sort of command line, with "from" and "to" strings. On my current machine, doing this on a 1354-file tree and changing 75 files along the way takes roughly six seconds of real clock time when the system isn't particularly busy:
    C:\temp\examples>python %X%/PyTools/visitor_replace.py Part2 SPAM2
    Are you sure?y
    . ...
    1 => .\autoexec.bat
    2 => .\cleanall.csh
    3 => .\echoEnvironment.pyw
    4 => .\Launcher.py
    5 => .\Launcher.pyc
    Skipping .\Launcher.pyc
    6 => .\Launch_PyGadgets.py
    ...more deleted...
    1351 => .\PyTools\visitor_find_quiet2.py
    1352 => .\PyTools\visitor_find.pyc
    Skipping .\PyTools\visitor_find.pyc
    1353 => .\PyTools\visitor_find_quiet1.py
    1354 => .\PyTools\fixeoln_one.doc.txt
    Visited 1354 files
    Changed 75 files:
    .\Launcher.py
    .\LaunchBrowser.out.txt
    .\LaunchBrowser.py
    .\PyDemos.pyw
    .\PyGadgets.py
    .\README-PP2E.txt
    ...more deleted...
    .\PyTools\search_all.out.txt
    .\PyTools\visitor.out.txt
    .\PyTools\visitor_edit.py

    [to delete, use an empty toStr]
    C:\temp\examples>python %X%/PyTools/visitor_replace.py SPAM ""
This is both wildly powerful and dangerous. If the string to be replaced is something that can show up in places you didn't anticipate, you might just ruin an entire tree of files by running the ReplaceVisitor object defined here. On the other hand, if the string is something very specific, this object can obviate the need to manually edit suspicious files. For instance, we will use this approach to automatically change web site addresses in HTML files in Chapter 12; the addresses are likely too specific to show up in other places by chance.
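One way to blunt the danger is to count matches first and keep a backup when writing. The following Python 3 sketch is an illustration only, not part of the book's ReplaceVisitor: the replace_in_file name, the dry_run flag, the '.bak' suffix, and the assumption that files are UTF-8 text are all mine.

```python
import shutil

def replace_in_file(path, from_str, to_str, dry_run=True):
    """Return how many times from_str occurs in the file at path.

    The file is rewritten only when dry_run is false and there is at
    least one match; the original is first copied to path + '.bak'.
    """
    # newline='' leaves line ends untouched on both read and write
    with open(path, encoding='utf-8', newline='') as f:
        text = f.read()
    count = text.count(from_str)
    if count and not dry_run:
        shutil.copy2(path, path + '.bak')
        with open(path, 'w', encoding='utf-8', newline='') as f:
            f.write(text.replace(from_str, to_str))
    return count
```

A first pass with dry_run left on reports what would change; a second pass with dry_run=False applies the edits. A file that is not valid UTF-8 raises an exception on read rather than being silently rewritten.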
5.5.3 Collecting Matched Files in Trees
The scripts so far search and replace in directory trees, using the same traversal code base (module visitor). Suppose, though, that you just want to get a Python list of files in a directory containing a string. You could run a search and parse the output messages for "found" messages. Much simpler, simply knock off another SearchVisitor subclass to collect the list along the way, as in Example 5-14.
Example 5-14. PP2E\PyTools\visitor_collect.py

    #################################################################
    # Use: "python PyTools\visitor_collect.py searchstring".
    # CollectVisitor simply collects a list of matched files, for
    # display or later processing (e.g., replacement, auto-editing);
    #################################################################

    import os, sys, string
    from visitor import SearchVisitor

    class CollectVisitor(SearchVisitor):
        """
        collect names of files containing a string;
        run this and then fetch its obj.matches list
        """
        def __init__(self, searchstr, listonly=0):
            self.matches = []
            SearchVisitor.__init__(self, searchstr, listonly)
        def visitmatch(self, fname, text):
            self.matches.append(fname)

    if __name__ == '__main__':
        visitor = CollectVisitor(sys.argv[1])
        visitor.run(startDir='.')
        print 'Found these files:'
        for fname in visitor.matches: print fname
CollectVisitor is just tree search again, with a new kind of specialization -- collecting files, instead of printing messages. This class is useful from other scripts that mean to collect a matched files list for later processing; it can be run by itself as a script too:
    C:\...\PP2E>python PyTools\visitor_collect.py -exec
    ...
    ...more deleted...
    ...
    1342 => .\old_Part2\Basics\unpack2b.py
    1343 => .\old_Part2\Basics\unpack3.py
    1344 => .\old_Part2\Basics\__init__.py
    Found these files:
    .\package.csh
    .\README-PP2E.txt
    .\readme-old-pp1E.txt
    .\PyTools\cleanpyc.py
    .\PyTools\fixeoln_all.py
    .\System\Processes\output.txt
    .\Internet\Cgi-Web\fixcgi.py
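For comparison, the same collecting idea can be sketched as a plain function in Python 3, where os.walk replaces os.path.walk. This is an assumption-laden illustration rather than a port of the book's class: collect_matches is a hypothetical name, the skip list is copied from SearchVisitor.skipexts, and files are read as UTF-8 with undecodable bytes replaced.

```python
import os

def collect_matches(root, key, skipexts=('.gif', '.exe', '.pyc', '.o', '.a')):
    """Return paths of files at and below root whose text contains key."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1] in skipexts:
                continue                          # skip likely binary files
            path = os.path.join(dirpath, name)
            with open(path, encoding='utf-8', errors='replace') as f:
                if key in f.read():
                    matches.append(path)
    return matches
```

Because nothing is printed along the way, a calling script gets only the list back.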
5.5.3.1 Suppressing status messages
Here, the items in the collected list are displayed at the end -- all the files containing the string "-exec". Notice, though, that traversal status messages are still printed along the way (in fact, I deleted about 1600 lines of such messages here!). In a tool meant to be called from another script, that may be an undesirable side effect; the calling script's output may be more important than the traversal's.
We could add mode flags to SearchVisitor to turn off status messages, but that makes it more complex. Instead, the following two files show how we might go about collecting matched filenames without letting any traversal messages show up in the console, all without changing the original code base. The first, shown in Example 5-15, simply takes over and copies the search logic, without print statements. It's a bit redundant with SearchVisitor, but only in a few lines of mimicked code.
Example 5-15. PP2E\PyTools\visitor_collect_quiet1.py

    ##############################################################
    # Like visitor_collect, but avoid traversal status messages
    ##############################################################

    import os, sys, string
    from visitor import FileVisitor, SearchVisitor

    class CollectVisitor(FileVisitor):
        """
        collect names of files containing a string, silently;
        """
        skipexts = SearchVisitor.skipexts
        def __init__(self, searchStr):
            self.matches = []
            self.context = searchStr
        def visitdir(self, dname): pass
        def visitfile(self, fname):
            if (os.path.splitext(fname)[1] not in self.skipexts and
                string.find(open(fname).read(), self.context) != -1):
                self.matches.append(fname)

    if __name__ == '__main__':
        visitor = CollectVisitor(sys.argv[1])
        visitor.run(startDir='.')
        print 'Found these files:'
        for fname in visitor.matches: print fname
When this class is run, only the contents of the matched filenames list show up at the end; no status messages appear during the traversal. Because of that, this form may be more useful as a general-purpose tool used by other scripts:
    C:\...\PP2E>python PyTools\visitor_collect_quiet1.py -exec
    Found these files:
    .\package.csh
    .\README-PP2E.txt
    .\readme-old-pp1E.txt
    .\PyTools\cleanpyc.py
    .\PyTools\fixeoln_all.py
    .\System\Processes\output.txt
    .\Internet\Cgi-Web\fixcgi.py
A more interesting and less redundant way to suppress printed text during a traversal is to apply the stream redirection tricks we met in Chapter 2. Example 5-16 sets sys.stdout to a NullOut object that throws away all printed text for the duration of the traversal (its write method does nothing).
The only real complication with this scheme is that there is no good place to insert a restoration of sys.stdout at the end of the traversal; instead, we code the restore in the __del__ destructor method, and require clients to delete the visitor to resume printing as usual. An explicitly called method would work just as well, if you prefer less magical interfaces.
Example 5-16. PP2E\PyTools\visitor_collect_quiet2.py

    ##############################################################
    # Like visitor_collect, but avoid traversal status messages
    ##############################################################

    import os, sys, string
    from visitor import SearchVisitor

    class NullOut:
        def write(self, line): pass

    class CollectVisitor(SearchVisitor):
        """
        collect names of files containing a string, silently
        """
        def __init__(self, searchstr, listonly=0):
            self.matches = []
            self.saveout, sys.stdout = sys.stdout, NullOut()
            SearchVisitor.__init__(self, searchstr, listonly)
        def __del__(self):
            sys.stdout = self.saveout
        def visitmatch(self, fname, text):
            self.matches.append(fname)

    if __name__ == '__main__':
        visitor = CollectVisitor(sys.argv[1])
        visitor.run(startDir='.')
        matches = visitor.matches
        del visitor
        print 'Found these files:'
        for fname in matches: print fname
When this script is run, output is identical to the prior run -- just the matched filenames at the end. Perhaps better still, why not code and debug just one verbose CollectVisitor utility class, and require clients to wrap calls to its run method in the redirect.redirect function we wrote back in Example 2-10?
    >>> from PP2E.PyTools.visitor_collect import CollectVisitor
    >>> from PP2E.System.Streams.redirect import redirect
    >>> walker = CollectVisitor('-exec')              # object to find '-exec'
    >>> output = redirect(walker.run, ('.',), '')     # function, args, input
    >>> for line in walker.matches: print line        # print items in list
    ...
    .\package.csh
    .\README-PP2E.txt
    .\readme-old-pp1E.txt
    .\PyTools\cleanpyc.py
    .\PyTools\fixeoln_all.py
    .\System\Processes\output.txt
    .\Internet\Cgi-Web\fixcgi.py
The redirect call employed here resets standard input and output streams to file-like objects for the duration of any function call; because of that, it's a more general way to suppress output than recoding every outputter. Here, it has the effect of intercepting (and hence suppressing) printed messages during a walker.run('.') traversal. They really are printed, but show up in the string result of the redirect call, not on the screen:
    >>> output[:60]
    '. ...\0121 => .\\autoexec.bat\0122 => .\\cleanall.csh\0123 => .\\echoEnv'
    >>> import string
    >>> len(output), len(string.split(output, '\n'))  # bytes, lines
    (67609, 1592)
    >>> walker.matches
    ['.\\package.csh', '.\\README-PP2E.txt', '.\\readme-old-pp1E.txt',
    '.\\PyTools\\cleanpyc.py', '.\\PyTools\\fixeoln_all.py',
    '.\\System\\Processes\\output.txt', '.\\Internet\\Cgi-Web\\fixcgi.py']
Because redirect saves printed text in a string, it may be less appropriate than the two quiet CollectVisitor variants for functions that generate much output. Here, for example, 67,609 bytes of output was queued up in an in-memory string (see the len call results); such a buffer may or may not be significant in some applications.
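In Python 3, the same capture-into-a-string effect can be sketched with the standard library's contextlib.redirect_stdout and io.StringIO. This is not the book's redirect.redirect function (which also resets standard input and returns only the output string); capture_output and noisy_walk are hypothetical names used for illustration.

```python
import contextlib
import io

def capture_output(func, *args):
    """Call func(*args) and return (result, everything it printed)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):   # sys.stdout restored on exit
        result = func(*args)
    return result, buffer.getvalue()

def noisy_walk(names):
    """Stand-in for a verbose traversal: prints one status line per name."""
    for count, name in enumerate(names, 1):
        print(count, '=>', name)
    return len(names)

visited, output = capture_output(noisy_walk, ['a.py', 'b.py'])
```

As with the book's version, all the printed text accumulates in memory, so the same caution about functions that generate a lot of output applies.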
In more general terms, redirecting sys.stdout to dummy objects as done here is a simple way to turn off outputs (and is equivalent to the Unix notion of redirecting output to file /dev/null -- a file that discards everything sent to it). For instance, we'll pull this trick out of the bag again in the context of server-side Internet scripting, to prevent utility status messages from showing up in generated web page output streams.[10]
[10] For the impatient: see commonhtml.runsilent in the PyMailCgi system presented in Chapter 13. It's a variation on redirect.redirect that discards output as it is printed (instead of retaining it in a string), returns the return value of the function called (not the output string), and lets exceptions pass via a try/finally statement (instead of catching and reporting them with a try/except). It's still redirection at work, though.
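The variation described in the footnote can be sketched in a few lines. The code below is a Python 3 illustration of the idea only, not the book's commonhtml.runsilent; run_silent is a hypothetical name, and NullOut is modeled on the class of the same name in Example 5-16.

```python
import sys

class NullOut:
    """File-like object that discards whatever is written to it."""
    def write(self, text):
        pass
    def flush(self):
        pass

def run_silent(func, *args):
    """Call func(*args) with printing suppressed; return func's result."""
    saved = sys.stdout
    sys.stdout = NullOut()
    try:
        return func(*args)
    finally:
        sys.stdout = saved    # runs on normal return and on exceptions
```

Because the restore lives in a finally clause, an exception raised by the called function still propagates to the caller, and printing resumes either way.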
5.5.4 Recoding Fixers with Visitors
Be warned: once you've written and debugged a class that knows how to do something useful like walking directory trees, it's easy for it to spread throughout your system utility libraries. Of course, that's the whole point of code reuse. For instance, very soon after writing the visitor classes presented in the prior sections, I recoded both the fixnames_all.py and fixeoln_all.py directory walker scripts listed earlier in Examples 5-6 and 5-4, respectively, to use visitor instead of proprietary tree-walk logic (they both originally used find.find). Example 5-17 combines the original convertEndlines function (to fix end-of-lines in a single file) with visitor's tree walker class, to yield an alternative implementation of the line-end converter for directory trees.
Example 5-17. PP2E\PyTools\visitor_fixeoln.py

    ##############################################################
    # Use: "python visitor_fixeoln.py todos|tounix".
    # recode fixeoln_all.py as a visitor subclass: this version
    # uses os.path.walk, not find.find to collect all names first;
    # limited but fast: if os.path.splitext(fname)[1] in patts:
    ##############################################################

    import visitor, sys, fnmatch, os
    from fixeoln_dir import patts
    from fixeoln_one import convertEndlines

    class EolnFixer(visitor.FileVisitor):
        def visitfile(self, fullname):                        # match on basename
            basename = os.path.basename(fullname)             # to make result same
            for patt in patts:                                # else visits fewer
                if fnmatch.fnmatch(basename, patt):
                    convertEndlines(self.context, fullname)
                    self.fcount = self.fcount + 1             # could break here
                                                              # but results differ
    if __name__ == '__main__':
        walker = EolnFixer(sys.argv[1])
        walker.run()
        print 'Files matched (converted or not):', walker.fcount
As we saw in Chapter 2, the built-in fnmatch module performs Unix shell-like filename matching; this script uses it to match names to the previous version's filename patterns (simply looking for filename extensions after a "." is simpler, but not as general):
    C:\temp\examples>python %X%/PyTools/visitor_fixeoln.py tounix
    . ...
    Changing .\echoEnvironment.pyw
    Changing .\Launcher.py
    Changing .\Launch_PyGadgets.py
    Changing .\Launch_PyDemos.pyw
    ...more deleted...
    Changing .\PyTools\visitor_find.py
    Changing .\PyTools\visitor_fixnames.py
    Changing .\PyTools\visitor_find_quiet2.py
    Changing .\PyTools\visitor_find_quiet1.py
    Changing .\PyTools\fixeoln_one.doc.txt
    Files matched (converted or not): 1065

    C:\temp\examples>python %X%/PyTools/visitor_fixeoln.py tounix
    ...more deleted...
    .\Extend\Swig\Shadow ...
    . ...
    .\EmbExt\Exports ...
    .\EmbExt\Exports\ClassAndMod ...
    .\EmbExt\Regist ...
    .\PyTools ...
    Files matched (converted or not): 1065
If you run this script and the original fixeoln_all.py on the book examples tree, you'll notice that this version visits two fewer matched files. This simply reflects the fact that fixeoln_all also collects and skips over two directory names for its patterns in the find.find result (both called "Output"). In all other ways, this version works the same way even when it could do better -- adding a break statement after the convertEndlines call here avoids visiting files that appear redundantly in the original's find results lists.
The first command here takes roughly six seconds on my computer, and the second takes about four (there are no files to be converted). That's faster than the eight- and six-second figures for the original find.find-based version of this script, but they differ in amount of output, and benchmarks are usually much more subtle than you imagine. Most of the real clock time is likely spent scrolling text in the console, not doing any real directory processing. Since both are plenty fast for their intended purposes, finer-grained performance figures are left as exercises.
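The pattern test at the heart of EolnFixer can be tried on its own. In this Python 3 sketch, the patterns list is illustrative (it is not the patts list imported from fixeoln_dir) and matches_any is a hypothetical helper.

```python
import fnmatch
import os

patterns = ['*.py', '*.txt', '*.c']      # illustrative, not the book's patts

def matches_any(path, patterns):
    """True if the basename of path matches at least one shell-style pattern."""
    basename = os.path.basename(path)
    return any(fnmatch.fnmatch(basename, patt) for patt in patterns)
```

Since any stops at the first pattern that matches, each file is reported at most once, which is the effect the break mentioned in the listing's comments would have. Note also that '*.py' does not match 'Launcher.pyc', because a pattern must match the whole name.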
The script in Example 5-18 combines the original convertOne function (to rename a single file or directory) with the visitor's tree walker class, to create a directory tree-wide fix for uppercase filenames. Notice that we redefine both file and directory visitation methods here, as we need to rename both.
Example 5-18. PP2E\PyTools\visitor_fixnames.py

    ###############################################################
    # recode fixnames_all.py name case fixer with the Visitor class
    # note: "from fixnames_all import convertOne" doesn't help at
    # top-level of the fixnames class, since it is assumed to be a
    # method and called with extra self argument (an exception);
    ###############################################################

    from visitor import FileVisitor

    class FixnamesVisitor(FileVisitor):
        """
        check filenames at and below startDir for uppercase
        """
        import fixnames_all
        def __init__(self, listonly=0):
            FileVisitor.__init__(self, listonly=listonly)
            self.ccount = 0
        def rename(self, pathname):
            if not self.listonly:
                convertflag = self.fixnames_all.convertOne(pathname)
                self.ccount = self.ccount + convertflag
        def visitdir(self, dirname):
            FileVisitor.visitdir(self, dirname)
            self.rename(dirname)
        def visitfile(self, filename):
            FileVisitor.visitfile(self, filename)
            self.rename(filename)

    if __name__ == '__main__':
        walker = FixnamesVisitor()
        walker.run()
        allnames = walker.fcount + walker.dcount
        print 'Converted %d files, visited %d' % (walker.ccount, allnames)
This version is run like the original find.find based version, fixnames_all, but visits one more name (the top-level root directory), and there is no initial delay while filenames are collected on a list -- we're using os.path.walk again, not find.find. It's also close to the original os.path.walk version of this script, but is based on a class hierarchy, not direct function callbacks:
    C:\temp\examples>python %X%/PyTools/visitor_fixnames.py
    ...more deleted...
    303 => .\__init__.py
    304 => .\__init__.pyc
    305 => .\Ai\ExpertSystem\holmes.tar
    306 => .\Ai\ExpertSystem\TODO
    Convert dir=.\Ai\ExpertSystem file=TODO? (y|Y)
    307 => .\Ai\ExpertSystem\__init__.py
    308 => .\Ai\ExpertSystem\holmes\cnv
    309 => .\Ai\ExpertSystem\holmes\README.1ST
    Convert dir=.\Ai\ExpertSystem\holmes file=README.1ST? (y|Y)
    ...more deleted...
    1353 => .\PyTools\visitor_find.pyc
    1354 => .\PyTools\visitor_find_quiet1.py
    1355 => .\PyTools\fixeoln_one.doc.txt
    Converted 1 files, visited 1474
Both of these fixer scripts work roughly the same as the originals, but because the directory walking logic lives in just one file (visitor.py), it only needs to be debugged once. Moreover, improvements in that file will automatically be inherited by every directory-processing tool derived from its classes. Even when coding system-level scripts, reuse and reduced redundancy pay off in the end.
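As one illustration of that point, os.path.walk no longer exists in Python 3, and a base class like FileVisitor is the only place that would need to change. The following is a rough Python 3 sketch under that assumption, not the book's code: it swaps in os.walk, and SizeVisitor is a hypothetical subclass added to show that clients override the same methods as before. Visiting order may differ from the os.path.walk version.

```python
import os

class FileVisitor:
    """Visit every directory and non-directory file at and below a root."""
    def __init__(self, data=None, listonly=False):
        self.context = data
        self.fcount = 0
        self.dcount = 0
        self.listonly = listonly

    def run(self, startDir=os.curdir):
        # os.walk yields (dirpath, subdirectory names, file names) per directory
        for dirpath, dirnames, filenames in os.walk(startDir):
            self.visitdir(dirpath)
            for fname in filenames:
                self.visitfile(os.path.join(dirpath, fname))

    def visitdir(self, dirpath):               # override or extend me
        self.dcount += 1
        print(dirpath, '...')

    def visitfile(self, filepath):             # override or extend me
        self.fcount += 1
        print(self.fcount, '=>', filepath)

class SizeVisitor(FileVisitor):
    """Hypothetical client: silently total the sizes of all files visited."""
    def __init__(self):
        FileVisitor.__init__(self)
        self.total = 0
    def visitdir(self, dirpath):
        self.dcount += 1                       # count, but print nothing
    def visitfile(self, filepath):
        self.fcount += 1
        self.total += os.path.getsize(filepath)
```

Subclasses written against visitdir and visitfile would not need to know that the traversal underneath them had been replaced.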
5.5.5 Fixing File Permissions in Trees
Just in case the preceding visitor-client sections weren't quite enough to convince you of the power of code reuse, another piece of evidence surfaced very late in this book project. It turns out that copying files off a CD using Windows drag-and-drop makes them read-only in the copy. That's less than ideal for the book examples directory on the enclosed CD (see http://examples.oreilly.com/python2) -- you must copy the directory tree onto your hard drive to be able to experiment with program changes (naturally, files on CD can't be changed in place). But if you copy with drag-and-drop, you may wind up with a tree of over 1000 read-only files.
Since drag-and-drop is perhaps the most common way to copy off a CD on Windows, I needed a portable and easy-to-use way to undo the read-only setting. Asking readers to make these all writable by hand would be impolite to say the least. Writing a full-blown install system seemed like overkill. Providing different fixes for different platforms doubles or triples the complexity of the task.
Much better, the Python script in Example 5-19 can be run in the root of the copied examples directory to repair the damage of a read-only drag-and-drop operation. It specializes the traversal implemented by the FileVisitor class again -- this time to run an os.chmod call on every file and directory visited along the way.
Example 5-19. PP2E\PyTools\fixreadonly-all.py

    #!/usr/bin/env python
    ###############################################################
    # Use: python PyTools\fixreadonly-all.py
    # run this script in the top-level examples directory after
    # copying all examples off the book's CD-ROM, to make all
    # files writeable again--by default, copying files off the
    # CD with Windows drag-and-drop (at least) creates them as
    # read-only on your hard drive; this script traverses entire
    # dir tree at and below the dir it is run in (all subdirs);
    ###############################################################

    import os, string
    from PP2E.PyTools.visitor import FileVisitor       # os.path.walk wrapper
    listonly = 0

    class FixReadOnly(FileVisitor):
        def __init__(self, listonly=0):
            FileVisitor.__init__(self, listonly=listonly)
        def visitdir(self, dname):                      # name must match base class
            FileVisitor.visitdir(self, dname)
            if self.listonly:
                return
            os.chmod(dname, 0777)
        def visitfile(self, fname):
            FileVisitor.visitfile(self, fname)
            if self.listonly:
                return
            os.chmod(fname, 0777)

    if __name__ == '__main__':
        # don't run auto if clicked
        go = raw_input('This script makes all files writeable; continue?')
        if go != 'y':
            raw_input('Canceled - hit enter key')
        else:
            walker = FixReadOnly(listonly)
            walker.run()
            print 'Visited %d files and %d dirs' % (walker.fcount, walker.dcount)
As we saw in Chapter 2, the built-in os.chmod call changes the permission settings on an external file (here, to 0777 -- global read, write, and execute permissions). Because os.chmod and the FileVisitor's operations are portable, this same script will work to set permissions in an entire tree on both Windows and Unix-like platforms. Notice that it asks whether you really want to proceed when it first starts up, just in case someone accidentally clicks the file's name in an explorer GUI. Also note that Python must be installed before this script can be run to make files writable; that seems a fair assumption to make of users about to change Python scripts.
    C:\temp\examples>python PyTools\fixreadonly-all.py
    This script makes all files writeable; continue?y
    . ...
    1 => .\autoexec.bat
    2 => .\cleanall.csh
    3 => .\echoEnvironment.pyw
    ...more deleted...
    1352 => .\PyTools\visitor_find.pyc
    1353 => .\PyTools\visitor_find_quiet1.py
    1354 => .\PyTools\fixeoln_one.doc.txt
    Visited 1354 files and 119 dirs
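For readers on Python 3, two details differ: octal literals are spelled 0o777 (the 0777 form is a syntax error there), and os.walk takes the place of os.path.walk. The sketch below is a hypothetical variation rather than a port of the script: make_writable is my own name, and instead of forcing every mode to 0o777 it only adds the owner's write permission to whatever mode is already set.

```python
import os
import stat

def make_writable(root):
    """Add owner-write permission to every file and directory under root.

    Returns the number of paths whose mode had to be changed.
    """
    changed = 0
    for dirpath, dirnames, filenames in os.walk(root):
        paths = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in paths:
            mode = os.stat(path).st_mode
            if not mode & stat.S_IWUSR:          # currently read-only?
                os.chmod(path, mode | stat.S_IWUSR)
                changed += 1
    return changed
```

Leaving the other permission bits alone is a design choice for this sketch; on Windows, os.chmod only toggles the read-only flag, so the two approaches have much the same effect there.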