Monitoring Download Progress

One potential weakness in the examples presented so far in this chapter is that there hasn't been a way to monitor a download in progress. Sure, it's nice that a Deferred will pass you the results of a page once it's completely downloaded, but sometimes what you really need is to keep an eye on the download as it's happening.

3.5.1. How Do I Do That?

Again, the utility functions provided by twisted.web.client don't give you quite enough control. Define a subclass of client.HTTPDownloader, the factory class used for downloading a web page to a file. By overriding a couple of methods, you can keep track of a download in progress. The webdownload.py script in Example 3-6 shows how.

Example 3-6. webdownload.py

from twisted.internet import reactor
from twisted.web import client

class HTTPProgressDownloader(client.HTTPDownloader):

    def gotHeaders(self, headers):
        if self.status == '200': # page data is on the way
            if headers.has_key('content-length'):
                self.totalLength = int(headers['content-length'][0])
            else:
                self.totalLength = 0
            self.currentLength = 0.0
            print '' # blank line for the first progress report to overwrite
        return client.HTTPDownloader.gotHeaders(self, headers)

    def pagePart(self, data):
        if self.status == '200':
            self.currentLength += len(data)
            if self.totalLength:
                percent = "%i%%" % (
                    (self.currentLength/self.totalLength)*100)
            else:
                percent = '%dK' % (self.currentLength/1000)
            # \033[1F moves the cursor up to the start of the previous line
            print "\033[1FProgress: " + percent
        return client.HTTPDownloader.pagePart(self, data)

def downloadWithProgress(url, file, contextFactory=None, *args, **kwargs):
    scheme, host, port, path = client._parse(url)
    factory = HTTPProgressDownloader(url, file, *args, **kwargs)
    if scheme == 'https':
        from twisted.internet import ssl
        if contextFactory is None:
            contextFactory = ssl.ClientContextFactory()
        reactor.connectSSL(host, port, factory, contextFactory)
    else:
        reactor.connectTCP(host, port, factory)
    return factory.deferred

if __name__ == "__main__":
    import sys

    def downloadComplete(result):
        print "Download Complete."
        reactor.stop()

    def downloadError(failure):
        print "Error:", failure.getErrorMessage()
        reactor.stop()

    url, outputFile = sys.argv[1:]
    downloadWithProgress(url, outputFile).addCallback(
        downloadComplete).addErrback(
        downloadError)
    reactor.run()

Run webdownload.py with two arguments: the URL of a page to download and a filename in which to save it. As the command works, it will print updates on the download progress:

$ python webdownload.py http://www.oreilly.com/ oreilly.html
Progress: 100% <- updated during the download
Download Complete.

If the web server doesn't return a Content-Length header indicating the total length of the download, it isn't possible to calculate the percentage complete. In this case, webdownload.py prints the number of kilobytes downloaded:

$ python webdownload.py http://www.slashdot.org/ slashdot.html
Progress: 60K <- updated during the download
Download Complete.
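
The choice between a percentage and a kilobyte count can be looked at in isolation. The following is a minimal sketch, not part of webdownload.py: format_progress is a hypothetical helper name, and it reuses the same format strings and the same 1000-bytes-per-K convention that Example 3-6 uses. It is written so that it also runs under Python 3.

```python
def format_progress(current_bytes, total_bytes):
    """Return a percentage when the total size is known,
    otherwise the number of kilobytes received so far."""
    if total_bytes:
        # A Content-Length was supplied, so a percentage can be computed.
        return "%i%%" % (float(current_bytes) / total_bytes * 100)
    # No Content-Length: fall back to kilobytes (1 K = 1000 bytes here).
    return "%dK" % (current_bytes / 1000.0)


print(format_progress(512, 2048))   # total known
print(format_progress(60000, 0))    # total unknown
```

The first call prints 25% and the second prints 60K. A total of 0 is treated as "unknown", which matches how gotHeaders sets totalLength when the header is missing.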

 

3.5.2. How Does That Work?

HTTPProgressDownloader is a subclass of client.HTTPDownloader. It overrides the gotHeaders method to check for a Content-Length header that would indicate the total size of the page being downloaded. It also overrides the pagePart method, which is called each time a chunk of page data is received, to keep track of the number of bytes downloaded so far.
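
To see the bookkeeping without a network connection, the two overridden methods can be imitated by a plain class. This is only a sketch under stated assumptions: ProgressTracker, got_headers, and page_part are hypothetical names, not Twisted APIs, and the headers argument is assumed to be a dictionary mapping lowercase header names to lists of values, which is the shape Example 3-6 indexes with ['content-length'][0].

```python
class ProgressTracker(object):
    """Twisted-free stand-in that mirrors the counters kept by
    HTTPProgressDownloader (hypothetical, for illustration only)."""

    def __init__(self):
        self.total_length = 0
        self.current_length = 0

    def got_headers(self, headers):
        # Record the advertised size, or 0 when no Content-Length is present.
        values = headers.get('content-length')
        self.total_length = int(values[0]) if values else 0
        self.current_length = 0

    def page_part(self, data):
        # Called once per chunk: add its size to the running total.
        self.current_length += len(data)
        return self.current_length


tracker = ProgressTracker()
tracker.got_headers({'content-length': ['10']})
for chunk in (b'abcd', b'efgh', b'ij'):
    received = tracker.page_part(chunk)
    print("%d of %d bytes" % (received, tracker.total_length))
```

In the real class, each method finishes by calling the parent implementation so that client.HTTPDownloader still processes the headers and writes the data to the file; the sketch leaves that part out.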

Each time a chunk of data comes in, HTTPProgressDownloader prints out a progress report. The string \033[1F is a terminal escape sequence: \033 is the ESC character, and [1F tells the terminal to move the cursor to the start of the preceding line. Each progress report is therefore written over the previous one, which makes it look like the progress information is being updated in place.
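
The overwrite effect can be tried on its own in any terminal that honors ANSI escape sequences. The sketch below is an illustration rather than code from webdownload.py: progress_line and show_progress are hypothetical helper names, and the list of progress values is made up.

```python
import sys

# ESC [ 1 F: move the cursor to the start of the previous line.
CURSOR_PREVIOUS_LINE = "\033[1F"


def progress_line(text):
    """Return a report that overwrites the line printed before it."""
    return CURSOR_PREVIOUS_LINE + "Progress: " + text


def show_progress(steps, stream=sys.stdout):
    # Reserve a blank line first, as the empty print in gotHeaders does,
    # so that the first report has a line to overwrite.
    stream.write("\n")
    for step in steps:
        stream.write(progress_line(step) + "\n")
    stream.flush()


show_progress(["25%", "50%", "100%"])
```

One caveat with this approach: if a later report is shorter than the one it replaces, the leftover characters stay on screen. Appending \033[K, which erases from the cursor to the end of the line, is one way to avoid that.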

The downloadWithProgress function contains code similar to that in Example 3-5 for parsing the requested URL, creating the HTTPProgressDownloader factory object, and initializing the connection. downloadComplete and downloadError are simple callback and errback handlers that print a message and stop the reactor.
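
The URL-parsing step can be approximated with the standard library. The sketch below is an assumption-laden stand-in, not the implementation of client._parse (a private Twisted helper whose exact behavior is not shown in this section): parse_url and DEFAULT_PORTS are hypothetical names, and the function returns the same four values that downloadWithProgress unpacks.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2

# Assumed defaults when the URL does not name a port.
DEFAULT_PORTS = {'http': 80, 'https': 443}


def parse_url(url):
    """Split a URL into (scheme, host, port, path)."""
    parts = urlparse(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    path = parts.path or '/'
    if parts.query:
        # Keep the query string with the path, since both are sent
        # to the server in the request line.
        path += '?' + parts.query
    return parts.scheme, parts.hostname, port, path


print(parse_url("https://www.oreilly.com/catalog?x=1"))
```

With the scheme in hand, downloadWithProgress chooses reactor.connectSSL for https URLs and reactor.connectTCP for everything else, passing the host, port, and factory in both cases.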
