Downloading Messages from an IMAP Mailbox
This lab demonstrates how to download copies of all the messages in an IMAP mailbox. You can do this using three methods of IMAP4Client: select, fetchUID, and fetchMessage.
7.5.1. How Do I Do That?
Create a subclass of twisted.protocols.imap4.IMAP4Client. Call select to select the mailbox you want to work with. Call fetchUID to retrieve the list of message identifiers for the mailbox, and then use fetchMessage to download each message, as demonstrated in Example 7-6.
Example 7-6. imapdownload.py
    from twisted.protocols import imap4
    from twisted.internet import protocol, defer
    import email

    class IMAPDownloadProtocol(imap4.IMAP4Client):

        def serverGreeting(self, capabilities):
            login = self.login(self.factory.username, self.factory.password)
            login.addCallback(self.__loggedIn)
            login.chainDeferred(self.factory.deferred)

        def __loggedIn(self, result):
            return self.select(self.factory.mailbox).addCallback(
                self.__selectedMailbox)

        def __selectedMailbox(self, result):
            # get a list of all message IDs
            allMessages = imap4.MessageSet(1, None)
            return self.fetchUID(allMessages, True).addCallback(
                self.__gotUIDs)

        def __gotUIDs(self, uidResults):
            self.messageUIDs = [result['UID'] for result in uidResults.values()]
            self.messageCount = len(self.messageUIDs)
            print "%i messages in %s." % (self.messageCount, self.factory.mailbox)
            return self.fetchNextMessage()

        def fetchNextMessage(self):
            if self.messageUIDs:
                nextUID = self.messageUIDs.pop(0)
                messageListToFetch = imap4.MessageSet(nextUID)
                print "Downloading message %i of %i" % (
                    self.messageCount-len(self.messageUIDs), self.messageCount)
                return self.fetchMessage(messageListToFetch, True).addCallback(
                    self.__gotMessage)
            else:
                # all done!
                return self.logout().addCallback(
                    lambda _: self.transport.loseConnection())

        def __gotMessage(self, fetchResults):
            messageData = fetchResults.values()[0]['RFC822']
            self.factory.handleMessage(messageData)
            return self.fetchNextMessage()

        def connectionLost(self, reason):
            if not self.factory.deferred.called:
                # connection was lost unexpectedly!
                self.factory.deferred.errback(reason)

    class IMAPDownloadFactory(protocol.ClientFactory):
        protocol = IMAPDownloadProtocol

        def __init__(self, username, password, mailbox, output):
            self.username = username
            self.password = password
            self.mailbox = mailbox
            self.output = output
            self.deferred = defer.Deferred()

        def handleMessage(self, messageData):
            parsedMessage = email.message_from_string(messageData)
            # unixfrom=True adds the "From " line that separates mbox entries
            self.output.write(parsedMessage.as_string(unixfrom=True))
            self.output.write('\n')

        def clientConnectionFailed(self, connection, reason):
            self.deferred.errback(reason)

    if __name__ == "__main__":
        from twisted.internet import reactor
        import sys, getpass

        def handleError(error):
            print >> sys.stderr, "Error:", error.getErrorMessage()
            reactor.stop()

        if len(sys.argv) != 5:
            usage = "Usage: %s server user mailbox outputfile" % (
                sys.argv[0])
            print >> sys.stderr, usage
            sys.exit(1)

        server = sys.argv[1]
        user = sys.argv[2]
        mailbox = sys.argv[3]
        outputfile = file(sys.argv[4], 'w+b')
        password = getpass.getpass("Password: ")
        factory = IMAPDownloadFactory(user, password, mailbox, outputfile)
        factory.deferred.addCallback(lambda _: reactor.stop()).addErrback(
            handleError)
        reactor.connectTCP(server, 143, factory)
        reactor.run()
Run imapdownload.py from the command line with four arguments: the IMAP server, your login, the mailbox from which messages should be downloaded, and the name of the file to which they will be written. It will print out a progress report as it downloads the messages:
    $ python imapdownload.py imap.myisp.com mylogin Inbox imap-Inbox.mbox
    Password:
    31 messages in Inbox.
    Downloading message 1 of 31
    Downloading message 2 of 31
    Downloading message 3 of 31
    ...
    Downloading message 30 of 31
    Downloading message 31 of 31
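Because each message is written with a leading "From " line, the output file should be readable as an mbox file. The following is a minimal sketch, using only the Python 3 standard library, of how you might check the result. The helper names append_message and list_subjects, the sample messages, and the temporary file path are all made up for illustration; they are not part of imapdownload.py.

```python
import email
import mailbox
import os
import tempfile


def append_message(output, message_data):
    """Write one raw message as an mbox entry, followed by a blank line."""
    parsed = email.message_from_string(message_data)
    # unixfrom=True prepends the "From " separator line that mbox readers expect.
    output.write(parsed.as_string(unixfrom=True))
    output.write("\n")


def list_subjects(path):
    """Return the Subject header of every message found in an mbox file."""
    box = mailbox.mbox(path, create=False)
    try:
        return [message["Subject"] for message in box]
    finally:
        box.close()


# Two made-up messages stand in for data downloaded from a server.
samples = [
    "From: alice@example.com\nSubject: first\n\nHello\n",
    "From: bob@example.com\nSubject: second\n\nGoodbye\n",
]

demo_path = os.path.join(tempfile.mkdtemp(), "imap-Inbox.mbox")
with open(demo_path, "w") as output:
    for raw in samples:
        append_message(output, raw)

print(list_subjects(demo_path))
```

Pointing list_subjects at the file produced by a real run is a quick way to confirm that the number of entries matches the message count the script reported.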
7.5.2. How Does That Work?
imapdownload.py retains the basic structure of imapfolders.py from the previous lab. A subclass of IMAP4Client communicates with the server, and a subclass of ClientFactory creates the protocol and provides a Deferred for tracking the success or failure of the operation.
The IMAPDownloadProtocol starts by calling self.login to log in. The callback handler for self.login is self.__loggedIn, which calls self.select to select the mailbox and return another Deferred. The callback handler for self.select, self.__selectedMailbox, creates an imap4.MessageSet object, which represents a set of messages. The MessageSet is initialized with the arguments 1 and None, which means that it represents the set of messages from message number 1 to the end of the mailbox. This MessageSet object is then passed to self.fetchUID, instructing the server to return a list of the unique message identifiers in that set of messages.
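On the wire, IMAP writes such a range in sequence-set syntax, where an asterisk stands for the last message in the mailbox. The sketch below illustrates that notation only; sequence_set is a hypothetical helper written for this example, not a Twisted function, and it assumes the open-ended range is expressed by passing None as the upper bound, as in the example above.

```python
def sequence_set(first, last):
    """Render a message range in IMAP sequence-set notation.

    Hypothetical helper for illustration. Passing last=None means
    "through the end of the mailbox", which IMAP writes as "*".
    """
    if last is None:
        return "%d:*" % first
    if first == last:
        # A range of one message collapses to the bare number.
        return "%d" % first
    return "%d:%d" % (first, last)


print(sequence_set(1, None))   # every message in the mailbox
print(sequence_set(42, 42))    # a single message
print(sequence_set(3, 9))      # a closed range
```

Whether the numbers are read as sequence numbers or as UIDs depends on the command that carries the set, which is why the fetch methods take a separate uid argument.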
The callback function for self.fetchUID is self.__gotUIDs, which sets the attributes self.messageUIDs and self.messageCount. Then it returns the deferred result of self.fetchNextMessage, which starts the process of downloading the messages one by one. To download the next available message, fetchNextMessage creates a MessageSet matching only that message's UID. Then the MessageSet is passed to self.fetchMessage, with the uid argument set to True to tell the server that the MessageSet is referring to messages by UID, not sequence number. The callback for self.fetchMessage is self.__gotMessage, which passes the message data to the factory by calling self.factory.handleMessage. __gotMessage then returns another call to self.fetchNextMessage. This process creates a loop where fetchNextMessage will continue to be called until there are no messages left to download, at which point it calls self.logout to log out, and drops the connection.
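The control flow of that loop can be easier to follow with the networking removed. The following is a synchronous stand-in written in Python 3, not the Twisted implementation: FakeDownloader, fake_fetch_message, and the sample data are invented for this sketch, and the fetch result is assumed to have the shape the example relies on (a dictionary whose single value is a dictionary with an 'RFC822' key).

```python
class FakeDownloader:
    """Synchronous stand-in for the fetchNextMessage / __gotMessage loop."""

    def __init__(self, mailbox_contents):
        # mailbox_contents maps UID -> raw message text (made-up data).
        self.mailbox_contents = mailbox_contents
        self.message_uids = sorted(mailbox_contents)
        self.message_count = len(self.message_uids)
        self.handled = []

    def fake_fetch_message(self, uid):
        # Mimics the assumed result shape: one entry holding the message text.
        return {1: {"RFC822": self.mailbox_contents[uid]}}

    def fetch_next_message(self):
        if self.message_uids:
            # Take the next UID off the front of the list and fetch it.
            next_uid = self.message_uids.pop(0)
            return self.got_message(self.fake_fetch_message(next_uid))
        # Nothing left to fetch: this is where the real code logs out.
        return "logged out"

    def got_message(self, fetch_results):
        message_data = list(fetch_results.values())[0]["RFC822"]
        self.handled.append(message_data)
        # Hand control back to the fetch step, which closes the loop.
        return self.fetch_next_message()


downloader = FakeDownloader({101: "first", 102: "second", 105: "third"})
outcome = downloader.fetch_next_message()
print(outcome, downloader.handled)
```

In the real protocol each step resumes only when the server's response arrives, so the two methods take turns as Deferred callbacks rather than calling each other directly.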
As with IMAPFolderListProtocol in the previous lab, all of the work done in IMAPDownloadProtocol happens in the context of callback handlers for self.login, which is chained to self.factory.deferred. If everything works as expected, self.factory.deferred will be called back with the result of the last function in the chain, self.transport.loseConnection. If any of the IMAP methods fail, the failure is passed along to self.factory.deferred, where the handleError errback prints the error message and stops the reactor.