Downloading Mail from a POP3 Server

One of the most widely used email protocols is the Post Office Protocol version 3 (POP3). POP3 does one thing, and does it well: it allows a user to log into a mail server and download her messages, optionally deleting the copies on the server afterwards. POP3 is a simple enough protocol that you can talk to a server manually, through Telnet, as shown in Example 7-1.

Example 7-1. Communicating with a POP3 server using Telnet

    $ telnet pop.myisp.com 110
    Connected to pop.myisp.com.
    Escape character is '^]'.
    +OK dovecot ready.
    user myusername
    +OK
    pass mypassword
    +OK Logged in.
    list
    +OK 2 messages:
    1 1385
    2 100
    .
    retr 2
    +OK 100 octets
    From: somebody@example.com
    To: abe@fettig.net
    Subject: Hello

    How's the weather up there in Maine?
    .
    quit
    +OK Logging out.
    Connection closed by foreign host.
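The reply to list in Example 7-1 shows the general shape of a multi-line POP3 response: a +OK status line, one line per item, and a lone dot to mark the end. As a rough illustration of how a client might interpret such a reply, here is a minimal sketch using only the Python standard library. The function name parse_list_response is invented for this example and is not part of Twisted.

```python
def parse_list_response(lines):
    """Return {message_number: size_in_octets} from the lines of a LIST reply."""
    status, body = lines[0], lines[1:]
    if not status.startswith("+OK"):
        raise ValueError("server reported an error: %s" % status)
    sizes = {}
    for line in body:
        if line == ".":
            # A line holding a single dot terminates a multi-line reply.
            break
        number, octets = line.split()
        sizes[int(number)] = int(octets)
    return sizes


# The reply lines from Example 7-1, already split on line endings.
reply = ["+OK 2 messages:", "1 1385", "2 100", "."]
sizes = parse_list_response(reply)
print(sizes)
```

POP3Client does this kind of parsing for you, which is one reason to prefer it over a hand-written implementation.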

But even with such a simple protocol, it's nice to be saved the effort of writing your own implementation from scratch. Twisted comes with a Protocol class that implements POP3: twisted.mail.pop3client.POP3Client. You can use it in your programs to download mail from a POP3 server.

7.1.1. How Do I Do That?

Create a subclass of POP3Client. Use the methods login, listSize, listUID, retrieve, and quit to send commands to the server and get Deferreds that will be called back with the server's response. Example 7-2 demonstrates a POP3 client that logs into a server, retrieves the list of available messages, and then downloads each message to an mbox file.

Example 7-2. pop3download.py

    from twisted.mail import pop3client
    from twisted.internet import reactor, protocol, defer
    import email
    import getpass
    import sys


    class POP3DownloadProtocol(pop3client.POP3Client):
        # permit plain-text login over an unencrypted connection
        allowInsecureLogin = True

        def serverGreeting(self, greeting):
            pop3client.POP3Client.serverGreeting(self, greeting)
            login = self.login(self.factory.username, self.factory.password)
            login.addCallback(self._loggedIn)
            # report overall success or failure through the factory
            login.chainDeferred(self.factory.deferred)

        def _loggedIn(self, result):
            return self.listSize().addCallback(self._gotMessageSizes)

        def _gotMessageSizes(self, sizes):
            # request every message; POP3Client queues the commands
            retrievers = []
            for i in range(len(sizes)):
                retrievers.append(self.retrieve(i).addCallback(
                    self._gotMessageLines))
            return defer.DeferredList(retrievers).addCallback(
                self._finished)

        def _gotMessageLines(self, messageLines):
            # rebuild the message text from the list of lines
            self.factory.handleMessage("\n".join(messageLines))

        def _finished(self, downloadResults):
            return self.quit()


    class POP3DownloadFactory(protocol.ClientFactory):
        protocol = POP3DownloadProtocol

        def __init__(self, username, password, output):
            self.username = username
            self.password = password
            self.output = output
            self.deferred = defer.Deferred()

        def handleMessage(self, messageData):
            parsedMessage = email.message_from_string(messageData)
            # unixfrom=True writes the "From " line that separates mbox messages
            self.output.write(parsedMessage.as_string(unixfrom=True))
            self.output.write('\n')

        def clientConnectionFailed(self, connection, reason):
            self.deferred.errback(reason)


    def handleError(error):
        print error
        print >> sys.stderr, "Error:", error.getErrorMessage()
        reactor.stop()


    if __name__ == "__main__":
        if len(sys.argv) != 4:
            print "Usage: %s server username output.mbox" % sys.argv[0]
            sys.exit(1)
        else:
            server, username, outputfile = sys.argv[1:]
            password = getpass.getpass("Password: ")
            f = POP3DownloadFactory(username, password, open(outputfile, 'w+b'))
            f.deferred.addCallback(
                lambda _: reactor.stop()).addErrback(
                handleError)
            reactor.connectTCP(server, 110, f)
            reactor.run()

Run pop3download.py with three arguments: the server, the login username, and the output filename. It will prompt you for your password, log in to the server, download all the messages, and write them to the output file in the Unix-standard mbox format:

    $ python pop3download.py pop.myisp.com mylogin pop-messages.mbox
    Password:
    Downloading message 1 of 31
    Downloading message 2 of 31
    Downloading message 3 of 31
    ...
    Downloading message 30 of 31
    Downloading message 31 of 31

Note that the contents of your output file will be overwritten every time you run pop3download.py, so be careful not to clobber any important messages!

 

7.1.2. How Does That Work?

There are two main classes in pop3download.py: POP3DownloadProtocol and POP3DownloadFactory. As a standard factory, a POP3DownloadFactory will create a POP3DownloadProtocol object when a connection to the server is established. The two classes then work together. The POP3DownloadProtocol communicates with the server, and passes each downloaded message to the POP3DownloadFactory by calling self.factory.handleMessage. The POP3DownloadFactory then writes the message data to the output file. The POP3DownloadProtocol will also call either callback or errback on the POP3DownloadFactory's deferred object, to indicate that the process of downloading all messages has completed successfully, or failed with an error.

Example 7-2 uses the POP3Client class from twisted.mail.pop3client. There's another POP3Client in twisted.mail.pop3, but this is an old, deprecated class with a much less friendly API. Use the twisted.mail.pop3client version instead.

POP3Client, which POP3DownloadProtocol inherits from, provides high-level methods for sending POP3 commands, such as login, listSize, listUID, retrieve, delete, and quit. The action in POP3DownloadProtocol starts in serverGreeting, which is called when the server sends its initial +OK greeting. POP3Client.login is an intelligent method that logs in to the server using the best available login technique. If the server supports APOP authentication, which doesn't send the password in plain text, login will use APOP; otherwise, it will log in using the plain-text username and password. Because sending a plain-text password over an unencrypted connection is risky, POP3Client refuses to do so unless allowInsecureLogin is set to True, which is why Example 7-2 sets that attribute. POP3DownloadProtocol adds self._loggedIn as the callback handler for login, and then uses chainDeferred to connect the result of logging in to self.factory.deferred.
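To see why APOP keeps the password off the wire, it helps to look at what the client actually sends. RFC 1939 defines the APOP digest as the MD5 hash of the timestamp from the server's greeting followed by the shared secret. The sketch below computes that digest with the standard library. The helper name apop_digest is invented for illustration, and the challenge, user name, and secret are the sample values from RFC 1939 rather than anything from Example 7-2.

```python
import hashlib


def apop_digest(challenge, secret):
    """Return the hex MD5 digest a client sends in an APOP command."""
    # The digest covers the server's timestamp (angle brackets included)
    # immediately followed by the shared secret.
    return hashlib.md5((challenge + secret).encode("ascii")).hexdigest()


# Sample values from the APOP example in RFC 1939.
challenge = "<1896.697170952@dbc.mtview.ca.us>"
digest = apop_digest(challenge, "tanstaaf")
print("APOP mrose " + digest)
```

Because the timestamp changes with every connection, so does the digest, and an eavesdropper who captures one digest cannot replay it later. POP3Client.login performs this step for you when the server advertises a challenge.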

The self._loggedIn method responds to the successful login by calling POP3Client.listSize. This returns a Deferred that will be called back with a list of message sizes. Notice that _loggedIn returns this Deferred result, which keeps everything happening in the context of the Deferred from self.login, which is chained to self.factory.deferred. Therefore the result of listSize, or any exception that might occur, will be passed up to self.factory.deferred.

The callback handler for listSize, self._gotMessageSizes, requests each message from the server by calling POP3Client.retrieve with the message index number. You can send many retrieve requests at once, as POP3DownloadProtocol does, and POP3Client will automatically queue them up and send them one at a time. _gotMessageSizes keeps a list of all the calls it's made to retrieve, and then returns a defer.DeferredList wrapping the list. The DeferredList will call back after all the messages have been downloaded. Once again, _gotMessageSizes returns a Deferred, which keeps everything happening inside the callback chain of the original call to self.login.

The _gotMessageLines function is the callback handler used for each call to retrieve. It takes a list of the lines received from the server, joins them into a single string, and passes the message to self.factory.handleMessage.

Note that messages aren't being deleted from the server, which means they will still be available the next time you connect. If you want to delete the server's copy of a message, send the POP3 DELE command by calling self.delete(index), using the same zero-based index you pass to retrieve. The DELE command doesn't delete messages instantly; the server only marks them for deletion and removes them when you send QUIT. So if you send delete for several messages, and then something happens to your connection before you call quit, the messages will remain on the server.
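The mark-now, remove-on-quit behavior is easy to get wrong, so a toy model may help. The class below is purely hypothetical: it is a small in-memory stand-in for a server's maildrop, written for this illustration, and is neither part of Twisted nor a real POP3 server. It uses the 1-based message numbers that appear on the wire.

```python
class ToyMaildrop:
    """Hypothetical in-memory model of how a POP3 server handles DELE."""

    def __init__(self, messages):
        # Message numbers on the wire start at 1.
        self.messages = dict(enumerate(messages, start=1))
        self.marked = set()

    def dele(self, number):
        if number not in self.messages or number in self.marked:
            raise KeyError("no such message: %d" % number)
        # DELE only marks the message; nothing is removed yet.
        self.marked.add(number)

    def quit(self):
        # QUIT commits the deletions.
        for number in self.marked:
            del self.messages[number]
        self.marked.clear()

    def connection_lost(self):
        # Without QUIT, the marks are discarded and nothing is deleted.
        self.marked.clear()


maildrop = ToyMaildrop(["first message", "second message"])
maildrop.dele(1)
maildrop.connection_lost()
after_drop = sorted(maildrop.messages)

maildrop.dele(1)
maildrop.quit()
after_quit = sorted(maildrop.messages)
print(after_drop, after_quit)
```

In a real client, the practical consequence is to wait for the Deferred returned by quit before treating any deletion as final.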

The POP3DownloadFactory used in Example 7-2 takes three arguments at initialization: the username, the password, and the output file to use for the downloaded messages. Its handleMessage function uses Python's email module to parse the message and write it to the output file in mbox format.

The unixfrom=True argument that you see in handleMessage, in the call to parsedMessage.as_string, starts the message with a line of the form "From emailaddress timestamp", which the mbox format uses to determine where one message ends and the next begins. Including that argument is key to generating a valid mbox file.
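You can check the effect of unixfrom=True without a mail server by writing a couple of messages the same way handleMessage does and reading them back with the standard library's mailbox module. The following is a sketch under stated assumptions: the helper names write_mbox and read_subjects are invented here, the first message is the one from Example 7-1, and the second is a made-up reply.

```python
import email
import mailbox
import os
import tempfile

RAW_MESSAGES = [
    "From: somebody@example.com\nTo: abe@fettig.net\nSubject: Hello\n\n"
    "How's the weather up there in Maine?\n",
    # A made-up reply, included only so the file holds two messages.
    "From: abe@fettig.net\nTo: somebody@example.com\nSubject: Re: Hello\n\n"
    "Cold.\n",
]


def write_mbox(path, raw_messages):
    """Write messages the way handleMessage does, one after another."""
    with open(path, "w", newline="\n") as output:
        for raw in raw_messages:
            parsed = email.message_from_string(raw)
            # unixfrom=True emits the leading "From " separator line.
            output.write(parsed.as_string(unixfrom=True))
            output.write("\n")


def read_subjects(path):
    """Read the file back as an mbox and return each Subject header."""
    box = mailbox.mbox(path)
    try:
        return [message["Subject"] for message in box]
    finally:
        box.close()


path = os.path.join(tempfile.mkdtemp(), "demo.mbox")
write_mbox(path, RAW_MESSAGES)
subjects = read_subjects(path)
with open(path) as written:
    first_line = written.readline()
print(subjects)
```

When a parsed message has no envelope sender recorded, the email module falls back to a separator line beginning with "From nobody" followed by the current time, so that is what the first line of the output file looks like here.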

 

7.1.3. What About...

... using SSL (the Secure Sockets Layer) for encrypted connections? Many POP servers support SSL as a secure alternative to regular TCP connections. By encrypting all the data passing between client and server, SSL keeps your password and email protected from anyone who might be snooping on your network connection.

To use POP3 over SSL, just replace reactor.connectTCP with reactor.connectSSL, and connect to port 995 instead of 110:

    from twisted.internet import ssl
    reactor.connectSSL(server, 995, factory, ssl.ClientContextFactory())

reactor.connectSSL takes one additional argument, a twisted.internet.ssl.ClientContextFactory object. (The ClientContextFactory object can be used for advanced SSL tasks, like using custom certificates, but in most situations you just need to create it and pass it to reactor.connectSSL.)
