The nntplib Module
The nntplib module provides a Network News Transfer Protocol (NNTP) client implementation.
7.17.1 Listing Messages
Before you can read messages from a news server, you have to connect to the server and select a newsgroup. The script in Example 7-32 also downloads a complete list of the messages in that group and extracts some more or less interesting statistics from the list.
Example 7-32. Using the nntplib Module to List Messages
File: nntplib-example-1.py

    import nntplib
    import string

    SERVER = "news.spam.egg"
    GROUP = "comp.lang.python"
    AUTHOR = "fredrik@pythonware.com" # eff-bots human alias

    # connect to server
    server = nntplib.NNTP(SERVER)

    # choose a newsgroup
    resp, count, first, last, name = server.group(GROUP)
    print "count", "=>", count
    print "range", "=>", first, last

    # list all items on the server
    resp, items = server.xover(first, last)

    # extract some statistics
    authors = {}
    subjects = {}
    for id, subject, author, date, message_id, references, size, lines in items:
        authors[author] = None
        if subject[:4] == "Re: ":
            subject = subject[4:]
        subjects[subject] = None
        if string.find(author, AUTHOR) >= 0:
            print id, subject

    print "authors", "=>", len(authors)
    print "subjects", "=>", len(subjects)

    count => 607
    range => 57179 57971
    57474 Three decades of Python!
    ...
    57477 More Python books coming...
    authors => 257
    subjects => 200
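The statistics loop does not need a live connection, so it can be tried against hand-made data. The following is a minimal sketch in Python 3 syntax (the listing above targets Python 2). The helper name summarize_overview and the sample tuples are made up for illustration; they only mimic the eight-field tuples that the loop in Example 7-32 unpacks.

```python
def summarize_overview(items, author_pattern):
    """Collect distinct authors and subjects from overview tuples.

    Each item is assumed to be an 8-tuple in the field order used by
    Example 7-32: (number, subject, author, date, message_id,
    references, size, lines).
    """
    authors = set()
    subjects = set()
    matches = []
    for number, subject, author, date, message_id, references, size, lines in items:
        authors.add(author)
        # count a reply under the subject it replies to
        if subject[:4] == "Re: ":
            subject = subject[4:]
        subjects.add(subject)
        if author_pattern in author:
            matches.append((number, subject))
    return authors, subjects, matches


# hypothetical sample data, not taken from a real server
SAMPLE_ITEMS = [
    ("1", "Hello", "alice@example.com", "", "<1@example.com>", [], "100", "3"),
    ("2", "Re: Hello", "bob@example.com", "", "<2@example.com>", ["<1@example.com>"], "120", "4"),
    ("3", "Other topic", "alice@example.com", "", "<3@example.com>", [], "90", "2"),
]

authors, subjects, matches = summarize_overview(SAMPLE_ITEMS, "alice")
print("authors", "=>", len(authors))
print("subjects", "=>", len(subjects))
for number, subject in matches:
    print(number, subject)
```

Sets stand in for the dictionaries with None values used in the original script; both approaches simply record each distinct key once.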
7.17.2 Downloading Messages
Downloading a message is easy. Just call the article method, as shown in Example 7-33.
Example 7-33. Using the nntplib Module to Download Messages
File: nntplib-example-2.py

    import nntplib
    import string

    SERVER = "news.spam.egg"
    GROUP = "comp.lang.python"
    KEYWORD = "tkinter"

    # connect to server
    server = nntplib.NNTP(SERVER)

    resp, count, first, last, name = server.group(GROUP)
    resp, items = server.xover(first, last)
    for id, subject, author, date, message_id, references, size, lines in items:
        if string.find(string.lower(subject), KEYWORD) >= 0:
            resp, id, message_id, text = server.article(id)
            print author
            print subject
            print len(text), "lines in article"

    "Fredrik Lundh"
    Re: Programming Tkinter (In Python)
    110 lines in article
    ...
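The subject test in Example 7-33 decides which articles are worth downloading, so it can be useful to check it separately from the network code. Here is a small sketch in Python 3 syntax, assuming overview tuples in the same field order as above; the function name find_by_keyword and the sample data are hypothetical.

```python
def find_by_keyword(items, keyword):
    """Return (number, subject, author) for every overview tuple whose
    subject contains the keyword, ignoring case."""
    keyword = keyword.lower()
    hits = []
    for number, subject, author, date, message_id, references, size, lines in items:
        if keyword in subject.lower():
            hits.append((number, subject, author))
    return hits


# hypothetical sample data, not taken from a real server
SAMPLE_ITEMS = [
    ("10", "Programming Tkinter", "alice@example.com", "", "<10@example.com>", [], "500", "12"),
    ("11", "Regular expressions", "bob@example.com", "", "<11@example.com>", [], "300", "8"),
    ("12", "Re: programming TKINTER", "carol@example.com", "", "<12@example.com>", [], "250", "6"),
]

hits = find_by_keyword(SAMPLE_ITEMS, "tkinter")
for number, subject, author in hits:
    print(number, subject, author)
```

Only the message numbers returned by such a filter would then be passed on to the article method.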
Example 7-34 shows how you can further manipulate a message by wrapping it in a Message object (using the rfc822 module).
Example 7-34. Using the nntplib and rfc822 Modules to Process Messages
File: nntplib-example-3.py

    import nntplib
    import string, random
    import StringIO, rfc822

    SERVER = "news.spam.egg"
    GROUP = "comp.lang.python"

    # connect to server
    server = nntplib.NNTP(SERVER)

    resp, count, first, last, name = server.group(GROUP)

    # try up to ten random article numbers in the group's range
    for i in range(10):
        try:
            id = random.randint(int(first), int(last))
            resp, id, message_id, text = server.article(str(id))
        except (nntplib.error_temp, nntplib.error_perm):
            pass # no such message (maybe it was deleted?)
        else:
            break # found a message!
    else:
        raise SystemExit

    # join the lines with newlines so the header/body split is preserved
    text = string.join(text, "\n")
    file = StringIO.StringIO(text)

    message = rfc822.Message(file)

    for k, v in message.items():
        print k, "=", v

    print message.fp.read()

    mime-version = 1.0
    content-type = text/plain; charset="iso-8859-1"
    message-id = <008501bf1417$1cf90b70$f29b12c2@sausage.spam.egg>
    lines = 22
    ...
    from = "Fredrik Lundh"
    nntp-posting-host = parrot.python.org
    subject = ANN: (the eff-bot guide to) The Standard Python Library
    ...
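The rfc822 module exists only in Python 2. As a rough Python 3 counterpart, the sketch below parses the same kind of text with the standard email package instead; this is a substitute technique, not what Example 7-34 uses. The sample lines are invented and stand in for the list of lines returned for an article.

```python
import email

# hypothetical article text, shaped like the list of lines that
# Example 7-34 joins before parsing
SAMPLE_LINES = [
    "From: someone@example.com",
    "Subject: A test message",
    "Lines: 2",
    "",
    "first body line",
    "second body line",
]

# headers and body are separated by the first empty line, so the
# lines have to be joined with newlines
message = email.message_from_string("\n".join(SAMPLE_LINES))

for k, v in message.items():
    print(k.lower(), "=", v)

# for a message that is not multipart, the payload is the body text
body = message.get_payload()
print(body)
```

Header lookup on the resulting object is case-insensitive, so message["subject"] and message["Subject"] return the same value.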
Once you've gotten this far, you can use modules like htmllib, uu, and base64 to further process the messages.
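As one illustration of such further processing, the sketch below decodes base64 data that has been split across several body lines. It uses Python 3 syntax, and the payload is generated inside the example rather than taken from a real message, so the byte string and variable names are purely illustrative.

```python
import base64

original = b"binary attachment data \x00\x01\x02"

# simulate the body lines of a message that carries base64 data
# (encodebytes breaks its output into lines of at most 76 characters)
body_lines = base64.encodebytes(original).decode("ascii").splitlines()

# glue the lines back together and decode them
decoded = base64.b64decode("".join(body_lines))
print(decoded == original)
```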