Other Client-Side Tools

So far in this chapter, we have focused on Python's FTP and email processing tools and have met a handful of client-side scripting modules along the way: ftplib, poplib, smtplib, mhlib, mimetools, urllib, rfc822, and so on. This set is representative of Python's library tools for transferring and processing information over the Internet, but it's not at all complete. A more or less comprehensive list of Python's Internet-related modules appears at the start of the previous chapter. Among other things, Python also includes client-side support libraries for Internet news, Telnet, HTTP, and other standard protocols.

11.5.1 NNTP: Accessing Newsgroups

Python's nntplib module supports the client-side interface to NNTP -- the Network News Transfer Protocol -- which is used for reading and posting articles to Usenet newsgroups in the Internet. Like other protocols, NNTP runs on top of sockets and merely defines a standard message protocol; like other modules, nntplib hides most of the protocol details and presents an object-based interface to Python scripts.

We won't get into protocol details here, but in brief, NNTP servers store a range of articles on the server machine, usually in a flat-file database. If you have the domain or IP name of a server machine that runs an NNTP server program listening on the NNTP port, you can write scripts that fetch or post articles from any machine that has Python and an Internet connection. For instance, the script in Example 11-24 by default fetches and displays the last 10 articles from Python's Internet news group, comp.lang.python, from the news.rmi.net NNTP server at my ISP.

Example 11-24. PP2EInternetOther eadnews.py

############################################### # fetch and print usenet newsgroup postings # from comp.lang.python via the nntplib module # which really runs on top of sockets; nntplib # also supports posting new messages, etc.; # note: posts not deleted after they are read; ############################################### listonly = 0 showhdrs = ['From', 'Subject', 'Date', 'Newsgroups', 'Lines'] try: import sys servername, groupname, showcount = sys.argv[1:] showcount = int(showcount) except: servername = 'news.rmi.net' groupname = 'comp.lang.python' # cmd line args or defaults showcount = 10 # show last showcount posts # connect to nntp server print 'Connecting to', servername, 'for', groupname from nntplib import NNTP connection = NNTP(servername) (reply, count, first, last, name) = connection.group(groupname) print '%s has %s articles: %s-%s' % (name, count, first, last) # get request headers only fetchfrom = str(int(last) - (showcount-1)) (reply, subjects) = connection.xhdr('subject', (fetchfrom + '-' + last)) # show headers, get message hdr+body for (id, subj) in subjects: # [-showcount:] if fetch all hdrs print 'Article %s [%s]' % (id, subj) if not listonly and raw_input('=> Display?') in ['y', 'Y']: reply, num, tid, list = connection.head(id) for line in list: for prefix in showhdrs: if line[:len(prefix)] == prefix: print line[:80]; break if raw_input('=> Show body?') in ['y', 'Y']: reply, num, tid, list = connection.body(id) for line in list: print line[:80] print print connection.quit()

As for FTP and email tools, the script creates an NNTP object and calls its methods to fetch newsgroup information and articles' header and body text. The xhdr method, for example, loads selected headers from a range of messages. When run, this program connects to the server and displays each article's subject line, pausing to ask whether it should fetch and show the article's header information lines (headers listed in variable showhdrs only) and body text:

C:...PP2EInternetOther>python readnews.py Connecting to news.rmi.net for comp.lang.python comp.lang.python has 3376 articles: 30054-33447 Article 33438 [Embedding? file_input and eval_input] => Display? Article 33439 [Embedding? file_input and eval_input] => Display?y From: James Spears Newsgroups: comp.lang.python Subject: Embedding? file_input and eval_input Date: Fri, 11 Aug 2000 10:55:39 -0700 Lines: 34 => Show body? Article 33440 [Embedding? file_input and eval_input] => Display? Article 33441 [Embedding? file_input and eval_input] => Display? Article 33442 [Embedding? file_input and eval_input] => Display? Article 33443 [Re: PYHTONPATH] => Display?y Subject: Re: PYHTONPATH Lines: 13 From: sp00fd Newsgroups: comp.lang.python Date: Fri, 11 Aug 2000 11:06:23 -0700 => Show body?y Is this not what you were looking for? Add to cgi script: import sys sys.path.insert(0, "/path/to/dir") import yourmodule ----------------------------------------------------------- Got questions? Get answers over the phone at Keen.com. Up to 100 minutes free! http://www.keen.com Article 33444 [Loading new code...] => Display? Article 33445 [Re: PYHTONPATH] => Display? Article 33446 [Re: Compile snags on AIX & IRIX] => Display? Article 33447 [RE: string.replace() can't replace newline characters???] => Display? 205 GoodBye

We can also pass this script an explicit server name, newsgroup, and display count on the command line to apply it in different ways. Here is this Python script checking the last few messages in Perl and Linux newsgroups:

C:...PP2EInternetOther>python readnews.py news.rmi.net comp.lang.perl.misc 5 Connecting to news.rmi.net for comp.lang.perl.misc comp.lang.perl.misc has 5839 articles: 75543-81512 Article 81508 [Re: Simple Argument Passing Question] => Display? Article 81509 [Re: How to Access a hash value?] => Display? Article 81510 [Re: London =?iso-8859-1?Q?=A330-35K?= Perl Programmers Required] => Display? Article 81511 [Re: ODBC question] => Display? Article 81512 [Re: ODBC question] => Display? 205 GoodBye C:...PP2EInternetOther>python readnews.py news.rmi.net comp.os.linux 4 Connecting to news.rmi.net for comp.os.linux comp.os.linux has 526 articles: 9015-9606 Article 9603 [Re: Simple question about CD-Writing for Linux] => Display? Article 9604 [Re: How to start the ftp?] => Display? Article 9605 [Re: large file support] => Display? Article 9606 [Re: large file support] => Display?y From: andy@physast.uga.edu (Andreas Schweitzer) Newsgroups: comp.os.linux.questions,comp.os.linux.admin,comp.os.linux Subject: Re: large file support Date: 11 Aug 2000 18:32:12 GMT Lines: 19 => Show body?n 205 GoodBye

With a little more work, we could turn this script into a full-blown news interface. For instance, new articles could be posted from within a Python script with code of this form (assuming the local file already contains proper NNTP header lines):

# to post, say this (but only if you really want to post!) connection = NNTP(servername) localfile = open('filename') # file has proper headers connection.post(localfile) # send text to newsgroup connection.quit()

We might also add a Tkinter-based GUI frontend to this script to make it more usable, but we'll leave such an extension on the suggested exercise heap (see also the PyMailGui interface's suggested extensions in the previous section).

11.5.2 HTTP: Accessing Web Sites

Python's standard library (that is, modules that are installed with the interpreter) also includes client-side support for HTTP -- the Hypertext Transfer Protocol -- a message structure and port standard used to transfer information on the World Wide Web. In short, this is the protocol that your web browser (e.g., Internet Explorer, Netscape) uses to fetch web pages and run applications on remote servers as you surf the Net. At the bottom, it's just bytes sent over port 80.

To really understand HTTP-style transfers, you need to know some of the server-side scripting topics covered in the next three chapters (e.g., script invocations and Internet address schemes), so this section may be less useful to readers with no such background. Luckily, though, the basic HTTP interfaces in Python are simple enough for a cursory understanding even at this point in the book, so let's take a brief look here.

Python's standard httplib module automates much of the protocol defined by HTTP and allows scripts to fetch web pages much like web browsers. For instance, the script in Example 11-25 can be used to grab any file from any server machine running an HTTP web server program. As usual, the file (and descriptive header lines) is ultimately transferred over a standard socket port, but most of the complexity is hidden by the httplib module.

Example 11-25. PP2EInternetOtherhttp-getfile.py

####################################################################### # fetch a file from an http (web) server over sockets via httplib; # the filename param may have a full directory path, and may name a cgi # script with query parameters on the end to invoke a remote program; # fetched file data or remote program output could be saved to a local # file to mimic ftp, or parsed with string.find or the htmllib module; ####################################################################### import sys, httplib showlines = 6 try: servername, filename = sys.argv[1:] # cmdline args? except: servername, filename = 'starship.python.net', '/index.html' print servername, filename server = httplib.HTTP(servername) # connect to http site/server server.putrequest('GET', filename) # send request and headers server.putheader('Accept', 'text/html') # POST requests work here too server.endheaders() # as do cgi script file names errcode, errmsh, replyheader = server.getreply() # read reply info headers if errcode != 200: # 200 means success print 'Error sending request', errcode else: file = server.getfile() # file obj for data received data = file.readlines() file.close() # show lines with eoln at end for line in data[:showlines]: print line, # to save, write data to file

Desired server names and filenames can be passed on the command line to override hardcoded defaults in the script. You need to also know something of the HTTP protocol to make the most sense of this code, but it's fairly straightforward to decipher. When run on the client, this script makes a HTTP object to connect to the server, sends it a GET request along with acceptable reply types, and then reads the server's reply. Much like raw email message text, the HTTP server's reply usually begins with a set of descriptive header lines, followed by the contents of the requested file. The HTTP object's getfile method gives us a file object from which we can read the downloaded data.

Let's fetch a few files with this script. Like all Python client-side scripts, this one works on any machine with Python and an Internet connection (here it runs on a Windows client). Assuming that all goes well, the first few lines of the downloaded file are printed; in a more realistic application, the text we fetch would probably be saved to a local file, parsed with Python's htmllib module, and so on. Without arguments, the script simply fetches the HTML index page at http://starship.python.org:

C:...PP2EInternetOther>python http-getfile.py starship.python.net /index.html

Starship Python

Категории