Network Programming with Perl


 
By Lincoln D. Stein

Chapter 8. POP, IMAP, and NNTP

Internet News Clients

The Netnews system dates back to 1979, when researchers at Duke University and the University of North Carolina designed a system to distribute discussion group postings that would overcome the limitations of simple mailing lists [Spencer & Lawrence, 1998]. This rapidly grew into Usenet, a global Internet-based bulletin-board system comprising thousands of named newsgroups.

Because of its sheer size (more than 34,000 newsgroups and daily news flow rates measured in the gigabytes), Usenet has been diminishing in favor among Internet users. However, there has been a resurgence of interest recently in using Netnews for private discussion servers, helpdesk applications, and other roles in corporate intranets.

Netnews is organized in a two-level hierarchy. At the upper level are the newsgroups. These have long, meaningful names like comp.graphics.rendering.raytracing. Each newsgroup, in turn, contains zero or more articles. Users post articles to their local Netnews server, and the Netnews distribution software takes care of distributing the article to other servers. Within a day or so, a copy of the article appears on every Netnews server in the world. Articles live on Netnews for some period before they are expired. Depending on each server's storage capacity, a message may be held for a few days or a few weeks before it is expired. A few large Netnews servers, such as the one at http://www.deja.com/, hold news articles indefinitely.

Newsgroups are organized using a hierarchical namespace. For example, all newsgroups beginning with comp. are supposed to have something to do with computers or computer science, and all those beginning with soc.religion. are supposed to concern religion in society. The creation and destruction of newsgroups, by and large, is controlled by a number of senior administrators. The exception is the alt hierarchy, in which newsgroups can be created willy-nilly by anyone who desires to do so. Some very interesting material resides in these groups.

Regardless of its position in the namespace hierarchy, a newsgroup can be moderated or unmoderated. Moderated groups are "closed." Only a small number of people (typically a single moderator) have the right to post to the newsgroup. When others attempt to post to the newsgroup, their posting is automatically forwarded to the moderator via e-mail. The moderator then posts the message at his or her discretion. Anyone can post to unmoderated groups. The posted article is visible immediately on the local server, and diffuses quickly throughout the system.

Articles are structured like e-mail messages, and in fact share the same RFC 822 specification. Figure 8.6 shows a news article recently posted to comp.lang.perl.modules. The article consists of a message header and body. The header contains several fields that you will recognize from standard e-mail, such as the Subject: and From: lines, and some fields that are specific to news articles, such as Article:, Path:, Message-ID:, Distribution:, and References:. Many of these fields are added automatically by the Netnews server.

Figure 8.6. A typical Netnews article

To construct a valid Netnews article, you need only take a standard e-mail message and add a Newsgroups: header containing a comma-delimited list of newsgroups to post to. Another frequently used article header is Distribution:, which limits the distribution of an article. Valid values for Distribution: depend on the setup of your local Netnews server, but they are typically organized geographically. For example, the usa distribution limits message propagation to the political boundaries of the United States, and nj limits distribution to New Jersey. The most common distribution is world, which allows the article to propagate globally.
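As a rough illustration of the article format just described, the sketch below assembles an article as a list of lines. The make_article() helper, the addresses, and the group names are hypothetical and chosen only for the example; a real posting would normally carry additional headers.

```perl
use strict;
use warnings;

# Build a Netnews article as a list of lines: e-mail style headers,
# a blank separator line, then the body.
# make_article() and its argument names are illustrative, not a library API.
sub make_article {
    my (%arg) = @_;
    my @lines = (
        "From: $arg{from}\n",
        "Subject: $arg{subject}\n",
        # comma-delimited list of target newsgroups
        'Newsgroups: ' . join(',', @{ $arg{newsgroups} }) . "\n",
    );
    # Distribution: is optional
    push @lines, "Distribution: $arg{distribution}\n"
        if defined $arg{distribution};
    push @lines, "\n";                                   # blank line ends the header
    push @lines, map { "$_\n" } split /\n/, $arg{body};  # one element per body line
    return @lines;
}

my @article = make_article(
    from         => 'Example User <user@example.com>',
    subject      => 'Test posting',
    newsgroups   => ['misc.test', 'alt.test'],
    distribution => 'world',
    body         => "This is a test.\nPlease ignore.",
);
print @article;
```

A list of lines in this shape is what the posting methods described later in this section expect.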

Other article header fields have special meaning to the Netnews system, and can be used to create control messages that cancel articles, add or delete newsgroups, and perform other special functions. See [Spencer and Lawrence 1998] for information on constructing your own control messages.

Netnews interoperates well with MIME. An article can have any number of MIME-specific headers, parts, and subparts, and MIME-savvy news readers are able to decode and display the parts.

Articles can be identified in either of two ways. Within a newsgroup, an article can be identified by its message number within the group. For example, the article shown in Figure 8.6 is message number 36,166 of the newsgroup comp.lang.perl.modules. Because articles are constantly expiring and being replaced by new ones, the number of the first message in a group is usually not 1, but more often a high number. The message number for an article is stable on any given news server. On two subsequent days, you can retrieve the same article by entering a particular newsgroup and retrieving the same message number. However, message numbers are not stable across servers. An article's number on one news server may be quite different on another server.

The other way to identify articles is by the message ID. The message ID of the sample article is <397a6e8d.524144494f47414741@radiogaga.harz.de>, including the angle brackets at either side. Message IDs are unique, global identifiers that remain the same from server to server.

Net::NNTP

Historically, Netnews has been distributed in a number of ways, but the dominant mode is now the Network News Transfer Protocol, or NNTP, described in RFC 977. NNTP is used both by Netnews servers to share articles among themselves and by client applications to scan and retrieve articles of interest. Graham Barr's Net::NNTP module, part of the libnet utilities, provides access to NNTP servers.

Like other members of the libnet clan, Net::NNTP descends from Net::Cmd and inherits that module's methods. Its API is similar to Net::POP3 and Net::IMAP::Simple. You connect to a remote Netnews server, creating a new Net::NNTP object, and use this object to communicate with the server. You can list and filter newsgroups, make a particular newsgroup current, list articles, download them, and post new articles.

newsgroup_stats.pl is a short script that uses Net::NNTP to find all newsgroups that match a pattern and count the number of articles in each. For example, to find all the newsgroups that have something to do with Perl, we could search for the pattern "*.perl*" (the output has been edited slightly for space):

% newsgroup_stats.pl '*.perl*'
alt.comp.perlcgi.freelance       454 articles
alt.flame.marshal.perlman          3 articles
alt.music.perl-jam                11 articles
alt.perl.sockets                  45 articles
comp.lang.perl.announce           43 articles
comp.lang.perl.misc            18940 articles
comp.lang.perl.moderated         622 articles
comp.lang.perl.modules          2240 articles
comp.lang.perl.tk                779 articles
cz.comp.lang.perl                 63 articles
de.comp.lang.perl.cgi           1989 articles
han.comp.lang.perl               174 articles
it.comp.lang.perl                715 articles
japan.comp.lang.perl              53 articles

Notice that the pattern match wasn't perfect, and we matched alt.music.perl-jam as well as newsgroups that have to do with the language. Figure 8.7 lists the code.

Figure 8.7. match_newsgroup.pl script

Lines 1–3: Load modules. We turn on strict checking and load the Net::NNTP module.

Line 4: Create new Net::NNTP object. We call Net::NNTP->new() to connect to a Netnews host. If the host isn't specified explicitly, then Net::NNTP chooses a suitable host from environment variables or the default NNTP server specified when libnet was installed.

Lines 5–6: Print stats and quit. For each argument on the command line, we call the print_stats() subroutine to look up the pattern and print out matching newsgroups. We then call the NNTP object's quit() method.

Lines 7–17: print_stats() subroutine. In the print_stats() subroutine we invoke the NNTP object's newsgroups() method to find newsgroups that match a pattern. If successful, newsgroups() returns a hash reference in which the keys are newsgroup names and the values are brief descriptions of the newsgroup.

If the value returned by newsgroups() is undef or empty, we return. Otherwise, we sort the groups alphabetically by name, and loop through them. For each group, we call the NNTP object's group() method to return a list containing information about the number of articles in the group and the message numbers of the first and last articles. We print the newsgroup name and the number of articles it contains.
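The following sketch follows the walkthrough above. It is a reconstruction written from that description, not the Figure 8.7 listing, so its line numbers will not correspond to those cited. It assumes a reachable default news server and only contacts it when the script is given at least one pattern argument.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Net::NNTP;

# Print "name  N articles" for every newsgroup matching $pattern.
sub print_stats {
    my ($nntp, $pattern) = @_;
    my $groups = $nntp->newsgroups($pattern);     # hash ref: name => description
    return unless $groups && %$groups;            # undef or empty: nothing to do
    for my $name (sort keys %$groups) {
        # list context: (article count, first number, last number, name)
        my ($articles, $first, $last) = $nntp->group($name);
        next unless defined $articles;            # group could not be selected
        printf "%-40s %6d articles\n", $name, $articles;
    }
}

# Only talk to a server when patterns were supplied on the command line.
if (@ARGV) {
    my $nntp = Net::NNTP->new()
        or die "Can't connect to a news server: $@\n";
    print_stats($nntp, $_) for @ARGV;
    $nntp->quit;
}
```

Invoked as in the example above, with '*.perl*' as its argument, it should produce output in the same general shape, though the groups and counts depend entirely on the server.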

The Net::NNTP API

The Net::NNTP API can be divided roughly into those methods that deal with the server as a whole, those that affect entire newsgroups, and those that concern individual articles in a newsgroup.

Newsgroups can be referred to by name or, for some methods, by a wildcard pattern match. The pattern-matching system used by most NNTP servers is similar to that used by the UNIX and DOS shells. "*" matches zero or more of any characters, "?" matches exactly one character, and a set of characters enclosed in square brackets, as in "[abc]", matches any member of the set. Bracketed character sets can also contain character ranges, as in "[0-9]" to match the digits 0 through 9, and the "^" character may be used to invert a set; for example, "[^A-Z]" matches any character that is not in the range A through Z. Any other character matches itself exactly once. As in the shell (and unlike Perl's regular expression operations), NNTP patterns are automatically anchored to the beginning and end of the target string.
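To make the wildcard rules above concrete, here is a small sketch that translates such a pattern into an anchored Perl regular expression, which can be handy for filtering group names on the client side. The wildmat_to_regex() function is a hypothetical helper, not part of Net::NNTP, and it covers only the constructs described above; servers may support more.

```perl
use strict;
use warnings;

# Translate an NNTP-style wildcard pattern into an anchored Perl regex.
# Handles "*", "?", bracketed sets such as [abc], [0-9], [^A-Z], and
# treats every other character literally.
sub wildmat_to_regex {
    my ($pattern) = @_;
    my $re = '';
    while ($pattern =~ /\G(\[\^?[^\]]+\]|\*|\?|.)/gs) {
        my $tok = $1;
        if    ($tok eq '*')      { $re .= '.*' }            # zero or more characters
        elsif ($tok eq '?')      { $re .= '.'  }            # exactly one character
        elsif (length($tok) > 1) { $re .= $tok }            # bracketed set, same syntax in Perl
        else                     { $re .= quotemeta $tok }  # literal character
    }
    return qr/\A$re\z/s;    # anchored at both ends, like the server-side match
}

my $re = wildmat_to_regex('*.perl*');
for my $name (qw(comp.lang.perl.misc alt.music.perl-jam comp.lang.python)) {
    printf "%-22s %s\n", $name, ($name =~ $re ? 'matches' : 'no match');
}
```

Because the result is anchored, a pattern such as comp.lang.perl matches only that exact group and not comp.lang.perl.misc.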

Articles can be referred to by their number in the current newsgroup, by their unique message IDs, or, for some methods, by a range of numbers. In the latter case, the range is specified by providing a reference to a two-element array containing the first and last message numbers of the range. Some methods allow you to search for particular articles by looking for wildcard patterns in the header or body of the message using the same syntax as newsgroup name wildcards.

Other methods accept times and dates; for example, the newgroups() method searches for newsgroups created after a particular date. In all cases, the time is expressed in its native Perl form as seconds since the epoch, the same as that returned by the time() built-in.

In addition to the basic NNTP functions, many servers implement a number of extension commands. These extensions make it easier to search a server for articles that match certain criteria and to summarize quickly the contents of a discussion group. Naturally, not all servers support all extensions, and in such cases the corresponding method usually returns undef. In the discussion that follows, methods that depend on NNTP extensions are marked [extension].

We look first at methods that affect the server itself.

$nntp = Net::NNTP->new([$host],[$option1=>$val1,$option2=>$val2])

The new() method attempts to connect to an NNTP server. The $host argument is the DNS name or IP address of the server. If not specified, Net::NNTP looks for the server name in the NNTPSERVER and NEWSHOST environment variables first, and then in the Net::Config nntp_hosts key. If none of these variables is set, the Netnews host defaults to news.

In addition to the options accepted by IO::Socket::INET, Net::NNTP recognizes the name/value pairs shown in Table 8.2.

By default, when Net::NNTP connects to a server, it announces that it is a news reader rather than a news transfer agent (a program chiefly responsible for bulk transfer of messages). If you want to act like a news transfer agent and really know what you're doing, provide new() with the option Reader=>0.

$success = $nntp->authinfo($user => $password)

Some NNTP servers require the user to log in before accessing any information. The authinfo() method takes a username and password, and returns true if the credentials were accepted.

$ok = $nntp->postok()

postok() returns true if the server allows posting of new articles. Even though the server as a whole may allow posting, individual moderated newsgroups may not.

$time = $nntp->date()

The date() method returns the time and date on the remote server, as the number of seconds since the epoch. You can convert this into a human-readable time-date string using the localtime() or gmtime() functions.
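The server-level methods described so far can be combined along the following lines. This is a minimal sketch: the server_report() subroutine is a name made up for the example, the host name and credentials come from the command line, and nothing is contacted unless a host is supplied.

```perl
use strict;
use warnings;
use Net::NNTP;

# Connect, optionally log in, and report a few basic facts about the server.
# server_report() is an illustrative helper; host/user/password are supplied
# by the caller.
sub server_report {
    my ($host, $user, $pass) = @_;
    my $nntp = Net::NNTP->new($host, Timeout => 30)
        or die "Can't connect to $host: $@\n";
    if (defined $user) {
        $nntp->authinfo($user, $pass)
            or die "Login failed: ", $nntp->message;
    }
    print "Posting ", ($nntp->postok ? "is" : "is not"), " allowed\n";
    my $time = $nntp->date;                       # seconds since the epoch
    print "Server time: ", scalar gmtime($time), " GMT\n"
        if defined $time;
    $nntp->quit;
}

# Usage (hypothetical host): perl server_report.pl news.example.com [user password]
server_report(@ARGV) if @ARGV;
```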

Table 8.2. Net::NNTP->new() Options

Option    Description                                      Default
Timeout   Seconds to wait for response from server         120
Debug     Turn on verbose debugging information            undef
Port      Numeric or symbolic name of port to connect to   119
Reader    Act like a news reader                           1

$nntp->slave()

$nntp->reader() [extension]

The slave() method puts the NNTP server into a mode in which it expects to engage in bulk transfer with the client. The reader() method engages a mode more suitable for the interactive transfer of individual articles. Unless explicitly disabled, reader() is issued automatically by the new() method.

$nntp->quit()

The quit() method cleans up and severs the connection with the server. This is also issued automatically when the NNTP object is destroyed.

Once the NNTP object has been created, you can query it for information about newsgroups. The following methods deal with newsgroup-level functions.

$group_info = $nntp->list()

The list() method returns information about all active newsgroups. The return value is a hash reference in which each key is the name of a newsgroup, and each value is a reference to a three-element array that contains group information. The elements of the array are [$last,$first,$postok], where $last and $first are the message numbers of the last and first articles in the group (in that order, as in the server's LIST response), and $postok is "y" if posting is allowed to the group or "m" if the group is moderated.

$group = $nntp->group([$group])

($articles,$first,$last,$name) = $nntp->group([$group])

The group() method gets or sets the current group. Called with a group name as its argument, it sets the current group used by the various article-retrieval methods.

Called without arguments, the method returns information about the current group. In a scalar context, the method returns the group name. In a list context, the method returns a four-element list that contains the number of articles in the group, the message numbers of the first and last articles, and the name of the group.

$group_info = $nntp->newgroups($since [,$distributions])

The newgroups() method works like list(), but returns only newsgroups that have been created more recently than the date specified in $since. The date must be expressed in seconds since the epoch as returned by time().

The $distributions argument, if provided, limits the returned list to those newsgroups that are restricted to the specified distribution(s). You may provide a single distribution name as a string, such as nj, or a reference to an array of distributions, such as ['nj','ct','ny'] for the New York tristate region.
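Because $since is an ordinary epoch time, "groups created in the last week" is a matter of simple arithmetic on time(). The sketch below wraps this in a hypothetical recent_groups() helper; it assumes the server answers the request, which not every site does.

```perl
use strict;
use warnings;
use Net::NNTP;

# Return the names of newsgroups created within the last $days days.
# recent_groups() is an illustrative helper, not a Net::NNTP method.
sub recent_groups {
    my ($nntp, $days) = @_;
    my $since  = time - $days * 24 * 60 * 60;     # epoch seconds, as time() returns
    my $groups = $nntp->newgroups($since) or return;
    return sort keys %$groups;
}

# Only contact the default news server when run with an argument (days).
if (@ARGV) {
    my $nntp = Net::NNTP->new() or die "Can't connect: $@\n";
    print "$_\n" for recent_groups($nntp, $ARGV[0]);
    $nntp->quit;
}
```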

$new_articles = $nntp->newnews($since [,$groups [,$distributions]])

The newnews() method returns a list of articles that have been posted since the time value indicated by $since. You may optionally provide a group pattern or a reference to an array of patterns in $groups, and a distribution pattern or reference to an array of distribution patterns in $distributions.

If successful, the method returns a reference to an array that contains the message IDs of all the matching articles. You may then use the article() and/or articlefh() methods described below to fetch the contents of the articles. This method is chiefly of use for mirroring an entire group or set of groups.

$group_info = $nntp->active([$pattern]) [extension]

The active() method works like list(), but limits retrieval to those newsgroups that match the wildcard pattern $pattern. If no pattern is specified, active() is functionally equivalent to list().

This method and the ones that follow all use common extensions to the NNTP protocol, and are not guaranteed to work with all NNTP servers.

$group_descriptions = $nntp->newsgroups([$pattern]) [extension]

$group_descriptions = $nntp->xgtitle($pattern) [extension]

The newsgroups() method takes a newsgroup wildcard pattern and returns a hash reference in which the keys are group names and the values are brief text descriptions of the group. Because many Netnews sites have given up on keeping track of all the newsgroups (which appear and disappear very dynamically), descriptions are not guaranteed to be available. In such cases, they appear as the string "No description", as "?", or simply as an empty string.

xgtitle() is another extension method that is functionally equivalent to newsgroups(), with the exception that the group pattern argument is required.

$group_times = $nntp->active_times() [extension]

This method returns a reference to a hash in which the keys are newsgroup names and the values are a reference to a two-element list giving the time the group was created and the ID of its creator. The creator ID may be something useful, like an e-mail address, but is more often something unhelpful, like "newsmaster."

$distributions = $nntp->distributions() [extension]

$subscriptions = $nntp->subscriptions() [extension]

These two methods return information about local server distribution and subscription lists. Local distributions can be used to control the propagation of messages in the local area network; for example, a company that is running multiple NNTP servers might define a distribution named engineering. Subscription lists are used to recommend lists of suggested newsgroups to new users of the system.

distributions() returns a hash reference in which the keys are distribution names and the values are human-readable descriptions of the distributions. subscriptions() returns a hash reference in which the keys are subscription list names and the values are array references containing the newsgroups that belong to the subscription list.

Once a group is selected using the group() method, you can list and retrieve articles. Net::NNTP gives you the option of retrieving a specific article by specifying its ID or message number, or iteratively fetching articles in sequence, starting at the current message number and working upward.

$article_arrayref = $nntp->article([$message] [,FILEHANDLE])

The article() method retrieves the indicated article. If $message is numeric, it is interpreted as a message number in the current newsgroup. Net::NNTP returns the contents of the indicated message, and sets the current message pointer to this article. An absent first argument or a value of undef retrieves the current article.

If the first argument is not numeric, Net::NNTP treats it as the article's unique message ID. Net::NNTP retrieves the article, but does not change the position of the current message pointer. In fact, when referring to an article by its message ID, it is not necessary for the indicated article to belong to the current group.

The optional filehandle argument can be used to write the article to the specified destination. Otherwise, the article's contents (header, blank separating line, and body) are returned as a reference to an array containing the lines of the article.

Should something go wrong, article() returns undef and $nntp->message contains an error message from the server. A common error is "no such article number in this group", which can be issued even when the message number is in range because of articles that expire or are cancelled while the NNTP session is active.
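The sketch below shows one way to fetch an article by number with the error handling just described, and then split the returned lines into header fields and body. The split_article() helper is hypothetical and deliberately simple: it keeps only the last occurrence of a repeated header field and joins folded continuation lines with a single space. The group name and message number are taken from the command line.

```perl
use strict;
use warnings;
use Net::NNTP;

# Split an article (array ref of lines) into a header hash and a body array.
sub split_article {
    my ($lines) = @_;
    my (%header, @body, $last);
    my $in_body = 0;
    for my $line (@$lines) {
        if ($in_body) { push @body, $line; next }
        (my $text = $line) =~ s/\r?\n\z//;          # work on a chomped copy
        if ($text eq '') { $in_body = 1; next }     # blank line ends the header
        if ($text =~ /^\s+(.*)/ && defined $last) { # folded continuation line
            $header{$last} .= " $1";
        }
        elsif ($text =~ /^([\w-]+):\s*(.*)/) {      # "Field: value"
            $last = $1;
            $header{$last} = $2;
        }
    }
    return (\%header, \@body);
}

# Usage: perl fetch.pl <group> <message-number>  (uses the default news server)
if (@ARGV == 2) {
    my ($group, $number) = @ARGV;
    my $nntp = Net::NNTP->new() or die "Can't connect: $@\n";
    $nntp->group($group)
        or die "Can't select $group: ", $nntp->message;
    my $lines = $nntp->article($number)
        or die "Can't fetch article $number: ", $nntp->message;
    my ($header, $body) = split_article($lines);
    print "Subject: $header->{Subject}\n" if defined $header->{Subject};
    print "Body has ", scalar @$body, " lines\n";
    $nntp->quit;
}
```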

Other article-retrieval methods are more specialized.

$header_arrayref = $nntp->head([$message] [,FILEHANDLE])

$body_arrayref = $nntp->body([$message] [,FILEHANDLE])

The head() and body() methods work like article() but retrieve only the header or body of the article, respectively.

$fh = $nntp->articlefh([$message])

$fh = $nntp->headfh([$message])

$fh = $nntp->bodyfh([$message])

These three methods act like article(), head(), and body(), but return a tied filehandle from which the contents of the article can be retrieved. After using the filehandle, you should close it. For example, here is one way to read message 10000 of the current newsgroup:

$fh = $nntp->articlefh(10000) or die $nntp->message;
while (<$fh>) {
    print;
}
close $fh;

$msgid = $nntp->next()

$msgid = $nntp->last()

$msgid = $nntp->nntpstat($message)

The next(), last(), and nntpstat() methods control the current article pointer. next() advances the current article pointer to the next article in the newsgroup, and last() moves the pointer to the previous entry. The nntpstat() method moves the current article pointer to the position indicated by $message, which should be a valid message number. After setting the current article pointer, all three methods return the message ID of the current article.
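As an illustration of moving the article pointer, the following sketch walks forward through a group from its first article and collects message IDs and subjects. The list_subjects() helper and its $max limit are inventions for the example, and the code assumes the group's first message number is still present on the server; if it has expired, nntpstat() fails and the list comes back empty.

```perl
use strict;
use warnings;
use Net::NNTP;

# Collect up to $max [message-ID, subject] pairs, starting at the first
# article of $group and stepping forward with next().
sub list_subjects {
    my ($nntp, $group, $max) = @_;
    my ($count, $first, $last) = $nntp->group($group)
        or return;
    my @found;
    my $msgid = $nntp->nntpstat($first);          # position the article pointer
    while (defined $msgid && @found < $max) {
        my $head = $nntp->head() || [];           # header of the current article
        my ($subject) = map { /^Subject:\s*(.*)/i ? $1 : () } @$head;
        push @found, [$msgid, defined $subject ? $subject : '(no subject)'];
        $msgid = $nntp->next;                     # undef when no article follows
    }
    return @found;
}

# Usage: perl subjects.pl <group>  (uses the default news server)
if (@ARGV) {
    my $nntp = Net::NNTP->new() or die "Can't connect: $@\n";
    print "$_->[0]  $_->[1]\n" for list_subjects($nntp, $ARGV[0], 10);
    $nntp->quit;
}
```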

Net::NNTP allows you to post new articles using the post(), postfh(), and ihave() methods.

$success = $nntp->post([$message])

The post() method posts an article to Netnews. The posted article does not have to be directed to the current newsgroup; in fact, the news server ignores the current newsgroup when accepting an article and looks only at the contents of its Newsgroups: header. The article may be provided as an array containing the lines of the article or as a reference to such an array. Alternatively, you may call post() with no arguments and use the datasend() and dataend() methods inherited from Net::Cmd to send the article one line at a time.

If successful, post() returns a true value. Otherwise, it returns undef and $nntp->message contains an error message from the server.

$fh = $nntp->postfh()

The postfh() method provides an alternative interface for posting an article. If the server allows posting, this method returns a tied filehandle to which you can print the contents of the article. After finishing, be sure to close the filehandle. The result code from close() indicates whether the article was accepted by the server.
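A posting made through postfh() might look like the sketch below. The post_lines() wrapper is a hypothetical helper, the article content and the misc.test group are placeholders, and the server host is taken from the command line; nothing is posted unless a host is given.

```perl
use strict;
use warnings;
use Net::NNTP;

# Post an article, given as a list of lines, through the tied filehandle.
# Returns true if the server accepted the article.
sub post_lines {
    my ($nntp, @lines) = @_;
    return unless $nntp->postok;           # server does not allow posting
    my $fh = $nntp->postfh or return;      # posting refused
    print {$fh} @lines;
    return close $fh;                      # result of close() signals acceptance
}

# Usage (hypothetical host): perl post.pl news.example.com
if (@ARGV) {
    my $nntp = Net::NNTP->new($ARGV[0]) or die "Can't connect: $@\n";
    my @article = (
        "From: Example User <user\@example.com>\n",
        "Subject: test posting, please ignore\n",
        "Newsgroups: misc.test\n",
        "\n",
        "This is only a test.\n",
    );
    post_lines($nntp, @article)
        or warn "Article not accepted: ", $nntp->message;
    $nntp->quit;
}
```

The same list of lines could instead be handed to post() as an array reference; the filehandle form is convenient when the article is generated piece by piece.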

$wants_it = $nntp->ihave($messageID[,$message])

The ihave() method is chiefly of use for clients that are acting as news relays. The method asks the Netnews server whether it wishes to accept the article whose ID is $messageID.

If the server indicates its assent, it returns a true result. The article must then be transferred to the server, either by providing the article's contents in the $message argument or by sending the article one line at a time using the Net::Cmd datasend() and dataend() methods. $message can be an array of article lines or a reference to such an array.

Last, several methods allow you to search for particular articles of interest.

$header_hashref = $nntp->xhdr($header,$message_range) [extension]

$header_hashref = $nntp->xpat($header,$pattern,$message_range) [extension]

$references = $nntp->xrover($message_range) [extension]

The xhdr() method is an extension function that allows you to retrieve the value of a header field from multiple articles. The $header argument is the name of an article header field, such as "Subject". $message_range is either a single message number or a reference to a two-element array containing the first and last messages in the desired range. If successful, xhdr() returns a hash reference in which the keys are the message numbers (not IDs) and the values are the requested header fields.

The header field is case-insensitive. However, not all headers can be retrieved in this way because NNTP servers typically index only that subset of the headers used to generate overview listings (see the next method).

The xpat() method is similar to xhdr(), but it filters the articles returned for those with $header fields that match the wildcard pattern in $pattern. The xrover() method returns the cross-reference fields for articles in the specified range. It is functionally identical to:

$xref = $nntp->xhdr('References',[$start,$end]);

The result of this call is a hash reference in which the keys are message numbers and the values are the message IDs that the article refers to. These are typically used to reconstruct discussion threads.
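As a usage sketch for these search methods, the code below uses xpat() to find articles in a group whose Subject: field matches a wildcard pattern. The subjects_matching() helper is a made-up name, and because xpat() is an extension, the call may simply return nothing on servers that do not support it.

```perl
use strict;
use warnings;
use Net::NNTP;

# Return [message-number, subject] pairs for articles in $group whose
# Subject: field matches the wildcard $pattern, in ascending number order.
sub subjects_matching {
    my ($nntp, $group, $pattern) = @_;
    my ($count, $first, $last) = $nntp->group($group)
        or return;
    # search the whole group: range is [first, last]
    my $hits = $nntp->xpat('Subject', $pattern, [$first, $last])
        or return;
    return map { [$_, $hits->{$_}] } sort { $a <=> $b } keys %$hits;
}

# Usage: perl search.pl <group> <pattern>  (uses the default news server)
if (@ARGV == 2) {
    my $nntp = Net::NNTP->new() or die "Can't connect: $@\n";
    printf "%8d  %s\n", @$_ for subjects_matching($nntp, @ARGV);
    $nntp->quit;
}
```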

$overview_hashref = $nntp->xover($message_range) [extension]

$format_arrayref = $nntp->overview_fmt() [extension]

The overview_fmt() and xover() methods return newsgroup "overview" information. The overview is a summary of selected article header fields; it typically contains the Subject: line, References:, article Date:, and article length. It is used by newsreaders to index, sort, and thread articles.

Pass the xover() method a message range (a single message number or a reference to an array containing the extremes of the range). If successful, the method's return value is a hash reference in which each key is a message number and each value is a reference to an array of the overview fields.

To discover what these fields are, call the overview_fmt() method. It returns an array reference containing field names in the order in which they appear in the arrays returned by xover(). Each field is followed by a colon and, occasionally, by a server-specific modifier. For example, my laboratory's Netnews server returns the following overview fields:

('Subject:','From:','Date:','Message-ID:','References:', 'Bytes:','Lines:','Xref:full')

If you would prefer each article's overview to be a hash reference rather than an array reference, you can use the small subroutine shown here to do the transformation. The trick is to use the list of field names returned by overview_fmt() to create a hash slice to which we assign the article overview array:

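sub get_overview {
    my ($nntp,$range) = @_;
    # keep the field name only: 'Subject:' -> 'Subject', 'Xref:full' -> 'Xref'
    my @fields = map { /([\w-]+):/ && $1 } @{$nntp->overview_fmt};
    my $over = $nntp->xover($range) || return;
    foreach (keys %$over) {
        my $h = {};
        @{$h}{@fields} = @{$over->{$_}};   # hash slice: field names => values
        $over->{$_} = $h;
    }
    return $over;
}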

Use the subroutine like this:

$over = get_overview($nntp,[30000,31000]);

The returned value will have a structure like this:

{
  30000 => {
    'Bytes'      => 2704,
    'Date'       => 'Sat, 27 May 2000 19:35:10 GMT',
    'From'       => 'mr_lowell@my-deja.com',
    'Lines'      => 72,
    'Message-ID' => '<8gp81d$cuo@nnrp1.deja.com>',
    'References' => '',
    'Subject'    => 'mod_perl make test',
    'Xref'       => 'Xref: rQdQ comp.lang.perl.modules:34162'
  },
  30001 => {
    'Bytes'      => 1117,
    'Date'       => 'Sat, 27 May 2000 20:28:22 GMT',
    'From'       => 'Robert Gasiorowski <gasior@snet.net>',
    'Lines'      => 6,
    'Message-ID' => '<39303E6A.88397549@snet.net>',
    'References' => '',
    'Subject'    => 'installing module as non-root',
    'Xref'       => 'Xref: rQdQ comp.lang.perl.modules:34163'
  },
  ....
}


   