Network Programming with Perl


 

By Lincoln D. Stein

Chapter 11. Multithreaded Applications


Multithreading is quite different from multiprocessing. Instead of there being two or more processes, each with its own memory space, signal handlers, and global variables, a multithreaded program has a single process in which several "threads of execution" run. Each thread runs independently; it can loop or perform I/O without worrying about the other threads that are running. However, all threads share global variables, filehandles, signal handlers, and other resources.

While this sharing of resources enables threads to interact in a much more intimate way than the separate processes created by fork(), it creates the possibility of resource contention. For example, if two threads try to modify a variable at the same time, the result may not be what you expect. For this reason, resource locking and control become an issue in threaded programs. Although multithreaded programming simplifies your programming in some ways, it complicates it in others.

The Thread module was introduced in Perl 5.005. To use it, you must run an operating system that supports threads (including most versions of UNIX and Microsoft Windows) and have compiled Perl with threading enabled. In Perl 5.005, you do this by running the Configure installation program with the option -Dusethreads. With Perl 5.6.0 and higher, the option becomes -Duse5005threads. No precompiled Perl binaries come with threading support activated.

Threads Are Experimental

Perl threads are an experimental feature. The 5.005 thread implementation has known bugs that can lead to mysterious crashes, particularly when running on machines with more than one CPU. Not all Perl modules are thread-safe; that is, using these modules in a multithreaded program will lead to crashes and/or incorrect results, and even some core Perl features are problematic. Although the thread implementation has improved in Perl 5.6 and higher, some fundamental design flaws remain in the system. In fact, the Perl thread documentation warns that multithreading should not be used in production systems.

The Perl developers are working on a completely new threading design, known as interpreter threads (ithreads), that will be part of Perl version 6, expected to be available in the summer of 2001. It promises to be more stable than the 5.005 implementation, but its API may be different from what is described here.

The Thread API

The thread API described here is the 5.005 version of threads, and not the interpreter threads currently under development.

The thread API, which is described in the Thread, Thread::Queue, Thread::Semaphore, and attrs manual pages, seems simple but hides many complexities. Each program starts with a single thread, called the main thread. The main thread starts at the beginning of the program and runs to the end (or until exit() or die() is called).

To create a new thread, you call Thread->new(), passing it a reference to a subroutine to execute and an optional set of arguments. This creates a new concurrent thread, which immediately executes the indicated subroutine. When the subroutine is finished, the thread exits. For example, here's how you might launch a new thread to perform a time-consuming calculation:

my $thread = Thread->new(\&calculate_pi, precision => 190);

The new thread executes calculate_pi(), passing it the two arguments "precision" and "190". If successful, the call immediately returns with a new Thread object, which the calling thread usually stashes somewhere. The calling thread can now invoke the Thread object's detach() method, which frees the main thread from any responsibility for dealing with the new thread.

Alternatively, the thread can remain in its default attached state, in which case the main thread (or any other thread) should at some point call the Thread object's join() method to retrieve the subroutine's return value. This is sometimes done just before exiting the program, or at the time the return value is needed. If the thread has not yet finished, join() blocks until it does. To continue with the previous example, at some point the main thread may wish to retrieve the value of pi computed by the calculate_pi() subroutine. It can do this by calling:

my $pi = $thread->join;
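As a minimal, self-contained sketch of the create-then-join pattern just described, the script below substitutes a hypothetical sum_to() worker for calculate_pi(). Neither the subroutine name nor the calculation comes from the Thread module; they are placeholders, and the script assumes a Perl built with thread support.

```perl
#!/usr/bin/perl
use strict;
use Thread;

# Hypothetical worker (not part of the Thread module):
# adds up the integers from 1 to $n and returns the total.
sub sum_to {
    my $n     = shift;
    my $total = 0;
    $total += $_ for 1 .. $n;
    return $total;
}

# Thread->new() returns at once; the worker starts running concurrently.
my $thread = Thread->new(\&sum_to, 100);

# join() blocks until the worker returns, then hands back its return value.
my $sum = $thread->join;
print "sum = $sum\n";
```

Because join() blocks, the main thread is free to do other work between the two calls and to collect the result only at the point where it is needed.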

Unlike the case with parent and child processes, where only a parent can wait() on its children, there is no strict familial relationship between threads. Any thread can call join() on any other thread (but a thread cannot join() itself).

For a thread to exit, it need only return() from its subroutine, or just let control fall naturally through to the bottom of the subroutine block. Threads should never call Perl's exit() function, because that would kill both the current thread and all other threads (usually not the intended effect!). Nor should any thread other than the main one try to install a signal handler. There's no way to ensure that a signal will be delivered to the thread you intend to receive it, and it's more than likely that Perl will crash.

A thread can also exit abnormally by calling die() with an error message. However, the effect of dying in a thread is not what you would expect. Instead of raising some sort of exception immediately, the effect of die() is postponed until the main thread tries to join() the thread that died. At that point, the die() takes effect, and the program terminates. If a non-main thread calls join() on a thread that has died, the effect is postponed until that thread itself is joined.

You can catch this type of postponed death and handle it using eval(). The error message passed to die() will be available in the $@ global.

my $pi = eval { $thread->join } || warn "Got an error: $@";

A Simple Multithreaded Application

Here's a very simple multithreaded application. It spawns two new threads, each of which runs the hello() subroutine. hello() loops a number of times, printing out a message specified by the caller. The subroutine sleeps for a second each time through the loop (this is just for illustration purposes and is not needed to obtain thread concurrency). After spawning the two threads, the main thread waits for the two threads to terminate by calling join().

#!/usr/bin/perl
use Thread;
my $thread1 = Thread->new(\&hello, "I am thread 1", 3);
my $thread2 = Thread->new(\&hello, "I am thread 2", 6);
$_->join foreach ($thread1, $thread2);
sub hello {
    my ($message, $loop) = @_;
    for (1..$loop) {
        print $message, "\n";
        sleep 1;
    }
}

When you run this program, you'll see output like this:

% perl hello.pl
I am thread 1
I am thread 2
I am thread 1
I am thread 2
I am thread 1
I am thread 2
I am thread 2
I am thread 2
I am thread 2

Locking

The problem with threads appears as soon as two threads attempt to modify the same variable simultaneously. To illustrate the problem, consider this deceptively simple bit of code:

my $bytes_sent = 0;
my $socket = IO::Socket->new(....);
sub send_data {
    my $data = shift;
    my $bytes = $socket->syswrite($data);
    $bytes_sent += $bytes;    # unprotected read-modify-write
}

The problem occurs in the last line of the subroutine, where the $bytes_sent variable is incremented. If there are multiple simultaneous connections running, then the following scenario can occur:

  1. Thread 1 fetches the value of $bytes_sent and prepares to increment it.

  2. A context switch occurs. Thread 1 is suspended and thread 2 takes control. It fetches the value of $bytes_sent and increments it.

  3. A context switch again occurs, suspending thread 2 and resuming thread 1. However, thread 1 is still holding the value of $bytes_sent it fetched from step 1. It increments the original value and stores it back into $bytes_sent, overwriting the changes made by thread 2.

This chain of events won't happen every time but will happen in a rare, nondeterministic fashion, leading to obscure bugs that are hard to track down.

The fix for this is to use the lock() call to lock the $bytes_sent variable before trying to use it. With this small modification, the example now works properly:

my $bytes_sent = 0;
my $socket = IO::Socket->new(....);
sub send_data {
    my $data = shift;
    my $bytes = $socket->syswrite($data);
    lock($bytes_sent);        # held until the subroutine returns
    $bytes_sent += $bytes;
}

lock() creates an "advisory" lock on a variable. An advisory lock prevents another thread from calling lock() to lock the variable until the thread that currently holds the lock has relinquished it. However, the lock doesn't prevent access to the variable, which can still be read and written even if the thread doesn't hold a lock on it. Locks are generally used to prevent two threads from trying to update the same variable at the same time.

If a variable is locked and another thread tries to lock it, that thread is suspended until such time as the lock is available. A lock remains in force until the lock goes out of scope, just like a local variable. In the preceding example, $bytes_sent is locked just before it's incremented, and the lock remains in force throughout the scope of the subroutine.
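One way to see the effect of lock scope is the sketch below, in which four threads each add to a common counter. It assumes the 5.005 thread model described in this chapter, in which a file-scoped variable is visible to every thread. The bump() subroutine and the counts are invented for illustration. Because lock() is called inside the loop body, the lock is released at the end of every iteration, so the threads take turns rather than one of them holding the lock for its entire run.

```perl
#!/usr/bin/perl
use strict;
use Thread;

# Assumption: under the 5.005 thread model this variable is shared
# by all threads in the process.
my $counter = 0;

# Hypothetical worker: increments the shared counter $times times.
sub bump {
    my $times = shift;
    for (1 .. $times) {
        lock($counter);    # advisory lock, scoped to this loop body
        $counter++;        # fetch, add, and store while holding the lock
    }                      # lock goes out of scope here on each pass
}

my @threads = map { Thread->new(\&bump, 1000) } 1 .. 4;
$_->join foreach @threads;

# With the lock in place, no increment should be lost.
print "counter = $counter\n";
```

Removing the lock() line leaves the program syntactically valid, but it then depends on the threads never interleaving their updates, which is exactly the kind of rare, nondeterministic failure described earlier.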

If a number of variables are changed at the same time, it is common to create an independent variable that does nothing but manage access to the variables. In the following example, the $ok_to_update variable serves as the lock for two related variables, $bytes_sent and $bytes_left:

my $ok_to_update;
sub send_data {
    my $data = shift;
    my $bytes = $socket->syswrite($data);
    lock($ok_to_update);      # guards both counters below
    $bytes_sent += $bytes;
    $bytes_left -= $bytes;
}

It is also possible to lock an entire subroutine using the notation lock(\&subroutine). When a subroutine is locked, only one thread is allowed to run it at one time. This is recommended only for subroutines that execute very quickly; otherwise, the multiple threads serialize on the subroutine like cars backed up at a traffic light, obliterating most of the advantages of threads in the first place.

Variables that are not shared, such as the local variables $data and $bytes in the preceding example, do not need to be locked. Nor do you need to lock object references, unless two or more threads share the object.

When using threads in combination with Perl objects, object methods often need to lock the object before changing it. Otherwise, two threads could try to modify the object simultaneously, leading to chaos. This object method, for example, is not thread safe, because two threads might try to modify the $self object simultaneously:

sub acknowledge {    # NOT thread safe
    my $self = shift;
    print $self->{socket} "200 OK\n";
    $self->{acknowledged}++;
}

You can lock objects within object methods explicitly, as in this revised version:

sub acknowledge {    # thread safe
    my $self = shift;
    lock($self);
    print $self->{socket} "200 OK\n";
    $self->{acknowledged}++;
}

Since $self is a reference, you might wonder whether the call to lock() is locking the $self reference or the thing that $self points to. The answer is that lock() automatically follows references up one level (and one level only). The call to lock($self) is exactly equivalent to calling lock(%$self), assuming that $self is a hash reference.

Threading versions of Perl provide a new syntax for adding attributes to subroutines. With this syntax, the subroutine name is followed by a colon and a set of attributes:

sub acknowledge: locked method {    # thread safe
    my $self = shift;
    print $self->{socket} "200 OK\n";
    $self->{acknowledged}++;
}

To create a locked method, use the attributes locked and method. If both attributes are present, as in the preceding example, then the first argument to the subroutine (the object reference) is locked on entry into the method and released on exit. If only locked is specified, then Perl locks the subroutine itself, as if you had specifically written lock(\&acknowledge). The key difference here is that when the attributes are set to locked method, it's possible for multiple threads to run the subroutine simultaneously so long as they're working with different objects. When a subroutine is marked locked only, then only one thread can gain access to the subroutine at a time, even if the threads are working with different objects.

Thread Module Functions and Methods

The thread API has several other core parts, including ways for threads to signal each other when a particular condition has become true. Here is a very brief synopsis of the thread API. More information is available in the perlthread manual page, and other features are explained in depth later when we use them.

$thread = Thread->new(\&subroutine [, @arguments]);

Creates a new thread of execution and returns a Thread object. The new thread immediately runs the subroutine given as the first argument, passing it the arguments listed in the optional second and subsequent arguments.

$return_value = $thread->join()

join() waits for the given thread to terminate. The return value is the result (if any) returned by the subroutine specified when the thread was created. If the thread is running, then join() blocks until it terminates; there is no way to do a nonblocking join on a particular thread.

$thread->detach()

If you aren't interested in a thread's return value, you can call its detach() method. This makes it impossible to call join() later. The main advantage of detaching a thread is that it frees the main thread from the responsibility of joining the other threads later.

@threads = Thread->list()

This class method returns a list of Thread objects. The list includes those that are running as well as those that have terminated but are waiting to be joined.

$thread = Thread->self()

This class method returns the Thread object corresponding to the current thread.

$tid = $thread->tid()

Each thread is associated with a numeric identifier known as the thread ID (tid). There's no particular use for this identifier except perhaps as an index into an array or to incorporate into debugging messages. This tid can be retrieved with the tid() method.
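The following sketch shows tid() used in the way suggested above, as a tag in debugging messages. The report() subroutine and its labels are hypothetical; only Thread->new(), Thread->self(), tid(), and join() come from the API summarized here, and the script assumes a Perl built with thread support.

```perl
#!/usr/bin/perl
use strict;
use Thread;

# Hypothetical worker: prints a message tagged with the ID of the
# thread it is running in, then returns that ID to whoever joins it.
sub report {
    my $label = shift;
    my $tid   = Thread->self->tid;
    print "[thread $tid] $label\n";
    return $tid;
}

my @threads = map { Thread->new(\&report, "worker $_") } 1 .. 3;

# Collect each worker's thread ID through join().
my @tids = map { $_->join } @threads;

print "main thread has tid ", Thread->self->tid, "\n";
print "worker tids: @tids\n";
```

The order of the "[thread N]" lines may vary from run to run, since the workers execute concurrently.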

lock($variable)

The lock() function locks the scalar, array, or hash passed to it in such a way that no other thread can lock the variable until the first thread's lock goes out of scope. For container variables, such as arrays, locking the whole array (e.g., with lock(@connections)) is different from locking a component of the array (e.g., lock($connections[3])).

You do not need to explicitly import the Thread module to use lock(). It is built into the core of all versions of Perl that support multithreading. On versions of Perl that don't support multithreading, lock() has no effect. This allows you to write thread-safe modules that will work equally well on threading and nonthreading versions of Perl.

The next five items are functions that must be imported explicitly from the Thread module:

use Thread qw(async yield cond_wait cond_signal cond_broadcast);

$thread = async {BLOCK}

The async() function is an alternative way to create a new Thread object. Instead of taking a code reference and its arguments like new(), it accepts a code block, which becomes the body of the new thread. The Thread object returned by async() can be join()ed, just like a thread created with new().
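A short sketch of async() in use follows. The calculation inside the block is a placeholder, and the script assumes a Perl built with thread support. The value of the last statement in the block plays the role of a subroutine's return value.

```perl
#!/usr/bin/perl
use strict;
use Thread qw(async);

# The block becomes the body of the new thread; no code reference
# or argument list is needed.
my $thread = async {
    my $total = 0;
    $total += $_ for 1 .. 10;
    $total;    # last value evaluated is what join() hands back
};

my $total = $thread->join;
print "total = $total\n";
```

async() is convenient when the thread body is short and needs only variables that are already in scope; for longer bodies, a named subroutine passed to new() is usually easier to read.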

yield()

The yield() function is a way for a thread to hint to Perl that a particular spot might be a good place to do a thread context shift. Because of the differences in thread implementations on different operating systems, this may or may not have an effect. It is not usually necessary to call yield() to obtain concurrency, but it might help in some circumstances to distribute the time slices of execution more equitably among the threads.

cond_wait($variable)

cond_wait() waits on a variable until it is signaled. The function takes a locked variable, releases the lock, and puts the thread to sleep until the variable is signaled by another thread calling cond_signal() or cond_broadcast(). The variable is relocked before cond_wait() returns.

cond_signal($variable)

cond_signal() signals $variable, restarting any threads that are waiting on it. If no threads are waiting, then the call does nothing. If multiple threads are waiting on the variable, one (and only one) of them is unblocked. Which one is awakened is indeterminate.

cond_broadcast($variable)

cond_broadcast() works like cond_signal(), except that all waiting threads are awakened. Each thread reacquires the lock in turn and executes the code following the cond_wait(). The order in which the waiting threads are awakened is indeterminate.

We will use cond_wait() and cond_broadcast() in Chapter 14, when we develop an adaptive prethreaded server.
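In the meantime, here is a minimal sketch of the lock, wait, and signal sequence. It assumes the 5.005 thread model described in this chapter, in which the file-scoped $ready variable is visible to both threads. The variable and subroutine names are invented for illustration. The waiting thread tests $ready in a loop so that it behaves correctly whether the signal arrives before or after it starts waiting.

```perl
#!/usr/bin/perl
use strict;
use Thread qw(cond_wait cond_signal);

# Assumption: under the 5.005 thread model this flag is shared
# between the main thread and the worker.
my $ready = 0;

# Hypothetical worker: sleeps until the main thread sets $ready.
sub wait_for_ready {
    lock($ready);                      # cond_wait() needs a locked variable
    cond_wait($ready) until $ready;    # lock is released while asleep
    return "worker saw ready = $ready";
}

my $waiter = Thread->new(\&wait_for_ready);

{
    lock($ready);
    $ready = 1;
    cond_signal($ready);               # wake one waiting thread, if any
}   # lock released here, so the worker can reacquire it and return

my $message = $waiter->join;
print "$message\n";
```

Setting the flag and signaling under the same lock closes the window in which the worker could test $ready, find it false, and then miss the signal.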

Threads and Signals

If you plan to mix threads with signals, you must be aware that the integration of signal handling with threads is one of the more experimental parts of Perl's experimental threads implementation. The issue is that signals arrive at unpredictable times and may be delivered to any currently executing thread, leading to unpredictable results.

The Thread::Signal module is supposed to help with this by arranging for all signals to be delivered to a special thread that runs in parallel with the main thread. You don't have to do anything special. Just loading the module is sufficient to start the signal thread running:

use Thread::Signal;

However, you should be aware that Thread::Signal changes the semantics of signals so that they can no longer be used to interrupt long-running system calls. Hence, this trick will no longer work:

alarm(10);
my $bytes = eval {
    local $SIG{ALRM} = sub { die };
    sysread($socket, $data, 1024);
};

In some cases, you can work around this limitation by replacing the eval{} section with a call to select(). We use this trick in Chapter 15.

In practice, Thread::Signal sometimes seems to make programs less stable rather than more so, depending on which version of Perl and which threading libraries you are using. My advice for experimenting with threading features is to first write the program without Thread::Signal and add it later if unexpected crashes or other odd behavior occurs.


   