The Psychotherapist as a Forking Server

We are going to reimplement the psychotherapist program as a forking network server, but before we do so we must discuss issues surrounding the termination of child processes in UNIX-based forking servers. This discussion does not apply to servers running under the Microsoft Windows versions of Perl.

Zombies

We've already used fork(): in Chapter 2 we used it in a toy example to distribute the load of a computation across two child processes (Figure 2.5), and in Chapter 5 we used it to avoid synchronization and deadlock problems in the gab2.pl script (Figure 5.8). One difference between those examples and the forking server examples in this chapter is the relative longevity of the parent and child processes. In the earlier examples, the parent process does not survive the demise of its children for any significant length of time; it exits soon after they do. In forking servers, however, the parent process is very long-lived. Web servers, for example, run for months at a time. The children live only as long as a client connection, and a server may spawn thousands of children during its lifetime. Under this scenario, the issue of "zombie processes" becomes important.

Once fork() is called, parent and child processes are almost, but not quite, free to go their own ways. The UNIX system maintains a tenuous connection between the two processes. If the child exits before the parent does, the child process does not disappear, but instead remains in the system process table in a mummified form known as a "zombie." The zombie remains in the process table for the sole purpose of being able to deliver its exit status code to the parent process when and if the parent asks for it using the wait() or waitpid() call, an operation known as "reaping." This is a limited form of IPC that allows the parent to find out whether a process it launched exited successfully, and if not, why.
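A zombie is easy to observe: let a child exit while the parent deliberately does not reap it. The following sketch is a demonstration, not code from the book. It assumes a UNIX-like system on which kill() with signal 0 (which only tests whether a process exists) reports an unreaped child as still present, as Linux does, and it reaps the child with waitpid(), described below.

```perl
use strict;
use warnings;

# Keep the default CHLD disposition; if it were 'IGNORE', many systems
# would discard the child's status and no zombie would be left behind.
$SIG{CHLD} = 'DEFAULT';

defined(my $pid = fork()) or die "Can't fork: $!";
if ($pid == 0) {
    exit 3;                              # child: terminate at once with status 3
}

sleep 1;                                 # parent: give the child time to exit

# Signal 0 delivers nothing; it only checks that the PID still exists.
# The dead but unreaped child still has a process-table entry.
my $before = kill(0, $pid) ? 1 : 0;

my $reaped = waitpid($pid, 0);           # reap the zombie; status goes into $?
my $status = $? >> 8;                    # high byte holds the exit value

my $after = kill(0, $pid) ? 1 : 0;       # the entry is gone now

print "exists before reaping: $before\n";
print "exit value delivered:  $status\n";
print "exists after reaping:  $after\n";
```

Until waitpid() runs, the child's only remaining purpose is to hand its exit value (3 here) to the parent; once reaped, its process-table slot is released.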
If a parent process forks a lot of children and does not reap them in a timely manner, zombie processes accumulate in the process table, ultimately creating a virtual Night of the Living Dead in which the table fills up with defunct processes. Eventually, the parent process hits a system-imposed limit on the number of subprocesses it can launch, and subsequent calls to fork() fail. To avoid this eventuality, any program that calls fork() must be prepared to reap its children by calling wait() or waitpid() at regular intervals, preferably immediately after a child exits.

UNIX makes it convenient to call wait() or waitpid() at the appropriate time by providing the CHLD signal. The CHLD signal is sent to a parent process whenever the state of any of its children changes. Possible state changes include the child exiting (which is the event we're interested in) and the child being suspended by a STOP signal. The CHLD signal provides no information beyond the bare-bones fact that some child's state changed. The parent must call wait() or waitpid() to determine which child was affected and what happened to it.

$pid = wait()

This function waits for any child process to exit and then returns the PID of the terminated child. If no child is immediately ready for reaping, the call hangs (blocks) until there is one; if there are no child processes at all, it returns -1. If you wish to determine whether the child exited normally or because of an error, you may examine the special $? variable, which contains the child's exit status code. A code of 0 indicates that the child exited normally with a success status; anything else indicates an error exit or an abnormal termination. The value the child passed to exit() is in the high byte ($? >> 8), and the number of the signal that killed it, if any, is in the low seven bits ($? & 127). See the perlvar POD page for full details on how to interpret the contents of $?.

$pid = waitpid($pid, $flags)

This version waits for a particular child to exit and returns its PID, placing the exit status code in $?. If the child named by $pid is not immediately available for reaping, waitpid() blocks until it is.
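To make the $? layout concrete, the sketch below (not from the book) forks three children that end in different ways and uses wait() to collect them. It decodes $? with the shifts and masks that perlvar documents rather than with any helper functions, and it assumes a UNIX-like system where a child can kill itself with a TERM signal.

```perl
use strict;
use warnings;

# Fork three children that terminate in different ways.
my %fate_of;                             # PID => how that child was told to end
for my $fate ('exit0', 'exit2', 'sigterm') {
    defined(my $pid = fork()) or die "Can't fork: $!";
    if ($pid == 0) {                     # child
        exit 0 if $fate eq 'exit0';
        exit 2 if $fate eq 'exit2';
        kill 'TERM', $$;                 # default TERM action kills the child
        sleep 5;                         # should not be reached
        exit 99;
    }
    $fate_of{$pid} = $fate;              # parent remembers each child
}

my %result;                              # fate => decoded status
while ((my $pid = wait()) != -1) {       # wait() returns -1 once no children remain
    my $signal = $? & 127;               # low 7 bits: signal that killed the child
    my $value  = $? >> 8;                # high byte: value passed to exit()
    $result{ $fate_of{$pid} } = $signal ? "killed by signal $signal"
                                        : "exited with value $value";
}
print "$_: $result{$_}\n" for sort keys %result;
```

Checking the signal bits first matters: a child killed by a signal has a high byte of 0, so looking only at $? >> 8 would misreport it as a successful exit.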
To wait for any child to become available, as wait() does, use a $pid argument of -1.

The behavior of waitpid() can be modified by the $flags argument. There are a number of handy constants defined in the :sys_wait_h group of the standard POSIX module. These constants can be bitwise ORed together to combine them. The most frequently used flag is WNOHANG, which, if present, puts waitpid() into nonblocking mode: waitpid() returns the PID of a child if one is ready for reaping; if children exist but none has exited yet, it returns 0 immediately instead of blocking; and if there are no child processes at all, it returns -1. Another occasionally useful flag is WUNTRACED, which tells waitpid() to return the PIDs of stopped children as well as terminated ones.

Reaping Children in the CHLD Handler

The standard way for Perl servers to reap their children is to install a handler for the CHLD signal. You'll see this fragment in many examples of server code:

    $SIG{CHLD} = sub { wait(); };

The effect of this is to call wait() every time the server receives a CHLD signal, immediately reaping the child and ignoring its result code. This code works most of the time, but there are a number of unusual situations that will break it.

One such event is when a child is stopped or restarted by a signal. In this case, the parent gets a CHLD signal, but no child has actually exited. The wait() call then blocks until some child does exit, which may take a long time, bringing the server to a halt. This is not at all a desirable state of affairs.

Another event that can break this simple signal handler is the nearly simultaneous termination of two or more children. Standard UNIX signals are not queued: if a second CHLD signal arrives while the first is still pending, the two are merged, and the server sees a single CHLD event. Although two children need to be reaped, the server calls wait() only once, leaving an unreaped zombie. This "zombie leak" becomes noticeable after a sufficiently long period of time.
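A short experiment (not from the book) shows the WNOHANG return values and why one wait() per CHLD signal is not enough. Three children exit while the parent is busy sleeping, so at most one merged CHLD event may represent all three; a single loop calling waitpid(-1, WNOHANG) nevertheless collects every one of them, then stops with 0 because a fourth, slower child is still running. The one-second and two-second sleeps are arbitrary timing assumptions.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';                 # WNOHANG and friends

# Three children exit at once; a fourth stays alive for two seconds.
my @quick;
for (1 .. 3) {
    defined(my $pid = fork()) or die "Can't fork: $!";
    exit 0 if $pid == 0;                 # child exits immediately
    push @quick, $pid;
}
defined(my $slow = fork()) or die "Can't fork: $!";
if ($slow == 0) { sleep 2; exit 0 }

sleep 1;                                 # let the quick children become zombies

# One pass of the nonblocking loop reaps every child that has exited.
my @reaped;
my $kid;
while (($kid = waitpid(-1, WNOHANG)) > 0) {
    push @reaped, $kid;
}
my $stopped_at = $kid;                   # 0: children remain, none has exited

my $last = waitpid($slow, 0);            # now block for the slow child
my $none = waitpid(-1, WNOHANG);         # -1: no children left at all

printf "reaped %d children in one pass\n", scalar @reaped;
print "loop stopped on $stopped_at; final call returned $none\n";
```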
The last undesirable situation occurs when the parent process makes calls that spawn subprocesses, including the backtick operator (`), the system() function, and piped open() calls. For these functions Perl takes care of calling wait() for you before returning to the main body of the code. On some platforms, however, extraneous CHLD signals leak through even though there's no unreaped child to wait for, and the wait() call in the handler can again hang.

The solution to these three problems is to call waitpid() with a PID of -1 and a flag of WNOHANG. The first argument tells waitpid() to reap any available child. The second argument prevents the call from hanging if no children are available for reaping. To avoid leaking zombies, you should call waitpid() in a loop until it stops returning a PID, that is, until it returns 0 (children remain, but none has exited) or -1 (there are no children left). Here's the idiom:

    use POSIX 'WNOHANG';
    $SIG{CHLD} = \&reaper;
    sub reaper {
        while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
            warn "Reaped child with PID $kid\n";
        }
    }

In this case we print the PID of the reaped child for the purpose of debugging. In many cases you will ignore the child PID, but in others you'll want to examine the child PID and status code and perform some action when a child exits abnormally. We'll see examples of this in later sections.

Psychotherapist Server with fork

We're now ready to rewrite the psychotherapist example as a forking server (Figure 10.3).

Figure 10.3. Psychotherapist as a forking server

Lines 1-5: Bring in modules

We begin by loading the Chatbot::Eliza and IO::Socket modules, and importing the WNOHANG constant from the POSIX module. We also define the port our server will listen to, in this case 12000.

Lines 6-7: Define constants and variables

We define the default port to bind to, and initialize a global variable, $quit, to false. When this variable becomes true, the main server loop exits.
Lines 8-11: Install signal handlers

We install a signal handler for CHLD events using a variant of the waitpid() idiom previously discussed:

    $SIG{CHLD} = sub { while ( waitpid(-1,WNOHANG)>0 ) { } };

We want the server to clean up gracefully after interruption from the command line, so we create an INT handler. This handler just sets $quit to true and returns.

Lines 12-19: Create listening socket

We create a new listening socket by calling IO::Socket::INET->new() with the LocalPort and Listen arguments. We also specify a Proto argument of "tcp" and a true value for Reuse, allowing this server to be killed and relaunched without the otherwise mandatory wait for the port to be freed. In addition to these standard arguments, we declare a Timeout of 1 hour. As in the reverse echo server of Figure 5.4, this is done to make accept() interruptible by signals: we want accept() to return prematurely when interrupted by INT so that we can check the status of $quit.

Lines 20-21: Accept incoming connections

We now enter a while() loop. Each time through the loop we call accept() to get an IO::Socket object connected to a new client.

Lines 22-27: Fork: child handles connection

Once accept() returns, instead of talking directly to the connected socket, we immediately call fork() and save the result code in the variable $child. If $child is undefined, then fork() failed for some reason and we die with an error message. Otherwise, if $child is numerically equal to 0, we know we are inside the child process, which is responsible for handling the communications session. As the child, we will not call accept() again, so we close our copy of the listening socket. This closing is not strictly necessary, but it's always a good idea to tidy up unneeded resources, and it avoids the possibility of the child inadvertently trying to perform operations on the listening socket.
We now call a subroutine named interact(), passing it the connected socket object. interact() manages the Eliza conversation and returns when the user terminates the connection (by typing "bye", for example). After interact() returns, the child terminates by calling exit().

Lines 28-29: Parent cleans up

If $child is nonzero, then we are the parent process. In this case, we just close our copy of the connected socket and go back to the top of the loop to accept() another connection. While we are waiting for a new connection, the child is taking care of the old one.

Lines 30-38: interact() subroutine

The interact() subroutine is called by the child process to handle all interaction between the client and the server. The Chatbot::Eliza command_interface() method is hardwired to read from STDIN and write to STDOUT, but we want it to read from and write to the socket instead. This is actually an easy problem to finesse. When we loaded IO::Socket, it also brought in methods from its parent class, IO::Handle. Among these methods is fdopen(), which we looked at in Chapter 1 (The IO::Handle and IO::File Modules). The fdopen() method closes an existing filehandle and then reopens it as a duplicate of another filehandle that you give it. It works with any filehandle, including the standard three.

We call fdopen() three times, once each for STDIN, STDOUT, and STDERR. Each time we call fdopen(), we pass it the socket object and a symbolic access mode. STDIN is reopened for reading with a mode of <, while STDOUT and STDERR are each reopened for writing with a mode of >. Now, almost as if by magic, writing to STDOUT and STDERR sends data flying down the connected socket, and reading from STDIN performs a read on the socket. The remainder of this subroutine is identical to the command-line psychotherapist from Figure 10.2.
We turn on autoflush mode so that data sent to STDOUT by Chatbot::Eliza is immediately visible at the remote end, create a new Chatbot::Eliza object, and call its command_interface() method.

With the server running, you may confirm that you can telnet to port 12000 and regale the psychiatrist simulation with your personal hang-ups and pet peeves. Without closing the previous session, open a new window and telnet to port 12000 again. You should be able to conduct multiple simultaneous sessions. Run the ps program to confirm that each active session creates a new copy of the server, and that when you close a session, the corresponding process disappears immediately without leaving a zombie. When you are done, kill the server with ^C. The following is an example of the server in action:

    % telnet localhost 12000
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    Eliza: How do you do. Please tell me your problem.
    you: hey this really works!
    Eliza: I'm not sure I understand you fully.
    you: you are a server
    Eliza: Perhaps you would like to be a server.
    you: no
    Eliza: Why not?
    you: quit
    Eliza: Goodbye. It was nice talking to you.
    Connection closed by foreign host.

Using the Psychotherapist Server on Windows Platforms

Although fork() works correctly on Windows systems, fdopen() on sockets does not. For Windows systems, the interact() subroutine from Figure 10.3 must be modified to avoid the fdopen() call. The easiest way to do this is to replace the call to command_interface() with a new version that accepts the input and output filehandles to use instead of hardwired STDIN and STDOUT. In the next chapter, Figure 11.2 develops a subclass of Chatbot::Eliza, called Chatbot::Eliza::Server, that does exactly that.
To run the forking server on Windows platforms, change the use Chatbot::Eliza line to:

    use Chatbot::Eliza::Server;

and modify interact() to read like this:

    sub interact {
        my $sock = shift;
        my $bot = Chatbot::Eliza::Server->new;
        $bot->command_interface($sock, $sock);
        close $sock;
    }
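The walkthrough of Figure 10.3 can be tied together in a sketch like the one below. It is reconstructed from the prose, not copied from the book's listing, and it makes a few assumptions of its own: the listening socket and accept loop are wrapped in hypothetical subroutines make_listener() and serve(), so the loop can be driven by any per-connection handler, and Chatbot::Eliza is loaded with require inside interact() rather than with use at the top. Like the UNIX version described above, this interact() relies on fdopen(); on Windows, the Chatbot::Eliza::Server version of interact() would be passed to serve() instead.

```perl
#!/usr/bin/perl
# Sketch of a forking psychotherapist server, reconstructed from the
# chapter's walkthrough; make_listener() and serve() are hypothetical names.
use strict;
use warnings;
use IO::Socket;                          # also loads IO::Handle (fdopen, autoflush)
use POSIX 'WNOHANG';

my $quit = 0;                            # set by the INT handler

# Listening socket; the Timeout makes accept() return when a signal arrives.
sub make_listener {
    my $port = shift;
    my $listener = IO::Socket::INET->new(
        LocalPort => $port,
        Listen    => 20,
        Proto     => 'tcp',
        Reuse     => 1,
        Timeout   => 60 * 60,            # one hour, as in the chapter
    ) or die "Can't create listen socket: $@";
    return $listener;
}

# Accept loop: fork one child per connection until INT sets $quit.
sub serve {
    my ($listener, $handler) = @_;
    local $SIG{CHLD} = sub { while (waitpid(-1, WNOHANG) > 0) { } };
    local $SIG{INT}  = sub { $quit++ };
    while (!$quit) {
        my $sock = $listener->accept or next;    # undef after a signal or timeout
        defined(my $child = fork()) or die "Can't fork: $!";
        if ($child == 0) {                       # child: serve this client
            $listener->close;
            $handler->($sock);
            exit 0;
        }
        $sock->close;                            # parent: the child owns it now
    }
    $listener->close;
}

# Eliza conversation over the socket via fdopen() (UNIX only).
sub interact {
    my $sock = shift;
    STDIN->fdopen($sock, '<')  or die "Can't reopen STDIN: $!";
    STDOUT->fdopen($sock, '>') or die "Can't reopen STDOUT: $!";
    STDERR->fdopen($sock, '>') or die "Can't reopen STDERR: $!";
    STDOUT->autoflush(1);
    require Chatbot::Eliza;              # loaded lazily in this sketch
    Chatbot::Eliza->new->command_interface;
}

# Typical use:  serve(make_listener(12000), \&interact);
```

Passing the handler as a code reference keeps the forking and reaping logic separate from the conversation, mirroring the chapter's split between the main loop and interact(). The `or next` after accept() matters: each CHLD signal from a finished session also interrupts accept(), and the loop simply goes back to check $quit and wait again.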