Waiting on Processes
They also serve who only stand and wait.
John Milton, 1608–1674, On His Blindness [1652]
More often than not, a parent process needs to synchronize its actions by waiting until a child process has either stopped or terminated its actions. The wait system call allows the parent process to suspend its activity until one of these actions has occurred (Table 3.9).
Table 3.9. Summary of the wait System Call.
| Item | Value |
|---|---|
| Include File(s) | <sys/types.h> <sys/wait.h> |
| Manual Section | 2 |
| Summary | pid_t wait(int *status); |
| Return on success | Child process ID |
| Return on failure | -1 |
| Sets errno | Yes |
The activities of wait are summarized in Figure 3.11.
Figure 3.11. Summary of wait activities.
The wait system call accepts a single argument, which is a pointer to an integer, and returns a value of type pid_t. Data type pid_t is defined in the header file <sys/types.h> and is a signed integer type (an int on Linux). If the calling process does not have any child processes associated with it, wait will return immediately with a value of -1 and errno will be set to ECHILD (10). However, if any child processes are still active, the calling process will block (suspend its activity) until a child process terminates. When a waited-for child process terminates, the status information for the child and its process ID (PID) are returned to the parent. The status information is stored as an integer value at the location referenced by the pointer status. The low-order 16 bits of the location contain the actual status information, and the high-order bits (assuming a 32-bit integer) are set to zero. The low-order 16 bits can be further subdivided into a low- and high-order byte. This information is interpreted in one of two ways:
- If the child process terminated normally, the low-order byte will be 0 and the high-order byte will contain the exit code (0–255):

| byte 3 | byte 2 | byte 1 | byte 0 |
|---|---|---|---|
| 0 | 0 | exit code | 0 |
- If the child process terminated due to an uncaught signal, the low-order byte will contain the signal number and the high-order byte will be 0:

| byte 3 | byte 2 | byte 1 | byte 0 |
|---|---|---|---|
| 0 | 0 | 0 | signal # |
In this second situation, if a core file has been produced, the leftmost bit of byte 0 will be a 1. If a NULL argument is specified for wait, the child status information is not returned to the parent process; the parent is only notified of the child's termination.
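The byte layout just described can be picked apart with shifts and masks. The following is a minimal sketch that assumes the traditional encoding shown above (as used on Linux) and covers only the two termination cases; the names DecodedStatus and decode_status are hypothetical and are not part of any system header. Portable programs should rely on the predefined status macros rather than on this layout.

```cpp
// Hypothetical helper type holding the fields described in the text.
struct DecodedStatus {
  bool exited;       // true when the low-order byte is 0 (normal termination)
  int  exit_code;    // byte 1; meaningful only when exited is true
  int  signal_no;    // low 7 bits of byte 0; meaningful only when exited is false
  bool core_dumped;  // leftmost bit of byte 0
};

// Split a status value into its parts, assuming the layout shown above.
DecodedStatus decode_status(int status) {
  DecodedStatus d;
  int low  = status & 0xFF;          // byte 0
  int high = (status >> 8) & 0xFF;   // byte 1
  d.exited      = (low == 0);
  d.exit_code   = d.exited ? high : 0;
  d.signal_no   = d.exited ? 0 : (low & 0x7F);
  d.core_dumped = !d.exited && (low & 0x80) != 0;
  return d;
}
```

Under these assumptions, a status of 0x8F00 decodes to a normal exit with code 0x8F, and a status of 0x0009 decodes to termination by signal 9 with no core file.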
Here are two programs, a parent (Program 3.9) and child (Program 3.10), that demonstrate the use of wait .
Program 3.9 The parent process.
File : p3.9.cxx

    /* A parent process that waits for a child to finish */
    #include <iostream>
    #include <iomanip>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdlib>
    using namespace std;

    int
    main(int argc, char *argv[]) {
      pid_t pid, w;
      int   status;
      if (argc < 4) {
        cerr << "Usage " << *argv << " value_1 value_2 value_3" << endl;
        return 1;
      }
      for (int i = 1; i < 4; ++i)                 // generate 3 child processes
        if ((pid = fork()) == 0) {
          execl("./child", "child", argv[i], (char *) 0);
          _exit(127);                             // reached only if execl fails
        } else                                    // assuming fork does not fail
          cout << "Forked child " << pid << endl;
      /* Wait for the children */
      while ((w = wait(&status)) && w != -1)
        cout << "Wait on PID: " << dec << w << " returns status of "
             << setw(4) << setfill('0') << hex
             << setiosflags(ios::uppercase) << status << endl;
      return 0;
    }
The parent program forks three child processes. Each child process is overlaid with the executable code for the child (found in Program 3.10). The parent process passes to each child, from the parent's command line, a numeric value. As each child process is produced, the parent process displays the child process ID. After all three processes have been generated, the parent process initiates a loop to wait for the child processes to finish their execution. As each child process terminates, the value returned to the parent process is displayed.
Program 3.10 The child process.
File : p3.10.cxx

    /* The child process */
    #define _GNU_SOURCE
    #include <iostream>
    #include <iomanip>
    #include <cstdlib>
    #include <sys/types.h>
    #include <signal.h>
    #include <unistd.h>
    using namespace std;

    int
    main(int argc, char *argv[]) {
      pid_t pid = getpid();
      int   ret_value;
      srand((unsigned) pid);
      ret_value = int(rand() % 256);              // generate a return value
      sleep(rand() % 3);                          // sleep a bit
      if (atoi(*(argv + 1)) % 2) {                // assuming argv[1] exists!
        cout << "Child " << pid << " is terminating with signal 0009" << endl;
        kill(pid, 9);                             // commit hara-kiri
      } else {
        cout << "Child " << pid << " is terminating with exit("
             << setw(4) << setfill('0') << setiosflags(ios::uppercase)
             << hex << ret_value << ")" << endl;
        exit(ret_value);
      }
    }
In the child program, the child process obtains its own PID using the getpid call. The PID value is used as a seed value to initialize the srand function. A call to rand is used to generate a pseudo-random value (0–255) to be returned when the process exits. The child process then sleeps a random number of seconds (0–2, the possible values of rand( ) % 3). After sleeping, if the argument passed to the child process on the command line is odd (i.e., not evenly divisible by 2), the child process kills itself by sending a signal 9 (SIGKILL) to its own PID. If the argument on the command line is even, the child process exits normally, returning the previously calculated return value. In both cases, the child process displays a message indicating what it will do before it actually executes the statements.
The source programs are compiled and the executables named parent and child respectively. They are run by calling the parent program. Two sample output sequences are shown in Figure 3.12.
Figure 3.12 Two runs of Programs 3.9 and 3.10.
    linux$ parent 2 1 2                               <-- 1
    Forked child 8975
    Forked child 8976
    Child 8976 is terminating with signal 0009
    Forked child 8977
    Wait on PID: 8976 returns status of 0009
    Child 8977 is terminating with exit(008F)
    Wait on PID: 8977 returns status of 8F00
    Child 8975 is terminating with exit(0062)
    Wait on PID: 8975 returns status of 6200

    linux$ parent 2 2 1                               <-- 2
    Forked child 8980
    Forked child 8981
    Forked child 8982
    Child 8982 is terminating with signal 0009
    Wait on PID: 8982 returns status of 0009
    Child 8980 is terminating with exit(00B0)
    Wait on PID: 8980 returns status of B000
    Child 8981 is terminating with exit(00D3)
    Wait on PID: 8981 returns status of D300
(1) Two even values and one odd
(2) Two even values and one odd but in a different order.
There are several things of interest to note in this output. In the first output sequence, one child process (PID 8976) has terminated before the parent has finished its process generation. Processes that have terminated but have not been waited upon by their parent process are called zombie processes. Zombie processes occupy a slot in the process table, consume no other system resources, and will be marked with the letter Z when a process status command is issued (e.g., ps -alx or ps -el). A zombie process cannot be killed [11] even with the standard Teflon bullet (e.g., at a system level: kill -9 process_id_number). Zombies are put to rest when their parent process performs a wait to obtain their process status information. When this occurs, any remaining system resources allocated for the process are recovered by the kernel. Should the child process become an orphan before its parent issues the wait, the process will be inherited by init, which, by design, will issue a wait for the process. On some very rare occasions, even this will not cause the zombie process to "die." In these cases, a system reboot may be needed to clear the process table of the entry.
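A zombie can also be observed from within a program. The sketch below is Linux-specific and makes several assumptions: it reads the one-letter process state from /proc/<pid>/stat, and it uses waitid with the WNOWAIT flag to block until the child has terminated without reaping it. The helper names process_state and observe_zombie are hypothetical.

```cpp
#include <fstream>
#include <string>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Read the one-letter state of a process from /proc/<pid>/stat (Linux only).
// Returns '?' when the file cannot be read or parsed.
char process_state(pid_t pid) {
  std::ifstream in(("/proc/" + std::to_string(pid) + "/stat").c_str());
  std::string line;
  if (!std::getline(in, line)) return '?';
  std::string::size_type close = line.rfind(')');   // state follows "(command) "
  if (close == std::string::npos || close + 2 >= line.size()) return '?';
  return line[close + 2];
}

// Fork a child that exits at once, look at it while it is unreaped, then
// reap it. Returns the state letter seen before the reaping wait.
char observe_zombie() {
  pid_t pid = fork();
  if (pid == -1) return '?';
  if (pid == 0) _exit(0);                           // child: terminate immediately
  siginfo_t info;
  // WNOWAIT: block until the child has terminated, but leave it unreaped
  waitid(P_PID, pid, &info, WEXITED | WNOWAIT);
  char state = process_state(pid);                  // a zombie is reported as 'Z'
  waitpid(pid, 0, 0);                               // reap: the zombie is put to rest
  return state;
}
```

Between the waitid call and the final waitpid, the child is in exactly the state the text describes: terminated, still listed in the process table, and waiting for its parent to collect its status.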
[11] This miraculous ability is the source of the name zombie .
Both sets of output clearly show that when the child process terminates normally, the exit value returned by the child is stored in the second byte of the integer value referenced by the argument to the wait call in the parent process. Likewise, if the child terminates due to an uncaught signal, the signal value is stored in the first byte of the same referenced location. It is also apparent that wait will return with the information for the first child process that terminates, which may or may not be the first child process generated.
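The point that wait reports children in termination order rather than creation order can be made deterministic with a small experiment. In this sketch (the Order structure and wait_order function are hypothetical names), the first child created is held on a pipe read until the parent has already collected the second child, so the second child is certain to be reported first.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// PIDs of the two children and the order in which wait reported them.
struct Order { pid_t slow, fast, first_reaped, second_reaped; };

Order wait_order() {
  Order o = { -1, -1, -1, -1 };
  int gate[2];
  if (pipe(gate) == -1) return o;
  o.slow = fork();                        // created first
  if (o.slow == -1) return o;
  if (o.slow == 0) {
    char c;
    close(gate[1]);                       // keep only the read end
    ssize_t n = read(gate[0], &c, 1);     // blocks until every write end is closed
    (void) n;
    _exit(1);
  }
  o.fast = fork();                        // created second
  if (o.fast == 0) _exit(2);              // terminates immediately
  close(gate[0]);
  int status;
  if (o.fast != -1)
    o.first_reaped = wait(&status);       // only the fast child can have finished
  close(gate[1]);                         // release the slow child
  o.second_reaped = wait(&status);
  return o;
}
```

Here the child created second is the first one returned by wait, mirroring what Figure 3.12 shows with randomly chosen sleep times.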
EXERCISE
Add the wait system call to the huh shell program (Program 3.7).
EXERCISE
Write a program that produces three zombie processes. Submit evidence, via the output of the ps command, that these processes are truly generated and are eventually destroyed.
EXERCISE
In Program 3.10 if the child process uses a signal 8 (versus 9) to terminate, what is returned to the parent as the signal value? Why?
It is easy to see that the interpretation of the status information can be cumbersome, to say the least. At one time, programmers wrote their own macros to interrogate the contents of status. Now most use one of the predefined status macros. These macros are shown in Table 3.10.
Table 3.10. The wstat Macros.
| Macro | Description |
|---|---|
| WIFEXITED(status) | Returns true if the child process exited normally. |
| WEXITSTATUS(status) | Returns the exit code or return value from main. Should be called only if WIFEXITED(status) has returned true. |
| WIFSIGNALED(status) | Returns true if the child exited due to an uncaught signal. |
| WTERMSIG(status) | Returns the signal that terminated the child. Should be called only if WIFSIGNALED(status) has returned true. |
| WIFSTOPPED(status) | Returns true if the child process is stopped. |
| WSTOPSIG(status) | Returns the signal that stopped the child. Should be called only if WIFSTOPPED(status) has returned true. |
The argument to each of these macros is the integer status value (not the pointer to the value) that is returned to the wait call. The macros are most often used in pairs. The WIF macros are used as a test for a given condition. If the condition is true, the second macro of the pair is used to return the specified value. As shown below, these macros could be incorporated in the wait loop in the parent Program 3.9 to obtain the child status information:
    ...
    while ((w = wait(&status)) && w != -1)
      if (WIFEXITED(status))                          // test with macro
        cout << "Wait on PID: " << dec << w << " returns a value of "
             << hex << WEXITSTATUS(status) << endl;   // obtain value
      else if (WIFSIGNALED(status))                   // test with macro
        cout << "Wait on PID: " << dec << w << " returns a signal of "
             << hex << WTERMSIG(status) << endl;      // obtain value
    ...
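For readers who want to try the macros outside of Program 3.9, here is a small self-contained sketch. The functions describe and run_child are hypothetical helpers written for this illustration: describe pairs each WIF test with its companion macro, and run_child forks a child that either exits with a given code or sends itself a signal.

```cpp
#include <signal.h>
#include <sstream>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Turn a status value into a short description using the status macros.
std::string describe(int status) {
  std::ostringstream out;
  if (WIFEXITED(status))                  // test, then fetch the paired value
    out << "exited with " << WEXITSTATUS(status);
  else if (WIFSIGNALED(status))
    out << "killed by signal " << WTERMSIG(status);
  else if (WIFSTOPPED(status))
    out << "stopped by signal " << WSTOPSIG(status);
  else
    out << "unknown status";
  return out.str();
}

// Fork a child that exits with `code`, or, when `sig` is nonzero, sends
// that signal to itself. Returns the description of the collected status.
std::string run_child(int code, int sig) {
  pid_t pid = fork();
  if (pid == -1) return "fork failed";
  if (pid == 0) {
    if (sig != 0) kill(getpid(), sig);
    _exit(code);
  }
  int status = 0;
  if (wait(&status) == -1) return "wait failed";
  return describe(status);
}
```

Calling run_child(7, 0) exercises the WIFEXITED branch, while run_child(0, SIGKILL) exercises the WIFSIGNALED branch. The sketch assumes the caller has no other outstanding children, since wait reports whichever child terminates first.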
While the wait system call is helpful, it does have some limitations. It will always return the status of the first child process that terminates or stops. Thus, if the status information returned by wait is not from the child process we want, the information may need to be stored on a temporary basis for possible future reference and additional calls to wait made. Another limitation of wait is that it will always block if status information is not available. Fortunately, another system call, waitpid , which is more flexible (and thus more complex), addresses these shortcomings. In most invocations, the waitpid call will block the calling process until one of the specified child processes changes state. The waitpid system call summary is shown in Table 3.11.
Table 3.11. Summary of the waitpid System Call.
| Item | Value |
|---|---|
| Include File(s) | <sys/types.h> <sys/wait.h> |
| Manual Section | 2 |
| Summary | pid_t waitpid(pid_t pid, int *status, int options); |
| Return on success | Child PID or 0 |
| Return on failure | -1 |
| Sets errno | Yes |
The first argument of the waitpid system call, pid , is used to stipulate the set of child process identification numbers that should be waited for (Table 3.12).
Table 3.12. Interpretation of pid Values by waitpid .
| pid Value | Wait for |
|---|---|
| < -1 | Any child process whose process group ID equals the absolute value of pid. |
| -1 | Any child process, in a manner similar to wait. |
| 0 | Any child process whose process group ID equals the caller's process group ID. |
| > 0 | The child process with this process ID. |
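The process group forms of the pid argument are less familiar than the others, so a brief sketch may help. It assumes the parent is allowed to move its child into a new process group with setpgid (true when the child has not yet called exec and is in the same session); the function name wait_by_group is hypothetical. The child is held on a pipe so that the group change is certain to happen before it exits.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Put one child in its own process group, then wait for "any child in that
// group" by passing the negated group ID to waitpid.
// Returns the PID reported by waitpid (or -1); *child receives the forked PID.
pid_t wait_by_group(pid_t *child) {
  int gate[2];
  if (pipe(gate) == -1) return -1;
  pid_t pid = fork();
  if (pid == -1) return -1;
  if (pid == 0) {
    char c;
    close(gate[1]);
    ssize_t n = read(gate[0], &c, 1);     // hold here until the parent is ready
    (void) n;
    _exit(0);
  }
  *child = pid;
  close(gate[0]);
  setpgid(pid, pid);                      // child becomes leader of a new group
  close(gate[1]);                         // release the child
  int status;
  return waitpid(-pid, &status, 0);       // pid < -1: any child in group |pid|
}
```

Because the new group contains only the one child, the PID returned by waitpid is that child's PID.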
The second argument, *status , as with the wait call, references an integer status location where the status information of the child process will be stored if the waitpid call is successful. This location can be examined directly or with the previously presented wstat macros.
The third argument, options, may be 0 (don't care), or it can be formed by a bitwise OR of one or more of the flags listed in Table 3.13 (these flags are usually defined in the header file <sys/wait.h>). The flags are applicable to the specified child process set discussed previously.
Table 3.13. Flag Values for waitpid .
| FLAG Value | Specifies |
|---|---|
| WNOHANG | Return immediately if no child has exited; do not block if the status cannot be obtained; return a value of 0, not the PID. |
| WUNTRACED | Also return if a child is stopped (not only when it has terminated). |
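The effect of WNOHANG is easiest to see when the child is known to be still running. In the sketch below (PollResult and poll_child are hypothetical names), the child is parked on a pipe read, so the first waitpid call has nothing to report and returns 0; once the child is released, polling continues until its PID is returned.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// What waitpid returned while the child was running and after it exited.
struct PollResult { pid_t child, while_running, after_exit; };

PollResult poll_child() {
  PollResult r = { -1, -1, -1 };
  int gate[2];
  if (pipe(gate) == -1) return r;
  r.child = fork();
  if (r.child == -1) return r;
  if (r.child == 0) {
    char c;
    close(gate[1]);
    ssize_t n = read(gate[0], &c, 1);     // stay alive until the parent lets go
    (void) n;
    _exit(0);
  }
  close(gate[0]);
  int status;
  // The child is still blocked, so there is no status to collect yet
  r.while_running = waitpid(r.child, &status, WNOHANG);
  close(gate[1]);                         // release the child
  // Poll without blocking; a real program would do other work in this loop
  while ((r.after_exit = waitpid(r.child, &status, WNOHANG)) == 0)
    usleep(1000);
  return r;
}
```

The design choice here is polling with a short sleep; a program with useful work to do between checks would perform that work in place of the usleep call.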
If the value given for pid is -1 and the option flag is set to 0, the waitpid and wait system calls act in a similar fashion. If waitpid fails, it returns a value of -1 and sets errno to indicate the source of the error (Table 3.14).
Table 3.14. waitpid Error Messages.
| # | Constant | perror Message | Explanation |
|---|---|---|---|
| 4 | EINTR | Interrupted system call | Signal was caught during the system call. |
| 10 | ECHILD | No child process | Process specified by pid does not exist or is not a child of the caller, or the calling process has set the action of SIGCHLD to be SIG_IGN (ignore signal). |
| 22 | EINVAL | Invalid argument | Invalid value for options. |
| 85 | ERESTART | Interrupted system call should be restarted | WNOHANG not specified, and unblocked signal or SIGCHLD was caught. |
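Of the errors in Table 3.14, EINTR is the one a program can usually recover from by simply repeating the call. A common idiom, sketched here with the hypothetical wrapper name waitpid_retry, is to loop while the call fails with EINTR and to pass any other result back to the caller.

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Repeat waitpid for as long as it is interrupted by a caught signal.
// Any other outcome (success, 0 with WNOHANG, or another error) is returned.
pid_t waitpid_retry(pid_t pid, int *status, int options) {
  pid_t w;
  do {
    w = waitpid(pid, status, options);
  } while (w == -1 && errno == EINTR);
  return w;
}
```

A second call for a child that has already been waited for returns -1 with errno set to ECHILD, since that PID no longer names a child of the caller.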
We can modify a few lines in our current version of the parent process (Program 3.9) to save the generated child PIDs in an array. This information can be used with the waitpid system call to coerce the parent process into displaying status information from child processes in the order of child process generation instead of their termination order. Program 3.11 shows how this can be done.
Program 3.11 A parent program using waitpid .
File : p3.11.cxx

    #include <iostream>
    #include <iomanip>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdlib>
    using namespace std;

    int
    main(int argc, char *argv[]) {
      pid_t pid[3], w;
      int   status;
      if (argc < 4) {
        cerr << "Usage " << *argv << " value_1 value_2 value_3" << endl;
        return 1;
      }
      for (int i = 1; i < 4; ++i)                 // generate 3 child processes
        if ((pid[i-1] = fork()) == 0) {
          execl("./child", "child", argv[i], (char *) 0);
          _exit(127);                             // reached only if execl fails
        } else                                    // assuming fork does not fail
          cout << "Forked child " << pid[i-1] << endl;
      /* Wait for the children, in the order they were created.
         The i < 3 test keeps the loop from reading past the end of pid[]. */
      for (int i = 0; i < 3 && (w = waitpid(pid[i], &status, 0)) && w != -1; ++i) {
        cout << "Wait on PID " << dec << w << " returns ";
        if (WIFEXITED(status))                    // test with macro
          cout << " a value of " << setw(4) << setfill('0') << hex
               << setiosflags(ios::uppercase) << WEXITSTATUS(status) << endl;
        else if (WIFSIGNALED(status))             // test with macro
          cout << " a signal of " << setw(4) << setfill('0') << hex
               << setiosflags(ios::uppercase) << WTERMSIG(status) << endl;
        else
          cout << " unexpectedly!" << endl;
      }
      return 0;
    }
A run of this program (using the same child process, Program 3.10) confirms that the status information returned to the parent is indeed ordered by the sequence in which the child processes were generated, not the order in which they terminated. Also, note that the status macros are used to evaluate the return from the waitpid system call (Figure 3.13).
Figure 3.13 Output of Program 3.11.
    linux$ p3.11 2 2 1
    Forked child 9772
    Forked child 9773                                 <-- 1
    Child 9773 is terminating with exit(008B)         <-- 2
    Forked child 9774
    Child 9772 is terminating with exit(00CD)
    Wait on PID 9772 returns a value of 00CD          <-- 3
    Wait on PID 9773 returns a value of 008B
    Child 9774 is terminating with signal 0009
    Wait on PID 9774 returns a signal of 0009
(1) Order of creation: 9772, 9773, 9774
(2) Order of termination: 9773, 9772, 9774
(3) Order of wait: 9772, 9773, 9774
On some occasions, the information returned from wait or waitpid may be insufficient. Additional information on resource usage by a child process may be sought. There are two BSD compatibility library functions, wait3 and wait4 , [12] that can be used to provide this information (Table 3.15).
[12] It is not clear if these functions will be supported in subsequent versions of the GNU compiler, and they may limit the portability of programs that incorporate them. As these are BSD-based functions, _USE_BSD must be defined in the program code or defined on the command line when the source code is compiled.
Table 3.15. Summary of the wait3/wait4 Library Functions.
| Item | Value |
|---|---|
| Include File(s) | #define _USE_BSD, then <sys/types.h> <sys/resource.h> <sys/wait.h> |
| Manual Section | 3 |
| Summary | pid_t wait3(int *status, int options, struct rusage *rusage); pid_t wait4(pid_t pid, int *status, int options, struct rusage *rusage); |
| Return on success | Child PID or 0 |
| Return on failure | -1 |
| Sets errno | Yes |
The wait3 and wait4 functions parallel the wait and waitpid functions respectively. The wait3 function waits for the first child process to terminate or stop. The wait4 function waits for the specified PID (pid), which is interpreted in the same way as the pid argument of waitpid (Table 3.12). In particular, should the pid value passed to the wait4 function be set to -1, wait4 will wait on the first child process in a manner similar to wait3, while a value of 0 restricts the wait to children in the caller's process group. Both functions accept option flags to indicate whether or not they should block and/or report on stopped child processes. These option flags are shown in Table 3.16.
Table 3.16. Option Flag Values for wait3 / wait4 .
| FLAG Value | Specifies |
|---|---|
| WNOHANG | Return immediately if no child has exited; do not block if the status cannot be obtained; return a value of 0, not the PID. |
| WUNTRACED | Also return if a child is stopped (not only when it has terminated). |
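The WUNTRACED flag can be exercised with a child that stops itself. The following sketch (StopReport and watch_stop are hypothetical names) has the child raise SIGSTOP, uses wait4 with WUNTRACED to learn of the stop, resumes the child with SIGCONT, and then collects its normal exit. The rusage argument is passed as a null pointer because resource usage is not of interest here.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

// What the parent observed: a stop, the signal that caused it, and an exit.
struct StopReport { bool saw_stop; int stop_signal; bool saw_exit; };

StopReport watch_stop() {
  StopReport r = { false, 0, false };
  pid_t pid = fork();
  if (pid == -1) return r;
  if (pid == 0) {
    raise(SIGSTOP);                       // child stops itself until SIGCONT arrives
    _exit(0);
  }
  int status = 0;
  // WUNTRACED: also return for a child that has stopped
  if (wait4(pid, &status, WUNTRACED, 0) == pid && WIFSTOPPED(status)) {
    r.saw_stop = true;
    r.stop_signal = WSTOPSIG(status);
  }
  kill(pid, SIGCONT);                     // let the child run on to its exit
  if (wait4(pid, &status, 0, 0) == pid && WIFEXITED(status))
    r.saw_exit = true;
  return r;
}
```

Without WUNTRACED, the first call would simply block, since a stopped child has not terminated.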
Both functions contain an argument that is a reference to a rusage structure. This structure is defined in the header file <sys/resource.h>. [13]
[13] On some systems, you may need a different header file instead of <sys/resource.h>, and you may need to explicitly link in the BSD library that contains the object code for the wait3/wait4 functions.
    struct rusage {
      struct timeval ru_utime;    /* user time used */
      struct timeval ru_stime;    /* system time used */
      long ru_maxrss;             /* maximum resident set size */
      long ru_ixrss;              /* integral shared memory size */
      long ru_idrss;              /* integral unshared data size */
      long ru_isrss;              /* integral unshared stack size */
      long ru_minflt;             /* page reclaims */
      long ru_majflt;             /* page faults */
      long ru_nswap;              /* swaps */
      long ru_inblock;            /* block input operations */
      long ru_oublock;            /* block output operations */
      long ru_msgsnd;             /* messages sent */
      long ru_msgrcv;             /* messages received */
      long ru_nsignals;           /* signals received */
      long ru_nvcsw;              /* voluntary context switches */
      long ru_nivcsw;             /* involuntary context switches */
    };
If the rusage argument is non-null, the system populates the rusage structure with the current information from the specified child process. See the getrusage system call in Section 2 of the manual pages for additional information. The status macros (see previous section on wait and waitpid) can be used with the status information returned by wait3 and wait4. If either function fails, it returns -1 and sets errno (Table 3.17).
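As an illustration of the rusage argument, the sketch below forks a child that spends a short time in a CPU-bound loop and then reports the user CPU time that wait4 recorded for it. The function name child_user_usec and the loop count are arbitrary choices made for this example; the actual figure returned will vary from system to system and run to run.

```cpp
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

// Run a CPU-bound child and return the user CPU time, in microseconds,
// that wait4 reports for it. Returns -1 on failure.
long child_user_usec() {
  pid_t pid = fork();
  if (pid == -1) return -1;
  if (pid == 0) {
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 20000000UL; ++i)
      sum = sum + i;                      // burn some CPU time
    _exit(0);
  }
  int status = 0;
  struct rusage usage;
  if (wait4(pid, &status, 0, &usage) != pid) return -1;
  // ru_utime is a struct timeval: whole seconds plus microseconds
  return usage.ru_utime.tv_sec * 1000000L + usage.ru_utime.tv_usec;
}
```

The other members of the structure, such as ru_stime or ru_majflt, can be read from usage in the same way after the call returns.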
Table 3.17. wait3 / wait4 Error Messages.
| # | Constant | perror Message | Explanation |
|---|---|---|---|
| 4 | EINTR | Interrupted system call | Signal was caught during the system call. |
| 10 | ECHILD | No child process | Process specified by pid does not exist or is not a child of the caller, or the calling process has set the action of SIGCHLD to be SIG_IGN (ignore signal). |
| 22 | EINVAL | Invalid argument | Invalid value for options. |
| 85 | ERESTART | Interrupted system call should be restarted | WNOHANG not specified, and unblocked signal or SIGCHLD was caught. |