exec's Minions
Processes generate child processes for a number of reasons. In a Linux environment, there are several long-lived processes that run continuously in the background and provide system services upon demand. These processes, called daemon processes, frequently generate child processes to carry out the requested service. Some daemon processes commonly found in a Linux environment are lpd, the line printer daemon; xinetd, the extended Internet services daemon; and syslogd, the system logging daemon. Some problems (such as with databases) lend themselves to concurrent-type solutions that can be effected via multiple child processes executing the same code. More commonly, such as when the shell processes a command, a process creates a child process because it would like to transform the child process by changing the program code the child process is executing.
In Linux, any one of five library functions and one system call can be used to replace the current process image with a new image. [1] The library functions act as a front end to the system call. The library functions are discussed in the exec manual pages (Section 3), while the system call (execve) warrants its own manual page entry in Section 2. Any of these can be directly invoked by the programmer. For ease of comparison, the library functions and the system call are discussed as a group. The phrase exec call will refer to this group.
[1] In some versions of UNIX, such as Solaris, all the exec calls are system calls and are grouped together as library functions and discussed in one section of the manual. Linux has a more historic approach to things.
It is important to remember that when a process issues any exec call, if the call is successful, the existing process is overlaid with a new set of program code. The text, data (initialized and uninitialized), and stack segment of the process are replaced and only the u (user) area of the process remains the same. The new program code (if a C/C++ binary) begins its execution at the function main. Since the system is now executing a different set of code for the same process, some things, by necessity, must change:
- Signals that were specified as being caught by the process (i.e., associated with a signal-catching routine) are reset to their default action. This is necessary, as the addresses for the signal-catching routines are no longer valid.
- In a similar vein, if the process was profiling (determining how much time is spent in individual routines), the profiling will be turned off in the overlaid process.
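- If the new program has its SUID bit set, the effective user ID (EUID) of the process is set to the owner of the program file; likewise, if its SGID bit is set, the effective group ID (EGID) is set to the group of the program file.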
The program to be executed can be a script. In this case, the script should have its execute bit set and start with the line #! interpreter [ arg(s) ], where interpreter is a valid executable (but not another script). If successful, the exec calls do not return, as the initial calling image is lost when overlaid with a new image.
Before we delve into these calls, we should take a quick look at what normally transpires when a valid command is issued at the system (shell) level, as this process will reflect the functionality available in a program. If the command issued is
linux$ cat file.txt > file2.txt
the shell parses the command line and divides it into valid tokens (e.g., cat, file.txt, etc.). The shell (via a call to fork) then generates a child process. After the fork, the shell closes standard output and opens the file file2.txt, mapping it to standard output in the child process. Next, by calling execve, the shell overlays the current program code with the program code for the command (in this case, the code for cat). When the command is finished, the shell redisplays its prompt. Figure 3.2 shows the process creation and command execution sequence.
Figure 3.2. Process creation and command execution at the shell level.
While the command is executing, the shell, by default, waits in the background. As we will see, there is a wait system call that allows the shell or any other process to wait. Should the user place an & at the end of the command (to indicate to the shell that the command be placed in background), the shell will not wait and will return immediately with its prompt. When the command is finished, it may perform a call to exit or return when in the function main. The integer value passed to these calls is made available to the parent process via an argument to the wait system call. When on the command line, the returned value is stored in the system variable named status. If in the Bourne or BASH shell you issue the command
linux$ echo $?
the system will display the value returned by the last command executed. As the mapping of standard output to the file file2.txt was done in the child process and not in the shell, the I/O redirection has no further impact on ensuing command sequences.
We should note that it is possible for a user at the command line to issue an exec call. The syntax would be
linux$ exec command [arguments]
However, most users would not do this. The current process (the shell) would be overlaid with the program code for the command. Once the command was finished, the user would be logged out, as the original shell process would no longer exist!
In a programming environment, the exec calls can be used to execute another program. The prototypes for the exec calls are listed in Table 3.1.
Table 3.1. The exec Call Prototypes.
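#include <unistd.h>
extern char **environ;
int execl (const char *path, const char *arg, ...);
int execv (const char *path, char *const argv[]);
int execle(const char *path, const char *arg, ..., char *const envp[]);
int execve(const char *path, char *const argv[], char *const envp[]);
int execlp(const char *file, const char *arg, ...);
int execvp(const char *file, char *const argv[]);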
The naming convention for these calls reflects their functionality. Each call starts with the letters exec. The next letter in the call name indicates if the call takes its arguments in a list format (i.e., literally specified as a series of arguments) or as a pointer to an array of arguments (analogous to the argv structure discussed earlier). The presence of the letter l indicates a list arrangement (a variable argument list; see the manual page on stdarg for details); v indicates the array or vector arrangement. The next letter of the call name (if present) is either an e or a p. The presence of an e indicates the programmers will construct (in the array/vector format) and pass their own environment variable list. The passed environment variable list will become the third argument to the function main (i.e., envp). As noted in the section on environment variables, envp is of limited practical value. When the programmer is responsible for the environment, the current environment variable list is not passed. The presence of a p indicates the current environment PATH variable should be used when searching for a file whose name does not contain a slash. [2] In the four calls where the PATH string is not used (execl, execv, execle, and execve), the path to the program to be executed must be fully specified.
[2] If the executable file is a script, the Bourne shell ( /bin/sh ) is invoked to execute the script. The shell is then passed the specified argument information.
The functionality of the exec calls is best summarized by Table 3.2.
Table 3.2. exec Call Functionality.
Library Call Name | Argument Format | Pass Current Set of Environment Variables? | Search of PATH Automatic? |
---|---|---|---|
execl | list | yes | no |
execv | array | yes | no |
execle | list | no | no |
execve | array | no | no |
execlp | list | yes | yes |
execvp | array | yes | yes |
Of the six variations, execlp and execvp calls are used most frequently (as automatic environment passing and path searching are usually desirable) and will be explained in detail.
3.3.1 execlp
The execlp library function (Table 3.3) is used when the number of arguments to be passed to the program to be executed is known in advance.
When using execlp, the initial argument, file, is a pointer to the name of the file that contains the program code to be executed. If this file reference contains a /, it is treated as a path to the file and the PATH variable is not searched. In this circumstance, the p specification (execlp) is superfluous, although the new program still inherits PATH as part of its environment and may use it for its own purposes. If no / is found, each of the directories specified in the PATH variable will be, in turn, prepended to the file name specified, and the first valid program reference found will be the one executed. It is good practice to fully specify the program to be executed in all situations to prevent a program with the same name, found in a prior PATH string directory, from being inadvertently executed. For the execlp call to be successful, the file referenced must be found and be marked as executable. If the call fails, it returns -1 and sets errno to indicate the error. As the overlaying of one process image with another is very complex, the possibilities for failure are numerous (as shown in Table 3.4).
Table 3.3. Summary of the execlp Library Function.
Include File(s) | <unistd.h> extern char **environ; | Manual Section | 3 |
Summary | int execlp(const char *file, const char *arg, ...); |||
Return | Success | Failure | Sets errno |
 | Does not return | -1 | Yes |
Table 3.4. exec Error Messages.
# | Constant | perror Message | Explanation |
---|---|---|---|
1 | EPERM | Operation not permitted | |
2 | ENOENT | No such file or directory | One or more parts of path to new process file does not exist (or is NULL). |
4 | EINTR | Interrupted system call | Signal was caught during the system call. |
5 | EIO | Input/output error | |
7 | E2BIG | Argument list too long | New process argument list plus exported shell variables exceed the system limits. |
8 | ENOEXEC | Exec format error | New process file is not in a recognized format. |
11 | EAGAIN | Resource temporarily unavailable | Total system memory while reading raw I/O is temporarily insufficient. |
12 | ENOMEM | Cannot allocate memory | New process memory requirements exceed system limits. |
13 | EACCES | Permission denied | |
14 | EFAULT | Bad address | path references an illegal address. |
20 | ENOTDIR | Not a directory | Part of the specified path is not a directory. |
21 | EISDIR | Is a directory | An ELF interpreter was a directory. |
22 | EINVAL | Invalid argument | An ELF executable had more than one interpreter. |
24 | EMFILE | Too many open files | Process has exceeded the maximum number of files open. |
26 | ETXTBSY | Text file busy | More than one process has the executable open for writing. |
36 | ENAMETOOLONG | File name too long | The path value exceeds system path/file name length. |
40 | ELOOP | Too many levels of symbolic links | The perror message says it all. |
67 | ENOLINK | Link has been severed | The path value references a remote system that is no longer active. |
72 | EMULTIHOP | Multihop attempted | The path value requires multiple hops to remote systems, but file system does not allow it. |
80 | ELIBBAD | Accessing a corrupted shared library | An ELF interpreter was not in a recognized format. |
The ellipses in the execlp function prototype can be thought of as argument 0 (arg0) through argument n (argn). These arguments are pointers to the null-terminated strings that would be normally passed by the system to the program if it were invoked on the command line. That is, argument 0, by convention, should be the name of the program that is executing. This is usually the same as the value in file, although the program referenced by file may include an absolute path, while the value in argument 0 most often would not. Argument 1 would be the first parameter to be passed to the program (which, using argv notation, would be argv[1]), argument 2 would be the second, and so on. The last argument to the execlp library call must be a NULL that is, for portability reasons, cast to a character pointer. Program 3.3, which invokes the cat utility program, demonstrates the use of the execlp library call.
Program 3.3 Using the execlp library call.
File : p3.3.cxx
/* Running the cat utility via an exec call */
#include <iostream>
#include <cstdio>
#include <unistd.h>
using namespace std;

int main(int argc, char *argv[]) {
  if (argc > 1) {
    execlp("/bin/cat", "cat", argv[1], (char *) NULL);
    perror("exec failure ");   // reached only if execlp fails
    return 1;
  }
  cerr << "Usage: " << *argv << " text_file" << endl;
  return 2;
}
When passed a text file name on the command line, this program displays the contents of the file to the screen. The program accomplishes this by overlaying its own process image with the program code for the cat utility program. The program passes the cat utility program the name (referenced by argv[1]) of the file to display. If the execlp call fails, the call to perror is made and the program exits and returns the value 1 to the system. If the call is successful, the perror and return statements are never reached, as they are replaced with the program code for the cat utility.
A sample run of the program is shown in Figure 3.3.
Figure 3.3 Output of Program 3.3.
linux$ p3.3 test.txt
This is a sample text
file for the program to
display!
3.3.2 execvp
If the number of arguments for the program to be executed is dynamic, then the execvp call can be used (Table 3.5). As with the execlp call, the initial argument to execvp is a pointer to the file that contains the program code to be executed. However, unlike execlp, there is only one additional argument that execvp requires. This second argument, defined as
char *const argv[ ]
specifies that a reference to an array of pointers to character strings should be passed. The format of this array parallels that of argv and, in many cases, is argv. If the reference is not the argv values for the current program, the programmer is responsible for constructing and initializing a new argv-like array. If this second approach is taken, the last element of the new argv array should contain a NULL address value. If execvp fails, it returns a value of -1 and sets the value in errno to indicate the source of the error (see Table 3.4).
Table 3.5. Summary of the execvp Library Function.
Include File(s) | <unistd.h> | Manual Section | 3 |
Summary | int execvp(const char *file, char *const argv[]); |||
Return | Success | Failure | Sets errno |
 | Does not return | -1 | Yes |
Program 3.4 makes use of the argv values for the current program.
Program 3.4 Using execvp with argv values.
File : p3.4.cxx
/* Using execvp to execute the contents of argv */
#include <iostream>
#include <cstdio>
#include <unistd.h>
using namespace std;

int main(int argc, char *argv[]) {
  if (argc > 1) {
    execvp(argv[1], &argv[1]);
    perror("exec failure");    // reached only if execvp fails
    return 1;
  }
  cerr << "Usage: " << *argv << " exe [arg(s)]" << endl;
  return 2;
}
The program will execute, via execvp, the program passed to it on the command line. The first argument to execvp, argv[1], is the reference to the program to execute.
The second argument, &argv[1], is the reference to the remainder of the command-line argv array. Notice that both of these references begin with the second element of argv (that is, argv[1]), as argv[0] is the name of the current program (e.g., p3.4). The output in Figure 3.4 shows that the program does work as expected.
Figure 3.4 Output of Program 3.4 when passed the cat command.
linux$ p3.4 cat test.txt
This is a sample text
file for a program to
display!
If we place additional information on the command line when running Program 3.4, we find the program will pass the information on, as demonstrated in Figure 3.5.
Figure 3.5 Output of Program 3.4 when passed the cat command with the -n option.
linux$ p3.4 cat -n test.txt
1 This is a sample text
2 file for a program to
3 display!
If command-line argv values of the current program are not used with execvp , then the programmer must construct a new argv to be passed. An example of how this can be done is shown in Program 3.5.
Program 3.5 Using execvp with a programmer-generated argument list.
File : p3.5.cxx
/* Generating our own argv type list for execvp */
#include <iostream>
#include <cstdio>
#include <unistd.h>
using namespace std;

int main() {
  // The list must end with a NULL pointer
  char *new_argv[] = {"cat", "test.txt", (char *) 0};
  execvp("/bin/cat", new_argv);
  perror("exec failure ");     // reached only if execvp fails
  return 1;
}
When compiled and run as p3.5, the output of this program will be the same as the output from the first run of Program 3.4.