Debugging Multithreaded Programs
Writing multithreaded programs that execute correctly can be quite a challenge. Fortunately, there are tools available to help with the task. Many current C and C++ compilers are bundled with thread-aware debuggers. For example, newer versions of the GNU C and C++ compilers, gcc and g++, come with gdb, and Solaris' C and C++ compiler comes with dbx. Thread-aware debuggers automatically recognize multithreaded code. Such debuggers can be used to step through multithreaded programs and examine the contents of mutexes and thread-specific data (TSD).
We will use Program 11.11 as the source for our thread-debugging example. The debugger presented will be GNU's gdb (version 5.1.90CVS-5).[24] As presented, this program is syntactically correct but contains logic errors pertaining to the access and manipulation of common data by the multiple detached threads.
[24] Only the command-line version of the debugger will be addressed. GNU also provides a graphical interface for its debugger called xxgdb for those who are working in a windowing environment.
Program 11.11 Debugging multithreaded programs.
File : p11.11.cxx
/*
   Debugging multithreaded programs - WITH LOCKING ERRORS
   Compile: g++ p11.11.cxx -lpthread -o p11.11
*/
#define _REENTRANT
#define _GNU_SOURCE
#include <iostream>      // header names are missing from this copy of the listing;
#include <cstdio>        // this set is an assumption that covers the calls used below
#include <cstdlib>
#include <pthread.h>
#include <signal.h>
#include <sys/time.h>
#include <unistd.h>
using namespace std;
const int MAX=5,
          HOME=25;
int
my_rand(int start, int range){
  struct timeval t;
  gettimeofday(&t, (struct timezone *)NULL);
  return (int)(start+((float)range * rand_r((unsigned *)&t.tv_usec)) / (RAND_MAX+1.0));
}
typedef struct {
  int  increment;
  char *phrase;
} argument;
void step( void * );
                                       // common to all threads
pthread_t thread_id[MAX];
bool      alive = true, home = false;
int       position,total=0;
char      walk[] = "            ";     // blank boardwalk; width assumed from the indices used
int
main(int argc, char *argv[]) {
  argument right={ +1, "ZOINK! Stepped off the RIGHT side.\n"},
           left ={ -1, "SPLAT! Stepped off the LEFT side.\n"};
  pthread_attr_t attr_obj;
  if (argc < 2) {                      /* check arg list */
    cerr << *argv << " start_position" << endl;
    return 1;
  }
  position = atoi(argv[1]);
  if ( position < 1 )
    position = 1;
  else if ( position > MAX )
    position = MAX;
  walk[position+5] = '*';
  setvbuf(stdout, (char *) NULL, _IONBF, 0);
  cout << "The drunken sailor walk" << endl << endl;
  cout << "     +12345+" << endl;
  cout << walk << endl;
  pthread_attr_init( &attr_obj );
  pthread_attr_setdetachstate( &attr_obj, PTHREAD_CREATE_DETACHED );
  pthread_create(&thread_id[0], &attr_obj,
                 (void *(*) (void *)) step, (void *) &right);
  pthread_create(&thread_id[1], &attr_obj,
                 (void *(*) (void *)) step, (void *) &left );
  pthread_exit(NULL);
  return 0;
}
void
step( void *a ) {
  argument *my_arg=(argument *)a;
  do {
    sleep( my_rand(1,3) );                   // pause a bit
    walk[position+MAX] = ' ';                // clear old position
    position += my_arg->increment;           // calculate new position
    alive = bool(position > 0 && position <= MAX);
    walk[position+MAX] = alive ? '*' : '$';
    cout << walk << endl;
    home = bool(++total >= HOME);
    if ( !alive || home ) {
      if ( !alive )
        cout << my_arg->phrase;
      else
        cout << "The sailor made it home safely this time!\n";
      pthread_kill(thread_id[ (position < 1 ? 1 : 0)], 9);
    }
    sched_yield( );
  } while ( alive && !home );
}
Program 11.11 contains an assortment of POSIX thread calls. The program, which is purely pedagogical in nature, implements a version of the "drunken sailor" problem. In this version, a drunken sailor is given a starting position on a boardwalk that is five steps wide. The program traces the path of the sailor as he or she progresses down the boardwalk toward home (located an arbitrary number of steps from the start). If the sailor steps off either side of the boardwalk, he or she perishes. If the sailor is still on the boardwalk after a set number of steps, he or she is considered to have made it home. The sailor's position on the boardwalk is stored in a variable called position. Two threads manipulate this data. One thread executes a user-defined function, step, moving the sailor to the right, while a second thread executes the same function, moving the sailor to the left (the direction is determined by the argument passed to step). Both threads are detached from the initiating thread. When the sailor perishes or reaches the end of the walk, the detached threads are terminated. Typical output from Program 11.11 is shown in Figure 11.19.
Figure 11.19. Several runs of Program 11.11.
In the first run the program appears to work as expected. However, the second and third runs produce somewhat unexpected results. In the second run it looks as if there might be two sailors on the boardwalk (I suppose one could be seeing double, but this is not the case). In the third run the right side of the boardwalk seems to have disappeared. Clearly, something funny is going on! The problem is tied to the unrestricted access of common data by competing threads. One way to check on what is happening is to run the program in the debugger.
To prepare the program for the debugger, pass the -g option at compile time so that the compiler records debugging information (such as symbol names and source line numbers) in the executable. For example, the command sequence
linux$ g++ -g p11.11.cxx -lpthread -o p11.11
produces an executable, p11.11, that can be loaded and run in the debugger. When the debugger is invoked, it is passed the name of the executable. For our example this would be
linux$ gdb p11.11
GNU gdb Red Hat Linux (5.1.90CVS-5)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
(gdb)
Suppose we want the debugger to stop in the user-defined function step. We can use the list command in gdb to show us a given sequence of lines (with their line numbers). For example,
(gdb) list 61,81
61      void
62      step( void *a ) {
63        argument *my_arg=(argument *)a;
64        do {
65          sleep( my_rand(1,3) );                   // pause a bit
66          walk[position+MAX] = ' ';                // clear old position
67          position += my_arg->increment;           // calculate new position
68          alive = bool(position > 0 && position <= MAX);
69          walk[position+MAX] = alive ? '*' : '$';
70          cout << walk << endl;
71          home = bool(++total >= HOME);
72          if ( !alive || home ) {
73            if ( !alive )
74              cout << my_arg->phrase;
75            else
76              cout << "The sailor made it home safely this time!\n";
77            pthread_kill(thread_id[ (position < 1 ? 1 : 0)], 9);
78          }
79          sched_yield( );
80        } while ( alive && !home );
81      }
Alternatively, we can pass the list command the name of the user-defined function we would like to see listed (such as step). If we do this, the debugger will show the first N (usually 10) lines of the referenced function. The listing usually begins a few lines before the function itself.
(gdb) list step
57                       (void *(*) (void *)) step, (void *) &left );
58        pthread_exit(NULL);
59        return 0;
60      }
61      void
62      step( void *a ) {
63        argument *my_arg=(argument *)a;
64        do {
65          sleep( my_rand(1,3) );                   // pause a bit
66          walk[position+MAX] = ' ';                // clear old position
To stop at line 66, we establish a breakpoint.
(gdb) break 66
Breakpoint 1 at 0x8048b61: file p11.11.cxx, line 66.
To execute (run) the program, the run command is used. Any values that would normally be passed on the command line are placed after the run command.
(gdb) run 5
Starting program: /home/faculty/gray/revision/11/sailor/p11.11 5
[New Thread 1024 (LWP 3176)]
The drunken sailor walk

     +12345+                               <-- 1
          *
[New Thread 2049 (LWP 3193)]
[New Thread 1026 (LWP 3194)]
[New Thread 2051 (LWP 3195)]
[Switching to Thread 1026 (LWP 3194)]

Breakpoint 1, step (a=0xbffffb48) at p11.11.cxx:66
66          walk[position+MAX] = ' ';                // clear old position
(1) General program output.
When the debugger stops at the indicated line, the command info threads (which gdb also accepts in the abbreviated form info thread, as used here) can be issued to obtain a wealth of thread information.
(gdb) info thread
  4 Thread 2051 (LWP 3195)  0x420b4b31 in nanosleep () from /lib/i686/libc.so.6
* 3 Thread 1026 (LWP 3194)  step (a=0xbffffb48) at p11.11.cxx:66
  2 Thread 2049 (LWP 3193)  0x420e0037 in poll () from /lib/i686/libc.so.6
  1 Thread 1024 (LWP 3176)  0x420292e5 in sigsuspend () from /lib/i686/libc.so.6
The astute reader will notice a number of things. Thread 1 (the initiating thread) was directed to exit (line 58, pthread_exit(NULL); ) but at this juncture still appears to be active. At present, there are four threads associated with the program. The current active thread, identified with an asterisk, is thread ID 3, which is associated with LWP 3194.
The command display variable_name, where variable_name is the name of the variable of interest, directs the debugger to display the current contents of the variable each time the program stops (for example, at a breakpoint). In the sequence below we have directed the debugger to display the contents of the global variables alive, position, and home before we issue run.
. . .
Starting program: /home/faculty/gray/revision/11/sailor/p11.11 5
[New Thread 1024 (LWP 3274)]
The drunken sailor walk

     +12345+
          *
[New Thread 2049 (LWP 3291)]
[New Thread 1026 (LWP 3292)]
[New Thread 2051 (LWP 3293)]
[Switching to Thread 1026 (LWP 3292)]

Breakpoint 1, step (a=0xbffffb48) at p11.11.cxx:66
66          walk[position+MAX] = ' ';                // clear old position
3: home = false
2: position = 5                            <-- 1
1: alive = true
(gdb) cont
Continuing.
           $
ZOINK! Stepped off the RIGHT side.
[Switching to Thread 2051 (LWP 3293)]

Breakpoint 1, step (a=0xbffffb40) at p11.11.cxx:66
66          walk[position+MAX] = ' ';                // clear old position
3: home = false
2: position = 6                            <-- 2
1: alive = false
(gdb) cont
Continuing.
          *

Breakpoint 1, step (a=0xbffffb40) at p11.11.cxx:66
66          walk[position+MAX] = ' ';                // clear old position
3: home = false
2: position = 5
1: alive = true                            <-- 3
.
.
.
(1) At this point the sailor, at position 5, has not reached home and is still alive.
(2) Now the sailor is at position 6. He or she has not reached home and is no longer alive. The program should stop here, but it does not.
(3) Suddenly, the sailor is back at position 5. While still not having reached home, the sailor is now alive again! Clearly, the thread that decrements position performed its update before it tested whether the sailor was still alive.
A specific thread can be selected with the command thread N, where N is the number gdb has assigned to the thread (the number shown at the left of the info thread listing). As shown below, once a thread is selected, information specific to that thread can be examined.
(gdb) thread 4
[Switching to thread 4 (Thread 2051 (LWP 3342))]#0  step (a=0xbffffb40) at p11.11.cxx:66
66          walk[position+MAX] = ' ';                // clear old position
(gdb) print *my_arg
= {increment = -1, phrase = 0x8048da0 "SPLAT! Stepped off the LEFT side.\n"}     <-- 1
(gdb) thread 3
[Switching to thread 3 (Thread 1026 (LWP 3341))]#0  step (a=0xbffffb48) at p11.11.cxx:66
66          walk[position+MAX] = ' ';                // clear old position
(gdb) print *my_arg
= {increment = 1, phrase = 0x8048d60 "ZOINK! Stepped off the RIGHT side.\n"}
(1) This is the thread that does the decrement.
Anytime the debugger is stopped, the contents of a mutex can be displayed (assuming it is within the current scope). For example, if we had a mutex called my_lock, its contents before it is acquired would be
(gdb) print my_lock
= {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {
    __status = 0, __spinlock = 0}}
and after it is acquired
(gdb) print my_lock
= {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {
    __status = 1, __spinlock = 0}}          <-- 1
(1) This member (__status) is set to 1 when the mutex is locked.
The quit command is used to leave the debugger. An abbreviated listing of gdb commands can be displayed in gdb using the command help. The manual pages on gdb contain a more detailed explanation of how to invoke gdb. On the command line, info gdb provides a wealth of information on how to use gdb (including a fairly detailed sample session).