Testing Concurrent Programs
Concurrent programs employ similar design principles and patterns to sequential programs. The difference is that concurrent programs have a degree of nondeterminism that sequential programs do not, increasing the number of potential interactions and failure modes that must be planned for and analyzed.
Similarly, testing concurrent programs uses and extends ideas from testing sequential ones. The same techniques for testing correctness and performance in sequential programs can be applied to concurrent programs, but with concurrent programs the space of things that can go wrong is much larger. The major challenge in constructing tests for concurrent programs is that potential failures may be rare probabalistic occurrences rather than deterministic ones; tests that disclose such failures must be more extensive and run for longer than typical sequential tests.
Most tests of concurrent classes fall into one or both of the classic categories of safety and liveness. In Chapter 1, we defined safety as "nothing bad ever happens" and liveness as "something good eventually happens".
Tests of safety, which verify that a class's behavior conforms to its specification, usually take the form of testing invariants. For example, in a linked list implementation that caches the size of the list every time it is modified, one safety test would be to compare the cached count against the actual number of elements in the list. In a single-threaded program this is easy, since the list contents do not change while you are testing its properties. But in a concurrent program, such a test is likely to be fraught with races unless you can observe the count field and count the elements in a single atomic operation. This can be done by locking the list for exclusive access, employing some sort of "atomic snapshot" feature provided by the implementation, or by using "test points" provided by the implementation that let tests assert invariants or execute test code atomically.
In this book, we've used timing diagrams to depict "unlucky" interactions that could cause failures in incorrectly constructed classes; test programs attempt to search enough of the state space that such bad luck eventually occurs. Unfortunately, test code can introduce timing or synchronization artifacts that can mask bugs that might otherwise manifest themselves.[1]
[1] Bugs that disappear when you add debugging or test code are playfully called Heisenbugs.
Liveness properties present their own testing challenges. Liveness tests include tests of progress and nonprogress, which are hard to quantifyhow do you verify that a method is blocking and not merely running slowly? Similarly, how do you test that an algorithm does not deadlock? How long should you wait before you declare it to have failed?
Related to liveness tests are performance tests. Performance can be measured in a number of ways, including:
Throughput: the rate at which a set of concurrent tasks is completed;
Responsiveness: the delay between a request for and completion of some action (also called latency); or
Scalability: the improvement in throughput (or lack thereof) as more resources (usually CPUs) are made available.
Testing for Correctness
|