Scatter Diagram

Compared to other tools, the scatter diagram is more difficult to apply. It usually relates to investigative work and requires precise data. It is often used with other techniques such as correlational analysis, regression, and statistical modeling.

Figure 5.9 is a scatter diagram that illustrates the relationship between McCabe's complexity index and defect level. Each data point represents a program module with the X coordinate being its complexity index and the Y coordinate its defect level. Because program complexity can be measured as soon as the program is complete, whereas defects are discovered over a long time, the positive correlation between the two allows us to use program complexity to predict defect level. Furthermore, we can reduce the program complexity when it is developed (as measured by McCabe's index), thereby reducing the chance for defects. Reducing complexity can also make programs easier to maintain. Some component teams of the AS/400 operating system adopt this approach as their strategy for quality and maintainability improvement. Program modules with high-complexity indexes are the targets for analysis and possible module breakup, encapsulation, intramodule cleanup, and other actions. Of course, low-complexity indexes coupled with high defects are clear indications of modules that are poorly designed or implemented and should also be scrutinized.

Figure 5.9. Scatter Diagram of Program Complexity and Defect Level

Other examples of the scatter diagram include the relationships among defects, fan-in and fan-out, quality index of the same components between the current and previous releases, the relationship between testing defect rates and field defect rates, and so forth. We have gained insights in software quality engineering through the investigations of such relationships.

In software development, reuse is perhaps the most significant factor in improving productivity. The quality of the new software, however, is often constrained by the latent defects or design limitations in the legacy code. For the AS/400 software system, some products were developed by reusing components of products on the IBM System/38 platform. To examine the relationship of the defect rate of the reused components between the two platforms, we used the scatter diagrams. Figure 5.10 is a scatter diagram for one product. In the figure, each data point represents a component, with the X coordinate indicating its defect rate in the System/38 platform and the Y coordinate indicating its defect rate in the AS/400 platform. Although there are changes and modifications to the AS/400 product and additional reviews and tests were conducted , clearly the correlation (0.69) is quite strong. Also shown are both the linear regression line (the diagonal line) and the 95% confidence interval (area between the two broken lines).

Figure 5.10. Correlation of Defect Rates of Reused Components Between Two Platforms

We then proceeded to classify the scattergram into four quadrants according to the medians of the component defect rates on the AS/400 and System/38 platforms (Figure 5.11). Such classification allows different analysis and improvement strategies to be applied to different groups of components.

Figure 5.11. Grouping of Reused Components Based on Defect Rate Relationship

Категории