Lines of Code

The lines of code (LOC) count is usually for executable statements. It is actually a count of instruction statements. The interchangeable use of the two terms apparently originated from Assembler program in which a line of code and an instruction statement are the same thing. Because the LOC count represents the program size and complexity, it is not a surprise that the more lines of code there are in a program, the more defects are expected. More intriguingly, researchers found that defect density (defects per KLOC) is also significantly related to LOC count. Early studies pointed to a negative relationship: the larger the module size, the smaller the defect rate. For instance, Basili and Perricone (1984) examined FORTRAN modules with fewer than 200 lines of code for the most part and found higher defect density in the smaller modules. Shen and colleagues (1985) studied software written in Pascal, PL/S, and Assembly language and found an inverse relationship existed up to about 500 lines. Since larger modules are generally more complex, a lower defect rate is somewhat counterintuitive. Interpretation of this finding rests on the explanation of interface errors: Interface errors are more or less constant regardless of module size, and smaller modules are subject to higher error density because of smaller denominators.

More recent studies point to a curvilinear relationship between lines of code and defect rate: Defect density decreases with size and then curves up again at the tail when the modules become very large. For instance, Withrow (1990) studied modules written in Ada for a large project at Unisys and confirmed the concave relationship between defect density (during formal test and integration phases) and module size (Table 11.1). Specifically, of 362 modules with a wide range in size (from fewer than 63 lines to more than 1,000), Withrow found the lowest defect density in the category of about 250 lines. Explanation of the rising tail is readily available. When module size becomes very large, the complexity increases to a level beyond a programmer's immediate span of control and total comprehension . This new finding is also consistent with previous studies that did not address the defect density of very large modules.

Experience from the AS/400 development also lends support to the curvilinear model. In the example in Figure 11.1, although the concave pattern is not as significant as that in Withrow's study, the rising tail is still evident.

Figure 11.1. Curvilinear Relationship Between Defect Rate and Module Size ”AS/400 data

The curvilinear model between size and defect density sheds new light on software quality engineering. It implies that there may be an optimal program size that can lead to the lowest defect rate. Such an optimum may depend on language, project, product, and environment; apparently many more empirical investigations are needed. Nonetheless, when an empirical optimum is derived by reasonable methods (e.g., based on the previous release of the same product, or based on a similar product by the same development group ), it can be used as a guideline for new module development.

Table 11.1. Curvilinear Relationship Between Defect Rate and Module Size ”Withrow (1990)

Maximum Source Lines of Modules

Average Defect per 1,000 Source Lines

63

1.5

100

1.4

158

0.9

251

0.5

398

1.1

630

1.9

1000

1.3

>1000

1.4

Категории