The Exponential Model

The exponential model is another special case of the Weibull family, with the shape parameter m equal to 1. It is best used for statistical processes that decline monotonically to an asymptote. Its cumulative distribution function (CDF) and probability density function (PDF) are

CDF:  F(t) = 1 - e^(-t/c) = 1 - e^(-λt)

PDF:  f(t) = (1/c) e^(-t/c) = λ e^(-λt)

where c is the scale parameter, t is time, and λ = 1/c. Applied to software reliability, λ is referred to as the error detection rate or instantaneous failure rate. In statistical terms it is also called the hazard rate.

Again, the preceding formulas represent a standard distribution: the total area under the PDF curve is 1. In actual application, the formulas need to be multiplied by K, the total number of defects or the total cumulative defect rate. K and lambda (λ) are the two parameters to be estimated when deriving a specific model from a data set.
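As a concrete illustration of this two-parameter form, the following minimal sketch (written in Python rather than the SAS used elsewhere in this book) defines the cumulative and density functions scaled by K; the parameter values are arbitrary placeholders, not estimates from any real product.

```python
import numpy as np

def exp_cdf(t, K, lam):
    """Expected cumulative defects (or cumulative defect rate) by time t: K * (1 - e^(-lam*t))."""
    return K * (1.0 - np.exp(-lam * t))

def exp_pdf(t, K, lam):
    """Expected defect arrivals per unit time at time t: K * lam * e^(-lam*t)."""
    return K * lam * np.exp(-lam * t)

# Arbitrary illustrative values: K = 120 total defects,
# lambda = 0.15 per week, over 20 weeks of testing.
weeks = np.arange(1, 21)
print(exp_cdf(weeks, 120, 0.15))
print(exp_pdf(weeks, 120, 0.15))
```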

The exponential distribution is the simplest and most important distribution in reliability and survival studies. The failure data of much equipment and many processes are well described by the exponential distribution: bank statement and ledger errors, payroll check errors, light bulb failure, automatic calculating machine failure, radar set component failure, and so forth. The exponential distribution plays a role in reliability studies analogous to that of the normal distribution in other areas of statistics.

In software reliability the exponential distribution is one of the better known models and is often the basis of many other software reliability growth models. For instance, Misra (1983) used the exponential model to estimate the defect-arrival rates for the shuttle's ground system software of the National Aeronautics and Space Administration (NASA). The software provided the flight controllers at the Johnson Space Center with processing support to exercise command and control over flight operations. Data from an actual 200-hour flight mission indicate that the model worked very well. Furthermore, the mean value function (CDF) of the Goel-Okumoto (1979) nonhomogeneous Poisson process model (NPPM) is in fact the exponential model.

Figures 8.1 and 8.2 show the exponential model applied to the data of one of the AS/400 software products. We have modeled the weekly defect arrival data since the start of system test, when the development work was virtually complete. The system-testing stage uses customer interfaces, tests external requirements, and simulates end-user application environments. The pattern of defect arrivals during this stage, therefore, should be indicative of the latent defect rate when the system is shipped.

Figure 8.1. Exponential Model: Density Distribution

Figure 8.2. Exponential Model: Cumulative Distribution

Like the Rayleigh model, the exponential model is simple and quick to implement when powerful statistical software is available. For example, it can be implemented via SAS programs similar to the one shown in Figure 7.5 of the previous chapter. Of course, if a high degree of usability and support for various usage scenarios are desired, more elaborate software is needed.
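The book's examples use SAS; as a rough Python equivalent, the sketch below fits the two-parameter exponential model to weekly defect-arrival counts by nonlinear least squares. The defect data and starting values are illustrative assumptions, not data from the AS/400 product.

```python
import numpy as np
from scipy.optimize import curve_fit

def cumulative_model(t, K, lam):
    """Expected cumulative defects by week t: K * (1 - exp(-lam * t))."""
    return K * (1.0 - np.exp(-lam * t))

# Hypothetical weekly defect arrivals from the start of system test.
weekly_defects = np.array([32, 27, 24, 19, 17, 14, 11, 9, 8, 6, 5, 4])
weeks = np.arange(1, len(weekly_defects) + 1)
cumulative = np.cumsum(weekly_defects)

# Fit K and lambda; initial guesses are rough (total observed so far, 0.1 per week).
(K_hat, lam_hat), _ = curve_fit(
    cumulative_model, weeks, cumulative, p0=[cumulative[-1], 0.1]
)

print(f"Estimated total defects K = {K_hat:.1f}")
print(f"Estimated defect detection rate lambda = {lam_hat:.3f} per week")
print(f"Estimated latent defects at end of test = {K_hat - cumulative[-1]:.1f}")
```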

Besides programming, the following should be taken into consideration when applying the exponential distribution for reliability projection or estimating the number of software defects. First, as with all types of modeling and estimation, the more accurate and precise the input data, the better the outcome. Data tracking for software reliability estimation is done either in terms of precise CPU execution time or on a calendar-time basis. Normally execution-time tracking is for small projects or special reliability studies; calendar-time tracking is common for commercial development. When calendar-time data are used, a basic assumption for the exponential model is that the testing effort is homogeneous throughout the testing phase. Ohba (1984) notes that the model does not work well for calendar-time data with a nonhomogeneous time distribution of testing effort. Therefore, this assumption must be examined when using the model. For instance, in the example shown in Figures 8.1 and 8.2 the testing effort remained consistently high and homogeneous throughout the system test phase; a separate team of testers worked intensively based on a predetermined test plan. The product was also large (>100 KLOC) and therefore the trend of the defect arrival rates tended to be stable even though no execution-time data were available.

To verify the assumption, indicators of the testing effort, such as the person-hours in testing for each time unit (e.g., day or week), the number of test cases run, or the number of variations executed, are needed. If the testing effort is clearly not homogeneous, some sort of normalization has to be made. Otherwise, models other than the exponential distribution should be considered.
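One simple way to examine the assumption, offered here only as an illustration, is to look at how much the weekly testing effort varies, for example via its coefficient of variation; the data and the cut-off value in this sketch are arbitrary assumptions.

```python
import numpy as np

# Hypothetical person-hours of testing per week during system test.
weekly_effort = np.array([380, 410, 395, 400, 210, 405, 390, 415])

cv = weekly_effort.std(ddof=1) / weekly_effort.mean()
print(f"Coefficient of variation of weekly testing effort: {cv:.2f}")

# Arbitrary rule of thumb for this sketch: treat effort as roughly
# homogeneous if week-to-week variation is small; otherwise normalize
# the defect data or consider a different model.
if cv > 0.15:
    print("Effort looks nonhomogeneous; normalization is advisable.")
else:
    print("Effort looks roughly homogeneous.")
```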

As an example of normalization, let us assume the unit of calendar time is a week and it is clear that the weekly testing effort is not homogeneous. Further assume that weekly data on the number of person-hours in testing are known. Simple adjustments such as the following can reduce artificial fluctuations in the data and make the model work better (a code sketch of these steps follows the list):

  1. Accumulate the total person-hours in testing for the entire testing phase and calculate the average number of person-hours in testing per week, n.
  2. Starting from the beginning of testing, calculate the defect rates (or defect count) for each n person-hour units. Allocate the defect rates to the calendar week in sequence. Specifically, allocate the defect rate observed for the first n person-hours of testing to the first week; allocate the defect rate observed for the second n person-hours of testing to the second week, and so forth.
  3. Use the allocated data as weekly input data for the model.
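A minimal sketch of this normalization, assuming that weekly defect counts and weekly person-hours of testing are both available and that defects accrue evenly over each week's effort; the data are hypothetical.

```python
import numpy as np

# Hypothetical weekly observations for the test phase.
weekly_defects = np.array([30.0, 25.0, 40.0, 18.0, 22.0, 12.0, 15.0, 8.0])
weekly_effort = np.array([300.0, 250.0, 480.0, 200.0, 260.0, 150.0, 190.0, 120.0])

# Step 1: average person-hours of testing per week, n.
n = weekly_effort.sum() / len(weekly_effort)

# Cumulative effort and cumulative defects at the end of each week.
cum_effort = np.concatenate([[0.0], np.cumsum(weekly_effort)])
cum_defects = np.concatenate([[0.0], np.cumsum(weekly_defects)])

# Step 2: defects observed in each successive block of n person-hours,
# obtained by linear interpolation on the cumulative curve, allocated
# to calendar weeks in sequence.
block_edges = n * np.arange(len(weekly_effort) + 1)
cum_at_edges = np.interp(block_edges, cum_effort, cum_defects)
normalized_weekly_defects = np.diff(cum_at_edges)

# Step 3: these values serve as the weekly input data for the model.
print(np.round(normalized_weekly_defects, 1))
```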

Second, the more data points available, the better the model will perform, assuming there is an adequate fit between the model and the data. The question is: When the test is in progress, how much data is needed for the model to yield reasonably adequate output? Ehrlich and associates (1990) investigated this question using data from AT&T software that was a transmission measurement system for remote testing of special service circuits. They assessed the predictive validity of the exponential model with data at 25%, 50%, 60%, 70%, and 80% into test, and at test completion. They found that at 25% into test the model results were way off. At 50% the results improved considerably but were still not satisfactory. At 60% into test, the exponential model had satisfactory predictive validity. Although it is not clear whether these findings can be generalized, they provide a good reference point for real-time modeling.
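As a rough sketch of this kind of predictive-validity check (the defect data and the procedure here are assumptions for illustration, not the AT&T data or Ehrlich's method), one can refit the model at successive points into the test and compare the projected total defect content K with the final observed total:

```python
import numpy as np
from scipy.optimize import curve_fit

def cumulative_model(t, K, lam):
    """Expected cumulative defects by week t: K * (1 - exp(-lam * t))."""
    return K * (1.0 - np.exp(-lam * t))

# Hypothetical cumulative weekly defect counts for a completed test phase.
cumulative = np.array([30, 55, 76, 93, 107, 118, 127, 134, 140, 145,
                       149, 152, 155, 157, 159, 160, 161, 162, 163, 164])
weeks = np.arange(1, len(cumulative) + 1)

for fraction in (0.25, 0.50, 0.60, 0.70, 0.80, 1.00):
    cut = max(3, int(round(fraction * len(weeks))))  # need a few points to fit
    (K_hat, lam_hat), _ = curve_fit(
        cumulative_model, weeks[:cut], cumulative[:cut],
        p0=[cumulative[cut - 1], 0.1]
    )
    print(f"{int(fraction * 100):3d}% into test: projected total K = {K_hat:6.1f} "
          f"(final observed cumulative = {cumulative[-1]})")
```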
