Six Sigma and Beyond: Statistical Process Control, Volume IV

One of the fundamental bases of the statistical analysis of measurements is our ability to describe the data within the context of a model, or probability distribution. These models are used primarily to describe the shape and area of a given process distribution, so that probabilities may be associated with questions concerning the occurrence of particular values in the distribution. Common probability distributions for discrete random variables include the binomial and Poisson distributions. Probability distributions employed to describe continuous random variables include the normal, exponential, Weibull, gamma, and lognormal.
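As a minimal sketch of how probabilities are attached to values under the models just named, the following Python fragment uses the scipy.stats library; every parameter value below is an invented assumption chosen purely for illustration.

```python
# Minimal sketch: associating probabilities with values under assumed models.
# All parameter values are illustrative assumptions, not data from the text.
from scipy import stats

# Discrete: binomial -- P(at most 2 defectives in a sample of 20, p = 0.05)
p_binom = stats.binom.cdf(2, n=20, p=0.05)

# Discrete: Poisson -- P(exactly 3 defects) when the mean defect rate is 2.0
p_poisson = stats.poisson.pmf(3, mu=2.0)

# Continuous: normal -- P(X > upper spec limit of 10.5), mean 10.0, sd 0.2
p_norm_tail = stats.norm.sf(10.5, loc=10.0, scale=0.2)

# Continuous: Weibull -- P(failure before t = 100 hours), shape 1.5, scale 200
p_weibull = stats.weibull_min.cdf(100, c=1.5, scale=200)

print(f"binomial  P(X <= 2)  = {p_binom:.4f}")
print(f"Poisson   P(X = 3)   = {p_poisson:.4f}")
print(f"normal    P(X > USL) = {p_norm_tail:.4f}")
print(f"Weibull   P(T < 100) = {p_weibull:.4f}")
```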

What is not commonly understood, however, is that most techniques typically employed in statistical quality control and research are based on the assumption that the process(es) studied are approximated by a particular model. The selection of a specific formula or method of analysis may, in fact, be incorrect if this assumption is in error. When this erroneous assumption does occur, decisions based on the data may be wrong regardless of the quality of the calculations. One common example: capability indices such as Cp and Cpk presume a near-normal process, so the fraction nonconforming estimated from them can be seriously misstated when the process is heavily skewed.
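To make the stakes concrete, here is a minimal sketch, assuming invented lognormal data and an invented specification limit, of how an incorrect normality assumption distorts the estimated fraction beyond specification even though every calculation is performed correctly:

```python
# Fitting the wrong model gives a misleading answer even with correct
# arithmetic. The data and the spec limit are assumptions for this sketch.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
data = rng.lognormal(mean=0.0, sigma=0.5, size=10_000)  # truly lognormal
usl = 3.0  # assumed upper specification limit

# Wrong model: assume normality and estimate the tail from mean and sd.
tail_normal = stats.norm.sf(usl, loc=data.mean(), scale=data.std(ddof=1))

# Correct model: the true lognormal tail probability.
tail_lognormal = stats.lognorm.sf(usl, s=0.5, scale=np.exp(0.0))

print(f"estimated P(X > USL) assuming normal: {tail_normal:.4f}")
print(f"actual    P(X > USL) under lognormal: {tail_lognormal:.4f}")
```

Run as written, the normal approximation understates the true tail probability by roughly an order of magnitude, which is exactly the kind of wrong decision the paragraph above warns against.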

Given that the validity of the statistical analysis selected is largely dependent on the correct assumption of a specific process distribution, it is desirable, if not essential, to determine whether the assumption we have made regarding an underlying distribution is reasonable.

Many statisticians would state this hypothesis as follows:

H₀: It is reasonable to assume that the sample data were drawn from a (for instance) normal distribution.

However, many others (Shapiro 1980, for example) believe that this is a misleading statement. R. C. Geary (1947) once suggested that in the front of all textbooks on statistics, the following statement should appear: "Normality is a myth. There never was, and never will be, a normal distribution."

Therefore, Shapiro suggests, the hypothesis tested should actually be stated as follows:

H₀: It is reasonable to approximate our process or population data with a (for example) normal distribution model and its associated analytical techniques.

Given that this is the approach with the most validity, these tests are often run at relatively high levels of Type I error (α = .10 is frequently suggested), because in this case the consequences of committing a Type I error are relatively minor. Rejection of the null hypothesis will lead to one or more of the following actions (a sketch of this decision flow follows the list):

  1. Tests are run to find an alternative model and procedures that may be used to assess the data.

  2. The data are transformed so that the assumed model approximately describes the transformed values. An example of this is the Box procedure of comparing the logarithms of the variances, rather than the variances themselves, when the assumption of normality may not be accepted.

  3. Nonparametric, or supposedly "distribution-free," statistical analyses may be used in place of equivalent parametric methods: for example, the Mann-Whitney U test rather than a t test, or the Kruskal-Wallis test as a replacement for the one-way ANOVA.
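The decision flow above can be sketched as follows, taking the Shapiro-Wilk procedure as the goodness-of-fit check (Shapiro is cited in the text, but the specific test choice here, and the simulated data, are this example's assumptions):

```python
# Minimal sketch of the decision flow described above: test the normality
# hypothesis at a deliberately high Type I error rate, then fall back to a
# nonparametric test (action 3) if the normal model is rejected.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.lognormal(mean=0.0, sigma=0.6, size=30)  # skewed, non-normal
group_b = rng.lognormal(mean=0.3, sigma=0.6, size=30)

ALPHA = 0.10  # relatively high Type I error rate, as suggested above

# Test H0: a normal model is a reasonable approximation for each group.
normal_ok = all(stats.shapiro(g).pvalue > ALPHA for g in (group_a, group_b))

if normal_ok:
    # Normal approximation accepted: use the parametric t test.
    result = stats.ttest_ind(group_a, group_b)
else:
    # Action 3: fall back to the nonparametric Mann-Whitney U test.
    result = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(type(result).__name__, f"p = {result.pvalue:.4f}")
```

Action 2 could be handled within the same structure: test np.log of each sample for normality and, if the transformed values pass, apply the parametric test to the logarithms rather than the raw data.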
