Computing Variance, Standard Deviation, and Other Statistical Functions
Problem
You want to compute one or more of the common statistics such as variance, standard deviation, skew, and kurtosis of a sequence of numbers.
Solution
You can use the accumulate function from the header to compute many meaningful statistical functions beyond simply the sum by passing custom function objects. Example 11-9 shows how to compute several important statistical functions, using accumulate.
Example 11-9. Statistical functions
#include #include #include #include #include #include using namespace std; template T nthPower(T x) { T ret = x; for (int i=1; i < N; ++i) { ret *= x; } return ret; } template struct SumDiffNthPower { SumDiffNthPower(T x) : mean_(x) { }; T operator( )(T sum, T current) { return sum + nthPower(current - mean_); } T mean_; }; template T nthMoment(Iter_T first, Iter_T last, T mean) { size_t cnt = distance(first, last); return accumulate(first, last, T( ), SumDiffNthPower(mean)) / cnt; } template T computeVariance(Iter_T first, Iter_T last, T mean) { return nthMoment(first, last, mean); } template T computeStdDev(Iter_T first, Iter_T last, T mean) { return sqrt(computeVariance(first, last, mean)); } template T computeSkew(Iter_T begin, Iter_T end, T mean) { T m3 = nthMoment(begin, end, mean); T m2 = nthMoment(begin, end, mean); return m3 / (m2 * sqrt(m2)); } template T computeKurtosisExcess(Iter_T begin, Iter_T end, T mean) { T m4 = nthMoment(begin, end, mean); T m2 = nthMoment(begin, end, mean); return m4 / (m2 * m2) - 3; } template void computeStats(Iter_T first, Iter_T last, T& sum, T& mean, T& var, T& std_dev, T& skew, T& kurt) { size_t cnt = distance(first, last); sum = accumulate(first, last, T( )); mean = sum / cnt; var = computeVariance(first, last, mean); std_dev = sqrt(var); skew = computeSkew(first, last, mean); kurt = computeKurtosisExcess(first, last, mean); } int main( ) { vector v; v.push_back(2); v.push_back(4); v.push_back(8); v.push_back(10); v.push_back(99); v.push_back(1); double sum, mean, var, dev, skew, kurt; computeStats(v.begin( ), v.end( ), sum, mean, var, dev, skew, kurt); cout << "count = " << v.size( ) << " "; cout << "sum = " << sum << " "; cout << "mean = " << mean << " "; cout << "variance = " << var << " "; cout << "standard deviation = " << dev << " "; cout << "skew = " << skew << " "; cout << "kurtosis excess = " << kurt << " "; cout << endl; }
The program in Example 11-9 produces the following output:
count = 6 sum = 124 mean = 20.6667 variance = 1237.22 standard deviation = 35.1742 skew = 1.75664 kurtosis excess = 1.14171
Discussion
Some of the most important statistical functions (e.g., variance, standard deviation, skew, and kurtosis) are defined in terms of standardized sample moments about the mean. The precise definitions of statistical functions vary somewhat from text to text. This text uses the unbiased definitions of the statistical functions shown in Table 11-1.
Statistical function |
Equation |
---|---|
nth central moment ( |
|
Variance |
|
Standard deviation |
|
Skew |
|
Kurtosis excess |
|
|
The simplest way to code the statistical functions is to define them all in terms of moments. Since there are several different moments used, each one accepting a constant integer value, I pass the constant as a template parameter. This allows the compiler to potentially generate more efficient code because the integer is known at compile time.
The moment function is defined using the mathematical summation operator . Whenever you think of the summation operation you should think of the accumulate function from the header. The accumulate function has two forms: one accumulates using operator+ and the other uses an accumulator functor that you need to provide. Your accumulator functor will accept two values: the accumulated value so far, and the value at a specific position in the sequence.
Example 11-10 illustrates how accumulate works by showing how the user supplied functor is called repeatedly for each element in a series.
Example 11-10. Sample implementation of accumulate
template Iter_T accumulate(Iter_T begin, Iter_T end, Value_T value, BinOp_T op) { while (begin != end) { value = op(value, *begin++) } return value; }