Base SAS 9.1 Procedures Guide, Volumes 1, 2, 3 and 4

Simple Statistics

The base SAS procedures use a standardized set of keywords to refer to statistics. You specify these keywords in SAS statements to request the statistics to be displayed or stored in an output data set.

In the following notation, summation is over observations that contain nonmissing values of the analyzed variable and, except where shown, over nonmissing weights and frequencies of one or more:

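$x_i$ is the nonmissing value of the analysis variable for observation $i$.

$f_i$ is the frequency associated with $x_i$ when you use a FREQ statement. If you omit the FREQ statement, then $f_i = 1$ for all $i$.

$w_i$ is the weight associated with $x_i$ when you use a WEIGHT statement. If you omit the WEIGHT statement, then $w_i = 1$ for all $i$.

$n$ is the number of nonmissing values of $x_i$, $n = \sum f_i$.

$\bar{x}$ is the mean, $\bar{x} = \sum w_i x_i / \sum w_i$.

$s^2$ is the variance, $s^2 = \frac{1}{d} \sum w_i (x_i - \bar{x})^2$, where $d$ is the variance divisor that you specify with the VARDEF= option: $d = n$ for VARDEF=N, $d = n - 1$ for VARDEF=DF (the default), $d = \sum w_i$ for VARDEF=WEIGHT (or WGT), and $d = \sum w_i - 1$ for VARDEF=WDF.

$z_i$ is the standardized variable, $z_i = (x_i - \bar{x}) / s$.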
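To make the role of the variance divisor concrete, here is a minimal Python sketch (not SAS code) that computes $n$, $\bar{x} = \sum w_i x_i / \sum w_i$, and $s^2 = \frac{1}{d}\sum w_i (x_i - \bar{x})^2$ for each VARDEF= value, plus the standardized values $z_i$. The function names are hypothetical, and treating a FREQ value $f_i$ as $f_i$ copies of observation $i$ is an assumption chosen to match $n = \sum f_i$.

```python
import math


def mean_and_variance(x, w=None, f=None, vardef="DF"):
    """Hypothetical helper: return (n, mean, variance) for one VARDEF= value.

    Assumption: a frequency f_i counts as f_i copies of observation i,
    so every weighted sum also runs over the expanded observations.
    """
    w = [1.0] * len(x) if w is None else list(w)
    f = [1] * len(x) if f is None else list(f)
    n = sum(f)                                           # n = sum of f_i
    sum_w = sum(fi * wi for fi, wi in zip(f, w))         # sum of w_i
    mean = sum(fi * wi * xi for fi, wi, xi in zip(f, w, x)) / sum_w
    css = sum(fi * wi * (xi - mean) ** 2 for fi, wi, xi in zip(f, w, x))
    divisors = {
        "N": n,
        "DF": n - 1,          # the default divisor
        "WEIGHT": sum_w,
        "WGT": sum_w,
        "WDF": sum_w - 1,
    }
    d = divisors[vardef.upper()]
    return n, mean, css / d


def standardized(x, mean, variance):
    """Hypothetical helper: z_i = (x_i - mean) / s for each value."""
    s = math.sqrt(variance)
    return [(xi - mean) / s for xi in x]


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean_and_variance(data, vardef="N"))   # (8, 5.0, 4.0)
```

Doubling every weight doubles $s^2$ under VARDEF=DF but leaves it unchanged under VARDEF=WEIGHT, whose divisor grows with $\sum w_i$.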

The standard keywords and formulas for each statistic follow. Some formulas use keywords to designate the corresponding statistic.

Table A1.1: The Most Common Simple Statistics

Statistic | PROC MEANS and SUMMARY | PROC UNIVARIATE | PROC TABULATE | PROC REPORT | PROC CORR | PROC SQL
Number of missing values | X | X | X | X |   | X
Number of nonmissing values | X | X | X | X | X | X
Number of observations | X | X |   |   |   | X
Sum of weights | X | X | X | X | X | X
Mean | X | X | X | X | X | X
Sum | X | X | X | X | X | X
Extreme values | X | X |   |   |   |  
Minimum | X | X | X | X | X | X
Maximum | X | X | X | X | X | X
Range | X | X | X | X |   | X
Uncorrected sum of squares | X | X | X | X | X | X
Corrected sum of squares | X | X | X | X | X | X
Variance | X | X | X | X | X | X
Covariance |   |   |   |   | X |  
Standard deviation | X | X | X | X | X | X
Standard error of the mean | X | X | X | X |   | X
Coefficient of variation | X | X | X | X |   | X
Skewness | X | X | X |   |   |  
Kurtosis | X | X | X |   |   |  
Confidence limits of the mean | X | X | X |   |   |  
Confidence limits of the variance |   | X |   |   |   |  
Confidence limits of quantiles |   | X |   |   |   |  
Median | X | X | X | X | X |  
Mode |   | X |   |   |   |  
Percentiles/Deciles/Quartiles | X | X | X | X |   |  
t test for mean = 0 | X | X | X | X |   | X
t test for mean = $\mu_0$ |   | X |   |   |   |  
Nonparametric tests for location |   | X |   |   |   |  
Tests for normality |   | X |   |   |   |  
Correlation coefficients |   |   |   |   | X |  
Cronbach's alpha |   |   |   |   | X |  

Descriptive Statistics

The keywords for descriptive statistics are

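CSS
the corrected sum of squares, $\sum w_i (x_i - \bar{x})^2$.

CV
the percent coefficient of variation, $100\,s / \bar{x}$.

KURTOSIS | KURT
the kurtosis, which measures the heaviness of the tails. When VARDEF=DF, the kurtosis is computed as

$$c_{4n} \sum z_i^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$$

where $c_{4n} = \frac{n(n+1)}{(n-1)(n-2)(n-3)}$. The weighted kurtosis is computed as

$$c_{4n} \sum \left(\frac{x_i - \bar{x}}{s_i}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)} = c_{4n} \sum w_i^2 \left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$$

When VARDEF=N, the kurtosis is computed as

$$\frac{1}{n} \sum z_i^4 - 3$$

and the weighted kurtosis is computed as

$$\frac{1}{n} \sum \left(\frac{x_i - \bar{x}}{s_i}\right)^4 - 3 = \frac{1}{n} \sum w_i^2 \left(\frac{x_i - \bar{x}}{s}\right)^4 - 3$$

where $s_i^2 = s^2 / w_i$. The formula is invariant under the transformation $w_i^* = z w_i$, $z > 0$. When you use VARDEF=WDF or VARDEF=WEIGHT, the kurtosis is set to missing.

Note: PROC MEANS and PROC TABULATE do not compute weighted kurtosis.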

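MAX
the maximum value of $x_i$.

MEAN
the arithmetic mean, $\bar{x}$.

MIN
the minimum value of $x_i$.

MODE
the most frequent value of $x_i$.

N
the number of nonmissing values of $x_i$, $\sum f_i$.

NMISS
the number of missing values of $x_i$.

NOBS
the total number of observations, calculated as the sum of N and NMISS.

RANGE
the range, MAX - MIN.

SKEWNESS | SKEW
the skewness, which measures the tendency of the deviations to be larger in one direction than in the other. When VARDEF=DF, the skewness is computed as

$$\frac{n}{(n-1)(n-2)} \sum z_i^3$$

and the weighted skewness as $\frac{n}{(n-1)(n-2)} \sum w_i^{3/2} \left(\frac{x_i - \bar{x}}{s}\right)^3$. When VARDEF=N, the skewness is computed as $\frac{1}{n} \sum z_i^3$ and the weighted skewness as $\frac{1}{n} \sum w_i^{3/2} \left(\frac{x_i - \bar{x}}{s}\right)^3$. When you use VARDEF=WDF or VARDEF=WEIGHT, the skewness is set to missing.

STDDEV | STD
the standard deviation, $s = \sqrt{s^2}$.

STDERR | STDMEAN
the standard error of the mean, $s / \sqrt{\sum w_i}$, computed when VARDEF=DF (the default); otherwise STDERR is set to missing.

SUM
the sum, $\sum w_i x_i$.

SUMWGT
the sum of the weights, $\sum w_i$.

USS
the uncorrected sum of squares, $\sum w_i x_i^2$.

VAR
the variance, $s^2$.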
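Several of these keywords can be checked with a short Python sketch. The function `describe` and its dictionary keys are hypothetical (the keys only mirror the keyword names); it assumes no FREQ statement, positive weights, and at least two values, and it uses None where SAS would report a missing value. It computes USS as $\sum w_i x_i^2$, CSS as $\sum w_i (x_i - \bar{x})^2$, and, for VARDEF=DF or VARDEF=N, the skewness and kurtosis from $u_i = \sqrt{w_i}\,(x_i - \bar{x})/s$, so that $\sum u_i^3$ and $\sum u_i^4$ carry the $w_i^{3/2}$ and $w_i^2$ factors and reduce to $\sum z_i^3$ and $\sum z_i^4$ when every $w_i = 1$.

```python
import math


def describe(x, w=None, vardef="DF"):
    """Hypothetical helper: several descriptive keywords for one variable.

    Assumes no FREQ statement, positive weights, and at least two values;
    None stands in for a statistic that is reported as missing.
    """
    n = len(x)
    w = [1.0] * n if w is None else list(w)
    vardef = vardef.upper()
    sumwgt = sum(w)
    total = sum(wi * xi for wi, xi in zip(w, x))
    mean = total / sumwgt
    css = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x))
    d = {"N": n, "DF": n - 1, "WEIGHT": sumwgt, "WGT": sumwgt, "WDF": sumwgt - 1}[vardef]
    var = css / d
    std = math.sqrt(var)
    stats = {
        "N": n,
        "SUMWGT": sumwgt,
        "SUM": total,
        "MEAN": mean,
        "MIN": min(x),
        "MAX": max(x),
        "RANGE": max(x) - min(x),
        "USS": sum(wi * xi ** 2 for wi, xi in zip(w, x)),
        "CSS": css,
        "VAR": var,
        "STD": std,
        "CV": 100 * std / mean if mean != 0 else None,
        "STDERR": std / math.sqrt(sumwgt) if vardef == "DF" else None,
        "SKEWNESS": None,
        "KURTOSIS": None,
    }
    if std == 0 or vardef not in ("DF", "N"):
        return stats  # skewness and kurtosis stay missing
    # u_i = sqrt(w_i) (x_i - mean) / s; equals z_i when every w_i = 1
    u = [math.sqrt(wi) * (xi - mean) / std for wi, xi in zip(w, x)]
    m3 = sum(ui ** 3 for ui in u)
    m4 = sum(ui ** 4 for ui in u)
    if vardef == "N":
        stats["SKEWNESS"] = m3 / n
        stats["KURTOSIS"] = m4 / n - 3
    else:  # VARDEF=DF
        if n >= 3:
            stats["SKEWNESS"] = n / ((n - 1) * (n - 2)) * m3
        if n >= 4:
            c4n = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
            stats["KURTOSIS"] = c4n * m4 - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return stats
```

Because $u_i$ does not change when every weight is multiplied by the same positive constant under VARDEF=DF or VARDEF=N, neither do the weighted skewness and kurtosis, which matches the invariance stated for KURTOSIS.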

Quantile and Related Statistics

The keywords for quantiles and related statistics are

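MEDIAN | P50
the middle value (the 50th percentile).

P1
the 1st percentile.

P5
the 5th percentile.

P10
the 10th percentile.

P90
the 90th percentile.

P95
the 95th percentile.

P99
the 99th percentile.

Q1 | P25
the lower quartile (the 25th percentile).

Q3 | P75
the upper quartile (the 75th percentile).

QRANGE
the interquartile range, Q3 - Q1.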

You use the QNTLDEF= option (PCTLDEF= in PROC UNIVARIATE) to specify the method that the procedure uses to compute percentiles. Let $n$ be the number of nonmissing values for a variable, and let $x_1, x_2, \ldots, x_n$ represent the ordered values of the variable such that $x_1$ is the smallest value, $x_2$ is the next smallest value, and $x_n$ is the largest value. For the $t$th percentile, where $t$ is between 0 and 100, let $p = t/100$. Then define $j$ as the integer part and $g$ as the fractional part of $np$ or $(n+1)p$, so that

$$np = j + g \quad \text{when QNTLDEF=1, 2, 3, or 5}$$

$$(n+1)p = j + g \quad \text{when QNTLDEF=4}$$

The value of QNTLDEF= determines how the procedure computes the $t$th percentile, as shown in Table A1.2.

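When you use the WEIGHT statement, the $t$th percentile is computed as

$$y = \begin{cases} \frac{1}{2}(x_i + x_{i+1}) & \text{if } \sum_{j=1}^{i} w_j = pW \\ x_{i+1} & \text{if } \sum_{j=1}^{i} w_j < pW < \sum_{j=1}^{i+1} w_j \end{cases}$$

where $w_i$ is the weight associated with $x_i$ and $W = \sum_{i=1}^{n} w_i$ is the sum of the weights.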

When the observations have identical weights, the weighted percentiles are the same as the unweighted percentiles with QNTLDEF=5.
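The weighted rule can be traced with a minimal Python sketch: it returns $(x_i + x_{i+1})/2$ when the cumulative weight $\sum_{j \le i} w_j$ equals $pW$, and $x_{i+1}$ when $pW$ falls strictly between two consecutive cumulative weights. `weighted_percentile` is a hypothetical name, and the sketch assumes positive weights.

```python
def weighted_percentile(x, w, t):
    """Hypothetical helper: t-th weighted percentile, 0 < t < 100.

    Assumes positive weights. The cumulative weight is compared with p*W
    exactly, as in the formula, so rounding in float weights can change
    which case applies.
    """
    if not 0 < t < 100:
        raise ValueError("t must be strictly between 0 and 100")
    pairs = sorted(zip(x, w))                        # ordered x_i with their weights
    target = t / 100 * sum(wi for _, wi in pairs)    # p * W
    cum = 0.0                                        # sum of w_j for j <= i
    for k, (xk, wk) in enumerate(pairs):
        nxt = cum + wk
        if cum < target < nxt:
            return xk                                # x_{i+1}
        if nxt == target and k + 1 < len(pairs):
            return 0.5 * (xk + pairs[k + 1][0])      # (x_i + x_{i+1}) / 2
        cum = nxt
    return pairs[-1][0]
```

With identical weights the result matches QNTLDEF=5; for example, the median of 2, 4, 4, 4, 5, 5, 7, 9 with unit weights is $(4 + 5)/2 = 4.5$.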

Table A1.2: Methods for Computing Quantile Statistics

QNTLDEF= | Description | Formula
1 | weighted average at $x_{np}$ | $y = (1 - g)x_j + g x_{j+1}$, where $x_0$ is taken to be $x_1$
2 | observation numbered closest to $np$ | $y = x_i$ if $g \neq 1/2$; $y = x_j$ if $g = 1/2$ and $j$ is even; $y = x_{j+1}$ if $g = 1/2$ and $j$ is odd; where $i$ is the integer part of $np + 1/2$
3 | empirical distribution function | $y = x_j$ if $g = 0$; $y = x_{j+1}$ if $g > 0$
4 | weighted average aimed at $x_{(n+1)p}$ | $y = (1 - g)x_j + g x_{j+1}$, where $x_{n+1}$ is taken to be $x_n$
5 | empirical distribution function with averaging | $y = \frac{1}{2}(x_j + x_{j+1})$ if $g = 0$; $y = x_{j+1}$ if $g > 0$
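The five methods in Table A1.2 can be sketched in Python as follows. This is an illustration, not the procedures' implementation; `percentile` is a hypothetical name, and clamping an index of 0 to $x_1$ for QNTLDEF=2 and QNTLDEF=4 is an assumption, since the table states that substitution only for QNTLDEF=1. `Fraction` keeps $np = j + g$ exact so that the tests $g = 0$ and $g = 1/2$ are not disturbed by rounding.

```python
import math
from fractions import Fraction


def percentile(values, t, qntldef=5):
    """Hypothetical helper: t-th percentile (0 < t < 100) for QNTLDEF=1..5."""
    if not 0 < t < 100:
        raise ValueError("t must be strictly between 0 and 100")
    x = sorted(values)
    n = len(x)

    def xs(k):
        # 1-based x_k; x_0 -> x_1 and x_{n+1} -> x_n
        # (clamping x_0 for QNTLDEF=2 and 4 is an assumption)
        return x[min(max(k, 1), n) - 1]

    p = Fraction(str(t)) / 100
    np_ = (n + 1) * p if qntldef == 4 else n * p    # np = j + g
    j = math.floor(np_)
    g = np_ - j
    if qntldef in (1, 4):
        y = (1 - g) * xs(j) + g * xs(j + 1)
    elif qntldef == 2:
        if g != Fraction(1, 2):
            y = xs(math.floor(np_ + Fraction(1, 2)))   # i = int(np + 1/2)
        elif j % 2 == 0:
            y = xs(j)
        else:
            y = xs(j + 1)
    elif qntldef == 3:
        y = xs(j) if g == 0 else xs(j + 1)
    elif qntldef == 5:
        y = (xs(j) + xs(j + 1)) / 2 if g == 0 else xs(j + 1)
    else:
        raise ValueError("QNTLDEF= must be 1, 2, 3, 4, or 5")
    return float(y)
```

For the values 1 through 10 and $t = 25$ (the Q1 keyword), $np = 2.5$, so QNTLDEF=1 through 5 give 2.5, 2, 3, 2.75, and 3.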

Hypothesis Testing Statistics

The keywords for hypothesis testing statistics are

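T
the Student's t statistic for testing the null hypothesis that the population mean equals $\mu_0$, calculated as

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{\sum w_i}}$$

By default, $\mu_0 = 0$; in PROC UNIVARIATE you can use the MU0= option to specify $\mu_0$. T requires VARDEF=DF (the default); otherwise it is set to missing.

PROBT | PRT
the two-tailed p-value for Student's t statistic, T, with $n - 1$ degrees of freedom: the probability under the null hypothesis of obtaining a more extreme value of T than the one observed in the sample.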
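T and PROBT can be sketched in Python without third-party libraries. The sketch computes $t = (\bar{x} - \mu_0)/(s/\sqrt{\sum w_i})$ with the VARDEF=DF divisor, and uses the fact that the two-tailed p-value of a t statistic with $\nu$ degrees of freedom equals the regularized incomplete beta function $I_{\nu/(\nu + t^2)}(\nu/2, 1/2)$, evaluated with the standard continued-fraction (modified Lentz) method. All function names are hypothetical, and the sketch assumes no FREQ statement.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-15, tiny=1e-300):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Hypothetical helper: P(|T| > |t|) for Student's t with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def t_test(x, w=None, mu0=0.0):
    """Hypothetical helper: (T, PROBT) with VARDEF=DF and no FREQ statement."""
    n = len(x)
    w = [1.0] * n if w is None else list(w)
    sum_w = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    s2 = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x)) / (n - 1)
    t = (mean - mu0) / math.sqrt(s2 / sum_w)    # s / sqrt(sum of w_i)
    return t, t_two_sided_p(t, n - 1)
```

For the two values 1 and 3, $\bar{x} = 2$ and $s/\sqrt{\sum w_i} = 1$, so T = 2 with 1 degree of freedom. The t distribution with 1 degree of freedom is the Cauchy distribution, so PROBT is $1 - (2/\pi)\arctan 2 \approx 0.295$.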

Confidence Limits for the Mean

The keywords for confidence limits are

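CLM
the two-sided confidence limits for the mean,

$$\bar{x} \pm t_{(1-\alpha/2;\, n-1)} \frac{s}{\sqrt{\sum w_i}}$$

LCLM
the one-sided lower confidence limit for the mean, $\bar{x} - t_{(1-\alpha;\, n-1)}\, s / \sqrt{\sum w_i}$.

UCLM
the one-sided upper confidence limit for the mean, $\bar{x} + t_{(1-\alpha;\, n-1)}\, s / \sqrt{\sum w_i}$.

Here $t_{(q;\, n-1)}$ is the $q$th quantile of Student's t distribution with $n - 1$ degrees of freedom, and $\alpha$ is set by the ALPHA= option (0.05 by default). These limits require VARDEF=DF.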
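A minimal Python sketch of these limits follows: two-sided limits $\bar{x} \pm t_{(1-\alpha/2;\,n-1)}\, s/\sqrt{\sum w_i}$ and one-sided limits that use $t_{(1-\alpha;\,n-1)}$ instead. To stay self-contained it takes the t quantile function as a parameter (a hypothetical interface); `scipy.stats.t.ppf(q, df)` has a suitable signature if SciPy is available, and for 1 degree of freedom the exact quantile $\tan(\pi(q - 1/2))$ is used here for illustration. The sketch assumes VARDEF=DF and no FREQ statement.

```python
import math


def mean_confidence_limits(x, t_quantile, w=None, alpha=0.05):
    """Hypothetical helper: CLM, LCLM, and UCLM for the mean.

    t_quantile(q, df) must return the q-th quantile of Student's t
    with df degrees of freedom; the caller supplies it.
    """
    n = len(x)
    w = [1.0] * n if w is None else list(w)
    sum_w = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    s2 = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x)) / (n - 1)
    se = math.sqrt(s2 / sum_w)                  # s / sqrt(sum of w_i)
    t_two = t_quantile(1 - alpha / 2, n - 1)    # two-sided critical value
    t_one = t_quantile(1 - alpha, n - 1)        # one-sided critical value
    return {
        "CLM": (mean - t_two * se, mean + t_two * se),
        "LCLM": mean - t_one * se,
        "UCLM": mean + t_one * se,
    }


def cauchy_quantile(q, df):
    """Exact t quantile for df == 1 (the Cauchy distribution), for illustration."""
    if df != 1:
        raise ValueError("this illustration only covers df == 1")
    return math.tan(math.pi * (q - 0.5))


limits = mean_confidence_limits([1, 3], cauchy_quantile)
```

For the values 1 and 3 with the default $\alpha = 0.05$, the two-sided limits are about $2 \pm 12.706$.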

Using Weights

For more information on using weights and an example, see WEIGHT on page 63.

Data Requirements for Summarization Procedures

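The following are the minimal data requirements for computing unweighted statistics; they do not describe recommended sample sizes. Statistics are reported as missing if VARDEF=DF (the default) and these requirements are not met:

N and NMISS are computed even if there is no nonmissing value.

SUM, MEAN, MAX, MIN, RANGE, USS, and CSS require at least one nonmissing value.

VAR, STD, STDERR, CV, T, and PROBT require at least two nonmissing values.

SKEWNESS requires at least three nonmissing values.

KURTOSIS requires at least four nonmissing values.

SKEWNESS, KURTOSIS, T, and PROBT require that STD is greater than zero.

CV requires that MEAN is not equal to zero.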
