SAS/STAT 9.1 User's Guide, Volumes 1-7

Missing Values

If an observation has a missing value for any of the quantitative variables, it is omitted from the analysis. If an observation has a missing CLASS value but is otherwise complete, it is not used in computing the canonical correlations and coefficients; however, canonical variable scores are computed for that observation for the OUT= data set.

Computational Details

General Formulas

Canonical discriminant analysis is equivalent to canonical correlation analysis between the quantitative variables and a set of dummy variables coded from the class variable. In the following notation, the dummy variables are denoted by $y$ and the quantitative variables by $x$. The total sample covariance matrix for the $x$ and $y$ variables is

$$S = \begin{bmatrix} S_{xx} & S_{xy} \\ S_{yx} & S_{yy} \end{bmatrix}$$

When $c$ is the number of groups, $n_t$ is the number of observations in group $t$, and $S_t$ is the sample covariance matrix for the $x$ variables in group $t$, the within-class pooled covariance matrix for the $x$ variables is

$$S_p = \frac{1}{\sum_{t=1}^{c} n_t - c} \sum_{t=1}^{c} (n_t - 1)\, S_t$$
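As a quick numerical check of the pooling, consider a single $x$ variable (so each $S_t$ is a scalar variance) and $c = 2$ groups. The values $n_1 = 3$, $S_1 = 4$, $n_2 = 5$, and $S_2 = 1$ are hypothetical, chosen only for illustration. Weighting each group by its $n_t - 1$ degrees of freedom gives:

```latex
S_p = \frac{(n_1 - 1) S_1 + (n_2 - 1) S_2}{n_1 + n_2 - c}
    = \frac{(2)(4) + (4)(1)}{3 + 5 - 2}
    = \frac{12}{6} = 2
```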

The canonical correlations, $\rho_i$, are the square roots of the eigenvalues, $\lambda_i$, of the following matrix. The corresponding eigenvectors are $v_i$.

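Let $V$ be the matrix with the eigenvectors $v_i$ that correspond to nonzero eigenvalues as columns. With $S_p$ denoting the within-class pooled covariance matrix and $S_{xx}$ the total-sample covariance matrix of the $x$ variables, the raw canonical coefficients are calculated as follows:

$$R = S_p^{-1/2} V$$

The pooled within-class standardized canonical coefficients are

$$P = \mathrm{diag}(S_p)^{1/2} R$$

and the total-sample standardized canonical coefficients are

$$T = \mathrm{diag}(S_{xx})^{1/2} R$$

Let $X_c$ be the matrix with the centered $x$ variables as columns. The canonical scores can be calculated by any of the following:

$$X_c R$$

$$X_c \, \mathrm{diag}(S_p)^{-1/2} P$$

$$X_c \, \mathrm{diag}(S_{xx})^{-1/2} T$$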

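For the multivariate tests based on $E^{-1}H$,

$$E = (n - 1)\left(S_{yy} - S_{yx} S_{xx}^{-1} S_{xy}\right)$$

$$H = (n - 1)\, S_{yx} S_{xx}^{-1} S_{xy}$$

where $n$ is the total number of observations and $S_{xx}$, $S_{xy}$, $S_{yx}$, and $S_{yy}$ are the blocks of the total sample covariance matrix for the $x$ and $y$ variables.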

Input Data Set

The input DATA= data set can be an ordinary SAS data set or one of several specially structured data sets created by statistical procedures available with SAS/STAT software. For more information on special types of data sets, see Appendix A, "Special SAS Data Sets." The BY variable in these data sets becomes the CLASS variable in PROC CANDISC. These specially structured data sets include TYPE=CORR, TYPE=COV, TYPE=CSSCP, and TYPE=SSCP data sets.

When the input data set is TYPE=CORR, TYPE=COV, or TYPE=CSSCP, PROC CANDISC reads the number of observations for each class from the observations with _TYPE_='N' and the variable means in each class from the observations with _TYPE_='MEAN'. The CANDISC procedure then reads the within-class correlations from the observations with _TYPE_='CORR', the standard deviations from the observations with _TYPE_='STD' (data set TYPE=CORR), the within-class covariances from the observations with _TYPE_='COV' (data set TYPE=COV), or the within-class corrected sums of squares and crossproducts from the observations with _TYPE_='CSSCP' (data set TYPE=CSSCP).

When the data set does not include any observations with _TYPE_='CORR' (data set TYPE=CORR), _TYPE_='COV' (data set TYPE=COV), or _TYPE_='CSSCP' (data set TYPE=CSSCP) for each class, PROC CANDISC reads the pooled within-class information from the data set. In this case, PROC CANDISC reads the pooled within-class correlations from the observations with _TYPE_='PCORR', the pooled within-class standard deviations from the observations with _TYPE_='PSTD' (data set TYPE=CORR), the pooled within-class covariances from the observations with _TYPE_='PCOV' (data set TYPE=COV), or the pooled within-class corrected SSCP matrix from the observations with _TYPE_='PSSCP' (data set TYPE=CSSCP).

When the input data set is TYPE=SSCP, PROC CANDISC reads the number of observations for each class from the observations with _TYPE_='N', the sum of weights of observations from the variable INTERCEPT in observations with _TYPE_='SSCP' and _NAME_='INTERCEPT', the variable sums from the variable=variablenames in observations with _TYPE_='SSCP' and _NAME_='INTERCEPT', and the uncorrected sums of squares and crossproducts from the variable=variablenames in observations with _TYPE_='SSCP' and _NAME_='variablenames'.
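As an illustration of the TYPE=CORR route, the following sketch builds a by-class correlation data set with PROC CORR and passes it to PROC CANDISC. The data set names Raw and CorrByClass and the variables Group, x1, x2, and x3 are hypothetical placeholders, not names from this chapter.

```sas
/* Hypothetical names: Raw, CorrByClass, Group, x1-x3 */

/* BY-group processing requires input sorted by the BY variable */
proc sort data=Raw;
   by Group;
run;

/* OUTP= writes a TYPE=CORR data set containing observations */
/* with _TYPE_ values MEAN, STD, N, and CORR per BY group    */
proc corr data=Raw noprint outp=CorrByClass;
   by Group;
   var x1 x2 x3;
run;

/* The BY variable of the TYPE=CORR data set is named as the */
/* CLASS variable in PROC CANDISC                            */
proc candisc data=CorrByClass;
   class Group;
   var x1 x2 x3;
run;
```

Because the input here is not an ordinary SAS data set, this form of the call cannot request an OUT= data set of canonical scores.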

Output Data Sets

OUT= Data Set

The OUT= data set contains all the variables in the original data set plus new variables containing the canonical variable scores. You determine the number of new variables using the NCAN= option. The names of the new variables are formed as described in the PREFIX= option. The new variables have means equal to zero and pooled within-class variances equal to one. An OUT= data set cannot be created if the DATA= data set is not an ordinary SAS data set.
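A minimal sketch of a call that creates an OUT= data set follows. The data set Flowers, the CLASS variable Species, the variables x1 through x4, and the output name CanScores are hypothetical; NCAN=2 and PREFIX=Disc are arbitrary choices for illustration.

```sas
/* Hypothetical names: Flowers, Species, x1-x4, CanScores */
proc candisc data=Flowers out=CanScores ncan=2 prefix=Disc;
   class Species;
   var x1-x4;
run;

/* With PREFIX=Disc and NCAN=2 the new variables are Disc1 */
/* and Disc2; their means should be zero                   */
proc means data=CanScores mean;
   var Disc1 Disc2;
run;
```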

OUTSTAT= Data Set

The OUTSTAT= data set is similar to the TYPE=CORR data set produced by the CORR procedure but contains many results in addition to those produced by the CORR procedure.

The OUTSTAT= data set is TYPE=CORR, and it contains the following variables:

The observations, as identified by the variable _TYPE_, have the following _TYPE_ values:

| _TYPE_ | Contents |
|---|---|
| N | number of observations for both the total sample (CLASS variable missing) and within each class (CLASS variable present) |
| SUMWGT | sum of weights for both the total sample (CLASS variable missing) and within each class (CLASS variable present) if a WEIGHT statement is specified |
| MEAN | means for both the total sample (CLASS variable missing) and within each class (CLASS variable present) |
| STDMEAN | total-standardized class means |
| PSTDMEAN | pooled within-class standardized class means |
| STD | standard deviations for both the total sample (CLASS variable missing) and within each class (CLASS variable present) |
| PSTD | pooled within-class standard deviations |
| BSTD | between-class standard deviations |
| RSQUARED | univariate $R^2$ values |

The following kinds of observations are identified by the combination of the variables _TYPE_ and _NAME_. When the _TYPE_ variable has one of the following values, the _NAME_ variable identifies the row of the matrix.

| _TYPE_ | Contents |
|---|---|
| CSSCP | corrected SSCP matrix for the total sample (CLASS variable missing) and within each class (CLASS variable present) |
| PSSCP | pooled within-class corrected SSCP matrix |
| BSSCP | between-class SSCP matrix |
| COV | covariance matrix for the total sample (CLASS variable missing) and within each class (CLASS variable present) |
| PCOV | pooled within-class covariance matrix |
| BCOV | between-class covariance matrix |
| CORR | correlation matrix for the total sample (CLASS variable missing) and within each class (CLASS variable present) |
| PCORR | pooled within-class correlation matrix |
| BCORR | between-class correlation matrix |

When the _TYPE_ variable has one of the following values, the _NAME_ variable identifies the canonical variable:

| _TYPE_ | Contents |
|---|---|
| CANCORR | canonical correlations |
| STRUCTUR | canonical structure |
| BSTRUCT | between canonical structure |
| PSTRUCT | pooled within-class canonical structure |
| SCORE | total sample standardized canonical coefficients |
| PSCORE | pooled within-class standardized canonical coefficients |
| RAWSCORE | raw canonical coefficients |
| CANMEAN | means of the canonical variables for each class |

You can use this data set with PROC SCORE to get scores on the canonical variables for new data using one of the following forms:

   * The CLASS variable C is numeric;
   proc score data=NewData score=Coef(where=(c = . )) out=Scores;
   run;

   * The CLASS variable C is character;
   proc score data=NewData score=Coef(where=(c = ' ')) out=Scores;
   run;

The WHERE clause is used to exclude the within-class means and standard deviations. PROC SCORE standardizes the new data by subtracting the original variable means that are stored in the _TYPE_='MEAN' observations and dividing by the original variable standard deviations from the _TYPE_='STD' observations. Then PROC SCORE multiplies the standardized variables by the coefficients from the _TYPE_='SCORE' observations to get the canonical scores.
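The scoring steps just described can be written compactly. The symbols here are introduced for illustration only and do not appear elsewhere in this chapter: $x_j$ is the value of the $j$th variable for a new observation, $\bar{x}_j$ and $s_j$ are the total-sample mean and standard deviation that the WHERE clause retains, and $t_{jk}$ is the SCORE coefficient of variable $j$ for canonical variable $k$.

```latex
z_j = \frac{x_j - \bar{x}_j}{s_j}, \qquad
\text{score}_k = \sum_{j} t_{jk}\, z_j
```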

Computational Resources

In the following discussion, let

Memory Requirements

The amount of memory in bytes for temporary storage needed to process the data is

With the ANOVA option, the temporary storage must be increased by $16v$ bytes. The DISTANCE option requires an additional temporary storage of $4v^2 + 4v$ bytes.
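For a sense of scale, take $v = 10$ as a hypothetical value, assuming that $v$ counts the quantitative variables in the analysis:

```latex
\text{ANOVA: } 16v = 16 \times 10 = 160 \text{ bytes}
\qquad
\text{DISTANCE: } 4v^2 + 4v = 4 \times 100 + 4 \times 10 = 440 \text{ bytes}
```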

Time Requirements

The following factors determine the time requirements of the CANDISC procedure.

Each of the preceding factors has a different constant of proportionality.

Displayed Output

The output produced by PROC CANDISC includes

Optional output includes

By default, PROC CANDISC displays these statistics:

The following statistics can be suppressed with the SHORT option:

ODS Table Names

PROC CANDISC assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table. For more information on ODS, see Chapter 14, Using the Output Delivery System.

Table 21.2: ODS Tables Produced in PROC CANDISC

| ODS Table Name | Description | PROC CANDISC Option |
|---|---|---|
| ANOVA | Univariate statistics | ANOVA |
| AveRSquare | Average R-square | ANOVA |
| BCorr | Between-class correlations | BCORR |
| BCov | Between-class covariances | BCOV |
| BSSCP | Between-class SSCP matrix | BSSCP |
| BStruc | Between canonical structure | default |
| CanCorr | Canonical correlations | default |
| CanonicalMeans | Class means on canonical variables | default |
| Counts | Number of observations, variables, classes, df | default |
| CovDF | DF for covariance matrices, not printed | any *COV option |
| Dist | Squared distances | MAHALANOBIS |
| DistFValues | F statistics based on squared distances | MAHALANOBIS |
| DistProb | Probabilities for F statistics from squared distances | MAHALANOBIS |
| Levels | Class level information | default |
| MultStat | MANOVA | default |
| PCoef | Pooled standard canonical coefficients | default |
| PCorr | Pooled within-class correlations | PCORR |
| PCov | Pooled within-class covariances | PCOV |
| PSSCP | Pooled within-class SSCP matrix | PSSCP |
| PStdMeans | Pooled standardized class means | STDMEAN |
| PStruc | Pooled within canonical structure | default |
| RCoef | Raw canonical coefficients | default |
| SimpleStatistics | Simple statistics | SIMPLE |
| TCoef | Total-sample standard canonical coefficients | default |
| TCorr | Total-sample correlations | TCORR |
| TCov | Total-sample covariances | TCOV |
| TSSCP | Total-sample SSCP matrix | TSSCP |
| TStdMeans | Total standardized class means | STDMEAN |
| TStruc | Total canonical structure | default |
| WCorr | Within-class correlations | WCORR |
| WCov | Within-class covariances | WCOV |
| WSSCP | Within-class SSCP matrices | WSSCP |
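For example, the table names can be used in an ODS OUTPUT statement to save displayed tables as data sets. In this sketch, the data set and variable names (Flowers, Species, x1 through x4, CanCorrOut, RawCoefOut) are hypothetical.

```sas
/* Hypothetical names: Flowers, Species, x1-x4,          */
/* CanCorrOut, RawCoefOut                                */

/* Save two of the default tables as SAS data sets; the  */
/* statement applies to the next procedure step          */
ods output CanCorr=CanCorrOut RCoef=RawCoefOut;

proc candisc data=Flowers;
   class Species;
   var x1-x4;
run;
```

Tables that are tied to an option, such as Dist, are available only when that option is specified in the PROC CANDISC statement.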
