SAS/STAT 9.1 User's Guide, Volumes 1-7

2017-07-07 02:10:07

Missing Values

Observations with missing values are omitted from the analysis and are given missing values for canonical variable scores in the OUT= data set.

Output Data Sets

OUT= Data Set

The OUT= data set contains all the variables in the original data set plus new variables containing the canonical variable scores. The N= option determines the number of new variables. The OUT= data set is not created if N=0. The names of the new variables are formed by concatenating the value given by the PREFIX= option (or the prefix CAN if the PREFIX= option is not specified) and the numbers 1, 2, 3, and so on. The OUT= data set can be used as input to PROC CLUSTER or PROC FASTCLUS. The cluster analysis should be performed on the canonical variables, not on the original variables.
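As a minimal sketch of this workflow, the following statements create an OUT= data set and pass its canonical variables to PROC FASTCLUS. The data set names MyData, Ace, and Clus, the variables x1-x4, and the option values PROPORTION=0.03, N=2, and MAXCLUSTERS=3 are hypothetical choices made for illustration only.

```sas
/* Hypothetical input: WORK.MyData with numeric variables x1-x4 */
proc aceclus data=MyData out=Ace proportion=0.03 n=2 prefix=Can;
   var x1-x4;
run;

/* Cluster on the canonical variables Can1 and Can2, not on x1-x4 */
proc fastclus data=Ace maxclusters=3 out=Clus;
   var Can1 Can2;
run;
```

With N=2 and PREFIX=Can, the new variables in the OUT= data set are named Can1 and Can2.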

OUTSTAT= Data Set

The OUTSTAT= data set is a TYPE=ACE data set containing the following variables.

the BY variables, if any

the two new character variables, _TYPE_ and _NAME_

the variables analyzed, that is, those in the VAR statement, or, if there is no VAR statement, all numeric variables not listed in any other statement

Each observation in the new data set contains some type of statistic as indicated by the _TYPE_ variable. The values of the _TYPE_ variable are as follows:

_TYPE_	Contents
MEAN	mean of each variable
STD	standard deviation of each variable
N	number of observations on which the analysis is based. This value is the same for each variable.
SUMWGT	sum of the weights if a WEIGHT statement is used. This value is the same for each variable.
COV	covariances between each variable and the variable named by the _NAME_ variable. The number of observations with _TYPE_=COV is equal to the number of variables being analyzed.
ACE	estimated within-cluster covariances between each variable and the variable named by the _NAME_ variable. The number of observations with _TYPE_=ACE is equal to the number of variables being analyzed.
EIGENVAL	eigenvalues of INV(ACE)*(COV − ACE). If the N= option requests fewer than the maximum number of canonical variables, only the specified number of eigenvalues are produced, with missing values filling out the observation.
RAWSCORE	raw canonical coefficients. To obtain the canonical variable scores, these coefficients should be multiplied by the raw data centered by means obtained from the observation with _TYPE_=MEAN.
SCORE	standardized canonical coefficients. The _NAME_ variable contains the name of the corresponding canonical variable as constructed from the PREFIX= option. The number of observations with _TYPE_=SCORE equals the number of canonical variables computed. To obtain the canonical variable scores, these coefficients should be multiplied by the standardized data using means obtained from the observation with _TYPE_=MEAN and standard deviations obtained from the observation with _TYPE_=STD.
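
The two scoring rules described for RAWSCORE and SCORE can be written out as follows. This is only a restatement of the descriptions above under assumed notation: x_j is the value of the j-th analyzed variable for one observation, m_j and s_j are its entries in the _TYPE_=MEAN and _TYPE_=STD observations, r_kj and c_kj are the raw and standardized coefficients of the k-th canonical variable, and p is the number of analyzed variables. None of these symbols appear in the original text.

```latex
% Raw coefficients applied to mean-centered data (_TYPE_=RAWSCORE)
\mathrm{score}_k = \sum_{j=1}^{p} r_{kj}\,(x_j - m_j)

% Standardized coefficients applied to standardized data (_TYPE_=SCORE)
\mathrm{score}_k = \sum_{j=1}^{p} c_{kj}\,\frac{x_j - m_j}{s_j}
```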

The OUTSTAT= data set can be used

to initialize another execution of PROC ACECLUS

to compute canonical variable scores with the SCORE procedure

as input to the FACTOR procedure, specifying METHOD=SCORE, to rotate the canonical variables
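
The second of these uses might be sketched as follows. The data set names MyData, NewData, AceStat, and Scored, the variables x1-x4, and the option values are hypothetical. The sketch relies on the default behavior of PROC SCORE, which reads the _TYPE_='SCORE' observations and standardizes the data with the _TYPE_='MEAN' and _TYPE_='STD' observations.

```sas
/* Hypothetical names: MyData, NewData, AceStat, Scored, x1-x4 */
proc aceclus data=MyData outstat=AceStat proportion=0.03 n=2;
   var x1-x4;
run;

/* Apply the standardized canonical coefficients to another data set */
proc score data=NewData score=AceStat out=Scored;
   var x1-x4;
run;
```

The scored variables take their names from the _NAME_ values of the _TYPE_='SCORE' observations, for example CAN1 and CAN2 when the PREFIX= option is not specified.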

Computational Resources

Let

n = number of observations

v = number of variables

i = number of iterations

Memory

The memory in bytes required by PROC ACECLUS is approximately

bytes. If you request the PP or QQ option, an additional 4n(n − 1) bytes are needed.
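
To give a sense of scale for the additional PP or QQ term only, here is the arithmetic for a hypothetical data set with n = 10,000 observations (the value of n is an assumption chosen for illustration):

```latex
4\,n\,(n - 1) = 4 \times 10\,000 \times 9\,999 = 399\,960\,000 \ \text{bytes}
```

That is roughly 400 million bytes. Because the term is quadratic in n, doubling the number of observations roughly quadruples it.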

Time

The time required by PROC ACECLUS is roughly proportional to

Displayed Output

Unless the SHORT option is specified, the ACECLUS procedure displays the following items:

Means and Standard Deviations of the input variables

the S matrix, labeled COV: Total Sample Covariances

the name or value of the matrix used for the Initial Within-Cluster Covariance Estimate

the Threshold value if the PROPORTION= option is specified

For each iteration, PROC ACECLUS displays

the Iteration number

RMS Distance, the root mean square distance between all pairs of observations

the Distance Cutoff (u) for including pairs of observations in the estimate of the within-cluster covariances, which equals the RMS distance times the threshold

the number of Pairs Within Cutoff

the Convergence Measure (e_i) as specified by the METRIC= option

If the SHORT option is not specified, PROC ACECLUS also displays the A matrix, labeled ACE: Approximate Covariance Estimate Within Clusters.

The ACECLUS procedure displays a table of eigenvalues from the canonical analysis containing the following items:

Eigenvalues of Inv(ACE)*(COV − ACE)

the Difference between successive eigenvalues

the Proportion of variance explained by each eigenvalue

the Cumulative proportion of variance explained

If the SHORT option is not specified, PROC ACECLUS displays

the Eigenvectors or raw canonical coefficients

the standardized eigenvectors or standard canonical coefficients

ODS Table Names

PROC ACECLUS assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table. For more information on ODS, see Chapter 14, "Using the Output Delivery System."

Table 16.3: ODS Tables Produced in PROC ACECLUS
ODS Table Name	Description	Statement	Option
ConvergenceStatus	Convergence status	PROC	default
DataOptionInfo	Data and option information	PROC	default
Eigenvalues	Eigenvalues of Inv(ACE)*(COV-ACE)	PROC	default
Eigenvectors	Eigenvectors (raw canonical coefficients)	PROC	default
InitWithin	Initial within-cluster covariance estimate	PROC	INITIAL=INPUT
IterHistory	Iteration history	PROC	default
SimpleStatistics	Simple statistics	PROC	default
StdCanCoef	Standardized canonical coefficients	PROC	default
Threshold	Threshold value	PROC	PROPORTION=
TotSampleCov	Total sample covariances	PROC	default
Within	Approximate covariance estimate within clusters	PROC	default
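
As a minimal sketch of how these names are used, the following statements first save two of the tables as data sets with the ODS OUTPUT statement and then restrict the displayed output to one table with the ODS SELECT statement. The data set names MyData, EigTable, and IterTable, the variables x1-x4, and the PROPORTION= value are hypothetical.

```sas
/* Hypothetical names: MyData, EigTable, IterTable, x1-x4 */

/* Save the eigenvalue table and the iteration history as data sets */
ods output Eigenvalues=EigTable IterHistory=IterTable;
proc aceclus data=MyData proportion=0.03;
   var x1-x4;
run;

/* Display only the eigenvalue table */
ods select Eigenvalues;
proc aceclus data=MyData proportion=0.03;
   var x1-x4;
run;
```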