SAS/STAT 9.1 User's Guide (Vol. 6)

The following statements are available in the STDIZE procedure.

The PROC STDIZE statement is required. The BY, FREQ, LOCATION, SCALE, VAR, and WEIGHT statements are described in alphabetical order following the PROC STDIZE statement.

PROC STDIZE Statement

The PROC STDIZE statement invokes the procedure. You can specify the following options in the PROC STDIZE statement.

Table 66.1: Summary of PROC STDIZE Statement Options

Task                             Options    Description
-------------------------------  ---------  -----------
Specify standardization methods  METHOD=    specifies the name of the standardization method
                                 INITIAL=   specifies the method for computing initial estimates for the A estimates
Unstandardize variables          UNSTD      unstandardizes variables when you also specify the METHOD=IN option
Process missing values           NOMISS     omits observations with any missing values from computation
                                 MISSING=   specifies the method or a numeric value for replacing missing values
                                 REPLACE    replaces missing data by zero in the standardized data
                                 REPONLY    replaces missing data by the location measure (does not standardize the data)
Specify data set details         DATA=      specifies the input data set
                                 OUT=       specifies the output data set
                                 OUTSTAT=   specifies the output statistic data set
Specify computational settings   VARDEF=    specifies the variances divisor
                                 NMARKERS=  specifies the number of markers when you also specify PCTLMTD=ONEPASS
                                 MULT=      specifies the constant to multiply each value by after standardizing
                                 ADD=       specifies the constant to add to each value after standardizing and multiplying by the value specified in the MULT= option
                                 FUZZ=      specifies the relative fuzz factor for writing the output
Specify percentiles              PCTLDEF=   specifies the definition of percentiles when you also specify the PCTLMTD=ORD_STAT option
                                 PCTLMTD=   specifies the method used to estimate percentiles
                                 PCTLPTS=   writes observations containing percentiles to the data set specified in the OUTSTAT= option
Normalize scale estimators       NORM       normalizes the scale estimator to be consistent for the standard deviation of a normal distribution
                                 SNORM      normalizes the scale estimator to have an expectation of approximately 1 for a standard normal distribution
Specify output                   PSTAT      displays the location and scale measures
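The MULT= and ADD= descriptions imply an order of operations: standardize first, then multiply, then add. The following Python sketch illustrates that order only. It is not PROC STDIZE itself; the function name `standardize` and its parameters are hypothetical, and the location and scale values are supplied by hand rather than estimated by any METHOD= option.

```python
def standardize(values, location, scale, mult=1.0, add=0.0):
    """Apply add + mult * (x - location) / scale to each value.

    Assumption: this mirrors the order described for MULT= and ADD=
    (standardize, then multiply, then add). Names are illustrative.
    """
    if scale == 0:
        raise ValueError("scale must be nonzero")
    return [add + mult * (x - location) / scale for x in values]


data = [2.0, 4.0, 6.0]

# Plain standardization with a hand-picked location and scale.
plain = standardize(data, location=4.0, scale=2.0)

# Same standardization, then multiplied by 10 and shifted by 50.
rescaled = standardize(data, location=4.0, scale=2.0, mult=10.0, add=50.0)

print(plain)
print(rescaled)
```

With the default `mult=1.0` and `add=0.0`, the sketch reduces to ordinary standardization.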

These options and their abbreviations are described, in alphabetical order, in the remainder of this section.

ADD= c

DATA= SAS-data-set

FUZZ= c

INITIAL= method

METHOD= name

MISSING= method | value

MULT= c

NMARKERS= n

NOMISS

NORM

OUT= SAS-data-set

OUTSTAT= SAS-data-set

PCTLDEF= percentiles

PCTLMTD= ORD_STAT | ONEPASS | P2

PCTLPTS= n

PSTAT

REPLACE

REPONLY

SNORM

UNSTD | UNSTDIZE

VARDEF= DF | N | WDF | WEIGHT | WGT

BY Statement

You can specify a BY statement with PROC STDIZE to obtain separate standardization for observations in groups defined by the BY variables.

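If your DATA= input data set is not sorted in ascending order, use one of the following alternatives: sort the data using the SORT procedure with a similar BY statement; specify the NOTSORTED or DESCENDING option in the BY statement for the STDIZE procedure; or create an index on the BY variables using the DATASETS procedure.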

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.
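As an illustration of what "separate standardization for observations in groups" means, the Python sketch below standardizes one variable within each group. It is an analogy, not SAS code: the field names `group` and `x`, the helper `standardize_by`, and the choice of mean and sample standard deviation as location and scale are all assumptions made for the example.

```python
from itertools import groupby
from statistics import mean, stdev

rows = [
    {"group": "a", "x": 1.0},
    {"group": "a", "x": 3.0},
    {"group": "b", "x": 10.0},
    {"group": "b", "x": 30.0},
]


def standardize_by(rows, by, var):
    """Standardize `var` separately within each value of `by`."""
    out = []
    # groupby needs sorted input, much as BY processing expects sorted data.
    ordered = sorted(rows, key=lambda r: r[by])
    for _, grp in groupby(ordered, key=lambda r: r[by]):
        grp = list(grp)
        xs = [r[var] for r in grp]
        loc, scale = mean(xs), stdev(xs)  # per-group location and scale
        for r in grp:
            out.append({**r, var: (r[var] - loc) / scale})
    return out


result = standardize_by(rows, by="group", var="x")
for r in result:
    print(r)
```

Because each group uses its own location and scale, the two groups end up with the same standardized values even though their raw values differ by a factor of ten.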

When you specify the option METHOD=IN(ds), the following rules are applied to BY-group processing:

FREQ Statement

If one variable in the input data set represents the frequency of occurrence for other values in the observation, specify the variable name in a FREQ statement. PROC STDIZE treats the data set as if each observation appeared n times, where n is the value of the FREQ variable for the observation. Nonintegral values of the FREQ variable are truncated to the largest integer less than the FREQ value. If the FREQ variable has a value that is less than 1 or is missing, the observation is not used in the analysis.
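The FREQ rules above can be sketched in a few lines of Python. This is an illustration of the stated rules only, not the procedure's implementation; `effective_frequency` and `expand` are hypothetical helper names, and `None` stands in for a missing value.

```python
import math


def effective_frequency(freq):
    """Return how many times an observation counts, or 0 if it is excluded."""
    # Missing values and values below 1 exclude the observation.
    if freq is None or math.isnan(freq) or freq < 1:
        return 0
    # Nonintegral values are truncated downward.
    return math.floor(freq)


def expand(observations):
    """Repeat each value according to its effective frequency."""
    expanded = []
    for value, freq in observations:
        expanded.extend([value] * effective_frequency(freq))
    return expanded


obs = [(5.0, 2.9), (7.0, 1), (9.0, 0.5), (11.0, None)]
expanded = expand(obs)
print(expanded)
```

In this example the first observation counts twice, the second once, and the last two are dropped.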

LOCATION Statement

The LOCATION statement specifies a list of numeric variables that contain location measures in the input data set specified by the METHOD=IN option.

SCALE Statement

The SCALE statement specifies the list of numeric variables containing scale measures in the input data set specified by the METHOD=IN option.
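To make the roles of the LOCATION and SCALE variables concrete, the following Python sketch standardizes a record with location and scale measures read from a lookup table rather than estimated from the data. The dictionary `measures` is a hypothetical stand-in for an IN data set, and the inverse step is an assumption about what unstandardizing amounts to when no MULT= or ADD= constants are involved.

```python
# Hypothetical stand-in for an IN data set: one location and one scale
# measure per variable.
measures = {
    "height": {"location": 170.0, "scale": 10.0},
    "weight": {"location": 70.0, "scale": 5.0},
}


def standardize_with(record, measures):
    """Standardize each variable using the supplied measures."""
    return {
        name: (record[name] - m["location"]) / m["scale"]
        for name, m in measures.items()
    }


def unstandardize_with(record, measures):
    """Assumed inverse: multiply by scale, then add location."""
    return {
        name: record[name] * m["scale"] + m["location"]
        for name, m in measures.items()
    }


row = {"height": 180.0, "weight": 65.0}
scored = standardize_with(row, measures)
restored = unstandardize_with(scored, measures)
print(scored)
print(restored)
```

The round trip returns the original record, which is the property the sketch is meant to show.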

VAR Statement

The VAR statement lists numeric variables to be standardized. If you omit the VAR statement, all numeric variables not listed in the BY, FREQ, and WEIGHT statements are used.

WEIGHT Statement

The WEIGHT statement specifies a numeric variable in the input data set with values that are used to weight each observation. Only one variable can be specified.

The WEIGHT variable values can be nonintegers. An observation is used in the analysis only if the value of the WEIGHT variable is greater than zero. The WEIGHT variable applies only when you specify the option METHOD=MEAN, METHOD=SUM, METHOD=EUCLEN, METHOD=USTD, METHOD=STD, METHOD=AGK, or METHOD=L.

PROC STDIZE uses the value of the WEIGHT variable, $w_i$, as follows.

The sample mean and (uncorrected) sample variances are computed as

\[ \bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i} \]

\[ us_w^2 = \frac{\sum_i w_i x_i^2}{d} \]

\[ s_w^2 = \frac{\sum_i w_i (x_i - \bar{x}_w)^2}{d} \]

where $w_i$ is the weight value of the $i$th observation, $x_i$ is the value of the $i$th observation, and $d$ is the divisor controlled by the VARDEF= option (see the VARDEF= option for details).

PROC STDIZE uses the value of the WEIGHT variable to calculate the following statistics:

MEAN

the weighted mean, $\bar{x}_w$

SUM

the weighted sum, $\sum_i w_i x_i$

USTD

the weighted uncorrected standard deviation, $\sqrt{us_w^2}$

STD

the weighted standard deviation, $\sqrt{s_w^2}$

EUCLEN

the weighted Euclidean length, computed as the square root of the weighted uncorrected sum of squares:

\[ \sqrt{\sum_i w_i x_i^2} \]

AGK

the AGK estimate. This estimate is documented further in the ACECLUS procedure as the METHOD=COUNT option. See the discussion of the WEIGHT statement in Chapter 16, 'The ACECLUS Procedure,' for information on how the WEIGHT variable is applied to the AGK estimate.

L

the $L_p$ estimate. This estimate is documented further in the FASTCLUS procedure as the LEAST= option. See the discussion of the WEIGHT statement in Chapter 28, 'The FASTCLUS Procedure,' for information on how the WEIGHT variable is used to compute weighted cluster means. Note that the number of clusters is always 1.
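The simpler weighted statistics in this list (MEAN, SUM, USTD, STD, and EUCLEN) can be checked by hand with a short Python sketch. It assumes the usual weighted definitions: the mean is the weighted sum divided by the sum of weights, and both variances are divided by a caller-supplied divisor `d`. The function name `weighted_stats` is hypothetical, the choice of `d` in the example is arbitrary, and the AGK and L estimates are not covered.

```python
import math


def weighted_stats(xs, ws, d):
    """Compute weighted MEAN, SUM, USTD, STD, and EUCLEN.

    Observations whose weight is missing (None) or not greater than
    zero are skipped. `d` is the variance divisor, chosen by the caller.
    """
    pairs = [(x, w) for x, w in zip(xs, ws) if w is not None and w > 0]
    sum_w = sum(w for _, w in pairs)
    sum_wx = sum(w * x for x, w in pairs)
    sum_wxx = sum(w * x * x for x, w in pairs)  # uncorrected sum of squares
    mean = sum_wx / sum_w
    css = sum(w * (x - mean) ** 2 for x, w in pairs)  # corrected sum of squares
    return {
        "MEAN": mean,
        "SUM": sum_wx,
        "USTD": math.sqrt(sum_wxx / d),
        "STD": math.sqrt(css / d),
        "EUCLEN": math.sqrt(sum_wxx),
    }


xs = [1.0, 2.0, 3.0, 4.0]
ws = [1.0, 1.0, 2.0, 0.0]  # the last observation has zero weight and is skipped

# Example divisor only: the sum of the positive weights.
stats = weighted_stats(xs, ws, d=4.0)
for name, value in stats.items():
    print(name, value)
```

Passing `d` explicitly keeps the sketch neutral about which VARDEF= setting is in effect.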
