SAS/STAT 9.1 User's Guide (Vol. 6)

The CLUSTER statement names variables that identify the clusters in a clustered sample design. The combinations of categories of CLUSTER variables define the clusters in the sample. If there is a STRATA statement, clusters are nested within strata.

If your sample design has clustering at multiple stages, you should identify only the first-stage clusters, or primary sampling units (PSUs), in the CLUSTER statement. See the section 'Primary Sampling Units (PSUs)' on page 4281 for more information.

The CLUSTER variables are one or more variables in the DATA= input data set. These variables can be either character or numeric. The formatted values of the CLUSTER variables determine the CLUSTER variable levels. Thus, you can use formats to group values into levels. Refer to the discussion of the FORMAT procedure in the SAS Procedures Guide and to the discussions of the FORMAT statement and SAS formats in SAS Language Reference: Dictionary.

You can use multiple CLUSTER statements to specify cluster variables. The procedure uses all variables from all CLUSTER statements to create clusters.
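As a minimal sketch of first-stage clustering, the following step assumes a hypothetical data set Work.Survey with a stratum variable Region, a PSU identifier School, a sampling weight SampWt, a binary response Passed, and a covariate Hours. None of these names come from this guide; they are placeholders for illustration.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc surveylogistic data=work.survey;
   strata region;                     /* first-stage strata */
   cluster school;                    /* PSUs, nested within strata */
   weight sampwt;                     /* sampling weights */
   model passed(event='1') = hours;   /* model the probability that passed = 1 */
run;
```

Only the PSU identifier appears in the CLUSTER statement, even if students were subsampled within schools at a later stage.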

CONTRAST Statement

The CONTRAST statement provides a mechanism for obtaining customized hypothesis tests. It is similar to the CONTRAST statement in PROC LOGISTIC and PROC GLM, depending on the coding schemes used with any classification variables involved.

The CONTRAST statement enables you to specify a matrix, L, for testing the hypothesis $L\theta = 0$, where $\theta$ is the parameter vector. You must be familiar with the details of the model parameterization that PROC SURVEYLOGISTIC uses (for more information, see the PARAM= option in the section 'CLASS Statement' on page 4253). Optionally, the CONTRAST statement enables you to estimate each row, $l_i\theta$, of $L\theta$ and test the hypothesis $l_i\theta = 0$. Computed statistics are based on the asymptotic chi-square distribution of the Wald statistic.

There is no limit to the number of CONTRAST statements that you can specify, but they must appear after the MODEL statement.

The following parameters are specified in the CONTRAST statement:

label

identifies the contrast on the output. A label is required for every contrast specified, and it must be enclosed in quotes.

effect

identifies an effect that appears in the MODEL statement. The name INTERCEPT can be used as an effect when one or more intercepts are included in the model. You do not need to include all effects that are included in the MODEL statement.

values

are constants that are elements of the L matrix associated with the effect. To correctly specify your contrast, it is crucial to know the ordering of parameters within each effect and the variable levels associated with any parameter. The 'Class Level Information' table shows the ordering of levels within variables. The E option, described later in this section, enables you to verify the proper correspondence of values to parameters.

The rows of L are specified in order and are separated by commas. Multiple degree-of-freedom hypotheses can be tested by specifying multiple row-descriptions. For any of the full-rank parameterizations, if an effect is not specified in the CONTRAST statement, all of its coefficients in the L matrix are set to 0. If too many values are specified for an effect, the extra ones are ignored. If too few values are specified, the remaining ones are set to 0.

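When you use effect coding (by default or by specifying PARAM=EFFECT in the CLASS statement), all parameters are directly estimable (involve no other parameters). For example, suppose an effect-coded CLASS variable A has four levels. Then there are three parameters ($\alpha_1$, $\alpha_2$, $\alpha_3$) representing the first three levels, and the fourth parameter is represented by

$$-\alpha_1 - \alpha_2 - \alpha_3$$

To test the first versus the fourth level of A, you would test

$$H_0\colon \alpha_1 = -\alpha_1 - \alpha_2 - \alpha_3$$

or, equivalently,

$$H_0\colon 2\alpha_1 + \alpha_2 + \alpha_3 = 0$$

which, in the form $L\theta = 0$, is

$$\begin{bmatrix} 2 & 1 & 1 \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{bmatrix} = 0$$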

Therefore, you would use the following CONTRAST statement:

contrast '1 vs. 4' A 2 1 1;

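To contrast the third level with the average of the first two levels, you would test

$$H_0\colon \frac{\alpha_1 + \alpha_2}{2} = \alpha_3$$

or, equivalently,

$$H_0\colon \alpha_1 + \alpha_2 - 2\alpha_3 = 0$$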

Therefore, you would use the following CONTRAST statement:

contrast '1&2 vs. 3' A 1 1 -2;

Other CONTRAST statements are constructed similarly. For example,

contrast '1 vs. 2'     A 1 -1 0;
contrast '1&2 vs. 4'   A 3  3 2;
contrast '1&2 vs. 3&4' A 2  2 0;
contrast 'Main Effect' A 1 0 0,
                       A 0 1 0,
                       A 0 0 1;
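The coefficients in the '1&2 vs. 4' and '1&2 vs. 3&4' contrasts can be checked by hand. The derivation below is a sketch that assumes the effect coding described above, with $\alpha_1, \alpha_2, \alpha_3$ as the parameters for the first three levels of A and the fourth level represented by $-\alpha_1 - \alpha_2 - \alpha_3$.

```latex
% '1&2 vs. 4': average of levels 1 and 2 equals level 4
\[
\tfrac{1}{2}(\alpha_1 + \alpha_2) = -\alpha_1 - \alpha_2 - \alpha_3
\;\Longleftrightarrow\;
3\alpha_1 + 3\alpha_2 + 2\alpha_3 = 0
\quad\Rightarrow\quad \text{A 3 3 2}
\]

% '1&2 vs. 3&4': average of levels 1 and 2 equals average of levels 3 and 4
\[
\tfrac{1}{2}(\alpha_1 + \alpha_2)
= \tfrac{1}{2}\bigl(\alpha_3 + (-\alpha_1 - \alpha_2 - \alpha_3)\bigr)
\;\Longleftrightarrow\;
2\alpha_1 + 2\alpha_2 = 0
\quad\Rightarrow\quad \text{A 2 2 0}
\]
```

In the second case $\alpha_3$ cancels, which is why its coefficient is 0.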

When you use the less-than-full-rank parameterization (by specifying PARAM=GLM in the CLASS statement), each row is checked for estimability. If PROC SURVEYLOGISTIC finds a contrast to be nonestimable, it displays missing values in corresponding rows in the results. PROC SURVEYLOGISTIC handles missing level combinations of classification variables in the same manner as PROC LOGISTIC. Parameters corresponding to missing level combinations are not included in the model. This convention can affect the way in which you specify the L matrix in your CONTRAST statement. If the elements of L are not specified for an effect that contains a specified effect, then the elements of the specified effect are distributed over the levels of the higher-order effect just as the LOGISTIC procedure does for its CONTRAST and ESTIMATE statements. For example, suppose that the model contains effects A and B and their interaction A*B. If you specify a CONTRAST statement involving A alone, the L matrix contains nonzero terms for both A and A*B, since A*B contains A.

The degrees of freedom is the number of linearly independent constraints implied by the CONTRAST statement, that is, the rank of L.

You can specify the following options after a slash (/).

ALPHA=α

E

ESTIMATE= keyword

SINGULAR=number
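A sketch of how these options might be combined on the '1 vs. 4' contrast from earlier in this section. The data set Work.Survey and the response Y are hypothetical, and ESTIMATE=EXP is assumed here to be one of the accepted keywords, as it is in PROC LOGISTIC; check the option description for the values your release supports.

```sas
/* Hypothetical data set and response; A is the four-level CLASS variable */
proc surveylogistic data=work.survey;
   class a / param=effect;
   model y(event='1') = a;
   /* E prints the L matrix so the coefficient-to-parameter mapping can be checked; */
   /* ESTIMATE=EXP (assumed keyword) requests the exponentiated contrast;           */
   /* ALPHA=0.1 sets the level for its confidence limits                            */
   contrast '1 vs. 4' a 2 1 1 / e estimate=exp alpha=0.1;
run;
```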

FREQ Statement

The variable in the FREQ statement identifies a variable that contains the frequency of occurrence of each observation. PROC SURVEYLOGISTIC treats each observation as if it appears n times, where n is the value of the FREQ variable for the observation. If it is not an integer, the frequency value is truncated to an integer. If the frequency value is less than 1 or missing, the observation is not used in the model fitting. When the FREQ statement is not specified, each observation is assigned a frequency of 1.

If you use the events/trials syntax in the MODEL statement, the FREQ statement is disallowed because the event and trial variables represent the frequencies in the data set.
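The frequency rules above can be illustrated with a small example. The data set Work.Grouped and its variables are invented for this sketch; the comments restate how each Count value would be treated under the rules described in this section.

```sas
/* Hypothetical grouped data: one row per (y, x) pattern with a count */
data work.grouped;
   input y x count;
   datalines;
1 1 3
0 1 2.7
1 2 4
0 2 1
1 3 5
0 3 0.5
0 0 .
;
run;

proc surveylogistic data=work.grouped;
   freq count;   /* 2.7 is truncated to 2; the 0.5 and missing rows are not used */
   model y(event='1') = x;
run;
```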

MODEL Statement

The MODEL statement names the response variable and the explanatory effects, including covariates, main effects, interactions, and nested effects; see the section 'Specification of Effects' on page 1784 of Chapter 32, 'The GLM Procedure,' for more information. If you omit the explanatory variables, the procedure fits an intercept-only model. Model options can be specified after a slash (/).

Two forms of the MODEL statement can be specified. The first form, referred to as single-trial syntax, is applicable to binary, ordinal, and nominal response data. The second form, referred to as events/trials syntax, is restricted to the case of binary response data. The single-trial syntax is used when each observation in the DATA= data set contains information on only a single trial, for instance, a single subject in an experiment. When each observation contains information on multiple binary-response trials, such as the counts of the number of subjects observed and the number responding, then events/trials syntax can be used.

In the events/trials syntax, you specify two variables that contain count data for a binomial experiment. These two variables are separated by a slash. The value of the first variable, events, is the number of positive responses (or events). The value of the second variable, trials, is the number of trials. The values of both events and (trials - events) must be nonnegative, and the value of trials must be positive for the response to be valid.
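For instance, an events/trials model might be written as in the following sketch. The data set Work.Doses and the variables Responders, Subjects, and Dose are hypothetical names chosen for illustration.

```sas
/* Each row holds counts for one group: responders out of subjects at a dose */
proc surveylogistic data=work.doses;
   model responders / subjects = dose;   /* events / trials = effects */
run;
```

A row with Responders greater than Subjects, or with Subjects equal to 0, would not be a valid response under the conditions stated above.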

In the single-trial syntax, you specify one variable (on the left side of the equal sign) as the response variable. This variable can be character or numeric. Options specific to the response variable can be specified immediately after the response variable with a pair of parentheses around them.

For both forms of the MODEL statement, explanatory effects follow the equal sign. Variables can be either continuous or classification variables. Classification variables can be character or numeric, and they must be declared in the CLASS statement. When an effect is a classification variable, the procedure enters a set of coded columns into the design matrix instead of directly entering a single column containing the values of the variable.
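A single-trial model that mixes a classification variable with a continuous covariate could look like the sketch below. The data set Work.Survey and the variables Y, Sex, and Age are assumptions made for this example.

```sas
/* Hypothetical names: sex is categorical, age is continuous */
proc surveylogistic data=work.survey;
   class sex;                               /* declares sex as a classification variable */
   model y(event='1') = sex age sex*age;    /* main effects plus their interaction */
run;
```

Because Sex is listed in the CLASS statement, it enters the design matrix as coded columns, while Age enters as a single column of its values.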

Response Variable Options

You specify the following options by enclosing them in a pair of parentheses after the response variable.

DESCENDING | DESC

EVENT='category' | keyword

ORDER=DATA | FORMATTED | FREQ | INTERNAL

REFERENCE='category' | keyword

REF='category' | keyword
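The following sketch shows where response variable options are placed. The data set Work.Survey and the variables Passed, Rating, and X are hypothetical, and the category values '1' and 'low' are assumed to exist among the formatted values of those responses.

```sas
/* Binary response: model the probability of the category '1' */
proc surveylogistic data=work.survey;
   model passed(event='1') = x;
run;

/* Nominal response: use 'low' as the reference category in a generalized logit model */
proc surveylogistic data=work.survey;
   model rating(ref='low') = x / link=glogit;
run;
```

The options go inside parentheses directly after the response variable, before the equal sign.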

Model Options

Model options can be specified after a slash (/). Table 69.1 summarizes the options available in the MODEL statement.

Table 69.1: MODEL Statement Options

Option          Description

Model Specification Options
LINK=           Specifies link function
NOINT           Suppresses intercept(s)
OFFSET=         Specifies offset variable

Convergence Criterion Options
ABSFCONV=       Specifies absolute function convergence criterion
FCONV=          Specifies relative function convergence criterion
GCONV=          Specifies relative gradient convergence criterion
XCONV=          Specifies relative parameter convergence criterion
MAXITER=        Specifies maximum number of iterations
NOCHECK         Suppresses checking for infinite parameters
RIDGING=        Specifies technique used to improve the log-likelihood function when its value is worse than that of the previous step
SINGULAR=       Specifies tolerance for testing singularity
TECHNIQUE=      Specifies iterative algorithm for maximization

Options for Adjustment to Variance Estimation
VADJUST=        Chooses variance estimation adjustment method

Options for Confidence Intervals
ALPHA=          Specifies α for the 100(1 − α)% confidence intervals
CLPARM          Computes confidence intervals for parameters
CLODDS          Computes confidence intervals for odds ratios

Options for Display of Details
CORRB           Displays correlation matrix
COVB            Displays covariance matrix
EXPB            Displays exponentiated values of estimates
ITPRINT         Displays iteration history
NODUMMYPRINT    Suppresses 'Class Level Information' table
PARMLABEL       Displays parameter labels
RSQUARE         Displays generalized R²
STB             Displays standardized estimates

The following list describes these options.

ABSFCONV= value

ALPHA=α

CLODDS

CLPARM

CORRB

COVB

EXPB

EXPEST

FCONV= value

GCONV= value

ITPRINT

LINK= keyword

L= keyword

MAXITER= n

NOCHECK

NODUMMYPRINT

NODESIGNPRINT

NODP

NOINT

OFFSET= name

PARMLABEL

RIDGING=ABSOLUTE | RELATIVE | NONE

RSQUARE

RSQ

SINGULAR= value

STB

TECHNIQUE=FISHER | NEWTON

TECH=FISHER | NEWTON

VADJUST=DF | MOREL | NONE <(Morel-options)>

VARADJ=DF | MOREL | NONE <(Morel-options)>

VARADJUST=DF | MOREL | NONE <(Morel-options)>

XCONV=value
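Several of these options can be combined after the slash, as in the sketch below. The data set Work.Survey and all variable names are hypothetical, and the particular option values (ALPHA=0.1, MAXITER=50) are arbitrary choices for illustration, not recommendations.

```sas
/* Hypothetical design and analysis variables */
proc surveylogistic data=work.survey;
   strata region;
   cluster school;
   weight sampwt;
   model passed(event='1') = hours
         / link=logit        /* link function */
           clparm clodds     /* confidence limits for parameters and odds ratios */
           alpha=0.1         /* 90% confidence intervals */
           covb              /* display the covariance matrix */
           maxiter=50        /* cap on the number of iterations */
           vadjust=morel;    /* variance estimation adjustment method */
run;
```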

STRATA Statement

The STRATA statement names variables that form the strata in a stratified sample design. The combinations of levels of STRATA variables define the strata in the sample.

If your sample design has stratification at multiple stages, you should identify only the first-stage strata in the STRATA statement. See the section 'Specification of Population Totals and Sampling Rates' on page 4280 for more information.

The STRATA variables are one or more variables in the DATA= input data set. These variables can be either character or numeric. The formatted values of the STRATA variables determine the levels. Thus, you can use formats to group values into levels. See the discussion of the FORMAT procedure in the SAS Procedures Guide.

You can specify the following option in the STRATA statement after a slash (/):

LIST
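Because strata are defined by formatted values, a format can collapse a numeric variable into a few strata. The sketch below assumes a hypothetical data set Work.Survey with variables Age, SampWt, Y, and X, and a user-defined format AGEGRP. created only for this example.

```sas
/* Group a numeric age into two strata by way of a user-defined format */
proc format;
   value agegrp low -< 40  = 'Under 40'
                40  - high = '40 and over';
run;

proc surveylogistic data=work.survey;
   format age agegrp.;      /* the formatted values define the stratum levels */
   strata age / list;       /* LIST requests a listing of the strata */
   weight sampwt;
   model y(event='1') = x;
run;
```

With this format in effect, the procedure would see two strata rather than one stratum per distinct age.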

TEST Statement

The TEST statement tests linear hypotheses about the regression coefficients. The Wald test is used to jointly test the null hypotheses ($H_0\colon L\theta = c$) specified in a single TEST statement. When $c = 0$, you should specify a CONTRAST statement instead.

Each equation specifies a linear hypothesis (a row of the L matrix and the corresponding element of the c vector); multiple equations are separated by commas. The label, which must be a valid SAS name, is used to identify the resulting output and should always be included. You can submit multiple TEST statements.

The form of an equation is as follows:

term < ± term ... > < = ± term < ± term ... > >

where term is a parameter of the model, or a constant, or a constant times a parameter. For a binary response model, the intercept parameter is named INTERCEPT; for an ordinal response model, the intercept parameters are named INTERCEPT, INTERCEPT2, INTERCEPT3, and so on. When no equal sign appears, the expression is set to 0. The following code illustrates possible uses of the TEST statement:

proc surveylogistic;
   model y= a1 a2 a3 a4;
   test1: test intercept + .5 * a2 = 0;
   test2: test intercept + .5 * a2;
   test3: test a1=a2=a3;
   test4: test a1=a2, a2=a3;
run;

Note that the first and second TEST statements are equivalent, as are the third and fourth TEST statements.
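To see why the third and fourth statements coincide, it may help to write the hypotheses as rows of L. The sketch below assumes the parameter vector is ordered as the intercept followed by the coefficients of a1 through a4, matching the MODEL statement; that ordering is an assumption made for illustration.

```latex
% test1 and test2: a single row with c = 0
\[
L = \begin{bmatrix} 1 & 0 & 0.5 & 0 & 0 \end{bmatrix}, \qquad c = 0
\]

% test3 and test4: a1 = a2 = a3 is the pair of equations a1 - a2 = 0 and a2 - a3 = 0
\[
L = \begin{bmatrix}
0 & 1 & -1 & 0 & 0 \\
0 & 0 & 1 & -1 & 0
\end{bmatrix},
\qquad
c = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]
```

The chained equality in test3 and the comma-separated equations in test4 produce the same two rows, so both are two-degree-of-freedom Wald tests of the same hypothesis.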

You can specify the following option in the TEST statement after a slash (/):

PRINT
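The option follows the last equation, as in this sketch. The data set Work.Survey is hypothetical, and the comment on what PRINT displays is based on the behavior of the same option in PROC LOGISTIC, which is assumed to carry over here.

```sas
/* Hypothetical data set; a1-a4 are the regressors from the example above */
proc surveylogistic data=work.survey;
   model y(event='1') = a1 a2 a3 a4;
   /* PRINT is assumed to show intermediate calculations for the Wald test */
   equal_a: test a1 = a2, a2 = a3 / print;
run;
```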
