SAS/STAT 9.1 User's Guide, Volume 2

The following statements are available in PROC CATMOD.

You can use all of the statements in PROC CATMOD interactively. The first RUN statement executes all of the previous statements. Any subsequent RUN statement executes only those statements that appear between the previous RUN statement and the current one. However, if you specify a BY statement, interactive processing is disabled. That is, all statements through the following RUN statement are processed for each BY group in the data set, but no additional statements are accepted by the procedure.

If more than one CONTRAST statement appears between two RUN statements, all the CONTRAST statements are processed. If more than one RESPONSE statement appears between two RUN statements, then analyses associated with each RESPONSE statement are produced. For all other statements, there can be only one occurrence of the statement between any two RUN statements. For example, if there are two LOGLIN statements between two RUN statements, the first LOGLIN statement is ignored.

The PROC CATMOD and MODEL statements are required. If specified, the DIRECT statement must precede the MODEL statement. As a result, if you use the DIRECT statement interactively, you need to specify a MODEL statement in the same RUN group. See the section 'DIRECT Statement' on page 835 for an example.

The CONTRAST statements, if any, must follow the MODEL statement.

You can specify only one of the LOGLIN, REPEATED, and FACTORS statements between any two RUN statements, because they all specify the same information: how to partition the variation among the response functions within a population.

A QUIT statement executes any statements that have not been processed and then ends the CATMOD procedure.
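The RUN-group behavior described above can be sketched as follows. This is a minimal illustration only: the data set work.survey, its variables a, b, y, and count, and the cell counts are all hypothetical.

```sas
/* Hypothetical cell-count data: one record per cell of the a x b x y table */
data work.survey;
   input a $ b $ y $ count @@;
   datalines;
x1 lo yes 23  x1 lo no 31  x1 hi yes 47  x1 hi no 50
x2 lo yes 30  x2 lo no 22  x2 hi yes 41  x2 hi no 39
;

proc catmod data=work.survey;
   weight count;
   model y = a b;
run;                 /* first RUN: executes PROC CATMOD, WEIGHT, and MODEL */

   /* a has two levels, so it contributes one parameter */
   contrast 'Effect of a' a 1;
run;                 /* second RUN: executes only the CONTRAST statement */

quit;                /* processes anything pending and ends the procedure */
```

Because no BY statement is present, the procedure stays active after the first RUN statement and accepts the CONTRAST statement in a second RUN group.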

The purpose of each statement, other than the PROC CATMOD statement, is summarized in the following list:

BY

determines groups in which data are to be processed separately.

CONTRAST

specifies a hypothesis to test.

DIRECT

specifies independent variables that are to be treated quantitatively (like continuous variables) rather than qualitatively (like class or discrete variables). These variables also help to determine the rows of the contingency table and distinguish response functions in one population from those in other populations.

FACTORS

specifies (1) the factors that distinguish response functions from others in the same population and (2) model effects, based on these factors, which help to determine the design matrix.

LOGLIN

specifies log-linear model effects.

MODEL

specifies (1) dependent variables, which determine the columns of the contingency table, (2) independent variables, which distinguish response functions in one population from those in other populations, and (3) model effects, which determine the design matrix and the way in which total variation among the response functions is partitioned.

POPULATION

specifies variables which determine the rows of the contingency table and distinguish response functions in one population from those in other populations.

REPEATED

specifies (1) the repeated measurement factors that distinguish response functions from others in the same population and (2) model effects, based on these factors, which help to determine the design matrix.

RESPONSE

determines the response functions that are to be modeled.

RESTRICT

restricts values of parameters to the values you specify.

WEIGHT

specifies a variable containing frequency counts.

PROC CATMOD Statement

The PROC CATMOD statement invokes the procedure. You can specify the following options.

DATA= SAS-data-set

NAMELEN= n

NOPRINT

ORDER=DATA | FORMATTED | FREQ | INTERNAL

Value of ORDER=    Levels Sorted By

DATA               order of appearance in the input data set

FORMATTED          external formatted value, except for numeric variables with no explicit format, which are sorted by their unformatted (internal) value

FREQ               descending frequency count; levels with the most observations come first in the order

INTERNAL           unformatted value

By default, ORDER=INTERNAL. For ORDER=FORMATTED and ORDER=INTERNAL, the sort order is machine dependent. See the section 'Ordering of Populations and Responses' on page 863 for more information and examples. For more information on sorting order, see the chapter on the SORT procedure in the SAS Procedures Guide and the discussion of BY-group processing in SAS Language Reference: Concepts.
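As a brief sketch of how the ORDER= option changes level ordering, consider the following statements. The data set work.doses and its variables are hypothetical and serve only as an illustration.

```sas
/* Hypothetical cell-count data; names and counts are illustrative only */
data work.doses;
   input dose $ outcome $ count @@;
   datalines;
low yes 12  low no 28  high yes 25  high no 15
;

/* ORDER=DATA keeps levels in order of appearance (low, high; yes, no).  */
/* The default, ORDER=INTERNAL, would sort the character values instead, */
/* giving high before low and no before yes.                             */
proc catmod data=work.doses order=data;
   weight count;
   model outcome = dose;
run;
quit;
```

Since the last response level serves as the reference for the generalized logits, changing the order changes which level is the reference, so it is worth checking the profile tables before interpreting parameter signs.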

BY Statement

You can specify a BY statement with PROC CATMOD to obtain separate analyses of groups determined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables. The variables are one or more variables in the input data set.

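If your input data set is not sorted in ascending order, use one of the following alternatives:

- Sort the data by using the SORT procedure with a similar BY statement.

- Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the CATMOD procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.

- Create an index on the BY variables by using the DATASETS procedure.

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.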

When you specify a BY statement with PROC CATMOD, no further interactive processing is possible. In other words, once the BY statement appears, all statements up to the associated RUN statement are executed for each BY group in the data set. After the RUN statement, no further statements are accepted by the procedure.
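A minimal sketch of BY-group processing follows, assuming a hypothetical data set work.clinic with a grouping variable center. The data are sorted first so that the BY statement in PROC CATMOD finds the groups in ascending order.

```sas
/* Hypothetical cell-count data for two centers */
data work.clinic;
   input center $ treatment $ outcome $ count @@;
   datalines;
north drug yes 18  north drug no 12  north plac yes 9   north plac no 21
east  drug yes 22  east  drug no 8   east  plac yes 14  east  plac no 16
;

/* Sort by the BY variable before invoking PROC CATMOD */
proc sort data=work.clinic out=work.clinic_sorted;
   by center;
run;

/* One analysis per center; no further statements are accepted after RUN */
proc catmod data=work.clinic_sorted;
   by center;
   weight count;
   model outcome = treatment;
run;
```

Any CONTRAST statements wanted for each BY group would need to appear before this RUN statement, since interactive processing is disabled.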

CONTRAST Statement

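CONTRAST 'label' row-description <, ..., row-description> < / options > ;

where a row-description is

< @n > effect values < ... < @n > effect values >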

The CONTRAST statement constructs and tests linear functions of the parameters in the MODEL statement or effects listed in the LOGLIN statement. Each set of effects (separated by commas) specifies one row or set of rows of the matrix $C$ that PROC CATMOD uses to test the hypothesis $C\beta = 0$.

CONTRAST statements must be preceded by the MODEL statement, and by the LOGLIN statement, if one is used. You can specify the following terms in the CONTRAST statement.

' label '

specifies up to 256 characters of identifying information displayed with the test. The ' label ' is required.

effect

is one of the effects specified in the MODEL or LOGLIN statement, INTERCEPT (for the intercept parameter), or ALL_PARMS (for the complete set of parameters).

The ALL_PARMS option is regarded as an effect with the same number of parameters as the number of columns in the design matrix. This is particularly useful when the design matrix is input directly, as in the following example:

model y=(1 0 0 0,
         1 0 1 0,
         1 1 0 0,
         1 1 1 1);
contrast 'Main Effect of B' all_parms 0 1 0 0;
contrast 'Main Effect of C' all_parms 0 0 1 0;
contrast 'B*C Interaction ' all_parms 0 0 0 1;

values

are numbers that form the coefficients of the parameters associated with the given effect. If there are fewer values than parameters for an effect, the remaining coefficients become zero. For example, if you specify two values and the effect actually has five parameters, the final three are set to zero.

@n

points to the parameters in the nth set when the model has a separate set of parameters for each of the response functions. The @n notation is seldom needed. It enables you to test the variation among response functions in the same population. However, it is usually easier to model and test such variation by using the _RESPONSE_ effect in the MODEL statement or by using the ALL_PARMS designation. Usually, contrasts are performed with respect to all of the response functions, and this is what the CONTRAST statement does by default (in this case, do not use the @n notation).

 

For example, if there are three response functions per population, then

contrast 'Level 1 vs. Level 2' A 1 -1 0;

 

results in a three-degree-of-freedom test comparing the first two levels of A simultaneously on the three response functions.

 

If, however, you want to specify a contrast with respect to the parameters in the nth set only, then use a single @n in a row-description. For example, to test that the first parameter of A and the first parameter of B are zero in the third response function, specify

contrast 'A=0, B=0, Function 3' @3 A 1 B 1;

 

To specify a contrast with respect to parameters in two or more different sets of effects, use @n with each effect. For example,

contrast 'Average over Functions' @1 A 1 0 -1
                                  @2 A 1 1 -2;

 

When the model does not have a separate set of parameters for each of the response functions, the @n notation is invalid. This type of model is called AVERAGED. For details, see the description of the AVERAGED option on page 842 and the 'Generation of the Design Matrix' section on page 876.

You can specify the following options in the CONTRAST statement after a slash.

ALPHA= value

ESTIMATE= keyword

EST= keyword

PARM

specifies that the contrast itself be estimated.

EXP

specifies that the exponentiated contrast be estimated.

BOTH

specifies that both the contrast and the exponentiated contrast be estimated.

Specifying Contrasts

PROC CATMOD is parameterized differently than PROC GLM, so you must be careful not to use the same contrasts that you would with PROC GLM. Since PROC CATMOD uses a full-rank parameterization, all estimable parameters are directly estimable without involving other parameters.

For example, suppose a class variable A has four levels. Then there are four parameters ($\alpha_1, \alpha_2, \alpha_3, \alpha_4$), of which PROC CATMOD uses only the first three. The fourth parameter is related to the others by the equation

$$\alpha_4 = -\alpha_1 - \alpha_2 - \alpha_3$$

To test the first versus the fourth level of A, you would test $\alpha_1 = \alpha_4$, which is

$$\alpha_1 = -\alpha_1 - \alpha_2 - \alpha_3$$

or, equivalently,

$$2\alpha_1 + \alpha_2 + \alpha_3 = 0$$

Therefore, you would use the following CONTRAST statement:

contrast '1 vs. 4' A 2 1 1;

To contrast the third level with the average of the first two levels, you would test

$$\frac{\alpha_1 + \alpha_2}{2} = \alpha_3$$

or, equivalently,

$$\alpha_1 + \alpha_2 - 2\alpha_3 = 0$$

Therefore, you would use the following CONTRAST statement:

contrast '1&2 vs. 3' A 1 1 -2;

Other CONTRAST statements are constructed similarly; for example,

contrast '1 vs. 2    ' A 1 -1 0;
contrast '1&2 vs. 4  ' A 3 3 2;
contrast '1&2 vs. 3&4' A 2 2 0;
contrast 'Main Effect' A 1 0 0,
                       A 0 1 0,
                       A 0 0 1;
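The coefficients in two of these statements are less obvious, so it may help to work them out. The derivation below assumes a four-level variable A whose parameters $\alpha_1, \ldots, \alpha_4$ satisfy the sum-to-zero constraint $\alpha_4 = -\alpha_1 - \alpha_2 - \alpha_3$ of the full-rank parameterization, so that only the first three parameters appear in the contrast.

```latex
% '1&2 vs. 4': average of levels 1 and 2 equals level 4
\begin{align*}
\tfrac{1}{2}(\alpha_1 + \alpha_2) &= \alpha_4
   = -\alpha_1 - \alpha_2 - \alpha_3 \\
\alpha_1 + \alpha_2 &= -2\alpha_1 - 2\alpha_2 - 2\alpha_3 \\
3\alpha_1 + 3\alpha_2 + 2\alpha_3 &= 0
   \qquad\Rightarrow\qquad \text{coefficients } (3,\ 3,\ 2)
\end{align*}

% '1&2 vs. 3&4': sum of levels 1 and 2 equals sum of levels 3 and 4
\begin{align*}
\alpha_1 + \alpha_2 &= \alpha_3 + \alpha_4
   = \alpha_3 - \alpha_1 - \alpha_2 - \alpha_3
   = -\alpha_1 - \alpha_2 \\
2\alpha_1 + 2\alpha_2 &= 0
   \qquad\Rightarrow\qquad \text{coefficients } (2,\ 2,\ 0)
\end{align*}
```

In both cases the step that differs from a PROC GLM style contrast is the substitution for $\alpha_4$, which is why the coefficients do not sum to zero.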

The actual form of the C matrix depends on the effects in the model. The following examples assume a single response function for each population.

proc catmod;
   model y=a;
   contrast '1 vs. 4' A 2 1 1;
run;

The C matrix for the preceding statements is

$$C = \begin{bmatrix} 0 & 2 & 1 & 1 \end{bmatrix}$$

since the first parameter corresponds to the intercept.

But if there is a variable B with three levels and you use the following statements,

proc catmod;
   model y=b a;
   contrast '1 vs. 4' A 2 1 1;
run;

then the CONTRAST statement induces the C matrix

$$C = \begin{bmatrix} 0 & 0 & 0 & 2 & 1 & 1 \end{bmatrix}$$

since the first parameter corresponds to the intercept and the next two correspond to the B main effect.

You can also use the CONTRAST statement to test the joint effect of two or more effects in the MODEL statement. For example, the joint effect of A and B in the previous model has five degrees of freedom and is obtained by specifying

contrast 'Joint Effect of A&B' A 1 0 0, A 0 1 0, A 0 0 1, B 1 0, B 0 1;

The ordering of variable levels is determined by the ORDER= option in the PROC CATMOD statement. Whenever you specify a contrast that depends on the order of the variable levels, you should verify the order from the 'Population Profiles' table, the 'Response Profiles' table, or the 'One-Way Frequencies' table.

DIRECT Statement

The DIRECT statement lists numeric independent variables to be treated in a quantitative, rather than qualitative, way. The DIRECT statement is useful for logistic regression, which is described in the 'Logistic Regression' section on page 869. For limitations of models involving continuous variables, see the 'Continuous Variables' section on page 870.

If a DIRECT variable is formatted, then the unformatted (internal) values are used in the analysis and the formatted values are displayed. CAUTION: If you use a format to group the internal values into one formatted value, then the first internal value is used in the analysis.

If specified, the DIRECT statement must precede the MODEL statement. For example,

proc catmod;
   direct X;
   model Y=X;
run;

Suppose X has five levels. Then the main effect X induces only one column in the design matrix, rather than four. The values inserted into the design matrix are the actual values of X .

You can interactively change the variables declared as DIRECT variables by using the statement without listing any variables. The following statements are valid:

proc catmod;
   direct X;
   model Y=X;
   weight wt;
run;
   direct;
   model Y=X;
run;

The first MODEL statement uses the actual values of X , and the second MODEL statement uses the four variables created when PROC CATMOD generates the design matrix. Note that the preceding statements can be run without a WEIGHT statement if the input data are raw data rather than cell counts.

For more details, see the discussions of main and direct effects in the section 'Generation of the Design Matrix' on page 876.

FACTORS Statement

FACTORS factor-description <, ..., factor-description> < / options > ;

where a factor-description is

factor-name < $ > < levels >

and factor-descriptions are separated from each other by a comma. The $ is required for character-valued factors. The value of levels provides the number of levels of the factor identified by a given factor-name. For only one factor, levels is optional; for two or more factors, it is required.

The FACTORS statement identifies factors that distinguish response functions from others in the same population. It also specifies how those factors are incorporated into the model. You can use the FACTORS statement whenever there is more than one response function per population and the keyword _RESPONSE_ is specified in the MODEL statement. You can specify the name, type, and number of levels of each factor and the identification of each level.

The FACTORS statement is most useful when the response functions and their covariance matrix are read directly from the input data set. In this case, PROC CATMOD reads the response functions as though they are from one population (this poses no problem in the multiple-population case because the appropriately constructed covariance matrix is also read directly). Thus, you can use the FACTORS statement to partition the variation among the response functions into appropriate sources, even when the functions actually represent separate populations.

The format of the FACTORS statement is identical to that of the REPEATED statement. In fact, repeated measurement factors are simply special cases of factors in which some of the response functions correspond to multiple dependent variables that are measurements on the same experimental (or sampling) units.

You cannot specify the FACTORS statement for an analysis that also contains the REPEATED or LOGLIN statement since all of them specify the same information: how to partition the variation among the response functions within a population.

In the FACTORS statement,

factor-name

names a factor that corresponds to two or more response functions. This name must be a valid SAS variable name, and it should not be the same as the name of a variable that already exists in the data set being analyzed.

$

indicates that the factor is character-valued. If the $ is omitted, then PROC CATMOD assumes that the factor is numeric. The type of the factor is relevant only when you use the PROFILE= option or when the _RESPONSE_= option (described later in this section) specifies nested-by-value effects.

levels

specifies the number of levels of the corresponding factor. If there is only one such factor, and the number is omitted, then PROC CATMOD assumes that the number of levels is equal to the number of response functions per population ( q ). Unless you specify the PROFILE= option, the number q must either be equal to or be a multiple of the product of the number of levels of all the factors.

You can specify the following options in the FACTORS statement after a slash.

PROFILE=( matrix )

_RESPONSE_= effects

TITLE= ' title '

LOGLIN Statement

The LOGLIN statement is used to define log-linear model effects. It can be used whenever the default response functions (generalized logits) are used.

In the LOGLIN statement, effects are design effects that contain dependent variables in the MODEL statement, including interaction, nested, and nested-by-value effects. You can use the bar (|) and at (@) operators as well. The following lists of effects are equivalent:

a b c a*b a*c b*c

and

a|b|c @2

When you use the LOGLIN statement, the keyword _RESPONSE_ should be specified in the MODEL statement. For further information on log-linear model analysis, see the 'Log-Linear Model Analysis' section on page 870.

You cannot specify the LOGLIN statement for an analysis that also contains the REPEATED or FACTORS statement since all of them specify the same information: how to partition the variation among the response functions within a population. You can specify the following option in the LOGLIN statement after a slash.

TITLE= ' title '
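As a minimal sketch of how the LOGLIN statement fits together with the MODEL statement, the following statements fit a log-linear model with all main effects and two-way interactions. The data set work.table3, its variables, and the cell counts are hypothetical.

```sas
/* Hypothetical 2 x 2 x 2 table of cell counts */
data work.table3;
   input a $ b $ c $ count @@;
   datalines;
a1 b1 c1 20  a1 b1 c2 15  a1 b2 c1 12  a1 b2 c2 18
a2 b1 c1 14  a2 b1 c2 22  a2 b2 c1 25  a2 b2 c2 11
;

proc catmod data=work.table3;
   weight count;
   /* all three variables are dependent; _RESPONSE_ carries the effects */
   model a*b*c = _response_;
   /* bar and @ operators: main effects plus all two-way interactions */
   loglin a|b|c @2;
run;
quit;
```

Writing the LOGLIN statement as a b c a*b a*c b*c instead would request the same set of effects.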

MODEL Statement

PROC CATMOD requires a MODEL statement. You can specify the following in a MODEL statement:

response-effect

can be either a single variable, a crossed effect with two or more variables joined by asterisks, or _F_. The _F_ specification indicates that the response functions and their estimated covariance matrix are to be read directly into the procedure (see the 'Inputting Response Functions and Covariances Directly' section on page 862 for details). The response-effect indicates the dependent variables that determine the response categories (the columns of the underlying contingency table).

design-effects

specify potential sources of variation (such as main effects and interactions) in the model. Thus, these effects determine the number of model parameters, as well as the interpretation of such parameters. In addition, if there is no POPULATION statement, PROC CATMOD uses these variables to determine the populations (the rows of the underlying contingency table). When fitting the model, PROC CATMOD adjusts the independent effects in the model for all other independent effects in the model.

Design-effects can be any of those described in the section 'Specification of Effects' on page 864, or they can be defined by specifying the actual design matrix, enclosed in parentheses (see the 'Specifying the Design Matrix Directly' section on page 847). In addition, you can use the keyword _RESPONSE_ alone or as part of an effect. Effects cannot be nested within _RESPONSE_, so effects of the form A (_RESPONSE_) are invalid.

For more information, see the 'Log-Linear Model Analysis' section on page 870 and the 'Repeated Measures Analysis' section on page 873.

Some examples of MODEL statements are

model r=a b;                  /* main effects only */
model r=a b a*b;              /* main effects with interaction */
model r=a b(a);               /* nested effect */
model r=a|b;                  /* complete factorial */
model r=a b(a=1) b(a=2);      /* nested-by-value effects */
model r*s=_response_;         /* log-linear model */
model r*s=a _response_(a);    /* nested repeated measurement factor */
model _f_=_response_;         /* direct input of the response functions */

The relationship between these specifications and the structure of the design matrix X is described in the 'Generation of the Design Matrix' section on page 876.

The following table summarizes the options available in the MODEL statement.

Task                                                          Options

Specify details of computation
  Generates maximum likelihood estimates                      ML=
  Generates weighted least-squares estimates                  GLS, WLS
  Omits intercept term from the model                         NOINT
  Specifies parameterization of classification variables      PARAM=
  Adds a number to each cell frequency                        ADDCELL=
  Averages main effects across response functions             AVERAGED
  Specifies the convergence criterion for maximum likelihood  EPSILON=
  Specifies the number of iterations for maximum likelihood   MAXITER=
  Specifies how missing cells are treated                     MISSING=
  Specifies how zero cells are treated                        ZERO=

Request additional computation and tables
  Significance level of confidence intervals                  ALPHA=
  Wald confidence intervals of estimates                      CLPARM
  Estimated correlation matrix of estimates                   CORRB
  Covariance matrix of response functions                     COV
  Estimated covariance matrix of estimates                    COVB
  Design and _RESPONSE_ matrix                                DESIGN
  Two-way frequency tables                                    FREQ
  Iterations for maximum likelihood                           ITPRINT
  One-way frequency tables                                    ONEWAY
  Predicted values                                            PRED=, PREDICT
  Probability estimates                                       PROB
  Population profiles                                         PROFILE
  Crossproducts matrix                                        XPX
  Title                                                       TITLE=

Suppress output
  Design matrix                                               NODESIGN
  Parameter estimates                                         NOPARM
  Variable levels                                             NOPREDVAR
  Population and response profiles                            NOPROFILE
  _RESPONSE_ matrix                                           NORESPONSE

The following list describes these options in alphabetical order.

ADDCELL= number

ALPHA= number

AVERAGED

CLPARM

CORRB

COV

COVB

DESIGN

EPSILON= number

FREQ

ITPRINT

MAXITER= number

ML < = NR | IPF < ( ipf-options ) > >

CONV= keyword

CONVCRIT= keyword

CELL

termination requires the maximum absolute difference between consecutive cell estimates to be less than 0.001 (or the value of the EPSILON= option, if specified).

LOGL

termination requires the relative difference between consecutive estimates of the log-likelihood to be less than 1E-8 (or the value of the EPSILON= option, if specified). This is the default.

MARGIN

termination requires the maximum absolute difference between consecutive margin estimates to be less than 0.001 (or the value of the EPSILON= option, if specified).

DF= keyword

PARM

MISSING= keyword

MISS = keyword

NODESIGN

NOINT

NOITER

NOPARM

NOPREDVAR

NOPRINT

NOPROFILE

NORESPONSE

ONEWAY

PARAM=EFFECT | REFERENCE

PREDICT

PRED=FREQ | PROB

PROB

PROFILE

TITLE=' title '

WLS

GLS

XPX

ZERO= keyword

ZEROS= keyword

ZEROES= keyword

Specifying the Design Matrix Directly

If you specify the design matrix directly, adjacent rows of the matrix must be separated by a comma, and the matrix must have $q \times s$ rows, where s is the number of populations and q is the number of response functions per population. The first q rows correspond to the response functions for the first population, the second set of q rows corresponds to the functions for the second population, and so forth. The following is an example using direct specification of the design matrix.

proc catmod;
   model R=(1 0,
            1 1,
            1 2,
            1 3);
run;

These statements are appropriate for the case of one population and for R with five levels (generating four response functions), so that $4 \times 1 = 4$. These statements are also appropriate for a situation with two populations and two response functions per population, giving $2 \times 2 = 4$ rows of the design matrix. (To induce more than one population, the POPULATION statement is needed.)

When you input the design matrix directly, you also have the option of specifying that any subsets of the parameters be tested for equality to zero. Indicate each subset by specifying the appropriate column numbers of the design matrix, followed by an equal sign and a label (24 characters or less, in quotes) that describes the subset. Adjacent subsets are separated by a comma, and the entire specification is enclosed in parentheses and placed after the design matrix. For example,

proc catmod;
   population Group Time;
   model R=(1  1  0  0,
            1  1  0  1,
            1  1  0  2,
            1  0  1  0,
            1  0  1  1,
            1  0  1  2,
            1 -1 -1  0,
            1 -1 -1  1,
            1 -1 -1  2)
           (1  ='Intercept',
            2 3='Group main effect',
            4  ='Linear effect of Time');
run;

The preceding statements are appropriate when Group and Time each have three levels, and R is dichotomous. The POPULATION statement induces nine populations, and q = 1 (since R is dichotomous), so $q \times s = 1 \times 9 = 9$.

If you input the design matrix directly but do not specify any subsets of the parameters to be tested, then PROC CATMOD tests the effect of MODEL|MEAN, which represents the significance of the model beyond what is explained by an overall mean. For the previous example, the MODEL|MEAN effect is the same as that obtained by specifying

(2 3 4='model|mean');

at the end of the MODEL statement.

POPULATION Statement

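The POPULATION statement specifies that populations are to be based only on cross-classifications of the specified variables. If you do not specify the POPULATION statement, then populations are based only on cross-classifications of the independent variables in the MODEL statement.

The POPULATION statement has two major uses:

- When you enter the design matrix directly, there are no independent variables in the MODEL statement; therefore, the POPULATION statement is the only way of inducing more than one population.

- When you fit a reduced model, the POPULATION statement may be necessary if you want to maintain the same populations that are used in the full model.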

To illustrate the first use, suppose that you specify the following statements:

data one;
   input A $ B $ wt @@;
   datalines;
yes yes 23   yes no 31   no yes 47   no no 50
;
proc catmod;
   weight wt;
   population B;
   model A=(1 0,
            1 1);
run;

Since the dependent variable A has two levels, there is one response function per population. Since the variable B has two levels, there are two populations. Thus, the MODEL statement is valid since the number of rows in the design matrix (2) is the same as the total number of response functions. If the POPULATION statement is omitted, there would be only one population and one response function, and the MODEL statement would be invalid.

To illustrate the second use, suppose that you specify

data two;
   input A $ B $ Y wt @@;
   datalines;
yes yes 1 23   yes yes 2 63   yes no 1 31   yes no 2 70
no yes 1 47    no yes 2 80    no no 1 50    no no 2 84
;
proc catmod;
   weight wt;
   model Y=A B A*B / wls;
run;

These statements form four populations and produce the following design matrix and analysis of variance table.

 

Source       DF   Chi-Square   Pr > ChiSq
Intercept     1        48.10       <.0001
A             1         3.47       0.0625
B             1         0.25       0.6186
A*B           1         0.19       0.6638
Residual

 

Since the B and A*B effects are nonsignificant (p > .10), you may want to fit the reduced model that contains only the A effect. If your new statements are

proc catmod;
   weight wt;
   model Y=A / wls;
run;

then only two populations are formed, and the design matrix and the analysis of variance table are as follows.

Source       DF   Chi-Square   Pr > ChiSq
Intercept     1        47.94       <.0001
A             1         3.33       0.0678
Residual

 

However, if the new statements are

proc catmod;
   weight wt;
   population A B;
   model Y=A / wls;
run;

then four populations are formed, and the design matrix and the analysis of variance table are as follows.

 

Source       DF   Chi-Square   Pr > ChiSq
Intercept     1        47.76       <.0001
A             1         3.30       0.0694
Residual      2         0.35       0.8374

The advantage of the latter analysis is that it retains four populations for the reduced model, thereby creating a built-in goodness-of-fit test: the residual chi-square. Such a test is important because the cumulative (or joint) effect of deleting two or more effects from the model may be significant, even if the individual effects are not.

The resulting differences between the two analyses are due to the fact that the latter analysis uses pure weighted least-squares estimates with respect to the four populations that are actually sampled. The former analysis pools populations and therefore uses parameter estimates that can be regarded as weighted least-squares estimates of maximum likelihood predicted cell frequencies. In any case, the estimation methods are asymptotically equivalent; therefore, the results are very similar. If you specify the ML option (instead of the WLS option) in the MODEL statements, then the parameter estimates are identical for the two analyses.

CAUTION: If your model has different covariate profiles within any population, then the first profile is used in the analysis.

REPEATED Statement

REPEATED factor-description <, ..., factor-description> < / options > ;

where a factor-description is

factor-name < $ > < levels >

and factor-descriptions are separated from each other by a comma. The $ is required for character-valued factors. The value of levels provides the number of levels of the repeated measurement factor identified by a given factor-name. For only one repeated measurement factor, levels is optional; for two or more repeated measurement factors, it is required.

The REPEATED statement incorporates repeated measurement factors into the model. You can use this statement whenever there is more than one dependent variable and the keyword _RESPONSE_ is specified in the MODEL statement. If the dependent variables correspond to one or more repeated measurement factors, you can use the REPEATED statement to define _RESPONSE_ in terms of those factors. You can specify the name, type, and number of levels of each factor, as well as the identification of each level.

You cannot specify the REPEATED statement for an analysis that also contains the FACTORS or LOGLIN statement since all of them specify the same information: how to partition the variation among the response functions within a population.

In the REPEATED statement,

factor-name

names a repeated measurement factor that corresponds to two or more response functions. This name must be a valid SAS variable name, and it should not be the same as the name of a variable that already exists in the data set being analyzed.

$

indicates that the factor is character-valued. If the $ is omitted, then PROC CATMOD assumes that the factor is numeric. The type of the factor is relevant only when you use the PROFILE= option or when the _RESPONSE_= option specifies nested-by-value effects.

levels

specifies the number of levels of the corresponding repeated measurement factor. If there is only one such factor and the number is omitted, then PROC CATMOD assumes that the number of levels is equal to the number of response functions per population ( q ). Unless you specify the PROFILE= option, the number q must either be equal to or be a multiple of the product of the number of levels of all the factors.

You can specify the following options in the REPEATED statement after a slash.

PROFILE=( matrix )

_RESPONSE_= effects

TITLE= ' title '
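The following is a minimal sketch of a repeated measures specification, assuming a hypothetical data set work.visits in which the same dichotomous outcome is recorded at three visits (t1, t2, t3). The factor name time is also an assumption; it must not match an existing variable name.

```sas
/* Hypothetical counts for each pattern of outcomes over three visits */
data work.visits;
   input t1 $ t2 $ t3 $ count @@;
   datalines;
y y y 16  y y n 13  y n y 9   y n n 3
n y y 14  n y n 4   n n y 15  n n n 6
;

proc catmod data=work.visits;
   weight count;
   /* one marginal probability per visit: q = 3 response functions */
   response marginals;
   model t1*t2*t3 = _response_;
   /* one factor with 3 levels, matching q; _RESPONSE_ is defined as time */
   repeated time 3 / _response_=time;
run;
quit;
```

Here the number of response functions per population (3) equals the number of levels of the single repeated measurement factor, which satisfies the requirement stated for levels.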

RESPONSE Statement

The RESPONSE statement specifies functions of the response probabilities. The procedure models these response functions as linear combinations of the parameters.

By default, PROC CATMOD uses the standard response functions (generalized logits, which are explained in detail in the 'Understanding the Standard Response Functions' section on page 859). With these standard response functions, the default estimation method is maximum likelihood, but you can use the WLS option in the MODEL statement to request weighted least-squares estimation. With other response functions (specified in the RESPONSE statement), the default (and only) estimation method is weighted least squares.

You can specify more than one RESPONSE statement, in which case each RESPONSE statement produces a separate analysis. If the computed response functions for any population are linearly dependent (yielding a singular covariance matrix), then PROC CATMOD displays an error message and stops processing. See the 'Cautions' section on page 887 for methods of dealing with this.
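As an illustrative sketch, the following statements request two separate analyses of the same ordinal response by placing two RESPONSE statements in one RUN group. The data set work.ratings, its variables, and the counts are hypothetical.

```sas
/* Hypothetical counts: a three-level numeric score for two treatments */
data work.ratings;
   input treatment $ score count @@;
   datalines;
a 1 10  a 2 14  a 3 6   b 1 5  b 2 12  b 3 13
;

proc catmod data=work.ratings;
   weight count;
   response means;      /* analysis 1: mean score (one function per population) */
   response clogits;    /* analysis 2: cumulative logits (two functions)        */
   model score = treatment;
run;
quit;
```

Both response functions differ from the standard generalized logits, so each analysis would be fit by weighted least squares. The MEANS specification works here only because score is numeric.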

The function specification can be any of the items in the following list. For an example of response functions generated and formulas for q (the number of response functions), see the 'More on Response Functions' section on page 854.

ALOGIT ALOGITS

specifies response functions as adjacent-category logits of the marginal probabilities for each of the dependent variables. For each dependent variable, the response functions are a set of linearly independent adjacent-category logits, obtained by taking the logarithms of the ratios of two probabilities. The denominator of the kth ratio is the marginal probability corresponding to the kth level of the variable, and the numerator is the marginal probability corresponding to the (k + 1)th level. If a dependent variable has two levels, then the adjacent-category logit is the negative of the generalized logit.

CLOGIT CLOGITS

specifies that the response functions are cumulative logits of the marginal probabilities for each of the dependent variables. For each dependent variable, the response functions are a set of linearly independent cumulative logits, obtained by taking the logarithms of the ratios of two probabilities. The denominator of the kth ratio is the cumulative probability, $c_k$, corresponding to the kth level of the variable, and the numerator is $1 - c_k$ (Agresti 1984, 113-114). If a dependent variable has two levels, then PROC CATMOD computes its cumulative logit as the negative of its generalized logit. You should use cumulative logits only when the dependent variables are ordinally scaled.

JOINT

specifies that the response functions are the joint response probabilities. A linearly independent set is created by deleting the last response probability. For the case of one dependent variable, the JOINT and MARGINALS specifications are equivalent.

LOGIT LOGITS

specifies that the response functions are generalized logits of the marginal probabilities for each of the dependent variables. For each dependent variable, the response functions are a set of linearly independent generalized logits, obtained by taking the logarithms of the ratios of two probabilities. The denominator of each ratio is the marginal probability corresponding to the last observed level of the variable, and the numerators are the marginal probabilities corresponding to each of the other levels. If there is one dependent variable, then specifying LOGIT is equivalent to using the standard response functions.

MARGINAL MARGINALS

specifies that the response functions are marginal probabilities for each of the dependent variables in the MODEL statement. For each dependent variable, the response functions are a set of linearly independent marginals, obtained by deleting the marginal probability corresponding to the last level.

MEAN MEANS

specifies that the response functions are the means of the dependent variables in the MODEL statement. This specification requires that all of the dependent variables be numeric.

READ variables

specifies that the response functions and their covariance matrix are to be read directly from the input data set with one response function for each variable named. See the section 'Inputting Response Functions and Covariances Directly' on page 862 for more information.

transformation

specifies response functions that can be expressed by using successive applications of the four operations: LOG, EXP, * matrix literal, or + matrix literal. The operations are described in detail in the 'Using a Transformation to Specify Response Functions' section on page 856.

You can specify the following options in the RESPONSE statement after a slash.

OUT= SAS-data-set

OUTEST= SAS-data-set

TITLE= 'title'

More on Response Functions

Suppose the dependent variable A has 3 levels and is the only response-effect in the MODEL statement. The following table shows the proportions upon which the response functions are defined.

Value of A:      1        2        3
proportions:     $p_1$    $p_2$    $p_3$

Note that $\sum_j p_j = 1$. The following table shows the response functions generated for each population.

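Function Specification    Value of q    Response Function
none [*]                  2             $\log(p_1/p_3)$, $\log(p_2/p_3)$
ALOGITS                   2             $\log(p_2/p_1)$, $\log(p_3/p_2)$
CLOGITS                   2             $\log((1 - p_1)/p_1)$, $\log((1 - (p_1 + p_2))/(p_1 + p_2))$
JOINT                     2             $p_1$, $p_2$
LOGITS                    2             $\log(p_1/p_3)$, $\log(p_2/p_3)$
MARGINAL                  2             $p_1$, $p_2$
MEAN                      1             $1p_1 + 2p_2 + 3p_3$

[*] Without a function specification, the default response functions are generalized logits.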
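As a numerical check on the definitions above, the following Python sketch computes each kind of response function for a single dependent variable with three levels. It is an illustration only, not PROC CATMOD code: the function names and the proportions 0.2, 0.3, and 0.5 are hypothetical, and the mean uses the scores 1, 2, and 3 from the example.

```python
import math

def generalized_logits(p):
    """log(p_k / p_r) for every level k except the last level r."""
    return [math.log(pk / p[-1]) for pk in p[:-1]]

def adjacent_logits(p):
    """log(p_(k+1) / p_k) for k = 1, ..., r - 1."""
    return [math.log(p[k + 1] / p[k]) for k in range(len(p) - 1)]

def cumulative_logits(p):
    """log((1 - c_k) / c_k), where c_k = p_1 + ... + p_k."""
    logits, c = [], 0.0
    for pk in p[:-1]:
        c += pk
        logits.append(math.log((1.0 - c) / c))
    return logits

def mean_score(p):
    """Mean with scores 1, 2, ..., r assigned to the r levels."""
    return sum(score * pk for score, pk in enumerate(p, start=1))

# Hypothetical sample proportions p_1, p_2, p_3 for one population.
p = [0.2, 0.3, 0.5]

print("LOGITS  :", generalized_logits(p))
print("ALOGITS :", adjacent_logits(p))
print("CLOGITS :", cumulative_logits(p))
print("JOINT   :", p[:-1])        # last probability deleted
print("MEAN    :", mean_score(p))
```

With only two levels, `adjacent_logits` and `cumulative_logits` both return the negative of `generalized_logits`, which matches the statements made for ALOGIT and CLOGIT.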

Now, suppose the dependent variables A and B each have 3 levels (valued 1, 2, and 3 each) and the response-effect in the MODEL statement is A * B . The following table shows the proportions upon which the response functions are defined.

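Value of A:      1        1        1        2        2        2        3        3        3
Value of B:      1        2        3        1        2        3        1        2        3
proportions:     $p_1$    $p_2$    $p_3$    $p_4$    $p_5$    $p_6$    $p_7$    $p_8$    $p_9$

The marginal totals for the preceding table are defined as follows,

$$
\begin{aligned}
p_{1\cdot} &= p_1 + p_2 + p_3, & p_{\cdot 1} &= p_1 + p_4 + p_7, \\
p_{2\cdot} &= p_4 + p_5 + p_6, & p_{\cdot 2} &= p_2 + p_5 + p_8, \\
p_{3\cdot} &= p_7 + p_8 + p_9, & p_{\cdot 3} &= p_3 + p_6 + p_9,
\end{aligned}
$$

where $\sum_j p_j = 1$. The following table shows the response functions generated for each population.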

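Function Specification    Value of q    Response Function
none [*]                  8             $\log(p_1/p_9)$, $\log(p_2/p_9)$, $\log(p_3/p_9)$, ..., $\log(p_8/p_9)$
ALOGITS                   4             $\log(p_{2\cdot}/p_{1\cdot})$, $\log(p_{3\cdot}/p_{2\cdot})$, $\log(p_{\cdot 2}/p_{\cdot 1})$, $\log(p_{\cdot 3}/p_{\cdot 2})$
CLOGITS                   4             $\log((1 - p_{1\cdot})/p_{1\cdot})$, $\log((1 - (p_{1\cdot} + p_{2\cdot}))/(p_{1\cdot} + p_{2\cdot}))$, $\log((1 - p_{\cdot 1})/p_{\cdot 1})$, $\log((1 - (p_{\cdot 1} + p_{\cdot 2}))/(p_{\cdot 1} + p_{\cdot 2}))$
JOINT                     8             $p_1$, $p_2$, $p_3$, $p_4$, $p_5$, $p_6$, $p_7$, $p_8$
LOGITS                    4             $\log(p_{1\cdot}/p_{3\cdot})$, $\log(p_{2\cdot}/p_{3\cdot})$, $\log(p_{\cdot 1}/p_{\cdot 3})$, $\log(p_{\cdot 2}/p_{\cdot 3})$
MARGINAL                  4             $p_{1\cdot}$, $p_{2\cdot}$, $p_{\cdot 1}$, $p_{\cdot 2}$
MEAN                      2             $1p_{1\cdot} + 2p_{2\cdot} + 3p_{3\cdot}$, $1p_{\cdot 1} + 2p_{\cdot 2} + 3p_{\cdot 3}$

[*] Without a function specification, the default response functions are generalized logits.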

The READ and transformation function specifications are not shown in the preceding table. For these two situations, there is not a general response function; the response functions generated depend on what you specify.

Another important aspect of the function specification is the number of response functions generated per population, q. Let $m_i$ represent the number of levels for the ith dependent variable in the MODEL statement, and let d represent the number of dependent variables in the MODEL statement. Then, if the function specification is ALOGITS, CLOGITS, LOGITS, or MARGINALS, the number of response functions is

$$q = \sum_{i=1}^{d} (m_i - 1)$$

If the function specification is JOINT or the default (generalized logits), the number of response functions per population is

$$q = r - 1$$

where r is the number of response profiles. If every possible cross-classification of the dependent variables is observed in the samples, then

$$r = \prod_{i=1}^{d} m_i$$

Otherwise, r is the number of cross-classifications actually observed.

If the function specification is MEANS, the number of response functions per population is q = d.
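The counting rules above can be collected into one small helper. This is a Python sketch for checking the q values in the preceding tables, not part of PROC CATMOD. The function name and its arguments are hypothetical, and the READ and transformation specifications are deliberately not handled because their q depends on what you specify.

```python
import math

# Specifications whose response functions are built per dependent variable.
PER_VARIABLE = {"ALOGIT", "ALOGITS", "CLOGIT", "CLOGITS",
                "LOGIT", "LOGITS", "MARGINAL", "MARGINALS"}

def num_response_functions(spec, levels, observed_profiles=None):
    """Return q, the number of response functions per population.

    spec              -- function specification, or None for the default
    levels            -- list of m_i, the number of levels of each dependent variable
    observed_profiles -- r, if some cross-classifications are not observed
    """
    if spec is not None:
        spec = spec.upper()
    if spec in PER_VARIABLE:
        return sum(m - 1 for m in levels)
    if spec in (None, "JOINT"):
        # r defaults to the full cross-classification of the dependent variables.
        r = math.prod(levels) if observed_profiles is None else observed_profiles
        return r - 1
    if spec in ("MEAN", "MEANS"):
        return len(levels)
    raise ValueError("q depends on the user's specification for " + repr(spec))

print(num_response_functions("LOGITS", [3]))       # one variable, 3 levels
print(num_response_functions("CLOGITS", [3, 3]))   # A and B, 3 levels each
print(num_response_functions(None, [3, 3]))        # default generalized logits
print(num_response_functions("MEANS", [3, 3]))
```

Passing `observed_profiles` covers the case in which some cross-classifications never occur in the sample, so that r is smaller than the product of the $m_i$.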

Response Statement Examples

Some example response statements are shown in the following table.

Example                   Result
response marginals;       marginals for each dependent variable
response means;           the mean of each dependent variable
response logits;          generalized logits of the marginal probabilities
response clogits;         cumulative logits of the marginal probabilities
response alogits;         adjacent-category logits of the marginal probabilities
response joint;           the joint probabilities
response 1 -1 log;        the logit
response;                 generalized logits
response 1 2 3;           the mean score, with scores of 1, 2, and 3 corresponding to the three response levels
response read b1-b4;      four response functions and their covariance matrix, read directly from the input data set

Using a Transformation to Specify Response Functions

If you specify a transformation, it is applied to the vector that contains the sample proportions in each population. The transformation can be any combination of the following four operations.

Operation              Specification
linear combination     * matrix literal  (or just: matrix literal)
logarithm              LOG
exponential            EXP
adding constant        + matrix literal

If more than one operation is specified, then PROC CATMOD applies the operations consecutively from right to left.

A matrix literal is a matrix of numbers with each row of the matrix separated from the next by a comma. If you specify a linear combination, in most cases the * is not needed. The following statement defines the response function $p_1 + 1$. Here the * is needed to separate the two matrix literals '1' and '1 0'.

response + 1 * 1 0;

The LOG of a vector transforms each element of the vector into its natural logarithm; the EXP of a vector transforms each element into its exponential function (antilogarithm).

In order to specify a linear response function for data that have r = 3 response categories, you could specify either of the following RESPONSE statements:

response * 1 0 0 , 0 1 0;
response 1 0 0 , 0 1 0;

The matrix literal in the preceding statements specifies a 2 × 3 matrix, which is applied to each population as follows:

$$
\begin{bmatrix} F_1 \\ F_2 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix}
$$

where $p_1$, $p_2$, and $p_3$ are sample proportions for the three response categories in a population, and $F_1$ and $F_2$ are the two response functions computed for that population. This response function, therefore, sets $F_1 = p_1$ and $F_2 = p_2$ in each population.

As another example of the linear response function, suppose you have two dependent variables corresponding to two observers who evaluate the same subjects. If the observers grade on the same three-point scale and if all nine possible responses are observed, then the following RESPONSE statement would compute the probability that the observers agree on their assessments:

response 1 0 0 0 1 0 0 0 1;

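This response function is then computed as

$$
F = p_{11} + p_{22} + p_{33}
=
\begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{21} & p_{22} & p_{23} & p_{31} & p_{32} & p_{33} \end{bmatrix}^{\mathsf{T}}
$$

where $p_{ij}$ denotes the probability that a subject gets a grade of i from the first observer and j from the second observer.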
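The two linear response functions described above can be reproduced with a few lines of Python. This is a sketch under stated assumptions rather than a description of how PROC CATMOD parses the statement: the helper names are hypothetical, the proportions are made up, and the parser handles only a single matrix literal (numbers separated by blanks, rows separated by commas).

```python
def parse_matrix_literal(text):
    """Split a matrix literal into rows at commas and into numbers at blanks."""
    return [[float(tok) for tok in row.split()] for row in text.split(",")]

def apply_linear(matrix, p):
    """Multiply the matrix by the column vector of sample proportions."""
    return [sum(a * x for a, x in zip(row, p)) for row in matrix]

# Three response categories: the 2 x 3 matrix picks out p_1 and p_2.
p3 = [0.2, 0.3, 0.5]
F = apply_linear(parse_matrix_literal("1 0 0 , 0 1 0"), p3)
print(F)

# Two observers, three grades each: nine joint proportions ordered
# p_11, p_12, p_13, p_21, p_22, p_23, p_31, p_32, p_33 (hypothetical values).
p9 = [0.20, 0.05, 0.05,
      0.05, 0.25, 0.05,
      0.05, 0.05, 0.25]
agreement = apply_linear(parse_matrix_literal("1 0 0 0 1 0 0 0 1"), p9)
print(agreement)   # a single response function: p_11 + p_22 + p_33
```

The ordering of the nine proportions matters: the 1s in the matrix literal line up with the diagonal cells only when the second observer's grade varies fastest.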

If the function is a compound function, requiring more than one operation to specify it, then the operations should be listed so that the first operation to be applied is on the right and the last operation to be applied is on the left. For example, if there are two response levels, the response function

response 1 -1 log;

is equivalent to the matrix expression:

$$
F = \begin{bmatrix} 1 & -1 \end{bmatrix}
\begin{bmatrix} \log(p_1) \\ \log(p_2) \end{bmatrix}
= \log(p_1) - \log(p_2)
= \log\left(\frac{p_1}{1 - p_1}\right)
$$

which is the logit response function since $p_2 = 1 - p_1$ when there are only two response levels.

Another example of a compound response function is

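response exp 1 -1 * 1 0 0 1, 0 1 1 0 log;

which is equivalent to the matrix expression

F = EXP ( A * B * LOG ( P ))

where P is the vector of sample proportions for some population,

$$
A = \begin{bmatrix} 1 & -1 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}
$$

If the four responses are based on two dependent variables, each with two levels, then the function can also be written as

$$
F = \frac{p_{11}\, p_{22}}{p_{12}\, p_{21}}
$$

which is the odds (crossproduct) ratio for a 2 × 2 table.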
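The right-to-left order of evaluation can be traced step by step. The Python sketch below evaluates F = EXP(A * B * LOG(P)) for one population, assuming A = (1 -1) and B as the 2 × 4 matrix with rows (1 0 0 1) and (0 1 1 0), which are the values that produce the odds ratio. The helper names and the four proportions are hypothetical.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def compound_response(p):
    """Evaluate EXP(A * B * LOG(P)), applying the operations right to left."""
    A = [[1, -1]]
    B = [[1, 0, 0, 1],
         [0, 1, 1, 0]]
    step = [math.log(x) for x in p]     # LOG: rightmost operation, applied first
    step = matvec(B, step)              # B: log(p_11 p_22), log(p_12 p_21)
    step = matvec(A, step)              # A: difference of the two logs
    return [math.exp(x) for x in step]  # EXP: leftmost operation, applied last

# Hypothetical proportions ordered p_11, p_12, p_21, p_22.
p = [0.4, 0.1, 0.2, 0.3]
F = compound_response(p)
print(F)   # one response function, the odds ratio (p_11 p_22) / (p_12 p_21)
```

Working on the log scale turns the products and the ratio into sums and a difference, which is why the whole function can be written with only LOG, matrix multiplication, and EXP.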

Understanding the Standard Response Functions

If no RESPONSE statement is specified, PROC CATMOD computes the standard response functions, which contrast the log of each response probability with the log of the probability for the last response category. If there are r response categories, then there are r - 1 standard response functions. For example, if there are four response categories, using no RESPONSE statement is equivalent to specifying

response 1 0 0 -1, 0 1 0 -1, 0 0 1 -1 log;

This results in three response functions:

$$
F = \begin{bmatrix} F_1 \\ F_2 \\ F_3 \end{bmatrix}
= \begin{bmatrix} \log(p_1/p_4) \\ \log(p_2/p_4) \\ \log(p_3/p_4) \end{bmatrix}
$$

If there are only two response levels, the resulting response function would be a logit. Thus, the standard response functions are called generalized logits. They are useful in dealing with the log-linear model:

$$
\pi = \mathrm{EXP}(X \beta)
$$

If C denotes the matrix in the preceding RESPONSE statement, then because of the restriction that the probabilities sum to 1, it follows that an equivalent model is

$$
C * \mathrm{LOG}(\pi) = (C X) \beta
$$

But C * LOG ( P ) is simply the vector of standard response functions. Thus, fitting a log-linear model on the cell probabilities is equivalent to fitting a linear model on the generalized logits.
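The equivalence can be checked numerically. In the Python sketch below, cell probabilities are generated from a log-linear model, with the sum-to-1 restriction imposed by explicit normalization, and the generalized logits C * LOG(π) are compared with (CX)β. The design matrix X and the parameter vector β are hypothetical values chosen only for illustration, and C is taken to have rows (1 0 0 -1), (0 1 0 -1), and (0 0 1 -1).

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def matmul(left, right):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*right))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in left]

def loglinear_probs(X, beta):
    """Cell probabilities proportional to EXP(X beta), scaled to sum to 1."""
    weights = [math.exp(eta) for eta in matvec(X, beta)]
    total = sum(weights)
    return [w / total for w in weights]

# Contrast of each log probability with the log of the last one.
C = [[1, 0, 0, -1],
     [0, 1, 0, -1],
     [0, 0, 1, -1]]

# Hypothetical design: main effects of two two-level variables (four cells).
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
beta = [0.3, -0.5]

pi = loglinear_probs(X, beta)
lhs = matvec(C, [math.log(v) for v in pi])   # generalized logits
rhs = matvec(matmul(C, X), beta)             # linear model (CX) beta
print(lhs)
print(rhs)
```

The normalizing constant is the same for every cell, so it cancels in each contrast with the last category. That cancellation is what makes the model linear on the generalized-logit scale.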

RESTRICT Statement

RESTRICT parameter=value < ... parameter=value > ;

where parameter is the letter B followed by a number; for example, B3 specifies the third parameter in the model. The value is the value to which the parameter is restricted. The RESTRICT statement restricts values of parameters to the values you specify, so that the estimation of the remaining parameters is subject to these restrictions. Consider the following statement:

restrict b1=1 b4=0 b6=0;

This restricts the values of three parameters. The first parameter is set to 1, and the fourth and sixth parameters are set to zero.

The RESTRICT statement is interactive. A new RESTRICT statement replaces any previous ones. In addition, if you submit two or more MODEL, LOGLIN, FACTORS, or REPEATED statements, then the subsequent occurrences of these statements also delete the previous RESTRICT statement.

WEIGHT Statement

You can use a WEIGHT statement to refer to a variable containing the cell frequencies, which need not be integers. The WEIGHT statement lets you use summary data sets containing a count variable. See the 'Input Data Sets' section on page 860 for further information concerning the WEIGHT statement.
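Conceptually, a count variable supplies the cell frequencies from which the sample proportions in each population are formed. The Python sketch below illustrates that idea for a summary data set with one row per cell; it is not a description of PROC CATMOD's internal processing. The row layout (population, response, count), the function name, and the data values are all hypothetical, and the counts are deliberately non-integer in one population.

```python
from collections import defaultdict

def sample_proportions(rows):
    """Turn (population, response, count) rows into per-population proportions."""
    totals = defaultdict(float)   # total frequency in each population
    cells = defaultdict(float)    # frequency in each (population, response) cell
    for population, response, count in rows:
        totals[population] += count
        cells[(population, response)] += count
    return {cell: freq / totals[cell[0]] for cell, freq in cells.items()}

# Hypothetical summary data: one row per cell, with a count column.
rows = [
    ("a", 1, 12.5),
    ("a", 2, 37.5),
    ("b", 1, 30),
    ("b", 2, 10),
]
props = sample_proportions(rows)
for cell in sorted(props):
    print(cell, props[cell])
```

A summary layout like this carries the same proportions as a data set with one row per subject, but in far fewer rows.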
