SAS/STAT 9.1 User's Guide, Volume 2

The syntax of the GAM procedure is similar to that of other regression procedures in the SAS System. The PROC GAM and MODEL statements are required. The SCORE statement can appear multiple times; all other statements appear only once.

The syntax for PROC GAM is described in the following sections in alphabetical order after the description of the PROC GAM statement.

PROC GAM Statement

PROC GAM < option > ;

The PROC GAM statement invokes the procedure. You can specify the following option.

DATA= SAS-data-set

BY Statement

BY variables ;

You can specify a BY statement with PROC GAM to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in the order of the BY variables.

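If your input data set is not sorted in ascending order, use one of the following alternatives:

Sort the data using the SORT procedure with a similar BY statement.

Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the GAM procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.

Create an index on the BY variables using the DATASETS procedure.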

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.
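As a minimal sketch of BY-group processing, the following step sorts the input first and then fits one model per group. The data set work.trial and the variables site, y, and x are hypothetical names chosen for illustration.

```sas
/* Hypothetical input: work.trial with variables site, y, and x */
proc sort data=work.trial out=work.trial_sorted;
   by site;                 /* same BY variables as in PROC GAM below */
run;

proc gam data=work.trial_sorted;
   by site;                 /* one separate analysis per value of site */
   model y = spline(x);
run;
```

Writing the sorted copy to a new data set with OUT= leaves the original data set untouched.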

CLASS Statement

CLASS variables ;

The CLASS statement names the classification variables to be used in the analysis. Typical class variables are TREATMENT, SEX, RACE, GROUP, and REPLICATION. If the CLASS statement is used, it must appear before the MODEL statement.

Classification variables can be either character or numeric. Class levels are determined from the formatted values of the CLASS variables. Thus, you can use formats to group values into levels. Refer to the discussion of the FORMAT procedure in the SAS Procedures Guide, and the discussions for the FORMAT statement and SAS formats in SAS Language Reference: Dictionary.
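The following sketch illustrates how a format can group the values of a numeric variable into class levels. The format name agegrp, the data set work.trial, and the variables age, y, and x are assumptions made for this example only.

```sas
/* Hypothetical format that maps a numeric age into two levels */
proc format;
   value agegrp low -< 40  = 'Under 40'
                40 - high  = '40 and over';
run;

proc gam data=work.trial;
   class age;                       /* CLASS must precede MODEL */
   format age agegrp.;              /* levels come from the formatted values */
   model y = param(age) spline(x);
run;
```

Because class levels are taken from the formatted values, age contributes two levels here rather than one level per distinct age.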

FREQ Statement

FREQ variable ;

The FREQ statement names a variable that provides frequencies for each observation in the DATA= data set. Specifically, if n is the value of the FREQ variable for a given observation, then that observation is used n times.

The analysis produced using a FREQ statement reflects the expanded number of observations. You can produce the same analysis (without the FREQ statement) by first creating a new data set that contains the expanded number of observations. For example, if the value of the FREQ variable is 5 for the first observation, the first five observations in the new data set are identical. Each observation in the old data set is replicated n_i times in the new data set, where n_i is the value of the FREQ variable for that observation.

If the value of the FREQ variable is missing or is less than 1, the observation is not used in the analysis. If the value is not an integer, only the integer portion is used.

The FREQ statement is not available when a loess smoother is included in the model.
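The equivalence between a FREQ statement and an expanded data set can be sketched as follows. The data set work.counts and the variables n, y, and x are hypothetical. The DATA step applies the rules described above: observations with a missing frequency or a frequency below 1 are dropped, and only the integer part of the frequency is used.

```sas
/* Hypothetical input: work.counts with variables y, x, and a count n */
data work.expanded;
   set work.counts;
   /* a missing n compares as less than 1, so such rows are skipped */
   if n >= 1 then do i = 1 to floor(n);
      output;                       /* write floor(n) copies of the row */
   end;
   drop i;
run;

/* Fit with the FREQ statement on the compact data */
proc gam data=work.counts;
   freq n;
   model y = spline(x);
run;

/* Fit without FREQ on the expanded data */
proc gam data=work.expanded;
   model y = spline(x);
run;
```

Both fits use a spline term only, since the FREQ statement is not available with a loess smoother.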

ID Statement

ID variables ;

The variables in the ID statement are copied from the input data set to the OUT= data set. If you omit the ID statement, only the variables used in the MODEL statement and requested statistics are included in the output data set.
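A short sketch of the ID statement follows, assuming a hypothetical identifier variable subject_id in a hypothetical data set work.trial.

```sas
proc gam data=work.trial;
   id subject_id;                   /* copied from the input to the OUT= data set */
   model y = spline(x);
   output out=work.gamout pred;     /* OUT= data set that receives subject_id */
run;
```

Carrying an identifier through makes it easier to merge the output back with other data about the same observations.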

MODEL Statement

MODEL dependent= < PARAM(effects) >< smoothing effects >< /options > ;

MODEL event/trials= < PARAM(effects) >< smoothing effects >< /options > ;

The MODEL statement specifies the dependent variable and the independent effects you want to use to model its values. Specify the independent parametric variables inside the parentheses of PARAM( ). The parametric variables can be either CLASS variables or continuous variables. Class variables must be declared with a CLASS statement. Interactions between variables can also be included as parametric effects. The syntax for the specification of effects is the same as for the GLM procedure.

Any number of smoothing effects can be specified, as follows:

| Smoothing Effect | Meaning |
|---|---|
| SPLINE(variable < , df=number > ) | fit smoothing spline with the variable and with DF=number |
| LOESS(variable < , df=number > ) | fit local regression with the variable and with DF=number |
| SPLINE2(variable, variable < , df=number > ) | fit bivariate thin-plate smoothing spline with DF=number |

If you do not specify the DF=number option with a smoothing effect, DF=4 is used by default, unless you specify the METHOD=GCV model option. Note that for univariate spline components, a degree of freedom is removed by default to account for the linear portion of the model, so the value displayed in the Fit Summary and Analysis of Deviance tables will be one less than the value you specify.

Both parametric effects and smoothing effects are optional, but at least one of them must be present.

If only parametric variables are present, PROC GAM fits a parametric linear model using the terms inside the parentheses of PARAM( ). If only smoothing effects are present, PROC GAM fits a nonparametric additive model. If both types of effect are present, PROC GAM fits a semiparametric model using the parametric effects as the linear part of the model.
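The following sketch shows how the DF= option and the METHOD=GCV option interact with the default described above. The data set work.trial and the variables y, x1, x2, and x3 are hypothetical.

```sas
/* Semiparametric model: x1 enters linearly, x2 and x3 are smoothed.   */
/* x2 uses the requested DF=3; x3 has no DF= and so uses default DF=4. */
proc gam data=work.trial;
   model y = param(x1) spline(x2, df=3) loess(x3);
run;

/* No DF= is given and METHOD=GCV is specified, so the default DF=4 */
/* is not applied to the spline term.                               */
proc gam data=work.trial;
   model y = spline(x2) / method=gcv;
run;
```

For the spline term in the first step, the displayed degrees of freedom would be one less than the value specified, as noted above for univariate spline components.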

The following table shows how to specify various models for a dependent variable y and independent variables x, x1, and x2.

Table 30.1: Syntax for Common GAM Models

| Type of Model | Syntax | Mathematical Form |
|---|---|---|
| Parametric | model y = param(x); | $E(y) = \beta_0 + \beta_1 x$ |
| Nonparametric | model y = spline(x); | $E(y) = \beta_0 + s(x)$ |
| Nonparametric | model y = loess(x); | $E(y) = \beta_0 + s(x)$ |
| Semiparametric | model y = param(x1) spline(x2); | $E(y) = \beta_0 + \beta_1 x_1 + s(x_2)$ |
| Additive | model y = spline(x1) spline(x2); | $E(y) = \beta_0 + s_1(x_1) + s_2(x_2)$ |
| Thin-plate spline | model y = spline2(x1,x2); | $E(y) = \beta_0 + s(x_1, x_2)$ |

You can specify the following options in the MODEL statement.

ALPHA= number

DIST= distribution-id

EPSILON= number

EPSSCORE= number

ITPRINT

MAXITER= number

MAXITSCORE= number

METHOD=GCV

NOTEST
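As a hedged illustration of the events/trials form of the MODEL statement combined with some of these options, consider the sketch below. The data set work.doses and the variables r, n, dose, and age are hypothetical, and the value BINOMIAL for DIST= is an assumption, since the list of distribution identifiers is not reproduced in this section.

```sas
/* r = number of events, n = number of trials (hypothetical variables) */
proc gam data=work.doses;
   model r/n = param(dose) spline(age)
         / dist=binomial            /* assumed distribution identifier */
           maxiter=100              /* value chosen for illustration only */
           itprint;
run;
```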

OUTPUT Statement

The OUTPUT statement creates a new SAS data set containing diagnostic measures calculated after fitting the model.

You can request a variety of diagnostic measures that are calculated for each observation in the data set. The new data set contains the variables specified in the MODEL statement in addition to the requested variables. If no keyword is present, the data set contains only the predicted values.

Details on the specifications in the OUTPUT statement are as follows.

OUT= SAS-data-set

keyword

| Keyword | Description |
|---|---|
| PREDICTED | predicted values for each smoothing component and overall predicted values at design points |
| UCLM | upper confidence limits for each predicted smoothing component |
| LCLM | lower confidence limits for each predicted smoothing component |
| ADIAG | diagonal element of the hat matrix associated with the observation for each smoothing spline component |
| RESIDUAL | residual standardized by its weights |
| STD | standard deviation of the prediction for each smoothing component |
| ALL | implies all preceding keywords |

The names of the new variables that contain the statistics are formed by using a prefix of one or more characters that identify the statistic, followed by an underscore (_), followed by the variable name.

The prefixes of the new variables are as follows:

| Keywords | Prefix |
|---|---|
| PRED | P_ |
| UCLM | UCLM_ |
| LCLM | LCLM_ |
| ADIAG | ADIAG_ |
| RESID | R_ |
| STD | STD_ for spline, STDP_ for loess |

For example, suppose that you have a dependent variable y and an independent smoothing variable x, and you specify the keywords PRED and ADIAG. In this case, the output SAS data set will contain the variables P_y, P_x, and ADIAG_x.
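The naming example above could be reproduced with a step along these lines. The data set names work.trial and work.gamout are hypothetical; the variable names in the VAR statement follow the prefix rules just described.

```sas
proc gam data=work.trial;
   model y = spline(x);
   output out=work.gamout pred adiag;   /* request PRED and ADIAG */
run;

/* Inspect the first few rows of the output data set */
proc print data=work.gamout(obs=5);
   var y x P_y P_x ADIAG_x;
run;
```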

SCORE Statement

The SCORE statement calculates predicted values for a new data set. The variables generated by the SCORE statement use the same naming conventions with prefixes as the OUTPUT statement. If you have multiple data sets to predict, you can specify multiple SCORE statements. You must use a SCORE statement for each data set.

The following options must be specified in the SCORE statement.

DATA= SAS-data-set

OUT= SAS-data-set
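A minimal sketch of scoring two new data sets from one fitted model follows. All data set names (work.train, work.new1, work.new2, work.pred1, work.pred2) are hypothetical, and the new data sets are assumed to contain the model's independent variable x.

```sas
proc gam data=work.train;
   model y = spline(x);
   score data=work.new1 out=work.pred1;   /* one SCORE statement per data set */
   score data=work.new2 out=work.pred2;
run;
```

The predicted values in work.pred1 and work.pred2 use the same prefix-based variable names as the OUTPUT statement.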
