SAS/STAT 9.1, Users Guide, Volume 3 (volume 3 ONLY)

The following statements are available in PROC LIFEREG.

The PROC LIFEREG statement invokes the procedure. The MODEL statement is required and specifies the variables used in the regression part of the model as well as the distribution used for the error, or random, component of the model. Only a single MODEL statement can be used with one invocation of the LIFEREG procedure. If multiple MODEL statements are present, only the last is used. Main effects and interaction terms can be specified in the MODEL statement, similar to the GLM procedure. Initial values can be specified in the MODEL statement or in an INEST= data set. If no initial values are specified, the starting estimates are obtained by ordinary least squares. The CLASS statement determines which explanatory variables are treated as categorical. The WEIGHT statement identifies a variable with values that are used to weight the observations. Observations with zero or negative weights are not used to fit the model, although predicted values can be computed for them. The OUTPUT statement creates an output data set containing predicted values and residuals.

PROC LIFEREG Statement

The PROC LIFEREG statement invokes the procedure. You can specify the following options in the PROC LIFEREG statement.

COVOUT

DATA= SAS-data-set

GOUT= graphics-catalog

INEST= SAS-data-set

NAMELEN= n

NOPRINT

ORDER=DATA | FORMATTED | FREQ | INTERNAL

OUTEST= SAS-data-set

XDATA= SAS-data-set

BY Statement

You can specify a BY statement with PROC LIFEREG to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables.

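If your input data set is not sorted in ascending order, use one of the following alternatives:

Sort the data using the SORT procedure with a similar BY statement.

Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the LIFEREG procedure.

Create an index on the BY variables using the DATASETS procedure.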

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.

CLASS Statement

Variables that are classification variables rather than quantitative numeric variables must be listed in the CLASS statement. For each explanatory variable listed in the CLASS statement, indicator variables are generated for the levels assumed by the CLASS variable. If the CLASS statement is used, it must appear before the MODEL statement.
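The indicator coding described above can be sketched in Python as follows. This is only an illustration of the idea of one 0/1 column per level; the function name indicator_columns and the alphabetical level ordering are assumptions made for the sketch, not the procedure's actual internals (in SAS the level order is governed by the ORDER= option).

```python
def indicator_columns(values):
    """Return (levels, rows): one 0/1 indicator column per distinct level."""
    # Assumption: levels are ordered alphabetically for this illustration.
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


levels, rows = indicator_columns(["low", "high", "low", "mid"])
print(levels)  # ['high', 'low', 'mid']
for row in rows:
    print(row)
```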

INSET Statement

The box or table of summary information produced on plots made with the PROBPLOT statement is called an inset. You can use the INSET statement to customize the information that is displayed in the inset box as well as to customize the appearance of the inset box. To supply the information that is displayed in the inset box, you specify keywords corresponding to the information that you want shown. For example, the following statements produce a probability plot with the number of observations, the number of right-censored observations, the name of the distribution, and the estimated Weibull shape parameter in the inset.

proc lifereg data=epidemic;
   model life = dose / dist = Weibull;
   probplot;
   inset nobs right dist shape;
run;

By default, inset entries are identified with appropriate labels. However, you can provide a customized label by specifying the keyword for that entry followed by the equal sign (=) and the label in quotes. For example, the following INSET statement produces an inset containing the number of observations and the name of the distribution, labeled Sample Size and Distribution in the inset.

inset nobs='Sample Size' dist='Distribution';

If you specify a keyword that does not apply to the plot you are creating, then the keyword is ignored.

If you specify more than one INSET statement, only the first one is used.

The following table lists keywords available in the INSET statement to display summary statistics, distribution parameters, and distribution fitting information.

Table 39.1: INSET Statement Keywords

Keyword       Description
CONFIDENCE    confidence coefficient for all confidence intervals
DIST          name of the distribution
INTERVAL      number of interval-censored observations
LEFT          number of left-censored observations
NOBS          number of observations
NMISS         number of observations with missing values
RIGHT         number of right-censored observations
SCALE         value of the scale parameter
SHAPE         value of the shape parameter
UNCENSORED    number of uncensored observations

The following options control the appearance of the box. All options are specified after the slash (/) in the INSET statement.

CFILL= color

CFILLH= color

CFRAME= color

CHEADER= color

CTEXT= color

FONT= font

HEIGHT= value

HEADER= quoted string

NOFRAME

POS= value <DATA | PERCENT>

REFPOINT= name

MODEL Statement

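Only a single MODEL statement can be used with one invocation of the LIFEREG procedure. If multiple MODEL statements are present, only the last is used. The optional label is used to label the model estimates in the output SAS data set and OUTEST= data set.

The MODEL statement takes one of three forms:

<label:> MODEL response<*censor(list)> = effects </ options>;
<label:> MODEL (lower, upper) = effects </ options>;
<label:> MODEL events/trials = effects </ options>;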

The first MODEL syntax is appropriate for right censoring. The variable response is possibly right-censored. If the response variable can be right-censored, then a second variable, denoted censor, must appear after the response variable with a list of parenthesized values, separated by commas or blanks, to indicate censoring. That is, if the censor variable takes on a value given in the list, the response is a right-censored value; otherwise, it is an observed value.
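The rule for the first syntax amounts to a membership test on the censor variable. A minimal Python sketch of that rule follows; the function name is_right_censored is hypothetical, and the values (1, 3) simply echo the form time*flag(1,3).

```python
def is_right_censored(censor_value, censoring_values):
    """True when the censor variable takes a value from the parenthesized list."""
    return censor_value in set(censoring_values)


# Censor values 1 and 3 mark right-censored responses; anything else is observed.
flags = [0, 1, 2, 3]
print([is_right_censored(f, (1, 3)) for f in flags])  # [False, True, False, True]
```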

The second MODEL syntax specifies two variables, lower and upper, that contain values of the endpoints of the censoring interval. If the two values are the same (and not missing), it is assumed that there is no censoring and the actual response value is observed. If the lower value is missing, then the upper value is used as a left-censored value. If the upper value is missing, then the lower value is taken as a right-censored value. If both values are present and the lower value is less than the upper value, it is assumed that the values specify a censoring interval. If the lower value is greater than the upper value or both values are missing, then the observation is not used in the analysis, although predicted values can still be obtained if none of the covariates are missing. The following table summarizes the ways of specifying censoring.

lower          upper          Comparison       Interpretation
not missing    not missing    equal            no censoring
not missing    not missing    lower < upper    censoring interval
missing        not missing                     upper used as left-censoring value
not missing    missing                         lower used as right-censoring value
not missing    not missing    lower > upper    observation not used
missing        missing                         observation not used
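The censoring rules for the (lower, upper) syntax can be expressed as a small decision function. The Python sketch below is an illustration only: the name classify_censoring and the use of None or NaN to stand for a SAS missing value are assumptions made for the example.

```python
import math


def _is_missing(value):
    """Treat None and NaN as missing (a stand-in for a SAS missing value)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def classify_censoring(lower, upper):
    """Classify one observation according to the censoring rules above."""
    lower_missing, upper_missing = _is_missing(lower), _is_missing(upper)
    if lower_missing and upper_missing:
        return "not used"
    if lower_missing:
        return "left-censored"        # upper is the left-censoring value
    if upper_missing:
        return "right-censored"       # lower is the right-censoring value
    if lower == upper:
        return "uncensored"
    if lower < upper:
        return "interval-censored"
    return "not used"                 # lower > upper


for pair in [(5, 5), (2, 7), (None, 4), (3, None), (9, 1), (None, None)]:
    print(pair, "->", classify_censoring(*pair))
```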

The third MODEL syntax specifies two variables that contain count data for a binary response. The value of the first variable, events, is the number of successes. The value of the second variable, trials, is the number of tries. The values of both events and (trials-events) must be nonnegative, and trials must be positive for the response to be valid. The values of the two variables do not need to be integers and are not modified to be integers.
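The validity conditions for an events/trials response are easy to check directly. The following Python sketch restates them; the function name is_valid_binomial_response is hypothetical, and noninteger values are accepted because the text says they are not rounded.

```python
def is_valid_binomial_response(events, trials):
    """events >= 0, trials - events >= 0, and trials > 0; integers not required."""
    return events >= 0 and (trials - events) >= 0 and trials > 0


print(is_valid_binomial_response(2.5, 10))  # noninteger counts are allowed
print(is_valid_binomial_response(3, 2))     # more events than trials
print(is_valid_binomial_response(0, 0))     # trials must be positive
```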

The effects following the equal sign are the covariates in the model. Higher-order effects, such as interactions and nested terms, are allowed in the list, similar to the GLM procedure. Variable names and combinations of variable names representing higher-order terms are allowed to appear in this list. Class variables can be used as effects, and indicator variables are generated for the class levels. If you do not specify any covariates following the equal sign, an intercept-only model is fit.

Examples of three valid MODEL statements are

a: model time*flag(1,3)=temp;
b: model (start, finish)=;
c: model r/n=dose;

Model statement a indicates that the response is contained in a variable named time and that, if the variable flag takes on the values 1 or 3, the observation is right-censored. The explanatory variable is temp, which could be a class variable. Model statement b indicates that the response is known to be in the interval between the values of the variables start and finish and that there are no covariates except for a default intercept term. Model statement c indicates a binary response, with the variable r containing the number of responses and the variable n containing the number of trials.

The following options can appear in the MODEL statement.

Task                                                    Option

Model specification
   set the significance level                           ALPHA=
   specify distribution type for failure time           DISTRIBUTION=
   request no log transformation of response            NOLOG
   initial estimate for intercept term                  INTERCEPT=
   hold intercept term fixed                            NOINT
   initial estimates for regression parameters          INITIAL=
   initialize scale parameter                           SCALE=
   hold scale parameter fixed                           NOSCALE
   initialize first shape parameter                     SHAPE1=
   hold first shape parameter fixed                     NOSHAPE1

Model fitting
   set convergence criterion                            CONVERGE=
   set maximum iterations                               MAXITER=
   set tolerance for testing singularity                SINGULAR=

Output
   display estimated correlation matrix                 CORRB
   display estimated covariance matrix                  COVB
   display iteration history, final gradient,
      and second derivative matrix                      ITPRINT

ALPHA= value

CONVERGE= value

CONVG= value

CORRB

COVB

DISTRIBUTION= distribution-type

DIST= distribution-type

D= distribution-type

INITIAL= values

INTERCEPT= value

ITPRINT

MAXITER= n

NOINT

NOLOG

NOSCALE

NOSHAPE1

SCALE= value

SHAPE1= value

SINGULAR= value

OUTPUT Statement

The OUTPUT statement creates a new SAS data set containing statistics calculated after fitting the model. At least one specification of the form keyword = name is required.

All variables in the original data set are included in the new data set, along with the variables created as options to the OUTPUT statement. These new variables contain fitted values and estimated quantiles. If you want to create a permanent SAS data set, you must specify a two-level name (refer to SAS Language Reference: Concepts for more information on permanent SAS data sets). Each OUTPUT statement applies to the preceding MODEL statement. See Example 39.1 for illustrations of the OUTPUT statement.

The following specifications can appear in the OUTPUT statement:

OUT= SAS-data-set

specifies the new data set. By default, the procedure uses the DATAn convention to name the new data set.

keyword=name

specifies the statistics to include in the output data set and gives names to the new variables. Specify a keyword for each desired statistic (see the following list of keywords), an equal sign, and the variable to contain the statistic.

The keywords allowed and the statistics they represent are as follows:

CENSORED

specifies an indicator variable to signal censoring. The variable takes on the value 1 if the observation is censored; otherwise, it is 0.

CDF

specifies a variable to contain the estimates of the cumulative distribution function evaluated at the observed response. See the Predicted Values section for more information.

CONTROL

specifies a variable in the input data set to control the estimation of quantiles. See Example 39.1 for an illustration. If the specified variable has the value of 1, estimates for all the values listed in the QUANTILE= list are computed for that observation in the input data set; otherwise, no estimates are computed. If no CONTROL= variable is specified, all quantiles are estimated for all observations. If the response variable in the MODEL statement is binomial, then this option has no effect.

CRESIDUAL | CRES

specifies a variable to contain the Cox-Snell residuals

-log( S(u_i) )

where S is the standard survival function and

u_i = (y_i - x_i'b) / σ

If the response variable in the corresponding model statement is binomial, then the residuals are not computed, and this variable contains missing values.

SRESIDUAL | SRES

specifies a variable to contain the standardized residuals

(y_i - x_i'b) / σ

If the response variable in the corresponding model statement is binomial, then the residuals are not computed, and this variable contains missing values.
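As a rough illustration of the two residual types, the Python sketch below assumes the usual definitions: the standardized residual is u = (y - x'b)/σ and the Cox-Snell residual is -log S(u), where S is the standard survival function of the chosen baseline. The function names, the argument names, and the three baselines shown are choices made for the sketch; it is not the procedure's implementation.

```python
import math

# Standard (location 0, scale 1) survival functions for three common baselines.
STANDARD_SURVIVAL = {
    "extreme_value": lambda u: math.exp(-math.exp(u)),
    "logistic": lambda u: 1.0 / (1.0 + math.exp(u)),
    "normal": lambda u: 0.5 * math.erfc(u / math.sqrt(2.0)),
}


def standardized_residual(y, xbeta, sigma):
    """u = (y - x'b) / sigma, with y on the scale the model was fit on."""
    return (y - xbeta) / sigma


def cox_snell_residual(y, xbeta, sigma, baseline="extreme_value"):
    """-log S(u) for the standardized residual u under the given baseline."""
    u = standardized_residual(y, xbeta, sigma)
    return -math.log(STANDARD_SURVIVAL[baseline](u))


print(standardized_residual(2.0, 1.0, 0.5))
print(cox_snell_residual(1.0, 1.0, 0.5, "extreme_value"))
```

For the extreme value baseline the Cox-Snell residual reduces to exp(u), which is why the second call returns a value of about 1 when y equals x'b.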

PREDICTED | P

specifies a variable to contain the quantile estimates. If the response variable in the corresponding model statement is binomial, then this variable contains the estimated probabilities, 1 - F(-x'b).

QUANTILES | QUANTILE | Q

gives a list of values for which quantiles are calculated. The values must be between 0 and 1, noninclusive. For each value, a corresponding quantile is estimated. This option is not used if the response variable in the corresponding MODEL statement is binomial. The QUANTILES option can be specified as follows.

Type of List                Specification
list separated by blanks    .2 .4 .6 .8
list separated by commas    .2, .4, .6, .8
x to y                      .2 to .8
x to y by z                 .2 to .8 by .1
combination of methods      .1, .2 to .8 by .2

By default, QUANTILES=0.5. When the response is not binomial, a numeric variable, _PROB_, is added to the OUTPUT data set whenever the QUANTILES= option is specified. The variable _PROB_ gives the probability value for the quantile estimates. These are the values taken from the QUANTILES= list and are given as values between 0 and 1, not as values between 0 and 100.
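To make the list forms concrete, here is a small Python sketch that expands a QUANTILES-style specification into probability values. The function name expand_quantile_list is hypothetical. The sketch handles blank-separated and comma-separated lists and the "x to y by z" form; the bare "x to y" form is rejected on purpose, because the text above does not state which increment applies.

```python
from decimal import Decimal


def expand_quantile_list(spec):
    """Expand e.g. '.1, .2 to .8 by .2' into a list of probabilities in (0, 1)."""
    tokens = spec.replace(",", " ").lower().split()
    values = []
    i = 0
    while i < len(tokens):
        start = Decimal(tokens[i])
        if i + 1 < len(tokens) and tokens[i + 1] == "to":
            if i + 4 >= len(tokens) or tokens[i + 3] != "by":
                raise ValueError("this sketch requires an explicit BY increment")
            stop, step = Decimal(tokens[i + 2]), Decimal(tokens[i + 4])
            if step <= 0:
                raise ValueError("increment must be positive")
            value = start
            while value <= stop:          # Decimal keeps the steps exact
                values.append(value)
                value += step
            i += 5
        else:
            values.append(start)
            i += 1
    for value in values:
        if not (0 < value < 1):
            raise ValueError(f"quantile {value} is not strictly between 0 and 1")
    return [float(value) for value in values]


print(expand_quantile_list(".1, .2 to .8 by .2"))  # [0.1, 0.2, 0.4, 0.6, 0.8]
```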

STD_ERR | STD

specifies a variable to contain the estimates of the standard errors of the estimated quantiles or x'b. If the response used in the MODEL statement is a binomial response, then these are the standard errors of x'b. Otherwise, they are the standard errors of the quantile estimates. These estimates can be used to compute confidence intervals for the quantiles. However, if the model is fit to the log of the event time, better confidence intervals can usually be computed by transforming the confidence intervals for the log response. See Example 39.1 for such a transformation.
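One common way to carry out such a transformation is sketched below in Python. It assumes the delta-method approximation se(log q) ≈ se(q)/q, builds a symmetric interval on the log scale, and exponentiates the endpoints. The function name quantile_ci_via_log and the default critical value z = 1.96 are illustrative choices; Example 39.1 is not reproduced here and may differ in detail.

```python
import math


def quantile_ci_via_log(quantile, std_err, z=1.96):
    """Confidence limits for a positive quantile, computed on the log scale."""
    if quantile <= 0:
        raise ValueError("quantile must be positive to take its log")
    se_log = std_err / quantile            # delta-method standard error of log(q)
    log_q = math.log(quantile)
    return math.exp(log_q - z * se_log), math.exp(log_q + z * se_log)


lower, upper = quantile_ci_via_log(100.0, 10.0)
print(lower, upper)
```

Unlike the interval quantile ± z × std_err, these limits are always positive and are asymmetric around the estimate on the original scale.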

XBETA

specifies a variable to contain the computed value of x'b, where x is the covariate vector and b is the vector of parameter estimates.

PROBPLOT Statement

You can use the PROBPLOT statement to create a probability plot from lifetime data. The data can be uncensored, right-censored, or arbitrarily censored. You can specify any number of PROBPLOT statements after a MODEL statement. The syntax used for the response in the MODEL statement determines the type of censoring assumed in creating the probability plot. The model fit with the MODEL statement is plotted along with the data. If there are covariates in the model, they are set to constant values specified in the XDATA= data set when creating the probability plot. If no XDATA= data set is specified, continuous variables are set to their overall mean values and categorical variables specified in the CLASS statement are set to their highest levels.

You can specify the following options to control the content, layout, and appearance of a probability plot.

ANNOTATE= SAS-data-set

ANNO= SAS-data-set

CAXIS= color

CAXES= color

CCENSOR= color

CENBIN

CENCOLOR= color

CENSYMBOL= symbol | ( symbol-list )

CFIT= color

CFRAME= color

CFR= color

CGRID= color

CHREF= color

CH= color

CTEXT= color

CVREF= color

CV= color

DESCRIPTION= string

DES= string

FONT= font

HCL

HEIGHT= value

HLOWER= value

HOFFSET= value

HUPPER= value

HREF<(INTERSECT)>= value-list

HREFLABELS= label1 ... labeln

HREFLABEL= label1 ... labeln

HREFLAB= label1 ... labeln

HREFLABPOS= n

INBORDER

INTERTILE= value

ITPRINTEM

JITTER= value

LFIT= linetype

LGRID= linetype

LHREF= linetype

LH= linetype

LVREF= linetype

LV= linetype

MAXITEM= n1 <,n2>

NAME= string

NOCENPLOT

NOCONF

NODATA

NOFIT

NOFRAME

NOGRID

NOHLABEL

NOHTICK

NOPOLISH

NOVLABEL

NOVTICK

NPINTERVALS= interval-type

PCTLIST= value-list

PLOWER= value

PRINTPROBS

PUPPER= value

PPOS= character-list

PPOUT

PROBLIST= value-list

ROTATE

SQUARE

TOLLIKE= value

TOLPROB= value

VAXISLABEL= 'string'

VREF= value-list

VREFLABELS= label1 ... labeln

VREFLABEL= label1 ... labeln

VREFLAB= label1 ... labeln

VREFLABPOS= n

WAXIS= n

WFIT= n

WGRID= n

WREFL= n

WEIGHT Statement

If you want to use weights for each observation in the input data set, place the weights in a variable in the data set and specify the name in a WEIGHT statement. The values of the WEIGHT variable can be nonintegral and are not truncated. Observations with nonpositive or missing values for the weight variable do not contribute to the fit of the model. The WEIGHT variable multiplies the contribution to the log likelihood for each observation.
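The effect of the weights on the log likelihood can be sketched in a few lines of Python. The function name weighted_log_likelihood and the use of None or NaN for a missing weight are assumptions for the example; the per-observation log-likelihood contributions are taken as given inputs.

```python
import math


def weighted_log_likelihood(contributions, weights):
    """Sum w_i * l_i, skipping observations with missing or nonpositive weights."""
    total = 0.0
    for loglik, weight in zip(contributions, weights):
        missing = weight is None or (isinstance(weight, float) and math.isnan(weight))
        if missing or weight <= 0:
            continue                      # observation does not contribute to the fit
        total += weight * loglik          # nonintegral weights are used as-is
    return total


# Weights 0 and None drop the second and third observations.
print(weighted_log_likelihood([-1.0, -2.0, -3.0, -4.0], [2.5, 0, None, 1]))  # -6.5
```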
