SAS/STAT 9.1 User's Guide (Vol. 4)

2017-07-07 02:10:07

The following statements generate five imputed data sets to be used in this section. The data set FitMiss was created in the section 'Getting Started' on page 2610. See 'The MI Procedure' chapter for details concerning the MI procedure.

proc mi data=FitMiss seed=3237851 noprint out=outmi;
   var Oxygen RunTime RunPulse;
run;

The Fish data described in the STEPDISC procedure are measurements of 159 fish of seven species caught in Finland's Lake Laengelmavesi. For each fish, the length, height, and width are measured. Three different length measurements are recorded: from the nose of the fish to the beginning of its tail (Length1), from the nose to the notch of its tail (Length2), and from the nose to the end of its tail (Length3). See Chapter 67, 'The STEPDISC Procedure,' for more information.

The Fish2 data set is constructed from the Fish data set and contains two species of fish. Some values have been set to missing, and the resulting data set has a monotone missing pattern in the variables Length3, Height, Width, and Species. Note that some values of the variable Species have also been altered in the data set.

The following statements create the Fish2 data set. It contains the first two species of fish in the Fish data set.

/*-------- Fishes of Species Bream and Parkki Pike --------*/
data Fish2 (drop=HtPct WidthPct);
   title 'Fish Measurement Data';
   input Species $ Length3 HtPct WidthPct @@;
   Height= HtPct*Length3/100;
   Width= WidthPct*Length3/100;
   datalines;
Gp1 30.0 38.4 13.4   Gp1 31.2 40.0 13.8   Gp1 31.1 39.8 15.1
 .  33.5 38.0  .      .  34.0 36.6 15.1   Gp1 34.7 39.2 14.2
Gp1 34.5 41.1 15.3   Gp1 35.0 36.2 13.4   Gp1 35.1 39.9 13.8
 .  36.2 39.3 13.7   Gp1 36.2 39.4 14.1    .  36.2 39.7 13.3
Gp1 36.4 37.8 12.0    .  37.3 37.3 13.6   Gp1 37.2 40.2 13.9
Gp1 37.2 41.5 15.0   Gp1 38.3 38.8 13.8   Gp1 38.5 38.8 13.5
Gp1 38.6 40.5 13.3   Gp1 38.7 37.4 14.8   Gp1 39.5 38.3 14.1
Gp1 39.2 40.8 13.7    .  39.7 39.1  .     Gp1 40.6 38.1 15.1
Gp1 40.5 40.1 13.8   Gp1 40.9 40.0 14.8   Gp1 40.6 40.3 15.0
Gp1 41.5 39.8 14.1   Gp2 41.6 40.6 14.9   Gp1 42.6 44.5 15.5
Gp1 44.1 40.9 14.3   Gp1 44.0 41.1 14.3   Gp1 45.3 41.4 14.9
Gp1 45.9 40.6 14.7   Gp1 46.5 37.9 13.7
Gp2 16.2 25.6 14.0   Gp2 20.3 26.1 13.9   Gp2 21.2 26.3 13.7
Gp2 22.2 25.3 14.3   Gp2 22.2 28.0 16.1   Gp2 22.8 28.4 14.7
Gp2 23.1 26.7 14.7    .  23.7 25.8 13.9   Gp2 24.7 23.5 15.2
Gp1 24.3 27.3 14.6   Gp2 25.3 27.8 15.1   Gp2 25.0 26.2 13.3
Gp2 25.0 25.6 15.2   Gp2 27.2 27.7 14.1   Gp2 26.7 25.9 13.6
 .  26.8 27.6 15.4   Gp2 27.9 25.4 14.0   Gp2 29.2 30.4 15.4
Gp2 30.6 28.0 15.6   Gp2 35.0 27.1 15.3
;

The following statements generate five imputed data sets to be used in this section. The regression method is used to impute missing values in the variable Width, and the discriminant function method is used to impute the variable Species.

proc mi data=Fish2 seed=1305417 out=outfish;
   class Species;
   monotone reg (Width)
            discrim(Species= Length3 Height Width);
   var Length3 Height Width Species;
run;

Examples 1-6 use different input option combinations to combine parameter estimates computed from different procedures, Examples 7-8 combine parameter estimates with CLASS variables, Example 9 shows the use of a TEST statement, and Example 10 combines statistics that are not directly derived from procedures.

Example 45.1. Reading Means and Standard Errors from Variables in a DATA= Data Set

This example creates an ordinary SAS data set that contains sample means and standard errors computed from imputed data sets. These estimates are then combined to generate valid univariate inferences about the population means.

The following statements use the UNIVARIATE procedure to generate sample means and standard errors for the variables in each imputed data set.

proc univariate data=outmi noprint;
   var Oxygen RunTime RunPulse;
   output out=outuni mean=Oxygen RunTime RunPulse
                     stderr=SOxygen SRunTime SRunPulse;
   by _Imputation_;
run;

The following statements display the output data set from PROC UNIVARIATE in Output 45.1.1:

proc print data=outuni;
   title 'UNIVARIATE Means and Standard Errors';
run;

Output 45.1.1: UNIVARIATE Output Data Set

UNIVARIATE Means and Standard Errors

Obs  _Imputation_   Oxygen   RunTime  RunPulse  SOxygen  SRunTime  SRunPulse
  1       1        47.0120   10.4441   171.216  0.95984   0.28520    1.59910
  2       2        47.2407   10.5040   171.244  0.93540   0.26661    1.75638
  3       3        47.4995   10.5922   171.909  1.00766   0.26302    1.85795
  4       4        47.1485   10.5279   171.146  0.95439   0.26405    1.75011
  5       5        47.0042   10.4913   172.072  0.96528   0.27275    1.84807

The following statements combine the means and standard errors from the imputed data sets. The EDF= option requests that the adjusted degrees of freedom be used in the analysis. For sample means based on 31 observations, the complete-data error degrees of freedom is 30.

proc mianalyze data=outuni edf=30;
   modeleffects Oxygen RunTime RunPulse;
   stderr SOxygen SRunTime SRunPulse;
run;

Output 45.1.2: Multiple Imputation Variance Information

The MIANALYZE Procedure

Model Information

Data Set                 WORK.OUTUNI
Number of Imputations    5

Multiple Imputation Variance Information

             -----------------Variance-----------------
Parameter       Between        Within         Total        DF
Oxygen         0.041478      0.930853      0.980626    26.298
RunTime        0.002948      0.073142      0.076679    26.503
RunPulse       0.191086      3.114442      3.343744    25.463

Multiple Imputation Variance Information

               Relative      Fraction
               Increase       Missing      Relative
Parameter   in Variance   Information    Efficiency
Oxygen         0.053471      0.051977      0.989712
RunTime        0.048365      0.047147      0.990659
RunPulse       0.073626      0.070759      0.986046

The 'Model Information' table shown in Output 45.1.2 lists the input data set(s) and the number of imputations.

The 'Multiple Imputation Variance Information' table shown in Output 45.1.2 displays the between-imputation variance, within-imputation variance, and total variance for each univariate inference. It also displays the degrees of freedom for the total variance. The relative increase in variance due to missing values, the fraction of missing information, and the relative efficiency for each imputed variable are also displayed. A detailed description of these statistics is provided in the 'Combining Inferences from Imputed Data Sets' section on page 2624 and the 'Multiple Imputation Efficiency' section on page 2626.
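As a cross-check of the Oxygen row in Output 45.1.2, the following Python sketch recomputes the table entries from the between- and within-imputation variances. It assumes the textbook form of Rubin's combining rules together with the Barnard-Rubin adjustment for the complete-data degrees of freedom, which is what the EDF= option is understood to request here. The function name `mi_variance_info` and its arguments are illustrative only and are not part of SAS.

```python
def mi_variance_info(between, within, m, edf=None):
    """Combine variances for one parameter (assumed textbook formulas)."""
    total = within + (1 + 1 / m) * between
    # Relative increase in variance due to missing values.
    r = (1 + 1 / m) * between / within
    # Large-sample degrees of freedom for the total variance.
    df_m = (m - 1) * (1 + 1 / r) ** 2
    # Fraction of missing information.
    fmi = (r + 2 / (df_m + 3)) / (r + 1)
    df = df_m
    if edf is not None:
        # Adjustment for a finite complete-data degrees of freedom (EDF=).
        gamma = (1 + 1 / m) * between / total
        df_obs = (1 - gamma) * edf * (edf + 1) / (edf + 3)
        df = 1 / (1 / df_m + 1 / df_obs)
    return {
        "total": total,
        "df": df,
        "relative_increase": r,
        "fraction_missing": fmi,
        "relative_efficiency": 1 / (1 + fmi / m),
    }


# Between and Within values for Oxygen, copied from Output 45.1.2.
oxygen = mi_variance_info(between=0.041478, within=0.930853, m=5, edf=30)
for name, value in oxygen.items():
    print(f"{name:20s} {value:.6f}")
```

Under these assumptions the sketch reproduces the Total, DF, Relative Increase in Variance, Fraction Missing Information, and Relative Efficiency entries for Oxygen to the precision shown in the table.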

The 'Multiple Imputation Parameter Estimates' table shown in Output 45.1.3 displays the estimated mean and corresponding standard error for each variable. The table also displays a 95% confidence interval for the mean and a t statistic with the associated p-value for testing the hypothesis that the mean is equal to the value specified. You can use the THETA0= option to specify the value for the null hypothesis, which is zero by default. The table also displays the minimum and maximum parameter estimates from the imputed data sets.
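The pooled estimate, standard error, and t statistic for Oxygen can be approximated directly from the five rows of Output 45.1.1. The Python sketch below is only an illustration: it assumes the usual combining rules (the pooled estimate is the average of the per-imputation means, and the total variance is the within-imputation variance plus (1 + 1/m) times the between-imputation variance), and it works from the rounded values printed in the listing, so the results agree with PROC MIANALYZE only to about four decimal places.

```python
import math
from statistics import mean, variance

# Per-imputation sample means and standard errors of Oxygen (Output 45.1.1).
means = [47.0120, 47.2407, 47.4995, 47.1485, 47.0042]
std_errs = [0.95984, 0.93540, 1.00766, 0.95439, 0.96528]
m = len(means)

estimate = mean(means)                       # pooled point estimate
within = mean([se ** 2 for se in std_errs])  # average squared standard error
between = variance(means)                    # sample variance, divisor m - 1
total = within + (1 + 1 / m) * between       # total variance
std_error = math.sqrt(total)
t_stat = (estimate - 0.0) / std_error        # THETA0 defaults to zero

print(f"estimate={estimate:.4f}  std_error={std_error:.4f}  t={t_stat:.2f}")
```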

Note that the results in this example could also have been obtained with the MI procedure.

Example 45.2. Reading Means and Covariance Matrices from a DATA= COV Data Set

This example creates a COV type data set that contains sample means and covariance matrices computed from imputed data sets. These estimates are then combined to generate valid statistical inferences about the population means.

The following statements use the CORR procedure to generate sample means and a covariance matrix for the variables in each imputed data set.

proc corr data=outmi cov nocorr noprint out=outcov(type=cov);
   var Oxygen RunTime RunPulse;
   by _Imputation_;
run;

The following statements display sample means and covariance matrices from the first two imputed data sets in Output 45.2.1.

proc print data=outcov(obs=12);
   title 'CORR Means and Covariance Matrices'
         ' (First Two Imputations)';
run;

Output 45.2.1: COV Data Set

CORR Means and Covariance Matrices (First Two Imputations)

Obs  _Imputation_  _TYPE_  _NAME_       Oxygen   RunTime  RunPulse
  1       1        COV     Oxygen      28.5603   -7.2652   -11.812
  2       1        COV     RunTime     -7.2652    2.5214     2.536
  3       1        COV     RunPulse   -11.8121    2.5357    79.271
  4       1        MEAN                47.0120   10.4441   171.216
  5       1        STD                  5.3442    1.5879     8.903
  6       1        N                   31.0000   31.0000    31.000
  7       2        COV     Oxygen      27.1240   -6.6761   -10.217
  8       2        COV     RunTime     -6.6761    2.2035     2.611
  9       2        COV     RunPulse   -10.2170    2.6114    95.631
 10       2        MEAN                47.2407   10.5040   171.244
 11       2        STD                  5.2081    1.4844     9.779
 12       2        N                   31.0000   31.0000    31.000

Note that the covariance matrices in the data set outcov are estimated covariance matrices of the variables, $V(y)$. The estimated covariance matrix of the sample means is $V(\bar{y}) = V(y)/n$, where $n$ is the sample size, and is not the same as an estimated covariance matrix for the variables.
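The relationship between the two covariance matrices can be checked numerically. The short Python sketch below divides the diagonal of the first imputation's covariance matrix in Output 45.2.1 by n = 31 and takes square roots; the results should agree with the first row of standard errors in Output 45.1.1. The variable names are illustrative.

```python
import math

n = 31  # number of observations in each imputed data set

# Variances of the variables for imputation 1 (diagonal of Output 45.2.1).
var_y = {"Oxygen": 28.5603, "RunTime": 2.5214, "RunPulse": 79.271}

# Variance of a sample mean is the variance of the variable divided by n.
se_mean = {name: math.sqrt(v / n) for name, v in var_y.items()}

for name, se in se_mean.items():
    print(f"{name:9s} {se:.5f}")
```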

The following statements combine the results for the imputed data sets, and derive both univariate and multivariate inferences about the means. The EDF= option is specified to request that the adjusted degrees of freedom be used in the analysis. For sample means based on 31 observations, the complete-data error degrees of freedom is 30.

proc mianalyze data=outcov edf=30 wcov bcov tcov mult;
   modeleffects Oxygen RunTime RunPulse;
run;

The 'Multiple Imputation Variance Information' and 'Multiple Imputation Parameter Estimates' tables display the same results as in Output 45.1.2 and Output 45.1.3 in Example 45.1.

Output 45.1.3: Multiple Imputation Parameter Estimates

The MIANALYZE Procedure

Multiple Imputation Parameter Estimates

Parameter      Estimate   Std Error   95% Confidence Limits       DF
Oxygen        47.180993    0.990266    45.1466     49.2154    26.298
RunTime       10.511906    0.276910     9.9432     11.0806    26.503
RunPulse     171.517500    1.828591   167.7549    175.2801    25.463

Multiple Imputation Parameter Estimates

Parameter       Minimum      Maximum
Oxygen        47.004201    47.499541
RunTime       10.444149    10.592244
RunPulse     171.146171   172.071730

Multiple Imputation Parameter Estimates

                          t for H0:
Parameter   Theta0   Parameter=Theta0   Pr > |t|
Oxygen           0              47.64     <.0001
RunTime          0              37.96     <.0001
RunPulse         0              93.80     <.0001

With the WCOV, BCOV, and TCOV options, the procedure displays the within-imputation covariance matrix, the between-imputation covariance matrix, and the total covariance matrix in Output 45.2.2. The total covariance matrix is computed under the assumption that the between-imputation covariance matrix is proportional to the within-imputation covariance matrix.

Output 45.2.2: Covariance Matrices

The MIANALYZE Procedure

Within-Imputation Covariance Matrix

                  Oxygen         RunTime        RunPulse
Oxygen       0.930852655    -0.226506411    -0.461022083
RunTime     -0.226506411     0.073141598     0.080316017
RunPulse    -0.461022083     0.080316017     3.114441784

Between-Imputation Covariance Matrix

                   Oxygen         RunTime        RunPulse
Oxygen       0.0414778123    0.0099248946    0.0183701754
RunTime      0.0099248946    0.0029478891    0.0091684769
RunPulse     0.0183701754    0.0091684769    0.1910855259

Total Covariance Matrix

                  Oxygen         RunTime        RunPulse
Oxygen       1.202882661    -0.292700068    -0.595750001
RunTime     -0.292700068     0.094516313     0.103787365
RunPulse    -0.595750001     0.103787365     4.024598310

With the MULT option, the procedure assumes that the between-imputation covariance matrix is proportional to the within-imputation covariance matrix and displays a multivariate inference for all the parameters taken jointly.

The 'Multiple Imputation Multivariate Inference' table displayed in Output 45.2.3 shows a significant p-value for the null hypothesis that the population means are all equal to zero.

Output 45.2.3: Multiple Imputation Multivariate Inference

The MIANALYZE Procedure

Multiple Imputation Multivariate Inference
Assuming Proportionality of Between/Within Covariance Matrices

Avg Relative
    Increase                              F for H0:
 in Variance   Num DF   Den DF   Parameter=Theta0   Pr > F
    0.292237        3   122.68            12519.7   <.0001
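The printed values suggest how the proportionality assumption enters the total covariance matrix. The Python sketch below is a numerical observation on Output 45.2.2 and Output 45.2.3 rather than a statement of the procedure's internals: it assumes that the total covariance matrix equals (1 + r) times the within-imputation covariance matrix, where r is the average relative increase in variance, and checks that assumption on the diagonal elements.

```python
# Average relative increase in variance, copied from Output 45.2.3.
avg_rel_increase = 0.292237

# Diagonal of the within-imputation covariance matrix (Output 45.2.2).
within_diag = {
    "Oxygen": 0.930852655,
    "RunTime": 0.073141598,
    "RunPulse": 3.114441784,
}

# Assumed relationship under proportionality: T = (1 + r) * W.
total_diag = {name: (1 + avg_rel_increase) * w for name, w in within_diag.items()}

for name, t in total_diag.items():
    print(f"{name:9s} {t:.6f}")
```

The three values agree with the diagonal of the Total Covariance Matrix in Output 45.2.2 to about five decimal places, which is the precision of the printed average relative increase.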

Example 45.3. Reading Regression Results from a DATA= EST Data Set

This example creates an EST type data set that contains regression coefficients and their corresponding covariance matrices computed from imputed data sets. These estimates are then combined to generate valid statistical inferences about the regression model.

The following statements use the REG procedure to generate regression coefficients:

proc reg data=outmi outest=outreg covout noprint;
   model Oxygen= RunTime RunPulse;
   by _Imputation_;
run;

The following statements display regression coefficients and their covariance matrices from the first two imputed data sets in Output 45.3.1.

proc print data=outreg(obs=8);
   var _Imputation_ _Type_ _Name_
       Intercept RunTime RunPulse;
   title 'REG Model Coefficients and Covariance matrices'
         ' (First Two Imputations)';
run;

Output 45.3.1: EST Type Data Set

REG Model Coefficients and Covariance matrices (First Two Imputations)

Obs  _Imputation_  _TYPE_  _NAME_      Intercept    RunTime   RunPulse
  1       1        PARMS                  86.544   -2.82231   -0.05873
  2       1        COV     Intercept     100.145   -0.53519   -0.55077
  3       1        COV     RunTime        -0.535    0.10774   -0.00345
  4       1        COV     RunPulse       -0.551   -0.00345    0.00343
  5       2        PARMS                  83.021   -3.00023   -0.02491
  6       2        COV     Intercept      79.032   -0.66765   -0.41918
  7       2        COV     RunTime        -0.668    0.11456   -0.00313
  8       2        COV     RunPulse       -0.419   -0.00313    0.00264

The following statements combine the results for the imputed data sets. The EDF= option is specified to request that the adjusted degrees of freedom be used in the analysis. For a regression model with three parameters (the intercept and two slopes) and 31 observations, the complete-data error degrees of freedom is 31 - 3 = 28.

proc mianalyze data=outreg edf=28;
   modeleffects Intercept RunTime RunPulse;
run;

Output 45.3.2: Multiple Imputation Variance Information

The MIANALYZE Procedure

Multiple Imputation Variance Information

             -----------------Variance-----------------
Parameter       Between        Within         Total        DF
Intercept     45.529229     76.543614    131.178689    9.1917
RunTime        0.019390      0.106220      0.129487    18.311
RunPulse       0.001007      0.002537      0.003746    12.137

Multiple Imputation Variance Information

               Relative      Fraction
               Increase       Missing      Relative
Parameter   in Variance   Information    Efficiency
Intercept      0.713777      0.461277      0.915537
RunTime        0.219051      0.192620      0.962905
RunPulse       0.476384      0.355376      0.933641

The 'Multiple Imputation Variance Information' table shown in Output 45.3.2 displays the between-imputation, within-imputation, and total variances for combining complete-data inferences.

Output 45.3.3: Multiple Imputation Parameter Estimates

The MIANALYZE Procedure

Multiple Imputation Parameter Estimates

Parameter     Estimate   Std Error   95% Confidence Limits       DF
Intercept    90.837440   11.453327   65.01034    116.6645    9.1917
RunTime      -3.032870    0.359844   -3.78795     -2.2778    18.311
RunPulse     -0.068578    0.061204   -0.20176      0.0646    12.137

Multiple Imputation Parameter Estimates

Parameter      Minimum      Maximum
Intercept    83.020730   100.839807
RunTime      -3.204426    -2.822311
RunPulse     -0.112840    -0.024910

Multiple Imputation Parameter Estimates

                          t for H0:
Parameter   Theta0   Parameter=Theta0   Pr > |t|
Intercept        0               7.93     <.0001
RunTime          0              -8.43     <.0001
RunPulse         0              -1.12     0.2842

The 'Multiple Imputation Parameter Estimates' table shown in Output 45.3.3 displays the combined estimates and standard errors of the regression coefficients. The inferences are based on the t distribution. The table also displays a 95% confidence interval and a t test with the associated p-value for the hypothesis that each regression coefficient is equal to zero. Since the p-value for RunPulse is 0.2842, this variable can be removed from the regression model.

Example 45.4. Reading Mixed Model Results from PARMS= and COVB= Data Sets

This example creates data sets containing parameter estimates and covariance matrices computed by a mixed model analysis for a set of imputed data sets. These estimates are then combined to generate valid statistical inferences about the parameters.

The following PROC MIXED statements generate the fixed-effect parameter estimates and covariance matrix for each imputed data set:

proc mixed data=outmi;
   model Oxygen= RunTime RunPulse RunTime*RunPulse/solution covb;
   by _Imputation_;
   ods output SolutionF=mixparms CovB=mixcovb;
run;

The following statements display parameter estimates from the first two imputed data sets in Output 45.4.1.

proc print data=mixparms (obs=8);
   var _Imputation_ Effect Estimate StdErr;
   title 'MIXED Model Coefficients (First Two Imputations)';
run;

Output 45.4.1: PROC MIXED Model Coefficients

MIXED Model Coefficients (First Two Imputations)

Obs  _Imputation_  Effect              Estimate     StdErr
  1       1        Intercept             148.09    81.5231
  2       1        RunTime              -8.8115     7.8794
  3       1        RunPulse             -0.4123     0.4684
  4       1        RunTime*RunPulse     0.03437    0.04517
  5       2        Intercept            64.3607    64.6034
  6       2        RunTime              -1.1270     6.4307
  7       2        RunPulse            -0.08160     0.3688
  8       2        RunTime*RunPulse     0.01069    0.03664

The following statements display the covariance matrices associated with the parameter estimates from the first two imputed data sets in Output 45.4.2. Note that the variables Col1, Col2, Col3, and Col4 are used to identify the effects Intercept, RunTime, RunPulse, and RunTime*RunPulse through the variable Row.

proc print data=mixcovb (obs=8);
   var _Imputation_ Row Effect Col1 Col2 Col3 Col4;
   title 'Covariance Matrices (First Two Imputations)';
run;

Output 45.4.2: PROC MIXED Covariance Matrices

Covariance Matrices (First Two Imputations)

Obs  _Imputation_  Row  Effect                 Col1      Col2      Col3       Col4
  1       1          1  Intercept           6646.01    637.40   38.1515     3.6542
  2       1          2  RunTime              637.40   62.0842    3.6548     0.3556
  3       1          3  RunPulse            38.1515    3.6548    0.2194    0.02099
  4       1          4  RunTime*RunPulse     3.6542    0.3556   0.02099   0.002040
  5       2          1  Intercept           4173.59    411.46   23.7889     2.3441
  6       2          2  RunTime              411.46   41.3545    2.3414     0.2353
  7       2          3  RunPulse            23.7889    2.3414    0.1360    0.01338
  8       2          4  RunTime*RunPulse     2.3441    0.2353   0.01338   0.001343
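The standard errors in Output 45.4.1 and the covariance matrices in Output 45.4.2 carry overlapping information: each standard error is the square root of the corresponding diagonal element of the covariance matrix. The Python sketch below checks this for the first imputation, using the printed (rounded) values; the dictionary names are illustrative.

```python
import math

# Diagonal of the imputation 1 covariance matrix (Output 45.4.2).
covb_diag = {
    "Intercept": 6646.01,
    "RunTime": 62.0842,
    "RunPulse": 0.2194,
    "RunTime*RunPulse": 0.002040,
}

# Standard errors reported for imputation 1 (Output 45.4.1).
std_err = {
    "Intercept": 81.5231,
    "RunTime": 7.8794,
    "RunPulse": 0.4684,
    "RunTime*RunPulse": 0.04517,
}

for effect, variance in covb_diag.items():
    derived = math.sqrt(variance)
    print(f"{effect:17s} sqrt(diag)={derived:.5f}  StdErr={std_err[effect]:.5f}")
```

This is why a PARMS= data set with standard errors is sufficient for univariate inference, while the COVB= data set is needed only when the off-diagonal covariances matter.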

For univariate inference, only parameter estimates and their associated standard errors are needed. The following statements use the MIANALYZE procedure with the input PARMS= data set to produce univariate results.

proc mianalyze parms=mixparms edf=28;
   modeleffects Intercept RunTime RunPulse RunTime*RunPulse;
run;

Output 45.4.3: Multiple Imputation Variance Information

The MIANALYZE Procedure

Multiple Imputation Variance Information

                     -----------------Variance-----------------
Parameter                Between         Within          Total       DF
Intercept            1972.654530    4771.948777    7139.134213    11.82
RunTime                14.712602      45.549686      63.204808   13.797
RunPulse                0.062941       0.156717       0.232247   12.046
RunTime*RunPulse        0.000470       0.001490       0.002055   13.983

Multiple Imputation Variance Information

                       Relative      Fraction
                       Increase       Missing      Relative
Parameter           in Variance   Information    Efficiency
Intercept              0.496063      0.365524      0.931875
RunTime                0.387601      0.305893      0.942348
RunPulse               0.481948      0.358274      0.933136
RunTime*RunPulse       0.378863      0.300674      0.943276

The 'Multiple Imputation Variance Information' table shown in Output 45.4.3 displays the between-imputation, within-imputation, and total variances for combining complete-data inferences.

Output 45.4.4: Multiple Imputation Parameter Estimates

The MIANALYZE Procedure

Multiple Imputation Parameter Estimates

Parameter              Estimate   Std Error   95% Confidence Limits       DF
Intercept            136.071356   84.493397   -48.3352    320.4779     11.82
RunTime               -7.457186    7.950145   -24.5322      9.6178    13.797
RunPulse              -0.328104    0.481920    -1.3777      0.7215    12.046
RunTime*RunPulse       0.025364    0.045328    -0.0719      0.1226    13.983

Multiple Imputation Parameter Estimates

Parameter               Minimum      Maximum
Intercept             64.360719   186.549814
RunTime              -11.514341    -1.127010
RunPulse              -0.602162    -0.081597
RunTime*RunPulse       0.010690     0.047429

Multiple Imputation Parameter Estimates

                                  t for H0:
Parameter           Theta0   Parameter=Theta0   Pr > |t|
Intercept                0               1.61     0.1337
RunTime                  0              -0.94     0.3644
RunPulse                 0              -0.68     0.5089
RunTime*RunPulse         0               0.56     0.5846

The 'Multiple Imputation Parameter Estimates' table shown in Output 45.4.4 displays the estimated mean and standard error of the regression coefficients.

Since each covariance matrix contains the variables Row, Col1, Col2, Col3, and Col4 for parameters, the EFFECTVAR=ROWCOL option is needed when specifying the COVB= option. The following statements illustrate the use of the MIANALYZE procedure with input PARMS= and COVB(EFFECTVAR=ROWCOL)= data sets:

proc mianalyze parms=mixparms edf=28
               covb(effectvar=rowcol)=mixcovb;
   modeleffects Intercept RunTime RunPulse RunTime*RunPulse;
run;

Example 45.5. Reading Generalized Linear Model Results from PARMS=, PARMINFO=, and COVB= Data Sets

This example creates data sets containing parameter estimates and corresponding covariance matrices computed by a generalized linear model analysis for a set of imputed data sets. These estimates are then combined to generate valid statistical inferences about the model parameters.

The following statements use PROC GENMOD to generate the parameter estimates and covariance matrix for each imputed data set:

proc genmod data=outmi;
   model Oxygen= RunTime RunPulse/covb;
   by _Imputation_;
   ods output ParameterEstimates=gmparms
              ParmInfo=gmpinfo
              CovB=gmcovb;
run;

The following statements print parameter estimates and covariance matrix from the first two imputed data sets in Output 45.5.1.

proc print data=gmparms (obs=8);
   var _Imputation_ Parameter Estimate StdErr;
   title 'GENMOD Model Coefficients (First Two Imputations)';
run;

Output 45.5.1: PROC GENMOD Model Coefficients

GENMOD Model Coefficients (First Two Imputations)

Obs  _Imputation_  Parameter   Estimate   StdErr
  1       1        Intercept    86.5440   9.5107
  2       1        RunTime      -2.8223   0.3120
  3       1        RunPulse     -0.0587   0.0556
  4       1        Scale         2.6692   0.3390
  5       2        Intercept    83.0207   8.4489
  6       2        RunTime      -3.0002   0.3217
  7       2        RunPulse     -0.0249   0.0488
  8       2        Scale         2.5727   0.3267

The following statements display the parameter information table in Output 45.5.2. The table identifies parameter names used in the covariance matrices. The parameters Prm1, Prm2, and Prm3 are used for the effects Intercept, RunTime, and RunPulse in each covariance matrix.

proc print data=gmpinfo (obs=6);
   title 'GENMOD Parameter Information (First Two Imputations)';
run;

Output 45.5.2: PROC GENMOD Model Information

GENMOD Parameter Information (First Two Imputations)

Obs  _Imputation_  Parameter  Effect
  1       1        Prm1       Intercept
  2       1        Prm2       RunTime
  3       1        Prm3       RunPulse
  4       2        Prm1       Intercept
  5       2        Prm2       RunTime
  6       2        Prm3       RunPulse

The following statements display the covariance matrices from the first two imputed data sets in Output 45.5.3. Note that the GENMOD procedure computes maximum likelihood estimates for each covariance matrix.

proc print data=gmcovb (obs=8);
   var _Imputation_ RowName Prm1 Prm2 Prm3;
   title 'GENMOD Covariance Matrices (First Two Imputations)';
run;

Output 45.5.3: PROC GENMOD Covariance Matrices

GENMOD Covariance Matrices (First Two Imputations)

Obs  _Imputation_  RowName        Prm1        Prm2        Prm3
  1       1        Prm1      90.453923   -0.483394   -0.497473
  2       1        Prm2      -0.483394   0.0973159   -0.003113
  3       1        Prm3      -0.497473   -0.003113   0.0030954
  4       1        Scale     2.765E-17    3.05E-17   2.759E-18
  5       2        Prm1      71.383332   -0.603037   -0.378616
  6       2        Prm2      -0.603037   0.1034766   -0.002826
  7       2        Prm3      -0.378616   -0.002826   0.0023843
  8       2        Scale     1.132E-14   2.181E-16    7.62E-17

The following statements use the MIANALYZE procedure with input PARMS=, PARMINFO=, and COVB= data sets:

proc mianalyze parms=gmparms covb=gmcovb parminfo=gmpinfo;
   modeleffects Intercept RunTime RunPulse;
run;

Since the GENMOD procedure computes maximum likelihood estimates for the covariance matrix, the EDF= option is not used. The resulting model coefficients are identical to the estimates in Output 45.3.3 in Example 45.3, but the standard errors are slightly different. In this example, maximum likelihood estimates of the standard errors are combined without the EDF= option, whereas in Example 45.3, unbiased estimates of the standard errors are combined with the EDF= option.
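The size of the difference can be illustrated with the first imputation. The Python sketch below assumes the usual relationship for a normal linear model, in which the maximum likelihood variance estimate uses the divisor n = 31 while the unbiased estimate uses the error degrees of freedom n - p = 28, so the two covariance matrices differ by the factor 28/31. It rescales the diagonal of the PROC REG covariance matrix in Output 45.3.1 and compares the result with the PROC GENMOD values in Output 45.5.3 and Output 45.5.1. The names used are illustrative.

```python
import math

n_obs = 31   # observations per imputed data set
edf = 28     # error degrees of freedom: 31 observations minus 3 parameters

# Unbiased variances of the coefficients, imputation 1 (Output 45.3.1).
unbiased_var = {"Intercept": 100.145, "RunTime": 0.10774, "RunPulse": 0.00343}

# Assumed rescaling from the unbiased to the maximum likelihood estimate.
ml_var = {name: v * edf / n_obs for name, v in unbiased_var.items()}
ml_se = {name: math.sqrt(v) for name, v in ml_var.items()}

for name in unbiased_var:
    print(f"{name:9s} ml_var={ml_var[name]:.6f}  ml_se={ml_se[name]:.4f}")
```

Up to rounding in the printed inputs, the rescaled variances match the diagonal of the first GENMOD covariance matrix, and their square roots match the GENMOD standard errors.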

Example 45.6. Reading GLM Results from PARMS= and XPXI= Data Sets

This example creates data sets containing parameter estimates and corresponding $(X'X)^{-1}$ matrices computed by a general linear model analysis for a set of imputed data sets. These estimates are then combined to generate valid statistical inferences about the model parameters.

The following statements use PROC GLM to generate the parameter estimates and $(X'X)^{-1}$ matrix for each imputed data set:

proc glm data=outmi;
   model Oxygen= RunTime RunPulse/inverse;
   by _Imputation_;
   ods output ParameterEstimates=glmparms
              InvXPX=glmxpxi;
quit;

The following statements display parameter estimates and standard errors from the first two imputed data sets in Output 45.6.1.

proc print data=glmparms (obs=6);
   var _Imputation_ Parameter Estimate StdErr;
   title 'GLM Model Coefficients (First Two Imputations)';
run;

Output 45.6.1: PROC GLM Model Coefficients

GLM Model Coefficients (First Two Imputations)

Obs  _Imputation_  Parameter     Estimate        StdErr
  1       1        Intercept   86.5440339   10.00726811
  2       1        RunTime     -2.8223108    0.32824165
  3       1        RunPulse    -0.0587292    0.05854109
  4       2        Intercept   83.0207303    8.88996885
  5       2        RunTime     -3.0002288    0.33847204
  6       2        RunPulse    -0.0249103    0.05137859

The following statements display $(X'X)^{-1}$ matrices from the first two imputed data sets in Output 45.6.2.

proc print data=glmxpxi (obs=8);
   var _Imputation_ Parameter Intercept RunTime RunPulse;
   title 'GLM X''X Inverse Matrices (First Two Imputations)';
run;

Output 45.6.2: PROC GLM $(X'X)^{-1}$ Matrices

GLM X'X Inverse Matrices (First Two Imputations)

Obs  _Imputation_  Parameter      Intercept         RunTime        RunPulse
  1       1        Intercept   12.696250656    -0.067849956    -0.069826009
  2       1        RunTime     -0.067849956    0.0136594055    -0.000436938
  3       1        RunPulse    -0.069826009    -0.000436938    0.0004344762
  4       1        Oxygen      86.544033929    -2.822310769    -0.058729234
  5       2        Intercept   10.784620785    -0.091107072    -0.057201387
  6       2        RunTime     -0.091107072    0.0156332765    -0.000426902
  7       2        RunPulse    -0.057201387    -0.000426902    0.0003602208
  8       2        Oxygen      83.020730343    -3.000228818    -0.024910305

The standard errors for the estimates in the output glmparms data set are needed to create the covariance matrix from the $(X'X)^{-1}$ matrix. The following statements use the MIANALYZE procedure with input PARMS= and XPXI= data sets to produce the same results as displayed in Output 45.3.2 and Output 45.3.3 in Example 45.3:

proc mianalyze parms=glmparms xpxi=glmxpxi edf=28;
   modeleffects Intercept RunTime RunPulse;
run;
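One way to see why the standard errors are needed is that the inverse cross-product matrix determines the covariance matrix only up to the error variance. The Python sketch below assumes the standard linear-model identity that the covariance matrix of the estimates is the error variance times the inverse cross-product matrix; it recovers the error variance for the first imputation from each standard error in Output 45.6.1 and the matching diagonal element in Output 45.6.2, then rebuilds the diagonal of the covariance matrix. This is an illustration of the idea, not a description of how PROC MIANALYZE is implemented, and the names are illustrative.

```python
# Imputation 1 standard errors (Output 45.6.1).
std_err = {"Intercept": 10.00726811, "RunTime": 0.32824165, "RunPulse": 0.05854109}

# Imputation 1 diagonal of the inverse cross-product matrix (Output 45.6.2).
xpxi_diag = {"Intercept": 12.696250656, "RunTime": 0.0136594055,
             "RunPulse": 0.0004344762}

# Each parameter gives its own estimate of the error variance; they should agree.
s2_by_param = {name: std_err[name] ** 2 / xpxi_diag[name] for name in std_err}
s2 = sum(s2_by_param.values()) / len(s2_by_param)

# Rebuild the variances of the estimates from the inverse matrix.
cov_diag = {name: s2 * d for name, d in xpxi_diag.items()}

print("error variance estimates:", {k: round(v, 5) for k, v in s2_by_param.items()})
print("rebuilt variances:", {k: round(v, 5) for k, v in cov_diag.items()})
```

The rebuilt variances agree with the diagonal of the first PROC REG covariance matrix in Output 45.3.1, which is consistent with the statement that this example reproduces the results of Example 45.3.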

Example 45.7. Reading Logistic Model Results from PARMS= and COVB= Data Sets

This example creates data sets containing parameter estimates and corresponding covariance matrices computed by a logistic regression analysis for a set of imputed data sets. These estimates are then combined to generate valid statistical inferences about the model parameters.

The following statements use PROC LOGISTIC to generate the parameter estimates and covariance matrix for each imputed data set.

proc logistic data=outfish;
   class Species;
   model Species= Height Width Height*Width/ covb;
   by _Imputation_;
   ods output ParameterEstimates=lgsparms
              CovB=lgscovb;
run;

The following statements display the logistic regression coefficients from the first two imputations in Output 45.7.1.

proc print data=lgsparms (obs=8);
   title 'LOGISTIC Model Coefficients (First Two Imputations)';
run;

Output 45.7.1: PROC LOGISTIC Model Coefficients

LOGISTIC Model Coefficients (First Two Imputations)

Obs  _Imputation_  Variable       DF   Estimate   StdErr   WaldChiSq   ProbChiSq
  1       1        Intercept       1    -4.2188   7.8679      0.2875      0.5918
  2       1        Height          1     2.4568   1.0579      5.3929      0.0202
  3       1        Width           1    -3.3480   2.8541      1.3761      0.2408
  4       1        Height*Width    1    -0.1331   0.1441      0.8527      0.3558
  5       2        Intercept       1   -10.9235   9.1880      1.4135      0.2345
  6       2        Height          1     3.1578   1.5208      4.3116      0.0379
  7       2        Width           1    -1.7683   2.9749      0.3533      0.5522
  8       2        Height*Width    1    -0.2714   0.1892      2.0575      0.1515

The following statements display the covariance matrices associated with parameter estimates from the first two imputations in Output 45.7.2.

proc print data=lgscovb (obs=8);
   title 'LOGISTIC Model Covariance Matrices (First Two Imputations)';
run;

Output 45.7.2: PROC LOGISTIC Covariance Matrices

LOGISTIC Model Covariance Matrices (First Two Imputations)

Obs  _Imputation_  Parameter     Intercept     Height      Width   HeightWidth
  1       1        Intercept      61.90439    2.39611    18.8182      0.923732
  2       1        Height          2.39611   1.119218    0.76837       0.11322
  3       1        Width           18.8182    0.76837   8.145619       0.18386
  4       1        HeightWidth    0.923732    0.11322    0.18386      0.020762
  5       2        Intercept      84.41847    5.94636    20.9352      1.389396
  6       2        Height          5.94636   2.312748    1.08263       0.24839
  7       2        Width           20.9352    1.08263   8.849757        0.1547
  8       2        HeightWidth    1.389396    0.24839     0.1547      0.035796

The following statements use the MIANALYZE procedure with input PARMS= and COVB= data sets.

proc mianalyze parms=lgsparms
               covb(effectvar=stacking)=lgscovb;
   modeleffects Intercept Height Width Height*Width;
run;

Output 45.7.3: Multiple Imputation Variance Information

The MIANALYZE Procedure

Multiple Imputation Variance Information

                 -----------------Variance-----------------
Parameter           Between        Within         Total        DF
Intercept         15.218807     70.592292     88.854861    94.689
Height             0.181361      1.626663      1.844296    287.26
Width              0.804258      8.428402      9.393511    378.93
Height*Width       0.006765      0.026888      0.035006     74.37

Multiple Imputation Variance Information

                   Relative      Fraction
                   Increase       Missing      Relative
Parameter       in Variance   Information    Efficiency
Intercept          0.258705      0.221798      0.957525
Height             0.133791      0.124081      0.975785
Width              0.114507      0.107441      0.978964
Height*Width       0.301942      0.251772      0.952060

The 'Multiple Imputation Variance Information' table shown in Output 45.7.3 displays the between-imputation, within-imputation, and total variances for combining complete-data inferences.

Output 45.7.4: Multiple Imputation Parameter Estimates

The MIANALYZE Procedure

Multiple Imputation Parameter Estimates

Parameter         Estimate   Std Error   95% Confidence Limits       DF
Intercept        -7.085702    9.426286   -25.8000    11.62863    94.689
Height            2.757779    1.358049     0.0848     5.43077    287.26
Width            -2.678006    3.064884    -8.7043     3.34830    378.93
Height*Width     -0.191947    0.187099    -0.5647     0.18083     74.37

Multiple Imputation Parameter Estimates

Parameter          Minimum     Maximum
Intercept       -11.769173   -4.203658
Height            2.439954    3.285454
Width            -3.349258   -1.626538
Height*Width     -0.291998   -0.131535

Multiple Imputation Parameter Estimates

                              t for H0:
Parameter       Theta0   Parameter=Theta0   Pr > |t|
Intercept            0              -0.75     0.4541
Height               0               2.03     0.0432
Width                0              -0.87     0.3828
Height*Width         0              -1.03     0.3083

The 'Multiple Imputation Parameter Estimates' table shown in Output 45.7.4 displays the combined parameter estimates with associated standard errors.

Example 45.8. Reading Mixed Model Results with CLASS Variables

This example creates data sets containing parameter estimates and corresponding covariance matrices with CLASS variables computed by a mixed regression model analysis for a set of imputed data sets. These estimates are then combined to generate valid statistical inferences about the model parameters.

The following statements use PROC MIXED to generate the parameter estimates and covariance matrix for each imputed data set:

proc mixed data=outfish;
   class Species;
   model Length3= Species Height Width/ solution covb;
   by _Imputation_;
   ods output SolutionF=mxparms CovB=mxcovb;
run;

The following statements display the mixed model coefficients from the first two imputations in Output 45.8.1.

proc print data=mxparms (obs=10);
   var _Imputation_ Effect Species Estimate StdErr;
   title 'MIXED Model Coefficients (First Two Imputations)';
run;

Output 45.8.1: PROC MIXED Model Coefficients

MIXED Model Coefficients (First Two Imputations)

Obs  _Imputation_  Effect      Species   Estimate   StdErr
  1       1        Intercept               6.8381   1.0290
  2       1        Species     Gp1       -0.05924   0.7253
  3       1        Species     Gp2              0        .
  4       1        Height                  0.9185   0.1732
  5       1        Width                   3.2526   0.5321
  6       2        Intercept               6.9417   0.9868
  7       2        Species     Gp1        -0.3178   0.7290
  8       2        Species     Gp2              0        .
  9       2        Height                  0.9544   0.1683
 10       2        Width                   3.1697   0.5079

The following statements use the MIANALYZE procedure with the input PARMS= data set.

proc mianalyze parms(classvar=full)=mxparms;
   class Species;
   modeleffects Intercept Species Height Width;
run;

Output 45.8.2: Multiple Imputation Variance Information

The MIANALYZE Procedure

Multiple Imputation Variance Information

                        -----------------Variance-----------------
Parameter   Species        Between        Within         Total        DF
Intercept                 0.013257      1.017462      1.033370     16879
Species     Gp1           0.068045      0.519627      0.601281     216.9
Species     Gp2                  0             .             .         .
Height                    0.002691      0.028993      0.032222    398.26
Width                     0.014947      0.270396      0.288332    1033.6

Multiple Imputation Variance Information

                          Relative      Fraction
                          Increase       Missing      Relative
Parameter   Species    in Variance   Information    Efficiency
Intercept                 0.015635      0.015511      0.996907
Species     Gp1           0.157139      0.143659      0.972071
Species     Gp2                  .             .             .
Height                    0.111380      0.104703      0.979489
Width                     0.066334      0.064017      0.987358

The 'Multiple Imputation Variance Information' table shown in Output 45.8.2 displays the between-imputation, within-imputation, and total variances for combining complete-data inferences.

Output 45.8.3: Multiple Imputation Parameter Estimates

                      The MIANALYZE Procedure

              Multiple Imputation Parameter Estimates

   Parameter   Species     Estimate   Std Error    95% Confidence Limits        DF
   Intercept               6.844098    1.016548     4.85156     8.836638     16879
   Species     Gp1        -0.184298    0.775423    -1.71263     1.344030     216.9
   Species     Gp2                0     .            .           .            .
   Height                  0.928624    0.179506     0.57573     1.281522    398.26
   Width                   3.237105    0.536966     2.18344     4.290772    1033.6

              Multiple Imputation Parameter Estimates

   Parameter   Species      Minimum      Maximum
   Intercept               6.713049     6.976758
   Species     Gp1        -0.580012     0.033160
   Species     Gp2                0            0
   Height                  0.879314     1.004623
   Width                   3.064954     3.360809

              Multiple Imputation Parameter Estimates

                                            t for H0:
   Parameter   Species     Theta0    Parameter=Theta0    Pr > |t|
   Intercept                    0                6.73      <.0001
   Species     Gp1              0               -0.24      0.8124
   Species     Gp2              0                 .         .
   Height                       0                5.17      <.0001
   Width                        0                6.03      <.0001

The 'Multiple Imputation Parameter Estimates' table shown in Output 45.8.3 displays the combined parameter estimates with associated standard errors.
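As a check on how the columns relate, the standard error is the square root of the total variance, and the t statistic is the estimate minus Theta0, divided by the standard error. The following Python sketch (outside SAS, for illustration only) applies this to the estimates in Output 45.8.3 and the total variances in Output 45.8.2. The dictionary layout and variable names are assumptions made here.

```python
import math

# (estimate, total variance) per parameter, copied from Outputs 45.8.3 and 45.8.2.
# The Species Gp1 estimate is negative.
rows = {
    "Intercept": (6.844098, 1.033370),
    "Species Gp1": (-0.184298, 0.601281),
    "Height": (0.928624, 0.032222),
    "Width": (3.237105, 0.288332),
}

theta0 = 0.0  # null value shown in the Theta0 column
results = {}
for name, (estimate, total_var) in rows.items():
    std_err = math.sqrt(total_var)            # standard error of the combined estimate
    t_stat = (estimate - theta0) / std_err    # t statistic for H0: parameter = theta0
    results[name] = (std_err, t_stat)
    print(f"{name:12s} se={std_err:.6f} t={t_stat:.2f}")
```

The confidence limits additionally require a t quantile with the tabulated degrees of freedom, which this sketch does not compute.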

Example 45.9. Using a TEST statement

The following statements use the REG procedure to generate regression coefficients:

   proc reg data=outmi outest=outreg covout noprint;
      model Oxygen= RunTime RunPulse;
      by _Imputation_;
   run;

The following statements combine the results for the imputed data sets. A TEST statement is used to test the linear hypotheses Intercept=0 and RunTime=RunPulse. The MULT option requests a multivariate inference for the two hypotheses jointly.

   proc mianalyze data=outreg edf=28;
      modeleffects Intercept RunTime RunPulse;
      test Intercept, RunTime=RunPulse / mult;
   run;

Output 45.9.1: Test Specification

                      The MIANALYZE Procedure

                           Test: Test 1

                        Test Specification

                ------------------L Matrix------------------
   Parameter      Intercept       RunTime      RunPulse           C
   TestPrm1        1.000000             0             0           0
   TestPrm2               0      1.000000     -1.000000           0

The 'Test Specification' table shown in Output 45.9.1 displays the L matrix and the c vector in a TEST statement. Since there is no label specified for the TEST statement, 'Test 1' is used as the label.
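To illustrate how the rows of the L matrix turn regression coefficients into the test parameters, the following Python sketch (outside SAS) multiplies L by a coefficient vector. The coefficient values are hypothetical numbers chosen only for illustration and are not results from this example. The variable names are likewise assumptions.

```python
# L matrix and c vector as displayed in Output 45.9.1.
L = [
    [1.0, 0.0, 0.0],    # TestPrm1: Intercept
    [0.0, 1.0, -1.0],   # TestPrm2: RunTime - RunPulse
]
c = [0.0, 0.0]

# HYPOTHETICAL coefficients (Intercept, RunTime, RunPulse) for one imputed
# data set; made-up values, not taken from the example.
beta = [90.0, -3.0, -0.5]

# Each test parameter is one row of L applied to the coefficient vector.
test_prm = [sum(l_ij * b_j for l_ij, b_j in zip(row, beta)) for row in L]

# The hypothesis for each row is that the test parameter equals its c value.
differences = [t - c_i for t, c_i in zip(test_prm, c)]
print(test_prm, differences)
```

With these made-up coefficients, TestPrm1 is simply the intercept and TestPrm2 is the difference between the RunTime and RunPulse coefficients, which is what the hypothesis RunTime=RunPulse sets to zero.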

Output 45.9.2: Multiple Imputation Variance Information

                      The MIANALYZE Procedure

                           Test: Test 1

              Multiple Imputation Variance Information

                -----------------Variance-----------------
   Parameter       Between        Within         Total         DF
   TestPrm1      45.529229     76.543614    131.178689     9.1917
   TestPrm2       0.014715      0.114324      0.131983     20.598

              Multiple Imputation Variance Information

                    Relative       Fraction
                    Increase        Missing      Relative
   Parameter     in Variance    Information    Efficiency
   TestPrm1         0.713777       0.461277      0.915537
   TestPrm2         0.154459       0.141444      0.972490

The 'Multiple Imputation Variance Information' table shown in Output 45.9.2 displays the between-imputation variance, within-imputation variance, and total variance for each univariate inference. A detailed description of these statistics is provided in the 'Combining Inferences from Imputed Data Sets' section on page 2624 and the 'Multiple Imputation Efficiency' section on page 2626.
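Because EDF=28 is specified, the DF column is smaller than the large-sample degrees of freedom. The following Python sketch (outside SAS, for illustration only) applies the small-sample adjustment of Barnard and Rubin to the Between and Within values in Output 45.9.2. Treating this as the adjustment behind the DF column is an assumption of the sketch, supported only by the fact that it matches the printed values. The function and variable names are illustrative.

```python
def adjusted_df(between, within, m, edf):
    """Degrees of freedom adjusted for a finite complete-data df (edf)."""
    total = within + (1 + 1 / m) * between
    r = (1 + 1 / m) * between / within
    df_m = (m - 1) * (1 + 1 / r) ** 2                    # large-sample df
    gamma = (1 + 1 / m) * between / total                # share of variance due to missingness
    df_obs = (1 - gamma) * edf * (edf + 1) / (edf + 3)   # observed-data df
    return 1 / (1 / df_m + 1 / df_obs)                   # combine the two df values

# Between and Within from Output 45.9.2, with m = 5 imputations and EDF=28.
df_testprm1 = adjusted_df(45.529229, 76.543614, m=5, edf=28)
df_testprm2 = adjusted_df(0.014715, 0.114324, m=5, edf=28)
print(f"{df_testprm1:.4f} {df_testprm2:.3f}")
```

The adjusted value can never exceed the complete-data degrees of freedom, which is why both entries fall below 28.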

Output 45.9.3: Multiple Imputation Parameter Estimates

                      The MIANALYZE Procedure

                           Test: Test 1

              Multiple Imputation Parameter Estimates

   Parameter      Estimate    Std Error    95% Confidence Limits        DF
   TestPrm1      90.837440    11.453327    65.01034     116.6645    9.1917
   TestPrm2      -2.964292     0.363294    -3.72070      -2.2079    20.598

              Multiple Imputation Parameter Estimates

                                                        t for H0:
   Parameter       Minimum       Maximum        C     Parameter=C    Pr > |t|
   TestPrm1      83.020730    100.839807        0            7.93      <.0001
   TestPrm2      -3.091586     -2.763582        0           -8.16      <.0001

The 'Multiple Imputation Parameter Estimates' table shown in Output 45.9.3 displays the estimated mean and standard error of the linear components. The inferences are based on the t distribution. The table also displays a 95% mean confidence interval and a t test with the associated p-value for the hypothesis that each linear component of $L\beta$ is equal to zero.

Output 45.9.4: Multiple Imputation Multivariate Inference

                      The MIANALYZE Procedure

                           Test: Test 1

             Multiple Imputation Multivariate Inference
   Assuming Proportionality of Between/Within Covariance Matrices

   Avg Relative
       Increase                               F for H0:
    in Variance    Num DF    Den DF    Parameter=Theta0    Pr > F
       0.419868         2    35.053               60.34    <.0001

Example 45.10. Combining Correlation Coefficients

This example combines sample correlation coefficients computed from a set of imputed data sets using Fisher's z transformation.

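Fisher's z transformation of the sample correlation r is

\[ z = \frac{1}{2} \log\left( \frac{1+r}{1-r} \right) \]

The statistic z is approximately normally distributed with mean

\[ \frac{1}{2} \log\left( \frac{1+\rho}{1-\rho} \right) \]

and variance $1/(n-3)$, where $\rho$ is the population correlation coefficient and n is the number of observations.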

The following statements use the CORR procedure to compute the correlation r and its associated Fisher's z statistic between variables Oxygen and RunTime for each imputed data set. The ODS statement is used to save Fisher's z statistic in an output data set.

   proc corr data=outmi fisher(biasadj=no);
      var Oxygen RunTime;
      by _Imputation_;
      ods output FisherPearsonCorr= outz;
   run;

The following statements display the number of observations and Fisher's z statistic for each imputed data set in Output 45.10.1.

   proc print data=outz;
      title 'Fisher''s Correlation Statistics';
      var _Imputation_ NObs ZVal;
   run;

Output 45.10.1: Output z Statistics

          Fisher's Correlation Statistics

   Obs    _Imputation_    NObs        ZVal
     1         1           31     -1.27869
     2         2           31     -1.30715
     3         3           31     -1.27922
     4         4           31     -1.39243
     5         5           31     -1.40146

The following statements generate the standard error associated with the z statistic, $1/\sqrt{n-3}$:

   data outz;
      set outz;
      StdZ= 1. / sqrt(NObs-3);
   run;

The following statements use the MIANALYZE procedure to generate a combined parameter estimate and its variance, as shown in Output 45.10.2. The ODS statement is used to save the parameter estimates in an output data set.

   proc mianalyze data=outz;
      ods output ParameterEstimates=parms;
      modeleffects ZVal;
      stderr StdZ;
   run;

Output 45.10.2: Combining Fisher's z statistics

                      The MIANALYZE Procedure

              Multiple Imputation Parameter Estimates

   Parameter      Estimate    Std Error    95% Confidence Limits        DF
   ZVal          -1.331787     0.200327    -1.72587     -0.93771    330.23

              Multiple Imputation Parameter Estimates

   Parameter       Minimum       Maximum
   ZVal          -1.401459     -1.278686

              Multiple Imputation Parameter Estimates

                                   t for H0:
   Parameter     Theta0     Parameter=Theta0    Pr > |t|
   ZVal               0                -6.65      <.0001
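The combined estimate and standard error can be checked by hand from the five ZVal values in Output 45.10.1 and the common standard error $1/\sqrt{n-3}$ with n = 31. The following Python sketch (outside SAS, for illustration only) does this. The z values are entered as negative, consistent with the negative correlation between Oxygen and RunTime, and the variable names are assumptions made here.

```python
import math
from statistics import mean, variance

# ZVal for the five imputations (Output 45.10.1), rounded as printed.
z_values = [-1.27869, -1.30715, -1.27922, -1.39243, -1.40146]
n_obs = 31
m = len(z_values)

estimate = mean(z_values)                 # combined point estimate
between = variance(z_values)              # between-imputation variance (divisor m - 1)
within = 1 / (n_obs - 3)                  # StdZ squared, identical in every imputation
total = within + (1 + 1 / m) * between    # total variance
std_err = math.sqrt(total)

r = (1 + 1 / m) * between / within        # relative increase in variance
df = (m - 1) * (1 + 1 / r) ** 2           # large-sample degrees of freedom

print(f"estimate={estimate:.6f} std_err={std_err:.6f} df={df:.2f}")
```

Since the inputs are rounded to five decimals, the degrees of freedom differ from the printed 330.23 in the second decimal place.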

In addition to the estimate for z, PROC MIANALYZE also generates 95% confidence limits for z, $z_{.025}$ and $z_{.975}$. The following statements print the estimate and 95% confidence limits for z in Output 45.10.3.

   proc print data=parms;
      title 'Parameter Estimates with 95% Confidence Limits';
      var Estimate LCLMean UCLMean;
   run;

Output 45.10.3: Parameter Estimates with 95% Confidence Limits

   Parameter Estimates with 95% Confidence Limits

   Obs     Estimate      LCLMean      UCLMean
     1    -1.331787     -1.72587     -0.93771

An estimate of the correlation coefficient and its 95% confidence limits are then generated from the following inverse transformation, as described in the 'Correlation Coefficients' section on page 2630:

\[ r = \tanh(z) = \frac{e^{2z} - 1}{e^{2z} + 1} \]

for $z = \hat{z}$, $z_{.025}$, and $z_{.975}$.

The following statements generate and display an estimate of the correlation coefficient and its 95% confidence limits.

   data corr_ci;
      set parms;
      r= tanh(Estimate);
      r_lower= tanh(LCLMean);
      r_upper= tanh(UCLMean);
   run;

   proc print data=corr_ci;
      title 'Estimated Correlation Coefficient'
            ' with 95% Confidence Limits';
      var r r_lower r_upper;
   run;

Output 45.10.4: Estimated Correlation Coefficient

   Estimated Correlation Coefficient with 95% Confidence Limits

   Obs         r      r_lower      r_upper
     1    -0.86969    -0.93857     -0.73417
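The back-transformation can be verified independently. The following Python sketch (outside SAS, for illustration only) applies the hyperbolic tangent to the combined z estimate and its confidence limits, entered as negative values, and cross-checks the result against the explicit exponential form. The variable and function names are assumptions made here.

```python
import math

# Combined z estimate and its 95% confidence limits (negative values).
z_hat, z_lower, z_upper = -1.331787, -1.72587, -0.93771

def inverse_fisher(z):
    """Explicit form of tanh(z): (exp(2z) - 1) / (exp(2z) + 1)."""
    return (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)

r_hat = math.tanh(z_hat)
r_lower = math.tanh(z_lower)
r_upper = math.tanh(z_upper)

# The explicit form and math.tanh should agree to floating-point precision.
assert math.isclose(r_hat, inverse_fisher(z_hat), rel_tol=1e-12)

print(f"r={r_hat:.5f} lower={r_lower:.5f} upper={r_upper:.5f}")
```

Because tanh is monotone increasing, the lower and upper limits for z map directly to the lower and upper limits for the correlation.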