Modelling Intervention Effects in Clustered Randomized Pretest/Posttest Studies

Ed Stanek
sea99d5.doc 9/15/99 12:27 PM

Introduction

We consider a study design similar to that of the Well Women Project and discuss analyses of such data. Although the reported analyses are based on categorical variables, we consider continuous variables here and discuss how to formulate the mixed model statements. We do this by simulating data similar to the Well Women Project data and then fitting the simulated data with mixed model statements in SAS.

Initial Simulation

The initial simulation is contained in SNE99p140.sas. The simulation used 30 clusters, each with 30 subjects. Variance was introduced at the cluster, subject, and response error levels (giving rise to 3 variance components). The design has:

  30 clusters, randomly assigned to two treatments
  30 subjects per cluster
  2 measures per subject (Baseline and Year1)

We added a treatment effect at the second time point for the treatment group. The treatment effect is assumed to be constant. The results are given below:

Source:sne99p140.sas SEASON 9/15/99 EJS

The MIXED Procedure

Class Level Information

Class   Levels   Values
C           30   1 2 3 4 5 6 7 8 9 10 11 12 13
ID          30   1 2 3 4 5 6 7 8 9 10 11 12 13
TRT          2   Active Control
T            2   1 2

REML Estimation Iteration History

Iteration   Evaluations      Objective    Criterion
        0             1   10295.409895
        1             1   8425.0573792   0.00000000

Convergence criteria met.

Covariance Parameter Estimates (REML)

Cov Parm       Estimate     Std Error       Z   Pr > |Z|
C           76.81104595   20.77843060    3.70     0.0002
ID(C)       15.88823672    1.46138340   10.87     0.0001
Residual    24.29602420    1.14660027   21.19     0.0001

Observations                      1800.000
Res Log Likelihood                -5862.94
Akaike's Information Criterion    -5865.94
Schwarz's Bayesian Criterion      -5874.18
-2 Res Log Likelihood             11725.88

Solution for Fixed Effects

Effect      TRT       T       Estimate    Std Error    DF       t   Pr > |t|
INTERCEPT                 104.96886454   2.28255006    28   45.99     0.0001
TRT         Active         -1.74794993   3.22801324   898   -0.54     0.5883
TRT         Control         0.00000000            .     .       .          .
T                     1    -0.23578790   0.32860665   898   -0.72     0.4732
T                     2     0.00000000            .     .       .          .
TRT*T       Active    1    -1.55049065   0.46471998   898   -3.34     0.0009
TRT*T       Active    2     0.00000000            .     .       .          .
TRT*T       Control   1     0.00000000            .     .       .          .
TRT*T       Control   2     0.00000000            .     .       .          .

Tests of Fixed Effects

Source   NDF   DDF   Type III F   Pr > F
TRT        1   898         0.61   0.4334
T          1   898        18.93   0.0001
TRT*T      1   898        11.13   0.0009

Least Squares Means

Effect   TRT       T         LSMEAN    Std Error    DF       t   Pr > |t|
TRT*T    Active    1   101.43463605   2.28255006   898   44.44     0.0001
TRT*T    Active    2   103.22091461   2.28255006   898   45.22     0.0001
TRT*T    Control   1   104.73307664   2.28255006   898   45.88     0.0001
TRT*T    Control   2   104.96886454   2.28255006   898   45.99     0.0001

These results agree with the simulation settings: the estimated variance components (76.8, 15.9, and 24.3 for cluster, subject, and residual) are close to the simulated values (64, 16, and 25), and the estimated treatment effect (1.55, from the TRT*T interaction) is close to the simulated value (2).

Alternative Mixed Model Specification

An alternative model specification (using the same simulated data) is the following:

*** Fit mixed model (from SNE99p141.sas) **;
PROC MIXED DATA=d1 covtest;
  by trial;
  CLASS c id trt t;
  MODEL y=trt t t*trt/solution;
  RANDOM int t /SUBJECT=c(trt);            * cluster intercept and cluster-by-time effects;
  REPEATED t/subject=id(c*trt) TYPE=cs;    * compound symmetry over time within subject;
  PARMS /NOBOUND;
  LSMEANS t*trt;
run;

The results are given below:

Source:sne99p141.sas SEASON 9/15/99 EJS

The MIXED Procedure

Class Level Information

Class   Levels   Values
C           30   1 2 3 4 5 6 7 8 9 10 11 12 13
ID          30   1 2 3 4 5 6 7 8 9 10 11 12 13
TRT          2   Active Control
T            2   1 2

REML Estimation Iteration History

Iteration   Evaluations      Objective    Criterion
        0             1   10295.409895
        1             1   8421.0735068   0.00000000

Convergence criteria met.

Covariance Parameter Estimates (REML)

Cov Parm    Subject        Estimate     Std Error       Z   Pr > |Z|
INTERCEPT   C(TRT)      76.99462456   20.77852856    3.71     0.0002
T           C(TRT)      -0.36715722    0.12760703   -2.88     0.0040
CS          ID(C*TRT)   15.71651508    1.46829097   10.70     0.0001
Residual                24.63946747    1.18137204   20.86     0.0001

Observations                      1800.000
Res Log Likelihood                -5860.95
Akaike's Information Criterion    -5864.95
Schwarz's Bayesian Criterion      -5875.94
-2 Res Log Likelihood             11721.90
PARMS Model LRT Chi-Square        1874.336
PARMS Model LRT DF                  3.0000
PARMS Model LRT P-Value             0.0000

Solution for Fixed Effects

Effect      TRT       T       Estimate    Std Error   DF       t   Pr > |t|
INTERCEPT                 104.96886454   2.27995127   28   46.04     0.0001
TRT         Active         -1.74794993   3.22433800   28   -0.54     0.5920
TRT         Control         0.00000000            .    .       .          .
T                     1    -0.23578790   0.24607813   28   -0.96     0.3462
T                     2     0.00000000            .    .       .          .
TRT*T       Active    1    -1.55049065   0.34800704   28   -4.46     0.0001
TRT*T       Active    2     0.00000000            .    .       .          .
TRT*T       Control   1     0.00000000            .    .       .          .
TRT*T       Control   2     0.00000000            .    .       .          .

Tests of Fixed Effects

Source   NDF   DDF   Type III F   Pr > F
TRT        1    28         0.61   0.4398
T          1    28        33.76   0.0001
TRT*T      1    28        19.85   0.0001

Least Squares Means

Effect   TRT       T         LSMEAN    Std Error   DF       t   Pr > |t|
TRT*T    Active    1   101.43463605   2.27995127   28   44.49     0.0001
TRT*T    Active    2   103.22091461   2.27995127   28   45.27     0.0001
TRT*T    Control   1   104.73307664   2.27995127   28   45.94     0.0001
TRT*T    Control   2   104.96886454   2.27995127   28   46.04     0.0001

Note that the estimated group means are identical in the two analyses, but the fitted variance structures are not the same. The likelihood ratio statistic, the difference in -2 Res Log Likelihood between the two fits, is 3.98, which exceeds 3.84, the 5% critical value of a chi-square distribution with 1 degree of freedom. This does not mean much, since the data were simulated. However, it raises the question of which variance structure is most appropriate and should be used in the modelling.
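
The likelihood ratio comparison above can be written out step by step. This is only a sketch under stated assumptions: that the initial model is nested in the alternative one, which adds a single covariance parameter (the T within C(TRT) component) to the three in the initial model, and that the two REML likelihoods are comparable because both fits use the same fixed effects. The symbols \ell_R and \Lambda are notation introduced here for illustration and do not appear in the SAS output.

```latex
\begin{align*}
\Lambda &= \bigl[-2\,\ell_R(\text{initial})\bigr] - \bigl[-2\,\ell_R(\text{alternative})\bigr] \\
        &= 11725.88 - 11721.90 = 3.98, \\
\text{df} &= 4 - 3 = 1, \\
\chi^2_{1,\,0.95} &= 3.84 < 3.98 \qquad (p \approx 0.046).
\end{align*}
```

Because PARMS /NOBOUND lets the added variance component take negative values (its estimate is -0.367), the added parameter is not constrained to a boundary, so the usual 1 degree of freedom chi-square reference seems reasonable here.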

Appendix 1. Programs

OPTIONS LINESIZE=120 PAGESIZE=53 NOCENTER NODATE NONUMBER NOFMTERR;
*****************;
* SEASONS STUDY PROGRAM ;
* PROGRAM NAME LOCATION DATE PROGRAMMER ;
Title1 "Source:sne99p140.sas SEASON 9/15/99 EJS " ;
* ;
* : Simulate repeated measures design similar to Anne ;
*   Stoddard's study on comprehensive disease screenings in health centers;
*****************;
LIBNAME current "j:\projects\seasons\data\current";

PROC FORMAT;
  VALUE trtf 0="Control" 1="Active";
  VALUE tf 1="Baseline" 2="Year1";

DATA d1;
  FORMAT trt trtf.;
  ****************************************;
  *SIMULATE MIXED MODEL Data              ;
  * Design: 30 clusters are randomized    ;
  *         to a control or trt           ;
  *         protocol. Each cluster        ;
  *         consists of 30 subjects       ;
  *         with a baseline and year1     ;
  *         measure                       ;
  ****************************************;
  %LET err_v=25;   *Var for pure error on response for id=i at time t;
  %LET sub_v=16;   *Var between subject mean true parameters within a cluster;
  %LET clus_v=64;  *Var between cluster true parameters within a treatment;
  %LET tmean=100;  *overall average of cluster means at baseline with no treatment;
  %LET treat=2;    *treatment effect for second time point for treatment group;
  %LET nclus=30;   *Number of clusters ;
  %LET nsub=30;    *Number of subjects per cluster;
  %LET nrep=1;     *Number of Reps;
  DO trial=1 to &nrep;
    DO c=1 to &nclus; *Clusters;
      cm=&tmean+rannor(23201)*sqrt(&clus_v);
      IF c LE &nclus/2 THEN trt=0; *Control treatment;
      IF c GT &nclus/2 THEN trt=1; *Active treatment;
      DO id=1 to &nsub; *Subjects in clusters;
        sm=cm + sqrt(&sub_v)*rannor(332321);
        DO t=1 to 2;
          y=sm + trt*(t-1)*&treat + sqrt(&err_v)*rannor(2321); * Subject value;
          OUTPUT;
        END;
      END;
    END;
  END;
  LABEL trial="trial*(trial)" c="cluster*(c)" id="subject*(id)" t="time*(t)"
        cm="cluster*true*mean*(cm)"
        sm="subject*true*mean*(sm)" y="subject*response*(y)";

PROC PRINT;
  TITLE2 "Simulation with Pure Err Var=&err_v";
  TITLE3 " Subject Var=&sub_v " ;
  TITLE4 " Cluster Var=&clus_v";
  TITLE5 " &nclus clusters, &nsub subjects/cluster, and &nrep trials";
  TITLE6 " with overall mean=&tmean, two times, two trts, trt=&treat";
run;

*** Fit mixed model **;
PROC MIXED DATA=d1 covtest;
  by trial;
  CLASS c id trt t;
  MODEL y=trt t t*trt/solution;
  RANDOM c id(c);
  LSMEANS t*trt;
run;
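
The model that the PROC MIXED step above appears to fit can be stated as an equation, which may help when comparing it with the data-generating statements in the DATA step. The notation is an assumption introduced here for illustration: x_c is 1 for clusters in the Active group and 0 for Control, and C_c, S_{i(c)}, and E_{cit} denote the cluster, subject-within-cluster, and residual effects. SAS itself uses a reference-level parameterization (the last level of each class variable is set to zero), so the printed coefficients correspond to a recoding of the fixed effects written below.

```latex
\begin{align*}
y_{cit} &= \mu + \tau\,x_c + \gamma\,(t-1) + \delta\,x_c\,(t-1) + C_c + S_{i(c)} + E_{cit},
  \qquad t = 1, 2, \\
C_c &\sim N(0,\sigma_C^2), \quad S_{i(c)} \sim N(0,\sigma_S^2), \quad E_{cit} \sim N(0,\sigma_E^2).
\end{align*}
% Values set in the simulation program:
%   \mu = 100, \tau = 0, \gamma = 0, \delta = 2,
%   \sigma_C^2 = 64, \sigma_S^2 = 16, \sigma_E^2 = 25.
\begin{align*}
\operatorname{Var}(y_{cit}) &= \sigma_C^2 + \sigma_S^2 + \sigma_E^2 = 64 + 16 + 25 = 105, \\
\operatorname{Cov}(y_{ci1},\, y_{ci2}) &= \sigma_C^2 + \sigma_S^2 = 80, \\
\operatorname{Cov}(y_{cit},\, y_{ci't'}) &= \sigma_C^2 = 64 \quad (i \neq i').
\end{align*}
```

Under these simulated values, two measures on the same subject share both the cluster and subject effects, while measures on different subjects in the same cluster share only the cluster effect.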