Hello, I sincerely apologise in advance for the basic-ness of this question but I am finding it difficult to source answers myself despite reading a number of texts on generalized estimating equations (GEE).
I need help with the syntax I am using for xtgee with respect to my dataset and hypotheses, and then the correct syntax for postestimation commands particularly checking residuals.
Stata version 14.2
Background:
I am running a randomized clinical trial whereby outcomes are measured at 3 time points (baseline:0, 8 weeks:1, 16 weeks:2). There are two groups (intervention:1, waitlist control:2). The intervention takes place between 0 and 8 weeks, then the 16 week timepoint is follow-up/retention. Randomization was stratified by sex (male:1 vs. female:2) and gross motor function level (gmfcs 1 or 2 vs. gmfcs 3) resulting in 4 strata. The main outcome copmp is a continuous variable with a value between 1 and 10, and generally has a negatively skewed distribution at baseline with a more normal distribution at time 1 and 2 for the intervention group.
Here is an example of my data in wide format:
Code:
input int idno float(copmp0 copms0 copmc0 copmp1 copms1 copmc1 copmp2 copms2 copmc2) byte(cohort group sex gmfcs)
1030 3.66667       2 8.66667       .       .       .       .       .       . 2 2 1 3
1727 3.33333 3.66667 8.66667       5 2.33333 8.66667 2.66667 2.66667       7 3 2 1 1
1969 3.33333 4.66667       4 2.33333 8.33333       2 4.66667 5.33333       5 1 2 1 1
2010       3 4.33333       7 1.66667       3 8.33333 1.66667 1.33333 9.66667 5 2 2 3
2199       1 3.66667 8.33333       8       8 7.66667 4.66667 5.33333 7.33333 6 1 2 2
end
Correlations look like this:
Code:
. correlate copmp0 copmp1 copmp2
(obs=30)

             |   copmp0   copmp1   copmp2
-------------+---------------------------
      copmp0 |   1.0000
      copmp1 |   0.0876   1.0000
      copmp2 |   0.1089   0.8455   1.0000

. by group, sort : correlate copmp0 copmp1 copmp2

--------------------------------------------------------------------------------
-> group = 1
(obs=15)

             |   copmp0   copmp1   copmp2
-------------+---------------------------
      copmp0 |   1.0000
      copmp1 |  -0.1714   1.0000
      copmp2 |   0.0080   0.4715   1.0000

--------------------------------------------------------------------------------
-> group = 2
(obs=15)

             |   copmp0   copmp1   copmp2
-------------+---------------------------
      copmp0 |   1.0000
      copmp1 |   0.2052   1.0000
      copmp2 |   0.1160   0.8459   1.0000
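The xtgee call further down uses copmp and time, so the wide layout shown above would first have to be reshaped to long. A minimal sketch of that step, using only the five example rows from this post; the choice of time as the name for the 0/1/2 suffix is taken from the xtgee call, and everything else is an assumption about how the working dataset was built:

```stata
* The five example rows from the post, in wide format
clear
input int idno float(copmp0 copms0 copmc0 copmp1 copms1 copmc1 copmp2 copms2 copmc2) byte(cohort group sex gmfcs)
1030 3.66667 2 8.66667 . . . . . . 2 2 1 3
1727 3.33333 3.66667 8.66667 5 2.33333 8.66667 2.66667 2.66667 7 3 2 1 1
1969 3.33333 4.66667 4 2.33333 8.33333 2 4.66667 5.33333 5 1 2 1 1
2010 3 4.33333 7 1.66667 3 8.33333 1.66667 1.33333 9.66667 5 2 2 3
2199 1 3.66667 8.33333 8 8 7.66667 4.66667 5.33333 7.33333 6 1 2 2
end

* One row per child per time point; the 0/1/2 suffix becomes the variable time
reshape long copmp copms copmc, i(idno) j(time)

* Declare the panel structure once, so later xt commands know the id and time variables
xtset idno time

* Inspect the first two children
list idno time copmp group in 1/6, sepby(idno)
```

Children with missing follow-up (such as idno 1030) keep their rows with missing copmp; xtgee would simply drop those observations rather than the whole child.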
Hypothesis:
My hypothesis is that copmp will be significantly greater in the intervention group than in the waitlist group following the intervention (8 weeks:1) and at follow-up (16 weeks:2). I specified in the protocol that the stratification factors (sex, gmfcs) would also be included as covariables, although I don't really know if this is strictly necessary (I realise I would have to collapse gmfcs into 2 categories if I were to do this).
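Collapsing gmfcs into the two categories used for stratification could look something like the sketch below. The variable names gmfcs_hi and stratum are hypothetical, and the rows are just the idno, sex and gmfcs values from the five example children above:

```stata
* idno, sex and gmfcs for the five example children in the post
clear
input int idno byte(sex gmfcs)
1030 1 3
1727 1 1
1969 1 1
2010 2 3
2199 2 2
end

* 0 = gmfcs 1 or 2, 1 = gmfcs 3; stays missing if gmfcs is missing
generate byte gmfcs_hi = (gmfcs == 3) if !missing(gmfcs)

* Optional: a single identifier for the (up to) four randomization strata
egen stratum = group(sex gmfcs_hi)

* Check the recode against the original coding
tabulate gmfcs gmfcs_hi

* The two factors could then enter the model as i.sex i.gmfcs_hi
```

Whether the stratification factors belong in the model is a design question this sketch does not settle; it only shows the recode that would be needed if they are included.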
Problem:
The syntax I have been using is a mash of what my colleagues have recommended plus what I have guessed based on my readings. In terms of the specifications, I have been using the identity link and gaussian distribution. Originally my supervisor recommended an exchangeable correlation matrix but from my readings this doesn't seem to suit my dataset.
Code:
. xi: xtgee copmp i.group*i.time, i(idno) t(time) family(gaussian) link(identity) corr(exchangeable)
i.group           _Igroup_1-2         (naturally coded; _Igroup_1 omitted)
i.time            _Itime_0-2          (naturally coded; _Itime_0 omitted)
i.group*i.time    _IgroXtim_#_#       (coded as above)

Iteration 1: tolerance = .02625145
Iteration 2: tolerance = .00015618
Iteration 3: tolerance = 8.173e-07

GEE population-averaged model                   Number of obs     =        100
Group variable:                       idno      Number of groups  =         37
Link:                             identity      Obs per group:
Family:                           Gaussian                    min =          1
Correlation:                  exchangeable                    avg =        2.7
                                                              max =          3
                                                Wald chi2(5)      =     162.78
Scale parameter:                  2.904201      Prob > chi2       =     0.0000

-------------------------------------------------------------------------------
        copmp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    _Igroup_2 |  -.0855261   .5605329    -0.15   0.879     -1.18415    1.013098
     _Itime_1 |   4.708333    .469458    10.03   0.000     3.788213    5.628454
     _Itime_2 |   4.485148   .4977298     9.01   0.000     3.509616     5.46068
_IgroXtim_2_1 |  -3.663343   .6825294    -5.37   0.000    -5.001076    -2.32561
_IgroXtim_2_2 |  -3.379046   .7022752    -4.81   0.000     -4.75548   -2.002612
        _cons |   2.791667   .4016771     6.95   0.000     2.004394    3.578939
-------------------------------------------------------------------------------
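For comparison, here is a sketch of the same mean model written with factor-variable notation, which has been available since Stata 11 and generally makes the xi: prefix unnecessary. The data are simulated placeholders, not the trial data, so the estimates are meaningless. The vce(robust) option and the unstructured refit are additions for illustration, not part of the original command:

```stata
* Simulated placeholder data (NOT the trial data): 30 children, 3 visits each
clear
set seed 2718
set obs 30
generate int idno = _n
generate byte group = cond(_n <= 15, 1, 2)   // 1 = intervention, 2 = waitlist
generate double u = rnormal(0, 1)            // child-level effect, induces within-child correlation
expand 3
bysort idno: generate byte time = _n - 1     // 0, 1, 2
generate double copmp = 3 + 4*(group == 1)*(time > 0) + u + rnormal(0, 1)

xtset idno time

* Same mean model as the xi: call, with sandwich standard errors added
xtgee copmp i.group##i.time, family(gaussian) link(identity) corr(exchangeable) vce(robust)

* Waitlist minus intervention at 8 weeks and at 16 weeks
lincom 2.group + 2.group#1.time
lincom 2.group + 2.group#2.time

* Estimated working correlation, to set beside the raw correlations
estat wcorrelation

* Refit with an unstructured working correlation and compare estimates and standard errors
xtgee copmp i.group##i.time, family(gaussian) link(identity) corr(unstructured) vce(robust)
estat wcorrelation
```

In the output posted above, the same two contrasts can be read off by adding coefficients: roughly -0.086 + (-3.663) = -3.75 at 8 weeks and -0.086 + (-3.379) = -3.46 at 16 weeks (waitlist minus intervention). lincom would add the standard errors and confidence intervals. If the conclusions barely change between working correlation structures, the choice probably matters little. With only 37 children, though, the large-sample justification for robust standard errors is a stretch.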
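Another issue is that after fitting the model I used the predict residuals postestimation command, and the scatterplot of the result seems nonsensical to me. I am sure I must be missing a step.
Questions: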
1. Could you please comment on the choice of correlation matrix, or give me a suggestion about how I might choose the most appropriate one?
2. Could you comment on my syntax and let me know if there is a more appropriate/simplified code to use to answer my hypotheses?
3. Could you suggest whether it would be appropriate to include the stratification factors as covariables and how I might decide this?
4. I need a stepwise process for postestimation, particularly for examining residuals, including the syntax I would use both to create the variables and to graph/plot them.
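On the residuals question, one possible sequence is sketched below. As far as I can tell, predict after xtgee offers mu, xb, stdp and score but no residual statistic. If the command typed was literally predict residuals, Stata would create a variable named residuals holding the default statistic (the predicted mean), which might explain an odd-looking scatterplot. The sketch builds raw residuals by hand. The data are simulated placeholders again, and mu_hat and resid are hypothetical names:

```stata
* Simulated placeholder data (NOT the trial data): 30 children, 3 visits each
clear
set seed 2718
set obs 30
generate int idno = _n
generate byte group = cond(_n <= 15, 1, 2)   // 1 = intervention, 2 = waitlist
generate double u = rnormal(0, 1)            // child-level effect
expand 3
bysort idno: generate byte time = _n - 1     // 0, 1, 2
generate double copmp = 3 + 4*(group == 1)*(time > 0) + u + rnormal(0, 1)

xtset idno time
xtgee copmp i.group##i.time, family(gaussian) link(identity) corr(exchangeable) vce(robust)

* Step 1: fitted population-averaged means
predict double mu_hat, mu

* Step 2: raw residuals = observed minus fitted
generate double resid = copmp - mu_hat

* Step 3: residuals against fitted values
scatter resid mu_hat, yline(0) name(res_fit, replace)

* Step 4: spread of residuals at each time point, separately by arm
graph box resid, over(time) by(group) name(res_time, replace)

* Step 5: rough look at the distributional shape
qnorm resid, name(res_qq, replace)
```

With a group-by-time model there are only six distinct fitted values, so the residual-versus-fitted plot will show six vertical stripes. The box plots by time and arm are likely to be the more readable check of whether the spread is similar across cells.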
I would be sincerely and genuinely grateful for any help. Happy to provide any extra information as necessary.
Kind regards, Sarah