Whether to use xtmixed or xtgee?

Andrea Burgess

#1

10 Jan 2017, 06:05

Hi, I am trying to work out whether to use xtmixed or xtgee. I have longitudinal data on children seen at fixed time points between 18 months and 60 months. Children are classified as having a certain level of manual ability (macs) or gross motor ability (gmfcs). The outcome variable is selfcare. What I am really interested in is how children's selfcare skills develop over time, according to macs level. From my reading, it seems that I could use either xtmixed or xtgee.

If I used xtmixed, would it be correct to use the following? (Is this the correct order of grouping, i.e., is macs the top group?)

Code:

mixed scaleselfcare appt_type || macs: || patientid: appt_type

If I used xtgee, is this correct?

Code:

xi: xtgee scaleselfcare i.appt_type, i(patientid) t(appt_type) corr(unstructured) link(identity) family(gauss)

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int patientid byte appt_type int macs byte gmfcs_t
1389 30 1 1
1389 60 1 1
1389 48 1 1
1389 24 1 1
1389 36 2 1
1391 30 2 1
1391 48 2 1
1391 36 2 1
1391 24 2 1
1391 60 2 1
1396 48 1 1
1396 60 1 1
1396 24 2 1
1396 30 2 1
1396 36 2 1
1414 60 1 1
1414 36 1 1
1414 48 1 2
1414 30 2 1
1427 36 1 1
end
label values appt_type appt_type_labels
label def appt_type_labels 24 "24 months", modify
label def appt_type_labels 30 "30 months", modify
label def appt_type_labels 36 "36 months", modify
label def appt_type_labels 48 "48 months", modify
label def appt_type_labels 60 "60 months", modify
label values macs macs_labels
label def macs_labels 1 "MACS 1", modify
label def macs_labels 2 "MACS 2", modify
------------------ copy up to and including the previous line --

Kind regards,
Andrea Burgess
Marcos Almeida

#2

10 Jan 2017, 06:45

Hello Andrea,

Your output, unfortunately, is practically unreadable, at least to me, on a small notebook screen. Please present it within CODE delimiters, as recommended in the FAQ.

That said, instead of -xtmixed-, the up-to-date command is -mixed-, and it may provide results similar to those of -xtgee-, depending on the correlation structure and errors you select.

In short, the main difference between mixed models and GEE models is that the latter are fundamentally population-averaged models.

Hopefully that helps.

Best,

Marcos


Clyde Schechter


10 Jan 2017, 10:13

Apart from the question of whether to prefer -mixed- or -xtgee-, if I understand your situation correctly, both of your models appear to be misspecified.

I took your post to mean that you have an outcome variable, selfcarescale (which you don't include in your data example), and two predictor variables, macs and gmfcs_t, that you want to use as predictors of that outcome. You have repeated observations on each study subject: the subjects are identified by the variable patientid, and the observations on each subject are designated by appt_type (which is actually a chronology). If that is what you want to do, the commands would be:

Code:

mixed selfcarescale i.macs i.gmfcs_t || patientid:

xtset patientid appt_type
xtgee selfcarescale i.macs i.gmfcs_t

Note: This assumes that you do not want to model any interaction between the macs and gmfcs_t effects.
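If you did want to model that interaction, a minimal sketch follows. It uses simulated, purely hypothetical data (50 children, and only 3 gmfcs_t levels rather than 5 to keep the example small), treats both predictors as categorical, crosses them with the ## factor-variable operator, and then uses -testparm- for a joint Wald test of the interaction terms:

```stata
* Hypothetical simulated data: 50 children, each measured six times
clear
set seed 2017
set obs 50
gen patientid = _n
gen macs = mod(_n, 5) + 1                 // macs levels 1-5 (simulated)
gen gmfcs_t = mod(_n, 3) + 1              // gmfcs_t levels 1-3 (simulated)
gen u0 = rnormal(0, 2)                    // child-specific random intercept
expand 6                                  // six observations per child
gen selfcarescale = 20 - 1.5*macs - gmfcs_t + u0 + rnormal(0, 1)

* Main effects of macs and gmfcs_t plus all macs-by-gmfcs_t cells,
* with a random intercept for each child
mixed selfcarescale i.macs##i.gmfcs_t || patientid:

* Joint test that every interaction coefficient is zero
testparm i.macs#i.gmfcs_t
```

Because 5 and 3 share no common factor, all 15 macs-by-gmfcs_t combinations occur in the simulated data; with real data, empty cells would simply be omitted from the model.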

As for the difference between using -mixed- and -xtgee- here, Marcos is correct that -xtgee- produces population-averaged effect estimates, whereas -mixed- and its related -me- commands produce individual-level effect estimates. However, for linear models, the population-averaged and individual-level effects being estimated are actually the same, so it should make little difference: the results will come out nearly the same. Here is an example based on your data, where I made up a selfcarescale outcome to illustrate:

Code:

. clear 

. input int patientid byte appt_type int macs byte gmfcs_t 

     patien~d  appt_t~e      macs   gmfcs_t
  1. 1389 30 1 1
  2. 1389 60 1 1 
  3. 1389 48 1 1 
  4. 1389 24 1 1 
  5. 1389 36 2 1 
  6. 1391 30 2 1 
  7. 1391 48 2 1 
  8. 1391 36 2 1 
  9. 1391 24 2 1 
 10. 1391 60 2 1 
 11. 1396 48 1 1 
 12. 1396 60 1 1 
 13. 1396 24 2 1 
 14. 1396 30 2 1 
 15. 1396 36 2 1 
 16. 1414 60 1 1 
 17. 1414 36 1 1 
 18. 1414 48 1 2 
 19. 1414 30 2 1 
 20. 1427 36 1 1 
 21. end 

. label values appt_type appt_type_labels 

. label def appt_type_labels 24 "24 months", modify 

. label def appt_type_labels 30 "30 months", modify 

. label def appt_type_labels 36 "36 months", modify 

. label def appt_type_labels 48 "48 months", modify 

. label def appt_type_labels 60 "60 months", modify 

. label values macs macs_labels 

. label def macs_labels 1 "MACS 1", modify 

. label def macs_labels 2 "MACS 2", modify

. 
. gen u = rnormal(0.5)

. by patientid (appt_type), sort: replace u = u[1] 
(15 real changes made)

. gen selfcarescale = 2.5*macs + 1.5*gmfcs_t + u + rnormal(0.7)

. 
. mixed selfcarescale i.macs i.gmfcs_t || patientid:

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log likelihood = -25.613979  
Iteration 1:   log likelihood = -25.613979  

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =         20
Group variable: patientid                       Number of groups  =          5

                                                Obs per group:
                                                              min =          1
                                                              avg =        4.0
                                                              max =          5

                                                Wald chi2(2)      =      23.28
Log likelihood = -25.613979                     Prob > chi2       =     0.0000

-------------------------------------------------------------------------------
selfcarescale |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         macs |
      MACS 2  |   1.921648   .3982532     4.83   0.000     1.141086     2.70221
    2.gmfcs_t |   .6805709   .7910428     0.86   0.390    -.8698446    2.230986
        _cons |   5.400706   .4649618    11.62   0.000     4.489398    6.312015
-------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
patientid: Identity          |
                  var(_cons) |   .7526018   .5866723      .1633172    3.468155
-----------------------------+------------------------------------------------
               var(Residual) |   .4705603   .1727231      .2291788    .9661755
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 6.45          Prob >= chibar2 = 0.0056

. 
. xtset patientid appt_type
       panel variable:  patientid (unbalanced)
        time variable:  appt_type, 24 to 60, but with gaps
                delta:  1 unit

. xtgee selfcarescale i.macs i.gmfcs_t

Iteration 1: tolerance = .18672309
Iteration 2: tolerance = .08777219
Iteration 3: tolerance = .01718743
Iteration 4: tolerance = .00245169
Iteration 5: tolerance = .00032879
Iteration 6: tolerance = .00004371
Iteration 7: tolerance = 5.803e-06
Iteration 8: tolerance = 7.704e-07

GEE population-averaged model                   Number of obs     =         20
Group variable:                  patientid      Number of groups  =          5
Link:                             identity      Obs per group:
Family:                           Gaussian                    min =          1
Correlation:                  exchangeable                    avg =        4.0
                                                              max =          5
                                                Wald chi2(2)      =      24.01
Scale parameter:                  1.280931      Prob > chi2       =     0.0000

------------------------------------------------------------------------------
selfcaresc~e |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        macs |
     MACS 2  |     1.8971   .3871944     4.90   0.000     1.138213    2.655987
   2.gmfcs_t |   .6971371    .765401     0.91   0.362    -.8030213    2.197295
       _cons |   5.415123   .4796393    11.29   0.000     4.475047    6.355198
------------------------------------------------------------------------------


Weiwen Ng

#4

10 Jan 2017, 10:43

In addition to what Clyde said:

1) Your first mixed command would have specified multiple levels of nesting, i.e., observations nested within patients, which are in turn nested within macs levels. I doubt that's what you meant to do, and it would not make sense. In my line of work, we might write a mixed model that nests patients within hospitals. It sounds like you have only one level of nesting, i.e., observations nested within patients.

I'm not sure what you mean by "according to macs level," but it seems like you'd want to include it as a covariate, and you'd want to consider if you want to model the interaction of time (i.e. appt_type, if I understand you correctly) with macs level.

Any substantive questions should be discussed with the PI, I think; or, if you're the PI or your PI doesn't have an idea how to model them, I'd ask a statistician face to face.

2) Your mixed syntax, as written, asks for a random slope for appt_type. Forgive me if you already know this, but GEE isn't able to account for individual variation in trajectories, so you may actually be better off with a mixed model, provided you can interpret it. If you can't explain what a random slope means, you may wish to consult a statistician.

3) In your GEE syntax, it seems like appt_type is coded more like a time variable. In your command as written, by asking for i.appt_type, you asked for it to be treated categorically. Is that what you meant to do?

4) In Clyde's sample code, he codes GMFCS and MACS as categorical. I suspect your variables are measured as continuous. Just be sure to adjust your own coding accordingly.
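Putting points 1) and 3) together, here is a minimal sketch on simulated, purely hypothetical data: one level of nesting (visits within children), macs as a fixed-effect covariate, and time entered as continuous and interacted with macs. -margins- with dydx() then reports the estimated change per month within each macs level, which may be the closest thing to "development according to macs level":

```stata
* Hypothetical simulated data: 50 children, macs levels 1-5, six visits
clear
set seed 2017
set obs 50
gen patientid = _n
gen macs = mod(_n, 5) + 1
gen u0 = rnormal(0, 2)                    // child-specific intercept
expand 6
bysort patientid: gen visit = _n
recode visit (1=18) (2=24) (3=30) (4=36) (5=48) (6=60), generate(appt_type)
gen selfcarescale = 10 + u0 + (0.6 - 0.08*macs)*appt_type + rnormal(0, 2)

* Visits nested in children only; macs enters as a fixed effect, and the
* macs-by-time interaction gives each macs level its own linear slope
mixed selfcarescale i.macs##c.appt_type || patientid:

* Estimated change in selfcarescale per month, separately for each macs level
margins macs, dydx(appt_type)
```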

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this; type help dataex at the command line.

When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Clyde Schechter

#5

10 Jan 2017, 10:56

I agree with nearly all of Weiwen Ng's comments.

I'm not sure about his 4), however. In the example data that Andrea Burgess provided, MACS is coded 1/2 and has a value label assigned to it. gmfcs is also coded 1/2, though it does not receive a value label. I thought in this general context that it was reasonable to consider these as dichotomous variables, though the possibility that they are actually continuous and the example data is not representative does remain.

As for the treatment of appt_type, I just ignored it in my code. But including it as either a discrete or quasi-continuous variable might be reasonable depending on what theory suggests here. If -mixed- is ultimately selected, a random slope model is possible for this variable, if desired.
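To see what the two codings of appt_type look like side by side, here is a sketch on simulated, purely hypothetical data using -xtgee-. With factor-variable notation the old xi: prefix is not needed, and -xtset- declares the panel and time variables. i.appt_type estimates a separate mean for each visit, while c.appt_type imposes one linear slope per month:

```stata
* Hypothetical simulated data: 50 children, macs levels 1-5, six visits
clear
set seed 2017
set obs 50
gen patientid = _n
gen macs = mod(_n, 5) + 1
gen u0 = rnormal(0, 2)                    // child-specific intercept
expand 6
bysort patientid: gen visit = _n
recode visit (1=18) (2=24) (3=30) (4=36) (5=48) (6=60), generate(appt_type)
gen selfcarescale = 10 + u0 + (0.6 - 0.08*macs)*appt_type + rnormal(0, 2)

* Declare children as panels and appt_type as the time variable
xtset patientid appt_type

* Discrete time: one coefficient per visit relative to the 18-month base
xtgee selfcarescale i.appt_type, corr(exchangeable)
estimates store time_discrete

* Quasi-continuous time: a single change per month
xtgee selfcarescale c.appt_type, corr(exchangeable)
estimates store time_linear

estimates table time_discrete time_linear, b(%9.3f) se
```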
Andrea Burgess

#6

10 Jan 2017, 18:57

Hi,
Thank you for your time and help. Hopefully the centre where I am will soon have a statistician; I do think a face-to-face discussion with a statistician would be best, and my PI would like me to check with one.
Sorry, the information I gave was unclear and insufficient.
The outcome variable, selfcare, is continuous. The variables macs and gmfcs are categorical, with 5 categories each. Appt_type is the time at which the children were seen (at specified time points, e.g., 18, 24, 30, 36, 48, and 60 months). I do want to model the interaction with time. I did realize GEE was population-averaged, but wasn't sure how to choose between GEE and mixed.
I am looking at rate of development.
Regards,
Andrea.
Clyde Schechter

#7

10 Jan 2017, 20:34

If you are looking at rate of development, that sounds like your model should include time as a (quasi-)continuous variable. Again, with a linear model, there is no difference between population-averaged and individual-level parameters. So the choice between -xtgee- and -mixed- is basically one of convenience (unless you want to include random slopes, which -xtgee- does not support).

Looking at rate of development over 6 time points raises the question of whether the relationship is linear in time. I would strongly recommend you undertake some graphical exploration of the selfcarescale vs. time relationship to see whether a simple linear term in appt_type is sufficient. If the relationship appears to be materially non-linear, then you may want to consider some non-linear transformations of time, or the use of a spline to represent time in your model.
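A sketch of that check on simulated, purely hypothetical data (linear by construction, so here the extra terms should not help). It overlays a lowess smooth on the straight-line fit, compares linear and quadratic time with a likelihood-ratio test (appropriate for fixed effects because -mixed- fits by ML by default, as the "Mixed-effects ML regression" header above shows), and fits a linear spline built with -mkspline-. The knot at 36 months is an arbitrary choice for illustration:

```stata
* Hypothetical simulated data: 50 children, macs levels 1-5, six visits
clear
set seed 2017
set obs 50
gen patientid = _n
gen macs = mod(_n, 5) + 1
gen u0 = rnormal(0, 2)                    // child-specific intercept
expand 6
bysort patientid: gen visit = _n
recode visit (1=18) (2=24) (3=30) (4=36) (5=48) (6=60), generate(appt_type)
gen selfcarescale = 10 + u0 + (0.6 - 0.08*macs)*appt_type + rnormal(0, 2)

* Graphical check: nonparametric lowess smooth versus a straight line
twoway (scatter selfcarescale appt_type, msize(small)) (lowess selfcarescale appt_type) (lfit selfcarescale appt_type)

* Linear versus quadratic time (c.appt_type##c.appt_type adds the squared term)
mixed selfcarescale c.appt_type || patientid:
estimates store linear
mixed selfcarescale c.appt_type##c.appt_type || patientid:
estimates store quadratic
lrtest linear quadratic

* Linear spline with a single knot at 36 months (knot location is an assumption)
mkspline time1 36 time2 = appt_type
mixed selfcarescale time1 time2 || patientid:
```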

I agree that a face-to-face consultation with a statistician is indicated. We do our best here on the forum, but I think we are most effective when dealing with relatively limited questions.
Weiwen Ng

#8

10 Jan 2017, 20:53

In reply to Andrea Burgess (#6):

So, Clyde was correct that MACS and GMFCS were correctly coded as categorical. It may be a minor point, but I was taught not to turn continuous variables into discrete ones where possible, to preserve all information, and it seemed like those two things would be measured as continuous (I was assuming they were multi-item scales). So if they are actually measured as ordinal or nominal categories, ignore what I said.

One last thing about GEE. Your syntax asked the program to estimate an unstructured correlation structure, so GEE will estimate a unique parameter for the correlation between outcomes at each pair of time points (with six time points, that is 15 correlations). For example, it will estimate one parameter for the correlation between the 18- and 24-month observations, another for 18 and 36 months, and so on for all pairwise combinations of times. You can frequently get away with simplifying the correlation structure, e.g., to exchangeable (i.e., estimate only one parameter for the average correlation between outcomes at any two time points). If this doesn't make sense to you, then definitely clarify with the statistician.
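A sketch of that comparison, on simulated, purely hypothetical data. A consecutive visit index (1 to 6) is used as the -xtset- time variable so that each visit corresponds to one row and column of the working correlation matrix; that is a precaution, not something established in this thread. -estat wcorrelation- displays the estimated working correlations after each fit:

```stata
* Hypothetical simulated data: 50 children, macs levels 1-5, six visits
clear
set seed 2017
set obs 50
gen patientid = _n
gen macs = mod(_n, 5) + 1
gen u0 = rnormal(0, 2)                    // child-specific intercept
expand 6
bysort patientid: gen visit = _n          // consecutive visit index 1-6
recode visit (1=18) (2=24) (3=30) (4=36) (5=48) (6=60), generate(appt_type)
gen selfcarescale = 10 + u0 + (0.6 - 0.08*macs)*appt_type + rnormal(0, 2)

* Use the visit index as the time variable; with real data,
* -egen visit = group(appt_type)- would build the same index
xtset patientid visit

* Exchangeable: one correlation parameter shared by all pairs of visits
xtgee selfcarescale i.macs##c.appt_type, corr(exchangeable)
estat wcorrelation

* Unstructured: a separate correlation for each of the 15 pairs of visits
xtgee selfcarescale i.macs##c.appt_type, corr(unstructured)
estat wcorrelation
```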

Andrea Burgess

#9

10 Jan 2017, 23:24

I have previously done some graphing, and it does seem that the relationship is linear. I have done histograms and boxplots, but the most interesting graph was:

Code:

twoway (lfit scaleselfcare appt_type if macs==1) (lfit scaleselfcare appt_type if macs==2) (lfit scaleselfcare appt_type if macs==3) (lfit scaleselfcare appt_type if macs==4) (lfit scaleselfcare appt_type if macs==5)

This graph gave the course of selfcare development for each macs level. I think it is correct, but thought you might have feedback on it.

I will need to learn how to make time quasi-continuous.

Just to clarify: using the command below, is it correct to say that the random slope provided in a mixed model will allow self care (the outcome) to change over time across different values of patientid?

Code:

mixed selfcarescale i.macs*appt_type || patientid:
Weiwen Ng

#10

11 Jan 2017, 06:20

In reply to Andrea Burgess (#9):

You're correct that a random slope for time in a mixed model will allow each patient to have their own trajectory of self care. The fixed-effect coefficient you would get for time represents the grand mean change in self care per unit time, but the program will allow some patients to improve faster, some to improve slower, and some to have self care decline.

The command you typed doesn't give a random slope for time. This does (assuming appt_type was coded as quasi-continuous, say in months):

mixed selfcarescale i.macs##c.appt_type || patientid: appt_type

If you coded appt_type in actual months, e.g., 18, 24, 36, then your betas will be the change per month. (With the macs interaction, the main appt_type coefficient is the slope for the base macs level, and the interaction terms give each other level's difference from it.)
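Extending that command, here is a sketch on simulated, purely hypothetical data in which children really do differ in their slopes. By default -mixed- treats the random intercept and random slope as uncorrelated (covariance(independent)); cov(unstructured) also estimates their covariance. The likelihood-ratio test against the random-intercept-only model is conservative, because the slope variance is tested on the boundary of its parameter space, and -predict, reffects- stores each child's estimated deviations from the fixed slope and intercept:

```stata
* Hypothetical simulated data with child-specific intercepts AND slopes
clear
set seed 2017
set obs 50
gen patientid = _n
gen macs = mod(_n, 5) + 1
gen u0 = rnormal(0, 2)                    // child-specific intercept deviation
gen u1 = rnormal(0, 0.1)                  // child-specific slope deviation (per month)
expand 6
bysort patientid: gen visit = _n
recode visit (1=18) (2=24) (3=30) (4=36) (5=48) (6=60), generate(appt_type)
gen selfcarescale = 10 + u0 + (0.6 - 0.08*macs + u1)*appt_type + rnormal(0, 2)

* Random intercept only
mixed selfcarescale i.macs##c.appt_type || patientid:
estimates store ri

* Random intercept plus random slope for time, allowed to covary
mixed selfcarescale i.macs##c.appt_type || patientid: appt_type, cov(unstructured)
estimates store rs

* LR test for the random slope (conservative: variance tested on the boundary)
lrtest ri rs

* Each child's predicted slope and intercept deviations, in that order
predict re_slope re_cons, reffects
```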

As to your graph, you're not plotting each patient's trajectory, whereas the spaghetti plot program written by UCLA should do that:

http://www.ats.ucla.edu/stat/stata/faq/spagplot.htm

Andrea Burgess

#11

12 Jan 2017, 05:15

Thank you for your help, Weiwen. I appreciate it.
I had a look at the spaghetti plots. I have done twoway connected graphs in the past; are these the same as a spaghetti plot?
What I wanted was a graph that portrays selfcare according to macs level (1-5).

Thank you, Clyde, for taking the time earlier to redo the coding and give a worked example.
Weiwen Ng

#12

12 Jan 2017, 08:44

In reply to Andrea Burgess (#11):

Andrea,

You used twoway lfit. That command is not really a connected graph: it runs a linear regression and then draws the regression line. The way you ran that command, you should have got 5 regression lines, one for each level of macs, each regressing the self care scale on time. This is potentially a very useful command! But as far as I can understand, it ignores the fact that the observations are nested within people.
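One way to get a picture like the lfit one that does account for the nesting, sketched on simulated, purely hypothetical data: fit the mixed model first, then plot its predicted mean trajectory for each macs level with -margins- and -marginsplot-. With a -margins- call of this form, the at() variable should appear on the x axis with one line per macs level:

```stata
* Hypothetical simulated data: 50 children, macs levels 1-5, six visits
clear
set seed 2017
set obs 50
gen patientid = _n
gen macs = mod(_n, 5) + 1
gen u0 = rnormal(0, 2)                    // child-specific intercept
expand 6
bysort patientid: gen visit = _n
recode visit (1=18) (2=24) (3=30) (4=36) (5=48) (6=60), generate(appt_type)
gen scaleselfcare = 10 + u0 + (0.6 - 0.08*macs)*appt_type + rnormal(0, 2)

* Random intercept per child; a separate linear trajectory for each macs level
mixed scaleselfcare i.macs##c.appt_type || patientid:

* Predicted mean scaleselfcare at each visit age, for each macs level
margins macs, at(appt_type=(18 24 30 36 48 60))

* Plot the predictions with their 95% confidence intervals
marginsplot
```

Unlike the separate lfit regressions, the standard errors behind these intervals come from a model that knows the repeated visits belong to the same child.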

With the spaghetti plot command, you are plotting each person's trajectory. I think that xtline does the same thing, but xtline runs really slowly in my opinion.

For others' reference:

http://www.stata.com/manuals13/g-2graphtwowaylfit.pdf

Andrea Burgess

#13

12 Jan 2017, 16:07

Hi Weiwen,
The graph I thought was like a spaghetti plot was:
twoway connected scaleselfcare appt_type if macs==1, connect(L)

The twoway lfit graph is just what I wanted as a visual summary of the study. Is there a way to improve it with regard to observations being nested within people?
Weiwen Ng

#14

13 Jan 2017, 07:58

In reply to Andrea Burgess (#13):

Andrea, I am not as familiar with the regular graphing commands. It appears that you requested a twoway scatterplot of self care vs. time, and that you requested the points be connected. I haven't used the command, but you didn't specify any grouping variable (i.e., person ID).

When you use xtline or the spaghetti plot command, Stata will recognize that observations are grouped by person. It will plot a line for each person, thus giving you an idea of how trajectories can vary.
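For reference, two ways to draw one line per child, sketched on simulated, purely hypothetical data. -xtline- with the overlay option takes the grouping from -xtset-. The twoway version relies on connect(L), which draws a segment only when the x value increases; once the data are sorted by child and time, appt_type drops back at the start of each new child, so each child gets a separate line:

```stata
* Hypothetical simulated data: 50 children, macs levels 1-5, six visits
clear
set seed 2017
set obs 50
gen patientid = _n
gen macs = mod(_n, 5) + 1
gen u0 = rnormal(0, 2)                    // child-specific intercept
expand 6
bysort patientid: gen visit = _n
recode visit (1=18) (2=24) (3=30) (4=36) (5=48) (6=60), generate(appt_type)
gen scaleselfcare = 10 + u0 + (0.6 - 0.08*macs)*appt_type + rnormal(0, 2)

xtset patientid appt_type

* Spaghetti plot: one line per child in macs level 1, legend suppressed
xtline scaleselfcare if macs == 1, overlay legend(off)

* twoway alternative: sort so each child's visits are in time order;
* connect(L) then breaks the line whenever appt_type decreases
sort patientid appt_type
twoway connected scaleselfcare appt_type if macs == 1, connect(L) msize(small)
```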
