  • Question on xtmixed syntax and backed up message

    Dear Statalist,

    I am using xtmixed for a cohort analysis, and I would like to know if the syntax I used is correct for the study I am doing.

    To start at the beginning, I have been working on a paper (uploaded at OSF) that uses tobit regression with the "wordsum" 10-item vocabulary test as the dependent variable; the independent variables are age, gender, race, cohort dummies, and the interactions of race with the cohort dummies. The goal is to examine the trend in scores across cohorts among black and white people. The data come from the GSS. However, one reviewer told me that my cohort analysis is probably confounded with the age effect (the wordsum score increases with age) and with the positive interaction between the black-white gap and age (the gap becomes much larger at older ages).

    So I have decided to use mixed models, following the recommendation of these papers, in the hope of removing the age effect (and the age*race interaction) from the cohort effect.

    With xtmixed, I use race (2 categories) and cohort (6 categories) as random intercepts, and within my cohort variable I include agegroup (5 categories) and the race*agegroup interaction as random slopes. The purpose is to get the trend in black and white scores by cohort, unconfounded with the age and race*age interaction effects.

    The syntax I use in Stata 12 looks like this (it is not exactly the syntax applied in my paper, but I want to keep this post as brief as possible):

    Code:
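    * Restrict the sample to respondents younger than 70
    keep if age<70
    * Make race a 0/1 indicator (0 = black, 1 = white, as in the plot legends); category 3 is set to missing
    replace race = 0 if race==2
    replace race = . if race==3
    * Collapse age, birth cohort and survey year into categories
    recode age (18/26=1) (27/35=2) (36/44=3) (45/55=4) (56/69=5), generate(agegroup)
    recode cohort (1905/1928=1) (1929/1943=2) (1944/1953=3) (1954/1962=4) (1963/1973=5) (1974/1994=6), generate(cohort6)
    * Birth years outside the recode ranges keep their original value, so set them to missing
    replace cohort6 = . if cohort6>6
    recode year (1974/1982=1) (1983/1990=2) (1991/2000=3) (2001/2012=4), generate(year4)
    * Manual race-by-agegroup interaction term
    gen raceagegroup = race*agegroup
    * Keep only valid wordsum scores (0 to 10)
    replace wordsum = . if wordsum<0
    replace wordsum = . if wordsum>10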
    
    xtmixed wordsum || race: || cohort6: agegroup raceagegroup
    predict xtfitted1, fitted
    
    twoway (scatter xtfitted1 cohort6, msymbol(Oh) jitter(0)) (lfit xtfitted1 cohort6 if race==0, lcolor(red)) (lfit xtfitted1 cohort6 if race==1, lcolor(green)), by(agegroup) legend(cols(4) size(small)) title("Wordsum trend", size(medsmall)) legend(order(2 "black" 3 "white"))
    
    xtmixed wordsum || race: || year4: || cohort6: agegroup raceagegroup
    predict xtfitted2, fitted
    
    twoway (scatter xtfitted2 cohort6, msymbol(Oh) jitter(0)) (lfit xtfitted2 cohort6 if race==0, lcolor(red)) (lfit xtfitted2 cohort6 if race==1, lcolor(green)), by(agegroup) legend(cols(4) size(small)) title("Wordsum trend", size(medsmall)) legend(order(2 "black" 3 "white"))

    For those who want to replicate, the data I used are from the General Social Survey, available here (but they need to be converted into a DTA file). The subset I use has been uploaded here.

    The scatter plots look fine, but I would like confirmation that the syntax I use is correct. To recap, I want to get the trend in wordsum score by ethnicity and by cohort, allowing random slopes of age (which are allowed to differ by race) across cohorts.

    I am not even sure that my syntax is correct. I have hesitated between the following three specifications:

    xtmixed wordsum || race: || cohort6: agegroup
    xtmixed wordsum || race: || cohort6: agegroup raceagegroup
    xtmixed wordsum || cohort6: race agegroup raceagegroup

    I know that the first line treats race (a dichotomy) as a random intercept, and the second line produces similar results. The third line treats race as a random slope instead. The results look very different when I use the third line rather than the first (or second), so the choice matters for my paper and I have to be careful about it. As for the third line (race, age and their interaction as random slopes), I have never seen such an example in the textbooks I have read. Does it make any sense? My impression is that the second line is the correct one, but I need to be certain.
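
    To make the comparison concrete, here is a sketch of the models that the second and third lines appear to specify. It assumes the usual xtmixed reading: successive || levels are nested (cohort6 within race), the fixed part contains only the constant because no covariates are listed before the first ||, and the random effects use the default independent covariance structure. The symbols (u, v, a, b, c) are my own notation, not Stata output. The first line is the second without the b term.

    ```latex
    % Second line: xtmixed wordsum || race: || cohort6: agegroup raceagegroup
    % i = respondent, j = cohort6 category, k = race category (cohorts nested within race)
    \begin{align*}
    \text{wordsum}_{ijk} &= \beta_0 + u_k + v_{jk}
        + a_{jk}\,\text{agegroup}_{ijk}
        + b_{jk}\,(\text{race}_k \times \text{agegroup}_{ijk})
        + \varepsilon_{ijk} \\[4pt]
    % Third line: xtmixed wordsum || cohort6: race agegroup raceagegroup
    % i = respondent, j = cohort6 category (race is no longer a grouping level)
    \text{wordsum}_{ij} &= \beta_0 + v_j
        + c_j\,\text{race}_{ij}
        + a_j\,\text{agegroup}_{ij}
        + b_j\,(\text{race}_{ij} \times \text{agegroup}_{ij})
        + \varepsilon_{ij}
    \end{align*}
    ```

    If this reading is right, all random terms have mean zero, and none of the three lines has a fixed effect for race, agegroup or cohort6. In the second line, race is constant within each race-by-cohort group, so raceagegroup is either zero or identical to agegroup inside a group, which may be why the first and second lines give similar results.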

    Finally, I have a small question about the "backed up" message that sometimes appears in the iteration log. Usually, when I get this message, the software either keeps iterating without end or stops and fails to converge. But sometimes I get this message and the model still converges, with no problems in the calculated standard errors. What does "backed up" mean in this situation? I even saw it in the book "Statistics with Stata, Updated for Stata 12" (Lawrence Hamilton, 2012, pages 394 and 398), but the author did not mention that it was a problem, so I am confused.

    Thanks for your time. (and sorry for the long post)

    Meng.