I am analyzing panel data on children (aged 0 to 15 at the 2010 baseline survey) who are surveyed every two years. I have three waves of data: 2010, 2012 and 2014. In the 2010 and 2014 waves, children aged 10 and over are each given a math test and a word test. Each wave also records family structure, e.g. whether the parents are living at home or are away due to migration. I am trying to understand how parental absence due to migration affects children's test scores in 2010 and 2014. Test scores are continuous. The parental absence variable has four migration-related categories (both absent, father absent, mother absent, both present).

In the final data, children with test scores include: (1) children aged 10 to 15 in 2010, (2) children aged 10 to 15 in 2010 with follow-up scores in 2014 (when they are about 14 to 19), and (3) children aged 6 to 9 in 2010 who are first tested in 2014 (when they are about 10 to 14). Only group 2 has test scores in both years. (Some children aged 10 to 15 in 2010 also have missing test scores in 2014.)

My focal research question is how the various types of parental absence due to emigration affect children's test performance. For the analysis, absence is coded in 5 categories: both at home, only mother at home (father emigrant), only father at home (mother emigrant), no parent at home (both emigrant), and divorce/death of parents.
I first used Stata's xtreg random-effects procedure to conduct the analysis. wordtest2yr is the test score. absence_5cat is the independent variable for parental absence. wave is the survey wave indicator. The other covariates (parents' education, age) are time-varying, while parental absence before age 3 (livenoparbf3) is time-invariant. The following two commands, both with robust standard errors, give the same results, and the effects of "absence_5cat" are significant and sensible:
xi: regress wordtest2yr i.sex c.age_w1w3##c.age_w1w3 i.absence_5cat i.age_baba i.age_mama_x i.edu_baba i.edu_mama i.livenoparbf3 i.wave if urban_com==1, vce(cluster pid)
xtreg wordtest2yr i.sex c.age_w1w3##c.age_w1w3 i.absence_5cat i.age_baba i.age_mama_x i.edu_baba i.edu_mama i.livenoparbf3 i.wave if urban_com==1, re vce(robust) theta
Because I think fixed-effects models should give more robust and less biased results, I then ran the fixed-effects model, as follows:
xtreg wordtest2yr i.sex c.age_w1w3##c.age_w1w3 i.absence_5cat i.age_baba i.age_mama_x i.edu_baba i.edu_mama i.livenoparbf3 i.wave if urban_com==1, fe vce(robust)
However, the fixed-effects model gives very different coefficients for absence, in both magnitude and sign. Although the magnitudes are large, none of the coefficients is significant, and the pattern makes no sense. I also tried removing the "wave" indicator and got similar results. Results of the random-effects and fixed-effects models are shown below:
** Random effects:
 wordtest2yr        Coef.   Std. Err.       z     P>z     [95% Conf. Interval]
absence_5cat
           1      1.52727    .7799622    1.96   0.050    -.0014281    3.055968
           2     .2929886    1.316119    0.22   0.824    -2.286557    2.872535
           3      1.31037    .5621309    2.33   0.020      .208614    2.412127
           4     1.689032    .7019148    2.41   0.016     .3133047     3.06476

** Fixed effects:
absence_5cat
           1    -1.216155    3.046883   -0.40   0.690    -7.193684    4.761373
           2     2.254528    2.819032    0.80   0.424    -3.275991    7.785047
           3    -1.101228    2.848645   -0.39   0.699    -6.689842    4.487385
           4     1.554859    3.088102    0.50   0.615    -4.503534    7.613251
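One thing that may be worth checking before comparing the two sets of estimates: the fixed-effects estimator uses only within-child variation, so children with a single test score, and children whose absence category is the same in both tested waves, do not help identify the absence coefficients. A minimal sketch of that check follows, assuming pid identifies the child. The names insample, n_tests, abs_min, abs_max, abs_changed and child_tag are hypothetical helpers, and the real estimation sample would also drop observations with missing covariates.

```stata
* Sketch only: approximates the estimation sample as urban observations
* with a nonmissing word test score and absence category
generate byte insample = urban_com==1 & !missing(wordtest2yr, absence_5cat)

* Number of tested waves each child contributes (1 or 2)
bysort pid: egen n_tests = total(insample)

* Lowest and highest absence category across a child's tested waves
egen abs_min = min(cond(insample, absence_5cat, .)), by(pid)
egen abs_max = max(cond(insample, absence_5cat, .)), by(pid)

* 1 if the category differs between the two tested waves, 0 if it does not
generate byte abs_changed = (abs_max != abs_min) if n_tests==2

* Count children (one row each) rather than child-wave observations
egen byte child_tag = tag(pid) if n_tests==2
tabulate abs_changed if child_tag==1
```

If only a small number of children have abs_changed equal to 1, wide fixed-effects confidence intervals like those above would not be surprising, though this check alone does not explain the sign changes.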
In a post elsewhere, I was advised to adopt a difference-in-differences approach and create treatment and control groups based on changes in parental absence between 2010 and 2014. Because there are five categories of parental absence, there would be many possible change combinations between 2010 and 2014. I also understand that the difference-in-differences approach would keep only the children who have both 2010 and 2014 test scores, so the many children with only one test would be excluded from the analysis.
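To make the "many change combinations" concern concrete, here is a rough sketch of how the change-score setup could be built and how sparse the transition cells are. It is one possible encoding, not necessarily what the earlier advice intended. It assumes pid identifies the child and that wave sorts 2010 before 2014; abs_from, abs_to, d_score and transition are hypothetical names.

```stata
* Sketch only: work on a temporary copy so the full panel is untouched
preserve

* Keep tested urban observations, then children tested in both waves
keep if urban_com==1 & !missing(wordtest2yr, absence_5cat)
bysort pid (wave): keep if _N==2

* Absence category in the first and second tested wave, and the score change
by pid: generate abs_from = absence_5cat[1]
by pid: generate abs_to   = absence_5cat[2]
by pid: generate d_score  = wordtest2yr[2] - wordtest2yr[1]

* One row per child
by pid: keep if _n==2

* Up to 25 from-to combinations; the table shows how many children are in each
egen transition = group(abs_from abs_to), label
tabulate transition

* Change in score by transition type, relative to the first transition group
regress d_score i.transition, vce(robust)

restore
```

The tabulate step is the useful part for planning: cells with very few children would probably need to be merged or dropped before any treatment and control comparison is meaningful.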
I would appreciate any suggestions on the best way to proceed.