Hello,
I have a large unbalanced panel dataset of students (stud_id) across all grades from multiple schools (school_id) for 6 years (year). My dependent variable (DV) is a flag that shows whether the student was suspended (susp_tag) in each year. For now, let's ignore the fact that the DV is binary; I am running OLS and I am fine with that. I have many independent variables (IVs): some are student-level and time-invariant, some are student-level and time-varying (past suspension history), and some are school-level, all of which are time-varying. I want to try out multiple fixed-effects specifications and I am getting stuck on the various options. I have read through Statalist and the Stata manual on fixed effects, but I am still confused by certain parameterizations, so I would really appreciate your help with this question.
Since some students move across schools within a year, I created a new panel identifier, panelid = group(stud_id school_id), so that a student-year within a school is my unit of analysis.
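For concreteness, this is roughly how I build the identifier. It is a minimal sketch that uses only the variable names from my data; I declare the panel without a time variable here, which is an assumption on my part that xtreg, fe does not need one.

```stata
* One panel per student-school combination
egen panelid = group(stud_id school_id)

* Declare panelid as the panel variable (no time variable supplied)
xtset panelid
```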
1. First, I want to use only school fixed effects to control for all time-invariant school-level characteristics, with SEs clustered at the school level. So I use either xtreg DV IVs, fe i(school_id) vce(cluster school_id) nonest or areg DV IVs, absorb(school_id) vce(cluster school_id). Both give me the same coefficients with minor differences in the SEs, which I believe come only from a slight difference in the degrees-of-freedom calculation. Right?
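Written out, the two school fixed-effects runs look roughly like the sketch below. x_stud and x_school are hypothetical placeholders for my IVs, and I set the panel variable with xtset rather than the i() option, assuming the two are equivalent here.

```stata
* Version A: xtreg with school as the panel variable
xtset school_id
xtreg susp_tag x_stud x_school, fe vce(cluster school_id)

* Version B: areg absorbing the school dummies
areg susp_tag x_stud x_school, absorb(school_id) vce(cluster school_id)
```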
2. Second, I want to use student fixed effects to estimate within-student change over time, with SEs again clustered at the school level. So I use either xtreg DV IVs (only the time-varying ones), fe vce(cluster school_id) or areg DV IVs, absorb(panelid) vce(cluster school_id). The main variable of interest here is a student-level time-varying IV, so identification relies on the sample of students for whom that status changes over time. Again, I see minor differences in the SEs. Is this right?
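The student fixed-effects runs, sketched the same way. x_tv is a hypothetical placeholder for my time-varying IVs, and panelid is the student-school identifier described above.

```stata
* Version A: xtreg with the student-school panel identifier
xtset panelid
xtreg susp_tag x_tv, fe vce(cluster school_id)

* Version B: areg absorbing the student-school dummies
areg susp_tag x_tv, absorb(panelid) vce(cluster school_id)
```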
3. Lastly, I want to include student fixed effects and school by grade by year fixed effects (to control for variation in school quality across grades and years). So I create a variable school_grade_year = group(school_id grade year).
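The school-by-grade-by-year cell identifier is created roughly like this (a sketch using my own variable names):

```stata
* One group per school x grade x year cell
egen school_grade_year = group(school_id grade year)
```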
If I use xtreg DV IVs i.school_grade_year, fe vce(cluster school_id), I run into the too-many-variables issue. Similarly, if I use areg DV IVs i.school_grade_year, absorb(panelid), I run into the same issue. If I use xtreg DV IVs, fe i(school_grade_year) vce(), that includes only the school_grade_year fixed effects, right? How do I include both the student and the school-by-grade-by-year FE?
Thanks in advance!
Best,
Maithreyi