Apologies if this is more of a stats question, but I am hoping for help understanding why my standard errors differ across ways of specifying a fixed effects model with complex survey data.
I am running an individual-level fixed effects regression model using data from one of the NCES longitudinal cohort surveys (HSLS:09). The standard errors need to account for the random sampling of students clustered within 944 schools. The data documentation provides instructions for survey setting the data using Taylor series linearization with the code below. I'll note that the PSU variable has three levels, the STRAT_ID variable has 450 levels, and the data are mi set.
As has been discussed on the forum, the svy prefix does not support xtreg or reghdfe. I have been using the advice posted here as a workaround. I specified the model using the three strategies listed below to compare the output.
All three strategies produce practically the same coefficients (give or take 0.0001). The issue is that I am getting drastically different standard errors. xtreg and reghdfe give practically the same standard errors (give or take about 0.001), but reg+absorb gives standard errors that are way off from strategies 1 and 2 (about 0.05 higher). The F-test also goes from 0 in the first two models to 0.6 with reg+absorb. I accept that the outputs are not going to be exactly the same, but this feels like a pretty drastic difference. It's troubling to me that the strategy that most correctly specifies the survey design is the one giving me such different output.
Any thoughts on what is accounting for the differences in the standard errors using these three approaches? Is it something inherent about how xtreg/reghdfe function compared to reg+absorb()? Does it come down to specifying the Taylor series linearization vs. the cluster robust specifications?
Each output is also giving me a different number of observations. I figured this was because the three commands treat missing values in the dependent variables differently, but mentioning it in case I am missing a blatantly obvious clue as to the differences between these functions.
This is my first time posting to the forum! Sorry if I messed up any norms (still learning!).
HTML Code:
mi svyset, clear()
mi svyset PSU [pweight = Weight_Variable], strata(STRAT_ID) vce(linear) singleunit(centered)
HTML Code:
*Strategy 1: reghdfe
mi estimate, post cmdok: reghdfe DV IV1 IV2 IV3 [pweight = Weight_Variable], absorb(STU_ID) vce(cluster STRAT_ID)

*Strategy 2: xtreg
mi xtset STU_ID Year
mi estimate, post: xtreg DV IV1 IV2 IV3 [pweight = Weight_Variable], fe vce(cluster STRAT_ID)

*Strategy 3: reg+absorb
mi estimate, post: svy: reg: DV IV1 IV2 IV3, absorb(STU_ID)