  • Differences between xtreg, reghdfe, and reg using complex survey data?

    Apologies if this is more of a stats question, but hoping to find some support for differences in standard errors when specifying a fixed effects model with complex survey data.

    I am running an individual-level fixed effects regression model using data from one of the NCES longitudinal cohort surveys (HSLS:09). The standard errors need to account for the random sampling of students clustered within 944 schools. The data documentation provides instructions on survey setting the data using Taylor series linearization with the code below. I'll note that the PSU variable has three levels, the STRAT_ID variable has 450 levels, and the data are mi set.

    Code:
    mi svyset, clear()
    mi svyset PSU [pweight = Weight_Variable], strata(STRAT_ID) vce(linear) singleunit(centered)

    As has been discussed on the forum, the svy command does not support xtreg or reghdfe. I have been using the advice posted here as a workaround. I tried to specify the model using the three strategies listed below to see differences in the output.

    Code:
    *Strategy 1: reghdfe
    mi estimate, post cmdok: reghdfe DV IV1 IV2 IV3 [pweight = Weight_Variable], absorb(STU_ID) vce(cluster STRAT_ID)
    
    *Strategy 2: xtreg
    mi xtset STU_ID Year
    mi estimate, post: xtreg DV IV1 IV2 IV3 [pweight = Weight_Variable], fe vce(cluster STRAT_ID)
    
    *Strategy 3: reg+absorb
    mi estimate, post: svy: reg DV IV1 IV2 IV3, absorb(STU_ID)

    All three strategies produce practically the same coefficients (give or take 0.0001). The issue is that I am getting drastically different standard errors. xtreg and reghdfe produce practically the same standard errors (give or take about 0.001), but reg+absorb gives standard errors that are far larger than those from strategies 1 and 2 (about 0.05 higher). The F-test also goes from 0 in the first two models to 0.6 with reg+absorb. I accept that the outputs are not going to be exactly the same, but this feels like a pretty drastic difference. It's troubling to me that the strategy that most correctly specifies the survey design is giving me such distinct output.

    Any thoughts on what is accounting for the differences in the standard errors using these three approaches? Is it something inherent about how xtreg/reghdfe function compared to reg+absorb()? Does it come down to specifying the Taylor series linearization vs. the cluster robust specifications?

    Each output is also giving me a different number of observations. I figured this was because the three commands treat missing values in the dependent variables differently, but mentioning it in case I am missing a blatantly obvious clue as to the differences between these functions.

    This is my first time posting to the forum! Sorry if I messed up any norms (still learning!).

  • #2
    As was stated in the linked thread, clustering on the PSU variable and using pweights is equivalent to -svy- only in the absence of stratification. That is not your situation, because your design is stratified. In this case, the only viable estimator is svy: regress.
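
    As a rough illustration of that point, here is a minimal sketch using the nhanes2 example data that ships with Stata rather than HSLS:09. The variable psu_id is a hypothetical helper, created only so that PSU identifiers are unique across strata, and the outcome and regressors are arbitrary. Without strata(), the linearized and cluster-robust standard errors should agree closely (any small gap would come from finite-sample adjustments); once strata() is declared, the -svy- standard errors are expected to move away from the clustered ones.

    Code:
    * Example data, not HSLS:09
    webuse nhanes2, clear

    * Hypothetical helper: PSU ids that are unique across strata
    egen psu_id = group(strata psu)

    * (a) Design declared WITHOUT stratification
    svyset psu_id [pweight = finalwgt]
    svy: regress bpsystol age weight

    * (b) Weighted regression clustered on the PSU; compare SEs with (a)
    regress bpsystol age weight [pweight = finalwgt], vce(cluster psu_id)

    * (c) Design declared WITH stratification; compare SEs with (a) and (b)
    svyset psu_id [pweight = finalwgt], strata(strata)
    svy: regress bpsystol age weight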

    • #3
      Thanks, Andrew. Aside from the issues with specifying the standard errors and stratification, is it safe to assume that reg+absorb() is doing practically the same thing as xtreg/reghdfe?

      • #4
        For regress with -absorb()-, refer to

        Code:
        help areg

        There are some differences in terms of focus and what kinds of models each of these commands can handle, so do read the documentation. But if you have panel data, -xtreg, fe- and absorbing the panel identifier in each of the other two estimators yield equivalent results. So in that sense, they do the same thing.
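
        A quick way to see this for yourself is a sketch along the following lines, using the nlswork example data that ships with Stata (not your data; the outcome and regressors are arbitrary). The slope coefficients should match across the three commands. Note that -reghdfe- is community-contributed and, by default, drops singleton groups, so its reported number of observations can differ from the other two; that may be part of why your observation counts differ.

        Code:
        * Example panel data, not HSLS:09
        webuse nlswork, clear
        xtset idcode year

        * Within (fixed-effects) estimator
        xtreg ln_wage age tenure, fe

        * Same slopes by absorbing the panel identifier
        areg ln_wage age tenure, absorb(idcode)

        * Community-contributed; install first with:
        *   ssc install ftools
        *   ssc install reghdfe
        reghdfe ln_wage age tenure, absorb(idcode)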
