  • Differences in estimates between areg, xtreg, and reghdfe

    Dear all,

    I am getting different results for my coefficients of interest when I use different fixed effects estimation commands in Stata. I haven't been able to find anything on this by searching the forum, so I am posting here.

    I know that the fixed effects coefficients will change between commands because they parametrize them differently, but as far as I know this is not supposed to affect the other coefficients. I am currently working on data where students take several exams (between 2 and 5) in different subjects (82) across several years (5). Students are not necessarily tested in the same subjects or years. My variable of interest is measured at the subject-year level. When I estimate a model that includes subject, student, and year fixed effects, I get the same results from all three commands above. However, when I interact the fixed effects I get wildly different results. For example, if I include subject-by-year fixed effects, I would expect all models to omit my variable of interest. This is not the case, however. reghdfe reports quite precise zeros.

    I cannot share my data, but I have been able to recreate the problem in constructed data with the same structure as my own. The code is included below. I have refrained from posting the output, but it is easy to reproduce by running the code.

    Thank you kindly for all suggestions, questions, or explanations.



    Code:
    clear
    set obs 10000
    gen studentid = int(_n/5)+1
    gen double u = (100-1)*runiform() + 1
    gen double subjectid = round(u)
    drop u
    sort studentid subjectid
    by studentid: drop if subjectid == subjectid[_n-1]
    gen double uu = (6-1)*runiform() + 1
    gen double examscore = round(uu)
    drop uu
    gen uuu = (2012-2008)*runiform() + 2008
    gen examyear = round(uuu)
    drop uuu
    sort subjectid examyear
    by subjectid examyear: gen double uuuu = (25-5)*runiform() + 5 if _n == 1
    gen double x = round(uuuu)
    drop uuuu
    by subjectid examyear: replace x = x[_n-1] if x == .
    
    sort studentid
    by studentid: gen double uuuuu = (200-1)*runiform() + 1 if _n == 1
    gen double school = round(uuuuu)
    by studentid: replace school = school[_n-1] if school == .
    
    areg examscore x i.examyear i.subjectid, absorb(studentid)
    xtset studentid
    xtreg examscore x i.examyear i.subjectid, fe
    reghdfe examscore x, absorb(studentid subjectid examyear)
    
    
    set matsize 4000
    egen d_subject_year = group(subjectid examyear)
    areg examscore x i.d_subject_year i.examyear i.subjectid, absorb(studentid) cluster(school)
    xtset studentid
    xtreg examscore x  i.d_subject_year i.examyear i.subjectid, fe cluster(school)
    reghdfe examscore x, absorb(studentid d_subject_year examyear subjectid) cluster(school)

  • #2
    In your last -reghdfe- regression, the problem is that -x- is perfectly collinear with the absorbed variables. There are a few ways to verify that:
    • Note that the standard errors are extremely large (4e+7!)
    • Regress x on the absorbed variables alone and note that the R2 is 1.00000: reghdfe x, absorb(studentid d_subject_year examyear subjectid)
    Now, why is the variable not omitted? Because the command that drops collinear variables (_rmcoll) was not really designed for reghdfe's case, so it does not recognize that x should be omitted. There is an alternative command used within ivreg2, but I still haven't had the time to add it to reghdfe.

    Also, note that even though Stata tends to do a good job of dropping collinear variables, it's always better to drop them yourself beforehand, so that you always get the same normalization (there is no guarantee that Stata will always drop the same variable from a set of collinear ones).
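
The two checks can be sketched on toy data before running the real model. This is only an illustration under stated assumptions: `cell` is a made-up stand-in for d_subject_year, and the sample size and the formula for x are arbitrary.

```stata
* Toy data (assumption): 20 subject-year cells with 50 exams each
clear
set obs 1000
gen cell = mod(_n, 20) + 1
* x varies only across cells, like a subject-year level regressor
gen double x = mod(7 * cell, 11)

* Check 1: x takes a single value within every cell
bysort cell (x): assert x[1] == x[_N]

* Check 2: the cell dummies explain x perfectly
quietly regress x i.cell
display "R-squared of x on the cell dummies: " e(r2)
assert abs(e(r2) - 1) < 1e-8
```

If both checks pass, x carries no variation beyond the cell dummies and can be left out of the regressor list by hand.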


    Best,
    S



    • #3
      Thank you very much Sergio,

      I was confused that the variable was not omitted, but that makes sense now. Do you have any clues regarding the lines using areg and xtreg, why they don't drop the x? Is it the same issue?

      Best,
      Simon



      • #4
        areg and xtreg drop an arbitrary variable until the regressors are no longer collinear, so they can drop either x or one of the many dummy variables. To see this, remove x from the regressor list, and the commands will stop dropping one of the dummy variables.
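
A minimal sketch of that comparison, using plain regress on toy data rather than areg or xtreg to keep it short. The variable `cell` (standing in for d_subject_year), the sample size, and the seed are assumptions made for illustration.

```stata
* Toy data (assumption): 20 cells, x constant within each cell
clear
set seed 2024
set obs 1000
gen cell = mod(_n, 20) + 1
gen double x = mod(7 * cell, 11)
gen double examscore = rnormal()

* With x: one term (x or a cell dummy) has to be omitted
quietly regress examscore x i.cell
scalar rank_with = e(rank)
predict double fit_with, xb

* Without x: no extra term has to be omitted
quietly regress examscore i.cell
scalar rank_without = e(rank)
predict double fit_without, xb

* Same rank and same fitted values: x adds no information
display "rank with x: " rank_with "   rank without x: " rank_without
assert rank_with == rank_without
assert reldif(fit_with, fit_without) < 1e-8
```

Whichever term is omitted in the first regression, the fit is the same as in the second, which is why the reported coefficient on x has no meaning of its own.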

        Cheers,
        S



        • #5
          So you are saying that the coefficients in xtreg and areg differ because they drop different dummy variables? Thank you for your patience.



          • #6
            Indeed! Even two runs of -areg- could drop different dummies if something changes in the dataset (e.g. you sort it differently), so my best advice would be to avoid running regressions with so many collinear variables.
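
To see how the choice of dropped dummy moves the coefficients without changing the fit, here is a small sketch that picks the omitted dummy by hand with the ib#. operator. The toy variable `cell`, the sample size, and the seed are assumptions, not part of the original example.

```stata
* Toy data (assumption): 20 cells whose mean score rises with the cell number
clear
set seed 2024
set obs 1000
gen cell = mod(_n, 20) + 1
gen double examscore = cell + rnormal()

* Normalization 1: the dummy for cell 1 is the omitted one
quietly regress examscore ib1.cell
scalar b3_base1 = _b[3.cell]
predict double fit_base1, xb

* Normalization 2: the dummy for cell 2 is the omitted one
quietly regress examscore ib2.cell
scalar b3_base2 = _b[3.cell]
predict double fit_base2, xb

* The coefficient on the same dummy differs, the fitted values do not
display "3.cell with base 1: " b3_base1 "   with base 2: " b3_base2
assert reldif(fit_base1, fit_base2) < 1e-8
```

Fixing the normalization explicitly like this keeps the reported coefficients comparable across runs and commands.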



            • #7
              Thanks again Sergio,

              The reason I included so many collinear dummies was to demonstrate the source of the variation.
