Differences in estimates between areg, xtreg, and reghdfe

Simon Bensnes

Join Date: Aug 2015

Posts: 22
#1

05 Oct 2016, 09:08

Dear all,

I am having some issues with getting different results for my coefficients of interest using different fixed effects estimation commands in Stata. I haven't been able to find anything on this by searching the forum, so I am posting here.

I know that the fixed effects coefficients will change between commands because they parameterize them differently, but as far as I know this is not supposed to affect the other coefficients. Currently I am working on data where students take several exams (between 2 and 5) in different subjects (82) across several years (5). Students are not necessarily tested in the same subjects or years. My variable of interest is measured at the subject-year level. When I estimate a model that includes subject, student, and year fixed effects, I get the same results regardless of which of the three commands above I use. However, when I interact the fixed effects I get wildly different results. For example, if I include subject-by-year fixed effects, I would expect all models to omit my variable of interest. This is not, however, the case. reghdfe reports quite precise zeros.

I cannot share my data, but I have been able to recreate the problem in constructed data with the same structure as my own. The code is included below. I have refrained from posting the results from running this code, but they are easy to reproduce by running it.

Thank you kindly for all suggestions, questions, or explanations.

Code:

clear
set obs 10000
gen studentid = int(_n/5)+1
gen double u = (100-1)*runiform() + 1
gen double subjectid = round(u)
drop u
sort studentid subjectid
by studentid: drop if subjectid == subjectid[_n-1]
gen double uu = (6-1)*runiform() + 1
gen double examscore = round(uu)
drop uu
gen uuu = (2012-2008)*runiform() + 2008
gen examyear = round(uuu)
drop uuu
sort subjectid examyear
by subjectid examyear: gen double uuuu = (25-5)*runiform() + 5 if _n == 1
gen double x = round(uuuu)
drop uuuu
by subjectid examyear: replace x = x[_n-1] if x == .
sort student
by student: gen double uuuuu = (200-1)*runiform() + 1 if _n == 1
gen double school = round(uuuuu)
by student: replace school = school[_n-1] if school == .

areg examscore x i.examyear i.subjectid, absorb(studentid)
xtset studentid
xtreg examscore x i.examyear i.subjectid, fe
reghdfe examscore x, absorb(studentid subjectid examyear)

set matsize 4000
egen d_subject_year = group(subject examyear)
areg examscore x i.d_subject_year i.examyear i.subjectid, absorb(studentid) cluster(school)
xtset studentid
xtreg examscore x i.d_subject_year i.examyear i.subjectid, fe cluster(school)
reghdfe examscore x, absorb(studentid d_subject_year examyear subjectid) cluster(school)
Sergio Correia

Join Date: Apr 2014

Posts: 420
#2

05 Oct 2016, 09:23

In your last -reghdfe- regression, the problem is that -x- is perfectly collinear with the absorbed variables. There are a few ways to verify that:

Note that the standard errors are extremely large (4e+7!)

Regress -x- on the absorbed variables alone and note that the R2 is 1.00000:

reghdfe x, absorb(studentid d_subject_year examyear subjectid)

Now, why is the variable not omitted? Because the command that drops collinear variables (_rmcoll) was not really designed for reghdfe's case, so it does not recognize that x should be omitted. There is an alternative command used within ivreg2, but I still haven't had the time to add it to reghdfe.

Also, do note that even if Stata tends to do a good job of dropping collinear variables, it's always better to drop them yourself beforehand, so that you always get the same normalization (there is no guarantee that Stata will always drop the same variable from a set of collinear ones).
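One way to spot this kind of collinearity before estimating is to check whether the regressor varies at all within the cells of the interacted fixed effect. The sketch below uses only built-in commands on a small constructed dataset; the sample size, seed, and the way the variables are generated are illustrative assumptions and not the data from the original post.

```stata
* Illustrative data only: sizes, seed, and construction are assumptions
clear
set seed 12345
set obs 1000
gen studentid = ceil(_n/5)
gen subjectid = ceil(10*runiform())
gen examyear  = 2007 + ceil(5*runiform())
egen d_subject_year = group(subjectid examyear)

* x is drawn once per subject-year cell and copied to every row in the cell
bysort d_subject_year: gen double x = round(20*runiform() + 5) if _n == 1
by d_subject_year: replace x = x[1]

* Check 1: x takes a single value inside every subject-year cell,
* so it is a linear combination of the subject-year dummies
bysort d_subject_year (x): assert x[1] == x[_N]

* Check 2: the subject-year dummies alone explain all the variation in x
regress x i.d_subject_year
display "R-squared of x on subject-year dummies: " e(r2)
```

If the first check fails with an "assertion is false" error, x does vary within cells and can in principle be identified alongside the subject-by-year effects.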

Best,
S
Simon Bensnes

Join Date: Aug 2015

Posts: 22
#3

05 Oct 2016, 09:30

Thank you very much Sergio,

I was confused that the variable was not omitted, but that makes sense now. Do you have any idea why the lines using areg and xtreg don't drop x? Is it the same issue?

Best,
Simon
Sergio Correia

Join Date: Apr 2014

Posts: 420
#4

05 Oct 2016, 11:14

areg and xtreg drop arbitrary variables until the remaining regressors are no longer collinear, so they can drop either x or one of the many dummy variables. To see this, remove x from the regressor list, and the commands will stop dropping one of the dummy variables.
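A rough way to see this is to compare the model degrees of freedom with and without x. The sketch below again uses a small constructed dataset with built-in commands only; the data, seed, and the scalar names df_with_x and df_without_x are illustrative assumptions. If x adds no independent variation beyond the subject-year dummies, e(df_m) should come out the same in both runs, and the coefficient table of the first run shows which term Stata chose to omit.

```stata
* Illustrative data only: sizes, seed, and construction are assumptions
clear
set seed 2016
set obs 1000
gen studentid = ceil(_n/5)
gen subjectid = ceil(10*runiform())
gen examyear  = 2007 + ceil(5*runiform())
gen double examscore = ceil(6*runiform())
egen d_subject_year = group(subjectid examyear)

* x varies only at the subject-year level
bysort d_subject_year: gen double x = round(20*runiform() + 5) if _n == 1
by d_subject_year: replace x = x[1]

* With x: Stata has to omit either x or one additional dummy
areg examscore x i.d_subject_year, absorb(studentid)
scalar df_with_x = e(df_m)

* Without x: the extra omission is no longer needed
areg examscore i.d_subject_year, absorb(studentid)
scalar df_without_x = e(df_m)

display "df_m with x: " df_with_x "   df_m without x: " df_without_x
```

Which term ends up flagged as omitted in the first table is up to Stata, which is why the reported coefficient on x is not comparable across commands.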

Cheers,
S
Simon Bensnes

Join Date: Aug 2015

Posts: 22
#5

06 Oct 2016, 03:04

So you are saying that the coefficients in xtreg and areg differ because they drop different dummy variables? Thank you for your patience.
Sergio Correia

Join Date: Apr 2014

Posts: 420
#6

06 Oct 2016, 07:49

Indeed! Even two runs of -areg- could drop different dummies if something changes in the dataset (e.g. you sort it differently), so my best advice would be to avoid running regressions with so many collinear variables.
Simon Bensnes

Join Date: Aug 2015

Posts: 22
#7

06 Oct 2016, 07:52

Thanks again Sergio,

The reason I included so many collinear dummies was to demonstrate the source of the variation.