REGHDFE and Predict

Nicole Huan

Join Date: Nov 2015

Posts: 7
#1

18 Feb 2016, 15:34

Dear Statalisters,

I have a question about the use of REGHDFE, created by Sergio Correia. I am using Stata 12.

Suppose I have an employer-employee linked panel dataset that looks something like this:

Year  Worker_ID  Firm_ID  X1  X2  X3  Wage
1992  1          3        2   2   2   15
1993  1          3        3   3   3   20
1994  1          4        2   2   2   50
1995  2          51       10  7   7   28

where X1, X2, X3 are worker characteristics (age, education etc).

I want to estimate a two-way fixed effects model such as:

wage(i,t) = x(i,t)b + workers fe + firm fe + residual(i,t)

I use the command to estimate the model:

reghdfe wage X1 X2 X3, absvar(p=Worker_ID j=Firm_ID)

I then check:

predict xb, xb
predict res, r

gen yhat = xb + p + j + res

and find that yhat ≠ wage.

MY QUESTION: Why is it that yhat ≠ wage?

However, the following produces yhat = wage:

capture drop yhat
predict xbd, xbd
gen yhat = xbd + res

Now, yhat=wage

What is the difference between xbd and xb + p + j? What is it in the estimation procedure that causes the two to differ?

Thanks in advance!

Nicky
Sergio Correia

Join Date: Apr 2014

Posts: 420
#2

18 Feb 2016, 19:38

Hi Nicky,

This is an important point that I will clarify a bit more in the future. Long story short, the difference lies in the constant.
If you run "summarize p j" you will see they have mean zero. This is partly a design choice, and it is also useful for several technical reasons.

However, if you run "predict d, d" you will see that it is not the same as "p+j". The difference is the constant. You can check this easily by running e.g. areg with only one FE and then asserting that the difference equals _b[_cons] in every observation.

Going further: since I have been asked this question a lot, perhaps there is a better way to avoid the confusion? The problem is that I only get the constant indirectly (see e.g. here: https://github.com/sergiocorreia/reg.../reghdfe_p.ado line 68). I compute the residuals (y-xbd), take the mean of that, and that is the constant. There is an alternative way of obtaining the fixed effects (faster, but with more code involved) which might allow me to always calculate and show the constant, which would help to resolve these questions. In any case, any input is welcome!
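A minimal sketch of the comparison discussed above, assuming the 2016-era reghdfe syntax used elsewhere in this thread (named absorbed effects such as TURN=turn, and predict with xb, xbd and r). Newer releases may need different options or may treat the constant differently. The auto dataset and the names gap, xb, xbd, res and yhat are illustrative only.

```stata
* Sketch only: assumes the reghdfe version discussed in this thread
sysuse auto, clear
reghdfe price weight, a(TURN=turn TRUNK=trunk, save)

predict double xb, xb      // linear prediction from the regressors only
predict double xbd, xbd    // xb plus d (absorbed effects, constant included)
predict double res, r      // residuals

* Gap between xbd and the hand-built sum of xb and the saved effects
gen double gap = xbd - (xb + TURN + TRUNK)
summarize gap              // if only a constant differs, the sd is ~0

* xbd + res should reproduce the dependent variable
gen double yhat = xbd + res
summarize price yhat
```

If the gap has (near) zero standard deviation, its mean is the implied constant that separates xbd from the sum of xb and the mean-zero fixed effects.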

Sergio
Nicole Huan

Join Date: Nov 2015

Posts: 7
#3

19 Feb 2016, 02:03

Dear Sergio,

Thanks for the clarification - much appreciated. On another note: many thanks for writing this program! (Been using FELSDVREG for a while and I'm really loving REGHDFE)

Regarding the question above, I hope you can shed some light on the following:

1.) How does the implicit inclusion of a constant change the interpretation of the model? For example, in the wage model I have above (or e.g. in Card et al. 2013 or Abowd, Kramarz, Margolis 1999):
wage(i,t) = x(i,t)b + workers fe + firm fe + residual(i,t)
what role does the constant play? What if the constant is negative?

2.) Does the inclusion of the constant change any resulting analyses? For example, if I wanted to compute the share of the variance of wages due to heterogeneity in firms (a la Card et al 2013), would the following be accurate?

corr wage j, cov
local varj = r(Var_2)
local vary = r(Var_1)
di `varj'/`vary'

3.) Does normalizing the fixed effects to have zero mean make comparisons across mobility groups meaningful? Or ought we just focus on, say, the largest mobility group created by the option groupvar(newvar)?

Many thanks again.

Nicky
Sergio Correia

Join Date: Apr 2014

Posts: 420
#4

19 Feb 2016, 11:28

Hi Nicky,

- Short explanation: don't worry about the constants

- Longer explanation:

If what you care about is the -xb- side, then the constant plays no role. Other packages used to group it in the first fixed effect (so the first fixed effect has a non-zero mean). Since this is arbitrary I decided against it, but in any case it plays no role in the estimates or other e() results.

*However* if you are thinking about doing inference about the fixed effects, you do need to be careful about what exactly you want to do (not just the constants, but also identification). I haven't read Card's paper, but if you check for instance Abowd et al. (2002), you will notice a lengthy discussion about how individual FEs are not identified. I am not an expert on fixed effect identification, but let me give you one example. If you change the order of the variables in absorb(..), the resulting variance of the fixed effects might change if you haven't dropped singletons:

Code:

sysuse auto
reghdfe price weight, a(TURN=turn TRUNK=trunk, save) keepsingletons
su TURN TRUNK
drop TURN TRUNK
reghdfe price weight, a(TRUNK=trunk TURN=turn, save) keepsingletons
su TURN TRUNK

That said, in any case the correlations you are computing shouldn't be affected by adding or dropping a constant term, so your example stays unchanged.
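This invariance can be confirmed with built-in commands alone. The sketch below uses the auto dataset, with weight standing in for an estimated firm effect (a purely illustrative assumption, as are the names fe, fe_shift, share1 and share2), and checks that adding an arbitrary constant leaves the variance ratio unchanged.

```stata
sysuse auto, clear

* Stand-in for an estimated fixed effect, and the same effect plus a constant
gen double fe       = weight
gen double fe_shift = fe + 1000

* Variance share computed as in the question: Var(effect) / Var(outcome)
quietly corr price fe, cov
local share1 = r(Var_2) / r(Var_1)

quietly corr price fe_shift, cov
local share2 = r(Var_2) / r(Var_1)

* The two shares agree up to floating-point error
di `share1' "  " `share2'
assert reldif(`share1', `share2') < 1e-10
```

Variances and covariances are computed from deviations around the mean, so a constant shift cancels out; only the identification issues mentioned above can change these quantities.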

Best,
S
Nicole Huan

Join Date: Nov 2015

Posts: 7
#5

19 Feb 2016, 14:12

Hi Sergio,

Thanks very much for your insight - much appreciated! I will have to think over this.

Best wishes,

Nicky
Olga Mian

Join Date: May 2017

Posts: 3
#6

18 Jul 2017, 07:05

Dear Sergio,

If I understood correctly from the conversation above, computing residuals using the built-in reghdfe option or with the predict post-estimation command will yield residuals not taking into account the implicit constant in the model?

The reason I am asking is that I am trying to plot a histogram of my regression residuals, but they seem awfully large relative to the standard errors and confidence intervals resulting from the estimation.

If it is the case, is there a way to compute residuals abstracting from that constant?

Thanks in advance,

Olga
S. M. Woahid Murad

Join Date: Apr 2015

Posts: 35
#7

07 May 2024, 05:54

Sergio Correia

Hi Sergio,

Can I find information criteria after estimating my fixed effect model using reghdfe?

Code:

reghdfe

For example, after regression, we run:

Code:

estat aic

Thanks in advance.

Kind Regards,
Woahid
S. M. Woahid Murad

Join Date: Apr 2015

Posts: 35
#8

20 May 2024, 11:12

There was a typo in my previous question #7. The problem has been solved. The command should be

Code:

estat ic