  • REGHDFE and Predict

    Dear Statlisters,

    I have a question about the use of REGHDFE, written by Sergio Correia. I am using Stata 12.

    Suppose I have an employer-employee linked panel dataset that looks something like this:

    Year  Worker_ID  Firm_ID  X1  X2  X3  Wage
    1992          1        3   2   2   2    15
    1993          1        3   3   3   3    20
    1994          1        4   2   2   2    50
    1995          2       51  10   7   7    28

    where X1, X2, X3 are worker characteristics (age, education, etc.).

    I want to estimate a two-way fixed effects model such as:

    wage(i,t) = x(i,t)b + worker FE + firm FE + residual(i,t)
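Written out in explicit notation (an illustration, not taken from the reghdfe documentation): let J(i,t) be the firm employing worker i in year t. Within one connected mobility group, the worker effects alpha and firm effects psi are identified only up to a common constant, since adding c to every alpha_i and subtracting c from every psi_k leaves every fitted value unchanged. Normalizing each effect to mean zero across the N observations (bars denote observation means, e.g. alpha-bar = N^-1 times the sum over (i,t) of alpha_i) moves that constant into an explicit intercept mu:

```latex
\[
\begin{aligned}
\text{wage}_{it} &= x_{it}\beta + \alpha_i + \psi_{J(i,t)} + \varepsilon_{it} \\
                 &= \mu + x_{it}\beta + p_{it} + j_{it} + \varepsilon_{it},
\end{aligned}
\qquad
p_{it} = \alpha_i - \bar{\alpha}, \quad
j_{it} = \psi_{J(i,t)} - \bar{\psi}, \quad
\mu = \bar{\alpha} + \bar{\psi}.
\]
```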

    I use the following command to estimate the model:

    reghdfe wage X1 X2 X3, absorb(p=Worker_ID j=Firm_ID)

    I then check:

    predict xb, xb
    predict res, r

    gen yhat = xb + p + j + res

    and find that yhat ≠ wage.

    MY QUESTION: Why is it that yhat ≠ wage?

    However, the following produces yhat = wage:

    capture drop yhat

    predict xbd, xbd
    gen yhat = xbd + res

    Now yhat = wage.

    What is the difference between xbd and xb + p + j? What is it in the estimation procedure that causes the two to differ?

    Thanks in advance!

    Nicky


  • #2
    Hi Nicky,

    This is an important point that I will clarify a bit more in the future. Long story short, the difference lies in the constant.
    • If you run "summarize p j" you will see that they have mean zero. This normalization is partly a design choice and is also useful for several technical reasons.
    • However, if you run "predict d, d" you will see that d is not the same as "p+j": the two differ by the constant. You can check this easily by running, e.g., areg with only one FE and then asserting that in every observation the difference equals the value of _b[_cons].
    • Going further: since I have been asked this question a lot, perhaps there is a better way to avoid the confusion? The problem is that I only get the constant indirectly (see e.g. here: https://github.com/sergiocorreia/reg.../reghdfe_p.ado line 68). I compute the residuals (y-xbd), take the mean of that, and that is the constant. There is an alternative way of obtaining the fixed effects (faster, but involving more code) that might allow me to always calculate and show the constant, which would help resolve this question. In any case, any input is welcome!
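The role of the constant can be checked outside Stata. The sketch below is a hedged numpy illustration, not reghdfe itself: it fits the two-way model on a made-up panel with explicit worker and firm dummies (a brute-force estimator chosen for transparency), then mimics the normalization described above, where each saved fixed effect has mean zero across observations. Variable names mirror the Stata ones; the data, coefficients and noise are invented.

```python
import numpy as np

# Hypothetical toy panel (one row per worker-year); names mirror the Stata post.
worker = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 0])   # Worker_ID (0-based)
firm = np.array([0, 1, 1, 1, 2, 2, 2, 0, 0, 2])     # Firm_ID: movers connect all firms
x = np.array([1.0, 2.0, 3.0, 2.0, 4.0, 5.0, 1.0, 3.0, 6.0, 2.5])
noise = np.array([0.2, -0.1, 0.0, 0.3, -0.2, 0.1, -0.3, 0.2, 0.0, -0.2])
y = (10.0 + 0.8 * x + np.array([1.0, 3.0, -2.0])[worker]
     + np.array([0.5, -1.0, 2.0])[firm] + noise)

# Two-way FE by brute force: x plus full sets of worker and firm dummies.
# The dummies are collinear (each set sums to a column of ones), so lstsq picks
# one of infinitely many FE splits; b and the fitted values are still unique.
X = np.column_stack([x, np.eye(3)[worker], np.eye(3)[firm]])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b, alpha, psi = coef[0], coef[1:4], coef[4:7]

xb = b * x                       # analogue of: predict xb, xb   (no constant)
d = alpha[worker] + psi[firm]    # analogue of: predict d, d     (FEs incl. constant)
xbd = xb + d                     # analogue of: predict xbd, xbd
res = y - xbd                    # analogue of: predict res, r

# Normalize each saved FE to mean zero across observations.
p = alpha[worker] - alpha[worker].mean()
j = psi[firm] - psi[firm].mean()
const = d.mean()                 # the part the mean-zero normalization strips out

print(np.allclose(xbd + res, y))                  # True
print(np.allclose(xb + p + j + res, y))           # False: off by `const` everywhere
print(np.allclose(xb + p + j + const + res, y))   # True
print(const)
```

Because p + j + const reproduces d exactly, the gap Nicky observed is the same number in every observation, which is what the areg check against _b[_cons] is meant to reveal.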
    Sergio



    • #3
      Dear Sergio,

      Thanks for the clarification - much appreciated. On another note: many thanks for writing this program! (I had been using FELSDVREG for a while, and I'm really loving REGHDFE.)

      Regarding the question above, I hope you can shed some light on the following:

      1.) How does the implicit inclusion of a constant change the interpretation of the model? For example, in the wage model I have above (or, e.g., in Card et al. 2013 or Abowd, Kramarz and Margolis 1999):
      wage(i,t) = x(i,t)b + worker FE + firm FE + residual(i,t)
      what role does the constant play? What if the constant is negative?


      2.) Does the inclusion of the constant change any resulting analyses? For example, if I wanted to compute the share of the variance of wages due to heterogeneity in firms (à la Card et al. 2013), would the following be accurate?

      corr wage j, cov
      local varj = r(Var_2)   // variance of j (second variable)
      local vary = r(Var_1)   // variance of wage (first variable)
      di `varj'/`vary'


      3.) Does normalizing the fixed effects to have zero mean make comparisons across mobility groups meaningful? Or ought we just focus on, say, the largest mobility group created by the option groupvar(newvar)?

      Many thanks again.

      Nicky



      • #4
        Hi Nicky,

        - Short explanation: don't worry about the constants

        - Longer explanation:

        If what you care about is the -xb- side, then the constant plays no role. Other packages used to fold it into the first fixed effect (so the first fixed effect has a nonzero mean). Since this choice is arbitrary I decided against it, but in any case it plays no role in the estimates or in the other e() results.

        *However*, if you are thinking about doing inference on the fixed effects, you do need to be careful about exactly what you want to do (not just regarding the constants, but regarding identification). I haven't read Card's paper, but if you check, for instance, Abowd et al. (2002), you will notice a lengthy discussion of how individual FEs are not identified. I am not an expert on fixed-effect identification, but let me give you one example: if you change the order of the variables in absorb(..), the resulting variance of the fixed effects might change if you haven't dropped singletons:

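        Code:
        sysuse auto, clear
        * absorb turn first, then trunk, keeping singleton groups
        reghdfe price weight, a(TURN=turn TRUNK=trunk, save) keepsingletons
        su TURN TRUNK
        drop TURN TRUNK
        * same model with the order of the absorbed variables reversed
        reghdfe price weight, a(TRUNK=trunk TURN=turn, save) keepsingletons
        su TURN TRUNK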
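A related identification mechanism can be shown without reghdfe. In this hedged numpy sketch with invented numbers (it is not a reproduction of the auto example's singleton behaviour), the data contain two mobility groups that share no firm. Each group then carries its own free constant, so two different splits of the same fitted values imply different variances for the worker and firm effects, and a single global mean-zero normalization cannot remove the difference because demeaning does not change a variance.

```python
import numpy as np

# Hypothetical panel with TWO mobility groups that share no firm:
# group A = workers 0,1 at firms 0,1; group B = workers 2,3 at firms 2,3.
worker = np.array([0, 0, 1, 1, 2, 2, 3, 3])
firm = np.array([0, 1, 0, 1, 2, 3, 2, 3])
in_b = np.array([0.0, 0.0, 1.0, 1.0])     # 1 for ids 2 and 3 (group B), for workers and firms alike

alpha = np.array([1.0, 2.0, -1.0, 0.5])   # one admissible set of worker FEs
psi = np.array([0.3, -0.3, 1.2, 0.8])     # one admissible set of firm FEs

# A second admissible split: +k on group-B workers, -k on group-B firms.
k = 5.0
alpha2, psi2 = alpha + k * in_b, psi - k * in_b

d1 = alpha[worker] + psi[firm]
d2 = alpha2[worker] + psi2[firm]
print(np.allclose(d1, d2))   # True: identical combined FE, so an identical fit

# The implied FE variances differ, even after demeaning each FE.
p1, p2 = alpha[worker], alpha2[worker]
j1, j2 = psi[firm], psi2[firm]
print(np.var(p1 - p1.mean()), np.var(p2 - p2.mean()))
print(np.var(j1 - j1.mean()), np.var(j2 - j2.mean()))
```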

        That said, in any case the correlations you are computing shouldn't be affected by adding or dropping a constant term, so your calculation is unaffected.
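The reason a constant cannot move these numbers is that variances, covariances and correlations are all computed from deviations from the mean. A minimal numpy check with made-up values, mirroring Nicky's `corr wage j, cov` calculation (the helper name firm_share is hypothetical):

```python
import numpy as np

# Hypothetical wages and saved firm FE j (illustrative numbers only).
wage = np.array([15.0, 20.0, 50.0, 28.0, 33.0, 41.0])
j = np.array([-2.0, -2.0, 6.0, -1.0, -1.0, 0.0])

def firm_share(wage, j):
    """Analogue of: corr wage j, cov  followed by  di r(Var_2)/r(Var_1)."""
    v = np.cov(wage, j)          # 2x2 sample covariance matrix (N-1 divisor)
    return v[1, 1] / v[0, 0]

shift = 7.3                      # an arbitrary constant folded into j
print(np.isclose(firm_share(wage, j), firm_share(wage, j + shift)))   # True
print(np.isclose(np.corrcoef(wage, j)[0, 1],
                 np.corrcoef(wage, j + shift)[0, 1]))                 # True
```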

        Best,
        S



        • #5
          Hi Sergio,

          Thanks very much for your insight - much appreciated! I will have to think this over.

          Best wishes,

          Nicky



          • #6
            Dear Sergio,

            If I understood the conversation above correctly, computing residuals with the built-in reghdfe option or with the predict postestimation command yields residuals that do not take into account the implicit constant in the model?

            The reason I am asking is that I am trying to plot a histogram of my regression residuals, but they seem awfully big relative to the standard errors and confidence intervals resulting from the estimation.

            If that is the case, is there a way to compute residuals abstracting from that constant?

            Thanks in advance,

            Olga
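Nicky's check earlier in the thread (xbd + res = wage) implies that the residuals are wage minus the full fitted value, constant included. The hedged numpy sketch below (simulated data, explicit dummies instead of reghdfe) illustrates two points: moving a constant between the worker and firm effects leaves the residuals unchanged, and residuals are on the scale of the error standard deviation, whereas a coefficient's standard error shrinks with the amount of within-FE variation in x, so residuals are expected to look much larger than standard errors. The standard error uses the Frisch-Waugh-Lovell residualization of x on the dummies.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical simulated panel: 50 workers x 4 years, 10 firms assigned at random.
n_workers, n_firms, T = 50, 10, 4
worker = np.repeat(np.arange(n_workers), T)
firm = rng.integers(0, n_firms, size=worker.size)
x = rng.normal(size=worker.size)
y = (2.0 + 0.5 * x + rng.normal(size=n_workers)[worker]
     + rng.normal(size=n_firms)[firm] + rng.normal(size=worker.size))

# Two-way FE via explicit dummies (collinear, so lstsq picks one FE split).
D = np.column_stack([np.eye(n_workers)[worker], np.eye(n_firms)[firm]])
X = np.column_stack([x, D])
coef, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
res = y - X @ coef

# Moving a constant c from the firm FEs to the worker FEs leaves residuals unchanged.
c = 3.0
coef2 = coef.copy()
coef2[1:1 + n_workers] += c
coef2[1 + n_workers:] -= c
res2 = y - X @ coef2
print(np.allclose(res, res2), abs(res.mean()) < 1e-8)   # True True

# Scale check: residual SD estimates sigma; the slope SE is sigma / sqrt(x_tilde'x_tilde).
x_tilde = x - D @ np.linalg.lstsq(D, x, rcond=None)[0]   # FWL: x net of both FEs
sigma2 = res @ res / (y.size - rank)
se_b = np.sqrt(sigma2 / (x_tilde @ x_tilde))
print(res.std(ddof=1), se_b)
```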
