
  • Reghdfe 4.x standard errors, and categorical variables

    Dear all,

    I have two questions that I need to understand, and I would really appreciate any help that I can get.

    Firstly, I would like to understand why -reghdfe- version 4.x gives smaller standard errors, and hence smaller p-values, than -reghdfe- version 3.x. Which one would be better to use?

    Secondly, I have a control variable in my estimation, which is the lagged log of Income. I also created a categorical variable out of Income, and include it as an interaction with another (continuous) variable. My question is: when Income is included as a categorical variable, should I still keep the lagged log of Income too, or exclude it from the regression?

    My basic specification looks something like this:
    Code:
     xtreg y x1 x2 x3 logIncome_(t-1), fe cluster(id)
    now:
    Code:
    xtreg y x1 c.x2##i.Incomecat x3 logIncome_(t-1), fe cluster(id)
    Would this make sense? I would instinctively exclude logIncome_(t-1) from the regression, though I have some doubts, as I do not have much experience with empirical analysis.
    I apologize if the question is a rather basic one, but would really appreciate any guidance!

    Thank you in advance for your time.
    Last edited by Luca Baum; 11 Sep 2017, 08:38.

  • #2
    Anyone please? Any ideas?


    • #3
      Hi Luca,

      Just read your post. Have you checked whether the degrees of freedom have changed?

      Note that you can run the 3.x version if you run reghdfe with the old option, so that would be easy to compare without uninstalling/installing.

      My best guess is that you might have different variables, or at least a different number of degrees of freedom and of absorbed variables (i.e., check e(df_a), e(df_r), e(df_m), etc.)
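
      A minimal sketch of that check, assuming reghdfe 4.x is installed (so that the old option mentioned above is available). It uses Stata's bundled auto dataset, and the variables are arbitrary stand-ins, not the original poster's model:

      ```stata
      * Sketch only: auto.dta and these variables are stand-ins for the real model.
      sysuse auto, clear

      * Current (4.x) code path
      reghdfe price weight, absorb(turn foreign) vce(cluster turn)
      display "4.x: df_a = " e(df_a) "  df_r = " e(df_r) "  df_m = " e(df_m)

      * Same specification through the 3.x code path via the -old- option
      reghdfe price weight, absorb(turn foreign) vce(cluster turn) old
      display "3.x: df_a = " e(df_a) "  df_r = " e(df_r) "  df_m = " e(df_m)
      ```

      If only e(df_a) differs between the two runs, the gap in standard errors most likely comes from the degrees-of-freedom adjustment rather than from a different estimation sample.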

      Best,
      Sergio


      • #4
        Hi Sergio,

        Thank you for your response.
        The model that I am estimating is the same with both the 3.x and the 4.x version, so I have the same variables.

        Actually, e(df_r) and e(df_m) are the same in both of them; however, e(df_a) is different. From what I can tell, they use different absorbed variables. The 4.x version uses the year, whereas the 3.x version uses the firm id and absorbs one year too. (I include both year and firm fixed effects.)
        I am not sure if I am making any sense right now. As I do not understand it that well, I am not sure which version to continue the analysis with.
        I appreciate your guidance!

        Thank you!


        • #5
          Originally posted by Luca Baumm
          Hi Sergio,
          The 4.x version uses the year, whereas the 3.x version uses the firm id and absorbs one year too. (I include both year and firm fixed effects.)
          Without knowing the exact results, this is my guess:

          On 3.x, if you had the same variable in absorb() and cluster() then e(df_a) wouldn't be affected by the variable, because we are already applying a penalty by adjusting the number of obs. to the number of clusters in the DoF calculation. However, on 4.x we do apply a penalty of one degree of freedom, because the mean of the partialled-out variable is zero.

          This shouldn't really matter, because if you are using reghdfe you are likely to have many observations, in which case a single degree of freedom should make little difference.

          Best,
          S


          • #6
            Sergio,
            Thank you very much for the explanation!

            This is what I get at the end of the output.
            With 3.x:

            Code:
            Absorbed degrees of freedom:
            ---------------------------------------------------------------+
             Absorbed FE |  Num. Coefs.  =   Categories  -   Redundant     |
            -------------+-------------------------------------------------|
                      id |        32260           32260              0     |
                    year |            9              10              1     |
            ---------------------------------------------------------------+
            Whereas, with 4.x :

            Code:
            Absorbed degrees of freedom:
            -----------------------------------------------------+
             Absorbed FE | Categories  - Redundant  = Num. Coefs |
            -------------+---------------------------------------|
                      id |     32260      32260            0    *|
                    year |        10           0          10     |
            -----------------------------------------------------+
            * = FE nested within cluster; treated as redundant for DoF computation
            So, I have the same variables in absorb (id year), and I cluster at the same level for both (country).

            If I understand you correctly, it shouldn't matter which version I use?

            Thank you for your patience and help!


            • #7
              Yeah, I only see a difference of one due to the 9 in year for 3.x and 10 in 4.x

              Given your number of obs., this shouldn't really matter for your degrees of freedom


              • #8
                Thank you very much Sergio!!


                • #9
                  Originally posted by Sergio Correia
                  Yeah, I only see a difference of one due to the 9 in year for 3.x and 10 in 4.x
                  Sorry again, Sergio, but what confuses me is that the number of redundant categories for id is 0 with the 3.x version, whereas with 4.x all of them are redundant (* = FE nested within cluster; treated as redundant for DoF computation).
                  Am I missing something very straightforward here?
                  Thank you!

                  Best,
                  Luca


                  • #10
                    I see. Are you using the same clustering in both cases?


                    • #11
                      Yes, I use exactly the same command in both cases.

                      I just uninstall/install the versions and run the regression.

                      Code:
                      reghdfe y x1 x2 x3 x4, absorb(id year) vce(cluster country#year)
                      This is the exact command I am using in both cases.

                      However, I just noticed that I get the same results with 3.x and 4.x if I cluster on a single grouping variable, countryyear, instead of country#year.

                      I understand that the two are equivalent (countryyear and country#year), and with the 3.x version they always give me the same results; with 4.x, apparently, they do not.
                      Last edited by Luca Baumm; 25 Sep 2017, 07:54.


                      • #12
                        Ok, I can reproduce this issue with Stata's example dataset:

                        Code:
                        clear all
                        cls
                        set more off
                        sysuse auto
                        
                        reghdfe price weight, a(turn#trunk foreign) vce(cluster turn#foreign)
                        reghdfe price weight, a(turn#trunk foreign) vce(cluster turn#foreign) old
                        egen turn_foreign = group(turn foreign)
                        reghdfe price weight, a(turn#trunk foreign) vce(cluster turn_foreign)
                        Note that the -old- option in reghdfe 4 calls reghdfe 3, which helps when comparing the two.

                        It seems that there is a bug in reghdfe 4: it treats -id- as nested within the cluster because it checks the nesting against -country- alone, when it should have checked against country#year. Will try to push an update later today.
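
                        One way to confirm that the two cluster definitions in the example above really describe the same partition of the data, so that any difference in output can be attributed to reghdfe rather than to the grouping variable, is sketched below. It assumes only the bundled auto dataset and built-in commands:

                        ```stata
                        * Sketch: check that group(turn foreign) is one-to-one with turn#foreign.
                        sysuse auto, clear
                        egen turn_foreign = group(turn foreign)

                        * Every (turn, foreign) cell maps to exactly one group id ...
                        bysort turn foreign (turn_foreign): assert turn_foreign[1] == turn_foreign[_N]

                        * ... and every group id maps to exactly one (turn, foreign) cell.
                        bysort turn_foreign (turn foreign): assert turn[1] == turn[_N] & foreign[1] == foreign[_N]
                        ```

                        If both assertions pass silently, clustering on turn_foreign and on turn#foreign defines identical clusters, so the two 4.x runs should agree once the bug is fixed.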


                        • #13
                          Ohh okay, I see, I understand now. That would be great!

                          Your help is greatly appreciated! Thank you!


                          • #14
                            The bug was (as it often is) due to a line of code that caused reghdfe to only use the first variable in vce(cluster var1#var2).

                            This commit should fix the problem. Let me know if reinstalling reghdfe resolves it.


                            • #15
                              Dear Sergio,

                              Yes! It did fix the problem.

                              Thank you!
