
  • Why do manual demeaning and reghdfe give different results when estimating fixed effects?

    Hi Statalist community. I really need your help with this.

    I'm trying to use two alternative procedures to estimate fixed effects in a regression: demeaning manually and using reghdfe. However, the two methods give different coefficients, and I can't figure out why.

    For example, I have a dataset with 1938 paired municipalities, and each municipality belongs to a department. I have a balanced panel with i_pareja as the panel variable:
    Code:
    .xtset i_pareja
    
    Panel variable: i_pareja (balanced)
    This is the result when I run a regression on my manually demeaned variables (I'm demeaning by i_pareja, the pair, and then by department in a multi-step process, because I need pair and department fixed effects):
    Code:
    .  reg fe_dept_fe_nbi fe_dept_ldistlaura, cluster(i_pareja)
    
    Linear regression                               Number of obs     =      1,936
                                                    F(1, 968)         =      78.99
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.0836
                                                    Root MSE          =     8.2161
    
                                       (Std. err. adjusted for 969 clusters in i_pareja)
    ------------------------------------------------------------------------------------
                       |               Robust
        fe_dept_fe_nbi | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------------+----------------------------------------------------------------
    fe_dept_ldistlaura |   2.134917   .2402189     8.89   0.000     1.663507    2.606327
                 _cons |  -.0044798   .0381333    -0.12   0.907    -.0793133    .0703537
    ------------------------------------------------------------------------------------
    This is the result when I use reghdfe:

    Code:
    .   reghdfe nbi l_dist_laura, absorb(departamento* i_pareja) vce(cluster i_pareja)
    (warning: absorbing 34 dimensions of fixed effects; check that you really want that)
    (dropped 2 singleton observations)
    (MWFE estimator converged in 29 iterations)
    
    HDFE Linear regression                            Number of obs   =      1,934
    Absorbing 34 HDFE groups                          F(   1,    966) =      70.37
    Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                      R-squared       =     0.8258
                                                      Adj R-squared   =     0.6410
                                                      Within R-sq.    =     0.0895
    Number of clusters (i_pareja) =        967        Root MSE        =    11.1799
    
                                 (Std. err. adjusted for 967 clusters in i_pareja)
    ------------------------------------------------------------------------------
                 |               Robust
             nbi | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    l_dist_laura |   2.204064   .2627431     8.39   0.000     1.688451    2.719677
           _cons |   19.69598   2.512484     7.84   0.000     14.76542    24.62653
    ------------------------------------------------------------------------------
    I know the second model runs on 1,934 observations, but dropping the two singletons from the first model (which reghdfe does automatically) does not change its coefficients.

    I'm wondering why this happens.

    I need to run the regressions with reg because I'm doing a mediation analysis with three mediators at the same time, and the only way I know to do that is with the sureg command, which implies running several reg regressions.

    Thank you so much in advance.

  • #2
    Here you have more than 1 FE dimension:

    absorb(departamento* i_pareja)
    Show us how you are demeaning the data and also make sure that your panel is indeed a panel and is truly balanced to begin with.

    Code:
    qui regress nbi l_dist_laura departamento* i_pareja año
    keep if e(sample)
    xtset i_pareja año 
    where you replace "año" with your time variable.



    • #3
      Hi Andrew, thanks for your answer.

      Here is how I am demeaning the data.

      Code:
      bys i_pareja: egen f_nbi=mean(nbi)
      gen fe_nbi = nbi - f_nbi
      
      bys i_pareja: egen f_l_dist_laura=mean(l_dist_laura)
      gen fe_l_dist_laura = l_dist_laura - f_l_dist_laura
      
      bys departamento*: egen fdept_nbi=mean(fe_nbi)
      gen fe_dept_fe_nbi = fe_nbi - fdept_nbi
      
      bys departamento*: egen fdept_distlaura=mean(fe_l_dist_laura)
      gen fe_dept_ldistlaura = fe_l_dist_laura - fdept_distlaura
      Note that I don't really have a panel with a time variable. I just have 1938 paired municipalities, all observed in 2005.

      Code:
      . xtset i_pareja ano 
      repeated time values within panel
      Thank you.



      • #4
        Do you get the same coefficients if you restrict your sample to one observation per ID and year?

        Code:
        qui regress nbi l_dist_laura departamento* i_pareja ano
        keep if e(sample)
        collapse nbi l_dist_laura departamento*, by(i_pareja ano)
        xtset i_pareja ano 
        
        bys i_pareja: egen f_nbi=mean(nbi)
        gen fe_nbi = nbi - f_nbi
        
        bys i_pareja: egen f_l_dist_laura=mean(l_dist_laura)
        gen fe_l_dist_laura = l_dist_laura - f_l_dist_laura
        
        bys departamento*: egen fdept_nbi=mean(fe_nbi)
        gen fe_dept_fe_nbi = fe_nbi - fdept_nbi
        
        bys departamento*: egen fdept_distlaura=mean(fe_l_dist_laura)
        gen fe_dept_ldistlaura = fe_l_dist_laura - fdept_distlaura
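        Independently of the collapse check, it may help to compare against demeaning that is repeated until it converges, run on the original (uncollapsed) data. A single pass of pair demeaning followed by department demeaning is not guaranteed to remove both sets of effects unless, for example, every pair lies inside one department, and the reghdfe output in #1 reports 29 iterations. The sketch below is only an illustration: dept, y, x and the tolerance are names and values assumed here, not taken from the thread.
        Code:
        * assumed: a single department identifier built from the departamento* variables
        egen dept = group(departamento*)
        * work on double-precision copies so small changes are not lost to rounding
        gen double y = nbi
        gen double x = l_dist_laura
        
        local maxchg = 1
        local iter = 0
        * alternate pair and department demeaning until all group means are ~0
        while `maxchg' > 1e-8 & `iter' < 1000 {
            local maxchg = 0
            foreach g in i_pareja dept {
                foreach v in y x {
                    tempvar m
                    quietly bysort `g': egen double `m' = mean(`v')
                    quietly summarize `m'
                    local maxchg = max(`maxchg', abs(r(min)), abs(r(max)))
                    quietly replace `v' = `v' - `m'
                    drop `m'
                }
            }
            local ++iter
        }
        display "demeaning passes: `iter'"
        reg y x, cluster(i_pareja)
        If the coefficient now agrees with reghdfe, the gap came from stopping after one pass. The standard errors can still differ, because reg does not know how many fixed effects were removed.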



        • #5
          Thanks for your reply.

          I'm afraid I can't restrict my sample to one observation per ID (i_pareja) and year, because I'm estimating an OLS model on pairs of neighbouring municipalities. I need to compare treated with untreated municipalities, which is why they are paired: one municipality in each pair is treated and the other is not. When I collapse, I lose the controls, so the variables are omitted from the estimation.



          • #6
            You are checking why the results are not the same, so this is an exercise. I am not asking you to restrict the data in this way for analysis.
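            One further exercise of this kind, sketched under the assumption that the residuals() option is available in your installed version of reghdfe: let reghdfe partial out both sets of fixed effects from each variable, then run reg on the residuals. r_nbi and r_ldist are names made up here.
            Code:
            * residualise each variable on the same fixed effects used in #1
            reghdfe nbi, absorb(departamento* i_pareja) residuals(r_nbi)
            reghdfe l_dist_laura, absorb(departamento* i_pareja) residuals(r_ldist)
            * plain OLS on the residualised variables
            reg r_nbi r_ldist, cluster(i_pareja)
            If the difference comes from the demeaning step, the slope here should reproduce the reghdfe coefficient. The singletons that reghdfe drops should end up with missing residuals, so they fall out of this regression as well.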



            • #7
              Well, in that case, after the collapse I don't get any coefficients; the variables are omitted.



              • #8
                Can you share a sample of the dataset that replicates the problem?

                Code:
                sort i_pareja ano
                dataex



                • #9
                  Following Andrew's thoughts, maybe try

                  egen group = group(departamento* i_pareja)

                  then demean on group.
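                  A minimal sketch of that suggestion, with the m_ and dm_ prefixes invented here. Note the assumption it carries: demeaning on group removes one effect per department-pair combination, which corresponds to absorb(group) rather than to the two separate terms in absorb(departamento* i_pareja), so it is a diagnostic and not necessarily the same model.
                  Code:
                  egen group = group(departamento* i_pareja)
                  foreach v in nbi l_dist_laura {
                      * group mean, then deviation from it
                      bysort group: egen double m_`v' = mean(`v')
                      gen double dm_`v' = `v' - m_`v'
                  }
                  reg dm_nbi dm_l_dist_laura, cluster(i_pareja)
                  If every pair lies within a single department, group coincides with i_pareja and this reduces to pair demeaning alone.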
