Why do manual demeaning and reghdfe give different results when estimating fixed effects?

Diego Duarte

Join Date: Oct 2023
Posts: 4


30 Oct 2023, 06:19

Hi Statalist community. I really need your help with this.

I'm trying to use two alternative procedures to estimate fixed effects in a regression: demeaning the variables manually and using reghdfe. However, the two methods give different coefficients, and I can't figure out why.

For example, I have a dataset of 1,938 paired municipalities, and each municipality belongs to a department. The data are declared as a balanced panel by the pair variable i_pareja:

Code:

. xtset i_pareja

Panel variable: i_pareja (balanced)

This is the result when I run a regression on my manually demeaned variables (I demean by pair, i_pareja, and by department in a multi-step process, because I need both pair and department fixed effects):

Code:

.  reg fe_dept_fe_nbi fe_dept_ldistlaura, cluster(i_pareja)

Linear regression                               Number of obs     =      1,936
                                                F(1, 968)         =      78.99
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0836
                                                Root MSE          =     8.2161

                                   (Std. err. adjusted for 969 clusters in i_pareja)
------------------------------------------------------------------------------------
                   |               Robust
    fe_dept_fe_nbi | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------------+----------------------------------------------------------------
fe_dept_ldistlaura |   2.134917   .2402189     8.89   0.000     1.663507    2.606327
             _cons |  -.0044798   .0381333    -0.12   0.907    -.0793133    .0703537
------------------------------------------------------------------------------------

This is the result when I use reghdfe:

Code:

.   reghdfe nbi l_dist_laura, absorb(departamento* i_pareja) vce(cluster i_pareja)
(warning: absorbing 34 dimensions of fixed effects; check that you really want that)
(dropped 2 singleton observations)
(MWFE estimator converged in 29 iterations)

HDFE Linear regression                            Number of obs   =      1,934
Absorbing 34 HDFE groups                          F(   1,    966) =      70.37
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.8258
                                                  Adj R-squared   =     0.6410
                                                  Within R-sq.    =     0.0895
Number of clusters (i_pareja) =        967        Root MSE        =    11.1799

                             (Std. err. adjusted for 967 clusters in i_pareja)
------------------------------------------------------------------------------
             |               Robust
         nbi | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
l_dist_laura |   2.204064   .2627431     8.39   0.000     1.688451    2.719677
       _cons |   19.69598   2.512484     7.84   0.000     14.76542    24.62653
------------------------------------------------------------------------------

I know the first model uses 1,936 observations while reghdfe uses 1,934, but eliminating the two singletons (which reghdfe does automatically) does not change the coefficients.

I'm wondering why this happens.

I need to run the regressions with reg because I'm doing a mediation analysis with three mediators at the same time, and the only way I know to do that is the sureg command, which involves estimating several regress-type equations.

Thank you so much in advance.


Andrew Musau

Join Date: Oct 2014

Posts: 10260
#2

30 Oct 2023, 06:31

Here you have more than one fixed-effect dimension:

absorb(departamento* i_pareja)

Show us how you are demeaning the data, and also check that your data really form a panel and that it is truly balanced to begin with.

Code:

qui regress nbi l_dist_laura departamento* i_pareja año
keep if e(sample)
xtset i_pareja año

where you replace "año" with your time variable.

Diego Duarte

Join Date: Oct 2023
Posts: 4

30 Oct 2023, 07:11

Hi Andrew, thanks for your answer.

Here is how I am demeaning the data.

Code:

* step 1: demean each variable by pair
bys i_pareja: egen f_nbi=mean(nbi)
gen fe_nbi = nbi - f_nbi

bys i_pareja: egen f_l_dist_laura=mean(l_dist_laura)
gen fe_l_dist_laura = l_dist_laura - f_l_dist_laura

* step 2: demean the pair-demeaned variables by department
bys departamento*: egen fdept_nbi=mean(fe_nbi)
gen fe_dept_fe_nbi = fe_nbi - fdept_nbi

bys departamento*: egen fdept_distlaura=mean(fe_l_dist_laura)
gen fe_dept_ldistlaura = fe_l_dist_laura - fdept_distlaura

I don't actually have a panel with a time variable, though: I just have 1,938 paired municipalities, all observed in 2005.

Code:

. xtset i_pareja ano 
repeated time values within panel

Thank you.
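A likely explanation, offered as a hedged sketch rather than a diagnosis of this particular dataset: when pairs are not nested within departments, one pass of "demean by pair, then by department" is not the same as removing both sets of fixed effects at once. The department step puts back pair-level means that the pair step had removed, so the two steps have to be repeated until the variables stop changing (alternating projections). reghdfe solves the problem iteratively, which is what its line "(MWFE estimator converged in 29 iterations)" reports. The self-contained example below uses simulated data: the variable names follow the thread, but the variable dept, the helper variables tmp_*, the data-generating process, and all numbers are made up. It compares the l_dist_laura coefficient from regress with pair and department dummies, from the one-pass demeaning above, and from iterated demeaning. The iterated version should match the dummy regression up to the convergence tolerance; the one-pass version generally does not.

```stata
* Simulated data (a made-up illustration, not the thread's data)
clear
set seed 2023
set obs 400
gen i_pareja = ceil(_n/2)                    // 200 pairs, 2 municipalities each
gen dept = ceil(i_pareja/50)                 // 4 departments, 50 pairs each
* assumption: in every even pair the 2nd municipality lies in the next department
replace dept = mod(dept, 4) + 1 if mod(i_pareja, 2) == 0 & mod(_n, 2) == 0
tabulate dept, generate(departamento)        // dummies departamento1-departamento4
gen l_dist_laura = rnormal() + dept + mod(i_pareja, 7)/2
gen nbi = 2*l_dist_laura + 3*dept + mod(i_pareja, 5) + rnormal()
egen dept_id = group(departamento*)          // one id per department

* (1) Benchmark: pair and department dummies in one regression
regress nbi l_dist_laura i.i_pareja i.dept_id
scalar b_dummies = _b[l_dist_laura]

* (2) One pass, as in the post: demean by pair, then by department, once
foreach v in nbi l_dist_laura {
    bysort i_pareja: egen double tmp_mean = mean(`v')
    gen double p_`v' = `v' - tmp_mean
    drop tmp_mean
    bysort dept_id: egen double tmp_mean = mean(p_`v')
    gen double one_`v' = p_`v' - tmp_mean
    drop tmp_mean
}
regress one_nbi one_l_dist_laura
scalar b_onepass = _b[one_l_dist_laura]

* (3) Iterated: repeat both demeaning steps until the largest change is negligible
foreach v in nbi l_dist_laura {
    gen double dm_`v' = `v'
    local change = 1
    local iter = 0
    while `change' > 1e-10 & `iter' < 10000 {
        local ++iter
        quietly {
            gen double tmp_old = dm_`v'
            bysort i_pareja: egen double tmp_mean = mean(dm_`v')
            replace dm_`v' = dm_`v' - tmp_mean
            drop tmp_mean
            bysort dept_id: egen double tmp_mean = mean(dm_`v')
            replace dm_`v' = dm_`v' - tmp_mean
            gen double tmp_diff = abs(dm_`v' - tmp_old)
            summarize tmp_diff, meanonly
            local change = r(max)
            drop tmp_mean tmp_old tmp_diff
        }
    }
    display "`v': stopped after `iter' iterations"
}
regress dm_nbi dm_l_dist_laura
scalar b_iter = _b[dm_l_dist_laura]

display "dummies: " %9.6f scalar(b_dummies) "  one pass: " %9.6f scalar(b_onepass) "  iterated: " %9.6f scalar(b_iter)
```

To apply the idea to the real data, one would skip the simulation lines, first restrict to observations with no missing values in any variable being demeaned (nbi, l_dist_laura and the mediators), and run the loop in step (3) on each of them; the resulting dm_* variables could then go into regress or sureg equations. Point estimates should then agree with reghdfe, but standard errors can differ somewhat, because regress does not know how many fixed effects were absorbed when it computes degrees of freedom.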


Andrew Musau

Join Date: Oct 2014
Posts: 10260

30 Oct 2023, 08:14

Do you get the same coefficients if you restrict your sample to one observation per ID and year?

Code:

qui regress nbi l_dist_laura departamento* i_pareja ano
keep if e(sample)
collapse nbi l_dist_laura departamento*, by(i_pareja ano)
xtset i_pareja ano 

bys i_pareja: egen f_nbi=mean(nbi)
gen fe_nbi = nbi - f_nbi

bys i_pareja: egen f_l_dist_laura=mean(l_dist_laura)
gen fe_l_dist_laura = l_dist_laura - f_l_dist_laura

bys departamento*: egen fdept_nbi=mean(fe_nbi)
gen fe_dept_fe_nbi = fe_nbi - fdept_nbi

bys departamento*: egen fdept_distlaura=mean(fe_l_dist_laura)
gen fe_dept_ldistlaura = fe_l_dist_laura - fdept_distlaura


Diego Duarte

Join Date: Oct 2023

Posts: 4
#5

30 Oct 2023, 08:24

Thanks for your reply.

I'm afraid I can't restrict my sample to one observation per ID (i_pareja) and year. I'm estimating an OLS model on pairs of neighboring municipalities, so I need to compare treated municipalities with untreated ones; that is why they are paired: in each pair, one municipality is treated and the other is not. When I collapse, I eliminate the controls, so the variables are omitted from the estimation.
Andrew Musau

Join Date: Oct 2014

Posts: 10260
#6

30 Oct 2023, 09:12

You are checking why the results are not the same, so this is an exercise. I am not asking you to restrict the data in this way for analysis.
Diego Duarte

Join Date: Oct 2023

Posts: 4
#7

30 Oct 2023, 11:34

Well, in that case, after the collapse I don't get any coefficient: the variables are omitted.
Andrew Musau

Join Date: Oct 2014

Posts: 10260
#8

30 Oct 2023, 12:14

Can you share a sample of the dataset that replicates the problem?

Code:

sort i_pareja ano
dataex
George Ford

Join Date: Aug 2014

Posts: 3179
#9

30 Oct 2023, 15:15

Following Andrew's thoughts, maybe try

egen group = group(departamento* i_pareja)

then demean on group.
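Whether a combined id helps depends on how pairs and departments overlap, and that can be checked directly. If every pair lies inside one department, the combined id reproduces i_pareja, the department fixed effects are already absorbed by the pair fixed effects, and a single demeaning by pair is enough. If some pairs straddle a department border, the combined id splits each such pair into two one-municipality groups, so demeaning by it sets those observations to zero and estimates pair-by-department effects rather than additive pair and department effects. The sketch below shows the check on made-up toy data: three pairs, where pair 3 is assumed to straddle a border, and dept is a hypothetical categorical department variable from which the departamento* dummies are built.

```stata
* Toy data (made up): pair 3 straddles the border between departments 1 and 2
clear
input i_pareja dept
1 1
1 1
2 2
2 2
3 1
3 2
end
tabulate dept, generate(departamento)        // dummies, as in the thread
egen dept_id = group(departamento*)

* Flag pairs whose two municipalities are in different departments
bysort i_pareja (dept_id): gen byte crosses = dept_id[1] != dept_id[_N]
egen pair_tag = tag(i_pareja)
count if crosses & pair_tag                   // number of border-straddling pairs

* The suggested combined id, and the size of each resulting group
egen group = group(departamento* i_pareja)
bysort group: gen n_in_group = _N
list i_pareja dept group n_in_group, sepby(i_pareja) noobs
```

On real data, a nonzero count from the same check would suggest that the one-pass demeaning cannot reproduce reghdfe, and that the combined id estimates a different model.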