Computing Fixed Effects Manually

Jean Jacques

Join Date: Sep 2020
Posts: 97

16 Jul 2021, 08:36

Hi guys, I have a problem computing fixed effects manually: the results I get when I do it by hand and when I use xtreg with the fe option are similar, but they're not the same. I attach simulated data (ls = satisfaction with life, income, hs_work = hours of work). Sorry for the long piece of code; I don't know how to make it shorter.

Code:

clear
input int(wave iid) float(ls income hs_work)
1 112 8 1000 20
1 111 7 1100 25
2 111 .  800 30
2 112 4 2000 15
3 112 7 1246 20
3 111 3 4589 18
4 112 4 2500 24
4 111 4 3000 40
5 112 8 1798 48
5 111 7 3251 40
6 112 8 3425 36
6 111 5 2000 38
end

xtset iid wave


bysort iid: egen double m_ls = mean(ls)
bysort iid: egen double m_income = mean(income)
bysort iid: egen double m_hs_work = mean(hs_work)

gen double dm_ls = ls - m_ls
gen double dm_income = income - m_income
gen double dm_hs_work = hs_work - m_hs_work

drop m_ls m_income m_hs_work

bysort wave: egen double m_ls = mean(dm_ls)
bysort wave: egen double m_income = mean(dm_income)
bysort wave: egen double m_hs_work = mean(dm_hs_work)

replace dm_ls = dm_ls - m_ls
replace dm_income = dm_income - m_income
replace dm_hs_work = dm_hs_work - m_hs_work

reg dm_ls dm_income dm_hs_work

forvalues i = 1/6 {
    qui {
        drop m_ls m_income m_hs_work

        bysort iid: egen double m_ls = mean(dm_ls)
        bysort iid: egen double m_income = mean(dm_income)
        bysort iid: egen double m_hs_work = mean(dm_hs_work)

        replace dm_ls = dm_ls - m_ls
        replace dm_income = dm_income - m_income
        replace dm_hs_work = dm_hs_work - m_hs_work

        sum dm_ls
        drop m_ls m_income m_hs_work

        bysort wave: egen double m_ls = mean(dm_ls)
        bysort wave: egen double m_income = mean(dm_income)
        bysort wave: egen double m_hs_work = mean(dm_hs_work)

        replace dm_ls = dm_ls - m_ls
        replace dm_income = dm_income - m_income
        replace dm_hs_work = dm_hs_work - m_hs_work
    }
    reg dm_ls dm_income dm_hs_work
}

Output (summary)

Code:

reg dm_ls dm_income dm_hs_work

------------------------------------------------------------------------------
       dm_ls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   dm_income |  -.0000895   .0003102    -0.29   0.780    -.0008048    .0006258
  dm_hs_work |   .0651194   .0593232     1.10   0.304    -.0716802     .201919
       _cons |    .037346   .2231589     0.17   0.871    -.4772595    .5519514
------------------------------------------------------------------------------


xtreg ls income hs_work, fe

------------------------------------------------------------------------------
          ls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |  -.0007449   .0005059    -1.47   0.184    -.0019411    .0004513
     hs_work |   .0806055   .0485036     1.66   0.140    -.0340873    .1952984
       _cons |   5.289379   1.874318     2.82   0.026     .8573201    9.721438
-------------+----------------------------------------------------------------

Thanks a lot for the help!
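Editorial note: one thing worth checking in the comparison above is that the manual code removes wave means as well as individual means, alternating between the two. That is the usual way of approximating a two-way (individual and wave) within transformation, whereas the xtreg call shown removes individual effects only. A minimal sketch of the two benchmarks, assuming the toy data from this post is in memory and already xtset:

```stata
* One-way benchmark: individual (iid) effects only, as in the output above
xtreg ls income hs_work, fe

* Two-way benchmark: individual and wave effects, which is closer to what
* alternating iid/wave demeaning targets
xtreg ls income hs_work i.wave, fe
```

Even against the two-way benchmark, the manual version can only match if every mean is computed over the observations that actually enter the regression.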

Tags: fixed effects, manually, panel data

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17704

16 Jul 2021, 08:54

Jean:
if you mean the coefficients of -xtreg,fe- and -regress-, you can easily obtain the same point estimates for the shared coefficients, but different standard errors and related statistics (please note that in the following toy example clustered standard errors are not at their best, due to the limited sample size):

Code:

. xtset iid wave
       panel variable:  iid (strongly balanced)
        time variable:  wave, 1 to 6
                delta:  1 unit

. xtreg ls income hs_work, fe vce(cluster iid)

Fixed-effects (within) regression               Number of obs     =         11
Group variable: iid                             Number of groups  =          2

R-sq:                                           Obs per group:
     within  = 0.3996                                         min =          5
     between = 1.0000                                         avg =        5.5
     overall = 0.3834                                         max =          6

                                                F(1,1)            =          .
corr(u_i, Xb)  = 0.0848                         Prob > F          =          .

                                    (Std. Err. adjusted for 2 clusters in iid)
------------------------------------------------------------------------------
             |               Robust
          ls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |  -.0007449   .0000149   -50.09   0.013    -.0009338   -.0005559
     hs_work |   .0806055   .0391247     2.06   0.288    -.4165215    .5777326
       _cons |   5.289379   1.117377     4.73   0.133    -8.908239      19.487
-------------+----------------------------------------------------------------
     sigma_u |  .78834795
     sigma_e |  1.6644414
         rho |  .18323072   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. reg ls income hs_work i.iid, vce(cluster iid)

Linear regression                               Number of obs     =         11
                                                F(0, 1)           =          .
                                                Prob > F          =          .
                                                R-squared         =     0.4746
                                                Root MSE          =     1.6644

                                    (Std. Err. adjusted for 2 clusters in iid)
------------------------------------------------------------------------------
             |               Robust
          ls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |  -.0007449   .0000159   -46.86   0.014    -.0009469   -.0005429
     hs_work |   .0806055   .0418261     1.93   0.305    -.4508456    .6120567
     112.iid |   1.114892   .1979158     5.63   0.112    -1.399866    3.629651
       _cons |   4.681256    1.30248     3.59   0.173    -11.86832    21.23083
------------------------------------------------------------------------------

.

Kind regards,
Carlo
(Stata 19.0)


Jean Jacques

Join Date: Sep 2020

Posts: 97
#3

16 Jul 2021, 09:23

Ciao Carlo, thanks a lot for your answer! Sorry, maybe I'm a bit lost, but is doing a fixed effects regression the same as adding i.iid as an extra regressor? So the demeaning process that I described above is useless? I'm lost, sorry.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17704
#4

16 Jul 2021, 12:15

Jean:
to obtain the fixed effect (after -xtreg- only), you should type:

Code:

predict fe,u

This is not feasible after -regress-.
In my previous example I obtained the same point estimates for the coefficients shared by -xtreg,fe- and -regress-.

Kind regards,
Carlo
(Stata 19.0)
Jean Jacques

Join Date: Sep 2020

Posts: 97
#5

16 Jul 2021, 13:24

Thanks Carlo, probably I wasn't clear!

I mean, are the betas that I obtain doing

Code:

reg depvar indvar

(which I typically use for OLS) the same as those I obtain doing

Code:

xtreg depvar indvar, fe

if I add i.iid to the regression?

Code:

reg depvar indvar i.iid

The point is that I need to run a fixed effects regression but, given the circumstances of the specific setting in which I'm working, I cannot use the command

Code:

xtreg depvar indvar, fe

so I'm trying to compute the estimates manually. As I have a panel with approximately 180,000 observations, when I run

Code:

reg depvar indvar i.iid

it takes several hours, which is why I'm looking for an alternative. Any suggestions?

Joro Kolev

Join Date: Aug 2018
Posts: 3050

16 Jul 2021, 13:45

What OP is asking is a bit of a mystery. I will take the interpretation that he wonders why the manual demeaning is not giving him the fixed effect estimator.

The reason is that he has a missing value in one of the variables, so the group means must be computed over the estimation sample only, like this:

Code:

. xtreg ls income hs_work, fe vce(cluster iid)

Fixed-effects (within) regression               Number of obs     =         11
Group variable: iid                             Number of groups  =          2

R-sq:                                           Obs per group:
     within  = 0.3996                                         min =          5
     between = 1.0000                                         avg =        5.5
     overall = 0.3834                                         max =          6

                                                F(1,1)            =          .
corr(u_i, Xb)  = 0.0848                         Prob > F          =          .

                                    (Std. Err. adjusted for 2 clusters in iid)
------------------------------------------------------------------------------
             |               Robust
          ls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |  -.0007449   .0000149   -50.09   0.013    -.0009338   -.0005559
     hs_work |   .0806055   .0391247     2.06   0.288    -.4165215    .5777326
       _cons |   5.289379   1.117377     4.73   0.133    -8.908239      19.487
-------------+----------------------------------------------------------------
     sigma_u |  .78834795
     sigma_e |  1.6644414
         rho |  .18323072   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. 
. foreach var of varlist ls income hs_work {
  2. egen `var'mean = mean(`var') if e(sample), by(iid) 
  3. gen `var'de = `var' - `var'mean if e(sample)
  4. }
(1 missing value generated)
(1 missing value generated)
(1 missing value generated)
(1 missing value generated)
(1 missing value generated)
(1 missing value generated)

. 
. reg lsde incomede hs_workde, nocons

      Source |       SS           df       MS      Number of obs   =        11
-------------+----------------------------------   F(2, 9)         =      3.00
       Model |  12.9074446         2  6.45372228   Prob > F        =    0.1007
    Residual |  19.3925554         9  2.15472838   R-squared       =    0.3996
-------------+----------------------------------   Adj R-squared   =    0.2662
       Total |        32.3        11  2.93636364   Root MSE        =    1.4679

------------------------------------------------------------------------------
        lsde |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    incomede |  -.0007449   .0004461    -1.67   0.129    -.0017541    .0002643
   hs_workde |   .0806055   .0427762     1.88   0.092    -.0161609     .177372
------------------------------------------------------------------------------

. 
. reg ls income hs_work i.iid

      Source |       SS           df       MS      Number of obs   =        11
-------------+----------------------------------   F(3, 7)         =      2.11
       Model |  17.5165355         3  5.83884516   Prob > F        =    0.1877
    Residual |  19.3925554         7  2.77036506   R-squared       =    0.4746
-------------+----------------------------------   Adj R-squared   =    0.2494
       Total |  36.9090909        10  3.69090909   Root MSE        =    1.6644

------------------------------------------------------------------------------
          ls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |  -.0007449   .0005059    -1.47   0.184    -.0019411    .0004513
     hs_work |   .0806055   .0485036     1.66   0.140    -.0340873    .1952984
     112.iid |   1.114892   1.106759     1.01   0.347    -1.502176    3.731961
       _cons |   4.681256   2.173545     2.15   0.068    -.4583603    9.820872
------------------------------------------------------------------------------

.

and now we observe that all three estimators give numerically the same point estimates.
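The point estimates agree, but the standard errors from the demeaned regression (.0004461 and .0427762) are smaller than those from the dummy-variable regression (.0005059 and .0485036). The demeaned regression reports 9 residual degrees of freedom, while the dummy-variable regression reports 7, because it also accounts for the two panel intercepts. A quick check of that rescaling, using only numbers from the output above:

```stata
* Residual df: 9 in the demeaned regression, 7 with the iid dummies
* (11 observations, 2 slopes, 2 panel intercepts)
display .0004461 * sqrt(9/7)
display .0427762 * sqrt(9/7)
```

The two results should come out close to .0005059 and .0485036, up to rounding in the displayed standard errors.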


Jean Jacques

Join Date: Sep 2020

Posts: 97
#7

16 Jul 2021, 14:12

Ah great, yes, I think that's what I want. When I run the loop above, the variables it creates (incomemean incomede hs_workmean hs_workde) are 0. I can't find a way of solving it; I guess it's related to the "if e(sample)".
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17704
#8

17 Jul 2021, 01:58

Jean:
thanks for clarifying.
While I read your question in #3 as showing interest in retrieving the -u- panel-wise residual, Joro's helpful reply was on target.
That said, while with only 2 clusters (and 11 observations) it is not helpful to use clustered standard errors, if you decide to go pooled OLS (by the way, something that I find hard to prefer to -xtreg-), you should impose non-default standard errors given the non-independence of the observations belonging to the same panel.
In addition, there might be good reasons for going -cluster- (or -robust-, as both options do the very same job here) with -xtreg-, too (serial autocorrelation and/or heteroskedasticity).

Kind regards,
Carlo
(Stata 19.0)
Jean Jacques

Join Date: Sep 2020

Posts: 97
#9

17 Jul 2021, 04:53

Hi Carlo, indeed, in trying to keep the question short (it was already long) I probably wasn't clear enough. What I'm actually trying to do is run a finite mixture model with fixed effects. As the fmm command in Stata doesn't support fixed effects estimation, I'm trying to do the transformation on my own so that I can then apply the fmm command, which I think is available from Stata 15 onwards. Of course I will cluster the standard errors; I just omitted that in my code for simplicity.

Joro's answer was indeed quite helpful. I was able to implement it by imposing "if e(sample) !=." instead of "if e(sample)" (if I just use the latter, I don't have any observations).
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2150
#10

17 Jul 2021, 06:27

There are several equivalent ways to obtain the FE estimates, but, as Joro pointed out, you must be careful with missing data. Using either the Mundlak approach or the within approach (as Jean did), one must restrict attention to the complete cases.

For what Jean wants to do -- that is, use a finite mixture model -- I think the Mundlak approach is theoretically more justified. You include the time averages as additional control variables, but those time averages are obtained using only the complete cases. You can use e(sample) or define a complete cases indicator ahead of time.
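A minimal sketch of the Mundlak approach described here, assuming the toy data from #1 is in memory. The names touse, income_bar and hs_work_bar are made up for this illustration, and the complete-case indicator is built directly from the variables rather than from e(sample).

```stata
* Complete-case indicator defined ahead of time, with no reliance on e(sample)
gen byte touse = !missing(ls, income, hs_work)

* Time averages of the regressors by individual, over complete cases only
foreach var of varlist income hs_work {
    egen double `var'_bar = mean(`var') if touse, by(iid)
}

* Pooled OLS with the time averages as additional controls
regress ls income hs_work income_bar hs_work_bar if touse, vce(cluster iid)
```

With only two individuals in the toy data, the two averages and the constant are collinear, so Stata will omit one of the averages. The coefficients on income and hs_work should still match the fixed-effects estimates shown earlier in the thread.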
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#11

17 Jul 2021, 06:53

You are not doing anything by using "if e(sample) !=." instead of "if e(sample)".

The e(sample) function is either 0, if the observation is not included in the estimation sample, or 1 if the observation is included in the estimation sample.

Therefore the statement that I used
if e(sample)
is equivalent to the statement
if e(sample)==1
or to
if e(sample) !=0

The statement that you are using
if e(sample) !=.
is not doing anything, or is equivalent to not including any "if" statement because e(sample) is never missing.


Joro Kolev

Join Date: Aug 2018
Posts: 3050

#12

17 Jul 2021, 06:57

What is happening is that the function is not defined at the moment you are calling it, as the following example suggests:

Code:

. sysuse auto, clear
(1978 Automobile Data)

. summ mpg if e(sample)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         mpg |          0

. summ mpg if e(sample)!=.

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         mpg |         74     21.2973    5.785503         12         41

The example above is surprising to me as well: I would have thought that when e(sample) is not defined it would evaluate to missing. But it does not; it evaluates to 0, as shown here:

Code:

. gen e = e(sample)

. summ e

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           e |         74           0           0          0          0
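A short sketch of the behavior described above, again with the auto data: before any estimation command has run, e(sample) is 0 for every observation; afterwards it flags the estimation sample and can be stored in a variable for later use. The variable name insample is arbitrary.

```stata
clear all
sysuse auto

* No estimation results yet: e(sample) is 0 everywhere, so nothing is selected
count if e(sample)

* After an estimation command, e(sample) marks the observations actually used
* (rep78 has missing values, so some observations drop out)
quietly regress mpg weight rep78
count if e(sample)

* Keep the indicator, since the next estimation command overwrites e(sample)
gen byte insample = e(sample)
```

Running the estimation command first, and only then using "if e(sample)" or a stored copy of it, avoids the empty-sample problem reported in #9.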

Last edited by Joro Kolev; 17 Jul 2021, 07:03.
