  • Computing Fixed Effects Manually

    Hi guys, I have a problem computing fixed effects manually: the results I get when I do it manually and when I use -xtreg, fe- are similar, but not the same. I attach simulated data (ls = satisfaction with life, income, hs_work = hours of work). Sorry for the long piece of code; I don't know how to make it shorter.

    Code:
    clear
    input int(wave iid) float(ls income hs_work)
    1 112 8 1000 20
    1 111 7 1100 25
    2 111 .  800 30
    2 112 4 2000 15
    3 112 7 1246 20
    3 111 3 4589 18
    4 112 4 2500 24
    4 111 4 3000 40
    5 112 8 1798 48
    5 111 7 3251 40
    6 112 8 3425 36
    6 111 5 2000 38
    end
    
    xtset iid wave
    
    * subtract individual (iid) means
    bysort iid: egen double m_ls = mean(ls)
    bysort iid: egen double m_income = mean(income)
    bysort iid: egen double m_hs_work = mean(hs_work)
    
    gen double dm_ls = ls - m_ls
    gen double dm_income = income - m_income
    gen double dm_hs_work = hs_work - m_hs_work
    
    drop m_ls m_income m_hs_work
    
    * subtract wave means from the iid-demeaned variables
    bysort wave: egen double m_ls = mean(dm_ls)
    bysort wave: egen double m_income = mean(dm_income)
    bysort wave: egen double m_hs_work = mean(dm_hs_work)
    
    replace dm_ls = dm_ls - m_ls
    replace dm_income = dm_income - m_income
    replace dm_hs_work = dm_hs_work - m_hs_work
    
    reg dm_ls dm_income dm_hs_work
    
    * repeat the iid and wave demeaning six more times, re-estimating after each pass
    forvalues i = 1/6 {
        qui {
            drop m_ls m_income m_hs_work
    
            bysort iid: egen double m_ls = mean(dm_ls)
            bysort iid: egen double m_income = mean(dm_income)
            bysort iid: egen double m_hs_work = mean(dm_hs_work)
    
            replace dm_ls = dm_ls - m_ls
            replace dm_income = dm_income - m_income
            replace dm_hs_work = dm_hs_work - m_hs_work
    
            sum dm_ls
            drop m_ls m_income m_hs_work
    
            bysort wave: egen double m_ls = mean(dm_ls)
            bysort wave: egen double m_income = mean(dm_income)
            bysort wave: egen double m_hs_work = mean(dm_hs_work)
    
            replace dm_ls = dm_ls - m_ls
            replace dm_income = dm_income - m_income
            replace dm_hs_work = dm_hs_work - m_hs_work
        }
        reg dm_ls dm_income dm_hs_work
    }
    Output (summary)

    Code:
    reg dm_ls dm_income dm_hs_work
    
    ------------------------------------------------------------------------------
           dm_ls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       dm_income |  -.0000895   .0003102    -0.29   0.780    -.0008048    .0006258
      dm_hs_work |   .0651194   .0593232     1.10   0.304    -.0716802     .201919
           _cons |    .037346   .2231589     0.17   0.871    -.4772595    .5519514
    ------------------------------------------------------------------------------
    
    
    xtreg ls income hs_work, fe
    
    ------------------------------------------------------------------------------
              ls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          income |  -.0007449   .0005059    -1.47   0.184    -.0019411    .0004513
         hs_work |   .0806055   .0485036     1.66   0.140    -.0340873    .1952984
           _cons |   5.289379   1.874318     2.82   0.026     .8573201    9.721438
    -------------+----------------------------------------------------------------
    Thanks a lot for the help!
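Two things separate the manual routine above from `xtreg ls income hs_work, fe`. One is the missing ls in wave 2 of iid 111 (taken up later in the thread). The other is that the code also sweeps out wave means, i.e. it performs a two-way within transformation, while that -xtreg- call removes only the iid effects. A minimal sketch of a like-for-like check, assuming the intended two-way target is `xtreg ls income hs_work i.wave, fe`: keep complete cases only, alternate the iid and wave demeaning until the means being removed are negligible, then compare slopes. The names touse, b_inc, b_hs and the 1e-10 tolerance are illustrative choices, not part of the original code.

```stata
* toy data from the first post
clear
input int(wave iid) float(ls income hs_work)
1 112 8 1000 20
1 111 7 1100 25
2 111 .  800 30
2 112 4 2000 15
3 112 7 1246 20
3 111 3 4589 18
4 112 4 2500 24
4 111 4 3000 40
5 112 8 1798 48
5 111 7 3251 40
6 112 8 3425 36
6 111 5 2000 38
end
xtset iid wave

* reference: iid effects absorbed by -xtreg, fe-, wave effects as dummies
quietly xtreg ls income hs_work i.wave, fe
scalar b_inc = _b[income]
scalar b_hs  = _b[hs_work]

* complete cases only (wave 2 of iid 111 has ls missing)
mark touse
markout touse ls income hs_work
foreach v in ls income hs_work {
    generate double dm_`v' = `v' if touse
}

* alternate iid and wave demeaning until the removed means are ~0
local maxdiff = 1
local iter = 0
while `maxdiff' > 1e-10 & `iter' < 1000 {
    local maxdiff = 0
    foreach g in iid wave {
        foreach v in ls income hs_work {
            tempvar m
            quietly bysort `g': egen double `m' = mean(dm_`v')
            quietly replace dm_`v' = dm_`v' - `m' if touse
            quietly summarize `m' if touse
            local maxdiff = max(`maxdiff', abs(r(min)), abs(r(max)))
            drop `m'
        }
    }
    local ++iter
}

* slopes from the two-way demeaned data should match the two-way FE model
regress dm_ls dm_income dm_hs_work if touse, noconstant
assert reldif(_b[dm_income], b_inc) < 1e-6
assert reldif(_b[dm_hs_work], b_hs) < 1e-6
```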

  • #2
    Jean:
    if you mean the coefficients of -xtreg, fe- and -regress-, you can easily obtain the same point estimates for the shared coefficients, but different standard errors and related statistics (please note that in the following toy example clustered standard errors are not at their best, due to the limited sample size):
    Code:
    . xtset iid wave
           panel variable:  iid (strongly balanced)
            time variable:  wave, 1 to 6
                    delta:  1 unit
    
    . xtreg ls income hs_work, fe vce(cluster iid)
    
    Fixed-effects (within) regression               Number of obs     =         11
    Group variable: iid                             Number of groups  =          2
    
    R-sq:                                           Obs per group:
         within  = 0.3996                                         min =          5
         between = 1.0000                                         avg =        5.5
         overall = 0.3834                                         max =          6
    
                                                    F(1,1)            =          .
    corr(u_i, Xb)  = 0.0848                         Prob > F          =          .
    
                                        (Std. Err. adjusted for 2 clusters in iid)
    ------------------------------------------------------------------------------
                 |               Robust
              ls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          income |  -.0007449   .0000149   -50.09   0.013    -.0009338   -.0005559
         hs_work |   .0806055   .0391247     2.06   0.288    -.4165215    .5777326
           _cons |   5.289379   1.117377     4.73   0.133    -8.908239      19.487
    -------------+----------------------------------------------------------------
         sigma_u |  .78834795
         sigma_e |  1.6644414
             rho |  .18323072   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . reg ls income hs_work i.iid, vce(cluster iid)
    
    Linear regression                               Number of obs     =         11
                                                    F(0, 1)           =          .
                                                    Prob > F          =          .
                                                    R-squared         =     0.4746
                                                    Root MSE          =     1.6644
    
                                        (Std. Err. adjusted for 2 clusters in iid)
    ------------------------------------------------------------------------------
                 |               Robust
              ls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          income |  -.0007449   .0000159   -46.86   0.014    -.0009469   -.0005429
         hs_work |   .0806055   .0418261     1.93   0.305    -.4508456    .6120567
         112.iid |   1.114892   .1979158     5.63   0.112    -1.399866    3.629651
           _cons |   4.681256    1.30248     3.59   0.173    -11.86832    21.23083
    ------------------------------------------------------------------------------
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)
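A short check of the point above, assuming the toy data from the first post: the slopes from -xtreg, fe- and from -regress- with i.iid dummies coincide, and here the clustered standard errors differ only by the finite-sample adjustment. In the output above, .0418261/.0391247 is about 1.069 = sqrt(8/7), i.e. sqrt((N-3)/(N-4)) with N = 11, consistent with -regress- counting the 112.iid dummy among its parameters while -xtreg, fe- does not. The scalar names are illustrative.

```stata
* toy data from the first post
clear
input int(wave iid) float(ls income hs_work)
1 112 8 1000 20
1 111 7 1100 25
2 111 .  800 30
2 112 4 2000 15
3 112 7 1246 20
3 111 3 4589 18
4 112 4 2500 24
4 111 4 3000 40
5 112 8 1798 48
5 111 7 3251 40
6 112 8 3425 36
6 111 5 2000 38
end
xtset iid wave

quietly xtreg ls income hs_work, fe vce(cluster iid)
scalar b_fe  = _b[hs_work]
scalar se_fe = _se[hs_work]

quietly regress ls income hs_work i.iid, vce(cluster iid)
* identical slope estimate (dummies vs. within transformation)
assert reldif(_b[hs_work], b_fe) < 1e-8

* clustered SEs differ by sqrt((N-3)/(N-4)): k = 3 for xtreg, k = 4 for regress
scalar ratio = _se[hs_work] / se_fe
display "SE ratio = " ratio "   sqrt((N-3)/(N-4)) = " sqrt((e(N)-3)/(e(N)-4))
assert reldif(ratio, sqrt((e(N)-3)/(e(N)-4))) < 1e-6
```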



    • #3
      Ciao Carlo, thanks a lot for your answer! Sorry, maybe I'm a bit lost, but is doing a fixed-effects regression equivalent to adding i.iid as an extra regressor? So is the demeaning process I described above useless? I'm lost, sorry.



      • #4
        Jean:
        to obtain the fixed effect (after -xtreg- only), you should type:
        Code:
        predict fe,u
        This is not feasible after -regress-.
        In my previous example I obtained the same estimates for the shared coefficients of -xtreg, fe- and -regress-.
        Kind regards,
        Carlo
        (Stata 19.0)
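A sketch of what -predict, u- returns after -xtreg, fe-, assuming (from the -xtreg- methods and formulas as I read them) that u_i is the panel mean of ls - xb - _cons over the estimation sample. Although -predict, u- is not available after -regress-, the dummy coefficient carries the same information up to a constant: in the outputs above, u_112 - u_111 = 0.506769 - (-0.608123) = 1.114892, the coefficient on 112.iid. Variable and scalar names are illustrative.

```stata
* toy data from the first post
clear
input int(wave iid) float(ls income hs_work)
1 112 8 1000 20
1 111 7 1100 25
2 111 .  800 30
2 112 4 2000 15
3 112 7 1246 20
3 111 3 4589 18
4 112 4 2500 24
4 111 4 3000 40
5 112 8 1798 48
5 111 7 3251 40
6 112 8 3425 36
6 111 5 2000 38
end
xtset iid wave

quietly xtreg ls income hs_work, fe
predict double u_fe if e(sample), u
generate byte touse = e(sample)

* u_i = mean over panel i's estimation sample of (ls - xb - _cons)
generate double r_fe = ls - _b[income]*income - _b[hs_work]*hs_work - _b[_cons] if touse
bysort iid: egen double u_manual = mean(r_fe)
assert reldif(u_fe, u_manual) < 1e-6 if touse

* after -regress- with dummies, 112.iid is the gap between the two fixed effects
quietly regress ls income hs_work i.iid
quietly summarize u_fe if iid == 111 & touse
scalar u111 = r(mean)
quietly summarize u_fe if iid == 112 & touse
display "u_112 - u_111 = " r(mean) - scalar(u111) "   _b[112.iid] = " _b[112.iid]
assert reldif(r(mean) - scalar(u111), _b[112.iid]) < 1e-6
```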



        • #5
          Thanks Carlo, probably I wasn't clear!

          I mean: are the betas that I obtain with
          Code:
          xtreg depvar indvar, fe
          the same as those I obtain with the command I typically use for OLS,
          Code:
          reg depvar indvar
          once I add i.iid to the regression, i.e.
          Code:
          reg depvar indvar i.iid
          The point is that I need to run a fixed-effects regression, but given the specific setting I'm working in, I cannot use the command
          Code:
          xtreg depvar indvar, fe
          so I'm trying to compute the estimates manually. Since my panel has approximately 180,000 observations, running
          Code:
          reg depvar indvar i.iid
          takes several hours; that's why I'm looking for an alternative. Any suggestions?
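For the speed problem, one option is to let Stata absorb the panel effects instead of estimating one dummy coefficient per panel. A minimal sketch with the built-in -areg-, assuming a single set of effects (iid) to absorb; on the toy data it should reproduce the LSDV slope and its default standard error. Whether -areg- fits the constraints of your particular setting is an assumption here; the manual within transformation discussed in this thread is the other route.

```stata
* toy data from the first post
clear
input int(wave iid) float(ls income hs_work)
1 112 8 1000 20
1 111 7 1100 25
2 111 .  800 30
2 112 4 2000 15
3 112 7 1246 20
3 111 3 4589 18
4 112 4 2500 24
4 111 4 3000 40
5 112 8 1798 48
5 111 7 3251 40
6 112 8 3425 36
6 111 5 2000 38
end

* LSDV: one dummy per panel
quietly regress ls income hs_work i.iid
scalar b_lsdv  = _b[income]
scalar se_lsdv = _se[income]

* absorb iid instead of estimating the dummies
areg ls income hs_work, absorb(iid)
assert reldif(_b[income], b_lsdv) < 1e-8
assert reldif(_se[income], se_lsdv) < 1e-6
```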



          • #6
            What the OP is asking is a bit of a mystery. I will take the interpretation that he wonders why the manual demeaning does not give him the fixed-effects estimator.

            The reason is that one of the variables has a missing value, so the panel means must be computed on the estimation sample only, like this:

            Code:
            . xtreg ls income hs_work, fe vce(cluster iid)
            
            Fixed-effects (within) regression               Number of obs     =         11
            Group variable: iid                             Number of groups  =          2
            
            R-sq:                                           Obs per group:
                 within  = 0.3996                                         min =          5
                 between = 1.0000                                         avg =        5.5
                 overall = 0.3834                                         max =          6
            
                                                            F(1,1)            =          .
            corr(u_i, Xb)  = 0.0848                         Prob > F          =          .
            
                                                (Std. Err. adjusted for 2 clusters in iid)
            ------------------------------------------------------------------------------
                         |               Robust
                      ls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                  income |  -.0007449   .0000149   -50.09   0.013    -.0009338   -.0005559
                 hs_work |   .0806055   .0391247     2.06   0.288    -.4165215    .5777326
                   _cons |   5.289379   1.117377     4.73   0.133    -8.908239      19.487
            -------------+----------------------------------------------------------------
                 sigma_u |  .78834795
                 sigma_e |  1.6644414
                     rho |  .18323072   (fraction of variance due to u_i)
            ------------------------------------------------------------------------------
            
            . 
            . foreach var of varlist ls income hs_work {
              2. egen `var'mean = mean(`var') if e(sample), by(iid) 
              3. gen `var'de = `var' - `var'mean if e(sample)
              4. }
            (1 missing value generated)
            (1 missing value generated)
            (1 missing value generated)
            (1 missing value generated)
            (1 missing value generated)
            (1 missing value generated)
            
            . 
            . reg lsde incomede hs_workde, nocons
            
                  Source |       SS           df       MS      Number of obs   =        11
            -------------+----------------------------------   F(2, 9)         =      3.00
                   Model |  12.9074446         2  6.45372228   Prob > F        =    0.1007
                Residual |  19.3925554         9  2.15472838   R-squared       =    0.3996
            -------------+----------------------------------   Adj R-squared   =    0.2662
                   Total |        32.3        11  2.93636364   Root MSE        =    1.4679
            
            ------------------------------------------------------------------------------
                    lsde |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                incomede |  -.0007449   .0004461    -1.67   0.129    -.0017541    .0002643
               hs_workde |   .0806055   .0427762     1.88   0.092    -.0161609     .177372
            ------------------------------------------------------------------------------
            
            . 
            . reg ls income hs_work i.iid
            
                  Source |       SS           df       MS      Number of obs   =        11
            -------------+----------------------------------   F(3, 7)         =      2.11
                   Model |  17.5165355         3  5.83884516   Prob > F        =    0.1877
                Residual |  19.3925554         7  2.77036506   R-squared       =    0.4746
            -------------+----------------------------------   Adj R-squared   =    0.2494
                   Total |  36.9090909        10  3.69090909   Root MSE        =    1.6644
            
            ------------------------------------------------------------------------------
                      ls |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                  income |  -.0007449   .0005059    -1.47   0.184    -.0019411    .0004513
                 hs_work |   .0806055   .0485036     1.66   0.140    -.0340873    .1952984
                 112.iid |   1.114892   1.106759     1.01   0.347    -1.502176    3.731961
                   _cons |   4.681256   2.173545     2.15   0.068    -.4583603    9.820872
            ------------------------------------------------------------------------------
            
            .
            and now we observe that all three estimators give numerically identical coefficients.
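The standard errors still differ: the demeaned regression reports .0004461 for incomede, -xtreg, fe- reports .0005059. Both fits have the same residual sum of squares (19.3925554), but -regress- does not know that two panel means were estimated, so it uses 9 instead of 7 residual degrees of freedom; rescaling by sqrt(9/7), about 1.134, recovers the FE standard error. A minimal sketch of that correction on the toy data, taking G (the number of panels) from -tabulate-; the scalar names are illustrative.

```stata
* toy data from the first post
clear
input int(wave iid) float(ls income hs_work)
1 112 8 1000 20
1 111 7 1100 25
2 111 .  800 30
2 112 4 2000 15
3 112 7 1246 20
3 111 3 4589 18
4 112 4 2500 24
4 111 4 3000 40
5 112 8 1798 48
5 111 7 3251 40
6 112 8 3425 36
6 111 5 2000 38
end
xtset iid wave

quietly xtreg ls income hs_work, fe
scalar se_fe = _se[income]

* within transformation on the estimation sample, as above
foreach var of varlist ls income hs_work {
    quietly egen double `var'mean = mean(`var') if e(sample), by(iid)
    quietly generate double `var'de = `var' - `var'mean if e(sample)
}
quietly regress lsde incomede hs_workde, noconstant

* regress uses df_r = N - 2, but G panel means were also estimated
quietly tabulate iid if e(sample)
scalar G = r(r)
scalar se_adj = _se[incomede] * sqrt(e(df_r) / (e(df_r) - scalar(G)))
display "corrected SE = " se_adj "   xtreg, fe SE = " se_fe
assert reldif(se_adj, se_fe) < 1e-6
```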



            • #7
              Ah great, yes, I think that's what I want. When I run the loop above, the variables it creates (incomemean incomede hs_workmean hs_workde) are 0. I can't find a way to solve it; I guess it's related to the "if e(sample)".

              Comment


              • #8
                Jean:
                thanks for clarifying.
                I had read your question in #3 as an interest in retrieving -u-, the panel-wise residual, but Joro's helpful reply was on target.
                That said, while with only 2 clusters it is not helpful to use clustered standard errors, if you decide to go with pooled OLS (which, by the way, I find hard to prefer to -xtreg-), you should use non-default standard errors, given that observations belonging to the same panel are not independent.
                In addition, there might be good reasons for using -cluster- (or -robust-, as both options do the very same job here) with -xtreg-, too (serial autocorrelation and/or heteroskedasticity).
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Hi Carlo, in trying to keep the question short (it was already long), I probably wasn't clear enough. What I'm actually trying to do is run a finite mixture model with fixed effects. As Stata's fmm command (available, I think, from Stata 15 onwards) doesn't support fixed-effects estimation, I'm trying to do the transformation on my own so that I can then apply fmm. Of course I will cluster the standard errors; I just omitted that in my code for simplicity.

                  Regarding Joro's answer, it was indeed quite helpful. I was able to implement it by imposing "if e(sample) !=." instead of "if e(sample)" (if I use just the latter, I get no observations).



                  • #10
                    There are several equivalent ways to obtain the FE estimates, but, as Joro pointed out, you must be careful with missing data. Using either the Mundlak approach or the within approach (as Jean did), one must restrict attention to the complete cases.

                    For what Jean wants to do -- that is, use a finite mixture model -- I think the Mundlak approach is theoretically more justified. You include the time averages as additional control variables, but those time averages are obtained using only the complete cases. You can use e(sample) or define a complete-cases indicator ahead of time.
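A minimal sketch of the Mundlak route on the toy data: define a complete-cases indicator ahead of time with -mark-/-markout-, compute the panel means of the regressors on those cases only, and add them to a pooled regression; the slopes on income and hs_work then equal the FE estimates. Variable names are illustrative, and the -fmm- line is only an assumed next step, left commented out and not tested here.

```stata
* toy data from the first post
clear
input int(wave iid) float(ls income hs_work)
1 112 8 1000 20
1 111 7 1100 25
2 111 .  800 30
2 112 4 2000 15
3 112 7 1246 20
3 111 3 4589 18
4 112 4 2500 24
4 111 4 3000 40
5 112 8 1798 48
5 111 7 3251 40
6 112 8 3425 36
6 111 5 2000 38
end
xtset iid wave

* complete-cases indicator defined ahead of time
mark touse
markout touse ls income hs_work

* panel means of the regressors, computed on complete cases only
foreach v in income hs_work {
    bysort iid: egen double mean_`v' = mean(cond(touse, `v', .))
}

quietly xtreg ls income hs_work if touse, fe
scalar b_inc = _b[income]
scalar b_hs  = _b[hs_work]

* Mundlak: pooled OLS with the panel means as extra controls.
* With only two panels the two means are collinear with the constant,
* so -regress- omits one of them; the slopes are unaffected.
regress ls income hs_work mean_income mean_hs_work if touse, vce(cluster iid)
assert reldif(_b[income], b_inc) < 1e-6
assert reldif(_b[hs_work], b_hs) < 1e-6

* Assumed next step (not run: 11 observations cannot support a mixture):
*   fmm 2, vce(cluster iid): regress ls income hs_work mean_income mean_hs_work
```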



                    • #11
                      You are not doing anything by using "if e(sample) !=." instead of "if e(sample)".

                      The e(sample) function is either 0, if the observation is not included in the estimation sample, or 1 if the observation is included in the estimation sample.

                      Therefore the statement that I used
                      if e(sample)
                      is equivalent to the statement
                      if e(sample)==1
                      or to
                      if e(sample) !=0

                      The statement that you are using
                      if e(sample) !=.
                      does nothing: it is equivalent to not including any "if" qualifier at all, because e(sample) is never missing.



                      Originally posted by Jean Jacques
                      Hi Carlo, in trying to keep the question short (it was already long), I probably wasn't clear enough. What I'm actually trying to do is run a finite mixture model with fixed effects. As Stata's fmm command (available, I think, from Stata 15 onwards) doesn't support fixed-effects estimation, I'm trying to do the transformation on my own so that I can then apply fmm. Of course I will cluster the standard errors; I just omitted that in my code for simplicity.

                      Regarding Joro's answer, it was indeed quite helpful. I was able to implement it by imposing "if e(sample) !=." instead of "if e(sample)" (if I use just the latter, I get no observations).



                      • #12
                        What is happening is that e(sample) is not defined at the moment you are calling it (there are no estimation results in memory), as the following example suggests:

                        Code:
                        . sysuse auto, clear
                        (1978 Automobile Data)
                        
                        . summ mpg if e(sample)
                        
                            Variable |        Obs        Mean    Std. Dev.       Min        Max
                        -------------+---------------------------------------------------------
                                 mpg |          0
                        
                        . summ mpg if e(sample)!=.
                        
                            Variable |        Obs        Mean    Std. Dev.       Min        Max
                        -------------+---------------------------------------------------------
                                 mpg |         74     21.2973    5.785503         12         41
                        The example above is surprising to me as well: I would have thought that when e(sample) is not defined it would evaluate to missing. But it does not; it evaluates to 0. Here:

                        Code:
                        . gen e = e(sample)
                        
                        . summ e
                        
                            Variable |        Obs        Mean    Std. Dev.       Min        Max
                        -------------+---------------------------------------------------------
                                   e |         74           0           0          0          0
                        Last edited by Joro Kolev; 17 Jul 2021, 07:03.
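Following that diagnosis, a minimal sketch of a safer pattern, using the auto data as in the example above: store e(sample) in a variable immediately after the estimation command and use that variable, not e(sample), in later -if- qualifiers. rep78 has 5 missing values, so the estimation sample is 69 of the 74 cars; -ereturn clear- stands in for anything that later removes or replaces the estimation results. The variable name insample is illustrative.

```stata
sysuse auto, clear
ereturn clear

* no estimation results in memory: e(sample) is 0 everywhere
count if e(sample)
assert r(N) == 0

regress mpg weight rep78
* freeze the estimation sample right away
generate byte insample = e(sample)
count if insample

* the stored marker survives once e() is cleared or replaced
ereturn clear
count if e(sample)
count if insample
assert r(N) == 69
```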
