  • Canonical two-way fixed effects with cross sectional id and time t: two stage procedure gives almost, but not exactly what I expect.

    Good evening,

    I have been reading the literature on estimating two-way fixed effects models, and it is a bit of a mess. I could not find a good survey/synthesis of what works and what does not work -- everybody seems to be doing their own thing.

    I want to check what I am getting wrong in the following reasoning, and any contributions are welcome. If Professor Jeff Wooldridge and Sergio Correia could look into the matter, I would appreciate their input a lot.

    My reasoning is that in the standard/canonical two-way fixed effects model with cross sectional id and time series t, the id fixed effects are orthogonal to the t fixed effects.

    This is because the cross sectional id dummies vary only across id units (but not across time), and the time dummies vary only across time (but not across ids).

    Therefore I reason that as the id dummies are orthogonal to the time dummies, I should be able to obtain the two-way fixed effects by a simple two stage procedure:

    1. Residualise every variable in the regression, using say -areg-, with respect to the first fixed effect.

    2. Use the residualised variables in a second -areg- regression absorbing the second fixed effects, where the regressors are the residualised variables from step 1.

    What I think should work almost works, but not exactly. So my question is: why is this not working exactly? Is there an error in my reasoning? Or is it a numerical issue?

    Here is an illustration. Let's say I want to fit a two-way fixed effects regression using the nlswork data, where the dependent variable is ln_wage, the regressors are age and hours, the id is idcode and the time variable is year:

    1) Step one, I residualise all the variables with respect to one of the fixed effects, idcode in this example

    Code:
    . webuse nlswork, clear
    (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
    
    . qui foreach var of varlist ln_wage age hours {
    . areg `var', absorb(idcode)
    . predict double `var'res, resid
    . }
    2) Step two, I use the residualised variables as regressors now absorbing the second fixed effect:

    Code:
    . areg  ln_wageres ageres hoursres, absorb(year)
    
    Linear regression, absorbing indicators         Number of obs     =     28,443
    Absorbed variable: year                         No. of categories =         15
                                                    F(   2,  28426)   =     579.90
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.1063
                                                    Adj R-squared     =     0.1058
                                                    Root MSE          =     0.2769
    
    ------------------------------------------------------------------------------
      ln_wageres |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          ageres |   .0182627    .000538    33.94   0.000     .0172082    .0193172
        hoursres |   .0007479   .0002189     3.42   0.001     .0003188    .0011769
           _cons |   .0000474   .0016419     0.03   0.977    -.0031708    .0032656
    ------------------------------------------------------------------------------
    F test of absorbed indicators: F(14, 28426) = 6.860           Prob > F = 0.000
    and they are almost, but not exactly what they should be:

    Code:
    . reghdfe ln_wage age hours, absorb(idcode year)
    (dropped 554 singleton observations)
    (MWFE estimator converged in 8 iterations)
    
    HDFE Linear regression                            Number of obs   =     27,889
    Absorbing 2 HDFE groups                           F(   2,  23718) =       5.75
                                                      Prob > F        =     0.0032
                                                      R-squared       =     0.6554
                                                      Adj R-squared   =     0.5948
                                                      Within R-sq.    =     0.0005
                                                      Root MSE        =     0.3030
    
    ------------------------------------------------------------------------------
         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .0128924   .0102256     1.26   0.207    -.0071505    .0329353
           hours |    .000756   .0002397     3.15   0.002     .0002862    .0012258
           _cons |   1.275635   .2973869     4.29   0.000      .692738    1.858533
    ------------------------------------------------------------------------------
    
    
    . areg ln_wage age hours i.year, absorb(idcode)
    
    Linear regression, absorbing indicators         Number of obs     =     28,443
    Absorbed variable: idcode                       No. of categories =      4,709
                                                    F(  16,  23718)   =     177.07
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.6648
                                                    Adj R-squared     =     0.5980
                                                    Root MSE          =     0.3030
    
    ------------------------------------------------------------------------------
         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .0128924   .0102256     1.26   0.207    -.0071505    .0329353
           hours |    .000756   .0002397     3.15   0.002     .0002862    .0012258
                 |
            year |
             69  |   .0741072   .0159121     4.66   0.000     .0429185    .1052959
             70  |   .0475207   .0235895     2.01   0.044     .0012839    .0937576
             71  |   .0860412   .0328227     2.62   0.009     .0217066    .1503757
             72  |   .0856385   .0425291     2.01   0.044     .0022789    .1689982
             73  |   .0875726   .0523902     1.67   0.095    -.0151154    .1902607
             75  |   .0765988    .072101     1.06   0.288    -.0647237    .2179213
             77  |   .1071687   .0923094     1.16   0.246    -.0737636     .288101
             78  |   .1293613   .1029087     1.26   0.209    -.0723463    .3310689
             80  |   .1119272   .1229027     0.91   0.362    -.1289699    .3528242
             82  |   .1075358   .1432406     0.75   0.453     -.173225    .3882966
             83  |   .1190697   .1533399     0.78   0.437    -.1814863    .4196257
             85  |   .1429657   .1737723     0.82   0.411     -.197639    .4835705
             87  |   .1339107   .1942917     0.69   0.491    -.2469135    .5147349
             88  |   .1745405   .2081783     0.84   0.402    -.2335022    .5825832
                 |
           _cons |   1.169893   .1956564     5.98   0.000     .7863941    1.553392
    ------------------------------------------------------------------------------
    F test of absorbed indicators: F(4708, 23718) = 8.643         Prob > F = 0.000
    And the results are not the same as in my two step procedure...

    So does anybody see where I go wrong?

  • #2
    And to make things worse, when I swap the order in which the fixed effects are eliminated, I get similar but again slightly different results:

    Code:
    . webuse nlswork, clear
    (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
    
    . qui foreach var of varlist ln_wage age hours {
    . areg `var', absorb(year)
    . predict double `var'res, resid
    . }
    
    . areg  ln_wage age hours, absorb(idcode)
    
    Linear regression, absorbing indicators         Number of obs     =     28,443
    Absorbed variable: idcode                       No. of categories =      4,709
                                                    F(   2,  23732)   =    1367.47
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.6635
                                                    Adj R-squared     =     0.5967
                                                    Root MSE          =     0.3035
    
    ------------------------------------------------------------------------------
         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .0181932   .0003481    52.26   0.000     .0175109    .0188755
           hours |   .0007344   .0002393     3.07   0.002     .0002653    .0012035
           _cons |   1.120157   .0136361    82.15   0.000     1.093429    1.146884
    ------------------------------------------------------------------------------
    F test of absorbed indicators: F(4708, 23732) = 8.654         Prob > F = 0.000
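
    Note that the second-stage -areg- as pasted above regresses the original ln_wage, age and hours rather than the year-residualised variables, so it is a one-way idcode fixed effects fit. A minimal sketch of the reversed two-step that was presumably intended, assuming the ln_wageres, ageres and hoursres variables created by the loop above (no output is shown, since it was not part of the post):

    ```stata
    * second stage on the year-residualised variables, now absorbing idcode
    areg ln_wageres ageres hoursres, absorb(idcode)
    ```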


    • #3
      Could it be related, in this example, to the missing values of age and hours, so that your various estimations are across different subsamples?
      Code:
      . webuse nlswork, clear
      (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
      
      . misstable summarize idcode year ln_wage age hours
                                                                     Obs<.
                                                      +------------------------------
                     |                                | Unique
            Variable |     Obs=.     Obs>.     Obs<.  | values        Min         Max
        -------------+--------------------------------+------------------------------
                 age |        24              28,510  |     33         14          46
               hours |        67              28,467  |     85          1         168
        -----------------------------------------------------------------------------
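
      A minimal sketch of how one could rule this out, assuming the same nlswork variables as in #1: keep only the observations where all three variables are non-missing before residualising, so that every -areg- runs on the same sample. This only removes the sample mismatch; it is not claimed to make the two-step coefficients equal to the -reghdfe- ones.

      ```stata
      webuse nlswork, clear
      * keep rows where every model variable is observed, so each
      * residualising regression uses the same estimation sample
      keep if !missing(ln_wage, age, hours)
      foreach var of varlist ln_wage age hours {
          quietly areg `var', absorb(idcode)
          predict double `var'res, resid
      }
      * second stage, absorbing the other fixed effect
      areg ln_wageres ageres hoursres, absorb(year)
      ```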


      • #4
        Hi Joro
        The procedure that you describe only works if the panel is fully balanced. When it is not, you have to apply that process iteratively.
        I have a paper in the Stata Journal where I explain the logic, extending it to the estimation of multiple fixed effects.
        Of course, Sergio Correia's command superseded mine, but the idea that I describe there does what you are considering.
        HTH
        Fernando
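
        The iterative idea can be sketched as follows. This is only an illustration of alternating demeaning, not the algorithm implemented in the paper or in -reghdfe-. The _dm suffix, the helper variable grpmean and the fixed 50 passes are arbitrary choices made for this sketch; a real implementation would iterate until the change between passes falls below a tolerance.

        ```stata
        webuse nlswork, clear
        keep if !missing(ln_wage, age, hours)
        foreach var of varlist ln_wage age hours {
            generate double `var'_dm = `var'
            * 50 passes is an arbitrary illustrative choice, not a convergence test
            forvalues pass = 1/50 {
                * remove idcode means ...
                bysort idcode: egen double grpmean = mean(`var'_dm)
                quietly replace `var'_dm = `var'_dm - grpmean
                drop grpmean
                * ... then year means, and repeat
                bysort year: egen double grpmean = mean(`var'_dm)
                quietly replace `var'_dm = `var'_dm - grpmean
                drop grpmean
            }
        }
        * slopes on the doubly demeaned variables
        regress ln_wage_dm age_dm hours_dm
        ```

        If the passes have converged, the slope coefficients should be close to those from -areg ln_wage age hours i.year, absorb(idcode)-. The standard errors from -regress- will not match, because the absorbed degrees of freedom are not accounted for.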


        • #5
          Here is the paper
          https://www.stata-journal.com/articl...article=st0409


          • #6
            Hello Statalist
            Last edited by IRAMAHORO GERARD; 16 Sep 2023, 22:31.


            • #7
              Hello Stata community. I am a new Stata user, and I am running a panel fixed-effects analysis to assess the effects of CO2 and institutional quality on innovation in some countries. The command is: xtreg logpat logCO2 INSTQ INST_CO2 logIIPR GDPgrowth Trade FDI, fe robust with:
              logpat: dependent variable
              logCO2 and INSTQ: core independent variables
              INST_CO2: interaction term of logCO2 * INSTQ
              GDPgrowth, Trade and FDI: control variables.
              The results of the above command are:

              Fixed-effects (within) regression               Number of obs     =        126
              Group variable: Year                            Number of groups  =         21

              R-squared:                                      Obs per group:
                   Within  = 0.8981                                         min =          6
                   Between = 0.5789                                         avg =        6.0
                   Overall = 0.8672                                         max =          6

                                                              F(7,20)           =     635.33
              corr(u_i, Xb) = -0.0289                         Prob > F          =     0.0000

                                                (Std. err. adjusted for 21 clusters in Year)
              ------------------------------------------------------------------------------
                           |               Robust
                    logpat | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                    logCO2 |     .08485   .0448944     1.89   0.073     -.008798     .178498
                     INSTQ |    .447999   .0777798     5.76   0.000     .2857533    .6102448
                  INST_CO2 |  -.1139003   .0159248    -7.15   0.000    -.1471189   -.0806817
                   logIIPR |   .6589335   .0348303    18.92   0.000     .5862787    .7315884
                 GDPgrowth |   .2070662   .0255474     8.11   0.000     .1537752    .2603572
                     Trade |   .0071305   .0026699     2.67   0.015     .0015611    .0126999
                       FDI |   -.033264   .0458649    -0.73   0.477    -.1289365    .0624085
                     _cons |   -7.31902   .5278719   -13.87   0.000    -8.420142   -6.217899
              -------------+----------------------------------------------------------------
                   sigma_u |  .37134933
                   sigma_e |  .61870152
                       rho |  .26484054   (fraction of variance due to u_i)
              ------------------------------------------------------------------------------
              Also, I want Stata to display the coefficient for each country while controlling for the country effect with "i.Country1", using the following code:
              . xtreg logpat logCO2 INSTQ INST_CO2 logIIPR GDPgrowth Trade FDI i.Country1, fe robust

              Fixed-effects (within) regression               Number of obs     =        126
              Group variable: Year                            Number of groups  =         21

              R-squared:                                      Obs per group:
                   Within  = 0.9631                                         min =          6
                   Between = 0.4482                                         avg =        6.0
                   Overall = 0.9096                                         max =          6

                                                              F(12,20)          =     443.87
              corr(u_i, Xb) = 0.0415                          Prob > F          =     0.0000

                                                 (Std. err. adjusted for 21 clusters in Year)
              -------------------------------------------------------------------------------
                            |               Robust
                     logpat | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
              --------------+----------------------------------------------------------------
                     logCO2 |   .3626265   .1015377     3.57   0.002     .1508225    .5744306
                      INSTQ |   .5388251   .2245202     2.40   0.026     .0704843    1.007166
                   INST_CO2 |  -.0755751   .0376895    -2.01   0.059     -.154194    .0030439
                    logIIPR |   .4419327   .0731009     6.05   0.000     .2894469    .5944184
                  GDPgrowth |   .0444302   .0238153     1.87   0.077    -.0052477    .0941082
                      Trade |  -.0331399   .0114673    -2.89   0.009    -.0570604   -.0092194
                        FDI |   .0272996   .0262657     1.04   0.311    -.0274896    .0820888
                            |
                   Country1 |
                    Brazil  |  -1.167918   .8900959    -1.31   0.204    -3.024625    .6887894
                     China  |   1.984053   1.104671     1.80   0.088    -.3202511    4.288357
                   Germany  |   1.255095   .4110715     3.05   0.006     .3976148    2.112575
              South Africa  |   1.014754   .5439211     1.87   0.077    -.1198456    2.149354
                 U. States  |  -.7334651   .7596303    -0.97   0.346    -2.318026     .851096
                            |
                      _cons |  -2.274916   1.304987    -1.74   0.097    -4.997071    .4472396
              --------------+----------------------------------------------------------------
                    sigma_u |  .44037201
                    sigma_e |  .38186315
                        rho |  .57079981   (fraction of variance due to u_i)
              -------------------------------------------------------------------------------
              Questions:
              1. Are these results meaningful enough to be included in the outline of my paper?
              2. How should I interpret these outputs, especially the INST_CO2 coefficient and the groups (I mean Country1)?
              3. Why, when I control for the group (I have 6 countries), is the first one, Australia, dropped by Stata?
              4. How should I interpret the coefficient of each country, if they make sense?
              Thank you, dear community. I look forward to your help.
              Last edited by IRAMAHORO GERARD; 16 Sep 2023, 22:41.


              • #8
                Iramahoro:
                your second code shows a quasi-extreme multicollinearity problem.
                While R_sq within is sky-rocketing, most of your predictors are not statistically significant.
                Try a more parsimonious model, use cluster-robust standard errors and share what you typed and what Stata gave you back via CODE delimiters. Thanks.
                Kind regards,
                Carlo
                (Stata 19.0)


                • #9
                  Hi dear Carlo Lazzaro. Thank you for your response.
                  It is true that my R^2 is very high, but all predictors except FDI also have significant coefficients in both commands.
                  Following your suggestion, I typed:
                  . xtreg logpat logCO2 INSTQ INST_CO2 logIIPR FDI GDP_growth Trade, fe vce(cluster Country1)
                  and Stata returned "panels are not nested within clusters". The command cannot be run, so what is the origin of this problem?
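
                  For context, a hedged sketch of what seems to trigger this message: the output in #7 reports "Group variable: Year", so the panels are years, and each year contains several countries, which means the panels cannot be nested within Country1 clusters. Assuming Country1 is a numeric country identifier and Year is the time variable, declaring the country as the panel unit would look like this (whether that is the intended model is for the author to decide):

                  ```stata
                  * Assumption: Country1 is a numeric (value-labelled) country id, Year the time variable
                  xtset Country1 Year
                  * with countries as panels, clustering on Country1 is nested by construction
                  xtreg logpat logCO2 INSTQ INST_CO2 logIIPR FDI GDP_growth Trade, fe vce(cluster Country1)
                  ```

                  With only six countries, cluster-robust standard errors would rest on very few clusters, so this is shown only to explain the message.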


                  • #10
                    Iramahoro:
                    1) please read and act on 12.3 How to use CODE delimiters for code, results, and data. Thanks;
                    That said:
                    2) cluster-robust standard errors. I was wrong, as I misread the number of observations (126) for the number of groups (21). With fewer than 30 clusters, non-default standard errors might be seriously misleading. Therefore, stick with the default ones. Sorry for the confusion;
                    3) over and above within R_sq and statistical significance of the coefficients, you should check whether the functional form of your regressand is correctly specified (please see the following toy-example):
                    Code:
                    . use "https://www.stata-press.com/data/r17/nlswork.dta"
                    (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
                    
                    . xtreg ln_wage c.age##c.age, fe
                    
                    Fixed-effects (within) regression               Number of obs     =     28,510
                    Group variable: idcode                          Number of groups  =      4,710
                    
                    R-squared:                                      Obs per group:
                         Within  = 0.1087                                         min =          1
                         Between = 0.1006                                         avg =        6.1
                         Overall = 0.0865                                         max =         15
                    
                                                                    F(2,23798)        =    1451.88
                    corr(u_i, Xb) = 0.0440                          Prob > F          =     0.0000
                    
                    ------------------------------------------------------------------------------
                         ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                    -------------+----------------------------------------------------------------
                             age |   .0539076   .0028078    19.20   0.000     .0484041    .0594112
                                 |
                     c.age#c.age |  -.0005973   .0000465   -12.84   0.000    -.0006885   -.0005061
                                 |
                           _cons |    .639913   .0408906    15.65   0.000     .5597649    .7200611
                    -------------+----------------------------------------------------------------
                         sigma_u |   .4039153
                         sigma_e |  .30245467
                             rho |  .64073314   (fraction of variance due to u_i)
                    ------------------------------------------------------------------------------
                    F test that all u_i=0: F(4709, 23798) = 8.74                 Prob > F = 0.0000
                    
                    . predict fitted, xb
                    (24 missing values generated)
                    
                    . g sq_fitted=fitted^2
                    (24 missing values generated)
                    
                    . xtreg ln_wage fitted sq_fitted , fe
                    
                    Fixed-effects (within) regression               Number of obs     =     28,510
                    Group variable: idcode                          Number of groups  =      4,710
                    
                    R-squared:                                      Obs per group:
                         Within  = 0.1092                                         min =          1
                         Between = 0.1033                                         avg =        6.1
                         Overall = 0.0881                                         max =         15
                    
                                                                    F(2,23798)        =    1457.96
                    corr(u_i, Xb) = 0.0467                          Prob > F          =     0.0000
                    
                    ------------------------------------------------------------------------------
                         ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                    -------------+----------------------------------------------------------------
                          fitted |   2.569185    .476861     5.39   0.000     1.634507    3.503863
                       sq_fitted |    -.47432   .1440324    -3.29   0.001    -.7566326   -.1920074
                           _cons |  -1.290258   .3930351    -3.28   0.001    -2.060631   -.5198837
                    -------------+----------------------------------------------------------------
                         sigma_u |    .403403
                         sigma_e |  .30238578
                             rho |  .64025357   (fraction of variance due to u_i)
                    ------------------------------------------------------------------------------
                    F test that all u_i=0: F(4709, 23798) = 8.72                 Prob > F = 0.0000
                    
                    . test sq_fitted=0
                    
                     ( 1)  sq_fitted = 0
                    
                           F(  1, 23798) =   10.84
                                Prob > F =    0.0010
                    
                    .
                    As the -test- outcome reaches statistical significance, the model is (as expected) misspecified, in that it needs more predictors and/or interactions.
                    Kind regards,
                    Carlo
                    (Stata 19.0)


                    • #11
                      Hello dear Mr Carlo Lazzaro. Many thanks for your kind responses. When I repeat my regression together with the model specification test, the results are as follows:
                      Code:
                       
                       xtreg logpat logCO2 INSTQ INST_CO2 logIIPR FDI GDP_growth Trade i.Country1 , fe robust
                      
                      Fixed-effects (within) regression               Number of obs     =        126
                      Group variable: Year                            Number of groups  =         21
                      
                      R-squared:                                      Obs per group:
                           Within  = 0.9631                                         min =          6
                           Between = 0.4482                                         avg =        6.0
                           Overall = 0.9096                                         max =          6
                      
                                                                      F(12,20)          =     443.87
                      corr(u_i, Xb) = 0.0415                          Prob > F          =     0.0000
                      
                                                         (Std. err. adjusted for 21 clusters in Year)
                      -------------------------------------------------------------------------------
                                    |               Robust
                             logpat | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                      --------------+----------------------------------------------------------------
                             logCO2 |   .3626265   .1015377     3.57   0.002     .1508225    .5744306
                              INSTQ |   .5388251   .2245202     2.40   0.026     .0704843    1.007166
                           INST_CO2 |  -.0755751   .0376895    -2.01   0.059     -.154194    .0030439
                            logIIPR |   .4419327   .0731009     6.05   0.000     .2894469    .5944184
                                FDI |   .0272996   .0262657     1.04   0.311    -.0274896    .0820888
                         GDP_growth |   .0444302   .0238153     1.87   0.077    -.0052477    .0941082
                              Trade |  -.0331399   .0114673    -2.89   0.009    -.0570604   -.0092194
                                    |
                           Country1 |
                            Brazil  |  -1.167918   .8900959    -1.31   0.204    -3.024625    .6887894
                             China  |   1.984053   1.104671     1.80   0.088    -.3202511    4.288357
                           Germany  |   1.255095   .4110715     3.05   0.006     .3976148    2.112575
                      South Africa  |   1.014754   .5439211     1.87   0.077    -.1198456    2.149354
                         U. States  |  -.7334651   .7596303    -0.97   0.346    -2.318026     .851096
                                    |
                              _cons |  -2.274916   1.304987    -1.74   0.097    -4.997071    .4472396
                      --------------+----------------------------------------------------------------
                            sigma_u |  .44037201
                            sigma_e |  .38186315
                                rho |  .57079981   (fraction of variance due to u_i)
                      -------------------------------------------------------------------------------
                      
                      . predict fitted, xb
                      
                      . gen sq_fitted=fitted^2
                      
                      .  xtreg logpat fitted sq_fitted, fe
                      
                      Fixed-effects (within) regression               Number of obs     =        126
                      Group variable: Year                            Number of groups  =         21
                      
                      R-squared:                                      Obs per group:
                           Within  = 0.9636                                         min =          6
                           Between = 0.4487                                         avg =        6.0
                           Overall = 0.9106                                         max =          6
                      
                                                                      F(2,103)          =    1362.94
                      corr(u_i, Xb) = 0.0403                          Prob > F          =     0.0000
                      
                      ------------------------------------------------------------------------------
                            logpat | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                      -------------+----------------------------------------------------------------
                            fitted |   .8006837   .1791975     4.47   0.000     .4452878     1.15608
                         sq_fitted |   .0123535   .0110429     1.12   0.266    -.0095475    .0342545
                             _cons |   .7684255   .7049044     1.09   0.278    -.6295862    2.166437
                      -------------+----------------------------------------------------------------
                           sigma_u |   .4380316
                           sigma_e |  .36066847
                               rho |  .59596098   (fraction of variance due to u_i)
                      ------------------------------------------------------------------------------
                      F test that all u_i=0: F(20, 103) = 8.81                     Prob > F = 0.0000
                      
                      . test sq_fitted=0
                      
                       ( 1)  sq_fitted = 0
                      
                             F(  1,   103) =    1.25
                                  Prob > F =    0.2659
                      As Prob > F is not significant, can I say that my model is well specified?


                      • #12
                        Iramahoro:
                        my only comment here is to use -robust- standard errors in -xtreg logpat fitted sq_fitted, fe-, too. Please note that, under -xtreg-, the options -robust- and -vce(cluster clusterid)- do the very same job, as they both invoke cluster-robust standard errors.
                        Kind regards,
                        Carlo
                        (Stata 19.0)


                        • #13
                          Okay, dear Carlo. So I am asking whether I can keep this model or not. I need your point of view, dear Sir. Thank you again.


                          • #14
                            Iramahoro:
                            if, after
                            Code:
                             
                             xtreg logpat fitted sq_fitted, fe robust
                            the -test sq_fitted=0- outcome does not reach statistical significance, you can go with your model.
                            Kind regards,
                            Carlo
                            (Stata 19.0)


                            • #15
                              Hello dear Carlo Lazzaro. Thank you for sharing your knowledge. I think I can go ahead with my model, as the outcome of -test sq_fitted=0- is not significant, if I am not wrong.
                              The output is as follows: . xtreg logpat fitted sq_fitted, fe robust

                              Code:
                              Fixed-effects (within) regression               Number of obs     =        126
                              Group variable: Year                            Number of groups  =         21
                              
                              R-squared:                                      Obs per group:
                                   Within  = 0.9636                                         min =          6
                                   Between = 0.4487                                         avg =        6.0
                                   Overall = 0.9106                                         max =          6
                              
                                                                              F(2,20)           =     984.50
                              corr(u_i, Xb) = 0.0403                          Prob > F          =     0.0000
                              
                                                                (Std. err. adjusted for 21 clusters in Year)
                              ------------------------------------------------------------------------------
                                           |               Robust
                                    logpat | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                              -------------+----------------------------------------------------------------
                                    fitted |   .8006837   .1701499     4.71   0.000     .4457572     1.15561
                                 sq_fitted |   .0123535   .0108075     1.14   0.267    -.0101906    .0348976
                                     _cons |   .7684255   .6518107     1.18   0.252    -.5912277    2.128079
                              -------------+----------------------------------------------------------------
                                   sigma_u |   .4380316
                                   sigma_e |  .36066847
                                       rho |  .59596098   (fraction of variance due to u_i)
                              ------------------------------------------------------------------------------
                              
                              . test sq_fitted=0
                              
                               ( 1)  sq_fitted = 0
                              
                                     F(  1,    20) =    1.31
                                          Prob > F =    0.2665
