  • First differencing versus Fixed effects

    I have a large panel dataset and am trying to replicate the results of a first-differencing model using a fixed effects model instead. I have read online that the results should be broadly similar.

    The first differencing results:

    Code:
    . areg dln_cons_beer dln_excise_price dln_salestax_price dln_population,a(year)
    
    Linear regression, absorbing indicators         Number of obs     =      1,607
    Absorbed variable: year                         No. of categories =         33
                                                    F(   3,   1571)   =       9.31
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.2686
                                                    Adj R-squared     =     0.2523
                                                    Root MSE          =     0.0310
    
    ------------------------------------------------------------------------------------
         dln_cons_beer |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------------+----------------------------------------------------------------
      dln_excise_price |  -.8767847   .1690759    -5.19   0.000    -1.208423   -.5451466
    dln_salestax_price |  -.1991993   .3006204    -0.66   0.508    -.7888587    .3904602
        dln_population |   .0335824   .0649213     0.52   0.605    -.0937592    .1609239
                 _cons |   .0042591   .0011534     3.69   0.000     .0019968    .0065215
    ------------------------------------------------------------------------------------
    F test of absorbed indicators: F(32, 1571) = 12.442           Prob > F = 0.000
    The fixed effects results are below. I'm wondering why they are qualitatively so different. Perhaps I am wrong to assume they should be similar, or something else is going on.
    Code:
    . xtreg ln_cons_beer ln_excise_price ln_salestax_price ln_population i.year, fe
    
    Fixed-effects (within) regression               Number of obs     =      1,656
    Group variable: state                           Number of groups  =         49
    
    R-sq:                                           Obs per group:
         within  = 0.6640                                         min =         32
         between = 0.1076                                         avg =       33.8
         overall = 0.2398                                         max =         34
    
                                                    F(36,1571)        =      86.24
    corr(u_i, Xb)  = -0.2019                        Prob > F          =     0.0000
    
    -----------------------------------------------------------------------------------
         ln_cons_beer |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ------------------+----------------------------------------------------------------
      ln_excise_price |  -1.574363   .0521172   -30.21   0.000    -1.676589   -1.472136
    ln_salestax_price |   .8174433   .2683902     3.05   0.002     .2910027    1.343884
        ln_population |  -.0254941   .0157272    -1.62   0.105    -.0563425    .0053544
                _cons |   3.254145   .1233847    26.37   0.000     3.012129    3.496161
    -----------------------------------------------------------------------------------
    F test that all u_i=0: F(48, 1571) = 213.47                  Prob > F = 0.0000
    Thank you.

  • #2
    Fixed effects and First Differencing (FD) are equivalent only when T=2. You need to difference the year dummies to correctly implement FD and also drop the constant. Additionally, cluster your standard errors as first-differencing induces serial correlation in the errors. So your FD regression should be

    Code:
    xi: regress D.ln_cons_beer D.(ln_excise_price ln_salestax_price ln_population i.year), nocons cluster(state)


    • #3
      Thank you for your response. Running that code yields exactly the same results as the initial areg command (when clustering standard errors). Why does FD yield such different results from xtreg with fixed effects? When I run both on only two time periods, the results are still not equivalent:

      Code:
      xi: regress D.ln_cons_beer D.(ln_excise_price ln_salestax_price ln_population) i.year if year > 2001, nocons vce(cluster state)
      Linear regression                               Number of obs     =         98
                                                      F(5, 48)          =      52.75
                                                      Prob > F          =     0.0000
                                                      R-squared         =     0.1189
                                                      Root MSE          =     .03383
      
                                            (Std. Err. adjusted for 49 clusters in state)
      -----------------------------------------------------------------------------------
                        |               Robust
         D.ln_cons_beer |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      ------------------+----------------------------------------------------------------
        ln_excise_price |
                    D1. |  -.6514689   .0768177    -8.48   0.000    -.8059211   -.4970167
                        |
      ln_salestax_price |
                    D1. |   1.982322     2.7548     0.72   0.475    -3.556574    7.521218
                        |
          ln_population |
                    D1. |  -.9079943   .3534391    -2.57   0.013    -1.618631   -.1973575
      and...
      Code:
      xtreg ln_cons_beer ln_excise_price ln_salestax_price ln_population i.year if year > 2001, fe vce(cluster state)
      
      Fixed-effects (within) regression               Number of obs     =         98
      Group variable: state                           Number of groups  =         49
      
      R-sq:                                           Obs per group:
           within  = 0.1179                                         min =          2
           between = 0.1269                                         avg =        2.0
           overall = 0.1254                                         max =          2
      
                                                      F(4,48)           =      89.68
      corr(u_i, Xb)  = -0.9868                        Prob > F          =     0.0000
      
                                            (Std. Err. adjusted for 49 clusters in state)
      -----------------------------------------------------------------------------------
                        |               Robust
           ln_cons_beer |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      ------------------+----------------------------------------------------------------
        ln_excise_price |  -.6514677   .0697474    -9.34   0.000    -.7917041   -.5112312
      ln_salestax_price |   2.943518    3.71878     0.79   0.433    -4.533591    10.42063
          ln_population |  -1.008372   .6968511    -1.45   0.154    -2.409485    .3927408


      • #4
        For a balanced panel with two periods, the two estimators have to be algebraically the same. So if you are getting different results, there is an error somewhere in the implementation.
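
        One thing worth checking, offered as a guess from the output in #3 rather than a diagnosis: the FD regression there reports 98 observations for 49 states, i.e. two differences per state. An `if` qualifier only restricts the estimation sample, while `D.` still looks back to the previous year, so `if year > 2001` keeps both the 2002 and the 2003 differences, whereas `xtreg` with the same qualifier uses only the levels for 2002 and 2003. A minimal sketch on the grunfeld data (the years 1953 and 1954 are an arbitrary choice for illustration):

        ```stata
        webuse grunfeld, clear
        xtset company year

        * FE on the two periods 1953 and 1954 (20 observations)
        xtreg invest mvalue i.year if year >= 1953, fe

        * FD on the same two periods: only the 1954 difference (10 observations).
        * The constant plays the role of the differenced year dummy.
        regress D.invest D.mvalue if year == 1954

        * Pitfall: this also keeps the 1953 difference, which uses 1952 data,
        * so it is FD on three periods and need not match the FE fit above
        regress D.invest D.mvalue i.year if year >= 1953
        ```

        The first two commands should report the same coefficient on mvalue; the third generally will not.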

        For more than two periods they differ: the "fixed effects" estimator (i.e., the within transformation) is optimal (efficient) if your idiosyncratic error is iid, and first differencing is optimal if your idiosyncratic error follows a random walk.
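
        One rough way to see which case is closer to your data, sketched here on the grunfeld data and assuming the FD model is otherwise correctly specified: if the errors in levels are iid, the first-differenced errors have a first-order autocorrelation of -0.5; if they follow a random walk, the first-differenced errors are serially uncorrelated. Regressing the FD residuals on their own lag gives an estimate to compare against those two benchmarks (the variable name `de` is made up for this example):

        ```stata
        webuse grunfeld, clear
        xtset company year

        * FD regression; year dummies plus a constant absorb the period effects
        regress D.invest D.mvalue i.year

        * residuals of the differenced equation
        predict double de if e(sample), residuals

        * regress the FD residuals on their own lag, clustering by company
        regress de L.de, vce(cluster company)

        * slope near -0.5 points toward iid errors in levels (FE preferred);
        * slope near 0 points toward a random walk (FD preferred)
        test L.de = -0.5
        ```

        With only a handful of clusters the test is indicative at best, and in practice the errors are often somewhere in between. Clustering the standard errors by panel unit keeps the inference valid in either case.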




        • #5
          Thank you. Forgive my complete lack of econometrics knowledge, but could you please explain what you mean by 'algebraically the same'?
          Also, I keep seeing iid versus random walk, and I am not sure how to tell which describes my errors. I'm looking at how changes in beer tax affect changes in alcohol consumption across the US states between 1970 and 2003.


          • #6
            It means that when we have more than two periods, the two estimators do not have to give the same results, as here:

            Code:
            . webuse grunfeld, clear
            
            . xtreg invest mvalue i.year, fe
            
            Fixed-effects (within) regression               Number of obs     =        200
            Group variable: company                         Number of groups  =         10
            
            R-sq:                                           Obs per group:
                 within  = 0.5027                                         min =         20
                 between = 0.8572                                         avg =       20.0
                 overall = 0.7662                                         max =         20
            
                                                            F(20,170)         =       8.59
            corr(u_i, Xb)  = -0.5686                        Prob > F          =     0.0000
            
            ------------------------------------------------------------------------------
                  invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                  mvalue |   .1799679   .0206334     8.72   0.000     .1392372    .2206986
                         |
                    year |
                   1936  |   -38.1056   37.04158    -1.03   0.305    -111.2263    35.01509
                   1937  |  -66.31173   38.60245    -1.72   0.088    -142.5136    9.890143
                   1938  |   -20.3273   36.35156    -0.56   0.577    -92.08588    51.43128
                   1939  |  -59.30898   37.04449    -1.60   0.111    -132.4354    13.81745
                   1940  |  -36.24558   37.29061    -0.97   0.332    -109.8579    37.36671
                   1941  |  -1.500652   37.07777    -0.04   0.968     -74.6928     71.6915
                   1942  |   18.71706   36.41329     0.51   0.608    -53.16338     90.5975
                   1943  |  -6.731505   36.72012    -0.18   0.855    -79.21764    65.75463
                   1944  |  -9.464895   36.83487    -0.26   0.798    -82.17755    63.24776
                   1945  |  -26.44723   37.32046    -0.71   0.480    -100.1184    47.22398
                   1946  |   -.739807   37.65736    -0.02   0.984    -75.07607    73.59646
                   1947  |   34.82738   36.51991     0.95   0.342    -37.26352    106.9183
                   1948  |   47.09646    36.4475     1.29   0.198    -24.85151    119.0444
                   1949  |    28.6898   36.49548     0.79   0.433    -43.35289    100.7325
                   1950  |   29.79853   36.66158     0.81   0.417    -42.57203    102.1691
                   1951  |   36.74273   37.68058     0.98   0.331    -37.63936    111.1248
                   1952  |   52.86058   37.95357     1.39   0.166    -22.06041    127.7816
                   1953  |   64.20592   39.56961     1.62   0.107    -13.90515     142.317
                   1954  |   69.57366   39.24665     1.77   0.078    -7.899875    147.0472
                         |
                   _cons |  -54.57607   29.48989    -1.85   0.066    -112.7896    3.637465
            -------------+----------------------------------------------------------------
                 sigma_u |  93.804227
                 sigma_e |  81.028709
                     rho |  .57268461   (fraction of variance due to u_i)
            ------------------------------------------------------------------------------
            F test that all u_i=0: F(9, 170) = 17.64                     Prob > F = 0.0000
            
            . xi: reg d.invest d.(mvalue i.year), nocons
            i.year            _Iyear_1935-1954    (naturally coded; _Iyear_1935 omitted)
            
                  Source |       SS           df       MS      Number of obs   =       190
            -------------+----------------------------------   F(20, 170)      =      6.25
                   Model |  256715.317        20  12835.7659   Prob > F        =    0.0000
                Residual |  348962.182       170  2052.71872   R-squared       =    0.4238
            -------------+----------------------------------   Adj R-squared   =    0.3561
                   Total |    605677.5       190  3187.77631   Root MSE        =    45.307
            
            ------------------------------------------------------------------------------
                D.invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                  mvalue |
                     D1. |   .0806285   .0102636     7.86   0.000     .0603679    .1008892
                         |
             _Iyear_1936 |
                     D1. |  -1.141129    14.8276    -0.08   0.939    -30.41106     28.1288
                         |
             _Iyear_1937 |
                     D1. |  -2.255822   21.31535    -0.11   0.916    -44.33268    39.82104
                         |
             _Iyear_1938 |
                     D1. |   -6.45247     24.857    -0.26   0.795    -55.52061    42.61567
                         |
             _Iyear_1939 |
                     D1. |  -22.27696   28.90894    -0.77   0.442    -79.34368    34.78977
                         |
             _Iyear_1940 |
                     D1. |    6.12722   32.33459     0.19   0.850     -57.7018    69.95624
                         |
             _Iyear_1941 |
                     D1. |   36.29569    35.3112     1.03   0.305    -33.40921    106.0006
                         |
             _Iyear_1942 |
                     D1. |   35.94002   37.94825     0.95   0.345    -38.97045    110.8505
                         |
             _Iyear_1943 |
                     D1. |   21.84773   40.63119     0.54   0.591    -58.35892    102.0544
                         |
             _Iyear_1944 |
                     D1. |    22.3536   43.10747     0.52   0.605    -62.74127    107.4485
                         |
             _Iyear_1945 |
                     D1. |   16.53035   45.52401     0.36   0.717    -73.33481    106.3955
                         |
             _Iyear_1946 |
                     D1. |   48.58149   47.79077     1.02   0.311     -45.7583    142.9213
                         |
             _Iyear_1947 |
                     D1. |   56.66475   49.68252     1.14   0.256    -41.40938    154.7389
                         |
             _Iyear_1948 |
                     D1. |   65.92217   51.69446     1.28   0.204    -36.12357    167.9679
                         |
             _Iyear_1949 |
                     D1. |   49.55931   53.65124     0.92   0.357    -56.34913    155.4678
                         |
             _Iyear_1950 |
                     D1. |   56.57883   55.55838     1.02   0.310    -53.09434     166.252
                         |
             _Iyear_1951 |
                     D1. |    86.4733   57.53912     1.50   0.135    -27.10988    200.0565
                         |
             _Iyear_1952 |
                     D1. |   107.1904   59.33912     1.81   0.073    -9.946047    224.3268
                         |
             _Iyear_1953 |
                     D1. |    140.728   61.29765     2.30   0.023     19.72544    261.7306
                         |
             _Iyear_1954 |
                     D1. |   142.1382   62.89972     2.26   0.025     17.97308    266.3033
            ------------------------------------------------------------------------------
            But when I keep only two periods, they become algebraically equivalent, here:

            Code:
            . keep if year<1937
            (180 observations deleted)
            
            . xtreg invest mvalue i.year, fe
            
            Fixed-effects (within) regression               Number of obs     =         20
            Group variable: company                         Number of groups  =         10
            
            R-sq:                                           Obs per group:
                 within  = 0.4571                                         min =          2
                 between = 0.7746                                         avg =        2.0
                 overall = 0.7367                                         max =          2
            
                                                            F(2,8)            =       3.37
            corr(u_i, Xb)  = 0.6514                         Prob > F          =     0.0869
            
            ------------------------------------------------------------------------------
                  invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                  mvalue |   .0442829   .0287062     1.54   0.161    -.0219137    .1104795
                         |
                    year |
                   1936  |    12.3832   17.47315     0.71   0.499    -27.90995    52.67635
                         |
                   _cons |   41.41713   22.54005     1.84   0.103    -10.56032    93.39458
            -------------+----------------------------------------------------------------
                 sigma_u |  81.619228
                 sigma_e |  30.920299
                     rho |  .87449532   (fraction of variance due to u_i)
            ------------------------------------------------------------------------------
            F test that all u_i=0: F(9, 8) = 7.55                        Prob > F = 0.0046
            
            . xi: reg d.invest d.(mvalue i.year), nocons
            i.year            _Iyear_1935-1936    (naturally coded; _Iyear_1935 omitted)
            
                  Source |       SS           df       MS      Number of obs   =        10
            -------------+----------------------------------   F(2, 8)         =      3.37
                   Model |  12879.8533         2  6439.92665   Prob > F        =    0.0869
                Residual |  15297.0386         8  1912.12982   R-squared       =    0.4571
            -------------+----------------------------------   Adj R-squared   =    0.3214
                   Total |  28176.8919        10  2817.68919   Root MSE        =    43.728
            
            ------------------------------------------------------------------------------
                D.invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                  mvalue |
                     D1. |   .0442829   .0287062     1.54   0.161    -.0219137    .1104795
                         |
             _Iyear_1936 |
                     D1. |    12.3832   17.47315     0.71   0.499    -27.90995    52.67635
            ------------------------------------------------------------------------------
            
            .
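
            As a sketch of why the two-period results coincide (the notation is introduced here for illustration, with the period dummy counted among the regressors $x_{it}$): with $T = 2$, demeaning each variable within a unit gives plus or minus half of its first difference, so the within sum of squares is just half the first-difference sum of squares and both are minimized by the same $b$.

            ```latex
            % Two-period model with unit effect a_i
            y_{it} = a_i + x_{it}'\beta + \varepsilon_{it}, \qquad t = 1, 2

            % Within transformation when T = 2
            \bar{y}_i = \tfrac{1}{2}(y_{i1} + y_{i2}), \qquad
            y_{i2} - \bar{y}_i = \tfrac{1}{2}\,\Delta y_i, \qquad
            y_{i1} - \bar{y}_i = -\tfrac{1}{2}\,\Delta y_i, \qquad
            \Delta y_i = y_{i2} - y_{i1}

            % The within objective is half the first-difference objective
            \sum_{i}\sum_{t=1}^{2}
              \bigl[(y_{it} - \bar{y}_i) - (x_{it} - \bar{x}_i)'b\bigr]^2
            = \tfrac{1}{2}\sum_{i}\bigl(\Delta y_i - \Delta x_i'b\bigr)^2
            ```

            The same halving applies to the cross-products of the regressors, which is consistent with the identical standard errors in the two-period output above.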

            • #7
              Thank you - I have managed to get identical results for T=2 now. Why do the two methods yield different results when T > 2, and how do I know which to use, i.e. whether the errors are iid or a random walk?
              Last edited by Bob Geldof; 13 May 2021, 16:15.


              • #8
                If your name really is Bob Geldof, well and good. Otherwise please note our request for use of a real name. https://www.statalist.org/forums/help#realnames and #3 at https://www.statalist.org/forums/help#adviceextras
