First differencing versus Fixed effects

Bob Geldof

13 May 2021, 12:47

I have a large panel dataset and am trying to replicate the results of a first-differencing model using a fixed effects model instead. I have read online that the results should be somewhat similar.

The first differencing results:

Code:

. areg dln_cons_beer dln_excise_price dln_salestax_price dln_population,a(year)

Linear regression, absorbing indicators         Number of obs     =      1,607
Absorbed variable: year                         No. of categories =         33
                                                F(   3,   1571)   =       9.31
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2686
                                                Adj R-squared     =     0.2523
                                                Root MSE          =     0.0310

------------------------------------------------------------------------------------
     dln_cons_beer |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
  dln_excise_price |  -.8767847   .1690759    -5.19   0.000    -1.208423   -.5451466
dln_salestax_price |  -.1991993   .3006204    -0.66   0.508    -.7888587    .3904602
    dln_population |   .0335824   .0649213     0.52   0.605    -.0937592    .1609239
             _cons |   .0042591   .0011534     3.69   0.000     .0019968    .0065215
------------------------------------------------------------------------------------
F test of absorbed indicators: F(32, 1571) = 12.442           Prob > F = 0.000

The fixed effects results are below. I'm wondering why they are qualitatively so different. Perhaps I am wrong to assume that they should be similar, or something else is going on.

Code:

. xtreg ln_cons_beer ln_excise_price ln_salestax_price ln_population i.year, fe

Fixed-effects (within) regression               Number of obs     =      1,656
Group variable: state                           Number of groups  =         49

R-sq:                                           Obs per group:
     within  = 0.6640                                         min =         32
     between = 0.1076                                         avg =       33.8
     overall = 0.2398                                         max =         34

                                                F(36,1571)        =      86.24
corr(u_i, Xb)  = -0.2019                        Prob > F          =     0.0000

-----------------------------------------------------------------------------------
     ln_cons_beer |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
  ln_excise_price |  -1.574363   .0521172   -30.21   0.000    -1.676589   -1.472136
ln_salestax_price |   .8174433   .2683902     3.05   0.002     .2910027    1.343884
    ln_population |  -.0254941   .0157272    -1.62   0.105    -.0563425    .0053544
            _cons |   3.254145   .1233847    26.37   0.000     3.012129    3.496161
-----------------------------------------------------------------------------------
F test that all u_i=0: F(48, 1571) = 213.47                  Prob > F = 0.0000

Thank you.

Andrew Musau

13 May 2021, 13:27

Fixed effects (FE) and first differencing (FD) are equivalent only when T=2. To implement FD correctly, you need to difference the year dummies and drop the constant. Additionally, cluster your standard errors, because first-differencing induces serial correlation in the errors. So your FD regression should be

Code:

xi: regress D.ln_cons_beer D.(ln_excise_price ln_salestax_price ln_population i.year), nocons cluster(state)
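The reason for clustering can be seen in a small simulation. The sketch below is plain Python rather than Stata and uses simulated errors only (N, T and the seed are arbitrary assumptions, not the beer data). If the level errors are iid with variance σ², then Δe_t and Δe_{t-1} share the term e_{t-1} with opposite signs, so Cov(Δe_t, Δe_{t-1}) = -σ² while Var(Δe_t) = 2σ², which gives a within-unit correlation of -0.5.

```python
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

def diff(v):
    """First difference of one unit's series: v[t] - v[t-1]."""
    return [v[t] - v[t - 1] for t in range(1, len(v))]

def pooled_lag1_corr(panel):
    """Correlation of (v[t], v[t-1]) pairs, pooled across units; lags never cross units."""
    cur, lag = [], []
    for v in panel:
        cur.extend(v[1:])
        lag.extend(v[:-1])
    return pearson(cur, lag)

rng = random.Random(42)
N, T = 2000, 10  # hypothetical panel size
errors = [[rng.gauss(0, 1) for _ in range(T)] for _ in range(N)]  # iid level errors
diffs = [diff(e) for e in errors]

print(f"levels: {pooled_lag1_corr(errors):.3f}")      # close to 0
print(f"differences: {pooled_lag1_corr(diffs):.3f}")  # close to -0.5
```

Because the differenced errors of the same state are correlated, conventional standard errors that assume independent observations are not appropriate; clustering by state allows for this pattern.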

Bob Geldof

13 May 2021, 14:08

Thank you for your response. Running that code yields exactly the same results as the initial areg command (when clustering standard errors). Why does FD yield such a different result from xtreg with fixed effects? Even when I run it on only two time periods, the results are not equivalent:

Code:

xi: regress D.ln_cons_beer D.(ln_excise_price ln_salestax_price ln_population) i.year if year > 2001, nocons vce(cluster state)
Linear regression                               Number of obs     =         98
                                                F(5, 48)          =      52.75
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1189
                                                Root MSE          =     .03383

                                      (Std. Err. adjusted for 49 clusters in state)
-----------------------------------------------------------------------------------
                  |               Robust
   D.ln_cons_beer |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
  ln_excise_price |
              D1. |  -.6514689   .0768177    -8.48   0.000    -.8059211   -.4970167
                  |
ln_salestax_price |
              D1. |   1.982322     2.7548     0.72   0.475    -3.556574    7.521218
                  |
    ln_population |
              D1. |  -.9079943   .3534391    -2.57   0.013    -1.618631   -.1973575

and...

Code:

xtreg ln_cons_beer ln_excise_price ln_salestax_price ln_population i.year if year > 2001, fe vce(cluster state)

Fixed-effects (within) regression               Number of obs     =         98
Group variable: state                           Number of groups  =         49

R-sq:                                           Obs per group:
     within  = 0.1179                                         min =          2
     between = 0.1269                                         avg =        2.0
     overall = 0.1254                                         max =          2

                                                F(4,48)           =      89.68
corr(u_i, Xb)  = -0.9868                        Prob > F          =     0.0000

                                      (Std. Err. adjusted for 49 clusters in state)
-----------------------------------------------------------------------------------
                  |               Robust
     ln_cons_beer |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
  ln_excise_price |  -.6514677   .0697474    -9.34   0.000    -.7917041   -.5112312
ln_salestax_price |   2.943518    3.71878     0.79   0.433    -4.533591    10.42063
    ln_population |  -1.008372   .6968511    -1.45   0.154    -2.409485    .3927408

Joro Kolev

13 May 2021, 14:22

For a balanced panel with two periods they have to be algebraically the same, so if you are getting different results there is some error in your implementation.
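The two-period equivalence can also be checked outside Stata. The sketch below is plain Python on simulated data (the sample size, coefficients and seed are arbitrary assumptions, not the beer data). With T=2, the differenced second-period dummy equals 1 for every observation, so year dummies in the FE model correspond to a constant in the FD model. The within slope with unit and year effects (obtained here by two-way demeaning, which is valid for a balanced panel) is then the same number as the FD slope with a constant.

```python
import random

def slope_through_origin(x, y):
    """OLS slope of y on x with no intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def centered(v):
    """Subtract the mean; regressing on centered data is equivalent to including a constant."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def two_way_demean(z, n_units, n_periods):
    """Remove unit and period means from a balanced panel z[i][t] (unit FE + year dummies)."""
    unit_means = [sum(row) / n_periods for row in z]
    time_means = [sum(z[i][t] for i in range(n_units)) / n_units for t in range(n_periods)]
    grand = sum(time_means) / n_periods
    return [z[i][t] - unit_means[i] - time_means[t] + grand
            for i in range(n_units) for t in range(n_periods)]

rng = random.Random(1)
N, T = 49, 2
alpha = [rng.gauss(0, 5) for _ in range(N)]  # unit fixed effects
delta = [0.0, 0.3]                           # period (year) effects
x = [[rng.gauss(0, 1) + 0.5 * alpha[i] for _ in range(T)] for i in range(N)]  # x correlated with alpha
y = [[-0.8 * x[i][t] + alpha[i] + delta[t] + rng.gauss(0, 0.2) for t in range(T)]
     for i in range(N)]

# Fixed effects with year dummies
beta_fe = slope_through_origin(two_way_demean(x, N, T), two_way_demean(y, N, T))

# First differences with a constant (one difference per unit)
dx = [x[i][1] - x[i][0] for i in range(N)]
dy = [y[i][1] - y[i][0] for i in range(N)]
beta_fd = slope_through_origin(centered(dx), centered(dy))

print(f"FE: {beta_fe:.10f}  FD: {beta_fd:.10f}")  # agree up to floating-point rounding
```

In the two-period Stata run above, one thing worth checking (an inference from the posted logs, not something confirmed in the thread): the FE log uses 98 observations on 49 states, i.e. 2002 and 2003 only, but the FD log also reports 98 observations, i.e. two differences per state. Stata's time-series operators take lagged values from the data in memory even when the earlier year is excluded by `if`, so `D.` in 2002 uses 2001 values, and the FD regression effectively covers three years rather than two.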

With more than two periods they generally differ. The "fixed effects" estimator (i.e., the within transformation) is optimal if your idiosyncratic error is iid, and the first-difference estimator is optimal if your idiosyncratic error is a random walk.
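The efficiency claim can be illustrated with a Monte Carlo sketch in plain Python. Everything here is an assumption chosen for illustration (one regressor, iid over time and correlated with the unit effect, no year effects, 50 units, 8 periods, 400 replications, fixed seed). Both estimators remove alpha_i, so both are centred on the true slope; what differs is their sampling variance. With iid errors the within estimator should show the smaller variance; with random-walk errors first differencing should.

```python
import random

def fe_slope(x, y):
    """Within (fixed-effects) slope: demean each unit's series, then pooled OLS through the origin."""
    num = den = 0.0
    for xi, yi in zip(x, y):
        mx, my = sum(xi) / len(xi), sum(yi) / len(yi)
        num += sum((a - mx) * (b - my) for a, b in zip(xi, yi))
        den += sum((a - mx) ** 2 for a in xi)
    return num / den

def fd_slope(x, y):
    """First-difference slope: pooled OLS of dy on dx through the origin."""
    num = den = 0.0
    for xi, yi in zip(x, y):
        for t in range(1, len(xi)):
            dx, dy = xi[t] - xi[t - 1], yi[t] - yi[t - 1]
            num += dx * dy
            den += dx * dx
    return num / den

def simulate(error_kind, rng, n_units=50, n_periods=8, beta=-0.8):
    """Hypothetical panel y = alpha_i + beta*x + e with iid or random-walk errors e."""
    x, y = [], []
    for _ in range(n_units):
        alpha = rng.gauss(0, 1)
        e, xi, yi = 0.0, [], []
        for _ in range(n_periods):
            u = rng.gauss(0, 1)
            e = u if error_kind == "iid" else e + u  # a random walk accumulates the shocks
            xt = rng.gauss(0, 1) + alpha             # regressor correlated with alpha_i
            xi.append(xt)
            yi.append(alpha + beta * xt + e)
        x.append(xi)
        y.append(yi)
    return x, y

def sampling_variances(error_kind, reps=400, seed=2021):
    """Monte Carlo variance of the FE and FD slope estimates across replications."""
    rng = random.Random(seed)
    fe, fd = [], []
    for _ in range(reps):
        x, y = simulate(error_kind, rng)
        fe.append(fe_slope(x, y))
        fd.append(fd_slope(x, y))

    def var(v):
        m = sum(v) / len(v)
        return sum((a - m) ** 2 for a in v) / (len(v) - 1)

    return var(fe), var(fd)

for kind in ("iid", "random walk"):
    v_fe, v_fd = sampling_variances(kind)
    print(f"{kind:>11}: var(FE) = {v_fe:.5f}  var(FD) = {v_fd:.5f}")
```

Real errors are rarely exactly iid or exactly a random walk, so in practice the comparison is a matter of degree.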

Bob Geldof

13 May 2021, 14:48

Thank you. Forgive my complete lack of econometrics knowledge, but could you explain what you mean by 'algebraically the same'?
Also, I keep seeing iid vs. random walk, and I am not sure how to tell which describes my errors. I'm looking at how changes in beer tax affect changes in alcohol consumption across US states between 1970 and 2003.

Joro Kolev

13 May 2021, 15:48

It means that when we have more than two periods, they do not have to be the same, as here:

Code:

. webuse grunfeld, clear

. xtreg invest mvalue i.year, fe

Fixed-effects (within) regression               Number of obs     =        200
Group variable: company                         Number of groups  =         10

R-sq:                                           Obs per group:
     within  = 0.5027                                         min =         20
     between = 0.8572                                         avg =       20.0
     overall = 0.7662                                         max =         20

                                                F(20,170)         =       8.59
corr(u_i, Xb)  = -0.5686                        Prob > F          =     0.0000

------------------------------------------------------------------------------
      invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mvalue |   .1799679   .0206334     8.72   0.000     .1392372    .2206986
             |
        year |
       1936  |   -38.1056   37.04158    -1.03   0.305    -111.2263    35.01509
       1937  |  -66.31173   38.60245    -1.72   0.088    -142.5136    9.890143
       1938  |   -20.3273   36.35156    -0.56   0.577    -92.08588    51.43128
       1939  |  -59.30898   37.04449    -1.60   0.111    -132.4354    13.81745
       1940  |  -36.24558   37.29061    -0.97   0.332    -109.8579    37.36671
       1941  |  -1.500652   37.07777    -0.04   0.968     -74.6928     71.6915
       1942  |   18.71706   36.41329     0.51   0.608    -53.16338     90.5975
       1943  |  -6.731505   36.72012    -0.18   0.855    -79.21764    65.75463
       1944  |  -9.464895   36.83487    -0.26   0.798    -82.17755    63.24776
       1945  |  -26.44723   37.32046    -0.71   0.480    -100.1184    47.22398
       1946  |   -.739807   37.65736    -0.02   0.984    -75.07607    73.59646
       1947  |   34.82738   36.51991     0.95   0.342    -37.26352    106.9183
       1948  |   47.09646    36.4475     1.29   0.198    -24.85151    119.0444
       1949  |    28.6898   36.49548     0.79   0.433    -43.35289    100.7325
       1950  |   29.79853   36.66158     0.81   0.417    -42.57203    102.1691
       1951  |   36.74273   37.68058     0.98   0.331    -37.63936    111.1248
       1952  |   52.86058   37.95357     1.39   0.166    -22.06041    127.7816
       1953  |   64.20592   39.56961     1.62   0.107    -13.90515     142.317
       1954  |   69.57366   39.24665     1.77   0.078    -7.899875    147.0472
             |
       _cons |  -54.57607   29.48989    -1.85   0.066    -112.7896    3.637465
-------------+----------------------------------------------------------------
     sigma_u |  93.804227
     sigma_e |  81.028709
         rho |  .57268461   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(9, 170) = 17.64                     Prob > F = 0.0000

. xi: reg d.invest d.(mvalue i.year), nocons
i.year            _Iyear_1935-1954    (naturally coded; _Iyear_1935 omitted)

      Source |       SS           df       MS      Number of obs   =       190
-------------+----------------------------------   F(20, 170)      =      6.25
       Model |  256715.317        20  12835.7659   Prob > F        =    0.0000
    Residual |  348962.182       170  2052.71872   R-squared       =    0.4238
-------------+----------------------------------   Adj R-squared   =    0.3561
       Total |    605677.5       190  3187.77631   Root MSE        =    45.307

------------------------------------------------------------------------------
    D.invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mvalue |
         D1. |   .0806285   .0102636     7.86   0.000     .0603679    .1008892
             |
 _Iyear_1936 |
         D1. |  -1.141129    14.8276    -0.08   0.939    -30.41106     28.1288
             |
 _Iyear_1937 |
         D1. |  -2.255822   21.31535    -0.11   0.916    -44.33268    39.82104
             |
 _Iyear_1938 |
         D1. |   -6.45247     24.857    -0.26   0.795    -55.52061    42.61567
             |
 _Iyear_1939 |
         D1. |  -22.27696   28.90894    -0.77   0.442    -79.34368    34.78977
             |
 _Iyear_1940 |
         D1. |    6.12722   32.33459     0.19   0.850     -57.7018    69.95624
             |
 _Iyear_1941 |
         D1. |   36.29569    35.3112     1.03   0.305    -33.40921    106.0006
             |
 _Iyear_1942 |
         D1. |   35.94002   37.94825     0.95   0.345    -38.97045    110.8505
             |
 _Iyear_1943 |
         D1. |   21.84773   40.63119     0.54   0.591    -58.35892    102.0544
             |
 _Iyear_1944 |
         D1. |    22.3536   43.10747     0.52   0.605    -62.74127    107.4485
             |
 _Iyear_1945 |
         D1. |   16.53035   45.52401     0.36   0.717    -73.33481    106.3955
             |
 _Iyear_1946 |
         D1. |   48.58149   47.79077     1.02   0.311     -45.7583    142.9213
             |
 _Iyear_1947 |
         D1. |   56.66475   49.68252     1.14   0.256    -41.40938    154.7389
             |
 _Iyear_1948 |
         D1. |   65.92217   51.69446     1.28   0.204    -36.12357    167.9679
             |
 _Iyear_1949 |
         D1. |   49.55931   53.65124     0.92   0.357    -56.34913    155.4678
             |
 _Iyear_1950 |
         D1. |   56.57883   55.55838     1.02   0.310    -53.09434     166.252
             |
 _Iyear_1951 |
         D1. |    86.4733   57.53912     1.50   0.135    -27.10988    200.0565
             |
 _Iyear_1952 |
         D1. |   107.1904   59.33912     1.81   0.073    -9.946047    224.3268
             |
 _Iyear_1953 |
         D1. |    140.728   61.29765     2.30   0.023     19.72544    261.7306
             |
 _Iyear_1954 |
         D1. |   142.1382   62.89972     2.26   0.025     17.97308    266.3033
------------------------------------------------------------------------------

But when I keep only two periods, they become algebraically equivalent, as here:

Code:

. keep if year<1937
(180 observations deleted)

. xtreg invest mvalue i.year, fe

Fixed-effects (within) regression               Number of obs     =         20
Group variable: company                         Number of groups  =         10

R-sq:                                           Obs per group:
     within  = 0.4571                                         min =          2
     between = 0.7746                                         avg =        2.0
     overall = 0.7367                                         max =          2

                                                F(2,8)            =       3.37
corr(u_i, Xb)  = 0.6514                         Prob > F          =     0.0869

------------------------------------------------------------------------------
      invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mvalue |   .0442829   .0287062     1.54   0.161    -.0219137    .1104795
             |
        year |
       1936  |    12.3832   17.47315     0.71   0.499    -27.90995    52.67635
             |
       _cons |   41.41713   22.54005     1.84   0.103    -10.56032    93.39458
-------------+----------------------------------------------------------------
     sigma_u |  81.619228
     sigma_e |  30.920299
         rho |  .87449532   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(9, 8) = 7.55                        Prob > F = 0.0046

. xi: reg d.invest d.(mvalue i.year), nocons
i.year            _Iyear_1935-1936    (naturally coded; _Iyear_1935 omitted)

      Source |       SS           df       MS      Number of obs   =        10
-------------+----------------------------------   F(2, 8)         =      3.37
       Model |  12879.8533         2  6439.92665   Prob > F        =    0.0869
    Residual |  15297.0386         8  1912.12982   R-squared       =    0.4571
-------------+----------------------------------   Adj R-squared   =    0.3214
       Total |  28176.8919        10  2817.68919   Root MSE        =    43.728

------------------------------------------------------------------------------
    D.invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mvalue |
         D1. |   .0442829   .0287062     1.54   0.161    -.0219137    .1104795
             |
 _Iyear_1936 |
         D1. |    12.3832   17.47315     0.71   0.499    -27.90995    52.67635
------------------------------------------------------------------------------

.

Bob Geldof

13 May 2021, 15:53

Thank you - I have managed to get identical results for T=2 now. Why do the two methods yield different results when T>2, and how do I know which one to use, i.e., whether the errors are iid or a random walk?

Last edited by Bob Geldof; 13 May 2021, 16:15.
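The second question is not answered in the thread. One common check, and the idea behind Wooldridge's test for serial correlation in panel data (available in Stata through the user-written `xtserial` command), is to look at the residuals of the first-differenced regression. If the level errors are iid, the differenced residuals should have a lag-1 coefficient of about -0.5; if the level errors are a random walk, the differenced errors are the iid shocks and the coefficient should be near 0. The plain-Python sketch below applies this to simulated panels (49 units by 34 periods to mirror the 1970-2003 state panel; coefficients and seed are arbitrary assumptions). It reports only point estimates, not a formal test.

```python
import random

def simulate(error_kind, rng, n_units=49, n_periods=34, beta=-0.8):
    """Hypothetical panel of (x, y) pairs: y = alpha_i + beta*x + e, e iid or a random walk."""
    panel = []
    for _ in range(n_units):
        alpha = rng.gauss(0, 1)
        e, rows = 0.0, []
        for _ in range(n_periods):
            u = rng.gauss(0, 1)
            e = u if error_kind == "iid" else e + u  # random walk accumulates shocks
            x = rng.gauss(0, 1) + alpha
            rows.append((x, alpha + beta * x + e))
        panel.append(rows)
    return panel

def fd_residual_ar1(panel):
    """Regress FD residuals on their own within-unit lag and return the slope."""
    # 1. First-difference each unit's (x, y) series
    diffs = [[(r[t][0] - r[t - 1][0], r[t][1] - r[t - 1][1]) for t in range(1, len(r))]
             for r in panel]
    # 2. FD slope (pooled OLS through the origin) and its residuals, unit by unit
    b = (sum(dx * dy for d in diffs for dx, dy in d)
         / sum(dx * dx for d in diffs for dx, _ in d))
    resid = [[dy - b * dx for dx, dy in d] for d in diffs]
    # 3. Pooled OLS (with intercept) of residual_t on residual_{t-1}, lags within units only
    cur = [v[t] for v in resid for t in range(1, len(v))]
    prev = [v[t - 1] for v in resid for t in range(1, len(v))]
    mc, mp = sum(cur) / len(cur), sum(prev) / len(prev)
    return (sum((c - mc) * (p - mp) for c, p in zip(cur, prev))
            / sum((p - mp) ** 2 for p in prev))

rng = random.Random(7)
rho_iid = fd_residual_ar1(simulate("iid", rng))
rho_rw = fd_residual_ar1(simulate("random walk", rng))
print(f"iid errors: {rho_iid:.3f}   random-walk errors: {rho_rw:.3f}")
```

A value near -0.5 is consistent with iid errors (favouring FE on efficiency grounds), a value near 0 with random-walk errors (favouring FD); values in between suggest neither assumption holds exactly, in which case cluster-robust standard errors matter more than the choice of estimator.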
Nick Cox

14 May 2021, 02:55

If your name really is Bob Geldof, well and good. Otherwise please note our request for use of a real name. https://www.statalist.org/forums/help#realnames and #3 at https://www.statalist.org/forums/help#adviceextras