Interpreting margins in a two-way fixed effects model

Tom Kemeny

Join Date: Apr 2016
Posts: 52


06 Nov 2018, 08:43

Time-varying effects in a fixed effects panel model

Hello Statalist,

I have a relatively balanced panel of about 700 locations, observed on a decadal basis between 1940 and 2010 (T=8). My baseline specification is a two-way fixed effects model - something like:

Code:

xtreg y x z i.year, fe

where y is the log of a continuous variable; x, which is my key independent variable, is a fraction ranging in practice between 0 and about 0.65; and z is a vector of controls.

However, my theory leads me to expect the effect of x on y to vary by year - or at least by some periods within my overall panel. Motivated by this idea, I have tried versions where I run the model above separately for years within each distinct period. Confirming the intuition given by theory, this tells me that x is not significant for period 1, and is positive and significant in period 2. So, below are results for period 1, followed by those for period 2:

Code:

. eststo l1: xtreg y x z i.year if year<1990, fe  // convergence

Fixed-effects (within) regression               Number of obs     =      2,665
Group variable: czone                           Number of groups  =        687

R-sq:                                           Obs per group:
     within  = 0.9482                                         min =          1
     between = 0.1759                                         avg =        3.9
     overall = 0.7102                                         max =          5

                                                F(6,1972)         =    6016.99
corr(u_i, Xb)  = -0.1600                        Prob > F          =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .0578142   .1297792     0.45   0.656    -.1967045     .312333
           z |    -.02596   .0120427    -2.16   0.031    -.0495778   -.0023422
             |
        year |
       1950  |   .4027728   .0064565    62.38   0.000     .3901105    .4154351
       1960  |   .6947271   .0073666    94.31   0.000       .68028    .7091743
       1970  |   .9610166    .008406   114.33   0.000     .9445311    .9775021
       1980  |   .9730421   .0096244   101.10   0.000      .954167    .9919173
             |
       _cons |   2.155035   .0719909    29.93   0.000     2.013849    2.296221
-------------+----------------------------------------------------------------
     sigma_u |  .19346937
     sigma_e |  .09002546
         rho |  .82201413   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(686, 1972) = 7.69                   Prob > F = 0.0000

Code:

. eststo l2: xtreg y x z i.year if year>1970, fe  // divergence

Fixed-effects (within) regression               Number of obs     =      2,478
Group variable: czone                           Number of groups  =        696

R-sq:                                           Obs per group:
     within  = 0.7363                                         min =          1
     between = 0.5035                                         avg =        3.6
     overall = 0.5922                                         max =          4

                                                F(5,1777)         =     992.10
corr(u_i, Xb)  = -0.1616                        Prob > F          =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .1704771    .060941     2.80   0.005     .0509535    .2900006
           z |   .0579945   .0093522     6.20   0.000     .0396521    .0763369
             |
        year |
       1990  |  -.0151207   .0030884    -4.90   0.000     -.021178   -.0090634
       2000  |   .0730655    .003749    19.49   0.000     .0657125    .0804185
       2010  |   .1147842   .0047111    24.36   0.000     .1055443     .124024
             |
       _cons |    2.54693   .0568694    44.79   0.000     2.435393    2.658468
-------------+----------------------------------------------------------------
     sigma_u |  .08922207
     sigma_e |  .04943455
         rho |  .76511993   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(695, 1777) = 9.39                   Prob > F = 0.0000

Here is my question. Another way of thinking about this is that the slope of y on x depends on the year, and that slopes may not be constant within 'periods' (my mental picture is something like a sine wave across time). Given that, I was wondering if it makes sense to run something like

Code:

xtreg y i.year##c.x z, fe

After running a regression like this, I have tried to interpret the resulting output using margins, as follows:

Code:

margins year, at(x=(0(0.1)0.7))
marginsplot, xdimensions(at(x)) recast(line) recastci(rarea)

this produced the following:

[Image: marginsplot of predicted y by year across values of x (Screen Shot 2018-11-06 at 3.32.06 PM.png)]

Based on my naive interpretation, the results are the opposite of what I got when I did the 'period' based regressions - i.e. the coefficients for years in period 1 are positive and significant, and the coefficients for years that fall into period 2 are negative and largely insignificant.

However - it may be that I am misinterpreting these results...more generally I am not quite clear on whether to interpret margins in a FE model differently from what one would do with a non-panel OLS model.

Another possibility is that this interaction-based approach is not sensible given the questions I am chasing down.

I would welcome the collective wisdom of the list!

Thank you
Tom


Clyde Schechter

Join Date: Apr 2014

Posts: 30121
#2

06 Nov 2018, 10:08

Well, there are several differences between your original separate period models and your interaction approach.

First, the two period models cover overlapping periods of time. The first period is pre-1990, but the second period is post-1970. So the 1980 data is counted in both regressions, clouding their separate interpretations.

Next, your separate period models allow for different coefficients of z (and of the common year indicator for 1980, which is absorbed in the constant term in the second period model), whereas in the interaction model these are constrained to be the same over time. These differences may account for the problem.

So to do this more cleanly, I recommend you first re-run your two period models with non-overlapping periods: say year <1980 for one and year >= 1980 for the other. Then re-run the interaction model without constraining the coefficients of z:

Code:

xtreg y i.year##(c.x ZZ), fe

where ZZ is the list of covariates that you have called z, but with each variable prefixed by either i. or c. as appropriate. If all of the covariates in ZZ are continuous, you can simplify the code to -xtreg y i.year##c.(x ZZ), fe-.

The results you get will not be the same as either of the results you got before, but I'm confident that they will be consistent with each other.
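
A hedged sketch of one way to read the year-specific slopes of x directly from the fully interacted model, instead of inferring them from plotted predictions. It assumes the data are already -xtset- and that z is a single continuous control, as in this thread; with more covariates, substitute your own list.

Code:

```stata
* Sketch only: assumes the panel is already xtset and z is one continuous control
xtreg y i.year##c.(x z), fe

* Slope of y with respect to x in each year (base-year coefficient plus interaction)
margins year, dydx(x)
marginsplot, recast(line) recastci(rarea)

* Joint test that the slope of x is the same in every year
testparm i.year#c.x
```

With -dydx(x)-, each plotted point is a slope rather than a predicted level, so a flat profile across years would mean the effect of x is not changing.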

Tom Kemeny

Join Date: Apr 2016

Posts: 52
#3

06 Nov 2018, 13:34

Thank you Clyde - that is very helpful. I have followed your advice. However, I remain confused by the interacted results and the margins plot.

When I split into two non-overlapping periods - pre-1980 vs 1980 and beyond, and run separate FE models for each, I get results that are strongly comparable to what I had when my periods overlapped. In period 1, I find no significant relationship between x and y. In period 2, across a few different measures of y, I find a consistently positive and significant relationship. As I mentioned in my last post, this is precisely what theory tells me to expect.
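
For reference, the two-period comparison can also be sketched as a single pooled model with a period indicator, which gives a direct test of whether the slope of x differs between periods. The variable name period2 is made up for this illustration. Note that this is not identical to the split-sample models: here each location keeps one fixed effect across the whole panel, whereas separate regressions implicitly allow a different fixed effect in each period.

Code:

```stata
* Hypothetical indicator: 0 = before 1980, 1 = 1980 and later
generate byte period2 = (year >= 1980) if !missing(year)

* period2 is a linear combination of the year dummies, so Stata will
* omit one collinear term automatically; the slopes are unaffected
xtreg y i.period2##c.(x z) i.year, fe

* Slope of x in the later period, and the test of the difference in slopes
lincom x + 1.period2#c.x
test 1.period2#c.x
```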

When I implement your suggestion, ie:

Code:

xtreg y i.year##c.(x z), fe

I get something that looks a lot like what I had before (note that, at present I have just one control variable z, which is continuous). Specifically, I get this:

As I interpret this figure, it suggests that the relationship between x and y is much more strongly positive in period 1 years (1940, 1950 and 1960 in particular), and that thereafter the relationship becomes much less positive and, judging by the CIs, perhaps not materially different from one year to the next - even for years that cross the divide between periods 1 and 2 in the earlier, non-interacted models. In short, this seems inconsistent with the 2-period results.

Am I misinterpreting the figure? Or is there some other angle to this mystery?

Many thanks
Tom

Clyde Schechter

Join Date: Apr 2014

Posts: 30121
#4

06 Nov 2018, 13:52

Well, I am puzzled as well. If the data in the early decades is sparse, I can easily see how the first-period slope for x from the pooled period data could be close to zero, with the 1970 and 1980 findings dominating. But I cannot see how the second period slope comes out to be so high.
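
A quick way to check whether the early decades really are sparse, sketched under the assumption that the panel identifier is c (as in the -dataex- example later in the thread; the regression output calls it czone):

Code:

```stata
* Assumes panel id c and decadal years; adjust names to your data
xtset c year, delta(10)
xtdescribe              // participation patterns across the 8 waves
tabulate year           // number of locations observed in each decade
```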

I think you need to post the full commands and outputs you got (including the -margins- command and output) and probably a sample of your data too. Please use the -dataex- command to show the example data. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

Tom Kemeny

Join Date: Apr 2016
Posts: 52

07 Nov 2018, 04:32

Thanks again Clyde. I have followed your suggestions - so below is some example data generated using -dataex-, as well as my syntax and results.

Example data first:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float year double c float(x z y)
1940  100  .06741573  6.685861 1.9170296
1950  100   .0861244  6.951772 2.3703396
1960  100   .0881201  7.334329  2.677568
1970  100  .09556314  7.759614  2.860222
1980  100  .11955816  8.032035 2.9198496
1990  100  .13637497  8.296796  2.908269
2000  100   .1610404  8.598036  2.989298
2010  100    .175543  8.813141  3.062129
1950  200  .08461539  4.867535  2.392718
1970  200  .10596026   5.01728  2.755263
1980  200  .13265306  5.278115  2.799294
1990  200  .17021276  5.641907  2.907918
2000  200  .21355236  6.188264  2.941689
2010  200  .29403973  6.626718  2.965321
1970  301  .05405406 4.7095304  2.772009
1980  301  .05645161  4.820282  2.895877
1990  301  .08029197  4.919981 2.8908455
2000  301  .09933775   5.01728 2.8938975
2010  301  .11392405  5.062595   2.88582
1940  302  .08575287  7.300473 2.1107717
1950  302  .11022727  7.473069 2.5045936
1960  302  .15817223  7.730175  2.753011
1970  302  .17898424  7.956827  2.918792
1980  302  .19704816  8.258941  3.011214
1990  302   .2188518  8.499436  2.991829
2000  302  .24691544   8.80177 3.1126254
2010  302   .2882904  9.052516 3.1957016
1940  401  .08333334  5.780744 1.9672023
1950  401  .10263158  5.940171 2.4176354
1960  401  .13468635  6.295266  2.674937
1970  401   .1511216  6.741701  2.912432
1980  401  .13893016  7.204893  2.918633
1990  401   .1861999  7.586296  3.036151
2000  401   .2397347  8.060225 3.1441514
2010  401  .27971548  8.379768 3.1610115
1940  402  .10638298 3.8501475 1.6669267
1950  402  .13114753 4.1108737  2.223724
1960  402      .1625 4.3820267  2.556472
1970  402   .2264151 4.6634393    2.9205
1980  402   .2440476  5.123964 2.8196075
1990  402   .2670157  5.252274  2.864244
2000  402  .27391306  5.438079  2.968674
2010  402  .29473683  5.652489  2.984576
1940  500  .03829029  7.023759 1.9198444
1950  500  .04777871  7.084227 2.3226702
1960  500  .07063712  7.275172 2.6056755
1970  500  .10523614  7.574558  2.898303
1980  500  .11748252  7.958577  2.880606
1990  500  .14220917  8.228977  2.962361
2000  500   .1911237  8.574518 3.1080136
2010  500   .2372533 8.8950815 3.1418295
1950  601  .05882353 3.9318256 2.2938592
1970  601      .0875 4.3820267  2.723534
1980  601         .1 4.6051702  2.668441
1990  601  .11678832  4.919981  2.808734
2000  601  .11764706  5.231109  2.941665
2010  601   .1322314  5.488938  2.980087
1940  602   .0967742 4.5325994  1.809495
1960  602       .144  4.828314  2.492147
1990  602  .14361702  5.236442 2.7829804
2000  602  .15929204  5.420535 2.9473696
2010  602  .14606741  5.587249  2.948902
1950  700  .05537459  5.726848 2.2551136
1960  700  .08093995  5.948035  2.464019
1970  700  .13383459  6.499787 2.7903116
1980  700   .1575837  7.328437 2.8692195
1990  700    .188929  7.821242   2.92427
2000  700  .22406967  8.239858  3.076594
2010  700  .24078445  8.613775  3.059695
1940  800  .04918033  5.497168  1.889386
1950  800  .06015038  5.583496  2.260122
1960  800  .07180851  5.929589  2.488825
1970  800  .11804385  6.385194  2.787262
1980  800  .13719185  6.838405  2.810167
1990  800  .15742972  7.126891  2.908849
2000  800  .17667045  7.476472 3.0514855
2010  800  .20683686  7.745436  3.112535
1940  900  .05017606  7.035269 1.9262397
1950  900  .07028754  7.132498 2.3493614
1960  900  .09575836  7.349874 2.6217885
1970  900  .13293742    7.6912   2.92802
1980  900  .14598091  8.085486 2.9289076
1990  900   .1774436  8.445697 3.0587904
2000  900    .213799  8.859932 3.2504575
2010  900   .2612748 9.2081375   3.31174
1970 1001  .04918033 4.1108737  2.816636
1980 1001  .06666667  4.317488    2.7184
1990 1001  .12871288 4.6151204  2.817173
2000 1001         .2  4.905275 2.9277446
2010 1001  .24285714  5.347107 2.9823964
1960 1002  .05084746 4.0775375 2.4529595
1970 1002  .04395605 4.5108595   2.81544
1980 1002   .0620155  4.859812  2.741524
1990 1002  .12209302  5.147494  2.853711
2000 1002  .18503937  5.537334  2.965409
2010 1002  .23607427  5.932245  2.981001
1940 1100  .02857143  4.941642  1.842884
1950 1100  .04635762   5.01728   2.27514
1960 1100 .064327486  5.141664  2.507516
1970 1100   .0986547  5.407172  2.772718
end

Now my syntax. First, the period-specific regressions

Code:

xtreg y x z i.year if year<1980, fe

Which generates the following results (on the full data):

Code:

Fixed-effects (within) regression               Number of obs     =      2,065
Group variable: c                               Number of groups  =        669

R-sq:                                           Obs per group:
     within  = 0.9495                                         min =          1
     between = 0.1839                                         avg =        3.1
     overall = 0.6728                                         max =          4

                                                F(5,1391)         =    5235.60
corr(u_i, Xb)  = -0.1546                        Prob > F          =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.0088623   .1719888    -0.05   0.959    -.3462477    .3285231
           z |  -.0256077   .0204123    -1.25   0.210    -.0656499    .0144344
             |
        year |
       1950  |   .4070574   .0067419    60.38   0.000     .3938321    .4202827
       1960  |   .6997802   .0080708    86.71   0.000      .683948    .7156124
       1970  |    .965986   .0097604    98.97   0.000     .9468393    .9851327
             |
       _cons |   2.158198   .1228108    17.57   0.000     1.917283    2.399112
-------------+----------------------------------------------------------------
     sigma_u |   .2111395
     sigma_e |  .09087989
         rho |   .8436922   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(668, 1391) = 7.38                   Prob > F = 0.0000

And then, for period 2:

Code:

xtreg y x z i.year if year>=1980, fe

Fixed-effects (within) regression               Number of obs     =      2,478
Group variable: czone                           Number of groups  =        696

R-sq:                                           Obs per group:
     within  = 0.7363                                         min =          1
     between = 0.5035                                         avg =        3.6
     overall = 0.5922                                         max =          4

                                                F(5,1777)         =     992.10
corr(u_i, Xb)  = -0.1616                        Prob > F          =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .1704771    .060941     2.80   0.005     .0509535    .2900006
           z |   .0579945   .0093522     6.20   0.000     .0396521    .0763369
             |
        year |
       1990  |  -.0151207   .0030884    -4.90   0.000     -.021178   -.0090634
       2000  |   .0730655    .003749    19.49   0.000     .0657125    .0804185
       2010  |   .1147842   .0047111    24.36   0.000     .1055443     .124024
             |
       _cons |    2.54693   .0568694    44.79   0.000     2.435393    2.658468
-------------+----------------------------------------------------------------
     sigma_u |  .08922207
     sigma_e |  .04943455
         rho |  .76511993   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(695, 1777) = 9.39                   Prob > F = 0.0000

And now the interacted version:

Code:

xtreg y i.year##c.(x z), fe

Fixed-effects (within) regression               Number of obs     =      4,543
Group variable: czone                           Number of groups  =        710

R-sq:                                           Obs per group:
     within  = 0.9467                                         min =          1
     between = 0.5654                                         avg =        6.4
     overall = 0.8827                                         max =          8

                                                F(23,3810)        =    2940.16
corr(u_i, Xb)  = -0.0017                        Prob > F          =     0.0000

--------------------------------------------------------------------------------
             y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
          year |
         1950  |   .6190264   .0252093    24.56   0.000     .5696013    .6684515
         1960  |   .8527394   .0253965    33.58   0.000     .8029474    .9025314
         1970  |   1.277541   .0249469    51.21   0.000     1.228631    1.326452
         1980  |   1.311035   .0247835    52.90   0.000     1.262444    1.359625
         1990  |   1.207576   .0249038    48.49   0.000      1.15875    1.256402
         2000  |   1.302313    .024751    52.62   0.000     1.253786     1.35084
         2010  |   1.394699   .0251122    55.54   0.000     1.345464    1.443933
               |
             x |   1.622354   .1434178    11.31   0.000     1.341171    1.903537
             z |   .0906117   .0074765    12.12   0.000     .0759533      .10527
               |
      year#c.x |
         1950  |  -.4207889   .1534314    -2.74   0.006    -.7216046   -.1199732
         1960  |   -.779238   .1523557    -5.11   0.000    -1.077945   -.4805314
         1970  |  -1.389221   .1507515    -9.22   0.000    -1.684782   -1.093659
         1980  |  -1.429912   .1579517    -9.05   0.000     -1.73959   -1.120234
         1990  |  -1.336558   .1590563    -8.40   0.000    -1.648402   -1.024714
         2000  |  -1.356769   .1596698    -8.50   0.000    -1.669816   -1.043723
         2010  |  -1.527198   .1564801    -9.76   0.000    -1.833991   -1.220405
               |
      year#c.z |
         1950  |  -.0328047   .0034376    -9.54   0.000    -.0395444   -.0260651
         1960  |  -.0238643   .0034802    -6.86   0.000    -.0306875   -.0170411
         1970  |  -.0418693   .0034514   -12.13   0.000     -.048636   -.0351027
         1980  |  -.0472446    .003413   -13.84   0.000     -.053936   -.0405532
         1990  |  -.0355467   .0034277   -10.37   0.000     -.042267   -.0288264
         2000  |  -.0362767   .0034374   -10.55   0.000     -.043016   -.0295374
         2010  |  -.0393795   .0034952   -11.27   0.000    -.0462321   -.0325268
               |
         _cons |   1.328532   .0474822    27.98   0.000     1.235439    1.421625
---------------+----------------------------------------------------------------
       sigma_u |  .10365221
       sigma_e |  .08473408
           rho |  .59941938   (fraction of variance due to u_i)
--------------------------------------------------------------------------------
F test that all u_i=0: F(709, 3810) = 8.29                   Prob > F = 0.0000

Interpreting with margins:

Code:

margins year, at(x=(0(0.1)0.7))

Predictive margins                              Number of obs     =      4,543
Model VCE    : Conventional

Expression   : Linear prediction, predict()

1._at        : x        =           0

2._at        : x         =          .1

3._at        : x         =          .2

4._at        : x         =          .3

5._at        : x         =          .4

6._at        : x         =          .5

7._at        : x         =          .6

8._at        : x         =          .7

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _at#year |
     1 1940  |   1.934264   .0098973   195.43   0.000     1.914865    1.953662
     1 1950  |   2.333993   .0088005   265.21   0.000     2.316744    2.351242
     1 1960  |   2.627472   .0099051   265.26   0.000     2.608058    2.646886
     1 1970  |   2.931912    .010189   287.75   0.000     2.911942    2.951882
     1 1980  |   2.929472    .011318   258.83   0.000     2.907289    2.951655
     1 1990  |   2.904213   .0119668   242.69   0.000     2.880759    2.927668
     1 2000  |    2.99407   .0123508   242.42   0.000     2.969863    3.018277
     1 2010  |   3.065714   .0112632   272.19   0.000     3.043639    3.087789
     2 1940  |   2.096499   .0087692   239.08   0.000     2.079312    2.113686
     2 1950  |    2.45415   .0057132   429.56   0.000     2.442952    2.465347
     2 1960  |   2.711784   .0045029   602.23   0.000     2.702958    2.720609
     2 1970  |   2.955225    .003842   769.18   0.000     2.947695    2.962756
     2 1980  |   2.948716   .0035193   837.88   0.000     2.941819    2.955614
     2 1990  |   2.932793   .0038687   758.09   0.000     2.925211    2.940375
     2 2000  |   3.020628   .0046063   655.76   0.000       3.0116    3.029656
     2 2010  |    3.07523   .0054106   568.37   0.000     3.064625    3.085834
     3 1940  |   2.258734   .0216151   104.50   0.000      2.21637    2.301099
     3 1950  |   2.574306   .0137527   187.19   0.000     2.547351    2.601261
     3 1960  |   2.796095   .0116689   239.62   0.000     2.773225    2.818966
     3 1970  |   2.978539   .0098353   302.84   0.000     2.959262    2.997816
     3 1980  |    2.96796   .0107978   274.87   0.000     2.946797    2.989124
     3 1990  |   2.961373   .0097648   303.27   0.000     2.942234    2.980511
     3 2000  |   3.047187   .0086352   352.88   0.000     3.030262    3.064111
     3 2010  |   3.084745   .0069398   444.50   0.000     3.071143    3.098347
     4 1940  |    2.42097   .0356216    67.96   0.000     2.351153    2.490787
     4 1950  |   2.694463   .0233866   115.21   0.000     2.648626    2.740299
     4 1960  |   2.880407   .0211134   136.43   0.000     2.839025    2.921788
     4 1970  |   3.001852   .0187011   160.52   0.000     2.965199    3.038506
     4 1980  |   2.987205   .0209933   142.29   0.000     2.946058    3.028351
     4 1990  |   2.989952    .019605   152.51   0.000     2.951527    3.028377
     4 2000  |   3.073745   .0176802   173.85   0.000     3.039093    3.108398
     4 2010  |   3.094261   .0135469   228.41   0.000     3.067709    3.120812
     5 1940  |   2.583205   .0498193    51.85   0.000     2.485561    2.680849
     5 1950  |   2.814619   .0332571    84.63   0.000     2.749436    2.879802
     5 1960  |   2.964718   .0308076    96.23   0.000     2.904337      3.0251
     5 1970  |   3.025165   .0278165   108.75   0.000     2.970646    3.079685
     5 1980  |   3.006449   .0313809    95.80   0.000     2.944943    3.067954
     5 1990  |   3.018532   .0296982   101.64   0.000     2.960324    3.076739
     5 2000  |   3.100304   .0271161   114.33   0.000     3.047157     3.15345
     5 2010  |   3.103776   .0208649   148.76   0.000     3.062882    3.144671
     6 1940  |   2.745441   .0640811    42.84   0.000     2.619844    2.871037
     6 1950  |   2.934776   .0432023    67.93   0.000     2.850101     3.01945
     6 1960  |    3.04903    .040573    75.15   0.000     2.969508    3.128552
     6 1970  |   3.048479   .0369974    82.40   0.000     2.975965    3.120992
     6 1980  |   3.025693   .0418178    72.35   0.000     2.943732    3.107654
     6 1990  |   3.047111   .0398527    76.46   0.000     2.969002    3.125221
     6 2000  |   3.126862   .0366423    85.33   0.000     3.055044    3.198679
     6 2010  |   3.113292   .0283486   109.82   0.000      3.05773    3.168854
     7 1940  |   2.907676   .0783722    37.10   0.000     2.754069    3.061283
     7 1950  |   3.054932   .0531803    57.44   0.000     2.950701    3.159164
     7 1960  |   3.133342   .0503681    62.21   0.000     3.034622    3.232061
     7 1970  |   3.071792   .0462048    66.48   0.000     2.981232    3.162352
     7 1980  |   3.044937   .0522744    58.25   0.000     2.942481    3.147393
     7 1990  |   3.075691   .0500311    61.48   0.000     2.977632     3.17375
     7 2000  |    3.15342   .0462028    68.25   0.000     3.062865    3.243976
     7 2010  |   3.122808   .0358944    87.00   0.000     3.052456    3.193159
     8 1940  |   3.069911   .0926789    33.12   0.000     2.888264    3.251559
     8 1950  |   3.175089   .0631755    50.26   0.000     3.051267     3.29891
     8 1960  |   3.217653   .0601785    53.47   0.000     3.099706    3.335601
     8 1970  |   3.095105   .0554255    55.84   0.000     2.986473    3.203737
     8 1980  |   3.064181   .0627408    48.84   0.000     2.941211    3.187151
     8 1990  |   3.104271   .0602213    51.55   0.000     2.986239    3.222302
     8 2000  |   3.179979   .0557801    57.01   0.000     3.070652    3.289306
     8 2010  |   3.132323   .0434701    72.06   0.000     3.047123    3.217523
------------------------------------------------------------------------------

and this generated the marginsplot I included last time around.

Looking forward to hearing from you.
Tom


Clyde Schechter

Join Date: Apr 2014

Posts: 30121
#6

07 Nov 2018, 08:47

Aha! You have done nothing wrong and these results are all correct. I failed to recognize Simpson's paradox (also known as Lord's paradox in the context of continuous variables).

If you run -graph matrix x year y- you can readily see (and if you want to be quantitative, run -corr x year y-) that year is very strongly correlated with both x and y. That is, both x and y exhibit very strong increasing time trends. This is exactly the setup for a Simpson's paradox to occur. The result is that when stratifying by year (the interaction model) you get results that are very different from, even in the opposite direction from, the results that occur when the years are lumped together. (In your case, you don't completely lump all the years together, but you still have two lumps, and within each lump you are seeing Simpson's paradox in action.)

The Wikipedia page on Simpson's paradox is very well written and explains it better than I can.
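
To make the mechanism concrete, here is a small simulated illustration (not Tom's data; all numbers are invented for the sketch). Both x and y trend upward over the decades, while within any single year the slope of y on x is set to -0.5.

Code:

```stata
* Simulated data only; running this clears the dataset in memory
clear
set seed 12345
set obs 800
generate year = 1940 + 10*mod(_n - 1, 8)
generate t = (year - 1940)/10

* trend is the year-specific mean of x; e is the within-year deviation
generate trend = 0.05 + 0.03*t
generate e = rnormal(0, 0.02)
generate x = trend + e
generate y = 2 + 0.2*t - 0.5*e + rnormal(0, 0.02)

regress y x                 // pooled: slope is positive, driven by the shared trend
regress y c.x##i.year       // stratified by year
margins year, dydx(x)       // within-year slopes scatter around -0.5
```

The pooled and within-year slopes have opposite signs by construction, even though nothing is wrong with either regression.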

To see even better how this plays out in your specific data, run this:

Code:

graph twoway scatter y x, name(overall, replace) title(overall)
local graphs overall
levelsof year, local(years)
foreach y of local years {
    graph twoway scatter y x if year == `y', name(y`y', replace) title(`y')
    local graphs `graphs' y`y'
}
graph combine `graphs', altshrink xcommon ycommon

Last edited by Clyde Schechter; 07 Nov 2018, 08:56.

Clyde Schechter

Join Date: Apr 2014
Posts: 30121

07 Nov 2018, 13:53

In fact, to make it much clearer, do this:

Code:

label define later 0 "E" 1 "L"
gen byte later:later = (year >= 1980)
graph twoway (scatter y x, name(overall, replace) title(overall) ///
    msym(i) mlab(later) mlabpos(0)) (lfit y x if later == 0) ///
    (lfit y x if later == 1)
local graphs overall

levelsof year, local(years)
foreach y of local years {
    graph twoway (scatter y x if year == `y') (lfit y x if year == `y'), ///
    name(y`y', replace) title(`y')
    local graphs `graphs' y`y'
}

graph combine `graphs', altshrink xcommon ycommon

regress y x z if later
regress y x z if !later
regress y c.(x z)##i.year
margins year, dydx(x)

This is the same as I recommended in #6, except that in the "overall" graph the Early (before 1980) and Late (1980 and after) eras are distinguished by having markers E and L, respectively, and in the other graphs, the linear fit between x and y is overlaid on the graph.


Tom Kemeny

Join Date: Apr 2016
Posts: 52

08 Nov 2018, 06:48

Dear Clyde,

Attached is the graph your code generated on the full data, as well as results.

[Image: combined scatter plots of y against x, overall and by year, with linear fits (Screen Shot 2018-11-08 at 1.32.09 PM.png)]

Code:

. regress y x z if later

      Source |       SS           df       MS      Number of obs   =     2,478
-------------+----------------------------------   F(2, 2475)      =   1362.13
       Model |  27.7021638         2  13.8510819   Prob > F        =    0.0000
    Residual |   25.167541     2,475  .010168703   R-squared       =    0.5240
-------------+----------------------------------   Adj R-squared   =    0.5236
       Total |  52.8697048     2,477  .021344249   Root MSE        =    .10084

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .7934159   .0325258    24.39   0.000     .7296352    .8571966
           z |   .0415972   .0012296    33.83   0.000      .039186    .0440084
       _cons |   2.619379    .008193   319.71   0.000     2.603314    2.635445
------------------------------------------------------------------------------

. regress y x z if !later

      Source |       SS           df       MS      Number of obs   =     2,065
-------------+----------------------------------   F(2, 2062)      =    269.66
       Model |  63.3936511         2  31.6968256   Prob > F        =    0.0000
    Residual |  242.373491     2,062  .117542915   R-squared       =    0.2073
-------------+----------------------------------   Adj R-squared   =    0.2066
       Total |  305.767142     2,064  .148142995   Root MSE        =    .34285

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   2.933526   .1563879    18.76   0.000     2.626831     3.24022
           z |   .0513899   .0044578    11.53   0.000     .0426477    .0601321
       _cons |   1.980069   .0310801    63.71   0.000     1.919118    2.041021
------------------------------------------------------------------------------

. regress y c.(x z)##i.year

      Source |       SS           df       MS      Number of obs   =     4,543
-------------+----------------------------------   F(23, 4519)     =   1504.28
       Model |  532.715639        23  23.1615495   Prob > F        =    0.0000
    Residual |  69.5794344     4,519  .015397087   R-squared       =    0.8845
-------------+----------------------------------   Adj R-squared   =    0.8839
       Total |  602.295073     4,542  .132605696   Root MSE        =    .12408

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.635393   .1773391     9.22   0.000     1.287722    1.983064
           z |    .089429   .0036682    24.38   0.000     .0822375    .0966205
             |
        year |
       1950  |   .6332566   .0349436    18.12   0.000     .5647501    .7017632
       1960  |   .8479927   .0349712    24.25   0.000     .7794319    .9165534
       1970  |   1.293101   .0344226    37.57   0.000     1.225616    1.360586
       1980  |   1.327345   .0340822    38.95   0.000     1.260527    1.394163
       1990  |   1.206989   .0343907    35.10   0.000     1.139567    1.274412
       2000  |   1.312604   .0337922    38.84   0.000     1.246355    1.378853
       2010  |   1.400795   .0341022    41.08   0.000     1.333938    1.467652
             |
    year#c.x |
       1950  |  -.3350924   .2119039    -1.58   0.114    -.7505277     .080343
       1960  |  -.6419666   .2092869    -3.07   0.002    -1.052271   -.2316619
       1970  |  -1.293619   .2060644    -6.28   0.000    -1.697606   -.8896316
       1980  |  -1.169162   .2093957    -5.58   0.000     -1.57968   -.7586441
       1990  |  -1.058502   .2064459    -5.13   0.000    -1.463237   -.6537669
       2000  |  -1.064991   .1990275    -5.35   0.000    -1.455182   -.6747997
       2010  |  -1.195604   .1899883    -6.29   0.000    -1.568074   -.8231339
             |
    year#c.z |
       1950  |  -.0361109    .004817    -7.50   0.000    -.0455546   -.0266672
       1960  |  -.0252495   .0048469    -5.21   0.000    -.0347518   -.0157473
       1970  |  -.0456533   .0048288    -9.45   0.000    -.0551202   -.0361865
       1980  |  -.0533753   .0047644   -11.20   0.000    -.0627158   -.0440347
       1990  |  -.0401906   .0047938    -8.38   0.000    -.0495888   -.0307925
       2000  |  -.0433738   .0047243    -9.18   0.000    -.0526356   -.0341119
       2010  |   -.048157   .0047634   -10.11   0.000    -.0574957   -.0388183
             |
       _cons |   1.334145   .0271769    49.09   0.000     1.280865    1.387425
------------------------------------------------------------------------------

. margins year, dydx(x)

Average marginal effects                        Number of obs     =      4,543
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : x

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x            |
        year |
       1940  |   1.635393   .1773391     9.22   0.000     1.287722    1.983064
       1950  |   1.300301   .1159919    11.21   0.000       1.0729    1.527701
       1960  |   .9934264   .1111389     8.94   0.000     .7755398    1.211313
       1970  |   .3417745   .1049446     3.26   0.001     .1360318    .5475172
       1980  |    .466231   .1113436     4.19   0.000     .2479432    .6845188
       1990  |   .5768912   .1056918     5.46   0.000     .3696836    .7840988
       2000  |    .570402   .0903482     6.31   0.000     .3932753    .7475287
       2010  |   .4397891   .0681645     6.45   0.000     .3061532    .5734249
------------------------------------------------------------------------------
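
Each dy/dx above is the 1940 slope on x plus that year's year#c.x coefficient from the regression table, e.g. 1.635393 + (-.3350924) = 1.300301 for 1950. A minimal sketch of checking this by hand, assuming the interacted -regress- above is still the active estimation result:

Code:

```stata
* Slope of x in 1950 = base-year (1940) slope + the 1950 interaction term
lincom x + 1950.year#c.x

* Same check for 1970; compare with the corresponding -margins- row
lincom x + 1970.year#c.x
```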

I understand the point about over-aggregation that can conceal 'contradictory' variation, as in the Stanford case reported on in the Wikipedia page.

But I'm not sure how to translate this to my context. The year-specific scatterplots your code generates are all upward sloping, and they look pretty similar to my eye. The early vs late scatterplot shows that late matters more than early, which, as I have said, makes sense theory-wise. And yet the marginal effects from the interacted model show the opposite!

I can't quite see which version this invalidates (neither?). The Stanford analog seems to be that aggregation (i.e., over the whole study period, or into two periods that cover multiple years) masks what is really happening. So is the interacted version the one that I should 'trust' more? And does the absence of unit-specific fixed effects in this code make a difference?

In short: could you clarify a bit?


Clyde Schechter

Join Date: Apr 2014

Posts: 30121
#9

08 Nov 2018, 09:58

I think the point of Simpson's paradox is not that the aggregated analysis invalidates the disaggregated analyses, nor the other way around. Both analyses are valid ways of looking at the relationships in the data, but which one is appropriate to the research question is something that has to be pondered and decided. Sometimes both perspectives are useful. I can't really guide you here as I don't know what the domain and variables in this research are, and even if I did it is likely outside my area of expertise anyway.

As for the matter of fixed effects, yes, in your real analyses that cover multiple years you should use fixed effects in this panel data. And the results will likely be different from what -regress- shows you, perhaps markedly so. In the single-year analyses, you will have just one observation per czone, so -xtreg, fe- is not appropriate. I'll also note that the regression analyses shown include covariate z, which is not accounted for in the graphical analyses. So there are a number of ways in which the graphs depart from the analyses you need to do for your actual project. But I was looking for a relatively simple way to clarify what is going on in the data. So I used -regress- here because it makes it easier to understand what's going on and is closer to the data than the fixed effects regression is. And graphically there is no command analogous to -lfit- for -xtreg, fe-.
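
A minimal sketch of what a fixed-effects counterpart of the interacted model might look like, assuming czone is the panel identifier (as in the -xtreg- output in the first post). The clustered standard errors are an added assumption, not something used earlier in this thread:

Code:

```stata
* Sketch only: czone as the panel identifier is taken from the -xtreg- output in the first post
xtset czone

* Year-specific slopes for x and z, with czone fixed effects
* vce(cluster czone) is an assumption added here, not from the thread
xtreg y c.(x z)##i.year, fe vce(cluster czone)

* Slope of x in each year, analogous to the -regress- version above
margins year, dydx(x)
```

With czone fixed effects the year-specific slopes are identified from within-czone variation only, so they need not resemble the -regress- results.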

Tom Kemeny

Join Date: Apr 2016

Posts: 52
#10

09 Nov 2018, 02:23

Thanks so much for your thoughts Clyde - this was very helpful.
