
  • Interpreting margins in a two-way fixed effects model

    Time-varying effects in a fixed effects panel model

    Hello Statalist,

    I have a relatively balanced panel of about 700 locations, observed on a decadal basis between 1940 and 2010 (T=8). My baseline specification is a two-way fixed effects model - something like:

    Code:
    xtreg y x z i.year, fe
    where y is the log of a continuous variable; x, which is my key independent variable, is a fraction ranging in practice between 0 and about 0.65; and z is a vector of controls.

    However, my theory leads me to expect the effect of x on y to vary by year - or at least by some periods within my overall panel. Motivated by this idea, I have tried versions where I run the model above separately for years within each distinct period. Confirming the intuition given by theory, this tells me that x is not significant for period 1, and is positive and significant in period 2. So, below are results for period 1, followed by those for period 2:

    Code:
    . eststo l1: xtreg y x z i.year if year<1990, fe  // convergence
    
    Fixed-effects (within) regression               Number of obs     =      2,665
    Group variable: czone                           Number of groups  =        687
    
    R-sq:                                           Obs per group:
         within  = 0.9482                                         min =          1
         between = 0.1759                                         avg =        3.9
         overall = 0.7102                                         max =          5
    
                                                    F(6,1972)         =    6016.99
    corr(u_i, Xb)  = -0.1600                        Prob > F          =     0.0000
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |   .0578142   .1297792     0.45   0.656    -.1967045     .312333
               z |    -.02596   .0120427    -2.16   0.031    -.0495778   -.0023422
                 |
            year |
           1950  |   .4027728   .0064565    62.38   0.000     .3901105    .4154351
           1960  |   .6947271   .0073666    94.31   0.000       .68028    .7091743
           1970  |   .9610166    .008406   114.33   0.000     .9445311    .9775021
           1980  |   .9730421   .0096244   101.10   0.000      .954167    .9919173
                 |
           _cons |   2.155035   .0719909    29.93   0.000     2.013849    2.296221
    -------------+----------------------------------------------------------------
         sigma_u |  .19346937
         sigma_e |  .09002546
             rho |  .82201413   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(686, 1972) = 7.69                   Prob > F = 0.0000
    Code:
    . eststo l2: xtreg y x z i.year if year>1970, fe  // divergence
    
    Fixed-effects (within) regression               Number of obs     =      2,478
    Group variable: czone                           Number of groups  =        696
    
    R-sq:                                           Obs per group:
         within  = 0.7363                                         min =          1
         between = 0.5035                                         avg =        3.6
         overall = 0.5922                                         max =          4
    
                                                    F(5,1777)         =     992.10
    corr(u_i, Xb)  = -0.1616                        Prob > F          =     0.0000
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |   .1704771    .060941     2.80   0.005     .0509535    .2900006
               z |   .0579945   .0093522     6.20   0.000     .0396521    .0763369
                 |
            year |
           1990  |  -.0151207   .0030884    -4.90   0.000     -.021178   -.0090634
           2000  |   .0730655    .003749    19.49   0.000     .0657125    .0804185
           2010  |   .1147842   .0047111    24.36   0.000     .1055443     .124024
                 |
           _cons |    2.54693   .0568694    44.79   0.000     2.435393    2.658468
    -------------+----------------------------------------------------------------
         sigma_u |  .08922207
         sigma_e |  .04943455
             rho |  .76511993   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(695, 1777) = 9.39                   Prob > F = 0.0000
    Here is my question. Another way of thinking about this is that the slope of y on x depends on the year, and that slopes may not be constant even within 'periods' (my mental picture is something like a sine wave across time). Given that, I was wondering if it makes sense to run something like

    Code:
    xtreg y i.year##c.x z, fe
    After running a regression like this, I have tried to interpret the resulting output using margins, as follows:

    Code:
    margins year, at(x=(0(0.1)0.7))
    marginsplot, xdimensions(at(x)) recast(line) recastci(rarea)
    This produced the following:

    [Image: marginsplot from the commands above; attachment "Screen Shot 2018-11-06 at 3.32.06 PM.png"]

    Based on my naive interpretation, the results are the opposite of what I got when I did the 'period'-based regressions, i.e. the coefficients for years in period 1 are positive and significant, and the coefficients for years that fall into period 2 are negative and largely insignificant.

    However - it may be that I am misinterpreting these results...more generally I am not quite clear on whether to interpret margins in a FE model differently from what one would do with a non-panel OLS model.

    Another possibility is that this interaction-based approach is not sensible given the questions I am chasing down.

    I would welcome the collective wisdom of the list!

    Thank you
    Tom



  • #2
    Well, there are several differences between your original separate period models and your interaction approach.

    First, the two period models cover overlapping periods of time. The first period is pre-1990, but the second period is post-1970. So the 1980 data is counted in both regressions, clouding their separate interpretations.

    Next, your separate period models allow for different coefficients of z (and of the common year indicator for 1980, which is absorbed in the constant term in the second period model), whereas in the interaction model these are constrained to be the same over time. These differences may account for the problem.

    So to do this more cleanly, I recommend you first re-run your two period models with non-overlapping periods: say year <1980 for one and year >= 1980 for the other. Then re-run the interaction model without constraining the coefficients of z:

    Code:
    xtreg y i.year##(c.x ZZ), fe
    where ZZ is the list of covariates that you have called z, but with each variable prefixed by either i. or c. as appropriate. If all of the covariates in ZZ are continuous, you can simplify the code to -xtreg y i.year##c.(x ZZ), fe-.
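
    As a minimal sketch of that syntax, and of a joint test of whether the slope of x differs across years, here is a version run on simulated data. Everything in it is made up for illustration: the panel identifier id, the controls z1 (continuous) and z2 (binary), and the data-generating slopes are assumptions, not your data.

    ```stata
    * Simulated panel purely for illustration (not the data from this thread)
    clear
    set seed 2018
    set obs 50
    gen id = _n
    gen u = rnormal()                 // time-invariant unit effect
    expand 8
    bysort id: gen year = 1940 + 10*(_n - 1)
    gen x  = runiform(0, 0.65)
    gen z1 = rnormal()                // hypothetical continuous control
    gen byte z2 = runiform() < 0.5    // hypothetical binary control
    * Assumed truth: slope of x is 1 before 1980 and 0.2 from 1980 on
    gen y = u + cond(year < 1980, 1, 0.2)*x + 0.1*z1 + 0.05*z2 + rnormal(0, 0.05)
    xtset id year, delta(10)

    * Continuous covariates get c., categorical ones get i.
    xtreg y i.year##(c.x c.z1 i.z2), fe

    * Joint test that the slope of x is the same in every year
    testparm i.year#c.x
    ```

    With your own data only the last two commands matter; -testparm- then reports a joint F test of the seven year#c.x terms.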

    The results you get will not be the same as either of the results you got before, but I'm confident that they will be consistent with each other.



    • #3
      Thank you Clyde - that is very helpful. I have followed your advice. However, I remain confused by the interacted results and the margins plot.

      When I split into two non-overlapping periods - pre-1980 vs 1980 and beyond, and run separate FE models for each, I get results that are strongly comparable to what I had when my periods overlapped. In period 1, I find no significant relationship between x and y. In period 2, across a few different measures of y, I find a consistently positive and significant relationship. As I mentioned in my last post, this is precisely what theory tells me to expect.

      When I implement your suggestion, i.e.:
      Code:
      xtreg y i.year##c.(x z), fe
      I get something that looks a lot like what I had before (note that, at present I have just one control variable z, which is continuous). Specifically, I get this:

      [Image: marginsplot for the interacted model; attachment "Screen Shot 2018-11-06 at 8.28.15 PM.png"]


      As I interpret this figure, it suggests that the relationship between x and y is much more strongly positive in period 1 years (1940, 1950 and 1960 in particular), and that thereafter the relationship becomes much less positive and, judging by the CIs, perhaps not materially different from one year to the next, even across years that cross the divide between periods 1 and 2 in the earlier, non-interacted models. In short, this seems inconsistent with the two-period results.

      Am I misinterpreting the figure? Or is there some other angle to this mystery?

      Many thanks
      Tom



      • #4
        Well, I am puzzled as well. If the data in the early decades is sparse, I can easily see how the first-period slope for x from the pooled period data could be close to zero, with the 1970 and 1980 findings dominating. But I cannot see how the second period slope comes out to be so high.

        I think you need to post the full commands and outputs you got (including the -margins- command and output) and probably a sample of your data too. Please use the -dataex- command to show the example data. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
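
        A minimal sketch of typical -dataex- usage, shown on the auto dataset that ships with Stata so that it runs as is; the variable list and the observation range are arbitrary choices for illustration:

        ```stata
        * Built-in example data, used here only so the sketch is runnable
        sysuse auto, clear

        * Show three variables for the first 5 observations;
        * copy everything -dataex- prints into your post
        dataex make price mpg in 1/5
        ```

        With your own data, replace the variable list with the variables used in the regressions and pick a range that includes several complete panels.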



        • #5
          Thanks again Clyde. I have followed your suggestions, so below is some data generated using -dataex-, as well as results and syntax.

          Example data first:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float year double c float(x z y)
          1940  100  .06741573  6.685861 1.9170296
          1950  100   .0861244  6.951772 2.3703396
          1960  100   .0881201  7.334329  2.677568
          1970  100  .09556314  7.759614  2.860222
          1980  100  .11955816  8.032035 2.9198496
          1990  100  .13637497  8.296796  2.908269
          2000  100   .1610404  8.598036  2.989298
          2010  100    .175543  8.813141  3.062129
          1950  200  .08461539  4.867535  2.392718
          1970  200  .10596026   5.01728  2.755263
          1980  200  .13265306  5.278115  2.799294
          1990  200  .17021276  5.641907  2.907918
          2000  200  .21355236  6.188264  2.941689
          2010  200  .29403973  6.626718  2.965321
          1970  301  .05405406 4.7095304  2.772009
          1980  301  .05645161  4.820282  2.895877
          1990  301  .08029197  4.919981 2.8908455
          2000  301  .09933775   5.01728 2.8938975
          2010  301  .11392405  5.062595   2.88582
          1940  302  .08575287  7.300473 2.1107717
          1950  302  .11022727  7.473069 2.5045936
          1960  302  .15817223  7.730175  2.753011
          1970  302  .17898424  7.956827  2.918792
          1980  302  .19704816  8.258941  3.011214
          1990  302   .2188518  8.499436  2.991829
          2000  302  .24691544   8.80177 3.1126254
          2010  302   .2882904  9.052516 3.1957016
          1940  401  .08333334  5.780744 1.9672023
          1950  401  .10263158  5.940171 2.4176354
          1960  401  .13468635  6.295266  2.674937
          1970  401   .1511216  6.741701  2.912432
          1980  401  .13893016  7.204893  2.918633
          1990  401   .1861999  7.586296  3.036151
          2000  401   .2397347  8.060225 3.1441514
          2010  401  .27971548  8.379768 3.1610115
          1940  402  .10638298 3.8501475 1.6669267
          1950  402  .13114753 4.1108737  2.223724
          1960  402      .1625 4.3820267  2.556472
          1970  402   .2264151 4.6634393    2.9205
          1980  402   .2440476  5.123964 2.8196075
          1990  402   .2670157  5.252274  2.864244
          2000  402  .27391306  5.438079  2.968674
          2010  402  .29473683  5.652489  2.984576
          1940  500  .03829029  7.023759 1.9198444
          1950  500  .04777871  7.084227 2.3226702
          1960  500  .07063712  7.275172 2.6056755
          1970  500  .10523614  7.574558  2.898303
          1980  500  .11748252  7.958577  2.880606
          1990  500  .14220917  8.228977  2.962361
          2000  500   .1911237  8.574518 3.1080136
          2010  500   .2372533 8.8950815 3.1418295
          1950  601  .05882353 3.9318256 2.2938592
          1970  601      .0875 4.3820267  2.723534
          1980  601         .1 4.6051702  2.668441
          1990  601  .11678832  4.919981  2.808734
          2000  601  .11764706  5.231109  2.941665
          2010  601   .1322314  5.488938  2.980087
          1940  602   .0967742 4.5325994  1.809495
          1960  602       .144  4.828314  2.492147
          1990  602  .14361702  5.236442 2.7829804
          2000  602  .15929204  5.420535 2.9473696
          2010  602  .14606741  5.587249  2.948902
          1950  700  .05537459  5.726848 2.2551136
          1960  700  .08093995  5.948035  2.464019
          1970  700  .13383459  6.499787 2.7903116
          1980  700   .1575837  7.328437 2.8692195
          1990  700    .188929  7.821242   2.92427
          2000  700  .22406967  8.239858  3.076594
          2010  700  .24078445  8.613775  3.059695
          1940  800  .04918033  5.497168  1.889386
          1950  800  .06015038  5.583496  2.260122
          1960  800  .07180851  5.929589  2.488825
          1970  800  .11804385  6.385194  2.787262
          1980  800  .13719185  6.838405  2.810167
          1990  800  .15742972  7.126891  2.908849
          2000  800  .17667045  7.476472 3.0514855
          2010  800  .20683686  7.745436  3.112535
          1940  900  .05017606  7.035269 1.9262397
          1950  900  .07028754  7.132498 2.3493614
          1960  900  .09575836  7.349874 2.6217885
          1970  900  .13293742    7.6912   2.92802
          1980  900  .14598091  8.085486 2.9289076
          1990  900   .1774436  8.445697 3.0587904
          2000  900    .213799  8.859932 3.2504575
          2010  900   .2612748 9.2081375   3.31174
          1970 1001  .04918033 4.1108737  2.816636
          1980 1001  .06666667  4.317488    2.7184
          1990 1001  .12871288 4.6151204  2.817173
          2000 1001         .2  4.905275 2.9277446
          2010 1001  .24285714  5.347107 2.9823964
          1960 1002  .05084746 4.0775375 2.4529595
          1970 1002  .04395605 4.5108595   2.81544
          1980 1002   .0620155  4.859812  2.741524
          1990 1002  .12209302  5.147494  2.853711
          2000 1002  .18503937  5.537334  2.965409
          2010 1002  .23607427  5.932245  2.981001
          1940 1100  .02857143  4.941642  1.842884
          1950 1100  .04635762   5.01728   2.27514
          1960 1100 .064327486  5.141664  2.507516
          1970 1100   .0986547  5.407172  2.772718
          end
          Now my syntax. First, the period-specific regressions:
          Code:
          xtreg y x z i.year if year<1980, fe
          Which generates the following results (on the full data):
          Code:
          Fixed-effects (within) regression               Number of obs     =      2,065
          Group variable: c                               Number of groups  =        669
          
          R-sq:                                           Obs per group:
               within  = 0.9495                                         min =          1
               between = 0.1839                                         avg =        3.1
               overall = 0.6728                                         max =          4
          
                                                          F(5,1391)         =    5235.60
          corr(u_i, Xb)  = -0.1546                        Prob > F          =     0.0000
          
          ------------------------------------------------------------------------------
                     y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                     x |  -.0088623   .1719888    -0.05   0.959    -.3462477    .3285231
                     z |  -.0256077   .0204123    -1.25   0.210    -.0656499    .0144344
                       |
                  year |
                 1950  |   .4070574   .0067419    60.38   0.000     .3938321    .4202827
                 1960  |   .6997802   .0080708    86.71   0.000      .683948    .7156124
                 1970  |    .965986   .0097604    98.97   0.000     .9468393    .9851327
                       |
                 _cons |   2.158198   .1228108    17.57   0.000     1.917283    2.399112
          -------------+----------------------------------------------------------------
               sigma_u |   .2111395
               sigma_e |  .09087989
                   rho |   .8436922   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          F test that all u_i=0: F(668, 1391) = 7.38                   Prob > F = 0.0000
          And then, for period 2:
          Code:
          xtreg y x z i.year if year>=1980, fe 
          
          Fixed-effects (within) regression               Number of obs     =      2,478
          Group variable: czone                           Number of groups  =        696
          
          R-sq:                                           Obs per group:
               within  = 0.7363                                         min =          1
               between = 0.5035                                         avg =        3.6
               overall = 0.5922                                         max =          4
          
                                                          F(5,1777)         =     992.10
          corr(u_i, Xb)  = -0.1616                        Prob > F          =     0.0000
          
          ------------------------------------------------------------------------------
                     y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                     x |   .1704771    .060941     2.80   0.005     .0509535    .2900006
                     z |   .0579945   .0093522     6.20   0.000     .0396521    .0763369
                       |
                  year |
                 1990  |  -.0151207   .0030884    -4.90   0.000     -.021178   -.0090634
                 2000  |   .0730655    .003749    19.49   0.000     .0657125    .0804185
                 2010  |   .1147842   .0047111    24.36   0.000     .1055443     .124024
                       |
                 _cons |    2.54693   .0568694    44.79   0.000     2.435393    2.658468
          -------------+----------------------------------------------------------------
               sigma_u |  .08922207
               sigma_e |  .04943455
                   rho |  .76511993   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          F test that all u_i=0: F(695, 1777) = 9.39                   Prob > F = 0.0000
          And now the interacted version:
          Code:
          xtreg y i.year##c.(x z), fe 
          
          Fixed-effects (within) regression               Number of obs     =      4,543
          Group variable: czone                           Number of groups  =        710
          
          R-sq:                                           Obs per group:
               within  = 0.9467                                         min =          1
               between = 0.5654                                         avg =        6.4
               overall = 0.8827                                         max =          8
          
                                                          F(23,3810)        =    2940.16
          corr(u_i, Xb)  = -0.0017                        Prob > F          =     0.0000
          
          --------------------------------------------------------------------------------
                       y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          ---------------+----------------------------------------------------------------
                    year |
                   1950  |   .6190264   .0252093    24.56   0.000     .5696013    .6684515
                   1960  |   .8527394   .0253965    33.58   0.000     .8029474    .9025314
                   1970  |   1.277541   .0249469    51.21   0.000     1.228631    1.326452
                   1980  |   1.311035   .0247835    52.90   0.000     1.262444    1.359625
                   1990  |   1.207576   .0249038    48.49   0.000      1.15875    1.256402
                   2000  |   1.302313    .024751    52.62   0.000     1.253786     1.35084
                   2010  |   1.394699   .0251122    55.54   0.000     1.345464    1.443933
                         |
                       x |   1.622354   .1434178    11.31   0.000     1.341171    1.903537
                       z |   .0906117   .0074765    12.12   0.000     .0759533      .10527
                         |
                year#c.x |
                   1950  |  -.4207889   .1534314    -2.74   0.006    -.7216046   -.1199732
                   1960  |   -.779238   .1523557    -5.11   0.000    -1.077945   -.4805314
                   1970  |  -1.389221   .1507515    -9.22   0.000    -1.684782   -1.093659
                   1980  |  -1.429912   .1579517    -9.05   0.000     -1.73959   -1.120234
                   1990  |  -1.336558   .1590563    -8.40   0.000    -1.648402   -1.024714
                   2000  |  -1.356769   .1596698    -8.50   0.000    -1.669816   -1.043723
                   2010  |  -1.527198   .1564801    -9.76   0.000    -1.833991   -1.220405
                         |
                year#c.z |
                   1950  |  -.0328047   .0034376    -9.54   0.000    -.0395444   -.0260651
                   1960  |  -.0238643   .0034802    -6.86   0.000    -.0306875   -.0170411
                   1970  |  -.0418693   .0034514   -12.13   0.000     -.048636   -.0351027
                   1980  |  -.0472446    .003413   -13.84   0.000     -.053936   -.0405532
                   1990  |  -.0355467   .0034277   -10.37   0.000     -.042267   -.0288264
                   2000  |  -.0362767   .0034374   -10.55   0.000     -.043016   -.0295374
                   2010  |  -.0393795   .0034952   -11.27   0.000    -.0462321   -.0325268
                         |
                   _cons |   1.328532   .0474822    27.98   0.000     1.235439    1.421625
          ---------------+----------------------------------------------------------------
                 sigma_u |  .10365221
                 sigma_e |  .08473408
                     rho |  .59941938   (fraction of variance due to u_i)
          --------------------------------------------------------------------------------
          F test that all u_i=0: F(709, 3810) = 8.29                   Prob > F = 0.0000
          Interpreting with margins:
          Code:
          margins year, at(x=(0(0.1)0.7)) 
          
          Predictive margins                              Number of obs     =      4,543
          Model VCE    : Conventional
          
          Expression   : Linear prediction, predict()
          
          1._at        : x         =           0
          
          2._at        : x         =          .1
          
          3._at        : x         =          .2
          
          4._at        : x         =          .3
          
          5._at        : x         =          .4
          
          6._at        : x         =          .5
          
          7._at        : x         =          .6
          
          8._at        : x         =          .7
          
          ------------------------------------------------------------------------------
                       |            Delta-method
                       |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
              _at#year |
               1 1940  |   1.934264   .0098973   195.43   0.000     1.914865    1.953662
               1 1950  |   2.333993   .0088005   265.21   0.000     2.316744    2.351242
               1 1960  |   2.627472   .0099051   265.26   0.000     2.608058    2.646886
               1 1970  |   2.931912    .010189   287.75   0.000     2.911942    2.951882
               1 1980  |   2.929472    .011318   258.83   0.000     2.907289    2.951655
               1 1990  |   2.904213   .0119668   242.69   0.000     2.880759    2.927668
               1 2000  |    2.99407   .0123508   242.42   0.000     2.969863    3.018277
               1 2010  |   3.065714   .0112632   272.19   0.000     3.043639    3.087789
               2 1940  |   2.096499   .0087692   239.08   0.000     2.079312    2.113686
               2 1950  |    2.45415   .0057132   429.56   0.000     2.442952    2.465347
               2 1960  |   2.711784   .0045029   602.23   0.000     2.702958    2.720609
               2 1970  |   2.955225    .003842   769.18   0.000     2.947695    2.962756
               2 1980  |   2.948716   .0035193   837.88   0.000     2.941819    2.955614
               2 1990  |   2.932793   .0038687   758.09   0.000     2.925211    2.940375
               2 2000  |   3.020628   .0046063   655.76   0.000       3.0116    3.029656
               2 2010  |    3.07523   .0054106   568.37   0.000     3.064625    3.085834
               3 1940  |   2.258734   .0216151   104.50   0.000      2.21637    2.301099
               3 1950  |   2.574306   .0137527   187.19   0.000     2.547351    2.601261
               3 1960  |   2.796095   .0116689   239.62   0.000     2.773225    2.818966
               3 1970  |   2.978539   .0098353   302.84   0.000     2.959262    2.997816
               3 1980  |    2.96796   .0107978   274.87   0.000     2.946797    2.989124
               3 1990  |   2.961373   .0097648   303.27   0.000     2.942234    2.980511
               3 2000  |   3.047187   .0086352   352.88   0.000     3.030262    3.064111
               3 2010  |   3.084745   .0069398   444.50   0.000     3.071143    3.098347
               4 1940  |    2.42097   .0356216    67.96   0.000     2.351153    2.490787
               4 1950  |   2.694463   .0233866   115.21   0.000     2.648626    2.740299
               4 1960  |   2.880407   .0211134   136.43   0.000     2.839025    2.921788
               4 1970  |   3.001852   .0187011   160.52   0.000     2.965199    3.038506
               4 1980  |   2.987205   .0209933   142.29   0.000     2.946058    3.028351
               4 1990  |   2.989952    .019605   152.51   0.000     2.951527    3.028377
               4 2000  |   3.073745   .0176802   173.85   0.000     3.039093    3.108398
               4 2010  |   3.094261   .0135469   228.41   0.000     3.067709    3.120812
               5 1940  |   2.583205   .0498193    51.85   0.000     2.485561    2.680849
               5 1950  |   2.814619   .0332571    84.63   0.000     2.749436    2.879802
               5 1960  |   2.964718   .0308076    96.23   0.000     2.904337      3.0251
               5 1970  |   3.025165   .0278165   108.75   0.000     2.970646    3.079685
               5 1980  |   3.006449   .0313809    95.80   0.000     2.944943    3.067954
               5 1990  |   3.018532   .0296982   101.64   0.000     2.960324    3.076739
               5 2000  |   3.100304   .0271161   114.33   0.000     3.047157     3.15345
               5 2010  |   3.103776   .0208649   148.76   0.000     3.062882    3.144671
               6 1940  |   2.745441   .0640811    42.84   0.000     2.619844    2.871037
               6 1950  |   2.934776   .0432023    67.93   0.000     2.850101     3.01945
               6 1960  |    3.04903    .040573    75.15   0.000     2.969508    3.128552
               6 1970  |   3.048479   .0369974    82.40   0.000     2.975965    3.120992
               6 1980  |   3.025693   .0418178    72.35   0.000     2.943732    3.107654
               6 1990  |   3.047111   .0398527    76.46   0.000     2.969002    3.125221
               6 2000  |   3.126862   .0366423    85.33   0.000     3.055044    3.198679
               6 2010  |   3.113292   .0283486   109.82   0.000      3.05773    3.168854
               7 1940  |   2.907676   .0783722    37.10   0.000     2.754069    3.061283
               7 1950  |   3.054932   .0531803    57.44   0.000     2.950701    3.159164
               7 1960  |   3.133342   .0503681    62.21   0.000     3.034622    3.232061
               7 1970  |   3.071792   .0462048    66.48   0.000     2.981232    3.162352
               7 1980  |   3.044937   .0522744    58.25   0.000     2.942481    3.147393
               7 1990  |   3.075691   .0500311    61.48   0.000     2.977632     3.17375
               7 2000  |    3.15342   .0462028    68.25   0.000     3.062865    3.243976
               7 2010  |   3.122808   .0358944    87.00   0.000     3.052456    3.193159
               8 1940  |   3.069911   .0926789    33.12   0.000     2.888264    3.251559
               8 1950  |   3.175089   .0631755    50.26   0.000     3.051267     3.29891
               8 1960  |   3.217653   .0601785    53.47   0.000     3.099706    3.335601
               8 1970  |   3.095105   .0554255    55.84   0.000     2.986473    3.203737
               8 1980  |   3.064181   .0627408    48.84   0.000     2.941211    3.187151
               8 1990  |   3.104271   .0602213    51.55   0.000     2.986239    3.222302
               8 2000  |   3.179979   .0557801    57.01   0.000     3.070652    3.289306
               8 2010  |   3.132323   .0434701    72.06   0.000     3.047123    3.217523
          ------------------------------------------------------------------------------
          and this generated the marginsplot I included last time around.

          Looking forward to hearing from you.
          Tom



          • #6
            Aha! You have done nothing wrong and these results are all correct. I failed to recognize Simpson's paradox (also known as Lord's paradox in the context of continuous variables).

            If you run -graph matrix x year y- you can readily see (and if you want to be quantitative, run -corr x year y-) that year is very strongly correlated with both x and y. That is, both x and y exhibit very strong increasing time trends. This is exactly the setup for a Simpson's paradox to occur. The result is that when stratifying by year (the interaction model) you get results that are very different from, even in the opposite direction from, the results that occur when the years are lumped together. (In your case, you don't completely lump all the years together, but you still have two lumps, and within each lump you are seeing Simpson's paradox in action.)

            The Wikipedia page on Simpson's paradox is very well written and explains it better than I can.

            To see even better how this plays out in your specific data, run this:

            Code:
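            * Scatterplot pooling all years
            graph twoway scatter y x, name(overall, replace) title(overall)
            local graphs overall
            
            * One scatterplot per year; the loop macro is yr so it is not confused with the variable y
            levelsof year, local(years)
            foreach yr of local years {
                graph twoway scatter y x if year == `yr', name(y`yr', replace) title(`yr')
                local graphs `graphs' y`yr'
            }
            
            * All panels on common axes
            graph combine `graphs', altshrink xcommon ycommon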


            Last edited by Clyde Schechter; 07 Nov 2018, 09:56.

            Comment


            • #7
              In fact, to make it much clearer, do this:

              Code:
              * Flag the late era (1980 and after) and label the two eras E and L
              label define later 0 "E" 1 "L"
              gen byte later:later = (year >= 1980)
              * Pooled scatter with E/L marker labels, plus a separate linear fit for each era
              graph twoway (scatter y x, name(overall, replace) title(overall) ///
                  msym(i) mlab(later) mlabpos(0)) (lfit y x if later == 0) ///
                  (lfit y x if later == 1)
              local graphs overall
              
              * Year-by-year scatter with its linear fit overlaid
              levelsof year, local(years)
              foreach yr of local years {
                  graph twoway (scatter y x if year == `yr') (lfit y x if year == `yr'), ///
                  name(y`yr', replace) title(`yr')
                  local graphs `graphs' y`yr'
              }
              
              graph combine `graphs', altshrink xcommon ycommon
              
              * Era-specific regressions, then the fully year-interacted model
              regress y x z if later
              regress y x z if !later
              regress y c.(x z)##i.year
              margins year, dydx(x)
              This is the same as I recommended in #6, except that in the "overall" graph the Early (before 1980) and Late (1980 and after) eras are distinguished by having markers E and L, respectively, and in the other graphs, the linear fit between x and y is overlaid on the graph.

              Comment


              • #8
                Dear Clyde,

                Attached is the graph your code generated on the full data, as well as results.

                [Attached image: Screen Shot 2018-11-08 at 1.32.09 PM.png]



                Code:
                . regress y x z if later
                
                      Source |       SS           df       MS      Number of obs   =     2,478
                -------------+----------------------------------   F(2, 2475)      =   1362.13
                       Model |  27.7021638         2  13.8510819   Prob > F        =    0.0000
                    Residual |   25.167541     2,475  .010168703   R-squared       =    0.5240
                -------------+----------------------------------   Adj R-squared   =    0.5236
                       Total |  52.8697048     2,477  .021344249   Root MSE        =    .10084
                
                ------------------------------------------------------------------------------
                           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                           x |   .7934159   .0325258    24.39   0.000     .7296352    .8571966
                           z |   .0415972   .0012296    33.83   0.000      .039186    .0440084
                       _cons |   2.619379    .008193   319.71   0.000     2.603314    2.635445
                ------------------------------------------------------------------------------
                
                . regress y x z if !later
                
                      Source |       SS           df       MS      Number of obs   =     2,065
                -------------+----------------------------------   F(2, 2062)      =    269.66
                       Model |  63.3936511         2  31.6968256   Prob > F        =    0.0000
                    Residual |  242.373491     2,062  .117542915   R-squared       =    0.2073
                -------------+----------------------------------   Adj R-squared   =    0.2066
                       Total |  305.767142     2,064  .148142995   Root MSE        =    .34285
                
                ------------------------------------------------------------------------------
                           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                           x |   2.933526   .1563879    18.76   0.000     2.626831     3.24022
                           z |   .0513899   .0044578    11.53   0.000     .0426477    .0601321
                       _cons |   1.980069   .0310801    63.71   0.000     1.919118    2.041021
                ------------------------------------------------------------------------------
                
                . regress y c.(x z)##i.year
                
                      Source |       SS           df       MS      Number of obs   =     4,543
                -------------+----------------------------------   F(23, 4519)     =   1504.28
                       Model |  532.715639        23  23.1615495   Prob > F        =    0.0000
                    Residual |  69.5794344     4,519  .015397087   R-squared       =    0.8845
                -------------+----------------------------------   Adj R-squared   =    0.8839
                       Total |  602.295073     4,542  .132605696   Root MSE        =    .12408
                
                ------------------------------------------------------------------------------
                           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                           x |   1.635393   .1773391     9.22   0.000     1.287722    1.983064
                           z |    .089429   .0036682    24.38   0.000     .0822375    .0966205
                             |
                        year |
                       1950  |   .6332566   .0349436    18.12   0.000     .5647501    .7017632
                       1960  |   .8479927   .0349712    24.25   0.000     .7794319    .9165534
                       1970  |   1.293101   .0344226    37.57   0.000     1.225616    1.360586
                       1980  |   1.327345   .0340822    38.95   0.000     1.260527    1.394163
                       1990  |   1.206989   .0343907    35.10   0.000     1.139567    1.274412
                       2000  |   1.312604   .0337922    38.84   0.000     1.246355    1.378853
                       2010  |   1.400795   .0341022    41.08   0.000     1.333938    1.467652
                             |
                    year#c.x |
                       1950  |  -.3350924   .2119039    -1.58   0.114    -.7505277     .080343
                       1960  |  -.6419666   .2092869    -3.07   0.002    -1.052271   -.2316619
                       1970  |  -1.293619   .2060644    -6.28   0.000    -1.697606   -.8896316
                       1980  |  -1.169162   .2093957    -5.58   0.000     -1.57968   -.7586441
                       1990  |  -1.058502   .2064459    -5.13   0.000    -1.463237   -.6537669
                       2000  |  -1.064991   .1990275    -5.35   0.000    -1.455182   -.6747997
                       2010  |  -1.195604   .1899883    -6.29   0.000    -1.568074   -.8231339
                             |
                    year#c.z |
                       1950  |  -.0361109    .004817    -7.50   0.000    -.0455546   -.0266672
                       1960  |  -.0252495   .0048469    -5.21   0.000    -.0347518   -.0157473
                       1970  |  -.0456533   .0048288    -9.45   0.000    -.0551202   -.0361865
                       1980  |  -.0533753   .0047644   -11.20   0.000    -.0627158   -.0440347
                       1990  |  -.0401906   .0047938    -8.38   0.000    -.0495888   -.0307925
                       2000  |  -.0433738   .0047243    -9.18   0.000    -.0526356   -.0341119
                       2010  |   -.048157   .0047634   -10.11   0.000    -.0574957   -.0388183
                             |
                       _cons |   1.334145   .0271769    49.09   0.000     1.280865    1.387425
                ------------------------------------------------------------------------------
                
                . margins year, dydx(x)
                
                Average marginal effects                        Number of obs     =      4,543
                Model VCE    : OLS
                
                Expression   : Linear prediction, predict()
                dy/dx w.r.t. : x
                
                ------------------------------------------------------------------------------
                             |            Delta-method
                             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                x            |
                        year |
                       1940  |   1.635393   .1773391     9.22   0.000     1.287722    1.983064
                       1950  |   1.300301   .1159919    11.21   0.000       1.0729    1.527701
                       1960  |   .9934264   .1111389     8.94   0.000     .7755398    1.211313
                       1970  |   .3417745   .1049446     3.26   0.001     .1360318    .5475172
                       1980  |    .466231   .1113436     4.19   0.000     .2479432    .6845188
                       1990  |   .5768912   .1056918     5.46   0.000     .3696836    .7840988
                       2000  |    .570402   .0903482     6.31   0.000     .3932753    .7475287
                       2010  |   .4397891   .0681645     6.45   0.000     .3061532    .5734249
                ------------------------------------------------------------------------------

                I understand the point about over-aggregation that can conceal 'contradictory' variation, as in the UC Berkeley admissions case reported on the Wikipedia page.

                But I'm not sure how to translate this to my context. The year-specific scatterplots your code generates are all upward sloping, and they look pretty similar to my eye. The early vs late scatterplot shows that late matters more than early, which I have said makes sense theory-wise. And yet the marginal effects from the interaction model show the opposite!

                I can't quite see which version this invalidates (neither?). The Berkeley analog seems to be that aggregation (i.e. over the whole study period, or into two periods that cover multiple years) masks what is really happening. So is the interacted version the one that I should 'trust' more? And does the absence of unit-specific fixed effects in this code make a difference?

                In short: could you clarify a bit?

                Comment


                • #9
                  I think the point of Simpson's paradox is not that the aggregated analysis invalidates the disaggregated analyses, nor the other way around. Both analyses are valid ways of looking at the relationships in the data, but which one is appropriate to the research question is something that has to be pondered and decided. Sometimes both perspectives are useful. I can't really guide you here as I don't know what the domain and variables in this research are, and even if I did, it is likely outside my area of expertise anyway.

                  As for the matter of fixed effects, yes, in your real analyses that cover multiple years you should use fixed effects in this panel data. And the results will likely be different from what -regress- shows you, perhaps markedly so. In the single-year analyses, you will have just one observation per czone, so -xtreg, fe- is not appropriate. I'll also note that the regression analyses shown include covariate z, which is not accounted for in the graphical analyses. So there are a number of ways in which the graphs depart from the analyses you need to do for your actual project. But I was looking for a relatively simple way to clarify what is going on in the data. So I used -regress- here because it makes it easier to understand what's going on and is closer to the data than the fixed effects regression is. And graphically there is no command analogous to -lfit- for -xtreg, fe-.
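
As a rough sketch of the fixed-effects counterpart, assuming czone is the panel identifier (as in the -xtreg- output in #1), the year-interacted model could look like the block below. It runs on simulated data, not yours: the unit effect u and every coefficient are made up for illustration, with the x slope set to 1 before 1980 and 1.5 from 1980 on. The -lincom- line shows how each row of the -margins- table is built: base-year coefficient plus that year's interaction. In the -regress- output in #8, for example, 1.635393 + (-1.058502) = 0.576891, which is the 1990 row.

```stata
* Sketch on simulated data (NOT the poster's data): unit fixed effects plus
* year-specific slopes for x. Variable names follow the thread.
clear
set seed 2018
set obs 700
gen czone = _n
gen u = rnormal()                          // czone-level effect (hypothetical)
expand 8                                   // eight decades per czone
bysort czone: gen year = 1940 + 10*(_n - 1)
gen later = year >= 1980
gen x = 0.3 + 0.1*u + rnormal(0, 0.1)      // x correlated with the czone effect
gen z = rnormal()
gen y = u + 0.02*(year - 1940) + (1 + 0.5*later)*x + 0.2*z + rnormal(0, 0.1)

xtset czone year, delta(10)
xtreg y c.(x z)##i.year, fe
margins year, dydx(x)

* Each year-specific effect is the base-year coefficient plus that year's
* interaction term; -lincom- reproduces the 1990 row of the -margins- table.
lincom x + 1990.year#c.x
```

Whether the year-by-year slopes or a pooled slope is the right summary still depends on the research question.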

                  Comment


                  • #10
                    Thanks so much for your thoughts Clyde - this was very helpful.

                    Comment

                    Working...
                    X