  • -margins- and -marginsplot- for heterogeneity analysis

    Hi all,

    Please consider the following example data

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte stateid int birthyear float(imr post treat highwealth cluster)
    3    . 0 . 0 1    .
    3    . 0 . 0 1    .
    3    . . . 0 1    .
    3    . . . 0 1    .
    3    . 0 . 0 1    .
    3    . 0 . 0 1    .
    3    . . . 0 1    .
    3 2008 0 1 0 1 3446
    3    . 0 . 0 1    .
    3    . 0 . 0 1    .
    3    . . . 0 1    .
    3    . . . 0 0    .
    3 2004 0 0 0 1 3396
    4    . 0 . 0 1    .
    4    . . . 0 1    .
    4    . 0 . 0 1    .
    4    . 0 . 0 1    .
    4    . 0 . 0 1    .
    4    . . . 0 1    .
    4    . . . 0 1    .
    4    . 0 . 0 1    .
    4    . 0 . 0 1    .
    4    . 0 . 0 1    .
    4    . . . 0 1    .
    4    . . . 0 1    .
    4    . 0 . 0 1    .
    4 2007 0 0 0 1 3425
    5    . 0 . 0 1    .
    5    . 0 . 0 1    .
    5 2005 0 0 0 1 3406
    5    . . . 0 1    .
    5    . 0 . 0 1    .
    5    . 0 . 0 1    .
    5    . 0 . 0 1    .
    5    . 0 . 0 1    .
    3    . 0 . 0 1    .
    3    . 0 . 0 1    .
    3    . 0 . 0 1    .
    3    . 0 . 0 1    .
    3    . 0 . 0 1    .
    3 2007 0 0 0 1 3430
    3 2006 0 0 0 1 3422
    3 2007 0 0 0 1 3435
    3    . 0 . 0 1    .
    3    . 0 . 0 1    .
    3    . 0 . 0 1    .
    3    . . . 0 1    .
    3 2007 0 0 0 1 3435
    3 2011 0 1 0 1 3477
    3 2005 0 0 0 1 3401
    3    . . . 0 1    .
    3    . . . 0 1    .
    3    . . . 0 1    .
    3    . 0 . 0 1    .
    3    . 0 . 0 1    .
    3    . 0 . 0 1    .
    3    . 0 . 0 1    .
    3    . . . 0 1    .
    3    . 0 . 0 1    .
    3    . 0 . 0 1    .
    3 2002 0 0 0 1 3364
    3    . . . 0 1    .
    3    . 0 . 0 1    .
    3    . 0 . 0 1    .
    3    . . . 0 1    .
    3 2009 0 1 0 1 3449
    3    . . . 0 1    .
    3    . . . 0 0    .
    3    . . . 0 0    .
    3 2010 0 1 0 1 3466
    3 2011 0 1 0 1 3482
    3    . . . 0 1    .
    3 2004 0 0 0 0 3399
    3    . 0 . 0 1    .
    3    . 0 . 0 1    .
    3 2011 0 1 0 1 3483
    3 2006 0 0 0 1 3417
    3    . 0 . 0 1    .
    3    . 0 . 0 1    .
    3 2010 0 1 0 1 3469
    3 2007 0 0 0 1 3427
    3    . 0 . 0 1    .
    3 2008 0 0 0 1 3438
    3    . 0 . 0 1    .
    3    . 0 . 0 1    .
    3    . 0 . 0 1    .
    3 2006 0 0 0 1 3417
    3    . . . 0 0    .
    3    . 0 . 0 1    .
    3    . 0 . 0 1    .
    3    . . . 0 1    .
    3    . 0 . 0 1    .
    3    . . . 0 1    .
    3    . 0 . 0 1    .
    3 2008 1 0 0 1 3441
    3 2001 0 0 0 0 3352
    3    . . . 0 1    .
    3    . 0 . 0 1    .
    3    . 0 . 0 1    .
    3    . 0 . 0 1    .
    end
    label values imr binarylabel
    label def binarylabel 0 "no", modify
    label def binarylabel 1 "yes", modify
    I'm using a difference-in-differences model with the following specification on this repeated cross-section data:
    Code:
    areg imr i.treat##i.post i.stateid highwealth, absorb(birthyear) cluster(cluster)
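For reference, the -areg- specification above corresponds to the model below. The notation is ours, not from the post: s(i) and t(i) are the state and birth year of observation i, the state effects δ come from i.stateid, the birth-year effects λ are absorbed, and β3 is the difference-in-differences coefficient that Stata reports as 1.treat#1.post.

```latex
\mathrm{imr}_i = \alpha + \beta_1\,\mathrm{treat}_i + \beta_2\,\mathrm{post}_i
  + \beta_3\,(\mathrm{treat}_i \times \mathrm{post}_i)
  + \gamma\,\mathrm{highwealth}_i + \delta_{s(i)} + \lambda_{t(i)} + \varepsilon_i
```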
    I now want to plot the marginal effects from this DiD using -margins- and -marginsplot-, separately for highwealth==1 and highwealth==0, as a heterogeneity analysis.

    I went through the very helpful guide by Richard Williams on https://www3.nd.edu/~rwilliam/stats/Margins01.pdf and did the following:
    Code:
    margins treat#post, at( highwealth=(0 1)) vsquish
    But this produced results in which none of the combinations is estimable:
    Code:
    Predictive margins                                     Number of obs = 172,020
    Model VCE: Robust
    
    Expression: Linear prediction, predict()
    1._at: highwealth= 0
    2._at: highwealth= 1
    
    ------------------------------------------------------------------------------------------------
                                   |            Delta-method
                                   |     Margin   std. err.      t    P>|t|     [95% conf. interval]
    -------------------------------+----------------------------------------------------------------
                    _at#treat#post |
                            1 0 0  |          .  (not estimable)
                            1 0 1  |          .  (not estimable)
                            1 1 0  |          .  (not estimable)
                            1 1 1  |          .  (not estimable)
                            2 0 0  |          .  (not estimable)
                            2 0 1  |          .  (not estimable)
                            2 1 0  |          .  (not estimable)
                            2 1 1  |          .  (not estimable)
    ------------------------------------------------------------------------------------------------
    Note: this result is from estimation on my original data and may not match the example data.
    Can someone please help me understand what is going on here and offer a solution?

    Thanks

  • #2
    It is difficult to give a confident response because your example data includes only treat = 0 observations. To be sure what is going on, one would need to see example data that includes all combinations of treat and post.

    Nevertheless, my best guess would be that the birthyear variable is collinear with post (in your example data, it definitely is). And I suspect that stateid is collinear with treat as well. These collinearities make the effects of treatment and pre-post unidentifiable in your data. -margins- therefore refuses to calculate statistics that would be meaningless artifacts of the way in which those collinearities get broken in order to run the regression.
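A direct way to test both suspicions is to ask whether treat ever varies within a state and whether post ever varies within a birth year. If treat is constant within every state, it is an exact linear combination of the state indicators; likewise, a post that is constant within every birth year is absorbed by absorb(birthyear). A minimal sketch on made-up data (the rows are hypothetical, not the poster's). On real data, run it on the estimation sample, for example after -preserve- and -keep if e(sample)- following the regression, because missing values sort last and would be flagged as variation.

```stata
* Hypothetical toy data: treat is assigned by state, post by birth year
clear
input byte stateid int birthyear byte(post treat)
3 2004 0 1
3 2008 1 1
4 2004 0 0
4 2008 1 0
5 2005 0 0
5 2009 1 0
end

* Flag groups in which the variable takes more than one value
bysort stateid (treat): gen byte treat_varies = treat[1] != treat[_N]
bysort birthyear (post): gen byte post_varies = post[1] != post[_N]

* Tag one row per group so that groups, not observations, are counted
egen byte tag_state = tag(stateid)
egen byte tag_year  = tag(birthyear)

* 0 states with varying treat => treat is collinear with i.stateid
count if tag_state & treat_varies
* 0 years with varying post   => post is absorbed by birthyear
count if tag_year & post_varies
```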

    Added: These questions might also be answerable without a better data example if you showed the complete output of your -areg- command. And, in general, for future guidance, it is usually not a good idea to ask for help with -margins- without showing the full code (which you did) and results (which you didn't) of the underlying regression.
    Last edited by Clyde Schechter; 11 Dec 2022, 10:15.



    • #3
      Thanks, Clyde, for your detailed explanation. Below are the code and output of both the -areg- and the -margins- commands.

      Code:
      areg imr i.treat##i.post highwealth i.stateid, absorb(birthyear) cluster(cluster)
      note: 37.stateid omitted because of collinearity.
      
      Linear regression, absorbing indicators            Number of obs     = 526,108
      Absorbed variable: birthyear                       No. of categories =      13
                                                         F(39, 4468)       =   56.55
                                                         Prob > F          =  0.0000
                                                         R-squared         =  0.0137
                                                         Adj R-squared     =  0.0136
                                                         Root MSE          =  0.1757
      
                                             (Std. err. adjusted for 4,469 clusters in cluster)
      --------------------------------------------------------------------------------------------
                                 |               Robust
                              imr | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      ---------------------------+----------------------------------------------------------------
                         1.treat |   .0211684   .0034424     6.15   0.000     .0144197    .0279171
                          1.post |   -.012983   .0014303    -9.08   0.000     -.015787    -.010179
                                 |
                      treat#post |
                            1 1  |   -.008295   .0023276    -3.56   0.000    -.0128583   -.0037317
                                 |
                      highwealth |  -.0077955   .0005783   -13.48   0.000    -.0089293   -.0066617
                                 |
                           state |
                              A  |   .0048587   .0016725     2.91   0.004     .0015797    .0081377
                              B  |    .007887   .0025672     3.07   0.002     .0028541    .0129199
                              C  |   .0223681   .0028567     7.83   0.000     .0167676    .0279686
                              D  |   .0297623   .0026459    11.25   0.000     .0245751    .0349495
                              E  |   .0168991   .0058225     2.90   0.004     .0054841     .028314
                              F  |   .0360819    .003223    11.20   0.000     .0297632    .0424006
                              G  |  -.0010736   .0054308    -0.20   0.843    -.0117207    .0095734
                              H  |   .0021262   .0039996     0.53   0.595     -.005715    .0099674
                              I  |   .0295596   .0035921     8.23   0.000     .0225172    .0366019
                              J  |   .0077316   .0033893     2.28   0.023     .0010869    .0143762
                              K  |   .0148172   .0029093     5.09   0.000     .0091135    .0205209
                              L  |   .0237443   .0026187     9.07   0.000     .0186103    .0288783
                              M  |   .0085309   .0030205     2.82   0.005     .0026093    .0144526
                              N  |   .0050295    .003159     1.59   0.111    -.0011636    .0112227
                              O  |    .022035   .0028489     7.73   0.000     .0164498    .0276201
                              P  |   .0173745   .0024636     7.05   0.000     .0125446    .0222045
                              Q  |   .0037494    .002623     1.43   0.153    -.0013929    .0088917
                              S  |     .00715   .0051539     1.39   0.165    -.0029542    .0172542
                              R  |   .0320781   .0027634    11.61   0.000     .0266606    .0374957
                              T  |   .0143853   .0024227     5.94   0.000     .0096357     .019135
                              U  |   .0064814   .0027218     2.38   0.017     .0011454    .0118175
                              V  |   .0149639   .0027448     5.45   0.000     .0095828    .0203451
                              W  |   .0106478   .0028945     3.68   0.000     .0049731    .0163226
                              X  |   .0029462   .0033012     0.89   0.372    -.0035258    .0094182
                              Y  |   .0283065   .0030492     9.28   0.000     .0223286    .0342845
                              Z  |   .0076615   .0029044     2.64   0.008     .0019674    .0133555
                             AB  |   .0213461    .002581     8.27   0.000     .0162861    .0264062
                             AC  |    .033116   .0028029    11.81   0.000      .027621    .0386111
                             AD  |   .0107973   .0036701     2.94   0.003     .0036021    .0179925
                             AE  |   .0109469   .0024409     4.48   0.000     .0061615    .0157324
                             AF  |          0  (omitted)
                             AG  |   .0201784   .0035798     5.64   0.000     .0131602    .0271967
                             AH  |   .0462361   .0025262    18.30   0.000     .0412834    .0511887
                             AI  |   .0046262   .0027339     1.69   0.091    -.0007335     .009986
                             AJ  |   .0140196   .0060528     2.32   0.021     .0021532    .0258861
                             AK  |   .0199296   .0027081     7.36   0.000     .0146204    .0252387
                                 |
                           _cons |   .0135481   .0023379     5.79   0.000     .0089646    .0181316
      --------------------------------------------------------------------------------------------
      
      . margins treat#post , at( highwealth=(0 1)) vsquish
      
      Predictive margins                                     Number of obs = 526,108
      Model VCE: Robust
      
      Expression: Linear prediction, predict()
      1._at: highwealth= 0
      2._at: highwealth= 1
      
      ------------------------------------------------------------------------------------------------
                                     |            Delta-method
                                     |     Margin   std. err.      t    P>|t|     [95% conf. interval]
      -------------------------------+----------------------------------------------------------------
                      _at#treat#post |
                              1 0 0  |          .  (not estimable)
                              1 0 1  |          .  (not estimable)
                              1 1 0  |          .  (not estimable)
                              1 1 1  |          .  (not estimable)
                              2 0 0  |          .  (not estimable)
                              2 0 1  |          .  (not estimable)
                              2 1 0  |          .  (not estimable)
                              2 1 1  |          .  (not estimable)
      ------------------------------------------------------------------------------------------------
      post is not collinear with birthyear in the original data, but treat is collinear with stateid, because it is defined as follows:
      Code:
      gen treat=0
      replace treat=1 if stateid==3|stateid==37
      However, in the underlying regression post is not dropped; only stateid 37 is omitted because of collinearity.
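This definition makes the collinearity exact. In every row, treat equals the sum of the indicators for states 3 and 37, so once i.stateid is in the model, nothing is left to separate treat from those two state effects. A minimal sketch that checks the identity on toy data (only the codes 3 and 37 come from the post; the other state codes are hypothetical):

```stata
* Hypothetical toy data; only the codes 3 and 37 come from the post
clear
input byte stateid
 1
 3
 3
 4
37
37
end

* The poster's definition of treat
gen byte treat = 0
replace treat = 1 if stateid == 3 | stateid == 37

* Indicators for the two treated states, as i.stateid would create them
gen byte d3  = (stateid == 3)
gen byte d37 = (stateid == 37)

* Holds in every row: treat is a linear combination of state dummies,
* so treat, 3.stateid and 37.stateid cannot all be estimated separately
assert treat == d3 + d37
```

-margins treat#post- asks for predictions with treat set to 0 and to 1 for every observation while stateid keeps its observed value, for example treat = 0 in state 37. Under this definition such combinations never occur, and their predictions depend on which term Stata happened to omit. That is why -margins- reports them as not estimable.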

      Next, I ran the same regression without i.stateid. The results are drastically different, and although -margins- now produces estimates, they do not make sense in this context. So I'm not sure what's going on.

      Code:
      areg imr i.treat##i.post highwealth, absorb(birthyear) cluster(cluster)
      
      Linear regression, absorbing indicators            Number of obs     = 526,108
      Absorbed variable: birthyear                       No. of categories =      13
                                                         F(4, 4468)        =   88.10
                                                         Prob > F          =  0.0000
                                                         R-squared         =  0.0094
                                                         Adj R-squared     =  0.0094
                                                         Root MSE          =  0.1760
      
                                             (Std. err. adjusted for 4,469 clusters in ym_cluster)
      --------------------------------------------------------------------------------------------
                                 |               Robust
                             imr | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      ---------------------------+----------------------------------------------------------------
                         1.treat |   .0005997   .0021089     0.28   0.776    -.0035349    .0047343
                          1.post |  -.0180113    .001762   -10.22   0.000    -.0214656   -.0145569
                                 |
                      treat#post |
                            1 1  |    .000038    .002227     0.02   0.986    -.0043281    .0044041
                                 |
                      highwealth |  -.0090304   .0005727   -15.77   0.000    -.0101533   -.0079076
                           _cons |   .0382798   .0006761    56.62   0.000     .0369543    .0396053
      --------------------------------------------------------------------------------------------
      
      . margins treat#post, at( highwealth=(0 1)) vsquish
      
      Adjusted predictions                                   Number of obs = 526,108
      Model VCE: Robust
      
      Expression: Linear prediction, predict()
      1._at: highwealth= 0
      2._at: highwealth= 1
      
      ------------------------------------------------------------------------------------------------
                                     |            Delta-method
                                     |     Margin   std. err.      t    P>|t|     [95% conf. interval]
      -------------------------------+----------------------------------------------------------------
                      _at#treat#post |
                              1 0 0  |   .0382798   .0006761    56.62   0.000     .0369543    .0396053
                              1 0 1  |   .0202685   .0014093    14.38   0.000     .0175056    .0230315
                              1 1 0  |   .0388795   .0020604    18.87   0.000     .0348402    .0429188
                              1 1 1  |   .0209062   .0015484    13.50   0.000     .0178707    .0239418
                              2 0 0  |   .0292494   .0007126    41.04   0.000     .0278523    .0306465
                              2 0 1  |   .0112381   .0014313     7.85   0.000     .0084321    .0140441
                              2 1 0  |   .0298491   .0020558    14.52   0.000     .0258187    .0338795
                              2 1 1  |   .0118758   .0015509     7.66   0.000     .0088352    .0149164
      ------------------------------------------------------------------------------------------------
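Each cell of this table can be rebuilt from the -areg- coefficients just above, and doing so shows what the table can and cannot say about heterogeneity. Because highwealth enters the model only additively, every highwealth = 1 margin equals the matching highwealth = 0 margin plus the highwealth coefficient (-.0090304). The implied difference-in-differences, (m11 - m10) - (m01 - m00), is therefore identical in both groups and equals the 1.treat#1.post coefficient. The sketch below uses the point estimates as printed. After the regression itself, _b[_cons], _b[1.treat], _b[1.post], _b[1.treat#1.post] and _b[highwealth] hold the unrounded values.

```stata
* Point estimates copied from the -areg- output without i.stateid
scalar b_cons  =  .0382798
scalar b_treat =  .0005997
scalar b_post  = -.0180113
scalar b_tp    =  .000038
scalar b_hw    = -.0090304

* Predicted imr for each treat#post cell at highwealth = 0 and 1
foreach hw in 0 1 {
    scalar m00 = b_cons                           + b_hw*`hw'
    scalar m01 = b_cons + b_post                  + b_hw*`hw'
    scalar m10 = b_cons + b_treat                 + b_hw*`hw'
    scalar m11 = b_cons + b_treat + b_post + b_tp + b_hw*`hw'
    * Difference-in-differences within this wealth group
    scalar did_`hw' = (m11 - m10) - (m01 - m00)
    display "highwealth=`hw': m11 = " %9.7f m11 "   DiD = " %9.7f did_`hw'
}
```

If the aim is to let the treatment effect differ by wealth, highwealth would have to be interacted with the DiD terms, for example i.treat##i.post##i.highwealth. Whether that suits the study design is a judgment for the poster.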



      • #4
        Well, removing i.stateid did solve the original problem. If the results do not make sense in context, I can't give you much advice, as I don't know what the context is, and from the names of the variables, I would guess that even if I did, it is beyond the scope of my knowledge. From a general statistical point of view, when you have a model like -areg imr i.treat##i.post highwealth, absorb(birthyear)-, the only place you can have a serious modeling error is with the highwealth variable. Your treat##post interaction is bulletproof: there is no way that the linearity assumptions can be violated for these dichotomous variables and their interaction. Yes, there can be issues with the distribution of residuals, but you have used cluster-robust standard errors and the sample size is quite large (thereby nullifying any potential problems with normality). So if there is a modeling error, it would be that the relationship between imr and highwealth is seriously non-linear, and you can explore this.
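Since dropping i.stateid made the margins estimable, the plot requested in #1 could be drawn roughly as follows: one -margins- run per wealth group, each followed by -marginsplot-, then -graph combine-. The simulated data at the top are purely hypothetical and exist only so the commands run end to end. With the real data, start at the -areg- line. The graph names hw0 and hw1 are arbitrary.

```stata
* Hypothetical simulated data so the example runs on its own;
* replace with the real data set
clear
set seed 2022
set obs 5000
gen int  birthyear  = 2001 + floor(11*runiform())
gen byte post       = runiform() < 0.5   // hypothetical, not tied to birthyear
gen byte treat      = runiform() < 0.3
gen byte highwealth = runiform() < 0.5
gen int  cluster    = 1 + floor(200*runiform())
gen byte imr        = runiform() < 0.03

* The specification from #3 without i.stateid
areg imr i.treat##i.post highwealth, absorb(birthyear) cluster(cluster)

* One graph per wealth group, with post on the x axis
margins treat#post, at(highwealth=0)
marginsplot, xdimension(post) title("highwealth = 0") name(hw0, replace)

margins treat#post, at(highwealth=1)
marginsplot, xdimension(post) title("highwealth = 1") name(hw1, replace)

* Place the two panels side by side on a common y axis
graph combine hw0 hw1, ycommon
```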

        But before you do that, I can't help noticing from your original data example that you have an enormous number of missing values. This means that you are actually regressing on a small fraction of the total data sample, and it is highly plausible that such a subsample is seriously biased. My intuition is that this is the most likely reason for getting results that make no sense: you are getting them from a sample that is not representative of the phenomena you are studying. In particular, I don't understand why birthyear, cluster, and post are missing so frequently. I don't know where this data comes from, but most demographic data sources and surveys have very low rates of missing information on birth year. Similarly, post is presumably something that you yourself defined in your study, and, again, what kind of data source lacks the date information needed to classify all, or nearly all, observations as post or not? Cluster, I imagine, is similar to post in being defined in your study, so it is hard for me to see why it is so often missing. I suspect that you can go back to square one and build a new data set that fills in most of these gaps. If you do that, you will, I imagine, serve your analysis much better.
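How large the loss is can be measured before anything else. The sketch below uses five rows copied from the -dataex- listing in #1. It counts how many rows the regression would exclude because at least one model variable, including the cluster variable, is missing. -misstable patterns- then shows which variables tend to be missing together. On the full data set, everything after -end- applies unchanged.

```stata
* Five rows taken from the -dataex- listing in #1
clear
input byte stateid int birthyear float(imr post treat highwealth cluster)
3    . 0 . 0 1    .
3    . . . 0 1    .
3 2008 0 1 0 1 3446
3    . . . 0 0    .
3 2004 0 0 0 1 3396
end

* Number of model variables missing in each row
egen byte nmiss = rowmiss(imr treat post highwealth stateid birthyear cluster)

* Rows with any gap are excluded from the regression (listwise deletion)
count if nmiss > 0
display "rows excluded: " r(N) " of " _N

* Which variables are missing together, and how often
misstable patterns imr post birthyear cluster
```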

