
  • Still on the margins after xtreg, fe problem

    I am estimating a regression with district and year fixed effects (hence xtreg or reghdfe) and want the marginal effect of my binary variable x, which does NOT vary over time, only across districts. However, I want to allow that marginal effect to vary over years.

    Like many others, I have found that margins after xtreg (or, it turns out, reghdfe) is tricky.

    I began with this code:
    Code:
    xtset district year
    xtreg y i.x##i.year , fe vce(bootstrap)
    margins, dydx(x) over(year)
    While I can see the interactions between x and year in my regression output, margins will not return the marginal effect of x over year ("not estimable").
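A sketch of the model being fitted may help locate the problem. The notation is introduced here for illustration only: alpha_i are district effects, gamma_t year effects, beta the main effect of x, delta_t the x-by-year interactions, and s_0 the base year (delta at s_0 is 0). What margins, dydx(x) over(year) asks for in year t is the discrete change from x = 0 to x = 1, holding the district effect fixed:

```latex
y_{it} = \alpha_i + \gamma_t + \beta x_i + \sum_{s \neq s_0} \delta_s \, x_i \, \mathbf{1}[t = s] + \varepsilon_{it}

E[y_{it} \mid x_i = 1] - E[y_{it} \mid x_i = 0] = \beta + \delta_t
```

Because x_i is constant within a district, replacing alpha_i by alpha_i + c*x_i and beta by beta - c leaves every fitted value unchanged, for any c. The data therefore cannot pin down beta, and so cannot pin down beta + delta_t either, which is one way to read the "not estimable" message. The delta_t themselves, and differences such as delta_t - delta_s, are unaffected by that shift.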

    Precisely this problem was documented here, and Jeff Pitblado explains that we must not include the i.x main effect on its own, since x does not vary within districts. He recommends that we instead run a regression like the following, manually excluding i.x but including the x-year interactions.
    Code:
    xtset district year
    xtreg y i.x#i.year i.year , fe vce(bootstrap)
    margins, dydx(x) over(year)
    For the simple nlswork dataset example discussed in that thread, margins works after this formulation. In my case, however, this does NOT solve the problem: I get the same "not estimable" error as before. And I'm apparently not alone; that thread ends with a plaintive cry from Peter Meier asking the same question I'm asking now. Given that this is a classic DiD / event-study type regression, loads of people must try it and run into this same problem. How do I solve this? (Happy to post the code + precise data if necessary.)

    Two postscripts: (1) I find precisely the same problem if I use "reghdfe y i.x#i.year i.year , absorb(district year)". (2) Although Jeff Pitblado mentions in his reply that manually excluding i.x should not change the results, I oddly find that the coefficient on one year level changes across the two specifications above. The interaction coefficients change too, but that's to be expected, since different interaction years are dropped as omitted in the two formulations. What I find odd is that all year-specific (i.year) coefficients are identical across the two formulations except one, which is tiny (0.0000133) under the second formulation but sizable (0.0437494) under the first.
    Last edited by Leah Bevis; 09 May 2023, 13:42. Reason: xtreg, margins

  • #2
    Please show the complete and exact output of your regressions, including the headers and messages that precede the table of coefficients itself. Do not edit or modify in any way; there are no unimportant details here. I strongly suspect the answer will be found in those outputs.



    • #3
      Thanks so much Clyde.

      Below is my code, exactly as I have it in my little simplified script:
      Code:
      use "${DATA}/vdsa.dta", clear
      keep if year<=2000
      xtset distcode year
      rename prop_op_marg outcome
      gen highaq= aquifer>1
      rename highaq x
      
      tab year if outcome~=.
      
      xtreg outcome i.x##i.year, fe vce(bootstrap)
          margins, dydx(x) over(year)
      
      xtreg outcome i.x#i.year i.year, fe vce(bootstrap)
          margins, dydx(x) over(year)
      Attached is the log that shows the script and output. Very excited to hear your thoughts.



      • #4
        The immediate problem is spelled out in the warnings that Stata gave you at the top of the regression output:
        Code:
        note: 0.x#1974.year identifies no observations in the sample.
        note: 1.x#1974.year omitted because of collinearity.
        Because there is not a full complement of x#year interaction terms, the margins you are trying to estimate are, in fact, not estimable. Ordinarily 0.x#1974.year would be the base category for your interaction. But because there are no observations with x = 0 and year = 1974 in the estimation sample, it cannot serve that purpose, and 1.x#1974.year gets omitted in its stead. That leaves you short one interaction term, so -margins- cannot proceed.
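The two notes can be read off directly. Suppose, as the notes imply, that every district in the 1974 estimation sample has x_i = 1. Then on that sample the 1.x#1974.year indicator is identical to the 1974.year indicator:

```latex
x_i \, \mathbf{1}[t = 1974] = \mathbf{1}[t = 1974] \qquad \text{for every observation in the estimation sample}
```

So once 1974.year is in the model, 1.x#1974.year carries no additional information and is dropped as collinear.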

        The deeper underlying cause remains for you to investigate. Why are there no observations with x = 0 and year = 1974? Are there any in the data set at all? If not, why not? It suggests something went wrong in the creation of the data set, and you will need to find and fix the error(s) that led to this condition. If there are observations in the data set with x = 0 and year = 1974, why aren't there any in the estimation sample? That is usually because of missing values on some regression variable in all such observations. Of course, since x and year are the only regressors, missing values on those cannot be the source of the problem since, if x = 0 and year = 1974, neither x nor year is missing. Perhaps there is a "structural" reason why x can never = 0 in 1974. Then you simply cannot fit this kind of model to this data and use -margins-. Probably your best bet in that circumstance would be to eliminate all 1974 observations from the data set and just run your model on data from 1975 onward.



        • #5
          Hm. It turns out this is not actually the problem. I did know that x doesn't vary within 1974 (a varying sample of districts is observed each year; there are only 10 districts in 1974, and there just happen to be none with thick aquifers), but it didn't occur to me that this was a problem, since, as you say, Stata omits the 1974 interaction from the regression for this reason. This seemed to me to take care of the problem, and within the regression it does, since I can *see* the marginal effects I want in the regression output. But if I'm understanding you correctly, margins can't handle having a 1974 level but not a 1974 interaction. Interesting; I'll bear this in mind in future, thanks a lot.

          However, omitting 1974 from the dataset altogether does not solve the problem in this case -- see attached log. So there must be another culprit. Any ideas? (Note the problem also remains if I drop 1974, 1979, and 1986, the years with the smallest N and the least variation in x.)



          • #6
            This is surprising to me. The i.x#i.year i.year model shows a sign of trouble, because 1.x#2000.year is "(omitted)". Any time you see something marked that way, -margins- will conclude that the effects you want are not estimable. But it isn't clear to me why 1.x#2000.year was omitted there. Please re-run that regression and then show also the results of -tab year x if e(sample)-.



            • #7
              Oh, you're right: since x is omitted (manually, or automatically if I use i.x##i.year), it makes no sense that the x-2000 interaction is also omitted. My first reaction was to say that the coefficient on x holds the 2000 effect, but I don't have that coefficient here. Thanks. And OK, great rule of thumb: any omitted interactions (or omitted levels?) incapacitate margins.

              I've included the -tab year x if e(sample)- at the end, you can check it out. Nothing is obviously "wrong" with the data in 2000. I'm going to keep playing with this, but any further thoughts you have would be super helpful.

              Could this be stemming from Stata's choice to use 1970 as the base year level while including x*1970 as an interaction? Mathematically I don't see the problem, but just in case...



              • #8
                Yes that -tab- output looks OK. I'm stumped at this point. If anything hits me later on, I'll get back to you.



                • #9
                  Ok -- should I submit this to the technical folks, perhaps? I've come up against similar problems with margins before, and I was talking to younger colleagues in my dept yesterday, and many of them were familiar with the same -- though of course some of that may be user error (eg. specifying in the ## way). But I do think this question represents a more generalizable problem / issue with margins, when trying to look at interactions with a variable (often time) that is also included in the model in levels.
                  Last edited by Leah Bevis; 10 May 2023, 10:10.



                  • #10
                    You might well do that at this point. When they resolve the problem, it would be great if you would post the solution back here as well.



                    • #11
                      Ok thanks, I did. One further note: Stata also omits the x-2000 interaction if I manually create and include the interaction variables rather than using i.x#i.year. And while it's tempting to think there's something off about the year 2000, this is not the case: if I drop the year 2000 from my dataset, Stata drops the x-1995 interaction in both formulations of the regression. It's always dropping the final interaction. Unlike xtreg, reghdfe does NOT drop the x-2000 interaction... but the margins are still not estimable, I think because margins is looking for the 1.x interactions, whereas reghdfe included the 0.x interactions and omitted the 1.x interactions. Log attached showing all of that.
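A possible explanation for why the final interaction is always dropped, sketched under the assumption that the regression contains a 1.x#year indicator for every one of the K years T_1, ..., T_K in the sample (including the base year, as with x*1970 mentioned earlier) alongside the district fixed effects: the interaction indicators sum to x itself.

```latex
\sum_{k=1}^{K} x_i \, \mathbf{1}[t = T_k] = x_i \sum_{k=1}^{K} \mathbf{1}[t = T_k] = x_i
```

Since x_i is constant within a district, it is absorbed by the district fixed effects (the within transformation maps it to zero), so the K interaction columns are jointly collinear with the fixed effects and one of them has to be dropped. Which one goes is a normalization choice; Stata appears to drop the last one, which would be consistent with the omission moving from 2000 to 1995 when 2000 is removed. Whichever is dropped, the remaining interaction coefficients are measured relative to it.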
                      Last edited by Leah Bevis; 10 May 2023, 12:21.



                      • #12
                        I have added a reproducible example of the problem here, to illustrate that this is a generalizable problem.

                        Code:
                        webuse nlswork.dta, clear
                        isid idcode year
                        xtset idcode year
                        
                        * all interactions included, though 1.collgrad is obviously omitted; margins fails
                        xtreg ln_wage collgrad##i.year, fe
                        margins, dydx(collgrad) over(year)
                        
                        * final year interaction omitted; margins also fails
                        xtreg ln_wage collgrad#i.year i.year , fe
                        margins, dydx(collgrad) over(year)
                        
                        * final year interaction still omitted; margins doesn't recognize collgrad
                        xtreg ln_wage c.collgrad#i.year i.year , fe
                        margins, dydx(collgrad) over(year)
                        
                        * reghdfe works perfectly in this case, doesn't drop final year interaction. 
                        * however, margins still isn't working
                        reghdfe ln_wage collgrad#i.year, absorb(idcode year)
                        margins, dydx(collgrad) over(year)
                        
                        * though when I add c., reghdfe DOES drop the last interaction + margins still fails
                        reghdfe ln_wage c.collgrad#i.year, absorb(idcode year)
                        margins, dydx(collgrad) over(year)
                        
                        * this is identical to using collgrad#i.year
                        reghdfe ln_wage i.collgrad#i.year, absorb(idcode year)
                        margins, dydx(collgrad) over(year)



                        • #13
                          Thanks for posting that. But I think it is materially different from your particular situation. The variable collgrad in nlswork is time-invariant within idcode. Your variable x, as I understand it, is not time-invariant within distcode. With a time-invariant group (treatment) variable, it is not appropriate to use a two-way fixed-effects model to do DID estimation, and when you try, as you have seen, the effects you want to estimate are not estimable. The appropriate analysis in the case of a time-invariant treatment variable is with -regress-, not -xtreg, fe-. Here's an example:
                          Code:
                          . webuse grunfeld, clear
                          
                          . xtset // SHOW -xtset- PARAMETERS
                          
                          Panel variable: company (strongly balanced)
                           Time variable: year, 1935 to 1954
                                   Delta: 1 year
                          
                          . gen byte group = company > 5 // CREATE A "TREATMENT" VARIABLE, TIME-INVARIANT
                          
                          . xtreg mvalue i.group##i.year, fe
                          note: 1.group omitted because of collinearity.
                          
                          Fixed-effects (within) regression               Number of obs     =        200
                          Group variable: company                         Number of groups  =         10
                          
                          R-squared:                                      Obs per group:
                               Within  = 0.4778                                         min =         20
                               Between = 0.3532                                         avg =       20.0
                               Overall = 0.2756                                         max =         20
                          
                                                                          F(38, 152)        =       3.66
                          corr(u_i, Xb) = 0.3177                          Prob > F          =     0.0000
                          
                          ------------------------------------------------------------------------------
                                mvalue | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                          -------------+----------------------------------------------------------------
                               1.group |          0  (omitted)
                                       |
                                  year |
                                 1936  |     660.72   178.0823     3.71   0.000     308.8839    1012.556
                                 1937  |    1151.36   178.0823     6.47   0.000     799.5239    1503.196
                                 1938  |     208.34   178.0823     1.17   0.244    -143.4962    560.1761
                                 1939  |     642.22   178.0823     3.61   0.000     290.3839    994.0562
                                 1940  |   741.1199   178.0823     4.16   0.000     389.2838    1092.956
                                 1941  |     684.46   178.0823     3.84   0.000     332.6239    1036.296
                                 1942  |     282.88   178.0823     1.59   0.114     -68.9561    634.7162
                                 1943  |     468.22   178.0823     2.63   0.009     116.3839    820.0561
                                 1944  |     515.96   178.0823     2.90   0.004     164.1238    867.7961
                                 1945  |     714.64   178.0823     4.01   0.000     362.8038    1066.476
                                 1946  |     823.38   178.0823     4.62   0.000     471.5438    1175.216
                                 1947  |     324.24   178.0823     1.82   0.071    -27.59615    676.0761
                                 1948  |     243.46   178.0823     1.37   0.174    -108.3761    595.2961
                                 1949  |     287.98   178.0823     1.62   0.108    -63.85614    639.8161
                                 1950  |     358.08   178.0823     2.01   0.046     6.243892    709.9162
                                 1951  |      778.3   178.0823     4.37   0.000     426.4639    1130.136
                                 1952  |     812.74   178.0823     4.56   0.000     460.9038    1164.576
                                 1953  |    1171.56   178.0823     6.58   0.000     819.7239    1523.396
                                 1954  |    1070.24   178.0823     6.01   0.000     718.4039    1422.076
                                       |
                            group#year |
                               1 1936  |   -577.234   251.8464    -2.29   0.023    -1074.805   -79.66261
                               1 1937  |  -1013.082   251.8464    -4.02   0.000    -1510.653   -515.5106
                               1 1938  |   -137.338   251.8464    -0.55   0.586    -634.9094    360.2335
                               1 1939  |   -538.874   251.8464    -2.14   0.034    -1036.445    -41.3026
                               1 1940  |   -629.148   251.8464    -2.50   0.014    -1126.719   -131.5765
                               1 1941  |   -607.966   251.8464    -2.41   0.017    -1105.537   -110.3946
                               1 1942  |    -219.01   251.8464    -0.87   0.386    -716.5815    278.5614
                               1 1943  |   -361.054   251.8464    -1.43   0.154    -858.6254    136.5174
                               1 1944  |   -391.318   251.8464    -1.55   0.122    -888.8894    106.2535
                               1 1945  |   -564.012   251.8464    -2.24   0.027    -1061.583   -66.44053
                               1 1946  |   -653.774   251.8464    -2.60   0.010    -1151.345   -156.2025
                               1 1947  |   -208.828   251.8464    -0.83   0.408    -706.3994    288.7435
                               1 1948  |   -107.902   251.8464    -0.43   0.669    -605.4734    389.6694
                               1 1949  |   -155.794   251.8464    -0.62   0.537    -653.3654    341.7774
                               1 1950  |   -176.992   251.8464    -0.70   0.483    -674.5635    320.5794
                               1 1951  |   -555.374   251.8464    -2.21   0.029    -1052.945   -57.80257
                               1 1952  |   -531.658   251.8464    -2.11   0.036    -1029.229   -34.08653
                               1 1953  |  -802.5001   251.8464    -3.19   0.002    -1300.072   -304.9286
                               1 1954  |   -679.538   251.8464    -2.70   0.008    -1177.109   -181.9666
                                       |
                                 _cons |    707.471   89.04115     7.95   0.000     531.5529    883.3891
                          -------------+----------------------------------------------------------------
                               sigma_u |   1210.177
                               sigma_e |  281.57285
                                   rho |  .94864447   (fraction of variance due to u_i)
                          ------------------------------------------------------------------------------
                          F test that all u_i=0: F(9, 152) = 294.68                    Prob > F = 0.0000
                          
                          . margins, dydx(group) over(year) // FAIL!
                          
                          Average marginal effects                                   Number of obs = 200
                          Model VCE: Conventional
                          
                          Expression: Linear prediction, predict()
                          dy/dx wrt:  1.group
                          Over:       year
                          
                          ------------------------------------------------------------------------------
                                       |            Delta-method
                                       |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
                          -------------+----------------------------------------------------------------
                          0.group      |  (base outcome)
                          -------------+----------------------------------------------------------------
                          1.group      |
                                  year |
                                 1935  |          .  (not estimable)
                                 1936  |          .  (not estimable)
                                 1937  |          .  (not estimable)
                                 1938  |          .  (not estimable)
                                 1939  |          .  (not estimable)
                                 1940  |          .  (not estimable)
                                 1941  |          .  (not estimable)
                                 1942  |          .  (not estimable)
                                 1943  |          .  (not estimable)
                                 1944  |          .  (not estimable)
                                 1945  |          .  (not estimable)
                                 1946  |          .  (not estimable)
                                 1947  |          .  (not estimable)
                                 1948  |          .  (not estimable)
                                 1949  |          .  (not estimable)
                                 1950  |          .  (not estimable)
                                 1951  |          .  (not estimable)
                                 1952  |          .  (not estimable)
                                 1953  |          .  (not estimable)
                                 1954  |          .  (not estimable)
                          ------------------------------------------------------------------------------
                          Note: dy/dx for factor levels is the discrete change from the base level.
                          
                          .
                          . regress mvalue i.group##i.year
                          
                                Source |       SS           df       MS      Number of obs   =       200
                          -------------+----------------------------------   F(39, 160)      =      2.32
                                 Model |   124324337        39  3187803.52   Prob > F        =    0.0001
                              Residual |   219513951       160  1371962.19   R-squared       =    0.3616
                          -------------+----------------------------------   Adj R-squared   =    0.2060
                                 Total |   343838288       199   1727830.6   Root MSE        =    1171.3
                          
                          ------------------------------------------------------------------------------
                                mvalue | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                          -------------+----------------------------------------------------------------
                               1.group |  -1059.738   740.8002    -1.43   0.155    -2522.745    403.2694
                                       |
                                  year |
                                 1936  |     660.72   740.8002     0.89   0.374    -802.2874    2123.727
                                 1937  |    1151.36   740.8002     1.55   0.122    -311.6473    2614.367
                                 1938  |     208.34   740.8002     0.28   0.779    -1254.667    1671.347
                                 1939  |     642.22   740.8002     0.87   0.387    -820.7874    2105.227
                                 1940  |   741.1199   740.8002     1.00   0.319    -721.8874    2204.127
                                 1941  |     684.46   740.8002     0.92   0.357    -778.5474    2147.467
                                 1942  |     282.88   740.8002     0.38   0.703    -1180.127    1745.887
                                 1943  |     468.22   740.8002     0.63   0.528    -994.7874    1931.227
                                 1944  |     515.96   740.8002     0.70   0.487    -947.0474    1978.967
                                 1945  |     714.64   740.8002     0.96   0.336    -748.3674    2177.647
                                 1946  |     823.38   740.8002     1.11   0.268    -639.6274    2286.387
                                 1947  |     324.24   740.8002     0.44   0.662    -1138.767    1787.247
                                 1948  |     243.46   740.8002     0.33   0.743    -1219.547    1706.467
                                 1949  |     287.98   740.8002     0.39   0.698    -1175.027    1750.987
                                 1950  |     358.08   740.8002     0.48   0.629    -1104.927    1821.087
                                 1951  |      778.3   740.8002     1.05   0.295    -684.7074    2241.307
                                 1952  |     812.74   740.8002     1.10   0.274    -650.2674    2275.747
                                 1953  |    1171.56   740.8002     1.58   0.116    -291.4473    2634.567
                                 1954  |    1070.24   740.8002     1.44   0.150    -392.7674    2533.247
                                       |
                            group#year |
                               1 1936  |   -577.234    1047.65    -0.55   0.582    -2646.239    1491.771
                               1 1937  |  -1013.082    1047.65    -0.97   0.335    -3082.087    1055.923
                               1 1938  |   -137.338    1047.65    -0.13   0.896    -2206.343    1931.667
                               1 1939  |   -538.874    1047.65    -0.51   0.608    -2607.879    1530.131
                               1 1940  |   -629.148    1047.65    -0.60   0.549    -2698.153    1439.857
                               1 1941  |   -607.966    1047.65    -0.58   0.563    -2676.971    1461.039
                               1 1942  |    -219.01    1047.65    -0.21   0.835    -2288.015    1849.995
                               1 1943  |   -361.054    1047.65    -0.34   0.731    -2430.059    1707.951
                               1 1944  |   -391.318    1047.65    -0.37   0.709    -2460.323    1677.687
                               1 1945  |   -564.012    1047.65    -0.54   0.591    -2633.017    1504.993
                               1 1946  |   -653.774    1047.65    -0.62   0.533    -2722.779    1415.231
                               1 1947  |   -208.828    1047.65    -0.20   0.842    -2277.833    1860.177
                               1 1948  |   -107.902    1047.65    -0.10   0.918    -2176.907    1961.103
                               1 1949  |   -155.794    1047.65    -0.15   0.882    -2224.799    1913.211
                               1 1950  |   -176.992    1047.65    -0.17   0.866    -2245.997    1892.013
                               1 1951  |   -555.374    1047.65    -0.53   0.597    -2624.379    1513.631
                               1 1952  |   -531.658    1047.65    -0.51   0.613    -2600.663    1537.347
                               1 1953  |  -802.5001    1047.65    -0.77   0.445    -2871.505    1266.505
                               1 1954  |   -679.538    1047.65    -0.65   0.518    -2748.543    1389.467
                                       |
                                 _cons |    1237.34   523.8248     2.36   0.019     202.8376    2271.842
                          ------------------------------------------------------------------------------
                          
                          . margins, dydx(group) over(year)
                          
                          Average marginal effects                                   Number of obs = 200
                          Model VCE: OLS
                          
                          Expression: Linear prediction, predict()
                          dy/dx wrt:  1.group
                          Over:       year
                          
                          ------------------------------------------------------------------------------
                                       |            Delta-method
                                       |      dy/dx   std. err.      t    P>|t|     [95% conf. interval]
                          -------------+----------------------------------------------------------------
                          0.group      |  (base outcome)
                          -------------+----------------------------------------------------------------
                          1.group      |
                                  year |
                                 1935  |  -1059.738   740.8002    -1.43   0.155    -2522.745    403.2694
                                 1936  |  -1636.972   740.8002    -2.21   0.029    -3099.979   -173.9646
                                 1937  |   -2072.82   740.8002    -2.80   0.006    -3535.827   -609.8126
                                 1938  |  -1197.076   740.8002    -1.62   0.108    -2660.083    265.9314
                                 1939  |  -1598.612   740.8002    -2.16   0.032    -3061.619   -135.6046
                                 1940  |  -1688.886   740.8002    -2.28   0.024    -3151.893   -225.8786
                                 1941  |  -1667.704   740.8002    -2.25   0.026    -3130.711   -204.6966
                                 1942  |  -1278.748   740.8002    -1.73   0.086    -2741.755    184.2594
                                 1943  |  -1420.792   740.8002    -1.92   0.057    -2883.799     42.2154
                                 1944  |  -1451.056   740.8002    -1.96   0.052    -2914.063    11.95143
                                 1945  |   -1623.75   740.8002    -2.19   0.030    -3086.757   -160.7426
                                 1946  |  -1713.512   740.8002    -2.31   0.022    -3176.519   -250.5046
                                 1947  |  -1268.566   740.8002    -1.71   0.089    -2731.573    194.4414
                                 1948  |   -1167.64   740.8002    -1.58   0.117    -2630.647    295.3674
                                 1949  |  -1215.532   740.8002    -1.64   0.103    -2678.539    247.4754
                                 1950  |   -1236.73   740.8002    -1.67   0.097    -2699.737    226.2774
                                 1951  |  -1615.112   740.8002    -2.18   0.031    -3078.119   -152.1046
                                 1952  |  -1591.396   740.8002    -2.15   0.033    -3054.403   -128.3886
                                 1953  |  -1862.238   740.8002    -2.51   0.013    -3325.245   -399.2307
                                 1954  |  -1739.276   740.8002    -2.35   0.020    -3202.283   -276.2686
                          ------------------------------------------------------------------------------
                          Note: dy/dx for factor levels is the discrete change from the base level.
                          Being time-invariant, group is a between-panel effect and it has no within-panel effect. Therefore its effect is not estimable in a fixed-effects (within-panel effects only) model.
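The two sets of estimates above can be checked against each other. After -regress-, each row of the -margins- table is the main effect plus the corresponding interaction (first line below, for 1936). The -xtreg, fe- run reports the same group#year and year coefficients but no estimate of the main effect, so that sum is unavailable there, while year-to-year differences in the group effect (second line) are:

```latex
\widehat{\beta} + \widehat{\delta}_{1936} = -1059.738 + (-577.234) = -1636.972

\widehat{\delta}_{1937} - \widehat{\delta}_{1936} = -1013.082 - (-577.234) = -435.848
```

The second number matches the gap between the 1937 and 1936 rows of the -margins- output after -regress- (-2072.82 - (-1636.972) = -435.848).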

                          If your x-variable is time-invariant, then this is the source of your problem. But I had the impression that it is not.



                          • #14
                            My variable, x, is indeed time-invariant within distcode, just as collgrad is time-invariant within the nls idcode. In this case, x is aquifer depth within Indian districts; that is fixed within a district, so you obviously wouldn't include aquifer depth on its own in a district fixed-effects model. But the *impact* of aquifer coverage on my outcome may change over time, which is why the time interaction is interesting. This is not an uncommon use of fixed effects among economists: e.g., the ReSTAT paper by Bhalotra and Rawlings, "Gradients of the intergenerational transmission of health in developing countries", which examines changes in the association between (mother-fixed) maternal health and child health over time, using maternal fixed effects.

                            In sum, this use of fixed effects is both theoretically acceptable and fairly widespread. I just didn't explain what I was doing explicitly at the top of my post, as perhaps I ought to have. Why xtreg, fe drops one of the time-x interactions I'm not sure; it may be theoretically necessary, but I thought about the math a bit, and I don't think so. (Also, as you can see in my reproducible example, reghdfe does not drop that last time-x interaction unless x is specified as continuous.) If it were theoretically necessary, it would be because that particular interaction was implicitly accounted for by other level variables, such as the time levels. No matter what, the marginal effects are theoretically present.
                            Last edited by Leah Bevis; 11 May 2023, 09:49.



                            • #15
                              You get more stability with reghdfe (SSC) if you estimate the coefficients on the time variable rather than absorbing them. Have you tried this?

                              Code:
                              webuse nlswork, clear
                              reghdfe ln_wage i.collgrad##i.year, absorb(idcode)
                              margins, dydx(collgrad) over(year)
