
  • A puzzle about -reg-

    Dear Statalists,

    I have a question about -reg-. It returns a result that seems counterintuitive to me, and I would appreciate it if someone could point out what I am missing.

    Outcome variable Y at the US tract level
    Regressor of interest X at the US state level
    State name variable v_StateAbb

    I confirm that X has no variation within a state:
    Code:
    tab v_StateAbb , su(X)
    
                |  Summary of AnnualTrendinCVSshares
          State |        Mean   Std. dev.       Freq.
    ------------+------------------------------------
             AK |   .37829286           0          53
             AL |  -.38800469           0         690
             AR |  -1.5852451           0         403
             AZ |    .5369724           0         755
             CA |   1.9787552           0       3,646
             CO |  -.00790215           0         583
             CT |   .87569064           0         400
             DC |   .19462891           0         104
             DE |   2.1682057           0         123
             FL |   .34600583           0       2,674
             GA |  -1.1858838           0       1,250
             HI |  -.63364446           0         121
             IA |   .51249278           0         437
             ID |  -.11874523           0         193
             IL |   1.1629089           0       1,394
             IN |  -.40260735           0         699
             KS |   .46364346           0         407
             KY |   .07007198           0         585
             LA |  -.29478988           0         631
             MA |   .65337485           0         752
             MD |   .01431732           0         678
             ME |   .87541354           0         169
             MI |  -.23156051           0       1,422
             MN |   .26225701           0         572
             MO |  -.43804073           0         747
             MS |  -.88524896           0         367
             MT |   -.0210947           0         147
             NC |   .22245559           0       1,135
             ND |   1.3994192           0         119
             NE |  -.11140289           0         279
             NH |   .49829119           0         141
             NJ |   .27578011           0       1,226
             NM |   .15972424           0         198
             NV |   1.8037741           0         319
             NY |  -.87026596           0       2,791
             OH |   .60460913           0       1,366
             OK |  -.18099365           0         522
             OR |   .32661819           0         384
             PA |  -.10615063           0       1,660
             RI |   .07889557           0         118
             SC |   .01788273           0         592
             SD |   .08945541           0         111
             TN |  -.31160012           0         753
             TX |  -.18431321           0       3,156
             UT |  -.60220224           0         329
             VA |   .14057216           0         899
             VT |  -1.0188156           0          77
             WA |   .33728275           0         692
             WI |   .02243576           0         631
             WV |  -.49865133           0         274
             WY |   .51382083           0          68
    ------------+------------------------------------
          Total |   .18485678   .81794065      37,842
    When I include both X and state dummies in a regression, I expected -reg- to drop X: by the FWL theorem, X has no variation left once the state dummies are partialled out. Instead, I get the following results.

    Code:
     reg Y X i.v_StateAbb
    note: 58.v_StateAbb omitted because of collinearity.
    
          Source |       SS           df       MS      Number of obs   =     9,655
    -------------+----------------------------------   F(50, 9604)     =      3.48
           Model |  4.90390359        50  .098078072   Prob > F        =    0.0000
        Residual |  270.859535     9,604  .028202784   R-squared       =    0.0178
    -------------+----------------------------------   Adj R-squared   =    0.0127
           Total |  275.763439     9,654  .028564682   Root MSE        =    .16794
    
    ------------------------------------------------------------------------------
               Y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
               X |   1.844638   .9464016     1.95   0.051    -.0105091    3.699785
                 |
      v_StateAbb |
             AL  |   1.452647    .801136     1.81   0.070    -.1177481    3.023043
             AR  |   3.622016   1.932963     1.87   0.061    -.1669999    7.411032
             AZ  |  -.2874431   .1004394    -2.86   0.004    -.4843256   -.0905606
             CA  |  -2.916875   1.442792    -2.02   0.043    -5.745052   -.0886981
             CO  |   .8158381   .4439045     1.84   0.066    -.0543083    1.685985
             CT  |  -.9001777   .4026835    -2.24   0.025    -1.689522   -.1108329
             DC  |    .377255    .256196     1.47   0.141    -.1249433    .8794533
             DE  |  -3.206503   1.622339    -1.98   0.048     -6.38663   -.0263755
             FL  |   .0838297   .1218526     0.69   0.491     -.155027    .3226865
             GA  |    2.89117   1.554955     1.86   0.063    -.1568707    5.939211
             HI  |   1.912112   1.033152     1.85   0.064    -.1130829    3.937307
             IA  |  -.2475502   .0877354    -2.82   0.005    -.4195302   -.0755703
             ID  |   1.166855   .5537868     2.11   0.035     .0813161    2.252394
             IL  |   -1.41285   .6723271    -2.10   0.036    -2.730753   -.0949466
             IN  |    1.45249   .8148669     1.78   0.075    -.1448211    3.049801
             KS  |  -.1574409   .0678766    -2.32   0.020    -.2904933   -.0243886
             KY  |   .5938723    .370954     1.60   0.109    -.1332758     1.32102
             LA  |   1.249468   .7132864     1.75   0.080     -.148724     2.64766
             MA  |  -.4566652   .1977022    -2.31   0.021    -.8442032   -.0691272
             MD  |   .7202919   .4227036     1.70   0.088    -.1082963     1.54888
             ME  |   -.878546   .4035683    -2.18   0.030    -1.669625    -.087467
             MI  |   1.135163   .6536229     1.74   0.082    -.1460762    2.416401
             MN  |   .2140441   .1943763     1.10   0.271    -.1669744    .5950626
             MO  |   1.521465   .8483378     1.79   0.073    -.1414564    3.184386
             MS  |   2.330777   1.270893     1.83   0.067    -.1604409    4.821995
             MT  |   .7367254   .4576467     1.61   0.107    -.1603587    1.633809
             NC  |   .3071262    .229894     1.34   0.182    -.1435145    .7577669
             ND  |  -1.883608   .8979758    -2.10   0.036     -3.64383   -.1233861
             NE  |   .9033112   .5413516     1.67   0.095    -.1578521    1.964475
             NH  |  -.1324646   .0792314    -1.67   0.095    -.2877748    .0228456
             NJ  |   .2406748   .1819778     1.32   0.186      -.11604    .5973897
             NM  |   .4031799   .2892051     1.39   0.163    -.1637232     .970083
             NV  |  -2.629496   1.277481    -2.06   0.040    -5.133628   -.1253645
             NY  |   2.360282   1.256556     1.88   0.060    -.1028333    4.823396
             OH  |  -.4121096   .1547749    -2.66   0.008     -.715501   -.1087182
             OK  |   1.031681   .6062353     1.70   0.089    -.1566682     2.22003
             OR  |   .3022176   .1411878     2.14   0.032     .0254598    .5789754
             PA  |    .915221   .5356034     1.71   0.088    -.1346748    1.965117
             RI  |   .5681525   .3628655     1.57   0.117    -.1431404    1.279446
             SC  |   .6933975   .4193797     1.65   0.098    -.1286752     1.51547
             SD  |   .5328004   .3655079     1.46   0.145    -.1836722    1.249273
             TN  |   1.296993   .7290961     1.78   0.075    -.1321894    2.726175
             TX  |   1.066651   .6090893     1.75   0.080     -.127293    2.260594
             UT  |    1.92987   1.003672     1.92   0.055    -.0375378    3.897278
             VA  |   .4708615   .3050879     1.54   0.123    -.1271752    1.068898
             VT  |   2.577159   1.398083     1.84   0.065    -.1633792    5.317697
             WA  |   .0756488    .131186     0.58   0.564    -.1815034     .332801
             WI  |   .6684756     .41537     1.61   0.108    -.1457372    1.482688
             WV  |   1.717644   .9057221     1.90   0.058    -.0577621    3.493051
             WY  |          0  (omitted)
                 |
           _cons |  -.6978133   .4359562    -1.60   0.109    -1.552379    .1567529
    ------------------------------------------------------------------------------

  • #2
    There is a collinearity among X, the constant term, and the state indicators that leaves the model unidentified. To produce any results, at least one of those variables has to go. You were expecting Stata to identify the model by removing X. Instead, it chose to remove WY. And no, that is not the usual removal of one from a group of "dummy variables": if you look carefully at your output, AK is also absent. Removal of AK is the standard omission of one from a group of "dummies"; removal of WY is the additional omission needed to identify the model.
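
    The dependence can be written out explicitly. The notation here is an assumption made for illustration and does not come from the thread: x_s is the common value of X in state s, D_{s,i} indicates that tract i lies in state s, and AK is the base category.

    ```latex
    \begin{align*}
    X_i &= \sum_{s} x_s D_{s,i}, \qquad
    D_{\mathrm{AK},i} = 1 - \sum_{s \neq \mathrm{AK}} D_{s,i} \\
    \Rightarrow\quad
    X_i &= x_{\mathrm{AK}} \cdot 1
           + \sum_{s \neq \mathrm{AK}} \bigl(x_s - x_{\mathrm{AK}}\bigr) D_{s,i}
    \end{align*}
    ```

    So X is an exact linear combination of the constant and the remaining state indicators, and one further term has to be omitted before anything can be estimated.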

    Remember that it doesn't matter which way Stata breaks the collinearity for the purpose of identifying the model. The resulting models are just different parameterizations of the same model, and all model-level results, as well as variable-level results for the variables not involved in the collinearity, are the same regardless. Crucially, regardless of which variable is omitted to break the collinearity (or whether it is broken by imposing some other linear constraint(s)), the results for the variables involved in the collinearity are meaningless and should be disregarded.

    If you want Stata to remove X instead of WY, rewrite the command as -reg Y i.v_StateAbb X-, because Stata usually breaks collinearities by removing the involved variable closest to the end of the regressor list in the command. But I can't emphasize enough that there is no statistical reason to care which variable gets removed. It is purely a matter of taste, because the effects of the variables involved in the collinearity are not estimable anyway, and the "results" for them are just artifacts of the particular way the collinearity got broken.
    Last edited by Clyde Schechter; 09 May 2025, 17:52.
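
    The point in #2 can be checked on simulated data. This is only a minimal sketch: the data are made up, and the variable names (state, x, y, yhat_a, yhat_b) are hypothetical stand-ins, not the original poster's data. It fits the model with the regressors in both orders and checks that the model-level results agree.

    ```stata
    * Simulated data: 5 hypothetical "states" with 100 observations each
    clear
    set seed 12345
    set obs 500
    generate state = ceil(_n/100)

    * x is a function of state only, so it has no within-state variation
    generate x = state^2
    generate y = rnormal()

    * Ordering used in the original post: x first, state indicators last
    regress y x i.state
    scalar r2_a = e(r2)
    predict double yhat_a

    * Reversed ordering suggested in #2: state indicators first, x last
    regress y i.state x
    scalar r2_b = e(r2)
    predict double yhat_b

    * Same model in two parameterizations: fit and R-squared should agree
    assert reldif(yhat_a, yhat_b) < 1e-6
    assert reldif(r2_a, r2_b) < 1e-6
    ```

    Each fit should report a term omitted because of collinearity, and which term that is may differ between the two orderings. The fitted values and R-squared should match, while the coefficients on x and the state indicators should not be interpreted in either fit.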



    • #3
      I'm grateful for this detailed explanation. Thank you, Professor Schechter.



      • #4
        Yugen:
        This is possibly off-topic, but why not cluster your SEs at the US state level?
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Thank you, Dr Lazzaro! I wanted to keep the regression in my post as simple as possible. My question was about coefficient estimation, so I was sloppy about the variance matrix. But yes, in serious work one should use cluster-robust variance estimates at the state level.
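
          For illustration, state-level clustering could look like the sketch below. It reuses the variable names from the first post (Y, X, v_StateAbb). Leaving out the state dummies is an assumption made here so that the coefficient on X is identified; the thread does not say which specification will finally be used.

          ```stata
          * X varies only across states, so the state dummies are left out here
          * and the standard errors are clustered at the state level instead
          regress Y X, vce(cluster v_StateAbb)
          ```

          With clustering, the standard error on X reflects the number of states rather than the number of tracts.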
