  • Why does Stata override the base level?

    Code:
            // TWFE regression
            fvset base $baseYear year
            reg `o' ib2010.year#c.Z ib2010.year i.$geo [aw=L${baseYear}], cluster($geo)
    I have seen people say this happens because I did not include the main effect of Z, but I don't want to do that. How can I force Stata to do what I am asking?

  • #2
    Can you explain what it is you want to do?



    • #3
      Stata doesn't know that you really want to omit all levels of a factor, and it is a sensible default not to do so unless explicitly instructed, because doing so is most often an error.

      Do you have a valid statistical reason to exclude the main effect of -Z-? If not, your model may not be interpretable.

      Compare the output of these two completely absurd, but equivalent, models to see how Stata handles the default case, and convince yourself they are the same model.

      Code:
      . reg price ib3.rep78##c.trunk
      
            Source |       SS           df       MS      Number of obs   =        69
      -------------+----------------------------------   F(9, 59)        =      1.21
             Model |  89992665.2         9  9999185.03   Prob > F        =    0.3053
          Residual |   486804294        59  8250920.23   R-squared       =    0.1560
      -------------+----------------------------------   Adj R-squared   =    0.0273
             Total |   576796959        68  8482308.22   Root MSE        =    2872.4
      
      -------------------------------------------------------------------------------
              price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      --------------+----------------------------------------------------------------
              rep78 |
                 1  |   5304.142    11917.1     0.45   0.658    -18541.92    29150.21
                 2  |   1526.613   4074.293     0.37   0.709    -6626.029    9679.255
                 4  |   3625.523   3009.916     1.20   0.233    -2397.305    9648.351
                 5  |   -2802.15   4634.127    -0.60   0.548    -12075.02    6470.716
                    |
              trunk |   332.4264   148.5569     2.24   0.029     35.16469     629.688
                    |
      rep78#c.trunk |
                 1  |  -578.7597   1362.207    -0.42   0.672    -3304.529     2147.01
                 2  |  -121.3617   263.6709    -0.46   0.647    -648.9658    406.2425
                 4  |  -251.5533   198.8251    -1.27   0.211    -649.4014    146.2947
                 5  |    310.197   372.4613     0.83   0.408    -435.0963     1055.49
                    |
              _cons |   1354.191   2327.813     0.58   0.563    -3303.752    6012.134
      -------------------------------------------------------------------------------
      
      . reg price ib3.rep78#c.trunk i.rep78
      
            Source |       SS           df       MS      Number of obs   =        69
      -------------+----------------------------------   F(9, 59)        =      1.21
             Model |  89992665.2         9  9999185.03   Prob > F        =    0.3053
          Residual |   486804294        59  8250920.23   R-squared       =    0.1560
      -------------+----------------------------------   Adj R-squared   =    0.0273
             Total |   576796959        68  8482308.22   Root MSE        =    2872.4
      
      -------------------------------------------------------------------------------
              price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      --------------+----------------------------------------------------------------
      rep78#c.trunk |
                 1  |  -246.3333   1354.082    -0.18   0.856    -2955.845    2463.178
                 2  |   211.0647   217.8375     0.97   0.337    -224.8271    646.9565
                 3  |   332.4264   148.5569     2.24   0.029     35.16469     629.688
                 4  |   80.87302   132.1449     0.61   0.543    -183.5482    345.2943
                 5  |   642.6234   341.5527     1.88   0.065    -40.82201    1326.069
                    |
              rep78 |
                 1  |   5304.142    11917.1     0.45   0.658    -18541.92    29150.21
                 2  |   1526.613   4074.293     0.37   0.709    -6626.029    9679.255
                 4  |   3625.523   3009.916     1.20   0.233    -2397.305    9648.351
                 5  |   -2802.15   4634.127    -0.60   0.548    -12075.02    6470.716
                    |
              _cons |   1354.191   2327.813     0.58   0.563    -3303.752    6012.134
      -------------------------------------------------------------------------------
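      One way to confirm the equivalence numerically is to recover a level-specific slope from the first parameterization with -lincom-. This is a sketch that assumes the models above were run on the auto dataset shipped with Stata. Under the full-factorial model, the trunk slope for rep78 == 1 is the main effect plus the level-1 interaction, 332.4264 + (-578.7597) = -246.3333, which should match the level-1 row of rep78#c.trunk (coefficient and standard error) reported directly by the second model.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Full-factorial parameterization with level 3 of rep78 as the base
regress price ib3.rep78##c.trunk

* Slope of trunk for rep78 == 1: main effect plus the level-1 interaction.
* This should reproduce the level-1 rep78#c.trunk row of the second model.
lincom trunk + 1.rep78#c.trunk
```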
      If you really want to omit a level of an interaction, you need the 'o' prefix, which stands for "omit". Example below; see -help fvvarlist- for details.

      Code:
      * now level 3 is omitted from the interaction.
      reg price i.rep78#c.trunk ib3.rep78 o3.rep78#oc.trunk



      • #4
        The way to program this is to use "o.Z", that is:

        Code:
                 // TWFE regression
                 fvset base $baseYear year
                 reg `o' ib2010.year#c.Z ib2010.year i.$geo o.Z [aw=L${baseYear}], cluster($geo)
        The reason I do not include the main effect is that it is captured by the unit fixed effects. This is commonly done in DiD regressions: you have unit fixed effects, time fixed effects, and then interactions between the treatment dummy and the time fixed effects. I don't want to include the base level because it is collinear with the unit fixed effects.
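        If you would rather not rely on the factor-variable operators at all, one alternative (a swapped-in technique, not the approach suggested in this thread) is to build the year-by-Z interactions by hand and simply never create the base-year term. The sketch below simulates a small hypothetical panel so it runs on its own; the names y, Z, geo, year, Zx* and the 2010 base year are assumptions standing in for the poster's `o', Z, $geo and $baseYear, and the analytic weights are left out for brevity.

```stata
* Hypothetical simulated panel so the sketch runs on its own:
* 50 units (geo), years 2005-2015, time-invariant treatment intensity Z
clear
set seed 12345
set obs 50
generate geo = _n
generate Z = runiform()
expand 11
bysort geo: generate year = 2004 + _n
generate y = rnormal() + Z*(year >= 2010)

* Build one Z-by-year interaction per year, skipping the base year 2010,
* so there is no base level for Stata to reassign
levelsof year, local(years)
foreach t of local years {
    if `t' != 2010 {
        generate double Zx`t' = Z*(year == `t')
    }
}

* Event-study TWFE: year and unit fixed effects plus the hand-built interactions
regress y Zx* i.year i.geo, vce(cluster geo)
```

        Because Zx2010 is never created, the remaining interactions are not collinear with the unit fixed effects, and each coefficient is read relative to 2010. The trade-off is that postestimation commands such as -margins- no longer know these variables are interactions of year and Z.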

        Here is an example of a paper that runs such a regression: https://economics.mit.edu/sites/defa...lity_draft.pdf



        • #5
          Thanks for your answer on the o. category!

          Originally posted by Leonardo Guizzetti
          Stata doesn't know that you really want to omit all levels of a factor, and it's a sensible default not to do so unless explicitly instructed to do so. This is because it's most often an error.
          I don't share this position. Stata lets me run a regression that is presumably inappropriate, and it doesn't even give me any kind of warning or error message! Instead, it adds another problem by setting the base category to a different level than the one the user specified. That is very weird behavior, in my opinion. But it is good that there is at least a manual way to prevent it.



          • #6
            I tend to agree with Henry Strawforrd on this one. If the user specifies a base group, Stata should respect it. Whether the main effect of the interacted variable is included or not should not matter.

            Note that running a regression without an interaction is essentially a special case of a regression where the factor variable is interacted with the intercept. But even without the intercept (the "main effect" in this case), Stata respects the choice of the base group, even though no base group is needed:
            Code:
            reg price ib3.rep78, nocons
            In my opinion, this should be brought to the attention of Stata's tech support.
            https://www.kripfganz.de/stata/

