  • Omission of Post variable in panel data difference-in-differences model with time-fixed effects

    Hi all,
    I am studying the ex-ante impact of a bankruptcy reform that took place in 2016. I want to know whether the reform had any treatment effect on the debt of firms in the treated group (I assign group membership based on the baseline period, and a firm remains in that group in all years, including the endline period). In this setting it is possible that the reform also affects the control-group firms.


    For this problem I have seen two models in the literature:

    1. Debt_{it} = b0 + b1 post_t*treated_i + b2 post_t + b3 control_{i,t-1} + firm fixed effect + time fixed effect + E_{it} (very few)

    2. Debt_{it} = b0 + b1 post_t*treated_i + b2 control_{i,t-1} + firm fixed effect + time fixed effect + E_{it} (popular one)


    Since there is a firm fixed effect, dropping the treated variable seems right. The time fixed effect should already capture the effect of post (or at least be very highly correlated with it), so it seems better to drop post: if we instead drop the time fixed effect and keep post, then the non-reform time-related effect may not be controlled and hence can impact the results. So it seems that the popular one is right.
    1. Please confirm which is correct, and whether it makes any difference for a balanced versus an unbalanced panel.
    2. Is it common that when I observe the summary table I find that post-reform the control group shows increased debt whereas the treated group shows decreased debt, but using the second model I observe a positive sign for b1? I understand that many times after controlling for variables, firm fixed effects and time fixed effects the results may change. But it seems a little weird to observe the opposite result.
    3. While running the subsample analysis
    Debt_{it} = b0 + b1 post_t + b2 control_{i,t-1} + firm fixed effect + E_{it} (if control)
    Debt_{it} = b0 + b1 post_t + b2 control_{i,t-1} + firm fixed effect + E_{it} (if treated)


    b1 is insignificant for the control group but negative and significant for the treated group.

    Thanks and Regards
    Last edited by Pranshu Tripathi; 23 Aug 2025, 13:21.

  • #2
    The two models you show are actually equivalent. In model 1, either the post variable or one of the time indicators will always be dropped to break the colinearity among them. I think the popularity of the second model arises from the fact that the output resembles the input directly and you are not forced to think about the resolution of that colinearity, nor to explain the omission of one of the variables when you report your results.

    ...if we drop time fixed effect and keep the post then the non reform time related effect may not be controlled and hence can impact results.
    This is not true. It is true that you get different coefficients for the time indicators depending on whether you drop post or one of the time indicators, and which time indicators in the latter case. But because the time indicators and post variable and constant term are all colinear, all of the information needed to adjust for (not "control"--in an observational study you cannot control anything) those effects is contained in any subset that omits one. In fact, if you try it both ways and run -predict- after each, you will see that regardless of what variable you choose to omit, the predicted values from the models are all the same--so the adjustment is just fine regardless.
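
    A minimal sketch of that check on simulated data. Everything here is made up for illustration (the data set, the variable names y, did, post, treat, firm, t, and all parameter values); substitute your own variables and covariates.

    ```stata
    * Simulated balanced panel: 50 firms, 8 periods, reform from period 5 on
    clear
    set seed 12345
    set obs 50
    gen firm = _n
    gen byte treat = mod(firm, 2)
    expand 8
    bysort firm: gen t = _n
    gen byte post = t >= 5
    gen byte did = post * treat
    gen y = 0.5*treat + 0.02*t - 0.1*did + rnormal(0, 0.1)
    xtset firm t

    * (a) interaction only; the time indicators already span post
    xtreg y did i.t, fe vce(robust)
    scalar b_a = _b[did]
    predict double yhat_a, xbu

    * (b) post entered explicitly; Stata omits post or one time indicator
    xtreg y did post i.t, fe vce(robust)
    scalar b_b = _b[did]
    predict double yhat_b, xbu

    * Same interaction coefficient and same fitted values either way
    assert reldif(b_a, b_b) < 1e-6
    assert reldif(yhat_a, yhat_b) < 1e-6
    ```

    If both assert lines pass silently, the two specifications agree on the interaction coefficient and on every fitted value, whichever variable Stata chose to omit in (b).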

    when I observe the summary table I find that post-reform the control group shows increased debt whereas the treated group shows decreased debt, but using the second model I observe a positive sign for b1. I understand that many times after controlling for variables, firm fixed effects and time fixed effects the results may change. But it seems a little weird to observe the opposite result.
    To get this cleared up you will have to show the specific "summary table" you are referring to, and also clarify how the treated and post variables are coded. That said, if the summary table is just something like a tabulation of the mean debt levels in the treated and untreated groups before and after the change, there is nothing at all weird about the sign being in the opposite direction. Conditional inference (regression model) is different from marginal inference (simple results table), and the difference can include opposite signs. This phenomenon is known as Simpson's paradox (usually applied when discussing discrete variables) or Lord's paradox (in the context of regression modeling of continuous outcomes). Wikipedia has good explanations of these.



    • #3
      Dear Clyde,
      This is the simple table with classification

      total pre
      Variable      Obs    Mean    Std. Dev.  Min     Max
      td (dep var)  2855   .413    .201       .011    1.366
      psg           2855   .179    .375       -.565   1.617
      psize         2855   7.141   1.758      2.262   14.328
      ptang         2855   .298    .2         .002    .786
      pliq          2855   .073    .194       -.745   .732
      pRoA          2855   .07     .053       -.176   .251
      pndts         2855   .027    .021       .001    .089
      Post          2855   0       0          0       0
      Treat         2855   .466    .499       0       1

      total post
      Variable      Obs    Mean    Std. Dev.  Min     Max
      td (dep var)  3674   .395    .21        .011    1.366
      psg           3674   .092    .335       -.565   1.617
      psize         3674   7.573   1.763      3.336   14.824
      ptang         3674   .279    .195       .002    .786
      pliq          3674   .082    .218       -.745   .732
      pRoA          3674   .063    .065       -.176   .251
      pndts         3674   .026    .02        .001    .089
      Post          3674   1       0          1       1
      Treat         3674   .505    .5         0       1

      control pre
      Variable      Obs    Mean    Std. Dev.  Min     Max
      td (dep var)  1525   .27     .125       .011    1.366
      psg           1525   .15     .408       -.565   1.617
      psize         1525   7.238   1.843      3.118   13.832
      ptang         1525   .286    .201       .002    .786
      pliq          1525   .078    .208       -.745   .732
      pRoA          1525   .043    .048       -.176   .251
      pndts         1525   .025    .02        .001    .089
      Post          1525   0       0          0       0
      Treat         1525   0       0          0       0

      control post
      Variable      Obs    Mean    Std. Dev.  Min     Max
      td (dep var)  1817   .293    .172       .011    1.366
      sb            1817   .189    .131       .005    .712
      psg           1817   .082    .368       -.565   1.617
      psize         1817   7.614   1.89       3.346   14.135
      ptang         1817   .283    .2         .002    .786
      pliq          1817   .061    .231       -.745   .732
      pRoA          1817   .034    .061       -.176   .251
      pndts         1817   .024    .019       .001    .089
      Post          1817   1       0          1       1
      Treat         1817   0       0          0       0

      treat pre
      Variable      Obs    Mean    Std. Dev.  Min     Max
      td (dep var)  1330   .578    .134       .146    1.366
      psg           1330   .212    .33        -.565   1.617
      psize         1330   7.03    1.65       2.262   14.328
      ptang         1330   .312    .197       .002    .786
      pliq          1330   .068    .178       -.745   .712
      pRoA          1330   .101    .042       -.176   .251
      pndts         1330   .029    .021       .001    .089
      Post          1330   0       0          0       0
      Treat         1330   1       0          1       1

      treat post
      Variable      Obs    Mean    Std. Dev.  Min     Max
      td (dep var)  1857   .494    .194       .019    1.366
      psg           1857   .103    .298       -.565   1.617
      psize         1857   7.533   1.628      3.336   14.824
      ptang         1857   .276    .19        .002    .786
      pliq          1857   .103    .202       -.745   .714
      pRoA          1857   .091    .056       -.176   .251
      pndts         1857   .028    .02        .001    .089
      Post          1857   1       0          1       1
      Treat         1857   1       0          1       1
      It shows the means of the variables. I understand that after adjusting for the other predictors the result may change. But consider what happens when I run these commands:

      1. xtreg td i.Post#i.Treat sg psize ptang pliq pRoA pndts i.t, fe vce(robust)
      2. xtreg td i.Post##i.Treat sg psize ptang pliq pRoA pndts i.t, fe vce(robust)
      (In both, Treat is a time-invariant group dummy, treated versus control, assigned using only the baseline period, and Post is the pre/post policy dummy.)

      The correlation between Post and Year is 0.83
      The results for these models are as follows
                          (1)           (2)
      VARIABLES           non Post      with Post
      1.Post                            -0.0468***
                                        (0.0123)
      0b.Post#0b.Treat    0.0000        0.0000
                          (0.0000)      (0.0000)
      0b.Post#1o.Treat                  0.0000
                                        (0.0000)
      1o.Post#0b.Treat                  0.0000
                                        (0.0000)
      1.Post#1.Treat                    -0.0947***
                                        (0.0086)
      0b.Post#1.Treat     0.0947***
                          (0.0086)
      sg                  -0.0191***    -0.0191***
                          (0.0041)      (0.0041)
      psize               -0.0030       -0.0030
                          (0.0081)      (0.0081)
      ptang               0.0230        0.0230
                          (0.0287)      (0.0287)
      pliq                -0.1945***    -0.1945***
                          (0.0256)      (0.0256)
      pRoA                -0.4736***    -0.4736***
                          (0.0442)      (0.0442)
      pndts               0.2112        0.2112
                          (0.2031)      (0.2031)
      5.t                 -0.0055       -0.0055
                          (0.0051)      (0.0051)
      6.t                 -0.0173***    -0.0173***
                          (0.0055)      (0.0055)
      7.t                 -0.0205***    -0.0205***
                          (0.0066)      (0.0066)
      8.t                 -0.0301***    -0.0301***
                          (0.0072)      (0.0072)
      9.t                 -0.0379***    -0.0379***
                          (0.0077)      (0.0077)
      10.t                -0.0066       0.0402***
                          (0.0085)      (0.0082)
      11.t                -0.0113       0.0355***
                          (0.0090)      (0.0077)
      12.t                -0.0063       0.0405***
                          (0.0097)      (0.0069)
      13.t                -0.0136       0.0332***
                          (0.0102)      (0.0066)
      14.t                -0.0353***    0.0115*
                          (0.0106)      (0.0059)
      15.t                -0.0330***    0.0138***
                          (0.0115)      (0.0051)
      16.t                -0.0314***    0.0155***
                          (0.0121)      (0.0041)
      17.t                -0.0468***
                          (0.0123)
      Constant            0.4316***     0.4778***
                          (0.0589)      (0.0575)
      Observations        10,589        10,589
      R-squared           0.2233        0.2233
      Number of firms     906           906
      firm fixed effect   YES           YES
      Time fixed effect   YES           YES

      Robust standard errors in parentheses
      *** p<0.01, ** p<0.05, * p<0.1





      In the first model, where I use only i.Post#i.Treat, I get a positive coefficient, 0.0947***, but in the second model, where I use i.Post##i.Treat, the coefficient is -0.0947***. I am stuck on this. Even if 17.t is omitted in model 2 because it is captured by i.Post, why does the sign change?
      Last edited by Pranshu Tripathi; 25 Aug 2025, 16:08.



      • #4
        Thanks for the extra detail. I had not understood from your original post what you were driving at; I thought you were concerned about a sign flip attributable to the presence or absence of covariates. But you have covariates in both models. So Simpson's/Lord's paradox is not the issue here.
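
        For reference, a rough check on the raw means of td in the summary tables of #3 points the same way as the 1.Post#1.Treat coefficient. This is only the unadjusted difference-in-differences of the four cell means, and those tables cover 6,529 observations whereas the regressions use 10,589, so the two are not strictly comparable:

        ```latex
        \begin{align*}
        \widehat{\mathrm{DiD}}_{\text{raw}}
          &= (\bar{y}_{\text{treat,post}} - \bar{y}_{\text{treat,pre}})
           - (\bar{y}_{\text{control,post}} - \bar{y}_{\text{control,pre}}) \\
          &= (0.494 - 0.578) - (0.293 - 0.270) \\
          &= -0.084 - 0.023 = -0.107
        \end{align*}
        ```

        The raw figure is negative, like the -0.0947 reported for 1.Post#1.Treat.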

        You are comparing two different things. The coefficient in (1) is for 0.Post#1.Treat and the one in (2) is for 1.Post#1.Treat. So in (1) you are looking at the difference 0.Post - 1.Post, and in (2) you are looking at the difference 1.Post - 0.Post. So you are getting exactly what would be expected from that: a sign flip.
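
        To spell out the algebra, here is a sketch in my own notation (not from the output): \alpha_i is the firm fixed effect and \beta is the coefficient on 0.Post#1.Treat in (1). The indicator for 0.Post#1.Treat is (1 - Post_t) Treat_i, so

        ```latex
        \begin{align*}
        \beta\,(1 - \mathit{Post}_t)\,\mathit{Treat}_i
          &= \beta\,\mathit{Treat}_i - \beta\,\mathit{Post}_t\,\mathit{Treat}_i
        \end{align*}
        ```

        The term \beta Treat_i is constant within firm and is absorbed into \alpha_i, which leaves -\beta on Post_t Treat_i, that is, on 1.Post#1.Treat. With \beta = 0.0947 in (1), the coefficient in (2) is -0.0947, as in your table.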



        • #5
          Dear Clyde,

          Sorry for adding to the confusion. Here is my question stated clearly.

          Setup
          • post is coded 0 for years < 2017 and 1 for years ≥ 2017. For treat, each firm is classified once (based on its average pre-2017 characteristics) and retains that status in all years.
          • TWFE regression with unit FE (fe) and time FE included via i.t.
          Model 1
          xtreg td i.post##i.treat sg psize ptang pliq pRoA pndts i.t, fe vce(robust)

          Expectation: In a standard DiD with unit FE + year FE, both i.post and i.treat should be collinear with the fixed effects and drop.
          Observation: Stata drops i.treat but does not drop i.post.
          Question 1: Why is i.post not being dropped here?

          Model 2 (interaction plus post main effect)
          xtreg td i.post#i.treat i.post sg psize ptang pliq pRoA pndts i.t, fe vce(robust)
          Observation: The coefficient on the interaction term changes sign relative to Model 1.
          Question 2: If I accept that i.post stays in the model, why does the sign of i.post#i.treat flip between these two specifications? And how do I interpret Post in this model?

          Thanks.
          Last edited by Pranshu Tripathi; 25 Aug 2025, 18:12.



          • #6
            Question 1: Why is i.post not being dropped here?
            The colinearity involves all of the t indicators and the post variable. To break the colinearity, it is enough to drop any one of them. If you look carefully at your output, you will see that Stata elected to drop 17.t instead of post. It doesn't matter. All estimable functions of the model parameters are the same regardless of how the colinearity is broken.
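
            The table in #3 shows this directly. A sketch, assuming (from which coefficients change between the two columns) that the post period is t = 10 to 17, and writing \gamma_s for the coefficient on s.t and \delta for the coefficient on 1.Post (my notation). Superscript (1) is the column of that table where 17.t is kept and (2) the column where 1.Post is kept:

            ```latex
            \begin{align*}
            \mathit{Post}_t &= \sum_{s=10}^{17} \mathbf{1}[t = s] \\
            \delta^{(2)} &= \gamma_{17}^{(1)} = -0.0468 \\
            \gamma_s^{(2)} &= \gamma_s^{(1)} - \gamma_{17}^{(1)}, \qquad s = 10, \dots, 16 \\
            \gamma_{10}^{(2)} &= -0.0066 + 0.0468 = 0.0402
            \end{align*}
            ```

            The same shift of 0.0468 reproduces the other post-period coefficients up to rounding, and the coefficients on 5.t to 9.t are identical in both columns.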

            If I accept that i.post stays in the model, why does the sign of i.post#i.treat flip
            I answered this in the second paragraph of #4. Please re-read that.

            And how do I interpret Post in this model
            It is not interpretable at all. More generally, whenever you have a situation where a set of colinear variables is used in a model and the colinearity is then broken by omitting one (or by some other 1 df constraint), the surviving variables' coefficients are meaningless. They are just arbitrary numbers that balance things out so that all of the outcome predictions of the model will be correct. But they cannot be related to any real-world things. In fact, if we look beyond just omitting variables to break the colinearity and consider using other types of 1 df constraints, it is possible to show mathematically that you can always find a constraint that will produce any pre-specified value for the coefficient of a chosen variable. So such numbers are entirely without real-world meaning. They are artifacts of the particular way in which the colinearity gets broken, nothing more.
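
            A sketch of that last claim for this model, in my own notation: D_{st} is the indicator for period s, \gamma_s its coefficient, \delta the coefficient on Post, and I assume Post equals the sum of the post-period indicators. Then for any number c,

            ```latex
            \begin{align*}
            \delta\,\mathit{Post}_t + \sum_{s} \gamma_s D_{st}
              &= c\,\mathit{Post}_t + \sum_{s \in \text{pre}} \gamma_s D_{st}
               + \sum_{s \in \text{post}} (\gamma_s + \delta - c)\, D_{st}
            \end{align*}
            ```

            Both sides give the same prediction for every observation, so the data cannot distinguish \delta from c. Only the sums \gamma_s + \delta for the post-period s are pinned down.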



            • #7
              Thanks Clyde.
              I understood everything related to the first model and most of the things related to the second model. The first model seems easier to interpret, and the following summary relates to the first model only.

              1. I should always use ## to make things easy
              2. In the ## model Stata has dropped the i.post component and has instead shown it as the coefficient of a time indicator
              2. i.post is not interpretable as it would be in normal moderation cases (where i.post would show the impact of the policy on the control group)
              3. I should not report i.post explicitly in my study; rather, I should just mention that I have used TWFE.

              Are these points correct?
              Thanks a lot.
              Last edited by Pranshu Tripathi; 25 Aug 2025, 20:17.



              • #8
                Certainly numbers 2, 2, and 3 are correct. I'm not sure I would say that use of ## is always easier, but it is in most circumstances, so let's count number 1 as 95% correct.



                • #9
                  Thanks Clyde.
                  These issues were haunting me for a long time. You are a savior.
                  Regards
