  • Fixed effect difference-in-differences model

    Hi everyone,

    I have a question about the difference-in-differences (DID) model with fixed effects.
    As I understand it, there are two kinds of DID model:

    1) Y=a0+a1*TREAT+a2*POST+a3*TREAT_POST+e
    2) Y=a0+a1*TREAT_POST+time fixed effects+firm fixed effects

    Here TREAT is an indicator variable identifying the group of firms that will be affected by a policy.
    POST indicates the periods after the policy was introduced.
    TREAT_POST=1 for the TREAT firms in the POST period.

    According to my understanding, Model 2 is better if there are possible omitted time-invariant and time-specific variables.
    I also know that in Model 2 TREAT and POST indicator variables should usually be dropped.
    (please correct me if I'm wrong)

    However, I find that I can actually run a regression with:

    Y=a0+a1*TREAT+a2*POST+a3*TREAT_POST+firm and time fixed effects
    Stata didn't report any collinearity warnings, and the coefficients a1 and a2 are both significant with reasonable signs.

    So my questions are:
    1) Should we always drop TREAT and POST if fixed effects are included?
    2) How should the coefficients on TREAT and POST be interpreted when fixed effects are included?

    In my case, I have 206 TREAT firms and 53 CONTROL firms, with 2 pre-treatment years and 2 post-treatment years.
    I'm using Stata 14.0.

    Thanks a lot!!!

    Best,
    Yiting

  • #2
    Quote:
    I also know that in Model 2 TREAT and POST indicator variables should usually be dropped.
    (please correct me if I'm wrong)
    It's not so much that they should be dropped as that they inevitably will be dropped. The TREAT variable will be collinear with the firm fixed effects, and the POST variable will be collinear with the time fixed effects.
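To see the collinearity concretely, here is a minimal Python sketch on a hypothetical panel of four firms and four years (the firm names, years, and treated set are made up for illustration). The TREAT column is exactly the sum of the treated firms' dummies, and POST is exactly the sum of the post-year dummies, so neither carries any information once both sets of fixed effects are in the model.

```python
# Hypothetical toy panel: 2 treated + 2 control firms, 2 pre + 2 post years.
firms = ["A", "B", "C", "D"]
treated = {"A", "B"}                  # firms affected by the policy
years = [2001, 2002, 2003, 2004]
post_years = {2003, 2004}

rows = [(f, y) for f in firms for y in years]

def firm_dummy(f, row):
    """1 if the observation belongs to firm f (one firm fixed effect)."""
    return 1 if row[0] == f else 0

def year_dummy(y, row):
    """1 if the observation is from year y (one time fixed effect)."""
    return 1 if row[1] == y else 0

treat = [1 if r[0] in treated else 0 for r in rows]
post = [1 if r[1] in post_years else 0 for r in rows]

# TREAT equals the sum of the treated firms' dummies; POST equals the sum
# of the post-year dummies: exact linear dependence, observation by observation.
treat_from_dummies = [sum(firm_dummy(f, r) for f in treated) for r in rows]
post_from_dummies = [sum(year_dummy(y, r) for y in post_years) for r in rows]

print(treat == treat_from_dummies, post == post_from_dummies)  # True True
```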

    Quote:
    However, I find that I can actually run a regression with:

    Y=a0+a1*TREAT+a2*POST+a3*TREAT_POST+firm and time fixed effects
    Stata didn't report any collinearity warnings, and the coefficients a1 and a2 are both significant with reasonable signs.
    Then something is wrong! Look at your output carefully. It may be that instead of dropping TREAT and POST, Stata chose to drop one of the firm fixed effects and one of the time fixed effects. Are you sure that all of the firms and times (except for one reference category for each) are represented in the output? If nothing at all has been omitted, the absence of collinearity implies that the TREAT and POST variables are not correctly coded in your data.
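Why the reported a1 and a2 carry no meaning in that regression can be illustrated with a small Python sketch (a hypothetical 4-firm, 4-year panel with made-up integer parameters). Shifting any constant c into the TREAT coefficient and out of every treated firm's effect, and likewise a constant d into POST and out of the post-year effects, gives exactly the same fitted values, while the interaction coefficient is untouched. So the values reported for a1 and a2 are determined by which fixed effects Stata chose to omit, not by anything the data can identify.

```python
# Hypothetical toy panel: 2 treated + 2 control firms, 2 pre + 2 post years.
firms = ["A", "B", "C", "D"]
treated = {"A", "B"}
years = [1, 2, 3, 4]
post_years = {3, 4}

def fitted(alpha, gamma, b_treat, b_post, b_did):
    """Fitted Y for every (firm, year) cell of the over-parameterized model."""
    out = {}
    for f in firms:
        for t in years:
            tr, po = int(f in treated), int(t in post_years)
            out[f, t] = (alpha[f] + gamma[t]
                         + b_treat * tr + b_post * po + b_did * tr * po)
    return out

# One set of made-up parameters ...
alpha = {"A": 5, "B": 7, "C": 4, "D": 6}   # firm effects
gamma = {1: 0, 2: 1, 3: 3, 4: 4}           # year effects
base = fitted(alpha, gamma, b_treat=2, b_post=1, b_did=3)

# ... and another: move c into TREAT out of the treated firms' effects,
# and d into POST out of the post-year effects. b_did stays at 3.
c, d = 10, -4
alpha2 = {f: a - c if f in treated else a for f, a in alpha.items()}
gamma2 = {t: g - d if t in post_years else g for t, g in gamma.items()}
shifted = fitted(alpha2, gamma2, b_treat=2 + c, b_post=1 + d, b_did=3)

print(base == shifted)  # True: the data cannot tell the two apart
```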

    If, as I suspect, Stata has simply dealt with the collinearity by dropping some of the fixed effects, you can overcome that by using the appropriate -xt- regression command. So, perhaps:

    Code:
    xtset firm time
    xtreg Y i.TREAT##i.POST i.time, fe

    Stata will prefer to drop the "main effect" of TREAT and retain the firm fixed effects that way. With regard to the time effects, it is likely, though not guaranteed, that Stata will choose to drop POST. If that doesn't happen, you can force the issue with:

    Code:
    xtset firm time
    xtreg Y i.TREAT#i.POST i.time, fe // NOTE SINGLE #, NOT ##

    where you explicitly omit the "main effects."
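As a rough illustration of what such a fixed-effects regression computes, here is a Python sketch (a toy, not Stata's implementation) on a hypothetical balanced panel with made-up firm and year effects and no noise. In a balanced panel, partialling out firm and year effects amounts to two-way demeaning, and the slope on the demeaned TREAT_POST recovers the treatment effect no matter how large the fixed effects are.

```python
import random

def two_way_demean(v, firms, years):
    """v[f,t] minus firm mean minus year mean plus grand mean (balanced panel only)."""
    fm = {f: sum(v[f, t] for t in years) / len(years) for f in firms}
    ym = {t: sum(v[f, t] for f in firms) / len(firms) for t in years}
    gm = sum(v.values()) / len(v)
    return {(f, t): v[f, t] - fm[f] - ym[t] + gm for f in firms for t in years}

rng = random.Random(1)
firms = list(range(8))      # hypothetical firm ids; firms 0-4 are treated
treated = set(range(5))
years = [1, 2, 3, 4]        # years 3-4 are post-treatment
tau = 2.5                   # made-up true treatment effect

alpha = {f: rng.uniform(-5, 5) for f in firms}   # arbitrary firm effects
gamma = {t: rng.uniform(-5, 5) for t in years}   # arbitrary year effects
d = {(f, t): float(f in treated and t >= 3) for f in firms for t in years}
y = {(f, t): alpha[f] + gamma[t] + tau * d[f, t] for f in firms for t in years}

# Partial out firm and year effects from both sides, then take the OLS slope
dt = two_way_demean(d, firms, years)
yt = two_way_demean(y, firms, years)
est = sum(dt[k] * yt[k] for k in dt) / sum(x * x for x in dt.values())
print(round(est, 6))  # 2.5 (no noise, so the effect is recovered exactly)
```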

    Quote:
    According to my understanding, Model 2 is better if there are possible omitted time-invariant and time-specific variables.

    Yes, if there are omitted variables of this nature. If there aren't, you can end up overfitting some of the noise in the data. I would say that, as a general rule, omitted time-invariant firm-specific effects are likely, time-specific firm-invariant effects somewhat less so.



    • #3
      I was about to post the same question, but Clyde answered it. Thanks!

      Are the results interpreted the same way in Model 1 and Model 2? That is, because there is a firm fixed effect in Model 2, is the coefficient still interpreted as the "difference between treated and non-treated"? Or is it the "difference within the firm pre and post treatment"?

      • #4
        Clyde,

        Thanks a lot!
        You are right, Stata dropped one more firm fixed effect and one more time fixed effect when I keep TREAT, POST and all the fixed effects.

        I'm only interested in the DID effect (i.e., the coefficient on TREAT_POST), and I'm fairly sure that there are time-invariant firm-specific effects that I cannot control for, and probably also time-specific firm-invariant effects (since the industry I'm focusing on experienced other policy changes during the test period). Thus, I think that in my case I should go with Model 2. So the code would be:

        reg Y TREAT_POST i.firm i.year, cluster(firm)

        TREAT_POST is the indicator variable for the firms affected by the policy in the post-period.

        Am I right?

        Thanks again!

        Best,
        Yiting

        • #5
          In Model 1 from post #1, the "main effect" of TREAT is the expected difference in Y between treated and untreated firms when POST = 0, and the "main effect" of POST is the expected difference in Y between pre- and post-treatment epochs among the firms in the TREAT = 0 group.

          By using an interaction term, we are in fact stipulating that there is no such thing as the difference between treated and non-treated. We are working, rather, with a model in which there is one such difference in the pre-treatment epoch and another in the post-treatment epoch; in general they will be different. Similarly there is no single within-firm difference pre and post treatment: there is one such difference for the TREAT = 0 group and another for the TREAT = 1 group.

          After running the regression model, you can get direct output of these various effects with the commands:

          Code:
          margins, dydx(TREAT) at(POST = (0 1))
          margins, dydx(POST) at(TREAT = (0 1))

          The difference-in-differences estimator of the effectiveness of the treatment is, of course, given by the coefficient of the interaction term, and this is usually the focus of interest for statistical testing.
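Because Model 1 is saturated in the four TREAT x POST cells, its coefficients and the -margins- output are simple functions of the four cell means. A small Python sketch with made-up cell means (the numbers are hypothetical) computes the two simple effects of TREAT, the two simple effects of POST, and shows that the difference within either pair is the interaction coefficient a3:

```python
# Hypothetical cell means of Y, keyed by (TREAT, POST); numbers are made up.
m = {(0, 0): 10.0, (0, 1): 12.0, (1, 0): 11.0, (1, 1): 16.0}

# Model 1 coefficients implied by the cell means (the model is saturated)
a0 = m[0, 0]
a1 = m[1, 0] - m[0, 0]                          # TREAT effect when POST = 0
a2 = m[0, 1] - m[0, 0]                          # POST effect when TREAT = 0
a3 = (m[1, 1] - m[1, 0]) - (m[0, 1] - m[0, 0])  # difference in differences

# The discrete changes reported by the two -margins- commands
treat_effect = {post: m[1, post] - m[0, post] for post in (0, 1)}
post_effect = {treat: m[treat, 1] - m[treat, 0] for treat in (0, 1)}

print(treat_effect, post_effect, a3)
# {0: 1.0, 1: 4.0} {0: 2.0, 1: 5.0} 3.0
```

Either way you take the difference of the simple effects, you arrive at the same number, which is why a3 is the DID estimate.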

          • #6
            Great. Thanks for clearing up my confusion.

            • #7
              Quote (Yiting Cao):
              reg Y TREAT_POST i.firm i.year, cluster(firm)

              TREAT_POST is the indicator variable for the firms affected by the policy in the post-period.

              Am I right?
              Yes, that is correct. But I still recommend you do it as shown in the last code block of #2, rather than generating your own TREAT_POST variable. The reasons are that 1) it's good to get into the habit of using factor notation routinely, and 2) you will be able to run the margins commands shown in #5, and also get other statistics that might be of interest from -margins-. Also, by using -xtreg- instead of -reg-, your output won't be cluttered with 206+53-1 firm effects that you probably have no interest in.

              • #8
                Thanks Clyde! That is clear!

                Hi Arian,
                I think the coefficient on the interaction term TREAT_POST in Model 2 also measures the difference in differences.
                For example, in this paper:
                http://www.mitpressjournals.org/doi/...2/REST_a_00049

                Best,
                Yiting
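That point can be checked numerically. A minimal Python sketch, assuming a balanced panel in which all treated firms are treated at the same time (hypothetical data: 10 firms, 6 of them treated, 4 years, made-up effects plus noise): the Model 1 interaction coefficient, i.e. the 2x2 difference of the group-by-period means, equals the Model 2 coefficient on TREAT_POST obtained by partialling out firm and year effects. With unbalanced panels or staggered treatment timing the two need not coincide.

```python
import random

rng = random.Random(42)
firms = list(range(10))        # hypothetical: firms 0-5 treated, 6-9 control
treated = set(range(6))
years = [1, 2, 3, 4]           # years 3-4 are post-treatment
post_years = {3, 4}

alpha = {f: rng.gauss(0, 2) for f in firms}   # made-up firm effects
gamma = {t: rng.gauss(0, 1) for t in years}   # made-up year effects
d = {(f, t): float(f in treated and t in post_years) for f in firms for t in years}
y = {(f, t): alpha[f] + gamma[t] + 1.7 * d[f, t] + rng.gauss(0, 1)
     for f in firms for t in years}

# Model 1: interaction coefficient = 2x2 DID of group-by-period cell means
cells = {}
for (f, t), v in y.items():
    cells.setdefault((f in treated, t in post_years), []).append(v)
m = {k: sum(v) / len(v) for k, v in cells.items()}
did = (m[True, True] - m[True, False]) - (m[False, True] - m[False, False])

# Model 2: coefficient on TREAT_POST after two-way demeaning (balanced panel)
def two_way_demean(v):
    fm = {f: sum(v[f, t] for t in years) / len(years) for f in firms}
    ym = {t: sum(v[f, t] for f in firms) / len(firms) for t in years}
    gm = sum(v.values()) / len(v)
    return {(f, t): v[f, t] - fm[f] - ym[t] + gm for f in firms for t in years}

dt, yt = two_way_demean(d), two_way_demean(y)
twfe = sum(dt[k] * yt[k] for k in dt) / sum(x * x for x in dt.values())

print(abs(did - twfe) < 1e-9)  # True
```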

                • #9
                  Clyde Schechter

                  Sure! I will definitely try your method!

                  Many thanks!

                  • #10
                    Hi there!

                    I just wanted to ask a question:

                    A fixed-effects regression eliminates omitted-variable bias from factors that are time-invariant.
                    But when I add:
                    -A firm ID effect, I take out all possible omitted firm-specific effects.
                    -And a year effect, I take out possible omitted time-varying effects.
                    Is this correct?

                    Also, if my POST and TREATMENT variables get omitted, will this still be a difference-in-differences model?
                    I would get the same coefficients as if I had run an ordinary fixed-effects regression, for example.

                    Best Regards,

                    JD

                    • #11
                      When you include a firm ID effect in the model, you eliminate any confounding that might be caused by effects (observed or unobserved) that are constant over time within each firm.

                      When you include a year effect, you eliminate any confounding that might be caused by effects (observed or unobserved) that are constant across all firms within each year.
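The confounding point can be illustrated with a Python sketch on a hypothetical balanced panel (50 firms, 6 years, made-up effects, no noise) in which a covariate x is correlated with both the unobserved firm effects and the unobserved year effects. The pooled OLS slope is biased, while the slope after partialling out firm and year effects (two-way demeaning, valid for balanced panels) recovers the true coefficient:

```python
import random

def two_way_demean(v, firms, years):
    """v[f,t] minus firm mean minus year mean plus grand mean (balanced panel only)."""
    fm = {f: sum(v[f, t] for t in years) / len(years) for f in firms}
    ym = {t: sum(v[f, t] for f in firms) / len(firms) for t in years}
    gm = sum(v.values()) / len(v)
    return {(f, t): v[f, t] - fm[f] - ym[t] + gm for f in firms for t in years}

def slope(xs, ys):
    """Simple OLS slope of ys on xs (with an intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return num / sum((a - mx) ** 2 for a in xs)

rng = random.Random(3)
firms, years = list(range(50)), list(range(6))   # hypothetical balanced panel
beta = 1.5                                        # made-up true effect of x

alpha = {f: rng.uniform(-5, 5) for f in firms}    # unobserved firm effects
gamma = {t: rng.uniform(-5, 5) for t in years}    # unobserved year effects
# x is correlated with both unobserved effects: this is the confounding
x = {(f, t): 0.8 * alpha[f] + 0.5 * gamma[t] + rng.uniform(-1, 1)
     for f in firms for t in years}
y = {k: alpha[k[0]] + gamma[k[1]] + beta * x[k] for k in x}   # no noise

keys = list(x)
pooled = slope([x[k] for k in keys], [y[k] for k in keys])    # biased upward
xt, yt = two_way_demean(x, firms, years), two_way_demean(y, firms, years)
twfe = slope([xt[k] for k in keys], [yt[k] for k in keys])    # recovers beta
print(round(twfe, 6), pooled > beta)  # 1.5 True
```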

                      If you include these both, you eliminate entirely both the treatment group effect (which is constant within firms over time) and the pre-post effect (which is constant across firms within years, at least in your design). Both the TREAT and POST variables will be dropped. So your model can no longer estimate the impact of the intervention when you do this: it is a ghost of a difference-in-differences model and will provide you with no information about the intervention's impact.

                      • #12
                        Thanks Clyde!

                        One last thing that worries me: should I use
                        xtset Firm ID year
                        or
                        xtset Firm ID

                        for my FE in this case?
                        I am worried that xtset Firm ID year will remove some of the effects of my dummy variables.

                        Best,

                        JD

                        • #13
                          For the purpose of specifying which variables are included as fixed effects by -xtreg-, it doesn't matter. -xtreg- ignores -xtset-'s time variable: only the panel variable is incorporated as a fixed effect in the model. The time variable is used only when you use Stata's lead, lag, and difference operators, or with estimation commands that allow fitting autoregressive error structures and the like. If you will be using those things, then you need to set the time variable, but just to run a fixed-effects regression you don't need it (though it won't hurt to have it).
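A small Python sketch of that distinction, under stated assumptions: a hypothetical firm/year panel stored as (firm, year, x, y) rows, with made-up data. The one-way fixed-effects slope (demeaning by firm only, which is the within transformation applied with the panel variable) never looks at the year, so scrambling the time order leaves it unchanged; a lag, analogous to Stata's L. operator, does need the time variable to find period t-1. The function names are illustrative, not Stata internals.

```python
import random

def within_slope(rows):
    """One-way fixed-effects slope of y on x: demean by firm only (time ignored)."""
    groups = {}
    for f, _t, x, y in rows:
        groups.setdefault(f, []).append((x, y))
    num = den = 0.0
    for pairs in groups.values():
        mx = sum(x for x, _ in pairs) / len(pairs)
        my = sum(y for _, y in pairs) / len(pairs)
        num += sum((x - mx) * (y - my) for x, y in pairs)
        den += sum((x - mx) ** 2 for x, _ in pairs)
    return num / den

def lag(rows):
    """Analogue of L.x: needs (firm, time) to locate period t-1; None if absent."""
    lookup = {(f, t): x for f, t, x, _y in rows}
    return {(f, t): lookup.get((f, t - 1)) for f, t, _x, _y in rows}

rng = random.Random(7)
rows = []
for f in range(5):                  # hypothetical firms 0-4
    for t in range(1, 5):           # hypothetical years 1-4
        x = rng.gauss(0, 1)
        rows.append((f, t, x, 3.0 * f + 1.5 * x + rng.gauss(0, 0.5)))

shuffled = rows[:]
rng.shuffle(shuffled)               # scramble the time order within firms

b1, b2 = within_slope(rows), within_slope(shuffled)
lags = lag(shuffled)
print(abs(b1 - b2) < 1e-9)          # True: the time variable plays no role
```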

                          Note, by the way, that your variable name cannot contain blanks. -xtset Firm ID year- will be a syntax error because it looks to Stata as if you are specifying three variables. -xtset Firm ID- will cause Stata to set Firm (or a unique variable whose name begins with Firm) as the panel identifier and will set the variable ID, if it exists, as the time variable. If there is no variable named ID (or no unique variable starting with ID) then this, too, will be a syntax error.

                          • #14
                            Hi again!

                            Thanks a lot, Clyde!

                            One last question: I ran the Hausman test, and it indicates that the FE regression is preferable to the RE regression, although I struggle to see the difference between the two.
                            Does random effects not take endogeneity into account? Is that the only difference?
                            So if I believe there is unobserved heterogeneity in my regression, I should use FE and not RE? Is this statement correct?


                            Thank you so much!

                            Best regards,

                            JD


                            • #15
                              My understanding of the difference between fixed- and random-effects estimators has nothing to do with endogeneity. As I understand it, consistency of the random-effects estimator requires the assumption that the panel-level effects are independently and identically normally distributed, and that they are independent of the covariates specified in the model. These assumptions are, in practice, sometimes false. The fixed-effects estimator does not require these stringent assumptions and is more broadly consistent. However, the random-effects estimator is more efficient (i.e., it produces more precise estimates of the model coefficients) when it is consistent, which is why it is preferable when its assumptions hold. The Hausman test compares the results of the two models, and thereby indirectly tests whether the assumptions necessary for consistency of the random-effects estimator are met.
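For reference, this is the textbook form of the statistic the Hausman test computes (a standard result, not something stated above). Here k is the number of coefficients being compared, and the null hypothesis is that the random-effects assumptions hold, in which case both estimators are consistent and the random-effects one is efficient:

```latex
H = \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)^{\top}
    \left[\widehat{\operatorname{Var}}(\hat{\beta}_{FE}) - \widehat{\operatorname{Var}}(\hat{\beta}_{RE})\right]^{-1}
    \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)
    \;\overset{a}{\sim}\; \chi^2_k
```

A large H (small p-value) means the two sets of estimates differ by more than sampling error would explain, which counts as evidence against the random-effects assumptions.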

                              I should add that in economics there is, from what I have seen, a strong preference for consistent estimators and less regard for efficiency, so that random effects models are nearly always rejected if they do not pass the Hausman test. In some other fields, the traditions are different and if the results of the two estimators appear reasonably similar, a random effects model will be used even if it fails the Hausman test. (Particularly if the sample size is very large, so that the Hausman test has power to pick up tiny but immaterial departures from the assumptions.)
