  • Difference-in-Differences with Multiple Groups and Time Periods - Data Organisation and Regression Design

    I am trying to estimate the effect that merger activity has on company employment, profitability and productivity. I have one pre-treatment year and three post-treatment years. I want to find treatment effects in the following format*:

    a) t - (t - 1)
    b) (t + 1) - (t - 1)
    c) (t + 2) - (t - 1)
    d) (t + 3) - (t - 1)

    *Where treatment year is year t and so on.

    How would I specify a regression to do this for employment for example?

    Secondly, my data is currently tabulated in an unhelpful format. It has been grouped by time period:

    w -------- y -------- z
    x -------- x -------- x
    x -------- x -------- x
    x -------- x -------- x

    *Where w, y, z refer to the time periods t-1, t, t+1 etc.

    Intuitively, this format would allow a simple manual calculation of the DID estimators, but I am keen to produce standard errors and evaluate significance, hence my interest in carrying out a regression.

    Any ideas on how to quickly manipulate the data into a more suitable format? I do not particularly want to transpose each line of the data one by one.

    Thanks in advance.

    Last edited by Kieron Spiiter; 09 Apr 2016, 12:05.

  • #2
    So the data is currently in wide layout. As you have already perceived, for analysis in Stata it needs to be in long form. For that you will need the -reshape- command. -reshape- is one of the most important data management tools in Stata and if you are going to be using Stata more than once, you need to learn how to use it. The manual section in the [D] user's manual is extensive with lots of good examples. That said, it's a little bit like riding a bicycle: you have to actually practice using it for a while to really get the hang of it. It seems difficult at first, but at some point it "clicks" and you just know what to do with it from that point on.
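
As a rough illustration of the wide-to-long step, here is a minimal sketch on made-up data. The variable names (id, treat, emp0-emp4) and all values are assumptions for illustration only; era 0 stands for t-1, era 1 for t, and so on up to era 4 for t+3.

```stata
* Hypothetical wide data: one row per firm, one employment column per era.
* Names and values are illustrative assumptions, not the poster's data.
clear
input id treat emp0 emp1 emp2 emp3 emp4
1 1 100 104 110 115 118
2 1  80  83  85  90  91
3 0  95  96  97  98  99
4 0  60  61  61  62  63
end

* Wide -> long: one row per firm-era; the numeric suffix becomes the variable era.
reshape long emp, i(id) j(era)

* Declare the panel structure and inspect the result.
xtset id era
list, sepby(id)
```

Variables that are constant within firm, such as treat here, are simply carried along to every row of that firm.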

    Suggest you begin by importing your data into Stata and then giving -reshape- a try. If you get stuck, post back showing 1) a small representative sample of your data, 2) the commands you tried, and 3) what Stata did in response. Also, if it isn't obvious, show how what you got from Stata differs from what you wanted. To post sample data, please use the -dataex- command. You can install it by running -ssc install dataex-, and then read the simple instructions at -help dataex-. This is the kind of problem where anyone helping you out will probably want to test out some commands on your data, and -dataex- is the best way (really, the only consistently good way) to give data in a form that those responding can easily use to replicate your data example quickly and faithfully. To post the commands you tried and Stata's response, since the details are very important, you should copy from the Results window or your log file directly to the clipboard and then paste into a code block on this Forum. For instructions on setting up a code block, see FAQ #12 paragraph 7. Please don't make any edits to the code or the output: every little detail is important.

    From your description it isn't clear that you actually have a DiD design here. You make no mention of a control group with no mergers. Without a control group you have only a pre-post design. So you need to be clear about whether you do or not: the analysis is different.
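
To make the distinction concrete, the contrasts in the original question can be written as follows. The notation is mine, not from the thread: \bar{Y}_{T,s} and \bar{Y}_{C,s} are the mean outcomes in year s for the treated and control groups.

```latex
\hat{\delta}_{k}
  = \left(\bar{Y}_{T,\,t+k} - \bar{Y}_{T,\,t-1}\right)
  - \left(\bar{Y}_{C,\,t+k} - \bar{Y}_{C,\,t-1}\right),
  \qquad k = 0, 1, 2, 3
```

Without a control group only the first bracket can be computed, and that is the pre-post contrast rather than a DiD estimate.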


    • #3
      It's actually better not to create the interaction term(s) manually. If you let Stata do it for you, with factor variable notation, you will be able to use -margins- after you run your regression, which will make it much easier to understand the results. I assume your treatment variable is coded 0/1, and is 1 in every observation for any treated firm, and is 0 in every observation for any untreated firm. I assume your treatment periods are coded in a variable, era, which takes the value 0 in the pre-treatment era, 1 during treatment period 1, 2 during treatment period 2, and 3 during treatment period 3.

      Then I would do the regression as:

      Code:
      xtset id year
      xtreg y i.treatment##i.era, fe
      // AND THEN TO MAKE THE RESULTS MORE UNDERSTANDABLE
      margins treatment#era
      A few other thoughts.

      1. You may have covariates you want to include in the regression. That's fine. Remember, though, that if they don't vary over time within id, they will be collinear with the fixed effects and will be dropped. Effects of such variables are not estimable in a fixed-effects model.

      2. The treatment variable itself does not vary over time within id, so it will be omitted due to collinearity in the output. That is normal and is not a problem. Your treatment effects are actually represented by the interaction terms.

      3. If you decide to include i.year as covariates to control for year-on-year shocks to your outcome, be aware that in addition to losing 1 year as a base category (the earliest year, unless you specify otherwise), you will also lose 3 other years due to collinearity with the 3 i.era variables. This, too, is normal and you shouldn't be concerned about it.

      4. The -margins- command will show you your model's predicted values of employment, adjusted to the distributions of the other covariates in the model, in each combination of treatment and era. -margins- can do a lot more than that, too. See the -margins- section of the user's manual, or, for a quicker, very clear introduction, see Richard Williams's excellent article in the Stata Journal, http://www.stata-journal.com/article...article=st0260.

      5. For the specific purpose of contrasting each of the three treatment periods with the untreated periods in both groups, you can run an additional -margins- command: -margins, dydx(i.era) at(treatment = (0 1))-
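
For readers who prefer to see the model written out, the regression above corresponds roughly to the equation below. The symbols are my own shorthand, not from the thread: i indexes firms, t indexes years, \alpha_i is the firm fixed effect, and era 0 is the omitted base level.

```latex
y_{it} = \alpha_i
  + \sum_{k \ge 1} \beta_k \,\mathbf{1}[\mathrm{era}_{it} = k]
  + \sum_{k \ge 1} \delta_k \,\mathrm{treatment}_i \cdot \mathbf{1}[\mathrm{era}_{it} = k]
  + \varepsilon_{it}
```

The main effect of treatment_i does not appear because it is absorbed by \alpha_i (point 2), and each \delta_k is the interaction coefficient for era k, that is, the change from era 0 to era k in the treated firms over and above the same change in the untreated firms.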

      Last edited by Clyde Schechter; 14 Apr 2016, 16:42.


      • #4
        The output you show is jumbled and unreadable because you did not put it in a code block. Also, it isn't possible to interpret the -margins- output without also seeing the exact -margins- command and the regression command that preceded it. So please copy all of those from your Results window or log file and paste them into a code block. If you do not know how to create a code block, see FAQ #12, 7th paragraph for the simple instructions.


        • #5
          "What should I take as the average treatment effect on treated here?"
          None of the output shown gives you that. The -margins- results you have are expected values of y in each combination of treat and era. They are not marginal effects, let alone average marginal effects.

          To get marginal effects of treatment in the different eras you would use

          Code:
          margins era, dydx(treat)
          To get average marginal effects of treatment irrespective of era
          Code:
          margins, dydx(treat)
          That said, looking at your regression output, the coefficient of 1.treat#4.era is really strikingly different from that of the other interaction terms. This suggests that the treatment effect is really very different in different eras. Consequently, I think that the average marginal effect would be a rather misleading statistic--sweeping the difference among eras under the rug.

          As an aside, I'm disturbed by the not estimable results for the 1.treat#0.era and 0.treat#0.era margins. Do you have non-missing data for both treatment and control entities in era 0? If not, you don't have a complete DID design.


          • #6
            Yes, this comes up frequently with these DID designs. The problem is that there are empty interaction cells between the fixed effects and the treatment variable because each panel (id) occurs only with treat = 0 or only with treat = 1 in these designs. That makes the marginal effect as Stata usually defines it in -margins- non-estimable, because that requires averaging over all possible combinations and the information on some combinations isn't in the data. However, precisely because this is by design, the missingness of that information is a structural aspect of the model. For this reason, it is legitimate in this circumstance to use the -noestimcheck- option.

            Code:
            margins, dydx(treat) noestimcheck
            Now, again, by calculating this average marginal effect, you are averaging together several era-specific treatment effects that are actually rather different from each other in your results, so I fear that this result will be somewhere between meaningless and misleading. At the very least, you probably want to exclude era 0, when the treatment was actually not in use:
            Code:
            margins if era > 0, dydx(treat) noestimcheck
            And, again, seeing how different the 1.treat#4.era interaction term is from the others, I'm still uncomfortable even with that. I really think it is better to stick to era-specific effects:

            Code:
            margins era, dydx(treat) noestimcheck


            • #7
              No. The output of -margins, dydx(treat) noestimcheck- is the average treatment effect across all eras. I don't think it's very meaningful. That's why I suggested you go instead with
              Code:
              margins if era > 0, dydx(treat) noestimcheck
              
              // OR
              
              margins era, dydx(treat) noestimcheck
              The first of these will give you the average effect of treatment (vs. the no-treatment group) during the three eras when the treatment was actually in effect. This number might have some meaning, although, as I have said before, I am skeptical of averaging era 4 with the others because it looks very different.

              The second one will give you the effect of treatment (vs the no-treatment group) during all four eras.

              If what you are interested in is the contrast of effect of each era vs era 0, that is different:

              Code:
              margins treat, dydx(era) noestimcheck
              That will give you the contrast of each era 1-4 vs era 0, separately for the treat and no-treat groups. (And you can disregard the results for the no-treat group if they are not of interest.)
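
If the target is the change from era 0 to era k in the treated group net of the same change in the controls, that quantity is the coefficient on the corresponding treat#era interaction, so it can also be pulled out directly after the regression. The following is only a sketch on simulated data: the sample size, the seed, and the built-in effect of 10 are all made up, and the variable names simply follow the ones used in this thread.

```stata
* Simulated balanced panel (hypothetical): 40 firms, 5 eras, firms 21-40 treated.
clear
set seed 12345
set obs 40
gen id = _n
gen treat = id > 20
expand 5
bysort id: gen era = _n - 1

* Outcome with a common trend plus an effect of 10 in the treated eras 1-4.
gen y = 100 + 5*era + 10*treat*(era > 0) + rnormal()

xtset id era
xtreg y i.treat##i.era, fe

* DiD of era 1 vs era 0, and of era 4 vs era 0:
lincom 1.treat#1.era
lincom 1.treat#4.era

* Do the effects in eras 1 and 4 differ from each other?
test 1.treat#1.era = 1.treat#4.era

* Joint test that all treat-by-era interactions are zero:
testparm i.treat#i.era
```

Using -lincom- and -test- on the interaction coefficients sidesteps the estimability check altogether, at the cost of not adjusting to the distribution of any other covariates the way -margins- does.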


              • #8
                No, you're misunderstanding the table. The 1030.081 is the marginal effect of era1 vs era0 among the non-treatment group. The 2552.737 is the marginal effect of era1 vs era0 among the treatment group. So for both the treatment and no-treatment groups you are looking at the change in outcome between baseline (era0) and a subsequent observation period in each panel of the table.

                If you want the marginal effects of treatment (expected outcome difference treatment vs no-treatment) in each era, the code is the reverse of what you ran:

                Code:
                margins era, dydx(treat) noestimcheck
                Last edited by Clyde Schechter; 23 Apr 2016, 17:26. Reason: Correct typo.


                • #9
                  I don't understand this. (A-B)/B is, as far as I can see, a meaningless statistic. In fact, unless A and B are both scaled in the same units, you can't even make sense out of A-B.

                  If you are looking for (Y1-Y0)/(X1-X0), that is what -margins, dydx()- gives you, in absolute terms. If you want [(Y1-Y0)/Y0]/[(X1-X0)/X0], which economists often refer to as elasticity, you can get that with -margins, eyex()-. Be aware, though, that with a 0/1 dichotomy such as treat or era, elasticity is undefined, because X0 = 0 and you can't divide by it. If you want [(Y1-Y0)/Y0]/(X1-X0), -margins, eydx()- will do that.
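
Set out side by side, the three quantities just described are shown below. This is the same discrete-change notation as in the paragraph above, only typeset.

```latex
\begin{aligned}
\texttt{dydx()}:&\quad \frac{Y_1 - Y_0}{X_1 - X_0} \\[4pt]
\texttt{eyex()}:&\quad \frac{(Y_1 - Y_0)/Y_0}{(X_1 - X_0)/X_0} \\[4pt]
\texttt{eydx()}:&\quad \frac{(Y_1 - Y_0)/Y_0}{X_1 - X_0}
\end{aligned}
```

The middle expression is the one that breaks down for a 0/1 variable, since its denominator divides by X_0 = 0.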


                  • #10
                    Generally speaking, when we write down equations for our models and use subscripts, the idea is that the combination of all subscripts uniquely identifies observations. I think you are making a mistake using t to index era in this data structure. Rather I think t should index year, and then the era variable would be subscripted by t.
