  • Generalized difference-in-differences - time-varying shocks

    Dear all,

    I searched and found other posts (see below) on the difference-in-differences topic. I want to ask a general setup question that might be of interest to many.

    Below I present two data structures that are common for generalized difference-in-differences. Can someone give a one-line Stata command for testing it?

    data structure 1: the shock is at the industry level; sic is the industry code, and shock = 1 signifies that the industry experienced a shock in that year. Note, this is not the coded treat variable yet. My unit of analysis is the firm-year observation.

    Code:
    clear
    input int(firmid sic year) byte shock
    1000 3000 1990 0 
    1000 3000 1991 1 
    1000 3000 1992 0 
    1000 3000 1993 0 
    1001 3100 1991 0 
    1001 3100 1992 0 
    1001 3100 1993 1 
    1002 3101 1991 1 
    1002 3101 1992 0 
    1002 3101 1993 0 
    end
    data structure 2: the shock is at the state level; shock = 1 signifies that firms in that state experienced a shock in that year. Note, this is not the coded treat variable yet. My unit of analysis is the firm-year observation.
    Code:
    clear
    input int firmid str2 state int year byte shock
    1000 "NY" 1990 0 
    1000 "NY" 1991 1 
    1000 "NY" 1992 0 
    1000 "NY" 1993 0 
    1001 "NY" 1991 1 
    1001 "NY" 1992 0 
    1001 "NY" 1993 0 
    1002 "TN" 1991 0 
    1002 "TN" 1992 0 
    1002 "TN" 1993 0 
    end
    For simplicity I did not include other covariates; we can refer to them as xvars if needed.


    #1. I would like to know whether you would code the treat dummy differently in these two scenarios.


    #2. In this discussion https://www.statalist.org/forums/for...reatment-group

    Clyde recommended using

    Code:
    regress outcome i.treatment_group##i.pre_post other_covariates
    margins treatment_group#pre_post
    margins treatment_group, dydx(pre_post)

    question 1: using my data structures 1 and 2, would the pre_post dummy be coded differently? For example, if industry 3000 is shocked in 1991, is pre_post set to 1 from 1991 onward? Is this a common practice?

    question 2: do we need to control for time fixed effects? If so, do we need an
    Code:
    i.year
    term in the regression?

    If you could post a couple of lines of code for the data structures I described, that would be greatly appreciated.

    Rochelle

  • #2
    The code shown in #1,
    Code:
    regress outcome i.treatment_group##i.pre_post other_covariates 
    margins treatment_group#pre_post
    margins treatment_group, dydx(pre_post)
    is used for a classical DID analysis. It cannot be applied to either of the two data setups shown because in those data setups the shock occurs at different times to different firms/states. The classical DID analysis can only be applied when there is a single point in time that distinguishes pre- and post-shock in both the treatment and control groups.

    The approach to the examples shown is somewhat different. Instead of a treatment variable and a pre_post variable and their interaction, we define an "under_treatment" variable which is 1 in an observation in which the firm/state has already experienced (and, in the case of shocks whose effects may be transient, is still experiencing) the shock, and 0 otherwise. We then use that variable as a substitute for the interaction term of the classical DID analysis, and we incorporate both firm and time effects.

    To be concrete, let's assume that the shock is expected to produce an abrupt jump in the outcome variable and thereafter the outcome variable remains at the newly established level. (So, a one-time event producing a permanent effect.)

    Code:
    // DATA ORGANIZATION #1
    clear
    input int(firmid sic year) byte shock
    1000 3000 1990 0
    1000 3000 1991 1
    1000 3000 1992 0
    1000 3000 1993 0
    1001 3100 1991 0
    1001 3100 1992 0
    1001 3100 1993 1
    1002 3101 1991 1
    1002 3101 1992 0
    1002 3101 1993 0
    end
    
    by firmid (year), sort: gen under_treatment = sum(shock)   // running sum: 0 before the shock year, 1 from it onward
    xtset firmid year
    xtreg outcome i.under_treatment i.year, fe vce(cluster firmid)
    Note: the data example shown does not include an outcome variable, so the -xtreg- command will not actually run with this example. Also, the use of cluster-robust VCE assumes that the number of firms is large enough.
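
    If you want to execute the example end to end, here is a minimal sketch that fabricates a placeholder outcome first. The firm effect, year trend, treatment effect of 1, and noise are arbitrary assumptions for illustration only, and this toy panel is far too small for meaningful estimates:
    Code:
    // Illustration only: simulate an outcome so -xtreg- has something to fit.
    // All magnitudes below are arbitrary assumptions, not part of the thread.
    set seed 2345
    gen double outcome = firmid/1000 + 0.1*(year - 1990) ///
        + under_treatment + rnormal()
    xtset firmid year
    xtreg outcome i.under_treatment i.year, fe vce(cluster firmid)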

    Notice that there are no -margins- commands suggested here. The generalized DID estimate of the treatment effect is given by the coefficient of under_treatment in the -xtreg- output itself. It is, in principle, possible to estimate the pre- and post-shock expected outcomes for each individual firm in this analysis, but you would end up with a separate estimate for each firm that received the treatment, as opposed to an averaged treatment-group estimate.
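
    For completeness, a small sketch of how to pull those quantities out after the -xtreg- call above, using standard -xtreg, fe- postestimation (the variable name yhat is mine):
    Code:
    // The generalized DID estimate is simply the coefficient on under_treatment:
    display _b[1.under_treatment]
    // Firm-specific expected outcomes (linear prediction plus the estimated
    // firm fixed effect) via the -xbu- option:
    predict double yhat, xbu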

    In the second data organization, what differs is that the firms are nested in states, and the shock occurs at the state level rather than the firm level. (And, of course, there are annual observations nested within firms as well.) This is where my approach to the data encounters resistance in the finance and econometrics community, because I believe that three-level data should be analyzed using three-level analytic models--which means that we cannot use the fixed-effects model that is normally preferred in those disciplines. Nevertheless, although I don't really like the approach, it is possible to make a two-level analysis with fixed effects here:

    Code:
    //    DATA ORGANIZATION #2
    by firmid (year), sort: gen byte under_treatment = sum(shock)
    
    xtset firmid year
    xtreg outcome i.under_treatment i.year, fe vce(cluster state)
    Again, the use of cluster-robust VCE is contingent on the number of states in the sample being large enough to support that.
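
    If helpful, a quick way to count your clusters before trusting vce(cluster state); common rules of thumb want at least a few dozen:
    Code:
    // -tabulate- stores the number of distinct values in r(r).
    quietly tabulate state
    display "number of state clusters = " r(r)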

    Notice that the code is almost identical to that for the previous data organization. This is because state-level effects are represented implicitly within the firmid effects; in this approach they appear explicitly only in the VCE clustering.

    (Of all the ways to "flatten" three-level data into a two-dimensional analysis, this one, in my opinion, is best, unless part of the goal is to estimate state-level effects.)
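
    For what it's worth, the kind of three-level model alluded to above could be sketched with -mixed-. The random-effects specification here is illustrative only, not a worked-out recommendation:
    Code:
    // Three-level random-effects sketch: years within firms, firms within states.
    mixed outcome i.under_treatment i.year || state: || firmid: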

    Again, the generalized DID estimate of the effect of the shocks is given by the coefficient of under_treatment in the -xtreg- output.







    • #3
      Dear Clyde,

      Thanks for your detailed explanation! It is very helpful.

      three things:

      1. Is it correct to say that the coefficient of "under_treatment" is analogous to classical DID's coefficient on post*treatment, but that we can't get the coefficient on post by itself?

      2. Is it possible to do a parallel trends test using my data structure? I recall there is a balance test in Stata.

      3. DATA ORGANIZATION #1

      data structure 1: the shock is at the industry level, but I did not include multiple firms in the same industry. Anyway, in this case, I think I should cluster the VCE on sic instead of firmid, right?
      Code:
      xtset firmid year
      xtreg outcome i.under_treatment i.year, fe vce(cluster sic)



      • #4
        1. Is it correct to say that the coefficient of "under_treatment" is analogous to classical DID's coefficient on post*treatment, but that we can't get the coefficient on post by itself?
        Yes.

        2. Is it possible to do a parallel trends test using my data structure? I recall there is a balance test in Stata.
        In principle, yes. In practice it is messy. It is pretty simple to do parallel trend testing in the period prior to the first shock in the data set: it's really just as in classical DID. But after that, you have to start re-assigning firms from the untreated group to the treatment group to assess it. It gets even messier if there can be multiple shocks to the same firm or if the shock's effect is transient. In the situation where different firms experience the shock at different times, but each firm experiences at most one shock and the shock's effects are permanent, your best bet is probably to do something like this:

        Code:
        by firmid, sort: egen gets_a_shock = max(shock)
        by firmid, sort: egen shock_year = max(cond(shock, year, .))
        by firmid (year), sort: gen final_year = year[_N]
        replace shock_year = final_year + 1 if missing(shock_year)
        gen years_before_shock = shock_year - year
        keep if shock_year > year
        xtreg outcome i.gets_a_shock##i.years_before_shock, fe
        testparm i.gets_a_shock#i.years_before_shock
        margins gets_a_shock#years_before_shock, noestimcheck
        marginsplot, xdimension(years_before_shock)
        This code looks at each firm up to (but not including) the point where it gets its shock (or the entire observation period for firms that never get a shock), tests whether the outcomes follow similar patterns at similar numbers of years prior to the shock, and gives you a graph to look at. (I actually think the graph is far more useful than the test: the number of observations in any of the interaction cells will typically be much smaller than the entire sample, so the test statistics are quite noisy. The visual impression you get from the graph is a better way to appraise the adequacy of the parallel trends assumption.) This is not a perfect test of parallel trends, because it ignores the fact that the shock year differs across firms and that secular trends may themselves be associated with the occurrence of the shock. This kind of confounding can create the spurious appearance of parallel trends when they are really different, or it can fool you in the opposite direction. I don't know of any simple and fully bullet-proof solution here.

        3. DATA ORGANIZATION #1

        data structure 1: the shock is at the industry level, but I did not include multiple firms in the same industry. Anyway, in this case, I think I should cluster the VCE on sic instead of firmid, right?
        Well, if there is only one firm per industry, vce(cluster sic) and vce(cluster firmid) will be exactly the same thing.



        • #5
          Thank you, Clyde!!! It helps me a lot.



          • #6
            Originally posted by Clyde Schechter:
            It gets even messier if there can be multiple shocks to the same firm or if the shock's effect is transient.
            I am struggling with the situation where one firm can experience multiple shocks over time. Any advice on how to handle such a case? Thanks!



            • #7
              It depends on a lot of things:

              1. How, if at all, does a history of prior shocks modify the effect of a new shock? Is there synergy? Interference? No difference?

              2. Is the effect of a shock persistent for the long haul, or does it terminate at some point? If it terminates, does it do so abruptly, or does it taper over time? If it tapers, what does the decay-of-effect curve look like?

              3. Is the effect of a shock recognizable immediately thereafter, or does it phase in gradually? If the latter, just how does the effect grow with time?

              4. Does a prior shock make the occurrence of another shock in the future more likely? Less likely? No difference? (Note that this is a very different question than 1.)

              In other words, there is a lot of modeling that has to be decided on before you can even begin to deal with this. Often there isn't even enough information or theory to answer these questions. In that case, it is sometimes better to just analyze only the effect of the first shock for any given firm. (Which means that any firm's data from the second shock on is excluded from analysis.)
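
              A minimal sketch of that first-shock-only restriction, reusing the variable names from the earlier examples (the running sum mirrors the under_treatment construction shown in #2):
              Code:
              // shock_count is 0 before a firm's first shock, 1 from the first
              // shock year onward, and reaches 2 in the year of a second shock.
              by firmid (year), sort: gen shock_count = sum(shock)
              drop if shock_count >= 2   // exclude data from the second shock on
              gen byte under_treatment = shock_count   // now 0/1 by construction
              xtset firmid year
              xtreg outcome i.under_treatment i.year, fe vce(cluster firmid)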

              Anyway, these are the things you need to think about. Once you've settled those issues in your mind, if you share the details and some example data (use -dataex-), more concrete advice might be possible.
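
              (For reference, -dataex- is available from SSC and ships with recent versions of Stata; a typical call looks like this:)
              Code:
              ssc install dataex   // needed only on older versions of Stata
              dataex firmid year shock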



              • #8
                Thanks Clyde! Indeed, it may be easier to just analyze the first shock, but it would be great to have a solution for multiple shocks. I came across the paper by Derrien and Dessaint (2018), "The Effects of Investment Bank Rankings: Evidence from M&A League Tables," which analyzes multiple shocks.

