
  • Difference in difference analysis

    I am trying to analyse the impact of competition in US states on banking stability. I measure competition with two variables: the Herfindahl-Hirschman index (HHI) in each state and the Interstate Branching Index (IBR) developed by Rice and Strahan (2010). This index takes values from 0 to 4, with 4 indicating that the state is most heavily regulated. The index originated when the US government introduced a new banking act that allowed individual states to erect barriers to competition. These barriers were introduced by different states at different points in time, creating exogenous shocks to competition. Moreover, some states revised their barriers during the study period, introducing extra variation in competition (e.g. state X deregulated in 2003 and again in 2005, causing the index to fall from 4 to 3 and then from 3 to 1).

    Initially, I adopted a fixed effects panel data approach. My model was based on bank level data and had the following form:

    xtreg Stability_Measure ln_HHI ln_HHISQ IBR control_variables, fe vce(cluster StateCode)   // errors clustered at the state level

    However, someone suggested using a difference-in-differences (DiD) analysis, as it allows for stronger causal inference and controls for omitted variables. Thus, I created the following model:

    xtreg Stability_Measure ln_HHI ln_HHISQ IBR_0 IBR_1 IBR_2 IBR_3 ib4.IBR#c.ln_HHI control_variables, fe vce(cluster StateCode)   // errors clustered at the state level

    IBR_0, IBR_1, IBR_2 and IBR_3 are dummies that take the value 1 once a state has deregulated to that level and for the period thereafter. If I understand correctly, the interaction (ib4.IBR#c.ln_HHI) will give the treatment effect. I am not sure whether a difference-in-differences analysis is fully applicable here, because by 2010 all states had removed some of the barriers. Thus, there is no state where the index remained at 4, which might mean I do not have a control group (please correct me if I am wrong).
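
    For concreteness, one way such dummies can be generated, reading them simply as indicators for the current value of the index (this is only a sketch; IBR here is taken to be the index value in each state-year):

    Code:
    * sketch: IBR_k = 1 when the state's index currently equals k
    forvalues k = 0/3 {
        gen IBR_`k' = (IBR == `k')
    }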

    My questions are:
    Are these models correct?
    Which approach appears to be more correct from an econometric point of view?
    Is there maybe a third approach that will be better?

    Thank you very much!
    Last edited by Andrzej Zacharjasz; 28 Jul 2018, 06:19.

  • #2
    I don't follow some of the jargon in your post, so I may be misunderstanding what you have done. But my take is that you have an intervention that is initiated at different times in different states. If that's correct, then you cannot use a classical DID analysis. You must instead use a generalized DID analysis, which is set up somewhat differently. I recommend you read

    https://www.ipr.northwestern.edu/wor...s/Day%204.2.pd

    for the general approach.

    Also, I'm not sure what you mean by the variable called ln_HHISQ here. If it is (log(HHI))^2, that's fine. But if it is log(HHI^2), it is just equal to 2*log(HHI) and it will be omitted by Stata due to collinearity with ln_HHI itself.
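
    To make the distinction concrete (a small sketch; HHI is assumed to be your raw concentration variable, and the names below are only illustrative):

    Code:
    gen ln_HHI    = log(HHI)
    gen ln_HHI_sq = ln_HHI^2       // (log(HHI))^2: a genuine quadratic term
    gen ln_HHIsq2 = log(HHI^2)     // = 2*log(HHI): collinear with ln_HHI, will be dropped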



    • #3
      Hi Clyde,

      Thank you for your quick response.

      That’s right, the intervention occurs at different times in different states. I read the material you posted. If I understand correctly, the generalized DiD approach consists simply of adding state and year fixed effects. Am I wrong?
      In that case, my model would simply have the following form:

      xtreg y ln_HHI ln_HHISQ i.IBR i.Year i.State X, fe vce(cluster StateCode)

      where IBR is a discrete index variable with 4 categories depending on how regulated a state is at a given time t.


      • #4
        Not quite. You left out the key ingredient. You also need a variable, let's call it effect, that is 1 in every observation where the state has the intervention active in that year, and 0 elsewhere. You have to calculate that variable. Actually since you have four different interventions (or four different types of intervention that you want to consider as distinct) you need effect to take on the value 1 in those states and years where level 1 intervention is in effect, the value 2 in those states and years where level 2 intervention is in effect, similarly for levels 3 and 4. And this same variable is 0 whenever no intervention is in effect in the state. The coefficients of 1.effect, 2.effect, 3.effect, and 4.effect will be your generalized DID estimators for the effects of the four interventions.

        Since I don't have an example of your data, I can't tell you how to calculate it, but it won't be difficult. Also, if your fixed effects are the states, as I assume, then rather than including i.state, just -xtset state- first and then -xtreg, fe- will automatically include the state level fixed effects.

        Also, it is best if you use factor variable notation to include the quadratic term in lnHHI.

        So it will look something like this:
        Code:
        /* code to create effect variable goes here*/
        gen lnHHI = log(HHI)
        xtset state
        xtreg y c.lnHHI##c.lnHHI i.effect i.year, fe vce(cluster state)
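
        Purely as an illustration of what that first line might involve, here is a sketch for the simplest on/off version, assuming (hypothetically) a variable adopt_year giving the year the state's intervention took effect, missing if it never did; the four-level version would be built analogously:
        Code:
        * hypothetical sketch: effect = 1 from the adoption year onward, 0 otherwise
        gen byte effect = !missing(adopt_year) & year >= adopt_year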

        • #5
          I think the effect variable you are referring to and my IBR variable are the same thing. If I include both of them, Stata will omit one due to collinearity.

          Actually, my data and most of my variables are at the firm/bank level (xtset Bank Year). So xtreg automatically includes bank fixed effects. I definitely need to keep the year fixed effects. Should I also include state level fixed effects?

          Thank you for the advice regarding the quadratic term. This has been extremely helpful.

          • #6
            OK. I didn't recognize your IBR variable for what it is. That's fine. Yes, you definitely must include state level fixed effects here, and since you are -xtset- at a different grouping, you need to include them explicitly as i.state. Now, if it happens that each bank operates in only one state, then the state effects will be omitted due to collinearity with the bank fixed effect--that's not a problem and don't worry if that happens. Even if the state effects are omitted, the information in them is carried by the bank effects, so the model is still a valid generalized DID.
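
            Putting the pieces together, the command might look something like the sketch below (variable names taken from your earlier posts; StateCode is assumed to be a numeric code):
            Code:
            xtset Bank Year
            xtreg Stability_Measure c.ln_HHI##c.ln_HHI i.IBR i.Year i.StateCode control_variables, fe vce(cluster StateCode)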

            • #7
              It all makes a lot more sense now. Thank you so much!

              Also, since the "interventions" occur at different dates, do I still need to carry out tests such as the parallel trends test or the placebo test?

              • #8
                A placebo test would be a good idea. As for parallel trends, that's harder (though not impossible) to do when there are multiple start dates, but it is certainly easy to look at parallelism of trends in the era preceding the first start date.
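
                For the pre-period look, something along these lines is usually enough (a sketch only; the 1997 cutoff is just a placeholder for whatever the earliest intervention year is in your data):
                Code:
                * inspect state-level trends before the first intervention year
                preserve
                keep if Year < 1997              // placeholder for the first start date
                collapse (mean) Stability_Measure, by(StateCode Year)
                xtset StateCode Year
                xtline Stability_Measure, overlay
                restore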

                • #9
                  Great, I saw your replies in another post and I think I will be able to show parallelism of trends in the era preceding the first "intervention".

                  As a placebo test, I thought I could either take a lag of the effect variable and show that it is not significant, or generate a new random effect variable where interventions occur at random dates. This variable should also be insignificant once incorporated into the model specification. Are these tests valid? Do you recommend any method in particular?

                  I think the second method could be more easily implemented by randomly changing the state assigned to each bank.
                  Last edited by Andrzej Zacharjasz; 29 Jul 2018, 09:25.

                  • #10
                    As a placebo test, I thought I could either take a lag of the effect variable and show that it is not significant, or generate a new random effect variable where interventions occur at random dates. This variable should also be insignificant once incorporated into the model specification. Are these tests valid? Do you recommend any method in particular?

                    I think the second method could be more easily implemented by randomly changing the state assigned to each bank.
                    I think that both are useful. Using a lagged effect variable is probably the more stringent test. One concern with a DID design is that the intervention is itself a consequence of the conditions that we are trying to call the consequence of the intervention. It can be the case that both the "effect" and the intervention take place at the same time, driven by some exogenous but unidentified circumstance. (For example, crime rates rise for some reason and then "tougher" laws are passed soon thereafter. In this situation it is possible, if you are not careful, to mis-identify the tougher laws as the cause of the crime wave instead of the other way around.)

                    The confusion of cause and effect is possible because, if you have several rounds of observation both before and after the intervention, the pre-post variable rather crudely models the outcome as an abrupt jump and misses the fact that the outcome has been drifting towards the end of the pre-period and into the beginning of the post-period. By using lagged effect variables you can uncover this, seeing that the outcome actually also changes shortly before the intervention really occurred. Since this tendency of pre-post variables to miss a subtle drift of the outcome is a particular weakness of DID analyses in longitudinal data, this particular type of placebo test is, in my opinion, particularly important.

                    Randomizing the effect variable is another approach, but it is more of a blunt instrument. It is not focused on uncovering a subtle miss of the timing. Rather, it just tells you that you won't find these results with just any old effect variable. That's of some value too, and since it's easy to do I would do it as well, but I think it is less important.
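
                    A rough sketch of the timing check, assuming the data are -xtset Bank Year- and your effect variable is the IBR index (the two-year shift is an arbitrary illustrative choice):
                    Code:
                    * F2.IBR takes the value the index will have two years from now, so the
                    * placebo "intervention" turns on two years before the real change
                    gen IBR_lead2 = F2.IBR
                    xtreg Stability_Measure c.ln_HHI##c.ln_HHI i.IBR i.IBR_lead2 i.Year i.StateCode control_variables, fe vce(cluster StateCode)
                    * if the coefficients on i.IBR_lead2 are near zero and insignificant, there is no
                    * sign that the outcome starts moving before the index actually changes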
