
  • Difference-in-difference estimation: multiple pre/post-periods, lags, state-specific time trends, treatment intensity

    Dear Statalist community,

    I want to execute a Diff-in-diff estimation.

    My data is on the state-level and comprises the years 2005-2013.
    State-year-level observations;
    state:= 1,..,16;
    year:= 2005,..,2013.


    My treatment:
    2 of 16 states implemented a policy:
    state_1 in 2009, state_2 in 2010.
    I expect this policy change (which stays in place from the time of implementation onward) to affect the outcome variable not in the year of implementation, but with a lag of two years, three years, and so on.

    1. Question)
    Supposing the treatment effect is assumed to be equal for both states, I model:
    Code:
    gen D=0
    replace D=1 if state==1 & year==2009
    replace D=1 if state==2 & year==2010
    
    reg outcome_ij i.state i.year l2.D_ij l3.D_ij l4.D_ij, vce(cluster state)
    Is this specification correct?

    If I excluded the terms "l3.D_ij l4.D_ij":
    Would Stata assume in 2012/2013 that the outcome of state_1 would return to its state- and year-fixed effect level and expect no treatment effect?

    2. Question)
    How could I include state-specific time trends instead of assuming that they are independent of the state?
    Code:
    reg outcome_ij i.state i.year state#year l2.D_ij l3.D_ij l4.D_ij, vce(cluster state)
    Is this specification correct?

    3. Question)
    Some of my covariates are distorted after 2010, some after 2011.
    Is it possible to include them even if they only cover the pre-treatment period (like in the Synthetic Control Method approach)?


    4. Question)
    Public Health Insurance (share): Variable of treatment intensity OR important covariate?

    The treatment only affects individuals that are publicly insured.
    I do not expect the share of public HI to directly affect the outcome.
    But the share is expected to affect the treatment effect.

    Some papers interact the share of individuals with public HI with the treatment indicator.
    But, one author states that "controlling for .. health insurance coverage (as a predictor of a closely related outcome) is important .. given that uninsured individuals are not directly affected by the mandates".
    Is it enough to interact the share of publicly insured individuals with my treatment indicator, or should I also include it as a covariate?
    I think: If the share is exceptionally high in the two treated states, the treatment effect could be inflated.
    How would I include the interaction term in my specification?


    Thank you very much for your support/help in advance!

    Kind regards,
    Mischa

  • #2
    This is a bit out of the ordinary difference-in-differences (DID) framework. Let's consider some of the issues. For a standard DID analysis, there are two groups, the states that adopted the policy and those that did not. Moreover, in a standard DID analysis, the states that adopt the policy all do so at the same time. The analysis then relies primarily on two variables: group (adopter vs non_adopter) and era (pre-adoption vs post-adoption) and their interaction. In your situation, it is not possible to define the era variable because the two states that adopted the policy did so in different years. This is a difficulty, but not an insurmountable one.

    The usual approach is to try to match the non-adopter states with the adopter states in some way that makes scientific sense, and then impute to each state a "would have adopted" date corresponding to which adopter they match with. So in your case, you would need to partition your 14 non-adopter states into those for whom you will define the post-adoption era to begin in 2009, and those for whom it will begin in 2010. This assignment would be based on which of the two adopter states each state resembles in terms of whatever variables are relevant here. (Relevance, in this context, is a scientific, not statistical, question.) Then you would define the era variable for each state according to that state's adoption (or would-be adoption) year. If there is no scientifically sensible way to do this kind of matching, the next best approach is to do the assignment at random.

    Once you have done that, you code the era variable to be 1 in all years after adoption and 0 in all years before. (The year of adoption itself might be coded as 1 or 0 depending on whether you expect its effects to be felt in the same year or not. Based on what you say in your post, I suppose 0 would be more sensible for this project. In fact, given what you say about a lag of 2 or 3 years, you might actually want to code era as 1 only in those years that are 2 years beyond the actual or would-be adoption date.)

    Note that for coding purposes, to make a variable like era, the code would look rather different from what you have proposed. Something like this:
    Code:
    gen byte era = (year >= 2010) // IF ADOPTION YEAR WERE ALWAYS 2010
    
    // OR
    
    gen byte era = (year > adoption_year) // WHERE adoption_year IS A VARIABLE
    // IDENTIFYING THE ACTUAL OR WOULD-BE ADOPTION YEAR OF THE STATE
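
    As a rough illustration of how an adoption_year variable of this kind might be built, here is a minimal sketch on a simulated 16-state, 2005-2013 panel. The random assignment of would-be years to the 14 non-adopters is only the fallback mentioned above, the seed is arbitrary, and reading "2 years beyond" as year >= adoption_year + 2 is an assumption.

    ```stata
    * Simulated panel skeleton; nothing here is real data
    clear
    set seed 12345
    set obs 16
    gen byte state = _n

    * Actual adoption years of the two adopters (from the original post)
    gen int adoption_year = 2009 if state == 1
    replace adoption_year = 2010 if state == 2

    * Fallback: random would-be adoption year for the 14 non-adopters
    replace adoption_year = cond(runiform() < 0.5, 2009, 2010) if missing(adoption_year)

    * One row per state-year, 2005-2013
    expand 9
    bysort state: gen int year = 2004 + _n

    * era = 1 only from two years after (would-be) adoption onward (assumed reading)
    gen byte era = (year >= adoption_year + 2)

    * Check the coding for the two adopter states
    tabulate year era if inlist(state, 1, 2)
    ```

    If a scientifically sensible matching is available, the random -replace- line would be swapped for explicit assignments based on that matching.
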
    Similarly you need a variable, call it group, that is set to 1 (in every year) for the states that adopt, and 0 for the other states. So, if, for example, states 1 and 2 are the adopter states, it goes:

    Code:
    gen byte group = inlist(state, 1, 2)
    Then the DID analysis goes like this:
    Code:
    regress outcome i.group##i.era // AND OPTIONALLY INCLUDE RELEVANT COVARIATES
    Note that there are no _ij "subscripts" in the Stata code. Note also that you do not regress on the individual state indicators here--it is the group (adopter vs non-adopter) that is central to the analysis.

    That said, the observations are not independent because they are clustered in states. So it is appropriate to adjust for that. For that reason, rather than using -regress-, I would do this:

    Code:
    xtset state year
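    xtreg outcome i.group##i.era, fe // AND OPTIONAL COVARIATES
    // NOTE: group IS CONSTANT WITHIN STATE, SO -fe- OMITS ITS MAIN EFFECT;
    // THE DID ESTIMATE IS THE COEFFICIENT ON THE group#era INTERACTION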
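
    To see the whole sequence run end to end, here is a minimal sketch on simulated data. The outcome, the built-in effect of 2, and the rule giving odd-numbered states a (would-be) adoption year of 2009 and even-numbered states 2010 are all hypothetical. The vce(cluster state) option is an addition of this sketch, not part of the command above, and with only two adopter states cluster-robust standard errors should be read with caution.

    ```stata
    * Simulated 16-state, 2005-2013 panel; all values are made up
    clear
    set seed 2024
    set obs 16
    gen byte state = _n

    * Hypothetical rule: odd states -> 2009, even states -> 2010
    * (this reproduces the actual years for states 1 and 2)
    gen int adoption_year = cond(mod(state, 2) == 1, 2009, 2010)

    expand 9
    bysort state: gen int year = 2004 + _n

    gen byte group = inlist(state, 1, 2)
    gen byte era = (year > adoption_year)

    * Hypothetical outcome: state effects plus an effect of 2 for adopters post-adoption
    gen double outcome = 10 + 0.5*state + 2*group*era + rnormal()

    xtset state year
    xtreg outcome i.group##i.era, fe vce(cluster state)

    * The DID estimate is the interaction coefficient
    display "DID estimate: " _b[1.group#1.era]
    ```
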
    Now, I could go farther down this road and discuss more of the specifics. But I foresee a large problem with your analysis. You have only two adopting states, and even assuming that the effects of the intervention were felt starting in the adopting year and lasting throughout the rest of the time span of the data, that gives you a total of 9 observations in the adopted condition (adopting state, at or after year of adoption). That is very sparse data for estimating the effect of the intervention. It is even sparser when you consider that of those 9 potential degrees of freedom, 2 are lost to the state-level fixed effects, bringing you down to 7. Unless the effect of this intervention is massive, you have little hope of distinguishing its signal from noise with so few observations on it. If you start adding in covariates such as time trends, you will probably be even worse off in this respect. So I would urge you to reconsider what you are doing here and see if there isn't the possibility of finding more adopting states and getting data on them. If there isn't, you will probably end up with inconclusive results. Not every question can be answered.
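
    The count of 9 can be checked directly. The sketch below builds only the state-year skeleton described in the original post (no real data) and also shows how the count shrinks if the effect is assumed to start two years after adoption, which is one reading of the lag described there.

    ```stata
    * Skeleton of the 16-state, 2005-2013 panel
    clear
    set obs 16
    gen byte state = _n
    expand 9
    bysort state: gen int year = 2004 + _n

    * Adoption years of the two adopters; missing for the other 14 states
    gen int adoption_year = cond(state == 1, 2009, 2010) if inlist(state, 1, 2)

    * Adopter state-years at or after adoption: 2009-2013 (5) + 2010-2013 (4) = 9
    count if inlist(state, 1, 2) & year >= adoption_year

    * With a two-year lag: 2011-2013 (3) + 2012-2013 (2) = 5
    count if inlist(state, 1, 2) & year >= adoption_year + 2
    ```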

    I don't know what you mean when you say that some of your covariates are "distorted" in certain years. Do you mean that you consider those values to be data errors? If so, the only solution to that problem is to develop a reasonable scheme to try to replace the data errors with correct values. That will, in general, require either finding other sources of information for those data values, or a good understanding of the mechanism that "distorts" those variables so that you can calculate a "correction." Or, if the "distortion" mechanism operates randomly, you could consider treating the erroneous values as missing and then doing multiple imputation, or just a robustness analysis looking at different possible scenarios about what the correct values might be.

    Whether the prevalence of health insurance affects the outcome directly, or does so only by modifying the effect of the intervention, is a substantive scientific question. But from the perspective of how you would handle it in your model, if you believe that there may be effect modification, then you need to have an interaction between health insurance and intervention effect (which in a DID model is itself the group#era interaction) in your model to capture that. But the inclusion of the interaction term implies that the "main effect" of health insurance must also be included so that the interaction of health insurance with group#era will itself be correctly interpretable. So either way, you end up including health insurance as a covariate in the model, either explicitly by itself, or implicitly as part of c.health_insurance##i.group##i.era (the c. prefix marks the share as a continuous variable).
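
    A minimal sketch of such a three-way interaction, on simulated data: the variable hi_share, its distribution, the outcome, and the odd/even rule for would-be adoption years are all hypothetical. Because the share is continuous it needs the c. prefix, and the ## operator adds the main effects and all lower-order interactions automatically.

    ```stata
    * Simulated 16-state, 2005-2013 panel; all values are made up
    clear
    set seed 777
    set obs 16
    gen byte state = _n
    gen int adoption_year = cond(mod(state, 2) == 1, 2009, 2010)  // hypothetical rule
    expand 9
    bysort state: gen int year = 2004 + _n

    gen byte group = inlist(state, 1, 2)
    gen byte era = (year > adoption_year)

    * Hypothetical share of publicly insured people (between 0.6 and 0.9)
    gen double hi_share = 0.6 + 0.3*runiform()

    * Hypothetical outcome: the treatment effect scales with hi_share
    gen double outcome = 10 + 3*hi_share*group*era + rnormal()

    * Full factorial of share, group, and era; the three-way term captures
    * how the DID effect changes with the insurance share
    regress outcome c.hi_share##i.group##i.era, vce(cluster state)
    ```

    With so few adopter state-years, the three-way coefficient will usually be estimated very imprecisely.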

    I realize that this discussion does not completely and specifically answer all of your questions. I hope it has given you a start in the right direction. But before you get too far "into the weeds," do reconsider whether you have enough data here to even pursue the project.







    • #3
      Originally posted by Clyde Schechter
      [...] The usual approach is to try to match the non-adopter states with the adopter states in some way that makes scientific sense, and then impute to each state a "would have adopted" date corresponding to which adopter they match with. [...]
      What do you mean by "scientific sense"?



      • #4
        Dear Clyde,

        I know it is a very late response - but I still wanted to say thank you for the comprehensive feedback you gave me back then!
        In the end, I went for the two-way fixed effects DID (with state and year fixed effects).
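
        For readers of the thread, a two-way fixed effects DID of this kind could be sketched roughly as follows. The data are simulated and the variable names are hypothetical; this is not the specification actually estimated. The second command adds state-specific linear trends via i.state#c.year; note that a plain state#year term would instead create a separate dummy for every state-year cell.

        ```stata
        * Simulated 16-state, 2005-2013 panel; all values are made up
        clear
        set seed 4321
        set obs 16
        gen byte state = _n
        expand 9
        bysort state: gen int year = 2004 + _n

        * Treatment dummy: 1 from the adoption year onward in the two adopter states
        gen byte treated = (state == 1 & year >= 2009) | (state == 2 & year >= 2010)

        * Hypothetical outcome: state effects, a common trend, and an effect of 2
        gen double outcome = 0.5*state + 0.3*(year - 2005) + 2*treated + rnormal()

        xtset state year

        * State fixed effects via -fe-, year fixed effects via i.year
        xtreg outcome i.treated i.year, fe vce(cluster state)

        * Variant with state-specific linear time trends
        xtreg outcome i.treated i.year i.state#c.year, fe vce(cluster state)
        ```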

        Best,
        Mischa

