
  • Matching and diff-in-diff with multiple treatment periods

    Dear Statalist community,

    I want to execute a matching based on pscore followed by a Diff-in-diff estimation.

    My data are at the individual level and cover the years 1998-2010. I would like to study the impact of an individual's displacement on his or her earnings in the 5 years following the displacement year. I have the following variables: displaced, equal to 1 in the year of displacement for a displaced individual; treatment_, equal to 1 over the whole period if the individual belongs to the treatment group and 0 if he belongs to the control group; post, equal to 1 in the post-displacement period for all individuals; and an interaction variable, postxtreatment, for displaced individuals in the post-displacement period.

    I will run a diff-in-diff regression based on this code :

    xtreg dailyearnings i.treatment_ i.post i.treatment_#i.post i.year, fe
    (I am not quite sure whether I should include the treatment variable since I include a fixed effect, but I will cope with that after solving the matching problem).

    I have multiple treatment periods: an individual can be displaced in any year between 2002 and 2005. So there is no obvious year to use to distinguish pre- and post- for the control group. I will thus match treatment-control pairs based on their probability of being displaced. My problem is the following: I want to match the individuals by cohort. Example with cohort 2005: a worker displaced in 2005 is matched with an individual who has NOT been displaced in 2005 (based on their 2004 characteristics, so their 2004 probability of being displaced) BUT who may be displaced after 2005 (the only restriction imposed is that workers have earnings in at least one of the five years after displacement). In this case, it is possible that an individual who is in the treatment group for one year will be a control individual for another year.

    In order to do so, I have thought about doing the following :
    I take a separate file for each displacement year (2002, 2003, 2004, 2005). I form pairs of individuals for each year (displaced - non-displaced). Then I merge them with the main dataset, so I have one file per cohort (but the same individuals can appear in different files as control or treated individuals). For each file, I create a cohort variable, the treatment variable, and a new "id" variable composed of the pair code of the individual concatenated with the cohort number. Then I pool the cohort files. I will have a file containing the same individuals several times.

    Do you think this is a correct way to proceed? I do not want to compare my displaced individuals only with individuals who are never displaced over the whole period, because I think that would introduce a bias. What do you think?


    Thank you in advance for your support!

    Kind regards,
    Eugenie

  • #2
    First the simple question, whether to include i.treatment in the model along with i.post and i.treatment_post and fixed effects. In one sense it makes no difference. As I think you have in mind, i.treatment will be a fixed attribute of each individual, and so it will be collinear with the fixed effects and Stata will omit it if you try to include it. So you will end up with a model that excludes it either way. But, perhaps surprisingly, I recommend that you include it nevertheless. That's because doing so gives you one more check on your data. If you write the code including i.treatment, Stata will check for collinearity. If there is an error in the i.treatment or individual ID variables, the collinearity may not be found and Stata will include it. So if you see output for i.treatment you will know that you have a problem in your data. If you just write the regression without i.treatment, such a data error would go unnoticed at that point, and would probably show up at some later point in your workflow, probably at the most embarrassing or inconvenient possible time. So I urge you to leave it in as an insurance policy, and check the output to make sure that Stata omits it for you.
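
    A minimal sketch of that check, assuming the individual identifier is called id (a hypothetical name; the regression line is the one from #1). The assert line is an additional, independent test that treatment_ really is constant within each individual:

    ```stata
    * id is an assumed name for the individual identifier
    xtset id year

    * Regression from #1; in the output, 1.treatment_ should be reported as omitted
    xtreg dailyearnings i.treatment_ i.post i.treatment_#i.post i.year, fe

    * Independent check: stops with an error if treatment_ varies within any id
    bysort id (treatment_): assert treatment_[1] == treatment_[_N]
    ```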

    With regard to the matching, I think you are making this too complicated. What exactly are you hoping to accomplish with the matching? Any attributes of individuals that are invariant over time will automatically be adjusted for by the fixed effects. Any time-specific events that apply to all individuals will be automatically adjusted for by the i.year variables. So the only thing left to adjust for by matching are variables that vary both over time and across persons. But to effectively match for that, you will have to find in your data pairs of individuals who exhibit the same, or close to the same, patterns of variation in these variables over time. With 12 years of data, there is a lot of latitude for two people's histories on any variable to diverge substantially. Consequently, I think you will find that this kind of variation will be extremely difficult to find matches for.

    I am going out a bit on a limb here because I don't know exactly what your context is, and if I knew it might be outside my area of expertise. But from general principles, I would say that it would be an unusual situation that would permit you to successfully match many individuals in this way. Yes, propensity score matching allows some laxity, but my instinct is that the degree of matching you would create with propensity scoring would be fairly low.

    In the past I have, myself, argued on this Forum for doing some matching on exposure time before embarking on a DID analysis. But more recently I have been pointed to generalized DID analysis (see https://www.ipr.northwestern.edu/wor.../Day%204.2.pdf), which demonstrates that matching is not really needed here either.



    • #3
      Thank you very much for the answer !

      Indeed, I was thinking that as i.treatment is a fixed attribute of each individual, it would be colinear with the fixed effects. I will add it anyway as you suggested, and check if the variable is omitted. Thank you for the advice.

      Regarding the matching, your link is not available, so I cannot check it. If you have a working link, I would be very interested in reading this article. I thought matching could be useful in this context to control for variables that vary over time, mainly the variation in earnings.

      To me, the matching method was essential in order to handle multiple treatment periods, as you mention in this post that I found today, which confirms my intuition: https://www.statalist.org/forums/for...atment-periods

      "In order to do a DID analysis you must have both pre- and post- observations in both the treatment and control groups. In the classic DID analysis this is simple: there is a single start time (date, year, whatever) at which treatment begins for everyone in the treatment group. That same start time then defines the post variable for all observations: -gen post = time > start_time-.
      Your situation is more complicated. The earthquakes that occurred in the treatment cities would have, I imagine, occurred in various different years. So there is no obvious year to use to distinguish pre- and post- for the control group. The best solution here would be to form matched treatment-control pairs. You should try to match them on variables you have that are predictive of your popdensity outcome. "


      If I want to include individuals displaced in different years over the period, I should have matched pairs of control and treatment individuals. Otherwise, I would not know how to define the pre- and post-treatment periods for each "control" individual. Am I correct?

      The only other solution I see is to construct my cohort files and define a pre and post period within each file (for the 2005 cohort, the pre period is before 2005 and the post period starts in 2005), without matching individuals. Afterwards, I can pool all the cohort files. But again, I will have the same individuals several times with different pre and post periods. I am not sure whether that would work in a regression.
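
      To make the idea concrete, here is a rough sketch of the mechanics only; whether the resulting regression is valid is exactly the open question. The names id, displacement_year, cohort, treated_c, post_c and cohort_id are assumptions introduced for illustration. Note that a control who is displaced later has a post period that includes his or her own displacement, which this sketch does nothing about.

      ```stata
      * id, displacement_year, cohort, treated_c, post_c, cohort_id are assumed names
      bysort id: egen displacement_year = max(cond(displaced == 1, year, .))
      tempfile full stacked
      save `full'

      forvalues c = 2002/2005 {
          use `full', clear
          * Displaced in year c, later, or never (missing counts as larger than any year)
          keep if displacement_year >= `c'
          gen int cohort = `c'
          gen byte treated_c = (displacement_year == `c')
          gen byte post_c = (year >= `c')
          if `c' > 2002 append using `stacked'
          save `stacked', replace
      }

      * A person can appear in several cohorts, so the panel unit is person-by-cohort
      egen long cohort_id = group(cohort id)
      xtset cohort_id year
      xtreg dailyearnings i.treated_c##i.post_c i.year, fe vce(cluster id)
      ```

      Clustering on id rather than cohort_id is a design choice in this sketch, made because the same person contributes observations to several cohorts.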

      What do you think about this ?



      • #4
        Summarizing my last comment, using matching based on propensity score, I want to :

        (1) Reduce the dependence of the treatment variable on worker characteristics. Displacement is assumed to be uncorrelated with worker characteristics (performance and career aspirations) because we only consider displacements that follow a firm closure. That is to say, the separation after firm closure can be considered exogenous from the worker's perspective: workers are not self-selecting into career changes, as long as displacement is uncorrelated with worker characteristics. Matching on pre-displacement covariates helps to avoid introducing selection bias. By including lags of pre-displacement wages and the logarithm of wage growth between 5 and 2 years before the displacement event, I can capture a worker's pre-displacement wage curve and so establish counterfactual careers for displaced workers. To my knowledge, that is not possible without matching.

        (2) Add different treatment periods: if I want to include individuals displaced in different years over the period, I should have matched pairs of control and treatment individuals. Otherwise, I would not know how to define the pre- and post-treatment periods for each "control" individual.

        The core of my problem lies in using individuals who may be displaced in t+1 as "control" individuals in t for individuals displaced in t, and in how to apply this to my data.

        I hope it sounds clearer now !

        Thank you !



        • #5
          Hmm, that link isn't working now. But this one, https://www.ipr.northwestern.edu/wor.../Day%204.2.pdf, is.

          Moving on to what you say in #4, your reasoning in 1) strikes me as correct. But bear in mind that "me" is, in this case, a lay person. I have no expertise in labor economics (or any economics, for that matter), and it may be that somebody who is knowledgeable in that area would find this approach to modeling a counterfactual career unsatisfactory. For advice on that you really need to consult the literature to see if others have used this approach, or speak with a labor economist. More specifically, understanding whether a person separated in year t+1 is a suitable control for one separated in year t is a judgment that can only be made by somebody who knows something about labor economics. I won't venture an opinion on that.

          With regard to (2), I suggest you read the material in the presentation linked at the top of this response. (Sorry the link from yesterday doesn't work--I don't know what went wrong there.) You will see that the material at the link contradicts what I said, and you quoted, in #3. You do not need matching to do this. It is not a classical DID analysis (whether you match or not), but even without the matching you still have identification of the treatment effect.
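
          As a rough sketch of what such a specification can look like (a common two-way fixed-effects formulation, not a reproduction of the linked slides): a single indicator marks the person-years in which a displaced worker is at or past displacement, and individual and year fixed effects take the place of the separate treatment and post terms. The names id, displacement_year and did are assumptions; displaced and dailyearnings are from #1.

          ```stata
          * id, displacement_year and did are assumed names
          * Year of displacement per person; stays missing for the never displaced
          bysort id: egen displacement_year = max(cond(displaced == 1, year, .))

          * 1 from the displacement year onward for displaced workers, 0 otherwise
          gen byte did = !missing(displacement_year) & year >= displacement_year

          xtset id year
          xtreg dailyearnings i.did i.year, fe vce(cluster id)
          ```

          Under this specification the coefficient on 1.did is the estimate of the displacement effect, and no common pre/post cutoff has to be invented for workers who are never displaced.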

          It may be that with matching you will gain precision by providing counterfactual careers and thereby reducing residual variance, but I haven't worked through the math to know if that is actually true. In any case, such a gain would be in proportion to the quality of the match itself.

          In terms of implementing your match in Stata, it seems like you need to pair each person in the treatment group with somebody who either is never in it, or joins it only in a later year (perhaps with some threshold lag of even more than just 1 year). I think you will find Robert Picard's -rangejoin- command very helpful for that. You can get it from SSC by running -ssc install rangejoin-. That program, in turn, calls Robert Picard, Nick Cox and Roberto Ferrer's -rangestat-, also available at SSC. You will not need to partition your data set into subsets by year. -rangejoin- will enable you to directly link each observation in the data with all other observations whose entry into treatment status is at least # years beyond the case's entry. You can then select specific matched pairs by imposing additional conditions on these pairings using -keep- and -drop- commands, and, finally, if necessary, reduce to a single match per case by random sampling from the pairings that survive.
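
          A sketch of how those steps might fit together, under several assumptions: id, displacement_year and pscore are hypothetical names, pscore is a single previously estimated propensity score per person (a simplification of the cohort-specific scores described in #1), the lag is 1 year, and the 0.05 caliper is arbitrary. Never-displaced people are coded 9999 so that they fall inside every treated person's interval.

          ```stata
          * Needs: ssc install rangejoin   and   ssc install rangestat
          * id, displacement_year, pscore and the 0.05 caliper are assumptions
          bysort id: egen displacement_year = max(cond(displaced == 1, year, .))
          bysort id: keep if _n == 1                 // one row per person
          keep id displacement_year pscore
          replace displacement_year = 9999 if missing(displacement_year)  // never displaced

          * Candidate controls, renamed so they do not clash with the treated cases
          preserve
          rename (id displacement_year pscore) (id_c disp_year_c pscore_c)
          tempfile controls
          save `controls'
          restore

          * Treated cases: pair each with everyone displaced at least 1 year later, or never
          drop if displacement_year == 9999
          gen low  = displacement_year + 1
          gen high = 9999
          rangejoin disp_year_c low high using `controls'

          * Keep close pairs, then draw one control at random per treated person
          drop if missing(pscore_c) | abs(pscore - pscore_c) > 0.05
          set seed 12345
          gen double u = runiform()
          bysort id (u): keep if _n == 1
          ```

          Because the draw is made separately for each treated person, the same control can be picked for more than one case; sampling without replacement would need an extra step.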





          • #6
            Dear Clyde,
            Could you please provide me with the presentation? Neither link is working.

            Thank you
            Maye



            • #7
              Dear Clyde,
              This thread is almost 5 years old, but could you please share the presentation you linked to? The links are not working.
              Thank you
              Eben



              • #8
                I'm sorry, but I do not have the presentation that was at that link, and I don't know where it might now be on the internet, if anywhere.
