Dear Statalist community,
I want to perform propensity-score matching followed by a difference-in-differences estimation.
My data are at the individual level and cover the years 1998-2010. I would like to study the impact of an individual's displacement on his or her earnings in the 5 years following the displacement year. I have a variable displaced=1 in the year of displacement for the displaced individual, treatment_=1 over the whole period if the individual belongs to the treatment group and 0 if he belongs to the control group, post=1 in the post-displacement period for all individuals, and an interaction variable postxtreatment for the displaced individual in the post-displacement period.
I will run a diff-in-diff regression based on this code:
xtreg dailyearnings i.treatment_ i.post i.treatment_#i.post i.year, fe
(I am not quite sure whether I should include the treatment variable, since I already include individual fixed effects, but I will deal with that after solving the matching problem.)
I have multiple treatment periods: an individual can be displaced in any year between 2002 and 2005, so there is no obvious year that separates the pre- and post-periods for the control group. I will therefore match treatment-control pairs based on their probability of being displaced. My problem is the following: I want to match individuals by cohort. Example with the 2005 cohort: a worker displaced in 2005 is matched with an individual who was NOT displaced in 2005 (based on their 2004 characteristics, i.e. their 2004 probability of being displaced) BUT who may be displaced after 2005 (the only restriction imposed is that workers have earnings in at least one of the five years after displacement). In this case, an individual who is treated in one cohort may serve as a control in another cohort.
To do so, I have thought about doing the following:
I create a separate file for each displacement year (2002, 2003, 2004, 2005). Within each file, I form pairs of individuals (displaced - non-displaced). Then I merge the pairs with the main dataset, so that I have one file per cohort (the same individual can appear in different files, as a control or as a treated individual). For each file, I create a cohort variable, the treatment variable, and a new "id" variable composed of the individual's pair code concatenated with the cohort number. Then I pool the cohort files. The pooled file will contain the same individuals several times.
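A minimal Stata sketch of the procedure described above, offered as an illustration under stated assumptions rather than a definitive implementation. The variables id, year, displaced and dailyearnings are taken from the post; x1 and x2 are hypothetical placeholders for the matching covariates; dispyear, cohort and stackid are names introduced here for illustration. It assumes the user-written command psmatch2 is installed (ssc install psmatch2), uses 1-to-1 nearest-neighbour matching without replacement, and treats the displacement year itself as part of the post period. It stacks on an individual-by-cohort identifier rather than on the pair code.

```stata
* Year of displacement for each individual (missing if never displaced)
bysort id: egen dispyear = min(cond(displaced == 1, year, .))

tempfile full stacked
save `full'

local first = 1
forvalues c = 2002/2005 {
    use `full', clear

    * Treated: displaced in year c. Potential controls: not yet displaced by
    * year c (missing dispyear counts as "> c" in Stata, i.e. never displaced)
    keep if dispyear == `c' | dispyear > `c'
    gen byte treatment_ = (dispyear == `c')

    * Match on characteristics observed in year c-1
    preserve
    keep if year == `c' - 1
    psmatch2 treatment_ x1 x2, logit neighbor(1) noreplacement
    keep if _weight < .          // keep treated and their matched controls
    keep id
    tempfile matched
    save `matched'
    restore

    * Bring the matched individuals back to their full earnings history
    merge m:1 id using `matched', keep(match) nogenerate
    gen int cohort = `c'
    gen byte post = (year >= `c')   // assumption: year c counts as post

    if `first' == 0 append using `stacked'
    save `stacked', replace
    local first = 0
}

* One panel unit per individual-cohort combination
egen long stackid = group(id cohort)
xtset stackid year

* treatment_ is constant within stackid, so its main effect is absorbed by the
* fixed effects; clustering on id allows for the same person appearing in
* several cohorts
xtreg dailyearnings i.post i.treatment_#i.post i.year, fe vce(cluster id)
```

Because the same person can enter the stacked file more than once, the panel identifier has to be the individual-cohort combination rather than the original id, which is why stackid is used in xtset.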
Do you think this is a correct way to proceed? I do not want to compare my displaced individuals only with individuals who are never displaced over the whole period, because I think that would introduce a bias. What do you think?
Thank you in advance for your support!
Kind regards,
Eugenie