Hi, my current project uses a "PSM+DID" empirical design, and my dataset is longitudinal. The panel structure strengthens the empirical identification, but it also complicates my data management, in particular how to implement PSM on panel data correctly. My template is Heyman et al. (JIE, 2007), who implement a year-by-year PSM on "whether a firm is foreign acquired". I rely on the popular user-written command --psmatch2--. A snippet of my code is given below.
What I want is to obtain _treated (indicating treatment or control group) and _weight (indicating whether the observation is used in a match), and to record the year in which the treatment happened. The tricky part is that psmatch2 overwrites these _variables (_treated, _weight) in each loop iteration, so they have to be saved in a NEW variable, which is exactly what I did.
What worries me is the result. After the code ran, it gave me a series of treatment variables, namely treatment_1 - treatment_6. Because I specified the noreplacement option, I would expect each panel unit (in my case, nfid) to be used only once, occasionally twice. However, the generated matched sample looks like the sample listing below.
This worries me because many units appear as matches in EVERY year, which should not happen. I expected to obtain something like the last listing below.
I don't know whether something is wrong in my code. Could you please check it and help me figure out what is going on? Thank you.
Code:
**group by years
egen g = group(year)
levels g, local(gr)

* Note that in each loop, psmatch2 replace its _variables (_treated _weight)
* So it's necessary to record them in a NEW variable
foreach j of local gr {
    cap noi psmatch2 bigchangetag $x $high_order $xv if g==`j', n(1) logit qui common noreplacement

    ** Collect the treated year
    by nfid (treatment), sort: gen temp = (_treated==1)
    by nfid: egen num_treated = total(temp)
    by nfid (temp), sort: replace treat_year1 = year[_N] if treat_year1==. & num_treated==1
    drop temp num_treated

    ** Collect the (untreated) match year
    by nfid (treatment), sort: gen temp = (_treated==0)
    by nfid: egen num_treated = total(temp)
    by nfid (temp), sort: replace treat_year2 = year[_N] if treat_year2==. & num_treated==1
    drop temp num_treated

    replace treatment = _treated if treatment==.
    replace pairs = _id if pairs==.
    replace matched = _weight if matched==.
    tab year _treated if matched==1
}
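As a variant of the loop above, the sketch below stores each year's psmatch2 output in year-specific variables instead of filling a single variable across years. It reuses g, bigchangetag and the globals from the snippet. The names treated_`j', matched_`j' and pair_`j' are hypothetical, and the sketch assumes that psmatch2 leaves _weight missing for observations that were not used in a match.

```stata
* Sketch only: treated_*, matched_* and pair_* are hypothetical names
levelsof g, local(gr)
foreach j of local gr {
    capture noisily psmatch2 bigchangetag $x $high_order $xv if g == `j', ///
        n(1) logit quietly common noreplacement
    * skip this year if psmatch2 failed, so stale _variables are not copied
    if _rc != 0 {
        continue
    }

    * treatment status in year-group `j' (missing outside that year)
    gen byte treated_`j' = _treated if g == `j'
    * 1 if the observation took part in a match in year-group `j', else 0
    gen byte matched_`j' = (_weight < .) if g == `j'
    * psmatch2's identifier from this run, kept only for this year
    gen long pair_`j' = _id if g == `j'
}
```

Restricting every gen to g == `j' keeps one year's results from leaking into another year's rows, which makes it easier to see in which year a firm was really matched.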
Code:
nfid year treatment pairs(_id) treat_year
161 2000 0 1036589 1999
161 2001 0 1029618 1999
161 2002 0 1050054 1999
161 2003 . 1010596 1999
164 1998 . 1695000 1999
164 1999 0 80890 1999
164 2000 0 879366 1999
164 2001 0 781947 1999
164 2002 0 785361 1999
164 2003 . 957154 1999
169 2003 . 1681113 2004
169 2004 0 1053593 2004
171 1998 . 1548697 2000
171 1999 0 102531 2000
171 2000 1 952717 2000
171 2001 0 889980 2000
171 2002 0 848699 2000
171 2003 . 882134 2000
171 2004 0 552188 2000
173 1998 . 1674613 1999
173 1999 0 79626 1999
176 1998 . 1995491 1999
176 1999 0 40486 1999
176 2000 0 405328 1999
179 1998 . 1963984 1999
179 1999 0 55616 1999
179 2000 0 515677 1999
179 2001 0 622819 1999
179 2002 0 610122 1999
179 2003 . 597080 1999
179 2004 0 446528 1999
188 1998 . 1900139 1999
188 1999 1 117730 1999
188 2000 0 916699 1999
188 2001 0 1148184 1999
188 2002 0 1174944 1999
188 2003 . 1321465 1999
193 1998 . 1544959 1999
193 1999 1 117069 1999
193 2000 0 669927 1999
193 2001 0 907363 1999
193 2002 0 842696 1999
193 2003 . 730017 1999
Code:
nfid year treatment pairs(_id) treat_year
188 1998 . 1900139 1999
188 1999 1 117730 1999
188 2000 . 916699 1999
188 2001 . 1148184 1999
188 2002 . 1174944 1999
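To put a number on the gap between the two listings, a check along these lines could count, for each firm, the years in which it was actually used in a match. This is only a sketch: it assumes that matched holds the _weight copied inside the loop (missing when the observation was not used), and n_years_matched and firm_tag are hypothetical names.

```stata
* number of years in which each firm has a nonmissing match weight
bysort nfid: egen n_years_matched = total(matched < .)
* flag one observation per firm so each firm is counted once
egen byte firm_tag = tag(nfid)
* distribution of "years used in a match" across firms
tabulate n_years_matched if firm_tag
* treatment status among matched observations, year by year
tabulate year treatment if matched < ., missing
```

If most firms show a count of 0 or 1 here, that would suggest the repeated 0s in treatment reflect untreated firms being part of each year's estimation sample, not firms being matched every year.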