Dear Statalists!
I want to run a difference-in-differences analysis in Stata to estimate the impact of introducing a board gender policy on targeted firms. Targeted firms are those included in a particular index (captured by the dummy variable inclindex), with 2018 as the pre-treatment period and 2020 as the post-treatment period in my main analyses.
I have pooled cross-sectional data containing board gender information on firms from 2008 to 2020.
My approach:
(1) Propensity score matching with the psmatch2 command to form a matched control sample, using the average genderratio by industry (av_genratio_bysic) as the covariate.
I then run pstest to assess the quality of the match.
The psmatch2 command generates the following new variables: _pscore, _treated, _support, _weight, and _genderratio.
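As a rough sketch of how these variables might be used to flag the matched sample: the snippet below assumes psmatch2 (from SSC) has just been run as in step (1), that _support equals 1 for observations on common support, and that _weight is missing for controls that were never used as a match. The variable name matched is hypothetical and introduced only for illustration; please check help psmatch2 before relying on it.

```stata
* Sketch only: assumes psmatch2 has just been run, so that
* _support and _weight exist in the data in memory.
* "matched" is a hypothetical variable name, not created by psmatch2.
generate byte matched = (_support == 1) & !missing(_weight)

* Count treated and control observations inside and outside the matched sample
tabulate inclindex matched

* Look at the (possibly non-integer) weights of the matched controls
summarize _weight if matched & inclindex == 0, detail
```

Because the data are pooled over 2008 to 2020, this flags matched firm-year observations rather than matched firms, which may or may not be what is intended.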
(2) Test for parallel trends with didregress
Now I would like to test for parallel trends prior to 2018 using the matched sample from (1). However, I am unsure how to use the new variables to form the matched treatment and control groups.
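For a non-matched control sample, I would first fit the model with didregress (treatment indicator inclindex2020, group variable inclindex, time variable year), and then:
Inspect parallel trends visually using estat trendplots
Test for parallel trends using estat ptrends
How do I need to adjust this code to use the propensity-score-matched sample?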
For regress, I found the following page, https://www.ssc.wisc.edu/sscc/pubs/stata_psmatch.htm, which suggests using frequency weights for nearest-neighbour matched samples, as in the code reported below.
Simply appending [fweight=_weight] to my didregress command does not work: frequency weights require integer values, which the _weight values from caliper matching are not, and the command fails with the error "option [fweight=_weight] not allowed r(198)".
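One thing that may be worth checking, offered as a sketch rather than a verified solution: r(198) is Stata's syntax error code, and in standard Stata syntax the weight clause comes before the comma, so a weight written after the options is parsed as an option. The snippet below moves the weight before the comma and swaps fweight for pweight. It assumes that didregress accepts pweights (please confirm in help didregress) and that the non-missing _weight values from psmatch2 are suitable as probability weights, which is itself a modelling assumption.

```stata
* Sketch only, under the assumptions stated above.
* The /// line continuations work in a do-file, not in the Command window.
* Restrict to observations on common support that received a matching weight;
* the weight clause goes BEFORE the comma, options go after it.
didregress (genderratio c.sic_first_two_digits_numeric c.numberdirectors) ///
    (inclindex2020) if _support == 1 & !missing(_weight)                  ///
    [pweight = _weight], group(inclindex) time(year)

* Same postestimation checks as for the unmatched sample
estat trendplots
estat ptrends
```

Whether weighting, restricting the sample with an if condition, or both is appropriate depends on how the matched sample is meant to be defined, so this should be read as a starting point only.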
Having checked help weight, help psmatch2, and help didregress, I still could not work out how to use the results from caliper PSM in a diff-in-diff analysis.
Do you have any advice on how to derive the treatment and control group after using psmatch2 with caliper matching and how I can include the matched control group in my didregress code?
Thanks in advance - your help is highly appreciated!
(I am using Stata 18)
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input double CompanyID float year double genderratio float(inclindex inclindex2020 av_genratio_bysic) byte sic_first_two_digits_numeric
1383 2008 .8890000000000001 1 0 .86325 1
6727 2008 .875 1 0 .86325 1
21032 2008 .8000000000000002 1 0 .86325 1
30201 2008 .8890000000000001 1 0 .86325 1
5507 2008 .8570000000000001 1 0 .857 2
855753 2008 1 0 0 1 7
32407 2008 1 1 0 1 7
29534 2008 1 0 0 1 7
928025 2008 1 0 0 .9632609 10
14555 2008 1 1 0 .9632609 10
896926 2008 1 0 0 .9632609 10
1040083 2008 1 0 0 .9632609 10
7223 2008 1 1 0 .9632609 10
19019 2008 .5 0 0 .9632609 10
641563 2008 1 0 0 .9632609 10
22069 2008 .8330000000000001 1 0 .9632609 10
930452 2008 1 0 0 .9632609 10
13516 2008 1 0 0 .9632609 10
912623 2008 1 0 0 .9632609 10
24744 2008 1 0 0 .9632609 10
1027165 2008 1 0 0 .9632609 10
29267 2008 .8890000000000001 1 0 .9632609 10
1235823 2008 1 0 0 .9632609 10
28009 2008 1 0 0 .9632609 10
1019871 2008 1 0 0 .9632609 10
12493 2008 .9329999999999999 1 0 .9632609 10
3324 2008 1 0 0 .9632609 10
917665 2008 1 1 0 .9632609 10
29256 2008 1 1 0 .9632609 10
20764 2008 1 0 0 .9632609 10
32799 2008 1 0 0 .9632609 10
604915 2008 1 1 0 .9476666 12
21455 2008 1 1 0 .9476666 12
754792 2008 1 1 0 .9476666 12
1005041 2008 1 1 0 .9476666 12
598387 2008 .8890000000000001 1 0 .9476666 12
914699 2008 1 0 0 .9476666 12
1467 2008 .8330000000000001 0 0 .9476666 12
33380 2008 1 1 0 .9476666 12
33052 2008 1 1 0 .9476666 12
19772 2008 .9000000000000001 1 0 .9476666 12
746499 2008 .75 1 0 .9476666 12
566406 2008 1 1 0 .9476666 12
86176 2008 1 1 0 .9623077 13
22049 2008 .8329999999999997 1 0 .9623077 13
2138 2008 .8000000000000002 1 0 .9623077 13
10667 2008 1 1 0 .9623077 13
637778 2008 1 1 0 .9623077 13
24502 2008 1 1 0 .9623077 13
12696 2008 1 1 0 .9623077 13
665249 2008 1 0 0 .9623077 13
13388 2008 .9000000000000001 1 0 .9623077 13
13132 2008 1 1 0 .9623077 13
20318 2008 1 1 0 .9623077 13
9002 2008 1 1 0 .9623077 13
10101 2008 1 0 0 .9623077 13
23730 2008 1 1 0 .9623077 13
467702 2008 1 1 0 .9623077 13
29605 2008 1 1 0 .9623077 13
24355 2008 .769 1 0 .9623077 13
1042184 2008 1 0 0 .9623077 13
20703 2008 1 0 0 .9623077 13
22312 2008 1 1 0 .9623077 13
482852 2008 .9000000000000001 0 0 .9623077 13
1025180 2008 1 0 0 .9623077 13
9634 2008 1 1 0 .9623077 13
11769 2008 .8330000000000001 0 0 .9623077 13
19977 2008 .8570000000000001 1 0 .9623077 13
82989 2008 1 1 0 .9623077 13
26667 2008 .8890000000000001 1 0 .9623077 13
1067306 2008 .8570000000000001 1 0 .9623077 13
17077 2008 .75 0 0 .9623077 13
22899 2008 .9170000000000004 1 0 .9623077 13
2307 2008 .9169999999999999 1 0 .9623077 13
827641 2008 1 1 0 .9623077 13
925876 2008 .8570000000000001 0 0 .9623077 13
5506 2008 1 1 0 .9623077 13
644340 2008 1 0 0 .9623077 13
25756 2008 1 1 0 .9623077 13
23716 2008 .8000000000000002 0 0 .9623077 13
2993 2008 .8330000000000001 1 0 .9623077 13
11364 2008 1 1 0 .9623077 13
938013 2008 .8570000000000001 0 0 .9623077 13
531 2008 1 1 0 .9623077 13
917978 2008 1 0 0 .9623077 13
14212 2008 .9000000000000001 1 0 .9623077 13
141137 2008 1 1 0 .9623077 13
3642 2008 1 1 0 .9623077 13
2520 2008 1 1 0 .9623077 13
780575 2008 1 1 0 .9623077 13
9103 2008 .8890000000000001 1 0 .9623077 13
1006743 2008 1 0 0 .9623077 13
32067 2008 .9169999999999999 1 0 .9623077 13
24266 2008 .8890000000000001 1 0 .9623077 13
6592 2008 .909 1 0 .9623077 13
630809 2008 1 0 0 .9623077 13
23784 2008 1 1 0 .9623077 13
1857 2008 1 0 0 .9623077 13
1025988 2008 1 0 0 .9623077 13
10762 2008 1 0 0 .9623077 13
end
Code:
psmatch2 inclindex av_genratio_bysic, out(genderratio) radius caliper(0.01)
Code:
pstest av_genratio_bysic
Code:
didregress (genderratio c.sic_first_two_digits_numeric c.numberdirectors) (inclindex2020), group(inclindex) time(year)
Code:
estat trendplots
Code:
estat ptrends
Code:
psmatch2 t x1 x2, out(y) logit
reg y x1 x2 t [fweight=_weight]
Code:
didregress (genderratio c.sic_first_two_digits_numeric c.numberdirectors) (inclindex2020), group (inclindex) time (year) [fweight=_weight]