Dear Statalists!
I want to run a difference-in-differences analysis in Stata to estimate the impact of the introduction of a board gender policy on targeted firms. Targeted firms are those included in a particular index (captured by the dummy variable inclindex), with 2018 as the pre-treatment period and 2020 as the post-treatment period for my main analyses.
I have pooled cross-sectional data containing board gender information on firms from 2008 to 2020.
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input double CompanyID float year double genderratio float(inclindex inclindex2020 av_genratio_bysic) byte sic_first_two_digits_numeric
1383 2008 .8890000000000001 1 0 .86325 1
6727 2008 .875 1 0 .86325 1
21032 2008 .8000000000000002 1 0 .86325 1
30201 2008 .8890000000000001 1 0 .86325 1
5507 2008 .8570000000000001 1 0 .857 2
855753 2008 1 0 0 1 7
32407 2008 1 1 0 1 7
29534 2008 1 0 0 1 7
928025 2008 1 0 0 .9632609 10
14555 2008 1 1 0 .9632609 10
896926 2008 1 0 0 .9632609 10
1040083 2008 1 0 0 .9632609 10
7223 2008 1 1 0 .9632609 10
19019 2008 .5 0 0 .9632609 10
641563 2008 1 0 0 .9632609 10
22069 2008 .8330000000000001 1 0 .9632609 10
930452 2008 1 0 0 .9632609 10
13516 2008 1 0 0 .9632609 10
912623 2008 1 0 0 .9632609 10
24744 2008 1 0 0 .9632609 10
1027165 2008 1 0 0 .9632609 10
29267 2008 .8890000000000001 1 0 .9632609 10
1235823 2008 1 0 0 .9632609 10
28009 2008 1 0 0 .9632609 10
1019871 2008 1 0 0 .9632609 10
12493 2008 .9329999999999999 1 0 .9632609 10
3324 2008 1 0 0 .9632609 10
917665 2008 1 1 0 .9632609 10
29256 2008 1 1 0 .9632609 10
20764 2008 1 0 0 .9632609 10
32799 2008 1 0 0 .9632609 10
604915 2008 1 1 0 .9476666 12
21455 2008 1 1 0 .9476666 12
754792 2008 1 1 0 .9476666 12
1005041 2008 1 1 0 .9476666 12
598387 2008 .8890000000000001 1 0 .9476666 12
914699 2008 1 0 0 .9476666 12
1467 2008 .8330000000000001 0 0 .9476666 12
33380 2008 1 1 0 .9476666 12
33052 2008 1 1 0 .9476666 12
19772 2008 .9000000000000001 1 0 .9476666 12
746499 2008 .75 1 0 .9476666 12
566406 2008 1 1 0 .9476666 12
86176 2008 1 1 0 .9623077 13
22049 2008 .8329999999999997 1 0 .9623077 13
2138 2008 .8000000000000002 1 0 .9623077 13
10667 2008 1 1 0 .9623077 13
637778 2008 1 1 0 .9623077 13
24502 2008 1 1 0 .9623077 13
12696 2008 1 1 0 .9623077 13
665249 2008 1 0 0 .9623077 13
13388 2008 .9000000000000001 1 0 .9623077 13
13132 2008 1 1 0 .9623077 13
20318 2008 1 1 0 .9623077 13
9002 2008 1 1 0 .9623077 13
10101 2008 1 0 0 .9623077 13
23730 2008 1 1 0 .9623077 13
467702 2008 1 1 0 .9623077 13
29605 2008 1 1 0 .9623077 13
24355 2008 .769 1 0 .9623077 13
1042184 2008 1 0 0 .9623077 13
20703 2008 1 0 0 .9623077 13
22312 2008 1 1 0 .9623077 13
482852 2008 .9000000000000001 0 0 .9623077 13
1025180 2008 1 0 0 .9623077 13
9634 2008 1 1 0 .9623077 13
11769 2008 .8330000000000001 0 0 .9623077 13
19977 2008 .8570000000000001 1 0 .9623077 13
82989 2008 1 1 0 .9623077 13
26667 2008 .8890000000000001 1 0 .9623077 13
1067306 2008 .8570000000000001 1 0 .9623077 13
17077 2008 .75 0 0 .9623077 13
22899 2008 .9170000000000004 1 0 .9623077 13
2307 2008 .9169999999999999 1 0 .9623077 13
827641 2008 1 1 0 .9623077 13
925876 2008 .8570000000000001 0 0 .9623077 13
5506 2008 1 1 0 .9623077 13
644340 2008 1 0 0 .9623077 13
25756 2008 1 1 0 .9623077 13
23716 2008 .8000000000000002 0 0 .9623077 13
2993 2008 .8330000000000001 1 0 .9623077 13
11364 2008 1 1 0 .9623077 13
938013 2008 .8570000000000001 0 0 .9623077 13
531 2008 1 1 0 .9623077 13
917978 2008 1 0 0 .9623077 13
14212 2008 .9000000000000001 1 0 .9623077 13
141137 2008 1 1 0 .9623077 13
3642 2008 1 1 0 .9623077 13
2520 2008 1 1 0 .9623077 13
780575 2008 1 1 0 .9623077 13
9103 2008 .8890000000000001 1 0 .9623077 13
1006743 2008 1 0 0 .9623077 13
32067 2008 .9169999999999999 1 0 .9623077 13
24266 2008 .8890000000000001 1 0 .9623077 13
6592 2008 .909 1 0 .9623077 13
630809 2008 1 0 0 .9623077 13
23784 2008 1 1 0 .9623077 13
1857 2008 1 0 0 .9623077 13
1025988 2008 1 0 0 .9623077 13
10762 2008 1 0 0 .9623077 13
end
My approach:
(1) Propensity score matching to form a matched control sample, using the psmatch2 command with the industry-average genderratio as a covariate:
Code:
psmatch2 inclindex av_genratio_bysic, out(genderratio) radius caliper(0.01)
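Further, I use pstest to assess the quality of the match:
Code:
pstest av_genratio_bysic
The psmatch2 command generates the following new variables: _pscore _treated _support _weight _genderratio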
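A minimal sketch of how the variables left behind by psmatch2 might be inspected before going further. It assumes the psmatch2 call above has just been run on the data in memory; the indicator name matched is hypothetical and not something psmatch2 creates.

```stata
* Assumes the psmatch2 call above has just been run, so that
* _pscore, _treated, _support and _weight exist in memory.
confirm variable _pscore _treated _support _weight

* How many treated and control observations are on common support?
tabulate _treated _support

* Matching weights of the controls; with radius matching these
* are typically not integers.
summarize _weight if _treated == 0 & !missing(_weight), detail

* Hypothetical indicator (the name "matched" is an assumption):
* on common support and assigned a non-missing matching weight.
generate byte matched = (_support == 1) & !missing(_weight)
tabulate inclindex matched
```

Tabulating inclindex against the indicator shows how many treated and control observations survive the matching, which is worth knowing before restricting any later estimation to them.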
(2) Test for parallel trends with didregress
Now, I would like to test for parallel trends prior to 2018 using the matched sample from (1). However, I am puzzled about how to use the new variables to form the matched treatment and control groups.
For a non-matched control sample, I would proceed as follows:
Code:
didregress (genderratio c.sic_first_two_digits_numeric c.numberdirectors) (inclindex2020), group (inclindex) time (year)
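Visual inspection of parallel trends using
Code:
estat trendplots
Testing of parallel trends using
Code:
estat ptrends
But how do I have to adjust the above code to use the propensity score matched sample?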
For regress, I found the following guide, https://www.ssc.wisc.edu/sscc/pubs/stata_psmatch.htm, which suggests using frequency weights for nearest-neighbour matched samples, as in the code below:
Code:
psmatch2 t x1 x2, out(y) logit
reg y x1 x2 t [fweight=_weight]
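Simply applying this code to didregress like
Code:
didregress (genderratio c.sic_first_two_digits_numeric c.numberdirectors) (inclindex2020), group (inclindex) time (year) [fweight=_weight]
does not work, as it requires integer values for the _weight variable, which is not the case for the weights in caliper matching. Logically, the code results in the error "option [fweight=_weight] not allowed r(198)".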
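For reference, Stata's general command syntax places the weight specification before the comma, so a weight written after the options is parsed as an option, which is consistent with the "option ... not allowed" message. The sketch below moves the weight and restricts the estimation to matched observations. It is an untested sketch under two assumptions: that psmatch2 has already been run, and that didregress accepts pweight for non-integer weights (help didregress lists the weight types allowed).

```stata
* Sketch only: assumes psmatch2 has been run and that didregress accepts
* this weight type (see help didregress); pweight is an assumption here.
* Stata syntax puts [weight] before the comma, not among the options.
didregress (genderratio c.sic_first_two_digits_numeric c.numberdirectors) ///
    (inclindex2020) if _support == 1 & !missing(_weight)                  ///
    [pweight = _weight], group(inclindex) time(year)

* Same post-estimation checks as for the unmatched sample
estat trendplots
estat ptrends
```

The if condition keeps only observations on common support that received a matching weight, while the weight itself carries how heavily each control was used in the match.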
After checking help weight, help psmatch2 and help didregress, I still could not work out how to use the results from caliper PSM in difference-in-differences analyses.
Do you have any advice on how to derive the treatment and control group after using psmatch2 with caliper matching and how I can include the matched control group in my didregress code?
Thanks in advance - your help is highly appreciated!
(I am using Stata 18)
