Using difference-in-differences for a propensity score matched sample with caliper matching

Maite Jander

Join Date: May 2024
Posts: 2


13 May 2024, 04:52

Dear Statalists!

I want to run a difference-in-differences analysis in Stata to analyze the impact of the introduction of a board gender policy on targeted firms. Targeted firms are those included in a particular index (captured by the dummy variable inclindex). In my main analyses, 2018 is the pre-treatment period and 2020 the post-treatment period.
I have pooled cross-sectional data containing board gender information on firms from 2008 to 2020.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double CompanyID float year double genderratio float(inclindex inclindex2020 av_genratio_bysic) byte sic_first_two_digits_numeric
   1383 2008 .8890000000000001 1 0   .86325  1
   6727 2008              .875 1 0   .86325  1
  21032 2008 .8000000000000002 1 0   .86325  1
  30201 2008 .8890000000000001 1 0   .86325  1
   5507 2008 .8570000000000001 1 0     .857  2
 855753 2008                 1 0 0        1  7
  32407 2008                 1 1 0        1  7
  29534 2008                 1 0 0        1  7
 928025 2008                 1 0 0 .9632609 10
  14555 2008                 1 1 0 .9632609 10
 896926 2008                 1 0 0 .9632609 10
1040083 2008                 1 0 0 .9632609 10
   7223 2008                 1 1 0 .9632609 10
  19019 2008                .5 0 0 .9632609 10
 641563 2008                 1 0 0 .9632609 10
  22069 2008 .8330000000000001 1 0 .9632609 10
 930452 2008                 1 0 0 .9632609 10
  13516 2008                 1 0 0 .9632609 10
 912623 2008                 1 0 0 .9632609 10
  24744 2008                 1 0 0 .9632609 10
1027165 2008                 1 0 0 .9632609 10
  29267 2008 .8890000000000001 1 0 .9632609 10
1235823 2008                 1 0 0 .9632609 10
  28009 2008                 1 0 0 .9632609 10
1019871 2008                 1 0 0 .9632609 10
  12493 2008 .9329999999999999 1 0 .9632609 10
   3324 2008                 1 0 0 .9632609 10
 917665 2008                 1 1 0 .9632609 10
  29256 2008                 1 1 0 .9632609 10
  20764 2008                 1 0 0 .9632609 10
  32799 2008                 1 0 0 .9632609 10
 604915 2008                 1 1 0 .9476666 12
  21455 2008                 1 1 0 .9476666 12
 754792 2008                 1 1 0 .9476666 12
1005041 2008                 1 1 0 .9476666 12
 598387 2008 .8890000000000001 1 0 .9476666 12
 914699 2008                 1 0 0 .9476666 12
   1467 2008 .8330000000000001 0 0 .9476666 12
  33380 2008                 1 1 0 .9476666 12
  33052 2008                 1 1 0 .9476666 12
  19772 2008 .9000000000000001 1 0 .9476666 12
 746499 2008               .75 1 0 .9476666 12
 566406 2008                 1 1 0 .9476666 12
  86176 2008                 1 1 0 .9623077 13
  22049 2008 .8329999999999997 1 0 .9623077 13
   2138 2008 .8000000000000002 1 0 .9623077 13
  10667 2008                 1 1 0 .9623077 13
 637778 2008                 1 1 0 .9623077 13
  24502 2008                 1 1 0 .9623077 13
  12696 2008                 1 1 0 .9623077 13
 665249 2008                 1 0 0 .9623077 13
  13388 2008 .9000000000000001 1 0 .9623077 13
  13132 2008                 1 1 0 .9623077 13
  20318 2008                 1 1 0 .9623077 13
   9002 2008                 1 1 0 .9623077 13
  10101 2008                 1 0 0 .9623077 13
  23730 2008                 1 1 0 .9623077 13
 467702 2008                 1 1 0 .9623077 13
  29605 2008                 1 1 0 .9623077 13
  24355 2008              .769 1 0 .9623077 13
1042184 2008                 1 0 0 .9623077 13
  20703 2008                 1 0 0 .9623077 13
  22312 2008                 1 1 0 .9623077 13
 482852 2008 .9000000000000001 0 0 .9623077 13
1025180 2008                 1 0 0 .9623077 13
   9634 2008                 1 1 0 .9623077 13
  11769 2008 .8330000000000001 0 0 .9623077 13
  19977 2008 .8570000000000001 1 0 .9623077 13
  82989 2008                 1 1 0 .9623077 13
  26667 2008 .8890000000000001 1 0 .9623077 13
1067306 2008 .8570000000000001 1 0 .9623077 13
  17077 2008               .75 0 0 .9623077 13
  22899 2008 .9170000000000004 1 0 .9623077 13
   2307 2008 .9169999999999999 1 0 .9623077 13
 827641 2008                 1 1 0 .9623077 13
 925876 2008 .8570000000000001 0 0 .9623077 13
   5506 2008                 1 1 0 .9623077 13
 644340 2008                 1 0 0 .9623077 13
  25756 2008                 1 1 0 .9623077 13
  23716 2008 .8000000000000002 0 0 .9623077 13
   2993 2008 .8330000000000001 1 0 .9623077 13
  11364 2008                 1 1 0 .9623077 13
 938013 2008 .8570000000000001 0 0 .9623077 13
    531 2008                 1 1 0 .9623077 13
 917978 2008                 1 0 0 .9623077 13
  14212 2008 .9000000000000001 1 0 .9623077 13
 141137 2008                 1 1 0 .9623077 13
   3642 2008                 1 1 0 .9623077 13
   2520 2008                 1 1 0 .9623077 13
 780575 2008                 1 1 0 .9623077 13
   9103 2008 .8890000000000001 1 0 .9623077 13
1006743 2008                 1 0 0 .9623077 13
  32067 2008 .9169999999999999 1 0 .9623077 13
  24266 2008 .8890000000000001 1 0 .9623077 13
   6592 2008              .909 1 0 .9623077 13
 630809 2008                 1 0 0 .9623077 13
  23784 2008                 1 1 0 .9623077 13
   1857 2008                 1 0 0 .9623077 13
1025988 2008                 1 0 0 .9623077 13
  10762 2008                 1 0 0 .9623077 13
end

My approach:
(1) Use propensity score matching (the psmatch2 command) to form a matched control sample, with the industry-average genderratio as the covariate:

Code:

 psmatch2 inclindex av_genratio_bysic, out(genderratio) radius caliper(0.01)

I then use pstest to assess the quality of the match:

Code:

 pstest av_genratio_bysic

The psmatch2 command generates the following new variables: _pscore _treated _support _weight _genderratio
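A quick way to see which observations psmatch2 counted as matched is to tabulate the variables it creates. This is a minimal sketch that assumes the psmatch2 call above has just been run on the data in memory; how unmatched observations are coded (for example, whether their _weight is missing) is an assumption worth checking against help psmatch2.

```stata
* Run directly after psmatch2; uses only the variables it creates

* Treated/control status against common-support status
tabulate _treated _support, missing

* Propensity score and matching weight within each group
bysort _treated: summarize _pscore _weight

* Observations that received no matching weight (assumed to be unmatched)
count if missing(_weight)
```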

(2) Test for parallel trends with didregress
Now I would like to test for parallel trends before 2018 using the matched sample from (1). However, I am unsure how to use the new variables to form the matched treatment and control groups.

For an unmatched control sample, I would proceed as follows:

Code:

 didregress (genderratio c.sic_first_two_digits_numeric c.numberdirectors) (inclindex2020), group(inclindex) time(year)

Visually inspecting parallel trends with

Code:

 estat trendplots

Formally testing parallel trends with

Code:

 estat ptrends

But how do I need to adjust the code above to use the propensity-score-matched sample?

For regress, I found the guide https://www.ssc.wisc.edu/sscc/pubs/stata_psmatch.htm, which suggests using frequency weights for nearest-neighbour matched samples, as in the code below:

Code:

psmatch2 t x1 x2, out(y) logit
reg y x1 x2 t [fweight=_weight]

Simply applying this to didregress, as in

Code:

 didregress (genderratio c.sic_first_two_digits_numeric c.numberdirectors) (inclindex2020), group(inclindex) time(year) [fweight=_weight]

does not work: frequency weights must be integers, and the _weight values from caliper matching are not. The code returns the error "option [fweight=_weight] not allowed r(198)".
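Two separate things are going on here. In Stata, a weight expression belongs between the variable specification and the comma; written after the comma, as above, it is parsed as an option, which is what the r(198) message reports. Independently of placement, fweights must be whole numbers, so it is worth counting how many matching weights are fractional. A minimal sketch, assuming _weight from the caliper psmatch2 run is in memory:

```stata
* fweights must be whole numbers: count matched rows with fractional weights
count if !missing(_weight) & _weight != floor(_weight)

* If that count is positive, Stata will reject [fweight=_weight] even when
* the weight is placed correctly, i.e. before the comma:
*   didregress (...) (inclindex2020) [fweight=_weight], group(inclindex) time(year)
```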

After checking help weight, help psmatch2, and help didregress, I still could not work out how to use the results from caliper PSM in a diff-in-diff analysis.

Do you have any advice on how to derive the treatment and control groups after caliper matching with psmatch2, and on how to include the matched control group in my didregress code?
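One possible way to turn the matching into fixed treatment and control groups for a DiD is to match firms once, in the pre-treatment year, and carry each firm's weight to all of its years. The sketch below is a hypothetical variant, not the approach from #1: it assumes one row per CompanyID and year, uses 2018 as the matching year, and introduces a new variable, w2018, purely for illustration (psmatch2 must be installed from SSC).

```stata
* Hypothetical variant: estimate the match on the 2018 cross-section only
preserve
keep if year == 2018
psmatch2 inclindex av_genratio_bysic, out(genderratio) radius caliper(0.01)
keep CompanyID _weight
rename _weight w2018                  // firm-level matching weight
tempfile weights2018
save `weights2018'
restore

* Attach each firm's 2018 weight to all of that firm's years
merge m:1 CompanyID using `weights2018', keep(master match) nogenerate

* Rows without a 2018 weight (unmatched firms) drop out of the weighted DiD
didregress (genderratio c.sic_first_two_digits_numeric c.numberdirectors) ///
    (inclindex2020) [aweight=w2018], group(inclindex) time(year)
```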

Thanks in advance - your help is highly appreciated!

(I am using Stata 18)


George Ford

Join Date: Aug 2014

Posts: 3146
#2

13 May 2024, 08:00

try aweight
Maite Jander

Join Date: May 2024

Posts: 2
#3

14 May 2024, 06:43

Thanks a lot! Using aweight and slightly adjusting the syntax (placing the weight before the comma) worked for me!

For those interested, I used the following code

Code:

didregress (genderratio c.sic_first_two_digits_numeric c.numberdirectors) (inclindex2020) [aweight=_weight], group(inclindex) time(year)
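With the weighted model estimated, the parallel-trends checks from the original question can be run on it unchanged. A minimal sketch, assuming the aweighted didregress above is the most recent estimation; rows with a missing _weight should be excluded from the estimation sample, and e(sample) lets you verify that.

```stata
* Postestimation on the aweighted (matched) DiD model
estat trendplots                      // visual check of pre-treatment trends
estat ptrends                         // formal parallel-trends test

* Confirm that unmatched rows (missing _weight) were left out
count if e(sample) & missing(_weight)
```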

I also used the dtable command to compare the matched control sample with the treatment sample on a set of (arbitrary) characteristics, using the following code:

Code:

dtable noquals genderratio networksize TimeBrd TimeInCo [aweight = _weight], by(inclindex)

In the output, the number of observations N is the same for both groups, treatment and control. Before matching, the control group consists of 11,043 observations and the treatment group of 25,788 observations. After caliper matching as described in #1, the control group and the treatment group each consist of 25,788 observations.
This surprises me: I would have expected this for one-to-one matching, but not for caliper matching. I am not sure whether this is how it is supposed to be, and I feel I am missing the intuition behind this result. As a sensitivity check, I changed the caliper size, but the number of observations per group stays the same.

Is it supposed to be like this? If yes, what is the underlying rationale?
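To see where the equal N comes from, it may help to compare, within each group, the raw number of matched rows with the sum of their weights. This diagnostic sketch rests on two assumptions worth confirming in help psmatch2 and help dtable: (a) psmatch2's radius matching gives each matched treated row weight 1 and spreads a total weight of 1 across that row's controls, so the control weights sum to the number of matched treated rows; and (b) under aweights, dtable reports that weighted total rather than the raw row count.

```stata
* Matched rows only: raw count (n) versus total weight (sum), by group
tabstat _weight if !missing(_weight), by(inclindex) statistics(n sum min max)

* The same groups without weights: raw number of matched rows per group
dtable genderratio if !missing(_weight), by(inclindex)
```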

Thanks for shedding some light on this matter!