  • #16
    I do not think you should evaluate your matched samples solely on the basis of one pair of graphs, but this pair of graphs does suggest that the balance is worse after matching. That is not so surprising. There is no guarantee that matching improves balance. As Sekhon (2011) writes:

    A significant shortcoming of common matching methods such as Mahalanobis distance and propensity score matching is that they may (and in practice, frequently do) make balance worse across measured potential confounders. These methods may make balance worse, in practice, even if covariates are distributed ellipsoidally because in a given finite sample there may be departures from an ellipsoidal distribution.
    For an example where matching worsens balance, see p. 11.

    Sekhon, J. 2011. Matching: Multivariate Matching with Automated Balance Optimization in R. Journal of Statistical Software 42(7):1-52.
    David Radwin
    Senior Researcher, California Competes
    californiacompetes.org
    Pronouns: He/Him



    • #17

      Dear Sir,
      I sincerely appreciate your time in assisting me with this. After going back over your details, I tried to run the nearest neighbor matching, and after auto-generating _n1 and _id, I ran the code provided in post #7 above and got the graph below, which looks better now.
      But my question now is: if I had to run kernel matching, what would I replace _n1 and _id with in the code of post #7 to get the graph? Or is this only possible with nearest neighbor matching?
      Thank you so much David.
      Alhassane.
      [Attached image: Screen Shot 2016-03-19 at 9.47.02 PM.png]
      Last edited by Alhassane Bah; 19 Mar 2016, 08:04.



      • #18
        There is no nearest neighbor, and therefore no variables to indicate nearest neighbor, if you don't use nearest neighbor matching.

        If you want to reproduce Richard Hofler's graph after kernel matching, you can weight the results instead. This example does so and makes two additional changes: it saves the graphs to memory rather than to disk, and it uses Vince Wiggins's grc1leg program that I mentioned earlier.

        Code:
        sysuse auto, clear
        psmatch2 foreign mpg, out(price) kernel
        
        * before
        twoway (kdensity _pscore if _treated==1) (kdensity _pscore if _treated==0, ///
        lpattern(dash)), legend( label( 1 "treated") label( 2 "control" ) ) ///
        xtitle("propensity scores BEFORE matching") name(before, replace)
        
        * after
        twoway (kdensity _pscore if _treated==1 [aweight=_weight]) ///
        (kdensity _pscore if _treated==0 [aweight=_weight] ///
        , lpattern(dash)), legend( label( 1 "treated") label( 2 "control" )) ///
        xtitle("propensity scores AFTER matching") name(after, replace)
        
        * combined
        grc1leg before after, ycommon
        [Attached image: kernel.png]
        David Radwin
        Senior Researcher, California Competes
        californiacompetes.org
        Pronouns: He/Him



        • #19
          Thank you so much Sir.



          • #20
            I have problems drawing propensity score distribution graph showing region of common support for treated and non-treated group. What command can I use to do this? the graph looks like the one attached
            Attached Files



            • #21
              Edit: oh, I didn't see the second page, so just ignore my post.

              Alhassane,

              what is the code you're using to produce these graphs? Most likely, you cannot simply change the titles of the graphs to get the result/improvement you want.

              The simplest way is probably to use the -pstest- command.

              Code:
              sysuse auto, clear
              psmatch2 foreign mpg, out(price) kernel
              pstest _pscore, density both
              In this case, -pstest- knows what to do depending on the matching procedure used (nearest-neighbor, kernel, radius, etc.). Matching methods like kernel matching re-weight the initial propensity score to obtain a matched sample. In contrast, nearest-neighbor matching uses the non-weighted propensity score but drops the observations for which no matched counterpart exists.

              What -pstest- does in my example above is essentially to create one (or, since the both option is specified, two) -twoway kdensity- graphs using weights (obtained from the matching). Maybe the source code of the command gives you a better indication of what is done (note the aweights used - [aw=`mweight'] ):

              Code:
              twoway ///
              (kdensity `varlist' if `touse' & `treated'==1 [aw=`mweight'], clwid(thick)) ///
              (kdensity `varlist' if `touse' & `treated'==0 [aw=`mweight'], clwid(thin) clcolor(black)), ///
              xlab(#6) xti("") yti("") title("`Ytitle'") ///
              subtitle("Matched samples") legend(order(1 "Treated" 2 "Untreated")) graphregion(color(gs16)) `options'
              Of course, it is possible that the matching procedure does more harm than good. If that happens, you should reconsider the model used to estimate the propensity score and/or the chosen matching method (they all have advantages and disadvantages).
              Last edited by Sebastian Geiger; 01 Jun 2016, 19:57.



              • #22
                Ronald,

                maybe try something like this

                Code:
                sysuse auto, clear
                psmatch2 foreign mpg, out(price)
                
                sum _pscore if _treated==1
                local minsupport = r(min)
                sum _pscore if _treated==0
                local maxsupport = r(max)
                dis "`minsupport'"
                dis "`maxsupport'"
                
                gen match=_n1
                replace match=_id if match==.
                duplicates tag match, gen(dup)
                
                twoway (kdensity _pscore if _treated==1) (kdensity _pscore if _treated==0 ///
                , lpattern(dash)), legend( label( 1 "treated") label( 2 "control" )) ///
                xtitle("propensity scores AFTER matching") saving(after, replace) ///
                xline(`minsupport') xline(`maxsupport')
                Probably, it won't look as nice with "real" data as it does in your drawn picture.
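                As an aside: the common-support region is often defined as the interval where both groups have support, i.e. from the larger of the two group minimums to the smaller of the two group maximums. A sketch under that definition (the local macro names are just illustrative):

                Code:
                sum _pscore if _treated==1
                local mint = r(min)
                local maxt = r(max)
                sum _pscore if _treated==0
                local minc = r(min)
                local maxc = r(max)
                local lower = max(`mint', `minc')
                local upper = min(`maxt', `maxc')
                dis "common support: [`lower', `upper']"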
                Last edited by Sebastian Geiger; 01 Jun 2016, 20:17.



                • #23
                  Thank you all for posting in this thread. Building on Richard Hofler's syntax from April 2014, re-pasted below, does anyone know how to alter the syntax when doing a 1-to-3 nearest neighbor match as opposed to a 1-to-1 match? I therefore have _n1, _n2, and _n3 to work with. I would rather not plot 3 separate lines for each of these compared to my treated group, but rather a single line for all matched controls, if possible.

                  // compare _pscores before matching & save graph to disk
                  twoway (kdensity _pscore if _treated==1) (kdensity _pscore if _treated==0, ///
                  lpattern(dash)), legend( label( 1 "treated") label( 2 "control" ) ) ///
                  xtitle("propensity scores BEFORE matching") saving(before, replace)

                  // compare _pscores *after* matching & save graph to disk
                  gen match=_n1
                  replace match=_id if match==.
                  duplicates tag match, gen(dup)
                  twoway (kdensity _pscore if _treated==1) (kdensity _pscore if _treated==0 ///
                  & dup>0, lpattern(dash)), legend( label( 1 "treated") label( 2 "control" )) ///
                  xtitle("propensity scores AFTER matching") saving(after, replace)

                  // combine these two graphs that were saved to disk
                  // put both graphs on y axes with common scales
                  graph combine before.gph after.gph, ycommon

                  Thanks in advance,
                  Julie Lima
                  Brown University



                  • #24
                    Julie Lima, I realize this is somewhat late, but here goes in case others need it.
                    You need to generate matches for _n1, _n2, and _n3 (the same way as described above, but change the names).
                    Then you need to run the duplicates command on all three matches, and finally generate a combined dup.

                    Like this:

                    Code:
                    gen matchn1=_n1
                    replace matchn1=_id if matchn1==.
                    
                    gen matchn2=_n2
                    replace matchn2=_id if matchn2==.
                    
                    gen matchn3=_n3
                    replace matchn3=_id if matchn3==.
                    
                    duplicates tag matchn1, gen(dupn1)
                    duplicates tag matchn2, gen(dupn2)
                    duplicates tag matchn3, gen(dupn3)
                    
                    gen dup=dupn1+dupn2+dupn3
                    Then use the code from above:

                    Code:
                    twoway (kdensity _pscore if _treated==1) (kdensity _pscore if _treated==0 ///
                    & dup>0, lpattern(dash)), legend( label( 1 "treated") label( 2 "control" )) ///
                    xtitle("propensity scores AFTER matching") saving(after, replace)
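
                    An alternative sketch, not from the original code: psmatch2 also writes a _weight variable, so after running it with the neighbor(3) option you could weight the control density instead of building the dup flags, along the lines of post #18 (check your psmatch2 version's help file for these options):

                    Code:
                    psmatch2 foreign mpg, out(price) neighbor(3)
                    twoway (kdensity _pscore if _treated==1) ///
                    (kdensity _pscore if _treated==0 [aweight=_weight] ///
                    , lpattern(dash)), legend( label( 1 "treated") label( 2 "control" )) ///
                    xtitle("propensity scores AFTER matching") saving(after, replace)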



                    • #25
                      Hi all. A quick question: when I used the syntax David Radwin kindly suggested, I got the figure below. What does this figure indicate? What does it mean that the propensity scores are never larger than 0.4?

                      CS.gph

                      Thanks in advance.



                      • #26
                        Re-posting my question from #25 with the figure re-attached: what does this figure indicate, and what does it mean that the propensity scores are never larger than 0.4?

                        CS.gph

                        [Attached image: CS.jpg]
                        Thanks in advance.
                        Last edited by amira elshal; 26 Feb 2019, 11:15.



                        • #27
                          The point of graphs like these is to visually inspect and show the closeness of the two groups and the overlap between them, before and after matching.

                          It does look like the propensity scores are never larger than 0.4 before or after matching, but a better way to ascertain this is something like
                          Code:
                          summarize _pscore
                          possibly limiting the analysis to the matched samples, then checking that the maximum value is less than 0.4.
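                          For example, a sketch restricting the check to the matched sample (this assumes psmatch2's convention that _weight is missing for unmatched observations; see the psmatch2 help file):

                          Code:
                          summarize _pscore if _weight < .
                          dis "maximum matched propensity score: " r(max)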
                          David Radwin
                          Senior Researcher, California Competes
                          californiacompetes.org
                          Pronouns: He/Him



                          • #28
                            Thanks for the prompt reply. I double-checked; yes, the propensity scores are never higher than 0.35. Is that okay? I understand that the propensity score is the probability of treatment. I am studying a health sector reform.



                            • #29
                              I don't think there is any reason why a lower maximum propensity score is better than a higher one. (Of course, by construction, they are always between 0 and 1.)

                              The point of matching is to get the propensity scores (and other statistics) of the treated and control groups to be as similar as possible (in other words, to be balanced) and to overlap. Regarding overlap, you do not want the treatment group to have a much higher maximum propensity score than the control group, or vice versa, after matching.
                              David Radwin
                              Senior Researcher, California Competes
                              californiacompetes.org
                              Pronouns: He/Him



                              • #30
                                So, given the figure I posted, is this common support or overlap area acceptable? Also, the figure shows that the control group has a much higher max p score than the treatment group, right? Thanks.
