Dear all,
This is my first ever post here, so please bear with me...
I have been looking for a solution to the following problem for quite a while, and I think I have come up with one. I want to do 1:n propensity score matching (with n flexible up to a certain maximum) without replacement. However, psmatch2 only allows 1:n matching with replacement. Note that this is not meant as a discussion of the advantages or disadvantages of either method.
I've combined advice on similar topics from a number of users in the do-file below. It's a bit clunky, but I think it does what I want it to do:
- 1:3 (in this example) propensity score matching on a previously predicted propensity score [pscore], without replacement
- The output mirrors that of psmatch2, so pstest or similar can be used
- Matches are identifiable through the variable [pair], allowing for conditional logistic regression or other analysis
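For the last point, a minimal sketch of a conditional logistic regression on the matched sets might look like the lines below. It assumes a binary outcome variable, here hypothetically called outcome, and that the do-file further down has already been run so that [pair] and _treated exist:

```stata
* outcome is a hypothetical binary outcome variable; replace it with your own
* unmatched observations have pair==. and are excluded explicitly
clogit outcome _treated if !missing(pair), group(pair)
```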
I was wondering if people could try this on their own data to check that I haven't made a mess of it?
Also, if this does work, then maybe it can help somebody in a similar situation...
I am looking forward to your feedback, please be kind.
Johannes
**** trial of 1:3 matching without replacement, using repeated 1:1 matching with psmatch2 ****
*** run your own prediction model, save your propensity score under variable [pscore]
* create copy of propensity score
sum pscore
gen pscore_original=pscore
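* optional sketch, not part of the original do-file: derive the 0.20*SD caliper from the data instead of typing it in
* (the local macro name cal is an assumption; to use it, replace 0.024 with `cal' in each psmatch2 call and run the do-file in one go)
quietly summarize pscore_original
local cal = 0.2*r(sd)
display "caliper (0.20*SD of pscore) = " `cal'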
* random order
set seed 1000
gen x=uniform()
sort x
*** Round 1
** nearest neighbour 1:1 matching with caliper 0.20*SD, adjust for your own data from 'sum pscore' results above
psmatch2 [your intervention], pscore(pscore) caliper (0.024) noreplacement descending
** remove matched controls by changing propensity score to 91 (future rounds will be 92, 93 etc)
replace pscore=91 if _treated==0 & _weight==1
** keep ID of matched control by generating new n1 and ID variable (new variable without underscore so it doesn't get overwritten)
gen n1=_n1
gen id=_id
** generate paired ID for later analysis
gen pair = _id if pscore==91
replace pair = _n1 if _treated==1
bysort pair: egen paircount = count(pair)
replace pair=. if paircount!=2
drop paircount
*** Round 2
sort x
** nearest neighbour 1:1 matching with caliper 0.20*SD
psmatch2 [your intervention], pscore(pscore) caliper (0.024) noreplacement descending
** remove matched controls by changing propensity score to 92
replace pscore=92 if _treated==0 & _weight==1
** keep ID of matched control by generating new _n2 variable
gen _n2=_n1
gen pair2 = _id if pscore==92
replace pair2 = _n2 if _treated==1
gsort pair2 _treated
replace pair=pair[_n+1] if pair==. & pair2!=.
bysort pair: egen paircount = count(pair)
drop pair2 paircount
*** Round 3
sort x
** nearest neighbour 1:1 matching with caliper 0.20*SD
psmatch2 [your intervention], pscore(pscore) caliper (0.024) noreplacement descending
** remove matched controls by changing propensity score to 93
replace pscore=93 if _treated==0 & _weight==1
** keep ID of matched control by generating new _n3 variable
gen _n3=_n1
gen pair2 = _id if pscore==93
replace pair2 = _n3 if _treated==1
gsort pair2 _treated
replace pair=pair[_n+1] if pair==. & pair2!=.
bysort pair: egen paircount = count(pair)
drop pair2
**** Tidy up and recreate psmatch2 1:3 output
*create 1:3 match descriptor for all matched
gen one_to_n=(paircount-1)
replace one_to_n=. if one_to_n==-1
drop paircount
sort _treated
by _treated: tab one_to_n
* reconstruct matches to original ID numbers
gsort pair pscore
replace _n2=id[_n+2] if pscore[_n+2]==92 & _n2!=.
replace _n3=id[_n+3] if pscore[_n+3]==93 & _n3!=.
drop _n1
rename n1 _n1
** recreate output from 1:n matching with psmatch2
replace _id=id
replace _weight=1 if _treated==1 & _n1!=.
replace _weight=1 if _treated==0 & one_to_n==1
replace _weight=0.5 if _treated==0 & one_to_n==2
replace _weight=1/3 if _treated==0 & one_to_n==3
replace _nn=0 if _treated==0
replace _nn=0 if _treated==1 & _n1==.
replace _nn=1 if _treated==1 & _n1!=. & _n2==.
replace _nn=2 if _treated==1 & _n1!=. & _n2!=. & _n3==.
replace _nn=3 if _treated==1 & _n1!=. & _n2!=. & _n3!=.
replace _support=1 if _treated==1 & _weight==1
replace _pscore=pscore_original
order _pscore _treated _support _weight _id _n1 _n2 _n3 _nn , after (x)
sort pair
**** check if this worked by using pstest
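One way to check the result before running pstest is to assert a few properties of the matched sets. This is only a sketch: it assumes the do-file above has been run in full, that 0.024 is the caliper used in every round, and the variable names n_treated, n_control, ps_treated and ps_diff are made up for illustration.

```stata
* 1) each matched set contains exactly one treated observation
bysort pair: egen n_treated = total(_treated == 1) if !missing(pair)
assert n_treated == 1 if !missing(pair)

* 2) each matched set contains between one and three controls
bysort pair: egen n_control = total(_treated == 0) if !missing(pair)
assert inrange(n_control, 1, 3) if !missing(pair)

* 3) every control lies within the caliper of its treated observation
*    (pscore_original is used because pscore was overwritten with 91, 92, 93)
bysort pair: egen ps_treated = max(cond(_treated == 1, pscore_original, .)) if !missing(pair)
gen ps_diff = abs(pscore_original - ps_treated) if !missing(pair)
* small tolerance for float precision
assert ps_diff <= 0.024 + 1e-6 if !missing(pair)

drop n_treated n_control ps_treated ps_diff
```

If any assert fails, Stata stops and reports how many observations contradict the condition, which should point to the round in which the pairing went wrong.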