Dear all,
I am using the teffects psmatch and psmatch2 commands in Stata 13.1. I have managed to generate exactly the same propensity scores with both commands. The frequency distributions of the two weight variables are also identical, but the weights are assigned to completely different observations, even though the observations' propensity scores are exactly the same. This produces completely different regression results. Which results should I trust more? Am I doing something wrong? Or is this the wrong approach to begin with?
Any help/advice would be highly appreciated.
The code below uses the standard cattaneo2 dataset.
Best wishes,
Asjad Naqvi
webuse cattaneo2, clear
ren bweight y
ren mbsmoke t
ren mage x1 // continuous variable
ren prenatal1 x2 // dummy variable
ren mmarried x3 // dummy variable
ren fbaby x4 // dummy variable
ren mrace x5 // dummy variable
keep y t x*
gen id = _n
order id y t x1 x2 x3 x4 x5
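teffects psmatch (y) (t x*), gen(nearn) // nearn1, nearn2, ... hold the observation numbers of the matches
*teffects overlap
predict pscore1 pscore2, ps // propensity score for each treatment level
ren nearn1 nn1 // keep only the first listed match
drop nearn*
// teffects weights: how often each control is the first match of a treated observation
preserve
keep if t
keep nn1
bysort nn1: gen weight = _N
bysort nn1: keep if _n==1
ren nn1 id
save tweights.dta, replace
restore
merge m:1 id using tweights
replace weight=1 if t==1 // treated observations get weight 1
sort id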
psmatch2 t x*, out(y) logit
pstest, both graph
// using psmatch2 weights
pstest x2, both
// using teffects weights
regress x1 y, noheader
regress x1 y [pw = weight], noheader
// comparing outputs
compare weight _weight
compare pscore2 _pscore
tab weight
tab _weight
list id y t x* pscore2 weight _pscore _weight if weight==20 | _weight==20
list id y t x* pscore2 weight _pscore _weight if weight==21 | _weight==21
// weighted regressions
regress y t
regress y t [pw=_weight]
regress y t [pw= weight]
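// possible tie diagnostic: a sketch only, assuming the different weight
// assignments come from controls that share identical propensity scores
// (not confirmed above); n_ties is a hypothetical variable name
bysort t pscore2: gen n_ties = _N // observations sharing this pscore within each treatment group
tab n_ties if t==0 // values above 1 mean tied controls
sort id // restore the original order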