Dear Professors, I am trying to understand propensity scores, and I have created my own dataset to learn how psmatch2 works. I will also be attending a course on propensity scores to understand them further.
https://www.bing.com/search?q=propen...ANAB01&PC=U531
In the meantime, I would be grateful for your help if you are able to answer; I hope I have articulated my questions clearly.
Research question: Is SMOKING associated with increased INFECTION after SURGERY?
The dataset is given below (generated with dataex).
My Statalist questions (there are 4):
Question 1
Can someone please explain what _nn, _n1, _n2 and _n3 mean?
I understand that _id is a number given to every observation, so with 22 observations I will have _id values 1 to 22.
However, the help file says that _nn stores the observation number of the k-th matched control observation (I am matching 3:1).
Does this mean, for example, that rows 11 to 17 of my dataset are the matched data, and that the corresponding _n1, _n2 and _n3 are the closest matched controls for each observation with _treated = 1?
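One way to check this reading is to rebuild the weights from the match variables. The minimal Stata sketch below re-enters only the matching variables from the dataex listing in this post, lists the matched treated observations, and recomputes each control's weight as the number of times its _id appears in _n1, _n2 or _n3, divided by 3. In the posted data, _n1 to _n3 hold the _id values of the three nearest controls (for _id 11 the nearest control is _id 2, and _pdif is the gap between their propensity scores), _nn is 3 for every matched treated observation and 0 for controls, and treated observations 18 to 21 have _support = 0, so they have no matches. The variable w_check and the local macros controls and k are hypothetical names used only for this check.

```stata
* Matching variables copied from the dataex listing in this post
clear
input byte(_id _treated _support) double _weight byte(_n1 _n2 _n3) float _nn
 1 0 1 .3333333333333333 . . . 0
 2 0 1 .3333333333333333 . . . 0
 3 0 1 .6666666666666666 . . . 0
 4 0 1 .3333333333333333 . . . 0
 5 0 1 .3333333333333333 . . . 0
 6 0 1 .3333333333333333 . . . 0
 7 0 1 .6666666666666666 . . . 0
 8 0 1 1.6666666666666665 . . . 0
 9 0 1 1.3333333333333333 . . . 0
10 0 1 1 . . . 0
11 1 1 1 2 1 3 3
12 1 1 1 5 4 3 3
13 1 1 1 8 7 6 3
14 1 1 1 8 9 7 3
15 1 1 1 9 10 8 3
16 1 1 1 9 10 8 3
17 1 1 1 9 10 8 3
18 1 0 . . . . .
19 1 0 . . . . .
20 1 0 . . . . .
21 1 0 . . . . .
22 . . . . . . .
end

* Each matched treated observation with the _id values of its 3 nearest controls
list _id _n1 _n2 _n3 _nn if _treated == 1 & _support == 1, noobs

* Rebuild _weight: 1 for matched treated observations; for each control,
* (number of times its _id appears in _n1-_n3) / 3
generate double w_check = 1 if _treated == 1 & _support == 1
quietly levelsof _id if _treated == 0, local(controls)
foreach c of local controls {
    quietly count if _treated == 1 & inlist(`c', _n1, _n2, _n3)
    local k = r(N)
    if `k' > 0 quietly replace w_check = `k'/3 if _id == `c'
}
list _id _treated _weight w_check, sepby(_treated) noobs
```

For example, control _id 8 is chosen by treated observations 13 to 17, so its weight is 5/3, which is the 1.6666666666666665 in the listing.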
Question 2
To show ethnicity by smoking status before and after matching, I drew histograms (both commands are listed further down in this post).
For the post-matching histogram, which is weighted by _weight to show how well my sample is matched, the following error came up:
may not use noninteger frequency weights
How can I address this problem to create a histogram of the post-matching results for categorical variables?
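histogram accepts only frequency weights, and those must be whole numbers, which is why weighting by _weight fails. In the posted data every _weight is a multiple of 1/3 (neighbor(3)), so 3*_weight is a whole number, and multiplying all weights by the same constant leaves the within-group percentages unchanged. A minimal sketch under that assumption, re-entering ethnicity, _treated and _weight from the dataex listing; w3 and the graph names pre_match and post_match are hypothetical.

```stata
* ethnicity, _treated and _weight copied from the dataex listing in this post
clear
input byte(ethnicity _treated) double _weight
4 0 .3333333333333333
1 0 .3333333333333333
4 0 .6666666666666666
1 0 .3333333333333333
1 0 .3333333333333333
2 0 .3333333333333333
1 0 .6666666666666666
1 0 1.6666666666666665
1 0 1.3333333333333333
3 0 1
1 1 1
3 1 1
2 1 1
1 1 1
2 1 1
2 1 1
1 1 1
1 1 .
1 1 .
3 1 .
4 1 .
4 . .
end
label define Ethnicity 1 "White" 2 "Asian" 3 "Black African" 4 "Mixed"
label values ethnicity Ethnicity

* histogram only takes whole-number frequency weights; every _weight here is a
* multiple of 1/3, so 3*_weight is a whole number (round() removes float error)
generate int w3 = round(3 * _weight) if !missing(_weight)

* Before matching: all observations with a treatment status
histogram ethnicity if !missing(_treated), discrete percent by(_treated) ///
    xlabel(1/4, valuelabel) name(pre_match, replace)

* After matching: only observations with a weight, weighted by w3
histogram ethnicity [fweight = w3] if !missing(w3), discrete percent ///
    by(_treated) xlabel(1/4, valuelabel) name(post_match, replace)
```

With the percent option the two panels are comparable even though w3 inflates the counts threefold; it is the percentages that describe the matched sample.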
Question 2b: Should I simply save my matched dataset with
keep if _weight != .
or would this make no difference, because later analyses will not take into account observations with _weight = . anyway, so that this step is unnecessary?
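Stata's weighted estimation commands leave out observations whose weight is missing, so for analyses run with [pweight = _weight] the keep step should not change the results; it matters only for commands run without the weights (an unweighted tabulate, for instance, would still count the unmatched observations). A minimal sketch that checks this on the posted data, re-entering infection, _treated and _weight from the dataex listing; the scalar names N_all, b_all, N_kept and b_kept are hypothetical.

```stata
* infection, _treated and _weight copied from the dataex listing in this post
clear
input byte(infection _treated) double _weight
0 0 .3333333333333333
0 0 .3333333333333333
0 0 .6666666666666666
1 0 .3333333333333333
0 0 .3333333333333333
0 0 .3333333333333333
0 0 .6666666666666666
0 0 1.6666666666666665
1 0 1.3333333333333333
0 0 1
1 1 1
1 1 1
1 1 1
1 1 1
0 1 1
0 1 1
1 1 1
1 1 .
1 1 .
1 1 .
1 1 .
0 . .
end

* Weighted regression on the full data: missing weights are left out
regress infection _treated [pweight = _weight]
scalar N_all = e(N)
scalar b_all = _b[_treated]

* The same regression after keeping only observations that have a weight
preserve
keep if !missing(_weight)
regress infection _treated [pweight = _weight]
scalar N_kept = e(N)
scalar b_kept = _b[_treated]
restore
```

Both regressions use the same 17 observations. The coefficient on _treated is 10/21 (about 0.476): the mean of infection among the 7 matched smokers (5/7) minus the weighted mean among their controls (5/21), which is also the mean of _infection for the matched treated observations in the listing.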
Question 3
I often learn by first looking at what others have posted on this excellent forum.
Post #12 in the thread below is similar to what I would like to construct with my dataset, i.e. a kdensity curve after matching. Why does that user type the commands below? I have tried to explain them myself in the comments, but I got stuck on the duplicates tag step.
Propensity Score Matching - Statalist
gen match = _n1 // here the user creates a new variable equal to _n1 (in my case, 7 observations have a value)
replace match = _id if match == . // here the user sets match to the observation's own _id where _n1 is missing, i.e. for unmatched observations
I don't understand why the user then types duplicates tag match, gen(dup). Why is the user looking for duplicates of match?
Can anyone explain this to me? The user then goes on to use dup > 0 for the kdensity graph.
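In that code, match equals _n1 (the _id of the nearest control) for matched treated observations and the observation's own _id for everyone else. A control and every treated observation that chose it as first neighbour therefore share the same value of match, and duplicates tag records, for each observation, how many other observations share its value. So dup > 0 flags first-neighbour pairs. With neighbor(3) this drops controls used only as second or third neighbours (in the posted data _id 1, 3, 4, 6, 7 and 10), so that graph describes a 1:1 sample. The sketch below, re-entering the relevant columns from the dataex listing, reproduces the posted dup2 column; a kdensity weighted by _weight that keeps all three neighbours is added as an alternative (twoway kdensity accepts aweights). The graph names pairs_1to1 and matched_3to1 are hypothetical.

```stata
* _id, _treated, _pscore, _weight, _n1 and dup2 copied from the dataex listing in this post
clear
input byte(_id _treated) double(_pscore _weight) byte(_n1 dup2)
 1 0 .14629124373657137 .3333333333333333 . 0
 2 0 .18353598336623161 .3333333333333333 . 1
 3 0 .23642515629258887 .6666666666666666 . 0
 4 0 .27681680297995453 .3333333333333333 . 0
 5 0 .2888530710771261 .3333333333333333 . 1
 6 0 .3595669083790513 .3333333333333333 . 0
 7 0 .4088562831270797 .6666666666666666 . 0
 8 0 .5554993045631528 1.6666666666666665 . 2
 9 0 .6930733259357975 1.3333333333333333 . 3
10 0 .7048182691423677 1 . 0
11 1 .17480944488523578 1 2 1
12 1 .2881297448796828 1 5 1
13 1 .5035916419173779 1 8 2
14 1 .5554993045631528 1 8 2
15 1 .6333538983172218 1 9 3
16 1 .6333538983172218 1 9 3
17 1 .6930733259357975 1 9 3
18 1 .8866625742391517 . . 0
19 1 .8866625742391517 . . 0
20 1 .9372933094185172 . . 0
21 1 .9538339376992043 . . 0
22 . . . . 0
end

* match: _id of the nearest control for matched treated obs, own _id otherwise
generate match = _n1
replace match = _id if missing(match)

* dup: how many OTHER observations share this value of match
duplicates tag match, generate(dup)
list _id _treated _n1 match dup if dup > 0, sepby(match) noobs

* Propensity-score densities for the first-neighbour pairs only (1:1)
twoway (kdensity _pscore if _treated == 1 & dup > 0) ///
       (kdensity _pscore if _treated == 0 & dup > 0), ///
       legend(order(1 "Treated" 2 "Control")) name(pairs_1to1, replace)

* Alternative: all matched observations, controls weighted by _weight (3:1)
twoway (kdensity _pscore [aweight = _weight] if _treated == 1 & !missing(_weight)) ///
       (kdensity _pscore [aweight = _weight] if _treated == 0 & !missing(_weight)), ///
       legend(order(1 "Treated" 2 "Control")) name(matched_3to1, replace)
```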
Question 4
Can I use categorical and continuous variables with psmatch2?
My educated guess is yes, and that, unlike with pscore, I do not need to enter categorical variables with the i. prefix. Therefore it should be OK to use the psmatch2 command listed further down in this post.
But for ethnicity or social deprivation, which are two of my covariates, a mean from pstest does not make sense; I should rather report n (%). I assume I should get these numbers from
bys _treated : sum ethnicity
Is this correct?
Or perhaps I should use the pstest command with i.gender i.ethnicity i.socialdeprivation listed further down. This tells Stata that they are categorical variables, but the results do not indicate which ethnicity is which.
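summarize on a categorical code such as ethnicity reports the mean of the codes 1 to 4, which has no meaning, so it will not give n (%); a tabulation does. A minimal sketch, re-entering ethnicity, _treated and _weight from the dataex listing: before matching, a plain two-way tabulation gives n and column %; after matching, importance weights make each cell the sum of _weight (the weighted n) and the column % describe the matched sample. tabulate, generate() then creates one 0/1 indicator per ethnic group (eth_1 to eth_4, hypothetical names) whose weighted means are the matched proportions; such indicators could also be passed to pstest or listed among the psmatch2 covariates, so that ethnicity is not treated as a linear 1 to 4 score. Whether psmatch2 and pstest accept i. notation directly is an assumption to check in their help files.

```stata
* ethnicity, _treated and _weight copied from the dataex listing in this post
clear
input byte(ethnicity _treated) double _weight
4 0 .3333333333333333
1 0 .3333333333333333
4 0 .6666666666666666
1 0 .3333333333333333
1 0 .3333333333333333
2 0 .3333333333333333
1 0 .6666666666666666
1 0 1.6666666666666665
1 0 1.3333333333333333
3 0 1
1 1 1
3 1 1
2 1 1
1 1 1
2 1 1
2 1 1
1 1 1
1 1 .
1 1 .
3 1 .
4 1 .
4 . .
end
label define Ethnicity 1 "White" 2 "Asian" 3 "Black African" 4 "Mixed"
label values ethnicity Ethnicity

* Before matching: n and column % by treatment status
tabulate ethnicity _treated, column

* After matching: with iweights each cell is a sum of _weight (weighted n),
* and the column % describe the matched sample
tabulate ethnicity _treated [iweight = _weight] if !missing(_weight), column

* One 0/1 indicator per ethnic group; their weighted means are matched proportions
tabulate ethnicity, generate(eth_)
tabstat eth_1-eth_4 [aweight = _weight] if !missing(_weight), by(_treated) statistics(mean)
```

In the posted data, for example, White is 6 of 10 controls before matching but 2/3 of the weighted controls after matching, against 3 of the 7 matched smokers.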
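Question 2, pre-matching histogram:
histogram ethnicity, by($treatment)
Question 2, post-matching histogram (this gives the noninteger frequency weights error):
histogram ethnicity [w=_weight], by($treatment)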
Question 4, matching command:
psmatch2 $treatment $covariates, outcome($ylist) neighbor(3) bw (0.06) common logit
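Question 4, balance check:
pstest $xlist, treated($treatment) both
Question 4, balance check with the categorical covariates entered as factor variables:
pstest i.gender i.ethnicity i.socialdeprivation, raw treated($treatment)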
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(gender smoking infection socialdeprivation ethnicity) double _pscore byte(_treated _support) double(_weight _infection) byte(_id _n1 _n2 _n3) float _nn double _pdif float match byte dup2
1 0 0 7 4 .14629124373657137 0 1 .3333333333333333 . 1 . . . 0 . 1 0
0 0 0 7 1 .18353598336623161 0 1 .3333333333333333 . 2 . . . 0 . 2 1
1 0 0 6 4 .23642515629258887 0 1 .6666666666666666 . 3 . . . 0 . 3 0
1 0 1 4 1 .27681680297995453 0 1 .3333333333333333 . 4 . . . 0 . 4 0
0 0 0 6 1 .2888530710771261 0 1 .3333333333333333 . 5 . . . 0 . 5 1
0 0 0 6 2 .3595669083790513 0 1 .3333333333333333 . 6 . . . 0 . 6 0
1 0 0 3 1 .4088562831270797 0 1 .6666666666666666 . 7 . . . 0 . 7 0
1 0 0 2 1 .5554993045631528 0 1 1.6666666666666665 . 8 . . . 0 . 8 2
1 0 1 1 1 .6930733259357975 0 1 1.3333333333333333 . 9 . . . 0 . 9 3
1 0 0 2 3 .7048182691423677 0 1 1 . 10 . . . 0 . 10 0
1 1 1 5 1 .17480944488523578 1 1 1 0 11 2 1 3 3 .008726538480995832 2 1
1 1 1 5 3 .2881297448796828 1 1 1 .3333333333333333 12 5 4 3 3 .0007233261974433081 5 1
0 1 1 5 2 .5035916419173779 1 1 1 0 13 8 7 6 3 .051907662645774844 8 2
1 1 1 2 1 .5554993045631528 1 1 1 .3333333333333333 14 8 9 7 3 0 8 2
1 1 0 2 2 .6333538983172218 1 1 1 .3333333333333333 15 9 10 8 3 .05971942761857574 9 3
1 1 0 2 2 .6333538983172218 1 1 1 .3333333333333333 16 9 10 8 3 .05971942761857574 9 3
1 1 1 1 1 .6930733259357975 1 1 1 .3333333333333333 17 9 10 8 3 0 9 3
0 1 1 1 1 .8866625742391517 1 0 . . 18 . . . . . 18 0
0 1 1 1 1 .8866625742391517 1 0 . . 19 . . . . . 19 0
0 1 1 1 3 .9372933094185172 1 0 . . 20 . . . . . 20 0
0 1 1 1 4 .9538339376992043 1 0 . . 21 . . . . . 21 0
0 1 0 . 4 . . . . . 22 . . . . . 22 0
end
label values gender Gender
label def Gender 0 "Female", modify
label def Gender 1 "Male", modify
label values smoking Smoking
label def Smoking 0 "Nonsmoker", modify
label def Smoking 1 "Smoker", modify
label values socialdeprivation social
label def social 1 "Most deprived", modify
label def social 7 "Least deprived", modify
label values ethnicity Ethnicity
label def Ethnicity 1 "White", modify
label def Ethnicity 2 "Asian", modify
label def Ethnicity 3 "Black African", modify
label def Ethnicity 4 "Mixed", modify
label values _treated _treated
label def _treated 0 "Untreated", modify
label def _treated 1 "Treated", modify
label values _support _support
label def _support 0 "Off support", modify
label def _support 1 "On support", modify