Dear Stata List,
I am using the -teffects nnmatch- command in Stata 13 to find appropriate matches. I am not (immediately) interested in the estimation of the treatment effects. I am trying to find many matches for each observation, but not all observations have that many possible matches within the group of exact matches; I would like Stata to just give me as many matches as it can, so I can save those in a file and use the file later.
The -osample()- option indicates which observations do not have sufficient matches. My first thought was to use only those observations with enough matches, then make a loop counting down to lower numbers of matches until all (or most) observations have been matched. However, filtering on the -osample()- variable also deletes observations that would have been good matches for other observations (i.e., the observation itself may not have enough matches, but could be used as a match for another observation). This reduces my available data much more than necessary.
An example might make things clearer:
****************************
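version 13
clear
set seed 123
set obs 25
* id takes integer values 1..10 and defines the exact-match groups
gen id=floor((10)*runiform()+1)
* replicate each observation id times, so the groups differ in size
expand id
gen x=rnormal()
* treatment indicator
gen w=rnormal()<.25
* potential outcomes and the observed outcome
gen y0=1+x+rnormal()
gen y1=2+x+rnormal()
gen y=(1-w)*y0+w*y1
* 10 nearest neighbours on x, without exact matching
teffects nnmatch (y x) (w), nn(10)
* add exact matching on id; osample flags observations with too few matches
cap noi teffects nnmatch (y x) (w), nn(10) ematch(id) osample(osample)
* rerun without the flagged observations; this also removes potential matches
cap noi teffects nnmatch (y x) (w) if !osample, nn(10) ematch(id) osample(osample2)
exit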
******************************
Results are not so dramatic in this example, but it makes a big difference for my real work. In the end, I would like to have a datafile with 10 matches for each observation where possible, and as many matches as possible for the other observations.
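To show which observations are affected, the number of available exact matches can be counted directly. This is only a rough sketch on the example data above, assuming that the candidate matches for an observation are the observations with the same id and the opposite value of w. The variable names n_treated, n_control, n_avail and nn_feasible are made up for illustration and are not produced by -teffects-.

```stata
* Sketch only: count the candidate matches in each exact-match group.
* Assumes the example data above (id, w) are in memory.
bysort id: egen n_treated = total(w == 1)
bysort id: egen n_control = total(w == 0)

* A treated observation can only be matched to controls in its group,
* and a control only to treated observations in its group.
gen n_avail = cond(w == 1, n_control, n_treated)

* Largest number of matches that is feasible when asking for 10.
gen nn_feasible = min(n_avail, 10)

tabulate nn_feasible
```

Observations with nn_feasible below 10 are the ones that cannot get the full 10 matches, but they can still serve as matches for observations in the other treatment group.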
Is there a way to solve this? I am aware of the -mahapick- command (and related) (written by David Kantor; available on SSC) that does what I want. However, -teffects nnmatch- is much faster, so I would prefer to use that.
Thanks!
Matthijs
