I received the following private question about matching; I’m posting the question and my reply here so that others may weigh in.
Dear Melissa,
I’m writing a private message rather than a public one, as my question is much more about matching itself than about Stata commands for matching. I was wondering if you have a bit of free time to help me clarify some (for me rather complicated) issues I describe below?
I’m trying to test 2 hypotheses (H1 and H2 below) to see whether 2 spatially separated participatory programs had an effect on the attitudes and knowledge of local people. I’m measuring the ATT, and my outcomes are both categorical (attitudes on a scale from 1 to 5) and continuous (knowledge: 0 to 8 points).
Here are the brief details of the hypotheses.
H1: compare effects of participation in either of the two programs vs. non-participation.
H2: compare effects of participation in program A versus participation in program B.
Therefore, I have two treatment variables (one for each hypothesis). I also have two different probit models to estimate two different propensity scores (one for each hypothesis). These probit models have different sample sizes because for the second hypothesis I use only the sub-sample of participants and I’m looking at the differences within that sub-sample.
I’m using the user-written commands psmatch2 and pscore in Stata 12.
Here are my doubts:
1) For testing H1, I have 93 non-participants and 210 participants. Do you think matching is the way to go here, given that I have a comparatively small number of controls?
*** Matching may or may not work for you; it will depend on how many of your non-participants and participants are within the range of common support. If you end up with a very small fraction of your non-participants being used as matches for your participants, you might want to consider other methods of analysis.
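A rough sketch of how one might check this with psmatch2, reusing the treatment and covariates from the command in the question and, purely for illustration, only the trknowledge outcome. It relies on the _treated, _support, and _weight variables that psmatch2 creates. I am assuming here that controls never used as a match are left with a missing _weight, so please verify that against your own output.

```stata
* Estimate the propensity score, impose common support, match on 3 neighbours
psmatch2 treat_V VfordistN tr_wakillN2 sc_viltrustN1 hh_headEDUY hh_waterN ///
    tr_compN district pw1 satmobtv, outcome(trknowledge) neighbor(3) common

* Cross-tabulate treatment status against on/off common support
tabulate _treated _support

* Count the non-participants actually used as matches
* (assumption: unmatched controls have a missing _weight)
count if _treated == 0 & !missing(_weight)

* Plot the propensity score distribution by treatment status
psgraph
```

If the count of matched non-participants is only a small share of the 93, that would be a sign to look at other approaches.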
2) If yes, which type of matching would be theoretically the best in this situation? I’m using kernel and nearest neighbor matching with replacement and with 3 neighbours as these two have the best matching quality (evaluated with the pstest command). Here is the command line:
psmatch2 treat_V VfordistN tr_wakillN2 sc_viltrustN1 hh_headEDUY hh_waterN tr_compN district pw1 satmobtv, outcome (trknowledge tigerlake bd_otherwildlifelikeN bd_forestlikeY) com n(3)
*** There is no one “best” type of matching. I tend to evaluate a few methods and choose the one that minimizes bias without sacrificing sample size.
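As a hedged illustration of comparing two methods, the lines below run the same specification with 3-nearest-neighbour and kernel matching and call pstest after each. The local macro name xvars is only a convenience introduced here, and the both option of pstest assumes a reasonably recent version of the psmatch2 package. Run the block as a whole from a do-file so the local macro stays defined.

```stata
* Covariates from the question, stored in an illustrative local macro
local xvars VfordistN tr_wakillN2 sc_viltrustN1 hh_headEDUY hh_waterN ///
    tr_compN district pw1 satmobtv

* Method 1: nearest-neighbour matching with 3 neighbours
psmatch2 treat_V `xvars', outcome(trknowledge) neighbor(3) common
pstest `xvars', both

* Method 2: kernel matching (default kernel and bandwidth)
psmatch2 treat_V `xvars', outcome(trknowledge) kernel common
pstest `xvars', both
```

Comparing the standardized bias after matching alongside the number of observations on support gives a basis for choosing between the two.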
3) If no, what would be the alternative in order to get rid of selection bias/have causal inference?
*** If you have data from multiple time points (before and after the programs started), you might consider a difference-in-differences model.
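A minimal sketch of such a model, assuming the data are in long form with one row per respondent per wave. The variables post (0/1 indicator for the follow-up wave) and hh_id (respondent identifier) are hypothetical names that do not appear in the question.

```stata
* post  : hypothetical 0/1 indicator for the follow-up wave
* hh_id : hypothetical respondent identifier, used to cluster standard errors
regress trknowledge i.treat_V##i.post, vce(cluster hh_id)

* The coefficient on 1.treat_V#1.post is the difference-in-differences estimate
```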
4) With the current propensity score specification, I have 48 treated cases off support. Is that too much in comparison to my overall sample? Should I try to refit my probit model?
*** I would try a different specification of your probit model. Your goal is to get treatment and comparison groups that are roughly equivalent on observed covariates so that you can isolate the effect of your treatment. If you have many off-support cases, you will have a very limited population to which you can generalize your results.
5) Since I never came across categorical outcomes in papers that use matching, I’m wondering if it makes sense to measure ATT on the categorical outcomes?
*** Yes, you can use categorical outcomes – the propensity score matching process just makes your treatment and comparison groups more similar to each other. You can evaluate a variety of outcomes after matching observations.
6) For testing H2: is it OK to use matching as I’m comparing two treatments rather than a treatment and a control?
*** Yes, that is fine. An alternative would be to evaluate H1 and H2 within a single model, but you would need Stata 13 for that. See the manual entry on -teffects multivalued-; you can use inverse probability of treatment weighting to do both comparisons within a single model.
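A sketch of what that could look like in Stata 13 or later, assuming a hypothetical three-level treatment variable named program (0 = non-participant, 1 = program A, 2 = program B) that is not in the question. Note that these calls report average treatment effects (ATE); for the ATT you would add the atet option together with tlevel().

```stata
* program : hypothetical variable, 0 = none, 1 = program A, 2 = program B
* Each program versus non-participation (level 0 as the control)
teffects ipw (trknowledge) (program VfordistN tr_wakillN2 sc_viltrustN1 ///
    hh_headEDUY hh_waterN tr_compN district pw1 satmobtv), control(0)

* Same treatment model with program A as the reference level;
* the 2 vs 1 contrast compares program B with program A
teffects ipw (trknowledge) (program VfordistN tr_wakillN2 sc_viltrustN1 ///
    hh_headEDUY hh_waterN tr_compN district pw1 satmobtv), control(1)
```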