teffects psmatch - how much of the data were used?

jepearso

Join Date: Apr 2014

Posts: 2
#1

17 Apr 2014, 11:40

All --

Please forgive me if this is an elementary question. I have searched the treatment effects manual and the internet in general for an answer to this question and have come up short. I suspect that the solution is obvious.

I am conducting a propensity score matching analysis in Stata 13 using teffects psmatch. My total N is 2123, with 672 "treated" and 1451 "control" individuals. When I run the teffects psmatch command, the output states that the "number of obs = 2123." I can't tell if that is before or after matching. I am currently running everything with the default settings, so nearest neighbor=1. If this is indeed what is happening and I have exactly one match/treated individual (and I keep all my treated individuals), then I should have a matched sample of 2*672.

Another way to ask my question is -- does teffects psmatch generate a variable (or can I make Stata generate a variable) that indicates the "matched" sample?

Thanks in advance for your help!

Jenni
Joe Canner

Join Date: Mar 2014

Posts: 580
#2

17 Apr 2014, 14:26

Jenni,

Check out the generate() option which can be used to specify the names for generated variables that will contain the observation numbers of the matching observations.

Also note that, by default, teffects psmatch computes the ATE (average treatment effect), which has implications for how Stata performs the matching. You may, in fact, want to specify atet (average treatment effect on the treated) instead.
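
A minimal sketch of the two suggestions above, assuming the cattaneo2 example dataset (the one used later in this thread, not Jenni's data). The stub name match is arbitrary, and the /// line continuations assume the code is run from a do-file.

```stata
* Sketch only: cattaneo2 is an example dataset loaded with webuse
webuse cattaneo2, clear

* atet requests the average treatment effect on the treated;
* generate(match) stores the observation numbers of each observation's
* nearest neighbors in new variables match1, match2, ...
teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), ///
    atet generate(match)

* e(N) is the number of observations used in estimation
display "observations used: " e(N)
describe match*
```

The stored values are observation numbers, so they only stay meaningful while the data remain in the sort order used at estimation time.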

Regards,
Joe
Ariel Linden

Join Date: Apr 2014

Posts: 170
#3

20 Apr 2014, 06:34

Also note that the algorithm uses matching with replacement, so the same controls could be used repeatedly.
jepearso

Join Date: Apr 2014

Posts: 2
#4

23 Apr 2014, 09:17

Thanks Joe! I added the option "generate(match)" and generated two variables (to my surprise) - "treat1" and "match1". Their values are identical. I'm not sure why that happened or what the utility is of having both variables, but now I can figure out how many of my controls were used in the matching.
Cindy Ann Kilgo

Join Date: Apr 2014

Posts: 1
#5

23 Apr 2014, 22:13

I'm having a similar problem. I used "generate(match)" as an option, but it generated four variables "match1" "match2" "match3" and "match4." This may be an elementary question, but I am curious what this means? I'm trying to find the number of cases it kept (or the number it kicked out). Using attnd in Stata12, it provided the number in the "treated" and "control" groups based on the matches it made -- but I am not able to find that using teffects psmatch.

Any help would be appreciated!
Thanks!
Cindy
Joe Canner

Join Date: Mar 2014

Posts: 580
#6

24 Apr 2014, 06:32

Cindy,

Stata 13 generates match variables according to the number of neighbors you requested in nneighbor(#) (more if there are ties in the propensity score). The values of these variables are the observation numbers of the matches. You will have to do a little work to reconstruct the total number of cases with matches and the number of controls used as matches.
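
One possible way to do that "little work", sketched on the cattaneo2 example dataset rather than on Cindy's data. The variable names nmatch, id, k and first and the two local macros are made up for this illustration.

```stata
* Sketch only: example data, default ATE, one requested neighbor
webuse cattaneo2, clear
teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), ///
    nneighbor(1) generate(match)

* number of matches stored for each observation
egen nmatch = rownonmiss(match*)
quietly count if mbsmoke & nmatch > 0
local n_treated_matched = r(N)

* distinct controls that serve as a match for at least one treated observation
preserve
keep if mbsmoke                      // matches of treated observations are controls
gen long id = _n
keep id match*
reshape long match, i(id) j(k)       // one row per (treated, match) pair
drop if missing(match)
egen byte first = tag(match)         // flags the first row for each distinct control
quietly count if first
local n_controls_used = r(N)
restore

display "treated observations with a match: `n_treated_matched'"
display "distinct controls used as matches:  `n_controls_used'"
```

Because matching is with replacement, the second number is usually smaller than the first.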

Regards,
Joe
Roberto Ferrer

Join Date: Apr 2014

Posts: 449
#7

24 Apr 2014, 09:49

Originally posted by Cindy Ann Kilgo View Post

I'm having a similar problem. I used "generate(match)" as an option, but it generated four variables "match1" "match2" "match3" and "match4." This may be an elementary question, but I am curious what this means?

If 4 variables were created, then there is at least one observation that was matched with four other observations (the output in the Stata window should show it as "max = 4"). Joe Canner has already mentioned that the new variables take on the value of observation numbers. nneighbor(#) is the minimum number of observations that will be matched to any one observation, and it defaults to 1.

You should:

1. Read the FAQ carefully.

2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.
Roberto Ferrer

Join Date: Apr 2014

Posts: 449
#8

24 Apr 2014, 10:15

Originally posted by Cindy Ann Kilgo View Post

I'm trying to find the number of cases it kept (or the number it kicked out).

Maybe you mean something like:

Code:

clear all
set more off
webuse cattaneo2
keep bweight mbsmoke mmarried mage fbaby medu

teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), gen(match) nneighbor(4)

* counts
egen cou = rownonmiss(match*)

* just one example:
quietly summarize cou if mbsmoke, meanonly
display "number of controls (non smokers) matched with treated: " r(sum)

quietly summarize cou if !mbsmoke, meanonly
display "number of treated (smokers) matched with controls: " r(sum)

but I'm not sure.

Kevin

Join Date: Apr 2014

Posts: 6
#9

24 Apr 2014, 10:31

In Stata 12 I use psmatch2, which indicates whether observations are on support and whether they are matched. For nearest-neighbor matching, matching with replacement is usually a good choice (Angrist and Pischke 2008).

You can also try to use exact covariate matching by "ccmatch".

Reference: Angrist, J. D., & Pischke, J. S. (2008). Mostly harmless econometrics: An empiricist's companion. Princeton university press.
Kevin

Join Date: Apr 2014

Posts: 6
#10

24 Apr 2014, 10:34

Originally posted by jepearso View Post

Also, be sure to produce box plots and kernel density plots before and after matching, and do a balance check. This is suggested in Guo and Fraser's book on PSM. Here are some sections (with graphs and procedures) that you can follow when writing up a report: http://papers.ssrn.com/sol3/papers.c...act_id=2335669
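
For readers on newer versions, one way such checks might look after teffects psmatch is sketched below. It assumes Stata 14 or later, where the tebalance commands are available, and it uses the cattaneo2 example dataset with mage as an arbitrary covariate to plot.

```stata
* Sketch only: tebalance requires Stata 14 or later
webuse cattaneo2, clear
teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), ///
    generate(match)

tebalance summarize        // standardized differences and variance ratios, raw vs. matched
tebalance box mage         // box plots of mage, raw vs. matched
tebalance density mage     // kernel density plots of mage, raw vs. matched
teffects overlap           // overlap of the estimated propensity scores
```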
Dano Hano

Join Date: Jan 2017

Posts: 3
#11

03 Jan 2017, 13:09

Please, I need help. I am running teffects psmatch2 with two nearest neighbors, and I am using the code below to get the number of matched units. However, I do not know how to adjust the code below to handle both match1 and match2.

gen ob=_n //store the observation numbers for future use
save fulldata,replace // save the complete data set
keep if t // keep just the treated group
keep match1 // keep just the match1 variable (the observation numbers of their matches)
bysort match1: gen weight=_N // count how many times each control observation is a match
by match1: keep if _n==1 // keep just one row per control observation
ren match1 ob //rename for merging purposes
merge 1:m ob using fulldata // merge back into the full data
replace weight=1 if t // set weight to 1 for treated observations
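
One possible adjustment for several match variables is to stack them with reshape before counting. This is a sketch, not a tested answer for the data above: it uses the cattaneo2 example dataset with mbsmoke in place of t, a tempfile in place of fulldata, and the made-up names treated_ob and k.

```stata
* Sketch only: example data, two requested neighbors
webuse cattaneo2, clear
teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), ///
    nneighbor(2) generate(match)

gen long ob = _n                          // store the observation numbers
tempfile fulldata
save `fulldata'                           // save the complete data set

keep if mbsmoke                           // keep just the treated group
keep ob match*
rename ob treated_ob
reshape long match, i(treated_ob) j(k)    // one row per (treated, match) pair
drop if missing(match)
contract match, freq(weight)              // times each control is used as a match
rename match ob                           // rename for merging purposes
merge 1:1 ob using `fulldata'             // merge back into the full data
replace weight = 1 if mbsmoke             // set weight to 1 for treated observations
```

Here weight is a simple count of how often a control is used, and it stays missing for controls that are never used. Whether counts or fractional weights (for example 1/2 per match with two neighbors) are appropriate depends on what the weights are for.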
John Orme

Join Date: Jul 2017

Posts: 6
#12

26 Jul 2017, 12:53

Hi All,

I’m very sorry, but I’m still confused about this issue. I’m using Stata 15 to estimate the following model. The output indicates that it is based on 1,879 observations (people). Some variables have missing data, but a logistic regression with the same treatment variable and covariates also indicates a sample of 1,879:

teffects psmatch (PostMisdemeanorA_sum) (person_type sex race marital_status cAge PreFelSUM PreMisSUM), generate(match)

This generates two new variables, match1 and match2. The frequency distribution for match1 lists a sample size of 1,879 and I gather that this frequency distribution lists the number of matches for each case (this ranges from 1 to 10)? The frequency distribution for match2 lists one case id with a frequency of 1. Does this mean that all but one case had a match and, if so, why does the output from the teffects model list 1,879 observations?

Thanks for any help you can provide with this.

Warmest Regards,

John
Joerg Luedicke (StataCorp)

StataCorp Employee

Join Date: Apr 2014

Posts: 118
#13

26 Jul 2017, 15:07

Hi John,

The variables created by teffects psmatch, generate() do not contain any information about frequency of matches. Rather, they store the observation numbers from the matched neighbors. If in your case teffects psmatch, generate() created two new variables, this means that you have at least one case for which two matches were found (which in this case is due to a tie in the propensity score given that you used the default of matching one neighbor). teffects psmatch is not arbitrarily dropping or trimming any observations and uses all observations with valid data across the used variables, in your case N=1,879.
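
A small sketch of how this can be checked, using the cattaneo2 example dataset rather than John's data. The name nmatch is made up for the illustration.

```stata
* Sketch only: default of one requested neighbor
webuse cattaneo2, clear
teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), ///
    nneighbor(1) generate(match)

* All observations with valid data are used; nothing is trimmed
display "observations used in estimation: " e(N)
display "observations in the dataset:     " _N

* Number of stored matches per observation; with nneighbor(1),
* any value above 1 reflects ties in the propensity score
egen nmatch = rownonmiss(match*)
tabulate nmatch
```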

Joerg
John Orme

Join Date: Jul 2017

Posts: 6
#14

27 Jul 2017, 08:17

Many thanks, Joerg! John
Cathy Antonakos

Join Date: May 2014

Posts: 5
#15

14 Dec 2017, 13:36

I've read this post many times and am still uncertain about the proper n to report in a table with ATET results. The full sample N is listed in the output, but I specified a one-to-one match and think the N I should report is (2 x treatment group N). Why doesn't Stata provide the number of observations used for the ATET in the output directly? Am I missing something?
Thanks in advance.
Cathy