  • teffects psmatch: Stata stops running when using large samples

    Hi everyone,

    I am using teffects psmatch in Stata/MP 13.0 for Windows (64-bit x86-64) on Windows 7 Professional (8 GB of main memory, of which 7.66 GB is usable). My data set contains 940,465 observations and about 150-200 variables, depending on the model specification. I used the compress command to reduce the amount of memory used by the data. As Stata stopped working when I used the whole data set, I tried smaller sample sizes: Stata still crashed with 42,165 observations but worked with 21,083 observations. Is there something I can do to make teffects psmatch run with larger sample sizes? If it worked with 164,000 observations, that would already be great.

    I also tried the user-written command psmatch2 by E. Leuven and B. Sianesi, and it could handle all 940,465 observations without problems. Unfortunately, the standard errors computed by psmatch2 are not reliable.

    Kafuti Miller had a similar problem with teffects psmatch on 12 April 2016, but nobody replied; here is his post: http://www.statalist.org/forums/foru...n-large-sample

    I thank you a lot in advance for your help,
    Anna

  • #2

    Dear Anna,

    An alternative is to try a couple of estimators that use the same treatment model you used (logit or probit) together with a parametric model for the outcome; I would suggest the doubly robust teffects aipw and teffects ipwra. If you do this for a subset of your data, say 20,000 observations, and the treatment effects you get from these estimators and from teffects psmatch are similar, then the computational intensity of teffects psmatch, a semiparametric estimator, might be unnecessarily onerous.
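
    For instance, here is a minimal sketch along these lines, where y, treat, and xlist are placeholders for your outcome variable, treatment variable, and covariate list:
    Code:
    * compare the doubly robust estimators with teffects psmatch on a 20,000-observation random subset
    preserve
    sample 20000, count
    teffects aipw    (y xlist) (treat xlist, logit)
    teffects ipwra   (y xlist) (treat xlist, logit)
    teffects psmatch (y) (treat xlist, logit)
    restore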

    Something to consider is that if you have a lot of categorical variables, there may be numerous ties, which increases computational time. We keep the indices of all matches, and this can consume a considerable amount of memory even for moderate sample sizes.

    Another alternative, if all the variables in your model are categorical (I have seen this a couple of times), is to contract your data and create a frequency-weight variable with the option freq(newvar). You can then estimate your model on the reduced sample using frequency weights.
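
    A rough sketch of that approach, with placeholder names (treat, y, and x1-x3 for your variables; fwt for the new frequency-weight variable):
    Code:
    * collapse to unique combinations of the model variables, keeping a frequency count
    * (this pays off only if the outcome is also discrete)
    contract treat y x1 x2 x3, freq(fwt)
    * re-estimate on the reduced data set using frequency weights
    teffects psmatch (y) (treat x1 x2 x3) [fweight=fwt]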



    • #3
      Dear Enrique,

      Thank you so much for your quick and very helpful answer.

      Unfortunately, I still have some problems. I analyse the gender wage gap in the whole service and industry sector as well as in the 24 different industries. I focus on the common support problem, which is quite severe. I have already done Ñopo matching (exact covariate matching, which matches directly on discrete covariates) and I would like to compare the results with those from propensity score matching or other data-balancing methods. As the unexplained gender wage gap resulting from Ñopo matching corresponds formally to the average treatment effect on the treated (ATET), applying teffects aipw makes no sense, as it only allows estimating the average treatment effect in the population (ATE). Thus, I tried to estimate the ATET with teffects psmatch and teffects ipwra in a small industry with 8,987 observations. All the covariates of the treatment and outcome models are categorical except three. I had some problems with both commands:

      Results from teffects psmatch:
      Code:
      teffects psmatch (mbls) (female $BASS_Ind4), atet osample(nooverlap)
      As 8 women have a propensity score greater than 1 - 1.00e-05 (nooverlap is equal to 1 for 8 observations), I dropped them, ran the command again and plotted the estimated densities of the predicted probabilities of being female:
      Code:
      drop if nooverlap ==1
      teffects psmatch (mbls) (female $BASS_Ind4), atet
      teffects overlap, ptlevel(1)
      [Attached image: Graph.png — estimated densities of the predicted probability of being female]


      So far, everything is as expected. But when I define a caliper, problems arise. First, none of the caliper sizes I tried worked; I even tried caliper(0.98):
      Code:
      teffects psmatch (mbls) (female $BASS_Ind4), atet caliper(0.98) osample(toofar)
      Stata always reports:
      no propensity-score matches for observation 1 within caliper #; this is not allowed.
      r(459)

      Second, the indicator variable toofar does not identify the observations for which no match is found within the defined caliper, so I cannot drop them (toofar is not equal to 1 for any observation). After deleting the first few observations one after another and still getting the same error message, I estimated the ATET with psmatch2 on the same sample (i.e. 8,987 observations minus the 8 women with too-high propensity scores):
      Code:
      psmatch2 female $BASS_Ind4, outcome(mbls) neighbor(1) caliper(0.01) logit ties
      All observations are matched even with caliper(0.01), and the estimated ATET is the same as with teffects psmatch without a caliper (-905.0962). psmatch2 with caliper(0.001) yields 295 observations off common support. To double-check whether teffects psmatch and psmatch2 really estimate the same thing, I dropped these 295 observations, re-estimated the ATET with both commands without any caliper, and got the same result.

      What am I doing wrong when using teffects psmatch?

      Then I applied teffects ipwra to the original sample minus the 8 women with too-high propensity scores. It did not converge; I stopped it after 148 iterations, corresponding to about 15 minutes.

      If I manage to apply teffects psmatch correctly, I could transform the continuous variables of my model into categorical ones and use the contract command you proposed to perform propensity score matching in larger industries. But does this procedure also work with sample weights? I have just realised that in Stata 14 it is possible to use sampling weights. The data I am using come from a complex random sample (enterprises are assigned to 1,600 categories according to size, business branch and major region). I received the weighting variable but not the stratification variable, so I can account for the survey weights but not for the double stratification. As incorporating sampling weights is now feasible, I think I have to do so, because, as far as I know, accounting for sampling weights affects the results considerably.

      And one last question: do you think teffects psmatch would work with about 164,000 observations on a computer with 16 GB of memory?

      I thank you very much for your help,
      Anna



      • #4
        Hello Anna,

        With regard to the caliper() option, my guess is that you have a set of observations with propensity scores between .98 and 1 - 1.00e-05 that have no counterpart (match) in your data set. One way to verify this is to run your propensity score model (logit or probit), predict the propensity scores, and then sort and inspect the predicted values. Observations in the previously mentioned range are your "offenders". If this is not the case and you are able to share your data, please send a copy to technical support and we will try to identify any potential problem in the code.
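
        For example (a minimal sketch using the variable names from your post; pshat is just a placeholder for the predicted score):
        Code:
        * fit the treatment model and inspect the upper tail of the predicted propensity scores
        logit female $BASS_Ind4
        predict pshat, pr
        sort pshat
        list pshat if pshat > 0.98 & pshat < 1 - 1.00e-05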

        With regard to the osample() option, I think it is working correctly. It only identifies violators of the overlap condition at the default tolerance level of 1.00e-05. You may modify this using the pstolerance() option, and it will then flag all observations that violate the new tolerance.
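
        For example (a sketch, reusing the syntax from your post with an arbitrary wider tolerance):
        Code:
        * flag overlap violators at a wider tolerance than the default 1.00e-05
        teffects psmatch (mbls) (female $BASS_Ind4), atet pstolerance(0.001) osample(nooverlap2)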

        With regard to what you are doing with teffects psmatch and psmatch2, I do not think you are doing anything incorrectly, but the calipers are a bit different (.98 versus 1 - .01); maybe try .99 for teffects psmatch and .01 for psmatch2, or .98 and .02. Again, if this does not work and you are able to share your data, please send a copy to technical support and we will try to identify any potential problem in the code.
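
        That is, something along these lines (the same commands from your posts, with only the caliper values changed):
        Code:
        teffects psmatch (mbls) (female $BASS_Ind4), atet caliper(0.99)
        psmatch2 female $BASS_Ind4, outcome(mbls) neighbor(1) caliper(0.01) logit ties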

        With regard to teffects ipwra, the fact that the model does not converge is a bit of a red flag for the identification of your model (this is also suggested by the behavior of your propensity scores). I would suggest that you take a closer look at your propensity score model.

        With regard to the weights, they are an important consideration, but I do not know what the appropriate thing to do is in your case. As of now, teffects psmatch allows frequency weights (this is true for Stata 13, which is the version you have).

        Finally, your last question is a tricky one. It depends on your data but also on the number of matches, as I mentioned before. Here is a rough way of thinking about how much memory your data consumes, without considering the memory used by the command itself:

        http://www.stata.com/support/faqs/da...-dataset-size/
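
        For example, a quick back-of-the-envelope check in Stata itself (a sketch; describe stores the number of observations and the width, i.e. the bytes per observation):
        Code:
        * rough calculation: data size = observations x bytes per observation
        quietly describe
        display "approx. data size in GB: " %9.2f r(N) * r(width) / 1024^3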



        • #5
          Hi Enrique,

           Unfortunately, I am not allowed to attach a copy of my data because the data set is confidential. With regard to the caliper(#) option, I might have misunderstood how it works. As the Stata manual says that "caliper(#) specifies the maximum distance at which two observations are a potential match", I thought that, for example, caliper(0.01) means that all observations whose propensity scores differ by 0.01 or less can be matched, and observations whose propensity scores differ by more than 0.01 cannot be matched (e.g. a treated observation with propensity score 0.111 can be matched with a control observation with propensity score 0.119 but not with a control with a propensity score of 0.125). If my understanding of the caliper(#) option were right, one would expect all observations to be matched with caliper(0.98), and observations between 0.98 and 1 - 1.00e-05 should not be a problem (an observation with propensity score 0.99 could be matched to all observations with propensity scores between 0.01 and 1 - 1.00e-05). In psmatch2 the caliper option is described as the "value for maximum distance of controls used to perform nearest neighbor(s) within caliper, radius matching and Mahalanobis 1-to-1 matching". Therefore, I thought caliper(#) has the same meaning in teffects psmatch and psmatch2.
           Concerning the osample(newvar) option, I understood that newvar should also identify observations without a counterpart within the range defined by caliper(#). From the manual: "osample(newvar) specifies that indicator variable newvar be created to identify observations that violate the overlap assumption. Two checks are made to verify the assumption. The first ensures that the propensity scores are greater than pstolerance(#) and less than 1 - pstolerance(#). The second ensures that each observation has at least nneighbor(#) matches in the opposite treatment group within the distance specified by caliper(#)."

           With regard to the sampling weights, I first thought that they could be used in all teffects commands in Stata 14, but by comparing the manuals of Stata 13 and 14 I realized that the only change is that pweights are now allowed with teffects ipw and teffects ipwra. I am sorry for the confusing question in my previous post. As my sampling weights are noninteger, fweights do not work. Below are the sampling weights of 6 individuals in my data set: the first column is the identifier of the firm the individual belongs to and the second column is the sampling weight; individuals working in the same firm have the same sampling weight.
          Code:
          10819749 10.118408
          10819749 10.118408
          10819855 1.3844484
          10819855 1.3844484
          10819855 1.3844484
          10819855 1.3844484
          Is there a possibility to use such sample weights with propensity score matching?

           If not, is inverse-probability weighting or inverse-probability-weighted regression adjustment better suited to analysing the gender wage gap if I want to take the common support problem into account and at the same time get reliable estimates? On the one hand, I want to avoid biased estimates due to differences in the supports of the empirical distributions of individual characteristics for men and women. The traditional Blinder-Oaxaca decomposition, for example, fails to recognize these gender differences in the supports. It is implicitly based on an "out-of-support assumption": it assumes that the linear estimators of the earnings equations are also valid outside the supports of the individual characteristics for which they were estimated. This often leads to an overestimation of the unexplained gender wage gap. On the other hand, the Stata manual mentions that inverse-probability-weighting estimators become extremely unstable as the overlap assumption gets close to being violated and that they then produce erratic estimates. Do you think teffects ipw performs well if I exclude observations with propensity scores smaller than pstolerance(1e-5), or smaller than a tolerance level slightly larger than the default? I mean, is the estimated ATET reliable?

          I tried teffects ipw in a small industry using the previous propensity score model, which I’m going to improve, and dropped the observations violating the default tolerance pstolerance(1e-5).
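
           In code, roughly (a sketch of the steps; pshat is a placeholder for the predicted propensity score and $covlist stands for the industry's covariate list):
           Code:
           * drop observations outside the default tolerance, then estimate the ATET by IPW
           logit female $covlist
           predict pshat, pr
           drop if pshat < 1.00e-05 | pshat > 1 - 1.00e-05
           teffects ipw (mbls) (female $covlist), atet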
           [Attached image: IPW_Ind5_Output.png — teffects ipw estimation output]

          I think an ATET of -516 Swiss Francs is possible, but I’m still unsure.

          And the overlap plot:
           [Attached image: IPW_Ind5.png — overlap plot]


           If it is impossible to use sampling weights with propensity score matching, is IPW or IPWRA more appropriate for my purpose? Or would you recommend neither of the two methods?

          Many, many thanks in advance!
          Anna



          • #6
            For how to use sampling and IPW weights together, see:

            DuGoff, Eva H., Megan Schuler, and Elizabeth A. Stuart. 2014. "Generalizing observational study results: Applying propensity score methods to complex surveys." Health Services Research 49(1): 284-303. Available at http://europepmc.org/articles/pmc3894255

            However, there is reason to think that matching on propensity scores is a poor tactic. See: Why Propensity Scores Should Not Be Used for Matching, by Gary King and Richard Nielsen.
            Steve Samuels
            Statistical Consulting
            [email protected]

            Stata 14.2



            • #7
              Hello Anna,

              With regard to the caliper() option, from my understanding the option has the opposite meaning in psmatch2. This would also explain what is going on with the osample() option.

              I do not know about the literature on propensity score matching and weights but Steve gave you a reference that is worth taking a look at.

              With regard to which estimator to use, I would recommend the doubly robust estimators. Also, note that the overlap condition is necessary for all treatment-effects estimators to work. If it fails, that is something to think hard about rather than simply dropping the "problematic" observations. Although you will get estimates when doing this, I think you should be careful when interpreting them: in a sense you are now working with a different sample, and the observations you lost might not be missing at random.

              A final comment concerns the second paper Steve mentioned. This paper DOES NOT invalidate propensity score matching; it just says that it might work poorly with some data used by researchers. It is more of a caveat emptor, and in that sense it is an important contribution. It is not a theoretical paper (in terms of deriving statistical properties and providing proofs); in a similar fashion, someone could write a paper showing the scenarios under which the estimator works well. The paper has phrases like "unfortunately, this proof, although mathematically correct, is either of little use or misleading when applied to real data ...". It relies more on inductive than deductive reasoning and should not be taken as a reason to disregard propensity score matching.



              • #8
                Thanks a lot to both of you!

                DuGoff et al. recommend including the survey weight as a predictor in the propensity score model, but the propensity score model itself does not need to be weighted (step 1). When using the propensity score to estimate the treatment effects (step 2), it is imperative to account for the weights if one wants to make population-level inferences, as I do. However, from my understanding the authors match on the propensity score in a first step and run a survey-weighted outcome regression within the matched sample in a second step (p. 290: "When using propensity score matching, the effect estimate is generated from a survey-weighted regression that accounts for the complex survey design within the matched sample." p. 300: "When matching, the outcome regression is conducted within the matched data."). Thus, it is not the same as the semiparametric propensity score matching implemented in teffects psmatch and psmatch2 that I intend to do. In addition, the authors do part of their analysis in R. Stas Kolenikov points out in his post about the DuGoff et al. paper that "matching in one package and estimating treatment effect in another package makes it impossible to produce the right standard errors, as they don't account for uncertainty due to externally performed matching" (http://www.statalist.org/forums/foru...pensity-scores; the additional references he gives do not deal with matching).

                The help file of psmatch2 says: "As far as we know it's not really clear in the literature how to accommodate sample weights in the context of matching. If you are aware how to properly account for sampling weights, please let us know. In the meantime, here are some thoughts you might want to take into consideration [..]" They then propose a way one could use sampling weights with psmatch2. However, the standard errors from psmatch2 are not reliable.

                So my first question is: Is it important to use sampling weights when performing semiparametric propensity score matching?

                If yes, does anyone know whether there is a way to use my kind of weights with the teffects psmatch command (an example of my sampling weights is in my previous post)?

                Thanks for helping me!



                • #9
                  To answer your first question: It is important to use the sampling weights. The simulations summarized in Figure 2 (p. 293) of the DuGoff article show the inferiority of any estimator of ATT that ignores them. The same figure shows that the bias of the matching estimator with sampling weights, though small, is twice the bias of the weighted regression estimator with weights equal to the product of the sampling and propensity weights. I don't have an answer to your question about psmatch2.
                  Last edited by Steve Samuels; 09 May 2016, 17:08.
                  Steve Samuels
                  Statistical Consulting
                  [email protected]

                  Stata 14.2



                  • #10
                    Thank you very much, Steve! If it is not possible to use noninteger weights with teffects psmatch, I will try teffects ipw and teffects ipwra, which allow pweights.
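
                    For example (a sketch; sampweight is a placeholder for my sampling-weight variable, and the model follows my earlier posts):
                    Code:
                    * ATET with sampling weights (Stata 14): IPW and IPW regression adjustment
                    teffects ipw   (mbls) (female $BASS_Ind4, logit) [pweight=sampweight], atet
                    teffects ipwra (mbls $BASS_Ind4) (female $BASS_Ind4, logit) [pweight=sampweight], atet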



                    • #11
                      Dear colleagues,

                      Thank you very much for this discussion. I have tried to read it carefully; however, it does not seem to solve my own problem.

                      I work with Stata 14 on the estimation of the ATET. I have population data (N = 900,000), and for the sake of transparency I want to keep the large sample. I get estimates with psmatch2 for propensity score matching and kernel matching (after a day); however, I can never get estimates for nearest-neighbor matching (the teffects nnmatch command). The strange thing is that if I make the sample smaller, the estimation works.
                      Basically, I have the following questions:

                      1) Is there another command I could use instead of teffects nnmatch?
                      2) Do you have any idea how to estimate this effect with teffects nnmatch? (I have been working on this issue for many days and I have no clue how to overcome it.)

                      Many thanks for your replies in advance.



                      • #12
                        I am using IPWRA and have tried to use osample() to identify the overlap violations, but I am getting an "unrecognized command" response. My university IT department cannot solve the problem. Can somebody help, please?

