Treatment Effects Inverse Probability Weighting with Sample Weights

Julian Emmler

Join Date: Mar 2017

Posts: 7
#1

27 Jul 2017, 10:01

Hello everyone,

I have a question about using the "teffects ipw" command in combination with sampling weights (pweights). Unfortunately my data are confidential, so I can't post the actual data or results from Stata. I also tried to recreate the problem with made-up data but didn't succeed, so I will try to make the problem clear with an appropriate example. I use Stata 14 and its built-in teffects ipw command.

The data structure would look something like this:
id   wage   t   sex   age   sample_weight
1    100    1   0     23    100
2    40     0   1     38    500
3    60     0   0     45    250
4    80     1   1     25    180

Here id is the individual identifier, wage is the person's wage, t is a dummy indicating treatment (on-the-job training), sex is a dummy for gender, age is the person's age, and sample_weight holds the probability (sampling) weights.

I want to estimate the average treatment effect of the treatment t, controlling for different distributions of the covariates in treatment and control group. I use inverse probability weighting for this, so my code would look something like this:

Code:

teffects ipw (wage) (t sex age), ate
tebalance summarize

If I run it like this, the IPW works fine and the covariates are very balanced between treatment and control group after reweighting through IPW.

However, older people and men (sex==1) have a lower probability of being sampled, so I should use the probability weights stored in "sample_weight". pweights are allowed with teffects, so I tried simply adding them to the treatment effect estimation, i.e. I used:

Code:

teffects ipw (wage) (t sex age) [pweight=sample_weight], ate
tebalance summarize

However, IPW no longer works as well: the covariates are not balanced after reweighting, i.e. tebalance summarize still shows substantial differences in the covariate means between treatment and control group.

Does anyone know why this happens and whether there is a way to fix it?

I tried recreating the teffects ipw command by hand, which is not that complicated, at least if one is only interested in the treatment effect and not its standard error. The "hand-built" IPW would look like this:

Code:

* Create IPW weights
logit t sex age
predict double ps, pr
gen double ipw_weight = 1/ps if t==1
replace ipw_weight = 1/(1-ps) if t==0
* Normalize weights to mean 1 (optional: the weighted regression below is scale-invariant)
summarize ipw_weight, meanonly
gen double ipw_weight_normalized = ipw_weight/r(mean)
* Determine treatment effect
regress wage t [pweight=ipw_weight_normalized]

The coefficient on t is then the treatment effect estimated by teffects ipw. However, I was only able to recreate it for the unweighted case. I don't know how the teffects ipw command deals with the additional pweights and couldn't find any documentation on it, e.g. whether it uses them both in the regression for the IPW weights and in the outcome regression, or only in one of those. If anyone has an idea how to fix this, I would be very thankful. Thanks in advance. Sorry for the long post; if anything remains unclear, I am happy to clarify.

Julian
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#2

31 Jul 2017, 07:19

I've never used teffects, but this reference might be relevant to your question.

DuGoff, E. H., Schuler, M., & Stuart, E. A. (2014). Generalizing observational study results: applying propensity score methods to complex surveys. Health services research, 49(1), 284-303.

Last edited by Steve Samuels; 31 Jul 2017, 07:25.

Steve Samuels
Statistical Consulting

Stata 14.2

Joerg Luedicke (StataCorp)

StataCorp Employee

Join Date: Apr 2014
Posts: 116
#3

01 Aug 2017, 16:24

Hi Julian,

teffects ipw uses sampling weights for the propensity score model, and then the weight for computing the means of the outcome is essentially the product of the sampling weights and the inverse-probability weights. Here is an example where we replicate the point estimates from teffects ipw with sampling weights:

Code:

. * Example data:
. webuse cattaneo2
(Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154)
. 
. * Fake weights:
. set seed 123
. gen double sweight = 1/runiform()
. 
. * -teffects ipw-:
. teffects ipw (bweight) ///
>              (mbsmoke foreign alcohol mage medu fage fedu) [pw=sweight]

Iteration 0:   EE criterion =  1.358e-24  
Iteration 1:   EE criterion =  2.188e-26  

Treatment-effects estimation                    Number of obs     =      4,642
Estimator      : inverse-probability weights
Outcome model  : weighted mean
Treatment model: logit
------------------------------------------------------------------------------
             |               Robust
     bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
     mbsmoke |
    (smoker  |
         vs  |
 nonsmoker)  |  -298.2673   78.86414    -3.78   0.000    -452.8382   -143.6964
-------------+----------------------------------------------------------------
POmean       |
     mbsmoke |
  nonsmoker  |    3494.05   73.47651    47.55   0.000     3350.038    3638.061
------------------------------------------------------------------------------
. 
. * Inverse-probability weights:
. logit mbsmoke foreign alcohol mage medu fage fedu [pw=sweight]

Iteration 0:   log pseudolikelihood = -14443.681  
Iteration 1:   log pseudolikelihood = -13636.387  
Iteration 2:   log pseudolikelihood = -13316.782  
Iteration 3:   log pseudolikelihood = -13314.851  
Iteration 4:   log pseudolikelihood =  -13314.85  

Logistic regression                             Number of obs     =      4,642
                                                Wald chi2(6)      =      60.87
                                                Prob > chi2       =     0.0000
Log pseudolikelihood =  -13314.85               Pseudo R2         =     0.0782

------------------------------------------------------------------------------
             |               Robust
     mbsmoke |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |  -.9151093   .4612633    -1.98   0.047    -1.819169   -.0110498
     alcohol |   1.803091   .4144556     4.35   0.000     .9907725    2.615409
        mage |  -.0191375   .0270356    -0.71   0.479    -.0721263    .0338513
        medu |  -.1237135   .0780042    -1.59   0.113    -.2765989     .029172
        fage |  -.0459804   .0269683    -1.70   0.088    -.0988373    .0068765
        fedu |  -.0303248   .0620912    -0.49   0.625    -.1520212    .0913717
       _cons |   1.425061   .8745558     1.63   0.103    -.2890368    3.139159
------------------------------------------------------------------------------

. predict double ps if e(sample)
(option pr assumed; Pr(mbsmoke))

. gen double ipw = 1.mbsmoke/ps + 0.mbsmoke/(1-ps)
. 
. * Weight for weighted -bweight- means:
. gen double w = ipw*sweight
. 
. * ATE and POmean:
. regress bweight i.mbsmoke [pw=w]
(sum of wgt is   8.8130e+04)

Linear regression                               Number of obs     =      4,642
                                                F(1, 4640)        =      13.48
                                                Prob > F          =     0.0002
                                                R-squared         =     0.0778
                                                Root MSE          =     513.29

------------------------------------------------------------------------------
             |               Robust
     bweight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     mbsmoke |
     smoker  |  -298.2673   81.22871    -3.67   0.000    -457.5142   -139.0204
       _cons |    3494.05   73.95511    47.25   0.000     3349.062    3639.037
------------------------------------------------------------------------------
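Written out, the two steps above appear to correspond to normalized weighted means. The notation is introduced here only for illustration and is not taken from the Stata documentation: s_i is the sampling weight (sweight), t_i the treatment indicator (mbsmoke), y_i the outcome (bweight), and \hat{p}_i the predicted propensity score (ps) from the pweighted logit.

```latex
\[
\hat{\mu}_1 = \frac{\sum_i s_i \, t_i \, y_i / \hat{p}_i}{\sum_i s_i \, t_i / \hat{p}_i},
\qquad
\hat{\mu}_0 = \frac{\sum_i s_i \, (1 - t_i) \, y_i / (1 - \hat{p}_i)}{\sum_i s_i \, (1 - t_i) / (1 - \hat{p}_i)},
\qquad
\widehat{\mathrm{ATE}} = \hat{\mu}_1 - \hat{\mu}_0 .
\]
```

In the regression above, _cons corresponds to \hat{\mu}_0 (the POmean for nonsmokers) and the coefficient on smoker to \widehat{ATE}. Only the point estimates are reproduced this way; the standard errors differ because regress treats the estimated propensity scores as fixed.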

I hope this helps,

Joerg


Julian Emmler

Join Date: Mar 2017

Posts: 7
#4

08 Aug 2017, 09:36

Thank you for the reference, Steve; it's very helpful as a guideline when thinking about IPW and sampling weights. @Joerg, thanks for the clarification! So, on the specific question of why the covariates are not balanced when using the sampling weights: could it be that tebalance analyses balance in the sample at hand, which might not be as good as before because the propensity scores (and hence the IPW weights) are now estimated so as to balance the covariates in the population rather than in the sample at hand? I will try to replicate the tebalance command for the two cases and compare.
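A rough sketch of such a by-hand comparison, reusing Joerg's cattaneo2 example with the fake sweight. The scalar names (mean_t, var_t, mean_c, var_c) are made up for illustration, and whether either set of numbers matches what tebalance summarize reports is an assumption I have not verified:

```stata
* Rebuild the weights from Joerg's example
webuse cattaneo2, clear
set seed 123
gen double sweight = 1/runiform()
logit mbsmoke foreign alcohol mage medu fage fedu [pw=sweight]
predict double ps if e(sample), pr
gen double ipw = 1.mbsmoke/ps + 0.mbsmoke/(1-ps)
gen double w = ipw*sweight

* Weighted standardized differences under two candidate weights:
*   ipw : IPW weights only (balance in the sample at hand)
*   w   : IPW x sampling weights (balance in the represented population)
foreach wt in ipw w {
    display as text _n "Weight: `wt'"
    foreach v of varlist foreign alcohol mage medu fage fedu {
        quietly summarize `v' [aw=`wt'] if mbsmoke==1
        scalar mean_t = r(mean)
        scalar var_t  = r(Var)
        quietly summarize `v' [aw=`wt'] if mbsmoke==0
        scalar mean_c = r(mean)
        scalar var_c  = r(Var)
        * Difference in weighted means, scaled by the pooled weighted SD
        display as text %-10s "`v'" as result %9.4f ///
            (scalar(mean_t)-scalar(mean_c))/sqrt((scalar(var_t)+scalar(var_c))/2)
    }
}
```

If the standardized differences are close to zero under w but not under ipw, that would be consistent with the idea that the pweighted propensity model balances the covariates in the population rather than in the sample.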
Efehi Mena

Join Date: Jun 2019

Posts: 1
#5

31 Mar 2020, 20:41

Hello Julian,
I had the same issue when using teffects ipw on a complex survey with sample weights. Were you able to balance the covariates after using the sample weights?