
  • Treatment Effects Inverse Probability Weighting with Sample Weights

    Hello everyone,

    I have a question regarding the implementation of the "teffects ipw" command in combination with sampling weights (pweights). Unfortunately, my data are confidential, so I can't post the actual data and results from Stata. I also tried to recreate the problem with made-up data but didn't succeed, so I will try to make the problem clear with an appropriate example. I use Stata 14 and the built-in teffects ipw command.

    The data structure would look something like this:
    id   wage   t   sex   age   sample_weight
    1    100    1   0     23    100
    2    40     0   1     38    500
    3    60     0   0     45    250
    4    80     1   1     25    180
    Here id is the individual identifier, wage is the person's wage, t is a dummy indicating treatment (on-the-job training), sex is a dummy for gender, age is the age, and sample_weight is the probability (sampling) weight, i.e. the inverse of the individual's probability of being sampled.

    I want to estimate the average treatment effect of the treatment t, controlling for different distributions of the covariates in treatment and control group. I use inverse probability weighting for this, so my code would look something like this:

    Code:
    teffects ipw (wage) (t sex age), ate
    tebalance summarize

    If I run it like this, the IPW works fine and the covariates are very balanced between treatment and control group after reweighting through IPW.

    However, older people and men (sex==1) have a lower probability of being sampled, so I should use the probability weights stored in "sample_weight". pweights are allowed with teffects, so I tried simply adding them to my treatment-effect estimation, i.e. I used:

    Code:
    teffects ipw (wage) (t sex age) [pweight=sample_weight], ate
    tebalance summarize
    However, IPW no longer works as well: the covariates are not balanced after reweighting, i.e. tebalance summarize still shows substantial differences in the covariate means between the treatment and control groups.

    My question would be if anyone knows why this is and if there is a way to fix this?

    I tried recreating the teffects ipw command by hand, which is not that complicated, at least if one is only interested in the treatment effect and not its standard error. The "hand-built" IPW would look like this:

    Code:
    *Create IPW weights: 1/ps for the treated, 1/(1-ps) for the controls
    . logit t sex age
    . predict double ps, pr
    . gen double ipw_weight=1/ps if t==1
    . replace ipw_weight=1/(1-ps) if t==0
    
    *Normalize weights so that they sum to the group size within each treatment arm
    *(the coefficient on t below does not depend on this rescaling)
    . bysort t: egen double ipw_sum=total(ipw_weight)
    . bysort t: gen double ipw_weight_normalized=ipw_weight*_N/ipw_sum
    
    *Determine Treatment Effect
    . reg wage t [pweight=ipw_weight_normalized]
    The coefficient on t is then the treatment effect estimated by teffects ipw. However, I was only able to recreate it for the unweighted case. I don't know how the teffects ipw command handles the additional pweights and couldn't find any documentation on it, e.g. whether it uses them in the regression for the ipw_weights and in the outcome regression, or only in one of those. If anyone has an idea how to fix this, I would be very thankful. Thanks in advance. Sorry for the long post; if anything remained unclear, I am happy to clarify.
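    For the unweighted case, the by-hand approach described above can be checked against teffects ipw on a public dataset. The following is only a minimal sketch: it assumes Stata's cattaneo2 example data (not the confidential data), picks mage and medu as illustrative covariates, and uses the variable name ipw and the scalar name ate_te purely for illustration.

    ```stata
    * Public example data (an assumption for illustration only)
    webuse cattaneo2, clear

    * teffects ipw without sampling weights; store the ATE point estimate
    teffects ipw (bweight) (mbsmoke mage medu), ate
    scalar ate_te = _b[ATE:r1vs0.mbsmoke]

    * By hand: logit propensity score, then inverse-probability weights
    logit mbsmoke mage medu
    predict double ps, pr
    generate double ipw = cond(mbsmoke==1, 1/ps, 1/(1-ps))

    * Weighted difference in means of the outcome
    regress bweight i.mbsmoke [pw=ipw]

    * Difference between the two point estimates (should be close to zero)
    display "difference: " scalar(ate_te) - _b[1.mbsmoke]
    ```

    Only the point estimates are expected to agree; the standard errors from regress ignore the fact that the propensity score was estimated.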

    Julian

  • #2
    I've never used teffects, but this reference might be relevant to your question.

    DuGoff, E. H., Schuler, M., & Stuart, E. A. (2014). Generalizing observational study results: Applying propensity score methods to complex surveys. Health Services Research, 49(1), 284-303.
    Last edited by Steve Samuels; 31 Jul 2017, 07:25.
    Steve Samuels
    Statistical Consulting

    Stata 14.2



    • #3
      Hi Julian,

      teffects ipw uses sampling weights for the propensity score model, and then the weight for computing the means of the outcome is essentially the product of the sampling weights and the inverse-probability weights. Here is an example where we replicate the point estimates from teffects ipw with sampling weights:

      Code:
      . * Example data:
      . webuse cattaneo2
      (Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154)
      . 
      . * Fake weights:
      . set seed 123
      . gen double sweight = 1/runiform()
      . 
      . * -teffects ipw-:
      . teffects ipw (bweight) ///
      >              (mbsmoke foreign alcohol mage medu fage fedu) [pw=sweight]
      
      Iteration 0:   EE criterion =  1.358e-24  
      Iteration 1:   EE criterion =  2.188e-26  
      
      Treatment-effects estimation                    Number of obs     =      4,642
      Estimator      : inverse-probability weights
      Outcome model  : weighted mean
      Treatment model: logit
      ------------------------------------------------------------------------------
                   |               Robust
           bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
      ATE          |
           mbsmoke |
          (smoker  |
               vs  |
       nonsmoker)  |  -298.2673   78.86414    -3.78   0.000    -452.8382   -143.6964
      -------------+----------------------------------------------------------------
      POmean       |
           mbsmoke |
        nonsmoker  |    3494.05   73.47651    47.55   0.000     3350.038    3638.061
      ------------------------------------------------------------------------------
      . 
      . * Inverse-probability weights:
      . logit mbsmoke foreign alcohol mage medu fage fedu [pw=sweight]
      
      Iteration 0:   log pseudolikelihood = -14443.681  
      Iteration 1:   log pseudolikelihood = -13636.387  
      Iteration 2:   log pseudolikelihood = -13316.782  
      Iteration 3:   log pseudolikelihood = -13314.851  
      Iteration 4:   log pseudolikelihood =  -13314.85  
      
      Logistic regression                             Number of obs     =      4,642
                                                      Wald chi2(6)      =      60.87
                                                      Prob > chi2       =     0.0000
      Log pseudolikelihood =  -13314.85               Pseudo R2         =     0.0782
      
      ------------------------------------------------------------------------------
                   |               Robust
           mbsmoke |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           foreign |  -.9151093   .4612633    -1.98   0.047    -1.819169   -.0110498
           alcohol |   1.803091   .4144556     4.35   0.000     .9907725    2.615409
              mage |  -.0191375   .0270356    -0.71   0.479    -.0721263    .0338513
              medu |  -.1237135   .0780042    -1.59   0.113    -.2765989     .029172
              fage |  -.0459804   .0269683    -1.70   0.088    -.0988373    .0068765
              fedu |  -.0303248   .0620912    -0.49   0.625    -.1520212    .0913717
             _cons |   1.425061   .8745558     1.63   0.103    -.2890368    3.139159
      ------------------------------------------------------------------------------
      
      . predict double ps if e(sample)
      (option pr assumed; Pr(mbsmoke))
      
      . gen double ipw = 1.mbsmoke/ps + 0.mbsmoke/(1-ps)
      . 
      . * Weight for weighted -bweight- means:
      . gen double w = ipw*sweight
      . 
      . * ATE and POmean:
      . regress bweight i.mbsmoke [pw=w]
      (sum of wgt is   8.8130e+04)
      
      Linear regression                               Number of obs     =      4,642
                                                      F(1, 4640)        =      13.48
                                                      Prob > F          =     0.0002
                                                      R-squared         =     0.0778
                                                      Root MSE          =     513.29
      
      ------------------------------------------------------------------------------
                   |               Robust
           bweight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           mbsmoke |
           smoker  |  -298.2673   81.22871    -3.67   0.000    -457.5142   -139.0204
             _cons |    3494.05   73.95511    47.25   0.000     3349.062    3639.037
      ------------------------------------------------------------------------------
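      In formula form, the regress step above computes normalized weighted means of the outcome. This is a sketch of that step only, with notation introduced here for illustration: s_i is the sampling weight (sweight), T_i the treatment indicator (mbsmoke), Y_i the outcome (bweight), and \hat{p}_i the fitted propensity score (ps) from the sampling-weighted logit.

      ```latex
      \hat{\mu}_1 = \frac{\sum_i s_i \, T_i \, Y_i / \hat{p}_i}{\sum_i s_i \, T_i / \hat{p}_i},
      \qquad
      \hat{\mu}_0 = \frac{\sum_i s_i \, (1 - T_i) \, Y_i / (1 - \hat{p}_i)}{\sum_i s_i \, (1 - T_i) / (1 - \hat{p}_i)},
      \qquad
      \widehat{\mathrm{ATE}} = \hat{\mu}_1 - \hat{\mu}_0
      ```

      Here \hat{\mu}_0 corresponds to the _cons row (POmean) and \widehat{ATE} to the smoker row of the regression table.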
      I hope this helps,

      Joerg



      • #4
        Thank you for the reference, Steve; it's very helpful as a guideline when thinking about IPW and sampling weights.

        @Joerg, thanks for the clarification! On the specific question of why the covariates are not balanced when using the sampling weights: could it be that tebalance assesses balance in the sample at hand, which might not be as good as before because the IPW weights (propensity scores) are now estimated to balance the covariates in the population rather than in the sample? I will try to replicate the tebalance command for the two cases and compare.
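        One way to run that comparison by hand is sketched below. It assumes the cattaneo2 setup with the fake weights sweight from #3, and it uses a simple standardized difference (difference in weighted means divided by the square root of the average weighted variance), which may not match what tebalance summarize reports digit for digit. The weight "ipw" looks at balance in the sample at hand, while "w" (ipw times the sampling weight) looks at balance in the weighted population.

        ```stata
        * Setup as in #3 (public data, fake sampling weights)
        webuse cattaneo2, clear
        set seed 123
        generate double sweight = 1/runiform()

        * Propensity score from the sampling-weighted logit
        logit mbsmoke foreign alcohol mage medu fage fedu [pw=sweight]
        predict double ps, pr
        generate double ipw = cond(mbsmoke==1, 1/ps, 1/(1-ps))
        generate double w = ipw*sweight

        * Weighted standardized differences under each weighting
        foreach wt in ipw w {
            foreach x in mage medu fage fedu {
                quietly summarize `x' [aw=`wt'] if mbsmoke==1
                scalar m1 = r(mean)
                scalar v1 = r(Var)
                quietly summarize `x' [aw=`wt'] if mbsmoke==0
                scalar m0 = r(mean)
                scalar v0 = r(Var)
                display "weight `wt', `x': std. diff = " ///
                    %7.4f (scalar(m1) - scalar(m0))/sqrt((scalar(v1) + scalar(v0))/2)
            }
        }
        ```

        If the hypothesis is right, the standardized differences should tend to be smaller under w than under ipw alone.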



        • #5
          Hello Julian,
          I had the same issue when using teffects ipw on a complex survey with sampling weights. Were you able to balance the covariates after using the sampling weights?
