  • TEFFECTS with a binary outcome

    Hello,
    I am using Stata 13 to implement teffects to look at the relation between a binary treatment (T) and a binary outcome (Y, yes/no) while adjusting for covariates (X). I have used the RA, AIPW, and IPWRA approaches. Here is the code for AIPW:
    teffects aipw (Y X, logit) (T X, logit), aequations ate
    Both the relation between X and Y and the relation between X and T are modeled with logistic regressions, yet the ATE estimate looks like the result of a linear regression: the coefficient is very similar to what you get from a linear regression of Y on X, and the exponential of this coefficient is very different from an odds ratio.
    Does anybody know how to model the relation between X and Y on a logistic scale, i.e., to obtain an odds ratio? Am I missing an option in the syntax?
    Thank you in advance,
    Marianne


  • #2
    I am having the same problem with interpreting the logit coefficient in Stata 13. It gives exactly the same coefficient estimate when I used the default (linear) versus when I specified logit as the outcome model option. Is there a bug in the program?


    • #3
      I'm not finding that problem. I think you are misunderstanding the output. In all cases, the estimate of the ATE is reported. The logit functional form is already taken into account. So the difference in estimated logit functions for T = 0 and T = 1 is averaged across all i (or all treated i). The answers when you use different functional forms -- linear, logit, probit for Y -- are often very similar, especially with decent balance in the covariates. You do not need to do anything to the reported ATE, as the transformation has been done.

      You will see that the coefficient estimates are very different for the linear and logit specifications -- as we expect. But as in many applications, the average partial effect (of which the ATE is a special case) is often very close across very different functional forms.
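
      To make the averaging concrete, here is a minimal sketch (not part of the original reply) that reproduces a logit regression-adjustment ATE by hand. It assumes the cattaneo2 example data that appears later in this thread, with lbweight as the binary outcome and mbsmoke as the binary treatment. The covariates mage and medu and the names ate_teffects, p0, p1, and diff are arbitrary choices for illustration.

      ```stata
      * Assumption: the webuse example data are reachable from your Stata
      webuse cattaneo2, clear

      * Built-in estimator: logit outcome model, binary treatment
      teffects ra (lbweight mage medu, logit) (mbsmoke)
      scalar ate_teffects = _b[ATE:r1vs0.mbsmoke]

      * By hand: one logit per treatment level, predicted for every observation
      quietly logit lbweight mage medu if mbsmoke == 0
      predict double p0, pr
      quietly logit lbweight mage medu if mbsmoke == 1
      predict double p1, pr

      * Unit-level difference in predicted probabilities, then its average
      generate double diff = p1 - p0
      summarize diff
      display "teffects ATE = " ate_teffects "   by hand = " r(mean)
      ```

      The two numbers should agree up to numerical precision. That is the sense in which the reported ATE is already on the probability scale and needs no further transformation.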


      • #4
        I am having the same problem as the first 2 people mentioned.

        When using the simplest model, with nothing apart from a binary outcome variable and a binary treatment variable, both assuming a logit model, the result is identical to running a simple linear regression of the outcome on the treatment. This can't be right? It's the same whether using psmatch, ipw, ipwra, etc., and whether estimating the ATE or the ATET.


        • #5
          Hello Steve and Marianne,

          To illustrate Jeff's point, I provide a simulation. The potential outcomes and treatment are binary and the correct model specification is a probit. I show that we obtain consistent estimates of the ATE for all specifications for the regression adjustment and inverse probability weighted regression adjustments (you may want to try using my data to test other treatment effects estimators). Here is the code I used to generate my data.

          Code:
          . clear
          
          . version 14
          
          . set seed 111
          
          . set obs 10000
          number of observations (_N) was 0, now 10,000
          
          . // Generating unobserved variables
          . generate e0 = rnormal()
          
          . generate e1 = rnormal()
          
          . generate et = rnormal()
          
          . // Generating regressors
          . generate x1 = rbeta(2,3)
          
          . generate x2 = rchi2(1)-1
          
          . generate x3 = rpoisson(1)
          
          . // Generating linear combinations
          . generate xb0 = .5  + .5*x1 - .5*x2
          
          . generate xb1 = -.5 - .5*x1 + .5*x2
          
          . generate xbt = -.5 + .5*x1 - .01*x3
          
          . // Generating potential outcomes (probits)
          . generate y0  = xb0 + e0 > 0
          
          . generate y1  = xb1 + e1 > 0
          
          . //Generate treatment (probit)
          . generate t = xbt + et > 0
          
          . // Generate outcome
          . generate y = t*y1 + (1-t)*y0
          
          . // Generating approximate treatment effect
          . generate te = y1 - y0

          Now I estimate the average treatment effects and compare them to the average of the variable te (which should be close to the true ATE).

          Code:
          . // Estimating ra
          . quietly teffects ra (y x1 x2)(t)
          
          . local linear = _b[ATE:r1vs0.t]
          
          . quietly teffects ra (y x1 x2, probit)(t)
          
          . local probit = _b[ATE:r1vs0.t]
          
          . quietly teffects ra (y x1 x2, logit)(t)
          
          . local logit  = _b[ATE:r1vs0.t]
          
          . display "Estimated linear ATE `linear'"
          Estimated linear ATE -.4759506338941534
          
          . display "Estimated logit  ATE `logit'"
          Estimated logit  ATE -.4755710260371921
          
          . display "Estimated probit ATE `probit'"
          Estimated probit ATE -.4751690891931658
          
          . summarize te
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                    te |     10,000      -.4782    .6845229         -1          1
          
          . // Estimating ipwra
          . quietly teffects ipwra (y x1 x2)(t x1 i.x3)
          
          . local linear = _b[ATE:r1vs0.t]
          
          . quietly teffects ipwra (y x1 x2, probit)(t x1 i.x3)
          
          . local probit = _b[ATE:r1vs0.t]
          
          . quietly teffects ipwra (y x1 x2, logit)(t x1 i.x3)
          
          . local logit  = _b[ATE:r1vs0.t]
          
          . display "Estimated linear ATE `linear'"
          Estimated linear ATE -.4759833454993844
          
          . display "Estimated logit  ATE `logit'"
          Estimated logit  ATE -.4756452793994013
          
          . display "Estimated probit ATE `probit'"
          Estimated probit ATE -.4752342533276195
          
          . summarize te
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                    te |     10,000      -.4782    .6845229         -1          1
          The estimates of the ATE are very close but not identical, as Jeff pointed out.

          WHY?

          The estimate of the conditional expectation from the linear regression, an input for teffects, is a consistent estimator of the true conditional expectation even though the DGP is probit. This is one of the reasons you get very similar results. In the case of the logit, average marginal effects from logit and probit are also very close. Below I illustrate this:

          Code:
          . quietly regress t x1 i.x3
          
          . predict reg
          (option xb assumed; fitted values)
          
          . quietly probit  t x1 i.x3
          
          . predict probit
          (option pr assumed; Pr(t))
          
          . quietly logit   t x1 i.x3
          
          . predict logit
          (option pr assumed; Pr(t))
          
          . summarize reg probit logit
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                   reg |     10,000       .3777    .0400418   .2052064   .4944153
                probit |     10,000    .3777151    .0399751    .210655   .4974573
                 logit |     10,000       .3777     .040054   .2146426   .4985349
          In sum, you should not be surprised by similar results in the ATEs which, depending on the structure of your data, might be almost identical.


          • #6
            I'll add some final comments. In the case Steve is talking about, one will get numerically identical answers -- every time. That's because the regression is saturated. With a binary treatment and nothing else, one estimates the mean for each treatment category and takes the difference. The weighting will have no effect. It is something worth working through by hand.

            Do the following with your example, Steve:

            1. Run OLS of Y on 1, T and obtain the coefficient on T:

            Code:
            reg Y T
            2. Do logit (or probit, or anything else) of Y on 1, i.T:

            Code:
            logit Y i.T
            margins, dydx(T)
            You will see the logit coefficients are very different from part (1). However, the effect given by the margins command will be identical to the OLS coefficient on T. This will always happen.

            Even when you add covariates, and estimate separate equations, the ATE for a linear model and that for a logit (or probit) will often be quite similar.
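
            For anyone who wants to run the saturated case on public data, here is a small sketch along the same lines. It is an illustration under my own assumptions, not output from the thread: it uses the cattaneo2 example data with lbweight as the binary outcome and mbsmoke as the binary treatment, and the scalar names rd_ols, or_logit, and rd_logit are made up for this example.

            ```stata
            * Assumption: the webuse example data are reachable from your Stata
            webuse cattaneo2, clear

            * (1) OLS on the treatment indicator: coefficient = difference in means
            quietly regress lbweight i.mbsmoke
            scalar rd_ols = _b[1.mbsmoke]

            * (2) Saturated logit: the coefficient is a log odds ratio ...
            quietly logit lbweight i.mbsmoke
            scalar or_logit = exp(_b[1.mbsmoke])

            * ... but margins converts it back to a difference in probabilities
            quietly margins, dydx(mbsmoke) post
            scalar rd_logit = _b[1.mbsmoke]

            display "OLS risk difference    = " rd_ols
            display "logit + margins        = " rd_logit
            display "odds ratio from logit  = " or_logit
            ```

            The first two numbers coincide because a saturated logit fits each group's mean exactly. The odds ratio summarizes the same two proportions on a different scale.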

            JW


            • #7
              Many thanks for your help Enrique and Jeff.

              So maybe my problem is I'm not sure what the ATE means for a binary outcome!

              I'll explain what I was hoping to do, which will maybe clarify things.

              So I have some observational data where there were four distinct treatment groups. The outcome is binary, whether or not an event occurred for those patients. Then there are various covariates. The covariates are likely to be unbalanced for the treatment groups, which is why I'd like to use propensity weighting. Now what I was aiming to get out of this was an odds ratio for each of the three treatments in comparison to the 'control' group, where the weighting reduces the bias from the covariates. I still can't quite fathom what the ATE is actually giving me, but I thought that the ATET would give me what I wanted.

              Sorry for my slowness and many thanks for your help!


              • #8
                Dear Steve,

                I believe the ATE gives you exactly what you want: the effect of each treatment on the outcome relative to a fixed (control) level of the treatment. Odds ratios are just a transformation of the parameters of a logit regression that helps you understand effects, but the ATE already gives you an interpretable effect. The ATET is the same kind of effect, averaged only over the treated individuals, for when you are interested in the counterfactual treatment effect for that group. Here is an example from the Stata help:

                Code:
                webuse cattaneo2, clear
                
                . teffects ipw (lbweight) (msmoke mmarried c.mage##c.mage fbaby medu), atet
                
                Iteration 0:   EE criterion =  2.441e-24  
                Iteration 1:   EE criterion =  6.892e-33  
                
                Treatment-effects estimation                    Number of obs     =      4,642
                Estimator      : inverse-probability weights
                Outcome model  : weighted mean
                Treatment model: (multinomial) logit
                -----------------------------------------------------------------------------------------
                                        |               Robust
                               lbweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                ------------------------+----------------------------------------------------------------
                ATET                    |
                                 msmoke |
                (1-5 daily vs 0 daily)  |  -.0102034   .0171729    -0.59   0.552    -.0438617     .023455
                           (6-10 daily  |
                                    vs  |
                              0 daily)  |   .0656168   .0194664     3.37   0.001     .0274633    .1037703
                (11+ daily vs 0 daily)  |   .0288205   .0169358     1.70   0.089    -.0043731    .0620142
                ------------------------+----------------------------------------------------------------
                POmean                  |
                                 msmoke |
                               0 daily  |   .0652034   .0058804    11.09   0.000      .053678    .0767287
                -----------------------------------------------------------------------------------------
                The outcome is a binary indicator for low birth weight. The treatment is the number of cigarettes smoked daily. Notice that each effect is measured relative to not smoking. Smoking 1-5 cigarettes has no statistically significant effect on the probability of low birth weight; smoking 6-10 increases it by .065 percentage points; smoking 11 or more increases it by .03 (significant only at the 10% level). The probability of low birth weight if you do not smoke is .065. The change in probability is additive: the probability of low birth weight if you smoke 6-10 cigarettes daily is .0652 + .0656 = .1308.
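
                The additive step can also be done with lincom, which reports a standard error for the sum. This is only a sketch: the coefficient names POmean:0.msmoke and ATET:r2vs0.msmoke are my assumption about how teffects labels this model, so check them against the coeflegend output before relying on them.

                ```stata
                * Assumption: the webuse example data are reachable from your Stata
                webuse cattaneo2, clear

                * Same model as above; coeflegend shows the _b[] name of each coefficient
                teffects ipw (lbweight) (msmoke mmarried c.mage##c.mage fbaby medu), atet coeflegend

                * Control potential-outcome mean plus the (6-10 daily vs 0 daily) effect
                lincom _b[POmean:0.msmoke] + _b[ATET:r2vs0.msmoke]
                ```

                The result is the estimated probability of low birth weight under 6-10 cigarettes daily, for the same treated subpopulation that the ATET refers to.
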
                Last edited by Enrique Pinzon (StataCorp); 26 May 2015, 08:48.


                • #9
                  Note:

                  I did not mean percentage points. The estimates are on the probability scale, so .06 is 6 percent, that is, a change of 6 percentage points rather than .06 percentage points.


                  • #10
                    Many thanks Enrique,

                    I think this is making sense now. Thanks for your patience.

                    With the following example I've tried to understand what teffects is doing once the propensity scores are estimated:

                    teffects ipw (outcome) (treat bmi age female), ate

                    quietly{
                    predict pscore, ps
                    predict p0 p1 p2 p3, ps

                    gen w = 1/p0 if treat == 0
                    replace w = 1/p1 if treat == 1
                    replace w = 1/p2 if treat == 2
                    replace w = 1/p3 if treat == 3

                    logit outcome i.treat [pw=w]
                    }
                    margins , dydx(treat)
                    regress outcome i.treat [pw=w]


                    This gives the following output:


                    . teffects ipw (outcome) (treat bmi age female), ate

                    Iteration 0:   EE criterion =  9.191e-19
                    Iteration 1:   EE criterion =  1.970e-34

                    Treatment-effects estimation                    Number of obs      =      2060
                    Estimator      : inverse-probability weights
                    Outcome model  : weighted mean
                    Treatment model: (multinomial) logit
                    ------------------------------------------------------------------------------
                                 |               Robust
                         outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                    ATE          |
                           treat |
                       (1 vs 0)  |   .0279649     .01305     2.14   0.032     .0023874    .0535423
                       (2 vs 0)  |  -.0132727   .0090607    -1.46   0.143    -.0310313     .004486
                       (3 vs 0)  |  -.0031805   .0101336    -0.31   0.754     -.023042    .0166809
                    -------------+----------------------------------------------------------------
                    POmean       |
                           treat |
                              0  |   .0302919    .007016     4.32   0.000     .0165407    .0440431
                    ------------------------------------------------------------------------------

                    .
                    . quietly{

                    . margins , dydx(treat)

                    Conditional marginal effects                      Number of obs   =       2060
                    Model VCE    : Robust

                    Expression   : Pr(outcome), predict()
                    dy/dx w.r.t. : 1.treat 2.treat 3.treat

                    ------------------------------------------------------------------------------
                                 |            Delta-method
                                 |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                           treat |
                              1  |   .0279649   .0131849     2.12   0.034     .0021229    .0538068
                              2  |  -.0132727   .0091254    -1.45   0.146    -.0311581    .0046128
                              3  |  -.0031805   .0102601    -0.31   0.757    -.0232899    .0169288
                    ------------------------------------------------------------------------------
                    Note: dy/dx for factor levels is the discrete change from the base level.

                    . regress outcome i.treat [pw=w]
                    (sum of wgt is 8.2421e+03)

                    Linear regression                                      Number of obs =    2060
                                                                           F( 3, 2056)   =    3.70
                                                                           Prob > F      =  0.0113
                                                                           R-squared     =  0.0073
                                                                           Root MSE      =  .17855

                    ------------------------------------------------------------------------------
                                 |               Robust
                         outcome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                           treat |
                              1  |   .0279649   .0131945     2.12   0.034     .0020889    .0538408
                              2  |  -.0132727    .009132    -1.45   0.146    -.0311817    .0046364
                              3  |  -.0031805   .0102675    -0.31   0.757    -.0233164    .0169553
                                 |
                           _cons |   .0302919   .0070895     4.27   0.000     .0163885    .0441953
                    ------------------------------------------------------------------------------



                    Am I understanding correctly what is going on? That is, the outcome is being regressed on the treatment groups, weighted by the inverse of the propensity score?

                    I also modelled the same using logistic regression. The results are very similar as you said they would be. So is there any reason why logistic regression couldn't be used instead, and odds ratios presented?

                    Do you know why my standard errors are slightly different from those from teffects? This differential increases as more covariates are included.

                    Many thanks again for all of your help.
                    Steve


                    • #11
                      Dear Steve,

                      I think you have a great understanding of what is going on. Also, you pose two questions that are at the crux of teffects:

                      1. Why not use "equivalent" estimators?
                      2. Why are the standard errors different?

                      teffects uses gmm to compute the final results. Like any optimization routine, it takes starting values and then iterates to the solution. If you look at the iteration log you will find something like:

                      Code:
                      . webuse cattaneo2, clear
                      (Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154)
                      
                      . teffects ra (bweight medu i.alcohol)(mbsmoke)
                      
                      Iteration 0:   EE criterion =  2.971e-25  
                      Iteration 1:   EE criterion =  5.193e-26  
                      
                      Treatment-effects estimation                    Number of obs     =      4,642
                      Estimator      : regression adjustment
                      Outcome model  : linear
                      Treatment model: none
                      ------------------------------------------------------------------------------
                                   |               Robust
                           bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                      ATE          |
                           mbsmoke |
                          (smoker  |
                               vs  |
                       nonsmoker)  |  -251.7277   23.58388   -10.67   0.000    -297.9512   -205.5041
                      -------------+----------------------------------------------------------------
                      POmean       |
                           mbsmoke |
                        nonsmoker  |   3408.127   9.418437   361.86   0.000     3389.667    3426.587
                      ------------------------------------------------------------------------------
                      It only iterates once. Why? teffects starts the GMM computation at the final "correct" point estimates. GMM is only used to compute standard errors. So you could get the same point estimates using different estimators (in my example: a regression for each treatment group, and the average of the difference between their predicted values. More on this later.). But even though your point estimates are correct, your standard errors are wrong. You do not take into account the randomness at each step of the estimation process. In your example you are not accounting for the randomness coming from the fact that the weights were estimated.

                      I will illustrate the point using my example:

                      Code:
                      . quietly regress bweight medu i.alcohol if mbsmoke==0
                      
                      . predict xb0
                      (option xb assumed; fitted values)
                      
                      . quietly regress bweight medu i.alcohol if mbsmoke==1
                      
                      . predict xb1
                      (option xb assumed; fitted values)
                      
                      . generate te = xb1 - xb0
                      
                      . summarize te
                      
                          Variable |        Obs        Mean    Std. Dev.       Min        Max
                      -------------+---------------------------------------------------------
                                te |      4,642   -251.7277     26.9427   -388.052  -146.8342
                      You get the same point estimate using summarize but the standard errors are not incorporating the randomness of the two regressions. GMM allows us to simultaneously estimate all the steps necessary to obtain treatment effects and incorporate the randomness between equations ("steps").
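
                      On the odds-ratio question, one option that keeps the teffects standard errors is to transform the estimated potential-outcome means with nlcom. The following is a hedged sketch, not an official recipe: it assumes the cattaneo2 data with lbweight as a binary outcome and mbsmoke as a binary treatment, and the macro names p0 and p1 and the label odds_ratio are hypothetical. The result is a marginal (population-averaged) odds ratio, which generally differs from the conditional odds ratio of a covariate-adjusted logit.

                      ```stata
                      * Assumption: the webuse example data are reachable from your Stata
                      webuse cattaneo2, clear

                      * IPW estimate of the ATE and of the control potential-outcome mean
                      teffects ipw (lbweight) (mbsmoke mmarried c.mage##c.mage fbaby medu)

                      * Potential-outcome means written in terms of the stored coefficients
                      local p0 _b[POmean:0.mbsmoke]
                      local p1 (_b[POmean:0.mbsmoke] + _b[ATE:r1vs0.mbsmoke])

                      * Odds under treatment divided by odds under control (delta method)
                      nlcom (odds_ratio: (`p1' / (1 - `p1')) / (`p0' / (1 - `p0')))
                      ```

                      Because nlcom works from the teffects variance matrix, the delta-method standard error reflects the fact that the weights were estimated, which a separate weighted logit would ignore.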


                      • #12
                        Hi,

                        Thanks for such a detailed explanation and discussion. This helped me understand teffects with a binary treatment. But what about treatment variables with multiple levels (i.e., when using mlogit)? How does it work? Any insights would be really helpful!

                        Thanks,
                        Joie
                        Last edited by Joie_ray; 25 Aug 2015, 01:10. Reason: further clarification on questions


                        • #13
                          Originally posted by Jeff Wooldridge
                          I'm not finding that problem. I think you are misunderstanding the output. In all cases, the estimate of the ATE is reported. The logit functional form is already taken into account. So the difference in estimated logit functions for T = 0 and T = 1 is averaged across all i (or all treated i). The answers when you use different functional forms -- linear, logit, probit for Y -- are often very similar, especially with decent balance in the covariates. You do not need to do anything to the reported ATE, as the transformation has been done.

                          You will see that the coefficient estimates are very different for the linear and logit specifications -- as we expect. But as in many applications, the average partial effect (of which the ATE is a special case) is often very close across very different functional forms.

                          Hello Jeff,

                          I'm also looking to get a better understanding of the main output (coefficient) when using teffects psmatch with a binary outcome (as well as a binary exposure). Would what you describe above be the equivalent of a "risk difference" (the well-known measure of effect in epidemiology)?

                          Thank you for the insights!


                          • #14
                            Originally posted by Francis Clark

                            Hello Jeff,

                            I'm also looking to get a better understanding of the main output (coefficient) when using teffects psmatch with a binary outcome (as well as a binary exposure). Would what you describe above be the equivalent of a "risk difference" (the well-known measure of effect in epidemiology)?

                            Thank you for the insights!

                            Yes, I believe it is the "risk difference" averaged across all units. It's the difference in the estimated probabilities under treatment and control for each unit averaged across all units.


                            • #15
                              Thank you Jeff! Very clear.

                              Also, your "Treatment Effects" chapter (21) in your book "Econometric Analysis of Cross Section and Panel Data" (2nd edition) was very informative as well. A great resource.
