TEFFECTS with a binary outcome

Marianne Canonico

Join Date: Jan 2015

Posts: 1
#1

16 Jan 2015, 04:23

Hello,
I am using Stata 13 and teffects to look at the relation between a treatment (T, binary) and a binary outcome (Y, yes/no) while adjusting for covariates (X). I have used the RA, AIPW and IPWRA approaches. Here is the code for AIPW:
teffects aipw (Y X, logit) (T X, logit), aequations ate
Although the relation between X and Y and the relation between X and T are both modeled with logistic regressions, the ATE estimate seems to be the result of a linear regression; i.e., the coefficient is very similar to what you get from a linear regression of Y on X, and if I take the exponential of this coefficient, the result is very different from an odds ratio.
Does anybody know how to model the relation between X and Y on a logistic scale, i.e., to obtain an odds ratio? Am I missing an option in the syntax?
Thank you in advance,
Marianne

Nicola Man

Join Date: Apr 2015

Posts: 1
#2

27 Apr 2015, 00:24

I am having the same problem with interpreting the logit coefficient in Stata 13. It gives exactly the same coefficient estimate when I used the default (linear) versus when I specified logit as the outcome model option. Is there a bug in the program?
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2157
#3

27 Apr 2015, 05:23

I'm not finding that problem. I think you are misunderstanding the output. In all cases, the estimate of the ATE is reported. The logit functional form is already taken into account. So the difference in estimated logit functions for T = 0 and T = 1 is averaged across all i (or all treated i). The answers when you use different functional forms -- linear, logit, probit for Y -- are often very similar, especially with decent balance in the covariates. You do not need to do anything to the reported ATE, as the transformation has been done.

You will see that the coefficient estimates are very different for the linear and logit specifications -- as we expect. But as in many applications, the average partial effect (of which the ATE is a special case) is often very close across very different functional forms.
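
To make the averaging concrete, here is a minimal sketch of the regression-adjustment calculation done by hand. It assumes the cattaneo2 dataset that appears later in this thread, with lbweight as the binary outcome, mbsmoke as the binary treatment, and mage and medu as covariates picked purely for illustration. The variable names p0, p1, and diff are made up for this example.

```stata
* Sketch: reproduce the logit-based RA point estimate by hand.
webuse cattaneo2, clear

* Fit a separate logit of the outcome on the covariates in each treatment group
quietly logit lbweight mage medu if mbsmoke == 0
predict double p0, pr          // Pr(lbweight=1) under control, for all observations
quietly logit lbweight mage medu if mbsmoke == 1
predict double p1, pr          // Pr(lbweight=1) under treatment, for all observations

* The ATE is the average difference in predicted probabilities
generate double diff = p1 - p0
summarize diff

* Compare the mean of diff with the ATE reported by teffects
teffects ra (lbweight mage medu, logit) (mbsmoke)
```

The mean of diff should match the reported ATE. Because that number is already a difference in probabilities, exponentiating it does not produce an odds ratio.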
Steve g

Join Date: May 2015

Posts: 3
#4

21 May 2015, 07:48

I am having the same problem as the first 2 people mentioned.

When using the simplest model, with nothing apart from a binary outcome variable and a binary treatment variable, both assuming a logit model, the result is identical to running a simple linear regression of the outcome on the treatment. This can't be right? It's the same whether using psmatch, ipw, ipwra, etc., as well as ATE or ATET....

Enrique Pinzon (StataCorp)

StataCorp Employee

Join Date: Jan 2015
Posts: 217

21 May 2015, 15:38

Hello Steve and Marianne,

To illustrate Jeff's point, I provide a simulation. The potential outcomes and treatment are binary and the correct model specification is a probit. I show that we obtain consistent estimates of the ATE for all specifications for the regression adjustment and inverse probability weighted regression adjustments (you may want to try using my data to test other treatment effects estimators). Here is the code I used to generate my data.

Code:

. clear

. version 14

. set seed 111

. set obs 10000
number of observations (_N) was 0, now 10,000

. // Generating unobserved variables
. generate e0 = rnormal()

. generate e1 = rnormal()

. generate et = rnormal()

. // Generating regressors
. generate x1 = rbeta(2,3)

. generate x2 = rchi2(1)-1

. generate x3 = rpoisson(1)

. // Generating linear combinations
. generate xb0 = .5  + .5*x1 - .5*x2

. generate xb1 = -.5 - .5*x1 + .5*x2

. generate xbt = -.5 + .5*x1 - .01*x3

. // Generating potential outcomes (probits)
. generate y0  = xb0 + e0 > 0

. generate y1  = xb1 + e1 > 0

. //Generate treatment (probit)
. generate t = xbt + et > 0

. // Generate outcome
. generate y = t*y1 + (1-t)*y0

. // Generating approximate treatment effect
. generate te = y1 - y0

Now I estimate the average treatment effects and compare them to the average of the variable te (which should be close to the true ATE).

Code:

. // Estimating ra
. quietly teffects ra (y x1 x2)(t)

. local linear = _b[ATE:r1vs0.t]

. quietly teffects ra (y x1 x2, probit)(t)

. local probit = _b[ATE:r1vs0.t]

. quietly teffects ra (y x1 x2, logit)(t)

. local logit  = _b[ATE:r1vs0.t]

. display "Estimated linear ATE `linear'"
Estimated linear ATE -.4759506338941534

. display "Estimated logit  ATE `logit'"
Estimated logit  ATE -.4755710260371921

. display "Estimated probit ATE `probit'"
Estimated probit ATE -.4751690891931658

. summarize te

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          te |     10,000      -.4782    .6845229         -1          1

. // Estimating ipwra
. quietly teffects ipwra (y x1 x2)(t x1 i.x3)

. local linear = _b[ATE:r1vs0.t]

. quietly teffects ipwra (y x1 x2, probit)(t x1 i.x3)

. local probit = _b[ATE:r1vs0.t]

. quietly teffects ipwra (y x1 x2, logit)(t x1 i.x3)

. local logit  = _b[ATE:r1vs0.t]

. display "Estimated linear ATE `linear'"
Estimated linear ATE -.4759833454993844

. display "Estimated logit  ATE `logit'"
Estimated logit  ATE -.4756452793994013

. display "Estimated probit ATE `probit'"
Estimated probit ATE -.4752342533276195

. summarize te

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          te |     10,000      -.4782    .6845229         -1          1

The estimates of the ATE are very close but not the same, as pointed out by Jeff.

WHY?

The estimate of the conditional expectation from the linear regression, an input for teffects, is a consistent estimator of the true conditional expectation even though the DGP is probit. This is one of the reasons you get very similar results. In the case of the logit, average marginal effects from logit and probit are also very close. Below I illustrate this:

Code:

. quietly regress t x1 i.x3

. predict reg
(option xb assumed; fitted values)

. quietly probit  t x1 i.x3

. predict probit
(option pr assumed; Pr(t))

. quietly logit   t x1 i.x3

. predict logit
(option pr assumed; Pr(t))

. summarize reg probit logit

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         reg |     10,000       .3777    .0400418   .2052064   .4944153
      probit |     10,000    .3777151    .0399751    .210655   .4974573
       logit |     10,000       .3777     .040054   .2146426   .4985349

In sum, you should not be surprised by similar results in the ATEs which, depending on the structure of your data, might be almost identical.


Jeff Wooldridge

Join Date: Apr 2014

Posts: 2157
#6

21 May 2015, 16:51

I'll add some final comments. In the case Steve is talking about, one will get numerically identical answers -- every time. That's because the regression is saturated. With a binary treatment and nothing else, one estimates the mean for each treatment category and takes the difference. The weighting will have no effect. It is something worth working through by hand.

Do the following with your example, Steve:

1. Run OLS of Y on 1, T and obtain the coefficient on T:

Code:

reg Y T

2. Do logit (or probit, or anything else) of Y on 1, i.T:

Code:

logit Y i.T
margins, dydx(T)

You will see the logit coefficients are very different from part (1). However, the effect given by the margins command will be identical to the OLS coefficient on T. This will always happen.

Even when you add covariates, and estimate separate equations, the ATE for a linear model and that for a logit (or probit) will often be quite similar.
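
The two steps above can be tried on a public dataset. The following is a minimal sketch, assuming the cattaneo2 dataset used elsewhere in this thread, with lbweight as the binary outcome and mbsmoke as the binary treatment. The scalar names b_ols, b_margins, and or_logit are introduced here only for illustration.

```stata
* Sketch: with a binary treatment and no covariates the model is saturated,
* so OLS and the logit marginal effect both equal the difference in means.
webuse cattaneo2, clear

regress lbweight i.mbsmoke
scalar b_ols = _b[1.mbsmoke]

logit lbweight i.mbsmoke
scalar or_logit = exp(_b[1.mbsmoke])   // odds ratio, which is on a different scale
margins, dydx(mbsmoke) post            // post stores the marginal effect in e(b)
scalar b_margins = _b[1.mbsmoke]

display "OLS coefficient:       " b_ols
display "logit marginal effect: " b_margins
display "logit odds ratio:      " or_logit
```

The first two numbers should agree, while the odds ratio answers a different question and is not comparable to either.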

JW
Steve g

Join Date: May 2015

Posts: 3
#7

22 May 2015, 03:43

Many thanks for your help Enrique and Jeff.

So maybe my problem is I'm not sure what the ATE means for a binary outcome!

I'll explain what I was hoping to do, which will maybe clarify things.

So I have some observational data where there were four distinct treatment groups. The outcome is binary, whether or not an event occurred for those patients. Then there are various covariates. The covariates are likely to be unbalanced for the treatment groups, which is why I'd like to use propensity weighting. Now what I was aiming to get out of this was an odds ratio for each of the three treatments in comparison to the 'control' group, where the weighting reduces the bias from the covariates. I still can't quite fathom what the ATE is actually giving me, but I thought that the ATET would give me what I wanted.

Sorry for my slowness and many thanks for your help!

Enrique Pinzon (StataCorp)

StataCorp Employee

Join Date: Jan 2015
Posts: 217

26 May 2015, 08:39

Dear Steve,

I believe the ATE gives you exactly what you want: the effect of the treatment on the outcome relative to a fixed level of the treatment. Odds ratios are just a transformation of the parameters of a logit regression that helps you understand effects, but the ATE already gives you an interpretable effect. The ATET is the same quantity computed only for the individuals who actually received the treatment. Here is an example from the Stata help:

Code:

. webuse cattaneo2, clear

. teffects ipw (lbweight) (msmoke mmarried c.mage##c.mage fbaby medu), atet

Iteration 0:   EE criterion =  2.441e-24  
Iteration 1:   EE criterion =  6.892e-33  

Treatment-effects estimation                    Number of obs     =      4,642
Estimator      : inverse-probability weights
Outcome model  : weighted mean
Treatment model: (multinomial) logit
-----------------------------------------------------------------------------------------
                        |               Robust
               lbweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------+----------------------------------------------------------------
ATET                    |
                 msmoke |
(1-5 daily vs 0 daily)  |  -.0102034   .0171729    -0.59   0.552    -.0438617     .023455
           (6-10 daily  |
                    vs  |
              0 daily)  |   .0656168   .0194664     3.37   0.001     .0274633    .1037703
(11+ daily vs 0 daily)  |   .0288205   .0169358     1.70   0.089    -.0043731    .0620142
------------------------+----------------------------------------------------------------
POmean                  |
                 msmoke |
               0 daily  |   .0652034   .0058804    11.09   0.000      .053678    .0767287
-----------------------------------------------------------------------------------------

The outcome is a binary variable for low birth weight. The treatments are the number of cigarettes smoked daily. Notice that you are measuring the effect relative to no smoking. So if you smoke 1-5 cigarettes daily, the probability of having low birth weight does not increase; if you smoke 6-10, it increases by .065 percentage points; if you smoke 11 or more, the probability of low birth weight increases by .03 (only significant at the 10% level). The probability of low birth weight if you do not smoke is .065. The change in probability is additive; that is, the probability of low birth weight if you smoke 6-10 cigarettes daily is (.0652 + .0656).
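
For readers who still want an odds ratio, one possible route (not something teffects reports itself) is to convert two potential-outcome means into odds and take their ratio. The sketch below does this with a fully interacted logit and margins rather than teffects. It assumes the binary treatment mbsmoke from the same dataset and the covariates mage and medu, chosen only for illustration. The default margins standard errors treat the covariates as fixed, so the resulting interval should be read as approximate.

```stata
* Sketch: marginal odds ratio built from two potential-outcome means.
webuse cattaneo2, clear

* Fully interacted logit: same fitted probabilities as a separate logit per group
quietly logit lbweight i.mbsmoke##c.(mage medu)

* Average predicted probability with everyone set to each treatment level
margins mbsmoke, post

* Marginal odds ratio (smoker vs nonsmoker) with a delta-method standard error
nlcom (or: (_b[1.mbsmoke] / (1 - _b[1.mbsmoke])) / (_b[0.mbsmoke] / (1 - _b[0.mbsmoke])))
```

The difference between the two margins is the risk difference, so the same fitted model yields both scales.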


Enrique Pinzon (StataCorp)

StataCorp Employee

Join Date: Jan 2015

Posts: 217
#9

26 May 2015, 12:32

Note:

I did not mean percentage points. It is percent which is the scale. Thus, .06 is 6 percent.
Steve g

Join Date: May 2015

Posts: 3
#10

29 May 2015, 04:53

Many thanks Enrique,

I think this is making sense now. Thanks for your patience.

With the following example I've tried to understand what teffects is doing once the propensity scores are estimated:

teffects ipw (outcome) (treat bmi age female), ate

quietly{
predict pscore, ps
predict p0 p1 p2 p3, ps

gen w = 1/p0 if treat == 0
replace w = 1/p1 if treat == 1
replace w = 1/p2 if treat == 2
replace w = 1/p3 if treat == 3

logit outcome i.treat [pw=w]
}
margins , dydx(treat)
regress outcome i.treat [pw=w]

This gives the following output:

. teffects ipw (outcome) (treat bmi age female), ate

Iteration 0:   EE criterion =  9.191e-19
Iteration 1:   EE criterion =  1.970e-34

Treatment-effects estimation                    Number of obs     =       2060
Estimator      : inverse-probability weights
Outcome model  : weighted mean
Treatment model: (multinomial) logit
------------------------------------------------------------------------------
             |               Robust
     outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
       treat |
   (1 vs 0)  |   .0279649     .01305     2.14   0.032     .0023874    .0535423
   (2 vs 0)  |  -.0132727   .0090607    -1.46   0.143    -.0310313     .004486
   (3 vs 0)  |  -.0031805   .0101336    -0.31   0.754     -.023042    .0166809
-------------+----------------------------------------------------------------
POmean       |
       treat |
          0  |   .0302919    .007016     4.32   0.000     .0165407    .0440431
------------------------------------------------------------------------------

.
. quietly{

. margins , dydx(treat)

Conditional marginal effects                      Number of obs   =       2060
Model VCE    : Robust

Expression   : Pr(outcome), predict()
dy/dx w.r.t. : 1.treat 2.treat 3.treat

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |
          1  |   .0279649   .0131849     2.12   0.034     .0021229    .0538068
          2  |  -.0132727   .0091254    -1.45   0.146    -.0311581    .0046128
          3  |  -.0031805   .0102601    -0.31   0.757    -.0232899    .0169288
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. regress outcome i.treat [pw=w]
(sum of wgt is 8.2421e+03)

Linear regression                                      Number of obs =    2060
                                                       F( 3, 2056)   =    3.70
                                                       Prob > F      =  0.0113
                                                       R-squared     =  0.0073
                                                       Root MSE      =  .17855

------------------------------------------------------------------------------
             |               Robust
     outcome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |
          1  |   .0279649   .0131945     2.12   0.034     .0020889    .0538408
          2  |  -.0132727    .009132    -1.45   0.146    -.0311817    .0046364
          3  |  -.0031805   .0102675    -0.31   0.757    -.0233164    .0169553
             |
       _cons |   .0302919   .0070895     4.27   0.000     .0163885    .0441953
------------------------------------------------------------------------------

Am I understanding correctly what is going on? I.e., the outcome is being regressed on the treatment groups, weighted by the inverse of the propensity score?

I also modelled the same using logistic regression. The results are very similar as you said they would be. So is there any reason why logistic regression couldn't be used instead, and odds ratios presented?

Do you know why my standard errors are slightly different from those from teffects? This differential increases as more covariates are included.

Many thanks again for all of your help.
Steve

Enrique Pinzon (StataCorp)

StataCorp Employee

Join Date: Jan 2015
Posts: 217

#11

29 May 2015, 07:39

Dear Steve,

I think you have a great understanding of what is going on. Also, you pose two questions that are at the crux of teffects:

1. Why not use "equivalent" estimators?
2. Why are the standard errors different?

teffects uses GMM to compute the final results. Like any optimization routine, it takes starting values and then iterates to the solution. If you look at the iteration log you will find something like:

Code:

. webuse cattaneo2, clear
(Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154)

. teffects ra (bweight medu i.alcohol)(mbsmoke)

Iteration 0:   EE criterion =  2.971e-25  
Iteration 1:   EE criterion =  5.193e-26  

Treatment-effects estimation                    Number of obs     =      4,642
Estimator      : regression adjustment
Outcome model  : linear
Treatment model: none
------------------------------------------------------------------------------
             |               Robust
     bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
     mbsmoke |
    (smoker  |
         vs  |
 nonsmoker)  |  -251.7277   23.58388   -10.67   0.000    -297.9512   -205.5041
-------------+----------------------------------------------------------------
POmean       |
     mbsmoke |
  nonsmoker  |   3408.127   9.418437   361.86   0.000     3389.667    3426.587
------------------------------------------------------------------------------

It only iterates once. Why? teffects starts the GMM computation at the final "correct" point estimates. GMM is only used to compute standard errors. So you could get the same point estimates using different estimators (in my example: a regression for each treatment group, and the average of the difference between their predicted values. More on this later.). But even though your point estimates are correct, your standard errors are wrong. You do not take into account the randomness at each step of the estimation process. In your example you are not accounting for the randomness coming from the fact that the weights were estimated.

I will illustrate the point using my example:

Code:

. quietly regress bweight medu i.alcohol if mbsmoke==0

. predict xb0
(option xb assumed; fitted values)

. quietly regress bweight medu i.alcohol if mbsmoke==1

. predict xb1
(option xb assumed; fitted values)

. generate te = xb1 - xb0

. summarize te

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          te |      4,642   -251.7277     26.9427   -388.052  -146.8342

You get the same point estimate using summarize but the standard errors are not incorporating the randomness of the two regressions. GMM allows us to simultaneously estimate all the steps necessary to obtain treatment effects and incorporate the randomness between equations ("steps").
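
The gap between the two standard errors can be seen directly. The following is a minimal sketch that repeats the by-hand calculation above and then compares a naive standard error for the mean of te with the one teffects reports. The scalar names se_naive and se_teffects are introduced here only for illustration.

```stata
* Sketch: same point estimate, very different standard errors.
webuse cattaneo2, clear

* By-hand regression adjustment with a linear outcome model
quietly regress bweight medu i.alcohol if mbsmoke == 0
predict double xb0
quietly regress bweight medu i.alcohol if mbsmoke == 1
predict double xb1
generate double te = xb1 - xb0

* Naive standard error: treats te as data, ignoring that xb0 and xb1 are estimated
quietly mean te
scalar se_naive = _se[te]

* teffects standard error: accounts for the estimation of both regressions
quietly teffects ra (bweight medu i.alcohol) (mbsmoke)
scalar se_teffects = _se[ATE:r1vs0.mbsmoke]

display "naive SE:    " se_naive
display "teffects SE: " se_teffects
```

The naive figure is only the standard deviation of te divided by the square root of the sample size, which is why it understates the uncertainty.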


Joie_ray

Join Date: Dec 2014

Posts: 13
#12

25 Aug 2015, 01:09

Hi,

Thanks for such a detailed explanation and discussion. This helped me understand teffects with a binary treatment. But what about treatments with multiple levels (i.e., when the treatment model is a multinomial logit)? How does it work? Any insights would be really helpful!

Thanks,
Joie

Francis Clark

Join Date: Sep 2016

Posts: 15
#13

12 Jun 2017, 18:40

Originally posted by Jeff Wooldridge

I'm not finding that problem. I think you are misunderstanding the output. In all cases, the estimate of the ATE is reported. The logit functional form is already taken into account. So the difference in estimated logit functions for T = 0 and T = 1 is averaged across all i (or all treated i). The answers when you use different functional forms -- linear, logit, probit for Y -- are often very similar, especially with decent balance in the covariates. You do not need to do anything to the reported ATE, as the transformation has been done.

You will see that the coefficient estimates are very different for the linear and logit specifications -- as we expect. But as in many applications, the average partial effect (of which the ATE is a special case) is often very close across very different functional forms.

Hello Jeff,

I'm also looking to get a better understanding of the main output (coefficient) when using teffects psmatch with a binary outcome (as well as a binary exposure). Would what you describe above be the equivalent of a "risk difference" (the well-known measure of effect in epidemiology)?

Thank you for the insights!
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2157
#14

12 Jun 2017, 21:01

Originally posted by Francis Clark

Hello Jeff,

I'm also looking to get a better understanding of the main output (coefficient) when using teffects psmatch with a binary outcome (as well as a binary exposure). Would what you describe above be the equivalent of a "risk difference" (the well-known measure of effect in epidemiology)?

Thank you for the insights!

Yes, I believe it is the "risk difference" averaged across all units. It's the difference in the estimated probabilities under treatment and control for each unit averaged across all units.
Francis Clark

Join Date: Sep 2016

Posts: 15
#15

13 Jun 2017, 13:38

Thank you Jeff! Very clear.

Also, your "Treatment Effects" chapter (21) in your book "Econometric Analysis of Cross Section and Panel Data" (2nd edition) was very informative as well. A great resource.