Marginal Effects in Probit model for a Log-Transformed Variable

Pedro Adelfos

Join Date: Mar 2015

Posts: 2
#1

03 Mar 2015, 09:54

Hi,

I am estimating a probit model in which some variables are in logs. I would like to report the marginal effects, so I have used the -margins- command:
margins, dydx(*) atmeans -----> For Marginal Effects at Means (MEM)
margins, dydx(*) -----> For Average Marginal Effects (AME)

I don't know how to interpret the marginal effects reported by Stata.
If the marginal effect of the log-transformed variable is 0.0729 (MEM), how should I interpret it?
- A 1% increase in the log-transformed variable increases the probability of success by 7.29 percentage points. Am I right?

Or is it better to run the probit model with the original variables and then use - margins, eyex(original variable) ?

Thanks a lot,
Pedro
Cesar Augusto

Join Date: Nov 2015

Posts: 7
#2

14 Apr 2016, 12:42

Pedro,

Did you ever get an answer to this question from someone via private message or another source?

Thanks
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#3

14 Apr 2016, 13:28

Not sure why #1 never got an answer, at least not publicly.

Suppose your outcome variable is y, and your predictor variable is x, but, for whatever reason, you choose to use log x as the predictor in the model and run this:

Code:

gen log_x = log(x)
probit y log_x other_variables
margins, dydx(*) atmeans

And suppose the margin for log_x is 0.0729.

This means that a difference of 1 in log x (not 1%, nor 1 percentage point: logarithms are dimensionless) is associated with an increase of 0.0729 in the probability of y = 1. So, if the "baseline" probability is, say, 0.05, an increase of 1 in log x is associated with an expected probability of 0.1229. Note that a difference of 1 in log x, when viewed from the perspective of x itself, means x being multiplied by 2.71828..., which is a roughly 172% increase in x.
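A marginal effect is a slope at a point, so a figure like 0.0729 is best read as a local approximation to the effect of a one-unit change in log x. Here is a minimal sketch of how one might compare it with the model's predicted probabilities one log unit apart. It uses Stata's shipped auto data as a stand-in: foreign, weight, and mpg are placeholders for y, x, and other_variables, and 7.48 and 8.48 are arbitrary values inside the range of ln(weight).

```stata
* Illustration only: auto.dta stands in for the poster's data
sysuse auto, clear
generate log_x = ln(weight)
probit foreign log_x mpg

* Marginal effect of log_x with all covariates at their means
margins, dydx(log_x) atmeans

* Predicted probabilities at two values of log_x one unit apart
* (x multiplied by e), with mpg held at its mean
margins, at(log_x=(7.48 8.48)) atmeans
```

The difference between the two predicted probabilities is the model-implied change for that specific one-unit move, and it need not equal the dydx figure exactly, because the probit curve is not a straight line.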
Richard Williams

Join Date: Apr 2014

Posts: 5008
#4

14 Apr 2016, 14:09

The user-written -mcp- command (available from SSC) has some nice ways of plotting y vs log transformed variables, which may help with interpretation. See section 5 of

http://www.stata-journal.com/sjpdf.h...iclenum=gr0056

The section starts "Suppose the relationship between the response variable, y, and x is log linear. Such a situation is not uncommon. We wish to model E (y) as a linear function of log x, and we want to graph the relationship on the original scale of x, not the scale of log x."

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Kerstin Schmidt

Join Date: Apr 2017

Posts: 120
#5

17 May 2017, 00:38

Clyde, concerning "which is a roughly 172% increase in x":
How do you come to 172%?

Thanks!
John Mullahy

Join Date: Dec 2016

Posts: 752
#6

17 May 2017, 07:14

Pedro: Recall from calculus that df(x) / dln(x) = x * df(x) / dx. So if you divide your estimated marginal effects (based on log-x) by x you will get df(x)/dx. But this should be done observation-by-observation, not based on average-x's. Here's the basic idea (inelegantly programmed):

Code:

gen lnx=ln(x)
probit y lnx
predict xb, xb
matrix b=e(b)
matrix b1=b[1,1]
gen dpydx=normalden(xb)*trace(b1)/x

x must be positive for ln(x) to be defined, so dividing by x shouldn't be a problem.
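For readers who want the chain-rule step written out, here is a sketch assuming the fitted model is a probit in ln(x) alone. The symbols \beta_0 and \beta_1 (intercept and coefficient on ln x) are notation assumed here, not taken from the post; \Phi and \phi are the standard normal CDF and density.

```latex
\begin{aligned}
\Pr(y = 1 \mid x) &= \Phi(\beta_0 + \beta_1 \ln x) \\
\frac{\partial \Pr(y = 1 \mid x)}{\partial \ln x} &= \phi(\beta_0 + \beta_1 \ln x)\,\beta_1 \\
\frac{\partial \Pr(y = 1 \mid x)}{\partial x} &= \frac{\phi(\beta_0 + \beta_1 \ln x)\,\beta_1}{x}
\end{aligned}
```

The last line corresponds to normalden(xb) times the lnx coefficient, divided by x, evaluated observation by observation.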
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#7

17 May 2017, 12:38

Re #5: As noted, an increase of 1 in log x corresponds to multiplying x by e = 2.71828... So the absolute change in x is 2.71828...*x - x, which simplifies to 1.71828...*x. Putting that in percentage terms, it's a 171.828...% change in x, which I rounded to 172%.
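The same arithmetic can be checked in Stata. This is a small sketch; the scalar names d, factor, and pct are made up for illustration, and d can be set to any change in log x.

```stata
* A change of d in log x multiplies x by exp(d),
* which is a 100*(exp(d) - 1) percent change in x
scalar d = 1
scalar factor = exp(scalar(d))
scalar pct = 100*(exp(scalar(d)) - 1)
display "multiplier on x: " %6.3f scalar(factor)
display "percent change in x: " %6.1f scalar(pct)
```

With d = 1 this gives a multiplier of about 2.718 and a percent change of about 171.8.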
John Mullahy

Join Date: Dec 2016

Posts: 752
#8

17 May 2017, 12:51

Re: #6, this is a little less inelegant

Code:

gen lnx=ln(x)
probit y lnx
predict xb, xb
matrix b=e(b)
scalar b1=b[1,1]
gen dn=normalden(xb)
gen dpydx=dn*b1/x
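As a runnable sketch of the same idea, the observation-level effects can be averaged to get an average marginal effect with respect to x itself. The auto data stands in for the real data (foreign for y, weight for x), and _b[lnx] is used here in place of pulling the coefficient out of e(b).

```stata
* Illustration only: auto.dta stands in for the poster's data
sysuse auto, clear
generate lnx = ln(weight)
probit foreign lnx

* Linear index and the coefficient on lnx
predict xb, xb
scalar b1 = _b[lnx]

* dPr(y=1)/dx for each observation, then its sample mean
generate dpydx = normalden(xb)*scalar(b1)/weight
summarize dpydx
```

The mean reported by summarize is the average of the per-observation effects, which is generally not the same as the log-scale effect divided by the average of x.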
Marcel Campion

Join Date: Feb 2017

Posts: 30
#9

05 Feb 2018, 06:52

Hi Stata users,

Very interesting discussion! I am having a similar issue interpreting log-transformed variables. I am running a linear probability model and my variable of interest is log-transformed (the log of a distance).

In Stata 14.1 I run the following regression:

Code:

regress inorganic ///
    organic lndistance rainfall_06 ///
    Livestock share plot_twi ///
    i.culture i.year i.inside_zone i.zone i.culture i.ms00q11 i.zone ///
    if culture < 99 ///
    , vce(cluster grappe)

Code:

margins, dydx(lndistance)

Here is the outcome

Average marginal effects                        Number of obs     =      6,374
Model VCE    : Robust

Expression   : Linear prediction, predict()
dy/dx w.r.t. : lndistance

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  lndistance |  -.0964674     .03403    -2.83   0.005    -.1636933   -.0292414
------------------------------------------------------------------------------

My baseline probability is 0.17. If I have understood Clyde's comment correctly, I should interpret my result as follows: a difference of 1 in log distance is associated with a decrease of 0.09 in the probability of Y=1. My baseline probability being 0.1786, an increase of 1 in log x is associated with an expected probability of 0.08.

In other words, since the average distance in my sample before the log transformation is 35.60 km, an increase of 1 in log(distance), equivalent to an increase of 25.6 km (35.6*1.718), decreases the probability of Y=1 by 9%.

I hope my table can be seen clearly on the forum and that my question about interpretation makes sense to you.

Best,
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#10

05 Feb 2018, 08:56

I agree with your interpretations in #9 until you get to the end. First, there is an arithmetic problem: 35.6*1.718 is not 25.6. Next, 1.718 is not the correct factor to multiply by. A unit increase in ln(distance) corresponds to multiplying distance by e = 2.718. So if the baseline distance is 35.60, the other distance is 35.60 * 2.718, which is 96.8 km (approximately).

Also, the probability of Y = 1 decreases by 9 percentage points, not 9%. A 9% decrease from a baseline of 0.17 would bring you to 0.17*(1-.09) = 0.17*.91 = 0.155 = 15.5%. A change of X% is always understood to be multiplicative; a change of X percentage points is additive.
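The distinction can be checked with a few lines of arithmetic. This sketch uses the numbers from #9 and #10; the scalar names are made up for illustration.

```stata
* Baseline probability and the two readings of "9"
scalar p0 = 0.17
scalar p_pp = scalar(p0) - 0.09
scalar p_pct = scalar(p0)*(1 - 0.09)

* Distance after a one-unit increase in ln(distance)
scalar d1 = 35.60*exp(1)

display "9 percentage points lower: " scalar(p_pp)
display "9 percent lower: " scalar(p_pct)
display "distance after +1 in ln(distance), km: " %6.1f scalar(d1)
```

The additive reading gives 0.08, the multiplicative reading gives about 0.155, and the distance comes out at about 96.8 km.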
Linh Nguyen

Join Date: Nov 2017

Posts: 85
#11

10 Jan 2019, 06:22

Hi Clyde Schechter:

In #10, you explain how to calculate an expected probability:

0.17*(1-.09) = 0.17*.91 = 0.155 = 15.5%.

However, I can't apply this formula to calculate an expected probability of 0.1229 in #3. Following the formula in #10, the expected probability in #3 is 0.05*(1-0.0729)= 0.046, not 0.1229.

In logistic regression, if the baseline probability is .05, then the baseline odds is 0.05/(1-0.05) ≈ 0.053. So a one degree increase is associated with an odds of 0.053×0.0729 ≈ 0.0038, which corresponds with a probability of 0.0038/(1+0.0038)≈ 0.38%

Could you please explain more?

Best regards,

Last edited by Linh Nguyen; 10 Jan 2019, 06:31.

--------------------
(Stata 15.1 MP)
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#12

10 Jan 2019, 16:52

The formula quoted from #10 in #11 is calculating something different from what is calculated in #3, so it does not produce the result that was obtained in #3. I don't know how to explain #3 and #10 more clearly. Try re-reading them carefully until you see that they are two different things.
Linh Nguyen

Join Date: Nov 2017

Posts: 85
#13

11 Jan 2019, 04:13

I see that #3 uses a nonlinear regression (-probit-) while #9 uses a linear regression (-reg-). Hence, I tried to use your formula in #10 and the formula I know for logistic regression to calculate the expected probability in #3. However, neither worked.

Could you please write the formula which is used to calculate the expected probability of 0.1229 in #3?

--------------------
(Stata 15.1 MP)
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#14

11 Jan 2019, 13:07

The baseline outcome probability in #3 is .05. The marginal effect of log_x (not x itself) is 0.0729. Therefore the expected outcome probability with a unit increase in log_x is 0.05 + 0.0729 = 0.1229.
Isabel Higginson

Join Date: Mar 2019

Posts: 2
#15

24 Mar 2019, 08:52

Hi, I posted a similar question that I am still struggling with and would really appreciate some help: https://www.statalist.org/forums/for...a-probit-model
I've posted the question below too:

I have an explanatory variable in log format ln(income) and the dependent variable, y, is a dummy variable (74% of observations are y=1).

I initially use a linear probability model and the coefficient on ln(income) is 0.00875. I have interpreted this as: the probability of y=1 associated with a 1% increase in income is a 0.0000875% point increase (basically no effect)

The marginal effect at means on the probit model on ln(income) is 0.00907. I have interpreted this as: the probability of y=1 associated with a 172% increase in income is a 0.00907% point increase.
Therefore, the probability of y=1 associated with a 1% increase in income is a 0.00907/172= 0.000053% point increase (basically no effect).

I was wondering if this is the right interpretation and if so, can I just say there is no effect of household income on y=1?
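One way to work through the arithmetic, offered as a sketch and not a definitive answer: a 1% increase in income changes ln(income) by ln(1.01), about 0.00995, so the implied change in Pr(y=1) is the coefficient times that amount. Dividing the effect of a one-unit log change by 172 is not equivalent, because the model is linear in ln(income), not in percent changes of income. The scalar names below are made up for illustration, and applying the probit marginal effect this way is only a local linear approximation.

```stata
* Coefficient and marginal effect quoted in the post
scalar b_lpm = 0.00875
scalar me_probit = 0.00907

* Change in ln(income) for a 1% increase in income
scalar dlog = ln(1.01)

* Implied change in Pr(y=1), expressed in percentage points
scalar pp_lpm = 100*scalar(b_lpm)*scalar(dlog)
scalar pp_probit = 100*scalar(me_probit)*scalar(dlog)
display "LPM, percentage points: " scalar(pp_lpm)
display "Probit MEM, percentage points: " scalar(pp_probit)
```

These come out at about 0.0087 and 0.0090 percentage points respectively. Whether an effect of that size matters is a substantive question that the arithmetic alone does not settle.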
Many thanks in advance