
  • Marginal Effects in Probit model for a Log-Transformed Variable


    I am estimating a probit model in which some variables are in logs. I would like to report the marginal effects, so I have used the -margins- command:
    margins, dydx(*) atmeans -----> For Marginal Effects at Means (MEM)
    margins, dydx(*) -----> For Average Marginal Effects (AME)

    I don't know how to interpret the marginal effects reported by Stata.
    If the marginal effect of the log-transformed variable is 0.0729 (MEM), how should I interpret this?
    - A 1% increase in the log-transformed variable increases the probability of success by 7.29 percentage points. Am I right?

    Or is it better to run the probit model with the original variables and then use - margins, eyex(original variable) ?

    Thanks a lot,

  • #2

    Did you ever get an answer to this question from someone via private message or another source?



    • #3
      Not sure why #1 never got an answer, at least not publicly.

      Suppose your outcome variable is y, and your predictor variable is x, but, for whatever reason, you choose to use log x as the predictor in the model and run this:

      gen log_x = log(x)
      probit y log_x other_variables
      margins, dydx(*) atmeans
      And suppose the margin for log_x is 0.0729.

      This means that a difference of 1 in log x (not 1%, nor 1 percentage point: logarithms are dimensionless) is associated with an increase of 0.0729 in the probability of y = 1. So, if the "baseline" probability is, say, 0.05, an increase of 1 in log x is associated with an expected probability of 0.1229. Note that a difference of 1 in log x, when viewed from the perspective of x itself, means x being multiplied by 2.71828..., which is roughly a 172% increase in x.
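
      As a minimal sketch of the workflow above, the following uses the auto dataset that ships with Stata as a stand-in. The variables foreign, price, and mpg are only placeholders for y, x, and other_variables; they are an assumption for illustration, not the original poster's data.

      ```stata
      * Stand-in data (an assumption): the auto dataset shipped with Stata
      sysuse auto, clear
      gen log_price = log(price)
      probit foreign log_price mpg
      * Marginal effects at means (MEM): the log_price row is the derivative of
      * Pr(foreign = 1) with respect to log(price), not with respect to price
      margins, dydx(*) atmeans
      * Average marginal effects (AME), for comparison
      margins, dydx(*)
      ```

      In both tables the row for log_price is read per unit of log(price), exactly as described for log_x above.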


      • #4
        The user-written -mcp- command (available from SSC) has some nice ways of plotting y vs log transformed variables, which may help with interpretation. See section 5 of

        The section starts "Suppose the relationship between the response variable, y, and x is log linear. Such a situation is not uncommon. We wish to model E (y) as a linear function of log x, and we want to graph the relationship on the original scale of x, not the scale of log x."
        Richard Williams, Notre Dame Dept of Sociology
        Stata Version: 15.1MP (2 processor)

        EMAIL: rwilliam@ND.Edu


        • #5
          Clyde, concerning "which is a roughly 172% increase in x":
          How do you come to 172%?



          • #6
            Pedro: Recall from calculus that df(x) / dln(x) = x * df(x) / dx. So if you divide your estimated marginal effects (based on log-x) by x you will get df(x)/dx. But this should be done observation-by-observation, not based on average-x's. Here's the basic idea (inelegantly programmed):
            gen lnx=ln(x)
            probit y lnx
            predict xb, xb
            matrix b=e(b)
            matrix b1=b[1,1]
            gen dpydx=normalden(xb)*trace(b1)/x
            x must be positive for ln(x) to be defined, so dividing by x shouldn't be a problem.


            • #7
              Re #5: As noted, an increase of 1 in log x corresponds to multiplying x by e = 2.71828... So the absolute change in x is 2.71828...*x - x, which simplifies to 1.71828...*x. Putting that in percentage terms, it's a 171.828...% change in x, which I rounded to 172%.
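
              The conversion can be checked directly with -display-. The last two lines go the other way and are only a sketch: they treat the 0.0729 margin from #3 as locally linear in log x, which is an approximation.

              ```stata
              * Multiplier on x implied by a 1-unit increase in log x
              display exp(1)
              * The same change expressed as a percent increase in x
              display 100*(exp(1) - 1)
              * Going the other way: a 1% increase in x changes log x by ln(1.01)
              display ln(1.01)
              * Approximate change in Pr(y = 1) for a 1% increase in x,
              * using the 0.0729 margin from #3 (first-order approximation)
              display 0.0729*ln(1.01)
              ```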


              • #8
                Re: #6, this is a little less inelegant
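                gen lnx=ln(x)
                probit y lnx
                * linear index x*b for each observation
                predict xb, xb
                matrix b=e(b)
                * coefficient on lnx (first element of e(b))
                scalar b1=b[1,1]
                * standard normal density evaluated at the index
                gen dn=normalden(xb)
                * chain rule: dP/dx = dP/dln(x) / x
                gen dpydx=dn*b1/x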
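
                One way to sanity-check this calculation is to compare it with -margins-. The sketch below assumes the auto dataset as stand-in data (foreign as y, price as x). The mean of the observation-level effect with respect to ln(x) should reproduce the AME that -margins- reports, and dividing by x observation by observation gives the effect on the original scale.

                ```stata
                * Stand-in data (an assumption): auto dataset, price plays the role of x
                sysuse auto, clear
                gen lnx = ln(price)
                probit foreign lnx
                predict xb, xb
                * Observation-level marginal effect with respect to ln(x)
                gen dpydlnx = normalden(xb)*_b[lnx]
                * Chain rule: divide by x to get the effect with respect to x itself
                gen dpydx = dpydlnx/price
                * The mean of dpydlnx should match the AME reported by -margins-
                summarize dpydlnx dpydx
                margins, dydx(lnx)
                ```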


                • #9
                  Hi Stata users,

                  Very interesting discussion! I am having a similar issue interpreting log-transformed variables. I am running a linear probability model and my variable of interest is log-transformed (the log of a distance).

                  In Stata 14.1 I run the following regression:

                  regress inorganic ///
                      organic lndistance rainfall_06 ///
                      Livestock share plot_twi ///
                      i.culture i.year i.inside_zone i.ms00q11 ///
                      if culture < 99 ///
                      , vce(cluster grappe)
                  margins, dydx(lndistance)

                  Here is the outcome

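                  Average marginal effects                        Number of obs     =      6,374
                  Model VCE    : Robust

                  Expression   : Linear prediction, predict()
                  dy/dx w.r.t. : lndistance

                  ------------------------------------------------------------------------------
                               |            Delta-method
                               |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                    lndistance |  -.0964674     .03403    -2.83   0.005    -.1636933   -.0292414
                  ------------------------------------------------------------------------------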

                  My baseline probability is 0.17. So if I understood Clyde's comment correctly, I should interpret my result as: a difference of 1 in the log distance is associated with a decrease of 0.09 in the probability of Y=1. My baseline probability being 0.1786, an increase of 1 in log x is associated with an expected probability of about 0.08 (8%).

                  In other words, since the average distance in my sample before the log transformation is 35.60 km, an increase of 1 in log(distance), equivalent to an increase of 25.6 km (35.6*1.718), decreases the probability of Y=1 by 9%.

                  I hope my table can be seen clearly on the forum and that my question about interpretation makes sense to you.



                  • #10
                    I agree with your interpretations in #9 until you get to the end. First, there is an arithmetic problem: 35.6*1.718 is not 25.6. Next, 1.718 is not the correct factor to multiply by. A unit increase in ln(distance) corresponds to multiplying distance by e = 2.718. So if the baseline distance is 35.60, the other distance is 35.60 * 2.718, which is 96.8 km (approximately).

                    Also, the probability of Y = 1 decreases by 9 percentage points, not 9%. A 9% decrease from a baseline of 0.17 would bring you to 0.17*(1-.09) = 0.17*.91 = 0.155 = 15.5%. A change of X% is always understood to be multiplicative; a change of X percentage points is additive.
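
                    The three calculations in this post can be reproduced with -display-. This is only arithmetic on the rounded numbers quoted above (baseline 0.17, change 0.09).

                    ```stata
                    * Distance after a 1-unit increase in ln(distance), starting from 35.60 km
                    display 35.60*exp(1)
                    * Additive change: 9 percentage points off a baseline of 0.17
                    display 0.17 - 0.09
                    * Multiplicative change: a 9% decrease from the same baseline
                    display 0.17*(1 - 0.09)
                    ```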


                    • #11
                      Hi Clyde Schechter:

                      In #10, you explain how to calculate an expected probability:
                      0.17*(1-.09) = 0.17*.91 = 0.155 = 15.5%.
                      However, I can't apply this formula to calculate an expected probability of 0.1229 in #3. Following the formula in #10, the expected probability in #3 is 0.05*(1-0.0729)= 0.046, not 0.1229.

                      In logistic regression, if the baseline probability is .05, then the baseline odds is 0.05/(1-0.05) ≈ 0.053. So a one degree increase is associated with an odds of 0.053×0.0729 ≈ 0.0038, which corresponds with a probability of 0.0038/(1+0.0038)≈ 0.38%

                      Could you please explain more?

                      Best regards,
                      Last edited by Linh Nguyen; 10 Jan 2019, 06:31.
                      (Stata 15.1 MP)


                      • #12
                        The formula quoted from #10 in #11 is calculating something different from what is calculated in #3, so it does not produce the result that was obtained in #3. I don't know how to explain #3 and #10 more clearly. Try re-reading them carefully until you see that they are two different things.


                        • #13
                          I see that #3 uses a nonlinear regression (-probit-) while #9 uses a linear regression (-reg-). Hence, I tried to use your formula in #10, and the formula I know for logistic regression, to calculate the expected probability in #3. However, neither worked.

                          Could you please write the formula which is used to calculate the expected probability of 0.1229 in #3?
                          (Stata 15.1 MP)


                          • #14
                            The baseline outcome probability in #3 is .05. The marginal effect of log_x (not x itself) is 0.0729. Therefore the expected outcome probability with a unit increase in log_x is 0.05 + 0.0729 = 0.1229.
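
                            The first line below reproduces that addition. The rest is a hedged sketch: because the probit curve is nonlinear, a marginal effect is a derivative, so adding it to the baseline is a first-order approximation for a full unit change in log_x. The sketch ASSUMES that 0.05 is the predicted probability at the point where the 0.0729 marginal effect was evaluated, backs out the implied coefficient, and computes the probability after a unit increase, which can differ from the additive figure.

                            ```stata
                            * Additive calculation used in #3 and #14
                            display 0.05 + 0.0729
                            * ASSUMPTION: 0.05 is the predicted probability at the evaluation point,
                            * so the linear index there is invnormal(0.05)
                            scalar xb0 = invnormal(0.05)
                            * Marginal effect = normalden(xb)*b, so back out the implied coefficient b
                            scalar b = 0.0729/normalden(xb0)
                            * Predicted probability after a 1-unit increase in log_x
                            display normal(xb0 + b)
                            ```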


                            • #15
                              Hi, I posted a similar question which I am still struggling on and would really appreciate some help:
                              I've posted the question below too:

                              I have an explanatory variable in log format ln(income) and the dependent variable, y, is a dummy variable (74% of observations are y=1).

                              I initially use a linear probability model and the coefficient on ln(income) is 0.00875. I have interpreted this as: the probability of y=1 associated with a 1% increase in income is a 0.0000875% point increase (basically no effect)

                              The marginal effect at means on the probit model on ln(income) is 0.00907. I have interpreted this as: the probability of y=1 associated with a 172% increase in income is a 0.00907% point increase.
                              Therefore, the probability of y=1 associated with a 1% increase in income is a 0.00907/172= 0.000053% point increase (basically no effect).

                              I was wondering if this is the right interpretation and if so, can I just say there is no effect of household income on y=1?
                              Many thanks in advance
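
                              The arithmetic in this question can be checked with the same logic as in #7 and #10. As a rough sketch, assuming the coefficients quoted above and treating the effect as locally linear in ln(income): a 1% increase in income changes ln(income) by ln(1.01), so the implied change in probability is the coefficient times ln(1.01), not the coefficient divided by 172.

                              ```stata
                              * Change in ln(income) for a 1% increase in income
                              scalar dln = ln(1.01)
                              * Linear probability model: change in Pr(y = 1), in percentage points
                              display 100*0.00875*dln
                              * Probit MEM: approximate change in Pr(y = 1), in percentage points
                              display 100*0.00907*dln
                              * Dividing by 172 instead gives a different (smaller) number,
                              * because ln(x) is not linear in x
                              display 100*0.00907/172
                              ```

                              Whether an effect of this size counts as "no effect" also depends on its standard error, which these lines do not address.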