
  • Marginal Effects in Probit model for a Log-Transformed Variable

    Hi,

    I am estimating a probit model in which some variables are in logs. I would like to report the marginal effects, so I have used the -margins- command:
    margins, dydx(*) atmeans    // marginal effects at the means (MEM)
    margins, dydx(*)            // average marginal effects (AME)

    I am not sure how to interpret the marginal effects reported by Stata.
    If the marginal effect of the log-transformed variable is 0.0729 (MEM), how should I interpret it?
    - A 1% increase in the log-transformed variable increases the probability of success by 7.29 percentage points. Am I right?

    Or is it better to run the probit model with the original variables and then use -margins, eyex(original_variable)-?

    Thanks a lot,
    Pedro

  • #2
    Pedro,

    Did you ever get an answer to this question from someone via private message or another source?

    Thanks



    • #3
      Not sure why #1 never got an answer, at least not publicly.

      Suppose your outcome variable is y, and your predictor variable is x, but, for whatever reason, you choose to use log x as the predictor in the model and run this:

      Code:
      gen log_x = log(x)
      probit y log_x other_variables
      margins, dydx(*) atmeans

      And suppose the marginal effect for log_x is 0.0729.

      This means that a difference of 1 in log x (not 1%, nor 1 percentage point: logarithms are dimensionless) is associated with an increase of 0.0729 in the probability of y = 1. So, if the "baseline" probability is, say, 0.05, an increase of 1 in log x is associated with an expected probability of about 0.1229 (to a first-order approximation, since a marginal effect is a derivative). Note that a difference of 1 in log x, viewed from the perspective of x itself, means x being multiplied by e = 2.71828..., which is roughly a 172% increase in x.
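A marginal effect is a derivative, so 0.05 + 0.0729 = 0.1229 is a linear approximation. Over a change as large as one full unit of log x, the curvature of the probit can matter. The Python sketch below is used only for the arithmetic and is not Stata code. It assumes the model is a probit, that only log_x changes, and that 0.05 is the predicted probability at the covariate values where the marginal effect was evaluated. The values 0.05 and 0.0729 are the hypothetical numbers from above, not estimates. The sketch backs out the implied coefficient on log_x and compares the linear approximation with the exact probit prediction. With these numbers the exact value comes out noticeably higher, at roughly 0.17.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, sd 1: .cdf() is Phi, .pdf() is phi

# Hypothetical numbers from the post above (not estimates from real data)
p0 = 0.05    # baseline predicted probability where the marginal effect is evaluated
me = 0.0729  # marginal effect of log_x at that point

# Probit: P = Phi(z), so dP/dlog_x = phi(z) * b, hence b = me / phi(z0)
z0 = std_normal.inv_cdf(p0)
b = me / std_normal.pdf(z0)

linear = p0 + me                # derivative-based (first-order) approximation
exact = std_normal.cdf(z0 + b)  # probit prediction after log_x rises by exactly 1

print(f"implied coefficient on log_x: {b:.4f}")
print(f"linear approximation:         {linear:.4f}")
print(f"exact probit prediction:      {exact:.4f}")
```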




      • #4
        The user-written -mcp- command (available from SSC) has some nice ways of plotting y vs log transformed variables, which may help with interpretation. See section 5 of

        http://www.stata-journal.com/sjpdf.h...iclenum=gr0056

        The section starts "Suppose the relationship between the response variable, y, and x is log linear. Such a situation is not uncommon. We wish to model E(y) as a linear function of log x, and we want to graph the relationship on the original scale of x, not the scale of log x."
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        Stata Version: 17.0 MP (2 processor)

        WWW: https://www3.nd.edu/~rwilliam



        • #5
          Clyde, concerning "which is a roughly 172% increase in x":
          How did you arrive at 172%?

          Thanks!



          • #6
            Pedro: Recall from calculus that df(x) / dln(x) = x * df(x) / dx. So if you divide your estimated marginal effects (based on log-x) by x you will get df(x)/dx. But this should be done observation-by-observation, not based on average-x's. Here's the basic idea (inelegantly programmed):
            Code:
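            gen lnx=ln(x)
            probit y lnx
            predict xb, xb                        // linear index xb for each observation
            matrix b=e(b)
            matrix b1=b[1,1]                      // coefficient on lnx, as a 1x1 matrix
            gen dpydx=normalden(xb)*trace(b1)/x   // trace() turns the 1x1 matrix into a number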
            x must be positive for ln(x) to be defined, so dividing by x shouldn't be a problem.
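The identity above, df(x)/dln(x) = x * df(x)/dx, can be checked numerically. The Python sketch below uses made-up probit coefficients: the intercept a and slope b are hypothetical and do not come from any model in this thread. It computes dP/dx with the same formula as the Stata code, normalden(xb)*b/x. It then compares that value with a central finite-difference derivative of P(x) = Phi(a + b*ln x).

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

# Hypothetical probit coefficients for: probit y lnx  (intercept a, slope b on lnx)
a, b = -1.2, 0.4

def prob(x):
    """Predicted probability Phi(a + b*ln x) for a positive x."""
    return std_normal.cdf(a + b * math.log(x))

def dp_dlnx(x):
    """Marginal effect with respect to ln x: phi(xb) * b."""
    return std_normal.pdf(a + b * math.log(x)) * b

def dp_dx(x):
    """Marginal effect with respect to x: dP/dln(x) divided by x."""
    return dp_dlnx(x) / x

x = 3.5
h = 1e-6
numeric = (prob(x + h) - prob(x - h)) / (2 * h)  # central difference in x
print(dp_dx(x), numeric)
```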



            • #7
              Re #5: As noted, an increase of 1 in log x corresponds to multiplying x by e = 2.71828... So the absolute change in x is 2.71828...*x - x, which simplifies to 1.71828...*x. In percentage terms, it's a 171.828...% change in x, which I rounded to 172%.
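The same arithmetic as a small Python sketch. The value of x is arbitrary (any positive x gives the same relative change). The sketch confirms that multiplying x by e raises log x by exactly 1 and increases x by about 172%.

```python
import math

x = 10.0               # arbitrary positive value; the relative change does not depend on it
x_new = x * math.e     # multiply x by e = 2.71828...

print(math.log(x_new) - math.log(x))  # difference in log x
print((x_new - x) / x)                # absolute change as a multiple of x, i.e. e - 1
print(100 * (x_new - x) / x)          # percentage change in x
```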



              • #8
                Re: #6, this is a little less inelegant
                Code:
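                gen lnx=ln(x)
                probit y lnx
                predict xb, xb          // linear index for each observation
                matrix b=e(b)
                scalar b1=b[1,1]        // coefficient on lnx (same as _b[lnx])
                gen dn=normalden(xb)    // standard normal density at xb
                gen dpydx=dn*b1/x       // dP/dx = phi(xb)*b1/x, observation by observation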
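#6 advises doing the conversion observation by observation rather than at an average x. The Python sketch below illustrates why. It uses hypothetical probit coefficients and a made-up set of positive x values. It compares the mean of the per-observation derivatives with the derivative evaluated once at the mean of x. The first corresponds to averaging dpydx afterwards, e.g. with -summarize dpydx-. Because 1/x is convex, the two can differ a lot.

```python
import math
from statistics import NormalDist, fmean

std_normal = NormalDist()

# Hypothetical probit coefficients and made-up positive x values
a, b = -1.2, 0.4
xs = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]

def dp_dx(x):
    """dP/dx = phi(a + b*ln x) * b / x for one observation."""
    return std_normal.pdf(a + b * math.log(x)) * b / x

# Average of observation-level derivatives (like averaging dpydx afterwards)
ame_x = fmean(dp_dx(x) for x in xs)

# Derivative evaluated once at the average x (the shortcut #6 warns against)
at_mean_x = dp_dx(fmean(xs))

print(f"mean of per-observation dP/dx: {ame_x:.4f}")
print(f"dP/dx at mean x:               {at_mean_x:.4f}")
```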



                • #9
                  Hi Stata users,

                  Very interesting discussion! I am having a similar problem interpreting a log-transformed variable. I am running a linear probability model, and my variable of interest is log-transformed (it is the log of a distance).

                  In Stata 14.1 I run the following regression:

                  Code:
                  regress inorganic ///
                      organic lndistance rainfall_06 ///
                      Livestock share plot_twi ///
                      i.culture i.year i.inside_zone i.zone i.ms00q11 ///
                      if culture < 99 ///
                      , vce(cluster grappe)
                  Code:
                  margins, dydx(lndistance)

                  Here is the outcome


                  Average marginal effects                        Number of obs     =      6,374
                  Model VCE    : Robust

                  Expression   : Linear prediction, predict()
                  dy/dx w.r.t. : lndistance

                  ------------------------------------------------------------------------------
                               |            Delta-method
                               |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                    lndistance |  -.0964674     .03403    -2.83   0.005    -.1636933   -.0292414
                  ------------------------------------------------------------------------------


                  My baseline probability is 0.17 (more precisely, 0.1786). So, if I understood Clyde's comment correctly, I should interpret my result as follows: a difference of 1 in log distance is associated with a decrease of about 0.09 (0.0965) in the probability of Y = 1. With a baseline probability of 0.1786, an increase of 1 in log distance is therefore associated with an expected probability of about 0.08.

                  In other words, since the average distance in my sample (before the log transformation) is 35.60 km, an increase of 1 in log(distance), which is equivalent to an increase of 25.6 km (35.6*1.718), decreases the probability of Y = 1 by 9%.

                  I hope my table can be seen clearly on the forum and that my question about interpretation makes sense to you.

                  Best,



                  • #10
                    I agree with your interpretations in #9 until you get to the end. First, there is an arithmetic problem: 35.6*1.718 is not 25.6; it is about 61.2. Next, 1.718 is the factor that gives the size of the increase, not the new distance. A unit increase in ln(distance) corresponds to multiplying distance by e = 2.718. So if the baseline distance is 35.60 km, the other distance is 35.60 * 2.718, which is about 96.8 km (an increase of about 61.2 km).

                    Also, the probability of Y = 1 decreases by 9 percentage points, not 9%. A 9% decrease from a baseline of 0.17 would bring you to 0.17*(1-.09) = 0.17*.91 = 0.155 = 15.5%. A change of X% is always understood to be multiplicative; a change of X percentage points is additive.
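The two corrections above, sketched in Python for the arithmetic only. It uses the figures quoted in #9: a mean distance of 35.60 km, a baseline of 0.17 and a change of 9 (percentage points or percent). It shows the new distance and the size of the increase when log(distance) rises by 1, and the additive versus multiplicative reading of "9".

```python
import math

baseline_km = 35.60
new_km = baseline_km * math.e         # log(distance) + 1  <=>  distance * e
increase_km = new_km - baseline_km    # equals (e - 1) * baseline

p0 = 0.17
change = 0.09
additive = p0 - change                # down 9 percentage points
multiplicative = p0 * (1 - change)    # down 9 percent (relative)

print(f"new distance:         {new_km:.1f} km")
print(f"increase:             {increase_km:.1f} km")
print(f"-9 percentage points: {additive:.3f}")
print(f"-9 percent:           {multiplicative:.4f}")
```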



                    • #11
                      Hi Clyde Schechter:

                      In #10, you explain how to calculate an expected probability:
                      0.17*(1-.09) = 0.17*.91 = 0.155 = 15.5%.
                      However, I can't apply this formula to calculate an expected probability of 0.1229 in #3. Following the formula in #10, the expected probability in #3 is 0.05*(1-0.0729)= 0.046, not 0.1229.

                      In logistic regression, if the baseline probability is .05, then the baseline odds are 0.05/(1-0.05) ≈ 0.053. So a one-unit increase is associated with an odds of 0.053×0.0729 ≈ 0.0038, which corresponds to a probability of 0.0038/(1+0.0038) ≈ 0.38%.

                      Could you please explain more?

                      Best regards,
                      Last edited by Linh Nguyen; 10 Jan 2019, 06:31.
                      --------------------
                      (Stata 15.1 MP)



                      • #12
                        The formula quoted from #10 in #11 is calculating something different from what is calculated in #3, so it does not produce the result that was obtained in #3. I don't know how to explain #3 and #10 more clearly. Try re-reading them carefully until you see that they are two different things.



                        • #13
                          I see that #3 uses a nonlinear model (-probit-) while #9 uses a linear regression (-regress-). So I tried to use both your formula in #10 and the formula I know for logistic regression to calculate the expected probability in #3. However, neither of them worked.

                          Could you please write the formula which is used to calculate the expected probability of 0.1229 in #3?



                          • #14
                            The baseline outcome probability in #3 is .05. The marginal effect of log_x (not x itself) is 0.0729. Therefore the expected outcome probability with a unit increase in log_x is 0.05 + 0.0729 = 0.1229.
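A short Python sketch of why the two formulas give different answers, using the hypothetical numbers from #3. A marginal effect is a change in the probability itself, so it is added to the baseline. The formula quoted from #10 applies a relative (percent) change to the baseline. Plugging the marginal effect into it treats 0.0729 as a 7.29% decrease, which is a different quantity.

```python
p0 = 0.05    # baseline probability (hypothetical value from #3)
me = 0.0729  # marginal effect of log_x (hypothetical value from #3)

# A marginal effect is an additive change in probability (#3, #14)
expected = p0 + me

# #10's formula p0 * (1 - r) applies a relative (percent) change r;
# using the marginal effect as r mixes up the two kinds of change
misapplied = p0 * (1 - me)

print(f"additive (what #3 computes): {expected:.4f}")
print(f"relative (misapplied):       {misapplied:.4f}")
```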



                            • #15
                              Hi, I posted a similar question which I am still struggling on and would really appreciate some help: https://www.statalist.org/forums/for...a-probit-model
                              I've posted the question below too:

                              I have an explanatory variable in log format ln(income) and the dependent variable, y, is a dummy variable (74% of observations are y=1).

                              I initially use a linear probability model and the coefficient on ln(income) is 0.00875. I have interpreted this as: the probability of y=1 associated with a 1% increase in income is a 0.0000875% point increase (basically no effect)

                              The marginal effect at means on the probit model on ln(income) is 0.00907. I have interpreted this as: the probability of y=1 associated with a 172% increase in income is a 0.00907% point increase.
                              Therefore, the probability of y=1 associated with a 1% increase in income is a 0.00907/172= 0.000053% point increase (basically no effect).

                              I was wondering if this is the right interpretation and if so, can I just say there is no effect of household income on y=1?
                              Many thanks in advance
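A small Python sketch of the 1% arithmetic, using the coefficient and marginal effect quoted in the post. It relies on the standard fact that a 1% increase in income raises ln(income) by ln(1.01) ≈ 0.00995. It treats both numbers as local slopes with respect to ln(income), which for the probit is an approximation. Keep track of units: a change of about 0.000087 in a probability is about 0.0087 percentage points. Dividing by 172 gives a smaller number than scaling by ln(1.01). This is because 172% corresponds to a full unit of ln(income), and percent changes in income do not map linearly onto changes in ln(income).

```python
import math

dlog_1pct = math.log(1.01)  # change in ln(income) for a 1% rise in income

lpm_coef = 0.00875   # LPM coefficient on ln(income), from the post
probit_me = 0.00907  # probit marginal effect at means on ln(income), from the post

# Approximate change in probability (a proportion between 0 and 1) for a 1% rise
lpm_dp = lpm_coef * dlog_1pct
probit_dp = probit_me * dlog_1pct

# The same changes in percentage points (multiply a proportion by 100)
print(f"LPM:    {lpm_dp:.7f} in probability = {100 * lpm_dp:.5f} percentage points")
print(f"probit: {probit_dp:.7f} in probability = {100 * probit_dp:.5f} percentage points")

# Dividing by 172 assumes the effect is linear in the percent change in income
print(f"probit_me / 172 = {probit_me / 172:.7f}")
```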
