
  • Interaction effect between two continuous variables in xtlogit RE using average marginal effects

    Hi guys,

    I am currently working on my master's thesis and I have encountered an issue.

    Basically, my dependent variable is binary, and thus far I have just used xtreg with random effects for the regression. However, my supervisor told me that a logit model would be preferred when the outcome variable is binary. Since I am testing for an interaction effect between two continuous variables, I have thus far used the command
    *margins, dydx(X) at(Z = (1 2 4))*
    to see how the slope changes for different values of Z. However, I have heard that testing interaction effects in a logit model is somewhat more difficult. Can I also use this command for the logit model, or do I have to use another command? I have also heard about using odds ratios, but this was a bit confusing for me.

    I appreciate all answers and thanks in advance,

    Marco
    Last edited by Marco Denter; 12 Jun 2019, 08:28.

  • #2
    If you are not familiar with odds ratios, you need to pull out an introductory statistics book and read up on them. The odds ratio is the "natural" metric for logistic regression models, and you will flounder using logistic regression until you learn about odds ratios.
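    To make the odds-ratio metric concrete, here is a quick numerical sketch (in Python rather than Stata, and with made-up probabilities, purely for illustration): the odds are p/(1 - p), and the odds ratio that logistic regression reports is the exponential of the underlying coefficient.

```python
import math

def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

# Suppose the outcome probability is 0.20 at X = x and 0.25 at
# X = x + 1 (made-up numbers, not from any real model).
odds_low = odds(0.20)               # 0.20 / 0.80 = 0.25
odds_high = odds(0.25)              # 0.25 / 0.75 = 1/3
odds_ratio = odds_high / odds_low   # (1/3) / 0.25 = 4/3

# In a logistic regression, the odds ratio for X is exp(b_X), so the
# coefficient consistent with this odds ratio is its logarithm:
b_x = math.log(odds_ratio)
print(odds_ratio, b_x)
```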

    There is nothing particularly complicated about testing interactions in logistic models. The -margins, dydx(...)- command gives you estimates of the marginal effect of X at your chosen values of Z. It's perfectly good syntax and a perfectly good way to look at the results of a logistic interaction model. But it's not a test of anything.

    The American Statistical Association recommends, and I strongly agree, that we shouldn't be doing significance tests at all any more, that the whole concept is just flawed. Read https://www.tandfonline.com/doi/full...5.2019.1583913 when you get a chance. But let's put that aside on the assumption that you are being supervised by somebody who hasn't yet gotten that memo.

    Even in the absence of an interaction term in the model at all, the marginal effect of X on the probability of your outcome will be different at different values of Z. That's because the logit link is non-linear, and different values of Z change the "baseline" outcome probability, which in turn changes the marginal effect of X. So, in this sense, when looking at predicted probabilities, the marginal effect of any variable in the model will differ when the values of other variables are different, even when there are no interaction terms in the model to begin with. In that sense, the assertion that interaction effects should not be assessed using the output of -margins, dydx()- makes some sense: how can you distinguish the impact of an interaction term from the "pseudo-interaction" that comes just from the non-linearity of the logit link? Those who take this position will tell you that the test for interaction must be based instead on the significance of the coefficient of the interaction term in the -logistic- output.
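    The "pseudo-interaction" point can be seen with a few lines of arithmetic. In a logit model the marginal effect of X is b_X * p * (1 - p), so a small Python sketch (all coefficients made up; in Stata, -margins- does this calculation for you) shows the marginal effect changing across values of Z even though the model has no interaction term:

```python
import math

def invlogit(xb):
    """Inverse logit: converts the linear predictor to a probability."""
    return 1 / (1 + math.exp(-xb))

# A logit model with NO interaction term (made-up coefficients):
# xb = b0 + b_x * X + b_z * Z
b0, b_x, b_z = -2.0, 0.5, 0.4

def marginal_effect_of_x(x, z):
    """dP/dX = b_x * p * (1 - p) for a logit model."""
    p = invlogit(b0 + b_x * x + b_z * z)
    return b_x * p * (1 - p)

# The marginal effect of X differs at Z = 1, 2, 4 purely because Z
# shifts the baseline probability, not because of any interaction:
for z in (1, 2, 4):
    print(z, round(marginal_effect_of_x(1.0, z), 4))
```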

    That coefficient is the logarithm of the ratio of odds ratios--which is a bit of a mouthful and can be a bit difficult to wrap your mind around. Now, if you're really doing just a significance test, it doesn't really matter what the metric is: it's either significant or it's not. The drawback is that if you're actually interested in the magnitude of the interaction effect, the log ratio of odds ratios is a pretty unfamiliar metric for most people. At least in my line of work, we are usually interested in the magnitude of the interaction effect, along with some measure of uncertainty, and we want it in the probability metric. In this situation, the output of -margins, dydx(X) at(Z = (1 2 4))- tells us precisely what we want. (Or if we really want the differences in marginal effect of X at Z = 1, 2, or 4, just add the -pwcompare- option to that command to get those.) These numbers are usually easier to understand and are usually the kind of statistics that you would make practical decisions from. But it is, at least conceptually, different from testing the significance of an interaction term.
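    The "logarithm of the ratio of odds ratios" claim can be verified directly with made-up coefficients. In this Python sketch (not Stata output; in Stata the interaction coefficient would come from a c.X#c.Z term), the odds ratio for a one-unit increase in X depends on Z, and the log of the ratio of two such odds ratios one unit of Z apart recovers exactly the interaction coefficient:

```python
import math

# Logit model WITH an interaction: xb = b0 + b1*X + b2*Z + b3*X*Z
# (made-up coefficients, for illustration only)
b0, b1, b2, b3 = -1.0, 0.3, 0.2, 0.15

def or_for_x(z):
    """Odds ratio for a one-unit increase in X, holding Z fixed at z."""
    return math.exp(b1 + b3 * z)

# Ratio of odds ratios at Z = 2 versus Z = 1; its log is b3:
ratio_of_ors = or_for_x(2) / or_for_x(1)
print(math.log(ratio_of_ors))  # recovers the interaction coefficient, 0.15
```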

    I can also tell you that in my experience, it is rare for the -margins, dydx(X) at(Z = (1 2 4)) pwcompare- p-values to be much different from the p-value associated with the coefficient of the interaction term in the logistic regression output. You can create artificial data sets in which they differ a great deal, but in real-world applications this doesn't seem to happen much. Personally, if confronted with a real-life situation in which they did differ, I would rely on the -margins, dydx()- results, even though, conceptually, they are not testing the interaction term. I'm more interested in the consequences of different outcome probabilities than with the theoretically more correct estimate of causal effect in a metric that is not helpful for decision making. YMMV.



    • #3
      Hi Clyde,

      Thank you very much for your detailed answer, it helps a lot. However, I have another question regarding the interpretation of odds. As I mentioned, my dependent variable is binary and my three independent variables are all continuous. Now, after running a regression, I got different kinds of odds ratios. Whereas the interpretation of odds ratios above 1 seems fairly easy, I am not too sure about the interpretation of odds ratios below 1.
      If I have an odds ratio of .843 for one of my independent variables (X), do I basically say 1 - 0.843 = 0.157, and therefore a one-unit increase in X decreases the odds of Y by 15.7%? Would that be the correct approach?



      • #4
        If I have a odds ratio of .843 for one of my independent variables (X), do I basically say 1-0.843 = 0.157 therefore a one unit increase in X decreases the odds of Y by 15.7%? Would that be the correct approach?
        That is one way of saying it, and you will find it said that way commonly.

        But it is not strictly speaking correct. When your X variable is continuous, it is better not to think about a unit change in X: depending on the scale, that might be so small as to be undetectable, or so large as to be unachievable, or anywhere in between. With continuous variables it is better to think of the coefficient as the rate of change in the log odds of outcome per unit of X. So, even if I am planning a journey of several light years, or only going to my next door neighbor, it is meaningful to measure my speed in km/hr: say that aloud--kilometers per hour. Another thing to take away from this is that even though my car's speedometer may say that I am traveling at 80 km/hr, it does not follow that I will travel 80 km in the next hour, because my speed may change. So I would say it as: the odds of a positive outcome decrease by 15.7% per unit change in X. (And I would actually mention what the unit of measurement of X is.) This way of thinking and speaking about it is more accurate, and applies equally to odds ratios above or below 1.
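        The multiplicative nature of that "per unit" rate is worth checking with a quick calculation (Python, using the .843 figure from above): the 15.7% decrease applies for each unit of X, so over several units the effect compounds rather than adding up.

```python
# An odds ratio of 0.843 multiplies the odds by 0.843 for EACH unit of X.
odds_ratio = 0.843

# One-unit change: odds fall by 1 - 0.843 = 0.157, i.e. 15.7%.
one_unit_change = 1 - odds_ratio

# Over, say, 3 units the effect compounds multiplicatively;
# it is NOT 3 * 15.7% = 47.1%:
three_unit_factor = odds_ratio ** 3        # about 0.599
three_unit_change = 1 - three_unit_factor  # about 0.401, a 40.1% drop
print(one_unit_change, three_unit_change)
```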



        • #5
          Thank you for the clarification, I will keep it in mind when interpreting my output.

          Cheers,

          Marco
