
  • Logistic model: Expressing influence of predictor in maximally simple-clear practical terms.

    Points:

    1. Logistic model with 10 predictors, the key one of which is dichotomous
    2. Key independent var: taking a specific class
    3. Depvar: graduating
    4. My nonspecialist audience operates “on the ground” and won’t care what an "odds ratio" is. Magnitude of the key predictor influence must be expressed in maximally simplistic-intuitive terms.
    5. I don’t know “the alternatives,” but what seems intuitive to me is expressing the change in probability associated with taking the class, like a change in the chance of rain. Put differently, that number would be meaningful to me, and I think to my audience, if the probability difference (bump) is “in the same unit” as the raw descriptive probability of the outcome (graduating) and can be compared against it.
    6. I also welcome suggestions on how to phrase the magnitude of interest for my “on the ground” audience.
    7. Side note: my presentation will be relatively short

    Suggestions?

    Much thanks in advance.

  • #2
    When running your logistic model, use factor variable notation. (See -help fvvarlist- for details.) In particular, make sure you prefix your key variable with i. so that Stata will know it is discrete. After that, you can get the predicted probabilities of graduation by running -margins whatever- where you replace whatever by the actual name of your key variable. And if you want Stata to calculate the difference in predicted probabilities for you, that is -margins, dydx(whatever)-.

    I would express the results as: "Adjusting for the 9 other predictors, the expected probability of graduation for those who do not take the class is X% (95% CI ...), and for those who do take the class it is Y% (95% CI ...). The difference between these is Z percentage points (95% CI ...)." X and Y and their confidence intervals should be filled in from the output of the first -margins- command, and Z and its confidence interval from that of the second.
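
    As a minimal sketch of those two -margins- calls, here they are run on Stata's low-birth-weight example data rather than on the graduation data. The variables low, smoke, age, lwt, and race are only stand-ins for your outcome, your key predictor, and your other covariates.

    ```stata
    * Stand-in data: low = outcome, smoke = dichotomous key predictor
    webuse lbw, clear

    * i. marks discrete predictors; c. marks continuous ones
    logit low i.smoke c.age c.lwt i.race

    * Adjusted predicted probability at smoke = 0 and at smoke = 1 (X and Y)
    margins smoke

    * Difference between those two predicted probabilities (Z)
    margins, dydx(smoke)
    ```

    -margins- reports these on the probability scale (0 to 1), so multiply by 100 before quoting them as percentages or percentage points.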

    If you would like more information on the -margins- command, I recommend reading Richard Williams' excellent https://www3.nd.edu/~rwilliam/stats/Margins01.pdf. It is crystal clear and contains several nicely worked examples.


    • #3
      Clear. Awesome.

      I am on my way!


      • #4
        This solution is great for dichotomous variables.

        Does this approach, or an analogous one, work for continuous variables?


        • #5
          Not sure what you're asking here. If you run -logit- with a continuous outcome variable, Stata treats it as 0 = false and everything else = true, which is almost never a sensible analysis.

          Assuming you mean running an ordinary linear regression, or some general linear model appropriate to continuous variables, you can use -margins- afterwards and you will get predicted outcomes, by whatever, adjusted for the other variables in the model. Now, you have to be a little cautious here. Whatever regression command you are going to use, look at the help file for that command's postestimation, and select margins. Then look and see what the default prediction is. The default may not be the statistic you want, and you may have to specify something different in -margins-' -predict()- option. But other than this detail, the approach is the same.
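
          A minimal sketch of that, again on Stata's low-birth-weight example data, where bwt (birth weight in grams) stands in for a continuous outcome and smoke for "whatever". After -regress- the default prediction is the linear prediction, so the two -margins- calls below should give the same result. After other commands the default may differ, which is why it is worth naming it in -predict()-.

          ```stata
          webuse lbw, clear

          * Linear regression with a continuous outcome
          regress bwt i.smoke c.age c.lwt

          * Adjusted predicted outcome by smoke, using the default prediction
          margins smoke

          * Same request with the prediction named explicitly
          margins smoke, predict(xb)
          ```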


          • #6
            Sorry for the incompleteness of my question. At the moment I had in mind not the question of the outcome variable, but of how to use a "margins-like" approach (per above) for a continuous predictor (in the same logistic-style model). My understanding of -margins- is that it works for binary but not for continuous predictors. I find myself having a nice way to summarize the effect of the binary predictors and wonder about having an analogous nice way for the continuous predictors.


            • #7
              Well, in fact, -margins- can be used for continuous predictors as well. But the syntax is a bit different, and it also requires a little bit more thought. With a dichotomous predictor, the effect is clear cut: the predictor is either 0 or 1, and the effect is the difference between the corresponding expected outcomes. But with a continuous predictor there is no natural difference.

              Now, the simplistic thought would be: well, let's say that my continuous predictor can, in real life, range between 55 and 191 (I made up those numbers just to demonstrate something). One might pick a number in the middle of that range, like their average, 123, and then look at the difference between the expected outcome when the predictor = 123 and the expected outcome when the predictor = 124. There are two serious problems with this approach, however. One of them is that because the logistic model is non-linear, the difference between E(outcome | X = 124) and E(outcome | X = 123) will, in general, not be the same as the difference between E(outcome | X = 130) and E(outcome | X = 129). In fact, each pair of values X, X+1 will give you a different difference in expected outcomes (unless the logistic regression coefficient for X is 0). This is the peril of non-linearity. There is another, lesser, problem in that a 1 unit change in X may not be particularly relevant here, particularly if X is on a large scale. So something more complicated has to be done.
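
              To see the non-linearity in numbers, here is a small sketch with made-up coefficients (b0 and b1 are hypothetical and come from no fitted model) for a logistic model with X as its only predictor. It uses Stata's invlogit() function to turn the linear predictor into a probability.

              ```stata
              * Hypothetical intercept and slope, chosen only for illustration
              scalar b0 = -6.15
              scalar b1 = 0.05

              * Change in predicted probability for a 1-unit step, at three places in the range
              display invlogit(b0 + b1*124) - invlogit(b0 + b1*123)
              display invlogit(b0 + b1*130) - invlogit(b0 + b1*129)
              display invlogit(b0 + b1*191) - invlogit(b0 + b1*190)
              ```

              The three displayed differences are not equal, even though the coefficient on X is the same throughout.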

              There are several approaches. One is to plot a graph of the outcome probability as a function of X, adjusted for all the other model variables. To do that you have to pick a reasonably large number of values of X that span the range of that variable. So to continue our made up example of X ranging from 55 to 191 you might do something like this:
              Code:
              margins, at(X = (55(10)190))
              marginsplot
              Note that the -marginsplot- command accepts nearly all options available in -graph twoway-, so you can customize this graph's appearance to your liking. The code as shown, with no options specified, will give a reasonably attractive, clear graph. If your audience is visually oriented, this may well be the best way to go.

              A slightly more abstract approach is to calculate the "average marginal effect" of X. Recall I said above that different pairs X and X+1 give different differences in expected outcome, so no one result of that kind can really be said to summarize the effect of X. The average marginal effect averages all of those possibilities together. The code for that is:

              Code:
              margins, dydx(X)
              Now, I have to confess that I have been less than 100% truthful up to this point. With continuous variables, the marginal effect is not actually defined as the difference in expected outcomes at X+1 and at X. Instead, the marginal effect is defined using calculus: it is the first derivative of the expected outcome with respect to X, evaluated at the specified value (or, in the case of an average marginal effect, averaged over the observations in the estimation sample, each taken at its own observed values). This is probably not something I would share with a non-technical audience. But there is an easy way to explain it. Suppose we were traveling in a car. The speedometer at this moment reads 80 km/hr. This is the analog of a "marginal effect." But we know that in real road trips, we do not maintain a constant speed at all times, just as in non-linear models like logistic, we do not maintain a constant marginal effect as X changes. So knowing that our speed at this moment is 80 km/hr does not imply that in one hour we will actually travel 80 km. Still, people are comfortable with the notion of our speed at the moment. And this is what a marginal effect is.

              So if, say, the result of -margins, dydx(X)- is 0.07, this means that, on average, the rate of increase in the probability of the outcome is 0.07 (i.e. 7 percentage points) per unit difference in X.
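
              If it helps to see what that number is made of, here is a minimal sketch on Stata's low-birth-weight example data, with age standing in for X. It assumes X enters the model linearly with no interactions, in which case the derivative of the predicted probability p with respect to X is b*p*(1-p), where b is the logit coefficient on X. The variable names p and me_age are my own.

              ```stata
              webuse lbw, clear
              logit low c.age c.lwt i.smoke

              * Average marginal effect of age as reported by -margins-
              margins, dydx(age)

              * Same quantity by hand: the derivative for each observation, then averaged
              predict double p if e(sample), pr
              generate double me_age = _b[age] * p * (1 - p)
              summarize me_age
              ```

              The mean reported by -summarize- should agree with the point estimate from -margins, dydx(age)-. The advantage of -margins- is that it also supplies the standard error and confidence interval.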

              Another approach, instead of using the average marginal effect, is to pick several interesting or representative values of X and present the marginal effects at those particular values. So, if interesting values of X are 80, 100, and 120:
              Code:
              margins, dydx(X) at(X = (80 100 120))
