  • Visualizing and determining curvature in multiple logistic regression

    Dear everyone,

    Could you share some Stata commands to visualize whether there is curvature in a multiple logistic regression?

    All five predictor variables are continuous. I want to determine whether each is individually nonlinearly related (curvature) to the binary outcome, fit an appropriate multiple logistic regression with polynomial terms, and assess the model fit.

    I would be glad to get links to any comprehensive sources or Stata tutorials on detecting curvature and fitting polynomial logistic regressions. Thank you.

  • #2
    It's going to be curved (and oddly so) by the nature of the model.

    Not sure how legit this is, but maybe a start.

    Code:
    clear all
    sysuse auto, clear
    summ length                                   // mean of length, used in at() below
    local lmean = r(mean)
    mlogit rep78 weight length
    * predicted probabilities at each observed value of weight,
    * with length held at its sample mean
    margins, over(weight) at(length = `lmean')
    marginsplot, noci

    Comment


    • #3
      Code:
      h lowess
      h lpoly

      Comment


      • #4
        I thought about lowess on the model's predictions, but I couldn't square that with a multivariable model. It could be possible, I suppose.

        Comment


        • #5
          Originally posted by Rich Goldstein View Post
          Code:
          h lowess
          h lpoly
          Thank you for your response, but I don't understand this. Could you elaborate? Thank you.

          Comment


          • #6
            Originally posted by George Ford View Post
            It's going to be curved (and oddly so) by the nature of the model.

            Not sure how legit this is, but maybe a start.

            Code:
            clear all
            sysuse auto, clear
            summ length
            local lmean = r(mean)
            mlogit rep78 weight length
            margins, over(weight) at(length = `lmean')
            marginsplot , noci
            Thank you, George. Apologies, my request was not clear enough. I want to fit a logistic regression with polynomial terms, but first I want to plot the relationship between my variables (i.e. between the binary outcome and each of my five continuous predictor variables) to determine whether there is curvature. This would justify including a polynomial term (and its degree) in the logistic regression. I did not mean that I wanted to fit a polytomous logistic regression.

            Comment


            • #7
              Since the outcome variable is dichotomous, all you can get is a clump of points at two spots.

              Maybe cut the predictors into deciles and plot the mean of the DV to look for patterns. Or use the residual of a linear probability model to look for patterns.
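
              A minimal sketch of the decile idea, assuming a binary outcome y and one continuous predictor x1 (placeholder names, not from the original post); repeat for each predictor:

              Code:
              * deciles of x1, then mean of y within each decile
              xtile dec_x1 = x1, nq(10)
              preserve
              collapse (mean) meany = y (mean) meanx = x1, by(dec_x1)
              twoway connected meany meanx, sort ytitle("Mean of y") xtitle("Mean of x1 within decile")
              restore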

              Perhaps the easiest way is to run the model with quadratics; if the coefficients on the quadratics are poorly estimated, exclude them and move on.

              I'd be asking myself why add quadratics at all. In linear models I can see it, but logit/probit are already non-linear models.

              Also, make sure you know how to interpret the quadratic term and that it tells you what you want. I suspect it's not as straightforward as in a linear model. If you added a cubic term, it might be really difficult to understand the results.
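
              A minimal sketch of the run-it-with-quadratics idea above, assuming a binary outcome y and five continuous predictors x1-x5 (placeholder names); factor-variable notation lets margins handle the squared terms correctly:

              Code:
              * logit with a quadratic in each predictor
              logit y c.x1##c.x1 c.x2##c.x2 c.x3##c.x3 c.x4##c.x4 c.x5##c.x5
              * joint test of the squared terms
              testparm c.x1#c.x1 c.x2#c.x2 c.x3#c.x3 c.x4#c.x4 c.x5#c.x5
              * implied shape in one predictor on the probability scale
              * (adjust the grid to the observed range of x1)
              margins, at(x1 = (0(10)100))
              marginsplot, noci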


              Comment


              • #8
                The point about local regression smoothers, e.g. -lowess-, is that they can help visualize whether the relationship between a predictor and the outcome (even if the outcome is binary) is linear. See the examples in the manual (you can get there by clicking the blue link at the top of the help file). Note that the use of -lowess- and -lpoly- for examining possible non-linearity of such a relationship has been discussed many times on Statalist, and you can find those threads by doing a search.

                I believe that George Ford and I are possibly interpreting #1 differently, so be careful.
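
                For instance, a quick sketch of the mechanics with the auto data, using the binary variable foreign as a stand-in outcome and mpg as a predictor:

                Code:
                sysuse auto, clear
                * smoothed probability of foreign as a function of mpg
                lowess foreign mpg
                * local-polynomial alternative; degree(0) is a running mean
                lpoly foreign mpg, degree(0)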

                Comment


                • #9
                  Originally posted by George Ford View Post
                  Since the outcome variable is dichotomous, all you can get is a clump of points at two spots.

                  Maybe cut the predictors into deciles and plot the mean of the DV to look for patterns. Or use the residual of a linear probability model to look for patterns.

                  Perhaps the easiest way is to run the model with quadratics; if the coefficients on the quadratics are poorly estimated, exclude them and move on.

                  I'd be asking myself why add quadratics at all. In linear models I can see it, but logit/probit are already non-linear models.

                  Also, make sure you know how to interpret the quadratic term and that it tells you what you want. I suspect it's not as straightforward as in a linear model. If you added a cubic term, it might be really difficult to understand the results.

                  Thank you for the suggestions. I will explore them.

                  Comment


                  • #10
                    Originally posted by Rich Goldstein View Post
                    The point about local regression smoothers, e.g. -lowess-, is that they can help visualize whether the relationship between a predictor and the outcome (even if the outcome is binary) is linear. See the examples in the manual (you can get there by clicking the blue link at the top of the help file). Note that the use of -lowess- and -lpoly- for examining possible non-linearity of such a relationship has been discussed many times on Statalist, and you can find those threads by doing a search.

                    I believe that George Ford and I are possibly interpreting #1 differently, so be careful.
                    Good to know that the topic has been discussed extensively. I will search for the discussion. Thank you.

                    Comment


                    • #11
                      My take here is close to that of Rich Goldstein.

                      In ecology the combination of logit and a quadratic in one or more predictors is utterly standard and has been given a name: Gaussian logit. The name arises from the fact that a parabola in the space (logit proportion, x) is a bell-like curve in the space (proportion, x). (Some don't like the name for good reasons, but it is a good phrase for searching.)
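
                      A tiny sketch of that point, plotting the inverse logit of a made-up parabola (the coefficients are arbitrary, purely for illustration):

                      Code:
                      * a parabola on the logit scale is a bell-like curve on the probability scale
                      twoway function y = invlogit(-1 + 2*x - 0.5*x^2), range(-2 6) ytitle("Probability")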

                      The idea grows out of something familiar to every gardener or amateur naturalist.

                      For any given species (taxon, more generally), it can be too hot or too cold, or just about right. Too wet ... too dry; too saline ... not saline enough. You get the picture. Abundance therefore is at a maximum at or near what is optimum for temperature, moisture, salinity or any other control.

                      What I would do is cycle through your predictors and use lpoly to smooth your binary outcome as a function of each in turn.

                      You could do that with a command like combineplot from SSC.
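
                      A minimal sketch of that loop using a plain foreach and graph combine (the combineplot command from SSC mentioned above is another way to arrange the panels); y and x1-x5 are placeholder names, not from the original post:

                      Code:
                      local i = 0
                      foreach v of varlist x1 x2 x3 x4 x5 {
                          local ++i
                          * degree(0) local-mean smooth of the binary outcome on each predictor
                          lpoly y `v', degree(0) name(g`i', replace) nodraw
                      }
                      graph combine g1 g2 g3 g4 g5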

                      Notes:

                      Cubics and quartics and on and upward are dangerous. That way lies the peril of nonsense or over-fitting. I wish this was better understood in some quarters.

                      You don't have to have a turning point for quadratics to be useful. The most common examples on Statalist seem to have the flavour that a quadratic just imparts some curvature. A turning point is implied, but it is way outside the range of the data. We're just talking empirical fits, not Newtonian mechanics.

                      You may or may not want or need to look at interactions too.

                      Comment
