  • Visualizing and determining curvature in multiple logistic regression

    Dear everyone,

    Could you share some Stata commands to visualize whether there is curvature in a multiple logistic regression?

    All five predictor variables are continuous. I want to determine whether each is individually nonlinearly related (curvature) to the binary outcome, fit an appropriate multiple logistic regression with polynomial terms, and assess the model fit.

    I would be glad to get links to any comprehensive sources or Stata tutorials on detecting curvature and fitting polynomial logistic regressions. Thank you.

  • #2
    It's going to be curved (and oddly so) by the nature of the model.

    Not sure how legit this is, but maybe a start.

    Code:
    clear all
    sysuse auto, clear
    summ length                                   // mean of length, used in at() below
    local lmean = r(mean)
    mlogit rep78 weight length
    * predicted probabilities at each observed value of weight,
    * with length held at its sample mean
    margins, over(weight) at(length = `lmean')
    marginsplot, noci

    Comment


    • #3
      Code:
      h lowess
      h lpoly

      Comment


      • #4
        I thought about lowess on the model's predictions, but I couldn't square that with a multivariable model. It could be possible, I suppose.

        Comment


        • #5
          Originally posted by Rich Goldstein View Post
          Code:
          h lowess
          h lpoly
          Thank you for your response, but I don't understand this. Could you elaborate? Thank you.

          Comment


          • #6
            Originally posted by George Ford View Post
            It's going to be curved (and oddly so) by the nature of the model.

            Not sure how legit this is, but maybe a start.

            Code:
            clear all
            sysuse auto, clear
            summ length
            local lmean = r(mean)
            mlogit rep78 weight length
            margins, over(weight) at(length = `lmean')
            marginsplot , noci
            Thank you, George. Apologies, my request was not clear enough. I want to fit a logistic regression with polynomial terms, but first I want to plot the relationship between my variables (i.e. between the binary outcome and each of my five continuous predictor variables) to determine whether there is curvature. This would justify including a polynomial term (and its degree) in the logistic regression. I did not mean that I wanted to fit a polytomous logistic regression.

            Comment


            • #7
              Since the outcome variable is dichotomous, all you can get is a clump of points at two spots.

              Maybe cut the predictors into deciles and plot the mean of the DV to look for patterns. Or use the residual of a linear probability model to look for patterns.
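
              A minimal sketch of the decile idea, assuming a binary outcome y and one continuous predictor x1 (placeholder names, not from the original post); repeat for each predictor:

              Code:
              * deciles of x1, then mean of y within each decile
              xtile dec_x1 = x1, nq(10)
              preserve
              collapse (mean) meany = y (mean) meanx = x1, by(dec_x1)
              twoway connected meany meanx, sort ytitle("Mean of y") xtitle("Mean of x1 within decile")
              restore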

              Perhaps the easiest way is to run the model with quadratics; if the coefficients on the quadratics are poorly estimated, exclude them and move on.

              I'd be asking myself why add quadratics at all. In linear models I can see it, but logit/probit are already non-linear models.

              Also, make sure you know how to interpret the quadratic term and that it tells you what you want. I suspect it's not as straightforward as in a linear model. If you added a cubic term, it might be really difficult to understand the results.
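
              A minimal sketch of the run-it-with-quadratics idea above, assuming a binary outcome y and five continuous predictors x1-x5 (placeholder names); factor-variable notation lets margins handle the squared terms correctly:

              Code:
              * logit with a quadratic in each predictor
              logit y c.x1##c.x1 c.x2##c.x2 c.x3##c.x3 c.x4##c.x4 c.x5##c.x5
              * joint test of the squared terms
              testparm c.x1#c.x1 c.x2#c.x2 c.x3#c.x3 c.x4#c.x4 c.x5#c.x5
              * implied shape in one predictor on the probability scale
              * (adjust the grid to the observed range of x1)
              margins, at(x1 = (0(10)100))
              marginsplot, noci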


              Comment


              • #8
                The point about local regression smoothers, e.g. -lowess-, is that they can help visualize whether the relationship between a predictor and the outcome (even if the outcome is binary) is linear. See the examples in the manual (you can get there by clicking the blue link at the top of the help file). Note that the use of -lowess- and -lpoly- for examining possible non-linearity of such a relationship has been discussed many times on Statalist, and you can find those threads by doing a search.

                I believe that George Ford and I are possibly interpreting #1 differently, so be careful.
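
                For instance, a quick sketch of the mechanics with the auto data, using the binary variable foreign as a stand-in outcome and mpg as a predictor:

                Code:
                sysuse auto, clear
                * smoothed probability of foreign as a function of mpg
                lowess foreign mpg
                * local-polynomial alternative; degree(0) is a running mean
                lpoly foreign mpg, degree(0)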

                Comment


                • #9
                  Originally posted by George Ford View Post
                  Since the outcome variable is dichotomous, all you can get is a clump of points at two spots.

                  Maybe cut the predictors into deciles and plot the mean of the DV to look for patterns. Or use the residual of a linear probability model to look for patterns.

                  Perhaps the easiest way is to run the model with quadratics; if the coefficients on the quadratics are poorly estimated, exclude them and move on.

                  I'd be asking myself why add quadratics at all. In linear models I can see it, but logit/probit are already non-linear models.

                  Also, make sure you know how to interpret the quadratic term and that it tells you what you want. I suspect it's not as straightforward as in a linear model. If you added a cubic term, it might be really difficult to understand the results.

                  Thank you for the suggestions. I will explore them.

                  Comment


                  • #10
                    Originally posted by Rich Goldstein View Post
                    The point about local regression smoothers, e.g. -lowess-, is that they can help visualize whether the relationship between a predictor and the outcome (even if the outcome is binary) is linear. See the examples in the manual (you can get there by clicking the blue link at the top of the help file). Note that the use of -lowess- and -lpoly- for examining possible non-linearity of such a relationship has been discussed many times on Statalist, and you can find those threads by doing a search.

                    I believe that George Ford and I are possibly interpreting #1 differently, so be careful.
                    Good to know that the topic has been discussed extensively. I will search for the discussion. Thank you.

                    Comment


                    • #11
                      My take here is close to that of Rich Goldstein.

                      In ecology the combination of logit and a quadratic in one or more predictors is utterly standard and has been given a name: Gaussian logit. The name arises from the fact that a parabola in the space (logit proportion, x) is a bell-like curve in the space (proportion, x). (Some don't like the name for good reasons, but it is a good phrase for searching.)
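
                      A tiny sketch of that point, plotting the inverse logit of a made-up parabola (the coefficients are arbitrary, purely for illustration):

                      Code:
                      * a parabola on the logit scale is a bell-like curve on the probability scale
                      twoway function y = invlogit(-1 + 2*x - 0.5*x^2), range(-2 6) ytitle("Probability")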

                      The idea grows out of something familiar to every gardener or amateur naturalist.

                      For any given species (taxon, more generally), it can be too hot or too cold, or just about right. Too wet ... too dry; too saline ... not saline enough. You get the picture. Abundance therefore is at a maximum at or near what is optimum for temperature, moisture, salinity or any other control.

                      What I would do is cycle through your predictors and use lpoly to smooth your binary outcome as a function of each in turn.

                      You could do that with a command like combineplot from SSC.
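
                      A minimal sketch of that loop using a plain foreach and graph combine (the combineplot command from SSC mentioned above is another way to arrange the panels); y and x1-x5 are placeholder names, not from the original post:

                      Code:
                      local i = 0
                      foreach v of varlist x1 x2 x3 x4 x5 {
                          local ++i
                          * degree(0) local-mean smooth of the binary outcome on each predictor
                          lpoly y `v', degree(0) name(g`i', replace) nodraw
                      }
                      graph combine g1 g2 g3 g4 g5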

                      Notes:

                      Cubics and quartics and on and upward are dangerous. That way lies the peril of nonsense or over-fitting. I wish this was better understood in some quarters.

                      You don't have to have a turning point for quadratics to be useful. The most common examples on Statalist seem to have the flavour that a quadratic just imparts some curvature. A turning point is implied, but it is way outside the range of the data. We're just talking empirical fits, not Newtonian mechanics.

                      You may or may not want or need to look at interactions too.

                      Comment
