Multiple logistic regression issues

Rey Yeo

Join Date: Dec 2015

Posts: 25
#1

04 Mar 2016, 03:27

Hi,

I would like to check whether my technique for variable selection in multiple logistic regression is sound.

I have an outcome and several predictors, some continuous and some categorical. I used univariate logistic regression to screen the predictors and kept those with p < 0.10. I then put the variables that passed this screen into a multiple logistic regression model. I was taught not to depend on automatic variable selection, so I remove the variables one by one, starting with the variable with the highest p-value, until I reach a final model in which all p < 0.05.

My question is how to decide whether a categorical variable should be retained in or eliminated from the multiple logistic regression model. Some levels of a categorical variable could have p < 0.05 and others p > 0.05. There may also be two different categorical variables with p > 0.05 at all levels. Is there a test I can rely on to get an overall p-value for a categorical variable? I have heard of a command called testparm. Should it be used at every step for all remaining categorical variables, regardless of the p-values of their individual levels?
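For reference, here is a minimal sketch of what testparm does in this situation: it gives a single joint Wald test that all coefficients of a categorical variable are zero. The dataset (Stata's example low-birth-weight data, loaded with webuse lbw) and the variables low, age, lwt, race and smoke are assumptions used only for illustration; they are not the data described in this post.

```stata
* Assumption: Stata's example low-birth-weight data stand in for the real data
webuse lbw, clear

* Multiple logistic model; race is a categorical predictor with three levels
logistic low age lwt i.race smoke

* One joint Wald test of all race indicators together (2 degrees of freedom),
* instead of reading the p-value of each level separately
testparm i.race
```

The p-value reported by testparm refers to the variable as a whole, so it does not depend on which level was chosen as the reference group, whereas the level-by-level p-values do.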

Thank you.
Rey Yeo

Join Date: Dec 2015

Posts: 25
#2

04 Mar 2016, 04:03

Just to add on ... since I am removing variables one at a time, can I look at the log likelihood reported as well to help decide?

The later model can be said to be nested within the earlier model.
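A sketch of how two nested models can be compared formally with a likelihood-ratio test rather than by eyeballing the reported log likelihoods. The example dataset (webuse lbw), its variables, and the stored names full and reduced are assumptions chosen for illustration.

```stata
* Assumption: Stata's example low-birth-weight data stand in for the real data
webuse lbw, clear

* Larger model, including the three-level categorical variable race
logistic low age lwt i.race smoke
estimates store full

* Smaller model with race dropped; it is nested within the larger model
logistic low age lwt smoke
estimates store reduced

* Likelihood-ratio test of the nested pair (2 degrees of freedom for race)
lrtest full reduced
```

Both models have to be fitted on the same observations for the comparison to be valid. If a dropped variable has missing values, the smaller model would otherwise silently gain observations.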
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17749
#3

04 Mar 2016, 04:14

Rey:
it sounds strange that you were taught to use stepwise regression at all.
You are going to obtain a model that, in all likelihood, has nothing to do with your original data and, as a consequence, its results, significant or not, are weakly reliable at best.
Just to quote one of the most towering members of the "don't do it" party, you may want to take a look at Frank Harrell's Regression Modeling Strategies. 2nd edition. Springer: 67-72.

Kind regards,
Carlo
(Stata 19.0)
Rey Yeo

Join Date: Dec 2015

Posts: 25
#4

04 Mar 2016, 04:34

Originally posted by Carlo Lazzaro

Rey:
it sounds strange that you were taught to use stepwise regression at all.
You are going to obtain a model that, in all likelihood, has nothing to do with your original data and, as a consequence, its results, significant or not, are weakly reliable at best.
Just to quote one of the most towering members of the "don't do it" party, you may want to take a look at Frank Harrell's Regression Modeling Strategies. 2nd edition. Springer: 67-72.

Yes, I suspected it might be a rather simplistic approach, but I was indeed taught to do it manually as described.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17749
#5

04 Mar 2016, 06:23

Rey:
the problem with stepwise variable selection is not that it is simplistic, but that it's biased.
That said, if you want to follow that road, you may consider a more liberal P-value threshold than usual for retaining variables (e.g., keep those with P<0.20).

Kind regards,
Carlo
(Stata 19.0)
Rey Yeo

Join Date: Dec 2015

Posts: 25
#6

04 Mar 2016, 07:22

Originally posted by Carlo Lazzaro

Rey:
the problem with stepwise variable selection is not that it is simplistic, but that it's biased.
That said, if you want to follow that road, you may consider a more liberal P-value threshold than usual for retaining variables (e.g., keep those with P<0.20).

I certainly want to learn the right way.

I am under the impression that manual selection and elimination of variables is preferred over automatic procedures. But if it's not, I hope to get some insight into how to do it properly.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17749
#7

04 Mar 2016, 07:35

Rey:
you may want to start with the -stepwise- entry in the Stata .pdf manual.

Kind regards,
Carlo
(Stata 19.0)
Rey Yeo

Join Date: Dec 2015

Posts: 25
#8

04 Mar 2016, 08:04

Originally posted by Carlo Lazzaro

Rey:
you may want to start with the -stepwise- entry in the Stata .pdf manual.

Is it: http://www.stata.com/manuals14/rstepwise.pdf

Are you recommending a stepwise automatic procedure? I did consider it ... but I have a problem specifying the reference group for my categorical variables.

I can use "xi: stepwise ..." but I can't seem to specify the reference group I want, as I can in "xi: logistic outcome ib1.var ...".
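On the reference-group problem, a minimal sketch: xi does not read the ib#. factor-variable operator, but it does honor a variable characteristic named omit. The example dataset (webuse lbw) and the choice of race == 3 as the reference level are assumptions for illustration only, and the sketch is not an endorsement of stepwise selection.

```stata
* Assumption: Stata's example low-birth-weight data stand in for the real data
webuse lbw, clear

* Ask xi to omit race == 3 as the reference group
* (by default xi omits the smallest value)
char race[omit] 3

* xi now creates _Irace_1 and _Irace_2, with race == 3 as the reference
xi: logistic low age lwt i.race smoke
```

Because the setting is stored as a characteristic of the variable, it should apply whenever xi expands i.race, including under the xi: prefix used with stepwise. The ib1. notation, by contrast, is factor-variable syntax that commands such as logistic understand on their own, without xi.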
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17749
#9

04 Mar 2016, 08:18

Rey:
no, I do not recommend this procedure at all.
I was simply pointing you to an entry you might be interested in.

Kind regards,
Carlo
(Stata 19.0)
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#10

04 Mar 2016, 08:31

"My question is how do I decide if a categorical variable should be retained or eliminated from the multiple logistic regression model."

IMHO, the p-value is not what matters most when deciding whether to include a given covariate in the model. Rather, it is the rationale. For example, in health sciences, I'd never exclude "age" or "gender", no matter the p-values. This is also to say that I agree with Carlo. What is more, including only covariates with "significant" p-values may well produce a "fully significant" model that is nonetheless a far cry from a relevant, well-adjusted, "generalizable" one.

Best,

Marcos
Rey Yeo

Join Date: Dec 2015

Posts: 25
#11

04 Mar 2016, 09:44

It has been a tough journey.

What I have been taught in school ... not to use automatic procedures, to choose variables based on existing evidence ... to use univariate logistic regression first before putting variables into multivariable models ... I spent a lot of time practising these, and now I am not even sure whether they are correct.

When I want to give up and just use automatic procedures ... forward, backward, stepwise ... I find that I cannot specify my reference groups properly for categorical variables. People have also been advising me that these procedures are biased, but no alternatives are offered.

I am caught in between. I am not sure whether there are fixed steps to follow, or whether some judgment calls are inherent, making this a kind of art that cannot be spelt out precisely in textbooks. It's as if no one cares how you do it, as long as the final output is logical and explainable.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17749
#12

04 Mar 2016, 11:46

Rey:
you may hopefully find some relief from your modelling troubles in the literature of your research field: take a look at what others did in the past when presented with the same research topic.

Kind regards,
Carlo
(Stata 19.0)