Hi,
I'd like to check whether my technique for variable selection in multiple logistic regression is robust.
I have an outcome and several predictors, some continuous and some categorical. I used univariate logistic regression to screen the predictors and kept those with p < 0.10. I then put the variables that passed this screen into a multiple logistic regression model. I was taught not to rely on automatic variable selection, so I removed the variables one at a time, starting with the variable with the highest p-value, until I reached a final model in which all p < 0.05.
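The screening-then-elimination procedure described above is essentially a manual backward elimination. A minimal Python sketch of the loop follows. It does not fit any model: `univariate_screen`, `backward_eliminate` and the `term_pvalues` callback are hypothetical names, and the callback is assumed to refit the multiple logistic model on the current terms and return one p-value per term, where a categorical variable counts as a single term (its overall p-value).

```python
from typing import Callable, Dict, List


def univariate_screen(univariate_p: Dict[str, float], threshold: float = 0.10) -> List[str]:
    """Keep predictors whose univariate logistic p-value is below `threshold`."""
    return [term for term, p in univariate_p.items() if p < threshold]


def backward_eliminate(
    terms: List[str],
    term_pvalues: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> List[str]:
    """Repeatedly drop the term with the largest p-value and refit,
    stopping once every remaining term has p < alpha.

    `term_pvalues` is a hypothetical callback that refits the multiple
    logistic model on `terms` and returns one p-value per term.
    """
    current = list(terms)
    while current:
        pvals = term_pvalues(current)          # refit on the current terms
        worst = max(current, key=lambda t: pvals[t])
        if pvals[worst] < alpha:
            break                              # all remaining terms are below alpha
        current.remove(worst)                  # drop exactly one term per step
    return current


if __name__ == "__main__":
    # Made-up p-values for illustration only; a real refit would change them.
    fake = {"age": 0.01, "sex": 0.20, "smoking": 0.04, "region": 0.60}
    kept = backward_eliminate(list(fake), lambda ts: {t: fake[t] for t in ts})
    print(kept)  # ['age', 'smoking']
```

Unlike a real model, the fake p-values above stay fixed between steps; in practice each removal changes the remaining p-values, which is why only one term is dropped before refitting.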
My question is how to decide whether a categorical variable should be retained in or eliminated from the multiple logistic regression model. Some levels of a categorical variable may have p < 0.05 and others p > 0.05, and two different categorical variables may have p > 0.05 at all levels. Is there a test I can rely on to get an overall p-value for a categorical variable? I have heard of a command called testparm. Should it be used at every step for all remaining categorical variables, regardless of the p-values of their individual levels?
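An overall p-value for a categorical variable comes from testing all of its dummy (indicator) coefficients jointly. After a logit fit, Stata's testparm reports a joint Wald test of this kind: W = b' V^-1 b, compared with a chi-square distribution whose degrees of freedom equal the number of dummies. The stdlib-only Python sketch below computes that statistic; the coefficient vector and covariance matrix are hypothetical inputs assumed to be taken from the fitted multiple logistic model, and the helper names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i=0}^{df/2 - 1} y**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{j=1}^{(df-1)/2} y**(j - 1/2) / Gamma(j + 1/2)
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def joint_wald_test(beta: Sequence[float], cov: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Wald test that all dummy coefficients of one categorical variable are zero.

    beta: the variable's dummy coefficients; cov: their covariance block.
    Returns (W, p) with W = beta' cov^-1 beta and df = len(beta).
    """
    w = sum(bi * xi for bi, xi in zip(beta, solve(cov, beta)))
    return w, chi2_sf(w, len(beta))


if __name__ == "__main__":
    # Hypothetical 3-level variable (2 dummies); diagonal covariance for simplicity.
    w, p = joint_wald_test([0.5, 0.3], [[0.04, 0.0], [0.0, 0.09]])
    print(w, p)  # W ≈ 7.25, p ≈ 0.027
```

In a real fit the dummies share a reference category, so their covariance block usually has nonzero off-diagonal terms; the full matrix should be passed rather than only the variances.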
Thank you.