Multiple Logistic Regression interpretation for a categorical variable containing multiple values

Alexander Smithson

Join Date: May 2018

Posts: 4
#1

29 May 2018, 14:58

I am having difficulty interpreting a multiple logistic regression that includes a categorical variable with more than two values. Essentially, I am looking at injury in a certain population of patients, and would like to interpret the effect of initial therapy type, called TAC_TYPE (among other categorical variables like BMI, Age, etc.), in my multiple logistic regression.

TAC_TYPE, however, is a categorical variable that identifies several therapy types (1 = therapy A, 2 = therapy B, 3 = therapy C, ...). For binary and even continuous variables, I understand how to interpret the RRR, p, and CI.

How do I interpret the output mlogit gives me with regard to this variable? What would the RRR and CI indicate?

Igor Paploski

Join Date: Oct 2014

Posts: 174
#2

29 May 2018, 15:16

Hello Alexander,

If TAC_TYPE is a categorical variable (not continuous), the way you entered it in the model is wrong: Stata is treating it as continuous. To specify it correctly, add "i." in front of any variable that is categorical with more than two levels. Once you do that, Stata will pick one category of TAC_TYPE to serve as the reference, or "base" (by default the one with the smallest coded value), and will report the RRR of each other category against it (with your coding, TAC_TYPE 2 versus 1, TAC_TYPE 3 versus 1, and so on), controlling for the covariates. You can change the reference (base) level by specifying it in front of the variable name: if you want code 2 to be the base level, write ib2.TAC_TYPE. Last, I find it useful to display all reference (base) levels in my output, since that makes it easier to interpret. This can be done by adding ", allbase" after your command. It would look like:

Code:

mlogit deteriorate AGE_65 BMI CM_COPD ISS TAC_TYPE, rrr // your original code
mlogit deteriorate AGE_65 BMI CM_COPD ISS i.TAC_TYPE, rrr // enters TAC_TYPE as a categorical variable
mlogit deteriorate AGE_65 BMI CM_COPD ISS ib2.TAC_TYPE, rrr // enters TAC_TYPE as a categorical variable and selects code 2 as the base level
mlogit deteriorate AGE_65 BMI CM_COPD ISS ib2.TAC_TYPE, rrr allbase // enters TAC_TYPE as a categorical variable, selects code 2 as the base level, and displays all base levels in the output

On a side note, your outcome seems to be dichotomous. Why are you using mlogit instead of simple logit or logistic?

Last edited by Igor Paploski; 29 May 2018, 15:19.
Igor Paploski

Join Date: Oct 2014

Posts: 174
#3

29 May 2018, 15:24

Oh, by the way, after you do that, the interpretation of the RRR, CI and p-value is similar to that for other indicator variables: the risk of having the outcome in this group of TAC_TYPE is X times the risk in the reference group of TAC_TYPE, with the following CI and p-value, after controlling for the other covariates.
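
To make the reading of the output concrete, here is a minimal sketch using Stata's shipped example dataset sysdsn1 rather than the data in this thread, so insure, age, male, nonwhite and site are stand-ins and not the poster's variables. With i.site in the model, each site row in each outcome equation is the RRR for that site level versus the base site level, holding the other covariates fixed.

```stata
* Illustration only: Stata's example data, not the poster's dataset
webuse sysdsn1, clear

* insure is a three-category outcome; site is a three-level categorical predictor
* rrr reports relative-risk ratios; allbaselevels prints the base (reference) rows
mlogit insure age male nonwhite i.site, rrr allbaselevels

* Joint Wald test that the site coefficients are zero across the outcome equations
test 2.site 3.site
```

The joint test is there because the individual rows only compare each level with the base level; they do not say whether the variable as a whole matters.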
Alexander Smithson

Join Date: May 2018

Posts: 4
#4

29 May 2018, 15:39

Fantastic! This was extremely helpful, thank you so much.
Is there any rule of thumb behind picking an appropriate reference level?
Igor Paploski

Join Date: Oct 2014

Posts: 174
#5

29 May 2018, 15:46

I tend to choose reference groups that make sense theoretically while trying to pick one that is well populated. I've heard of people who always pick the most prevalent group as the reference, but I think this simplifies things too much (say you are modeling something with the year of occurrence as a predictor: you could say that the first year of your series is your baseline, even if it is not the year in which you have the most observations). Beware of categories that are extremely underpopulated (wide CIs could hint at that).
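
A rough sketch of how one might check category sizes and then set the base level, again on a stand-in dataset (Stata's example data lbw, where race is a three-level categorical variable) and not on the data from this thread:

```stata
* Illustration only: Stata's example data, not the poster's dataset
webuse lbw, clear

* Check how many observations fall in each category before choosing a base
tabulate race

* Let Stata use the most frequent category as the base level
logistic low age lwt ib(freq).race smoke, allbaselevels

* Or name the base level explicitly (here, code 3)
logistic low age lwt ib3.race smoke, allbaselevels
```

The fitted model is the same either way; only the comparisons shown in the output change.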
Alexander Smithson

Join Date: May 2018

Posts: 4
#6

29 May 2018, 16:02

My apologies, I just saw your side note.
I was under the impression that, because TAC_TYPE is a categorical variable with multiple values, mlogit would be more appropriate. Would logit be better in this case?

Last edited by Alexander Smithson; 29 May 2018, 16:04.
Igor Paploski

Join Date: Oct 2014

Posts: 174
#7

29 May 2018, 16:13

Hi Alexander,

The fact that you have a predictor that is polytomous (more than 2 categories) is not an issue when using logit/logistic. If your outcome is binary (as yours seems to be), using logit/logistic with the code we discussed before should work perfectly fine (you will have to remove the rrr option). The "i." and "ibX." prefixes work with logit/logistic.

mlogit is suited for when your outcome is polytomous (more than 2 categories).

This link contains interesting info on multinomial logistic regression in Stata.
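
For a binary outcome, the same factor-variable notation could be used as in the sketch below. It runs on Stata's example dataset lbw (binary outcome low, three-level predictor race), which is an assumption for illustration and not the data from this thread.

```stata
* Illustration only: Stata's example data, not the poster's dataset
webuse lbw, clear

* logit reports coefficients by default; the or option displays odds ratios
logit low age lwt i.race smoke, or

* logistic fits the same model and reports odds ratios by default
logistic low age lwt i.race smoke

* ibX. and allbaselevels behave the same way as with mlogit
logit low age lwt ib2.race smoke, or allbaselevels
```

Each race row is then an odds ratio for that level versus the base level, rather than an RRR.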
Richard Williams

Join Date: Apr 2014

Posts: 5008
#8

29 May 2018, 18:21

The other problem with picking the most prevalent group is that it may not be the most prevalent group in all your analyses or in a different data set. I greatly prefer to pick my baseline category myself.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Nick Cox

Join Date: Mar 2014

Posts: 35722
#9

29 May 2018, 18:56

Cross-posted at https://stats.stackexchange.com/ques...ingency-tables and likely to close there, but please note our cross-posting policy, which is that you are asked to tell us about it.