  • Multiple Logistic Regression interpretation for a categorical variable containing multiple values

    I am having difficulty interpreting a multiple logistic regression that includes a categorical variable with more than two values. I am looking at injury in a certain population of patients and would like to interpret the effect of initial therapy type, called TAC_TYPE (alongside other categorical variables such as BMI, age, etc.), in my multiple logistic regression.

    TAC_TYPE, however, is a categorical variable that identifies several therapy types (1 = therapy A, 2 = therapy B, 3 = therapy C, ...). For binary and even continuous variables, I understand how to interpret the RRR, p-value, and CI.

    How do I interpret the output mlogit gives me with regard to this variable? What would the RRR and CI indicate?

  • #2
    Hello Alexander,

    If TAC_TYPE is a categorical variable (not continuous), the way you entered it in the model is wrong: Stata is treating your categorical variable as continuous. To specify it correctly, add "i." in front of every categorical variable that has more than two levels. Once you do that, Stata will pick one category of TAC_TYPE to serve as the reference, or "base" (by default the one with the smallest coded value), and will report the RRR of each other category against it (with your coding, TAC_TYPE 2 versus 1, TAC_TYPE 3 versus 1, and so on), controlling for the covariates. You can change the reference (base) level by specifying it in the prefix: if you want code 2 to be the base level, write ib2.TAC_TYPE. Lastly, I find it useful to display all reference (base) levels in my output, which makes it easier to interpret. This can be done by adding ", allbase" after your command. It would look like:

    Code:
    mlogit deteriorate AGE_65 BMI CM_COPD ISS TAC_TYPE, rrr // your original code
    mlogit deteriorate AGE_65 BMI CM_COPD ISS i.TAC_TYPE, rrr // enters TAC_TYPE as a categorical variable
    mlogit deteriorate AGE_65 BMI CM_COPD ISS ib2.TAC_TYPE, rrr // enters TAC_TYPE as a categorical variable and selects code 2 as the base level
    mlogit deteriorate AGE_65 BMI CM_COPD ISS ib2.TAC_TYPE, rrr allbase // enters TAC_TYPE as a categorical variable, selects code 2 as the base level, and displays all base levels in the output

    On a side note, your outcome seems to be dichotomous. Why are you using mlogit instead of plain logit or logistic?
    Last edited by Igor Paploski; 29 May 2018, 15:19.


    • #3
      Oh, by the way, after you do that, the interpretation of the RRR, CI and p-value is similar to that for any other indicator variable: the risk of having the outcome in this group of TAC_TYPE is X times that in the reference group of TAC_TYPE, with the corresponding CI and p-value, after controlling for the other covariates. (Strictly speaking, with a binary outcome the RRR from mlogit is an odds ratio, so X compares odds rather than risks.)


      • #4
        Fantastic! This was extremely helpful, thank you so much.
        Is there any rule of thumb behind picking an appropriate reference level?


        • #5
          I tend to choose a reference group that makes sense theoretically, while trying to pick one that is well populated. I've heard of people who always pick the most prevalent group as the reference, but I think this oversimplifies things (say you are modeling something with year of occurrence as a predictor: you could take the first year of your series as your baseline, even if it is not the year with the most observations). Be wary of categories that are extremely underpopulated (wide CIs can hint at that).
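
          A minimal sketch of how one might check how well populated each category is before settling on a base, reusing the variable names from earlier in the thread. ib(freq). is the factor-variable operator that uses the most frequent category as the base, and fvset base stores a default base for a variable. Code 2 is only an illustration here, not a recommendation.

          Code:
          tabulate TAC_TYPE // number of observations in each therapy type
          tabulate TAC_TYPE deteriorate // cross-tabulation with the outcome, to spot sparse cells
          mlogit deteriorate AGE_65 BMI CM_COPD ISS ib(freq).TAC_TYPE, rrr allbase // most frequent category as the base
          fvset base 2 TAC_TYPE // from now on, i.TAC_TYPE uses code 2 as the base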


          • #6
            My apologies, I just saw your side note.
            I was under the impression that, because TAC_TYPE is a categorical variable with multiple values, mlogit would be more appropriate. Would logit be better in this case?
            Last edited by Alexander Smithson; 29 May 2018, 16:04.


            • #7
              Hi Alexander,

              The fact that you have a polytomous predictor (more than 2 categories) is not an issue when using logit/logistic. If your outcome is binary (as yours seems to be), using logit/logistic with the code we discussed before should work perfectly fine (you will have to remove the rrr option; logistic reports odds ratios by default, and logit does so with the or option). The "i." and "ibX." prefixes work with logit/logistic.

              mlogit is suited to cases where your outcome is polytomous (more than 2 categories).

              This link contains interesting info on multinomial logistic regression in Stata.
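
              For the binary-outcome case described above, the commands might look like the sketch below. It assumes deteriorate is coded 0/1 and keeps the same covariates as before; testparm and margins are optional follow-ups that were not part of the original suggestion.

              Code:
              logistic deteriorate AGE_65 BMI CM_COPD ISS ib2.TAC_TYPE, allbase // odds ratios, with code 2 as the base level
              logit deteriorate AGE_65 BMI CM_COPD ISS ib2.TAC_TYPE, or allbase // same model; "or" displays odds ratios instead of coefficients
              testparm i.TAC_TYPE // joint test that all TAC_TYPE coefficients are zero
              margins TAC_TYPE // average predicted probability of the outcome for each therapy type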


              • #8
                The other problem with picking the most prevalent group is that it may not be the most prevalent group in all your analyses or in a different data set. I greatly prefer to pick my baseline category myself.
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                StataNow Version: 19.5 MP (2 processor)

                EMAIL: [email protected]
                WWW: https://www3.nd.edu/~rwilliam


                • #9
                  Cross-posted at https://stats.stackexchange.com/ques...ingency-tables and likely to be closed there, but please note our cross-posting policy, which is that you are asked to tell us about it.
