  • Please help with organization and interpretation of logistic regression with interaction

    Hello Statalist community,

    I am running a logistic regression to look at the effect of two categorical predictors on a binary outcome. I checked the interaction between these two predictors using a likelihood-ratio test (LRT) and found an interaction. I wish to present individual odds ratios (ORs) for the different combinations of the interacting predictors, all relative to the same reference combination. I have gone through the different recommendations on the Statalist forum that suggest using margins, contrast statements and the lincom command, and I am honestly confused as to how best to achieve my objective. I was wondering if someone could kindly show me how to produce individual ORs using the example data and model below. For example, where white non-smokers are the reference, how do I estimate ORs for the joint effects of race and smoking on low birthweight for white smokers, black non-smokers, black smokers, etc.? Is this possible, and if not, what would you recommend? Any help with sample code would be greatly appreciated.

    Code:
     
    
    
    . use "https://www.stata-press.com/data/r17/lbw.dta"
    (Hosmer & Lemeshow data)
    
    . fre race smoke low
    
    race -- Race
    -------------------------------------------------------------
                    |      Freq.    Percent      Valid       Cum.
    ----------------+--------------------------------------------
    Valid   1 White |         96      50.79      50.79      50.79
            2 Black |         26      13.76      13.76      64.55
            3 Other |         67      35.45      35.45     100.00
            Total   |        189     100.00     100.00           
    -------------------------------------------------------------
    
    smoke -- Smoked during pregnancy
    -----------------------------------------------------------------
                        |      Freq.    Percent      Valid       Cum.
    --------------------+--------------------------------------------
    Valid   0 Nonsmoker |        115      60.85      60.85      60.85
            1 Smoker    |         74      39.15      39.15     100.00
            Total       |        189     100.00     100.00           
    -----------------------------------------------------------------
    
    low -- Birthweight<2500g
    -----------------------------------------------------------
                  |      Freq.    Percent      Valid       Cum.
    --------------+--------------------------------------------
    Valid   0     |        130      68.78      68.78      68.78
            1     |         59      31.22      31.22     100.00
            Total |        189     100.00     100.00           
    -----------------------------------------------------------
    
    . logit low i.race##i.smoke
    
    Iteration 0:   log likelihood =   -117.336  
    Iteration 1:   log likelihood = -108.80624  
    Iteration 2:   log likelihood = -108.41275  
    Iteration 3:   log likelihood = -108.40889  
    Iteration 4:   log likelihood = -108.40889  
    
    Logistic regression                             Number of obs     =        189
                                                    LR chi2(5)        =      17.85
                                                    Prob > chi2       =     0.0031
    Log likelihood = -108.40889                     Pseudo R2         =     0.0761
    
    -------------------------------------------------------------------------------
              low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
             race |
           Black  |   1.514127   .7522693     2.01   0.044     .0397068    2.988548
           Other  |   1.742969   .5946187     2.93   0.003     .5775379      2.9084
                  |
            smoke |
          Smoker  |   1.750516   .5982763     2.93   0.003     .5779162    2.923116
                  |
       race#smoke |
    Black#Smoker  |  -.5565938   1.032236    -0.54   0.590    -2.579738    1.466551
    Other#Smoker  |  -1.527373   .8828155    -1.73   0.084    -3.257659    .2029138
                  |
            _cons |  -2.302585   .5244044    -4.39   0.000    -3.330399   -1.274771
    -------------------------------------------------------------------------------
    Thank you in advance

  • #2
    Try
    Code:
    help margins



    • #3
      Code:
      forvalues r = 1/3 {
          forvalues s = 0/1 {
              lincom _b[`r'.race] + _b[`s'.smoke] + _b[`r'.race#`s'.smoke], or
          }
      }
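The loop above builds each joint odds ratio as a sum of coefficients. As a rough sketch of an alternative (not something posted in this thread), the six race-by-smoke cells could be coded as one categorical variable so that all five ORs against White nonsmokers appear in a single table. The variable name `combo` is hypothetical, and the sketch assumes `egen`'s `group()` numbers the cells in sorted order, so that level 1 is White nonsmoker.

```stata
* Sketch only: one six-level variable for the race-by-smoke combinations
use "https://www.stata-press.com/data/r17/lbw.dta", clear

* combo is a hypothetical name; group() numbers cells 1..6 in sorted order,
* so 1 = White nonsmoker, 2 = White smoker, 3 = Black nonsmoker, and so on
egen byte combo = group(race smoke), label

* ib1. makes White nonsmoker the explicit reference category
logistic low ib1.combo
```

Because the interaction model is saturated in race and smoke, this parameterization should reproduce the same joint odds ratios as the lincom loop; it is only a different way of laying out the same fit.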
      Added: Crossed with #2.
      Last edited by Clyde Schechter; 01 Apr 2023, 18:32.



      • #4
        Dear Clyde and Bader,

        Thank you both for your quick responses. The sample code provided by Clyde is greatly appreciated and I wanted to follow up to make sure that I understand the output and interpretation. The output is shown below and it was interesting to note that some of the estimates and 95% CIs from the lincom command are identical to those from the main logistic regression results. For example, the OR for smoke in the logistic regression is about 5.75 and equivalent to the lincom results for the joint effect of white race and smoker status on low birthweight. I will need to learn more about how to interpret categorical variable interactions but I am confused by why this equivalence is happening. Also is it correct to interpret this effect for example as "Among white individuals, those who smoked had 5.75 higher odds of low birthweight compared to those who did not smoke"?


        Code:
        logistic low i.race##i.smoke
        
        Logistic regression                             Number of obs     =        189
                                                        LR chi2(5)        =      17.85
                                                        Prob > chi2       =     0.0031
        Log likelihood = -108.40889                     Pseudo R2         =     0.0761
        
        -------------------------------------------------------------------------------
                  low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
        --------------+----------------------------------------------------------------
                 race |
               Black  |   4.545453   3.419405     2.01   0.044     1.040506    19.85683
               Other  |   5.714284    3.39782     2.93   0.003     1.781646    18.32746
                      |
                smoke |
              Smoker  |   5.757574    3.44462     2.93   0.003     1.782321    18.59916
                      |
           race#smoke |
        Black#Smoker  |    .573158   .5916341    -0.54   0.590     .0757938     4.33426
        Other#Smoker  |   .2171053   .1916639    -1.73   0.084     .0384784    1.224967
                      |
                _cons |         .1   .0524405    -4.39   0.000     .0357788    .2794949
        -------------------------------------------------------------------------------
        
        . forvalues r = 1/3 {
          2.     forvalues s = 0/1 {
          3.         lincom _b[`r'.race] + _b[`s'.smoke] + _b[`r'.race#`s'.smoke], or
          4.     }
          5. }
        
         ( 1)  [low]1b.race + [low]0b.smoke + [low]1b.race#0b.smoke = 0
        
        ------------------------------------------------------------------------------
                 low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 (1) |          1  (omitted)
        ------------------------------------------------------------------------------
        
         ( 1)  [low]1b.race + [low]1.smoke + [low]1b.race#1o.smoke = 0
        
        ------------------------------------------------------------------------------
                 low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 (1) |   5.757574    3.44462     2.93   0.003     1.782321    18.59916
        ------------------------------------------------------------------------------
        
         ( 1)  [low]2.race + [low]0b.smoke + [low]2o.race#0b.smoke = 0
        
        ------------------------------------------------------------------------------
                 low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 (1) |   4.545453   3.419405     2.01   0.044     1.040506    19.85683
        ------------------------------------------------------------------------------
        
         ( 1)  [low]2.race + [low]1.smoke + [low]2.race#1.smoke = 0
        
        ------------------------------------------------------------------------------
                 low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 (1) |         15   12.47497     3.26   0.001     2.938845    76.56066
        ------------------------------------------------------------------------------
        
         ( 1)  [low]3.race + [low]0b.smoke + [low]3o.race#0b.smoke = 0
        
        ------------------------------------------------------------------------------
                 low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 (1) |   5.714284    3.39782     2.93   0.003     1.781646    18.32746
        ------------------------------------------------------------------------------
        
         ( 1)  [low]3.race + [low]1.smoke + [low]3.race#1.smoke = 0
        
        ------------------------------------------------------------------------------
                 low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 (1) |   7.142855   5.614562     2.50   0.012     1.530363    33.33875
        ------------------------------------------------------------------------------
        
        .
        Thank you!



        • #5
          For example, the OR for smoke in the logistic regression is about 5.75 and equivalent to the lincom results for the joint effect of white race and smoker status on low birthweight. I will need to learn more about how to interpret categorical variable interactions but I am confused by why this equivalence is happening.
          This is precisely the way it is supposed to work. When you do the interaction model with ##, as you did, and as is, for most purposes, the best approach, the "main effects" are not main effects any more. They are effects corresponding to an interaction with the base category of the other variable in the interaction. So, in your situation, where White and non-smoker are the base categories for race and smoking, the coefficient of Black under the race variable does not represent an effect of Black. Rather it represents the effect of Black, conditional on smoke being in its base category. So the "Black" coefficient or odds ratio is actually that of the Black#non-smoker interaction level.
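One way to see why the numbers coincide is to write the model out. This is a sketch; the β symbols are notation introduced here for illustration, and the numeric values are the coefficients from the logit output in #1.

```latex
\[
\operatorname{logit}\Pr(\text{low}=1 \mid r, s)
  = \beta_0 + \beta^{\text{race}}_{r} + \beta^{\text{smoke}}_{s} + \beta^{\text{race}\#\text{smoke}}_{rs}
\]
% Every term that involves a base level (White, Nonsmoker) is fixed at 0.
\[
\text{OR}\bigl((r,s)\ \text{vs. White nonsmoker}\bigr)
  = \exp\!\left(\beta^{\text{race}}_{r} + \beta^{\text{smoke}}_{s} + \beta^{\text{race}\#\text{smoke}}_{rs}\right)
\]
\begin{align*}
\text{White smoker:}    &\quad \exp(0 + 1.750516 + 0) \approx 5.76 \\
\text{Black nonsmoker:} &\quad \exp(1.514127 + 0 + 0) \approx 4.55 \\
\text{Black smoker:}    &\quad \exp(1.514127 + 1.750516 - 0.5565938) = \exp(2.708049) \approx 15.0
\end{align*}
```

For White smokers the race and interaction terms are both zero, so the sum collapses to the Smoker coefficient alone, which is why lincom returns the same estimate and interval as the regression table.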

          For a particularly clear and lucid explanation of how to interpret interaction models, I recommend https://www3.nd.edu/~rwilliam/stats2/l53.pdf, by the excellent Richard Williams.

          Also is it correct to interpret this effect for example as "Among white individuals, those who smoked had 5.75 higher odds of low birthweight compared to those who did not smoke"?
          Almost, but not quite. The interpretation would be that among White individuals, those who smoked had odds of low birthweight 5.75 times as high as those who did not smoke. (5.75 times higher odds would correspond to an odds ratio of 6.75.)
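The distinction can be spelled out with a little arithmetic (a sketch using the rounded value 5.76 for the odds ratio):

```latex
\[
\frac{\text{odds}_{\text{smoker}}}{\text{odds}_{\text{nonsmoker}}} = 5.76
\;\Longrightarrow\;
\text{odds}_{\text{smoker}} = 5.76 \times \text{odds}_{\text{nonsmoker}}
\]
\[
\text{relative increase} = (5.76 - 1) \times 100\% = 476\%
\]
% "5.75 times higher" would instead mean adding 5.75 times the baseline odds:
\[
\text{odds}_{\text{smoker}} = \text{odds}_{\text{nonsmoker}} + 5.75 \times \text{odds}_{\text{nonsmoker}}
  = 6.75 \times \text{odds}_{\text{nonsmoker}}
\]
```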

          Added: In interpreting your results, you should not just look at the odds ratios. Pay attention also to the confidence intervals around them. As befits a modest sample of this size, the confidence intervals are very wide, so these odds ratio estimates are rather imprecise. For example, the odds ratio of 5.75 we have just discussed has a confidence interval ranging from 1.8 to 18.6! So we really haven't pinned it down very much at all.
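To see where that interval comes from, here is a minimal sketch that rebuilds it by hand from the Smoker row of the logit output in #1. The scalar names are hypothetical, and the sketch assumes the usual Wald interval: computed on the log-odds scale, then exponentiated.

```stata
* Sketch only: coefficient and standard error copied from the logit output in #1
scalar b_smoke  = 1.750516
scalar se_smoke = .5982763

* Normal critical value for a 95% interval
scalar zcrit = invnormal(0.975)

* Exponentiate the endpoints of the log-odds interval to get the OR interval
display "OR     = " exp(b_smoke)
display "95% CI = " exp(b_smoke - zcrit*se_smoke) " to " exp(b_smoke + zcrit*se_smoke)
```

The result should land close to the 1.78 to 18.6 shown in the output. The interval is symmetric on the log-odds scale but stretches far to the right once exponentiated.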
          Last edited by Clyde Schechter; 02 Apr 2023, 11:10.



          • #6
            Thank you so much for all of your help and for sharing the link to Richard Williams' explanation. Very much appreciated!
