
  • How to handle a categorical#continuous interaction when not all categories are significant in logistic regression

    Hello,

    I use Stata 14.1 and I have 3 variables: a, b, and c. a is the dependent variable (0/1). b is a categorical independent variable (0/1/2, with 2 as the reference). c is a continuous independent variable. Other independent variables are not listed here.

    I have a main-effects model: logit(a) = cons + coef1*b_0 + coef2*b_1 + coef3*c. The p-values for coef1 and coef3 are below 0.05, while the p-value for coef2 is above 0.05.

    Then I added the interaction between the two variables: logit a ib2.b c ib2.b#c.c,nolog. This time the p-value for the coefficient of b_0#c is above 0.05, while the p-value for b_1#c is below 0.05.

    So I am wondering whether I should report the interaction, or whether there is something wrong with my analysis.

    Thank you!

  • #2
    I have a main-effects model: logit(a) = cons + coef1*b_0 + coef2*b_1 + coef3*c. The p-values for coef1 and coef3 are below 0.05, while the p-value for coef2 is above 0.05.

    Then I added the interaction between the two variables: logit a ib2.b c ib2.b#c.c,nolog. This time the p-value for the coefficient of b_0#c is above 0.05, while the p-value for b_1#c is below 0.05.
    I can't make any sense out of this. What is b_0? What is b_1? You said you had variables a, b, and c. How can your second equation have a coefficient for b_0#c when there is no such term in your model? Please don't paraphrase or describe. Show, by copy/pasting from your Results window or your log file, directly into a code block on this forum, exactly what code you ran and exactly what output you got from Stata. Don't edit anything.



    • #3
      Hi Clyde, thank you for the reply! Here is the information:

      I use Stata 14.1 and I have 3 variables: a, b, and c. a is the dependent variable (0/1). b is a categorical independent variable (0/1/2, with 2 as the reference). c is a continuous independent variable. Other independent variables are not listed here.

      First I used this command and got the following output:
      logit a c ib2.b,nolog

                 a |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                 c |   .0160674   .0030394     5.29   0.000     .0101102    .0220246
                   |
                 b |
                0  |   .7348515   .2097785     3.50   0.000     .3236933     1.14601
                1  |   .0630187   .1829749     0.34   0.731    -.2956055    .4216429
                   |
             _cons |  -5.046531   .5065621    -9.96   0.000    -6.039374   -4.053687

      Then I ran this command and got the following results:
      logit a c ib2.b ib2.b#c.c,nolog

                 a |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                 c |   .0231053   .0042286     5.46   0.000     .0148174    .0313931
                   |
                 b |
                0  |    1.06985   .3673725     2.91   0.004     .3498129    1.789887
                1  |   .7063934   .3047116     2.32   0.020     .1091697    1.303617
                   |
             b#c.c |
                0  |  -.0093783   .0075375    -1.24   0.213    -.0241516     .005395
                1  |  -.0177502   .0069365    -2.56   0.010    -.0313455   -.0041548
                   |
             _cons |  -5.161064   .5102007   -10.12   0.000    -6.161039   -4.161089

      As the first output (the main-effects model) shows, the p-value for b=0 is below 0.05, while the p-value for b=1 is above 0.05. Once the interaction is added, the p-value for the interaction term is below 0.05 for b=1 but above 0.05 for b=0.

      So how should I interpret the output? Should the interaction be included?

      Thank you!



      • #4
        Code:
        logit a c ib2.b,nolog
        
                   a |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                   c |   .0160674   .0030394     5.29   0.000     .0101102    .0220246
                     |
                   b |
                  0  |   .7348515   .2097785     3.50   0.000     .3236933     1.14601
                  1  |   .0630187   .1829749     0.34   0.731    -.2956055    .4216429
                     |
               _cons |  -5.046531   .5065621    -9.96   0.000    -6.039374   -4.053687
        
        ...
        logit a c ib2.b ib2.b#c.c,nolog
        
                   a |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                   c |   .0231053   .0042286     5.46   0.000     .0148174    .0313931
                     |
                   b |
                  0  |    1.06985   .3673725     2.91   0.004     .3498129    1.789887
                  1  |   .7063934   .3047116     2.32   0.020     .1091697    1.303617
                     |
               b#c.c |
                  0  |  -.0093783   .0075375    -1.24   0.213    -.0241516     .005395
                  1  |  -.0177502   .0069365    -2.56   0.010    -.0313455   -.0041548
                     |
               _cons |  -5.161064   .5102007   -10.12   0.000    -6.161039   -4.161089
        OK. This would have been more readable if you had put it in a code block, as requested. In the future, please do so.

        You have two different models here, and there is no reason to expect that the 0.b and 1.b coefficients will be the same in both models, or even that their "significance" will be the same. They mean different things altogether.

        In your first model, 0.b and 1.b are the log odds ratios of a when b = 0 and b = 1, respectively, compared to the case b = 2. They are, if you wish, the effects of levels 0 and 1 on a in the log-odds metric.

        In your second model, because you added an interaction term, there is no such thing as the effects of levels 0 and 1 on a. Rather there are different effects of those levels on a, which depend on the value of c. In particular 0.b and 1.b in this model are the effects of levels 0 and 1 on a conditional on c = 0. So these are basically unrelated to the corresponding results in the first model.
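
        To make "conditional on c = 0" concrete, here is a minimal sketch, assuming the interaction model from #3 has just been fit. The value c = 100 is a hypothetical placeholder, not something taken from the data; substitute values of c that actually occur in the sample.

        ```stata
        * Refit the interaction model exactly as in #3
        logit a c ib2.b ib2.b#c.c, nolog

        * Log-odds contrasts with b = 2 at c = 100 (100 is a hypothetical value)
        lincom _b[0.b] + 100*_b[0.b#c.c]
        lincom _b[1.b] + 100*_b[1.b#c.c]

        * Slope of c in the log-odds metric within each level of b
        * b = 2 (reference level)
        lincom _b[c]
        * b = 0
        lincom _b[c] + _b[0.b#c.c]
        * b = 1
        lincom _b[c] + _b[1.b#c.c]
        ```

        Each lincom line reports an estimate, standard error, and confidence interval for that combination, so the effect of b can be read off at any chosen c rather than only at c = 0.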

        Given that your interaction coefficients are fairly large in comparison to the coefficient of c, you probably should use the second model and discard the first. The fact that one of the interaction terms is also statistically significant is further reason to support using the second model in preference to the first.
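
        Because b has three levels, the interaction consists of two coefficients, and it may be more informative to test them jointly than to read each p-value separately. A rough sketch, assuming both models are fit on the same sample with default (non-robust) standard errors; the stored-estimate names main and inter are hypothetical labels:

        ```stata
        * Main-effects model
        logit a c ib2.b, nolog
        estimates store main

        * Interaction model
        logit a c ib2.b ib2.b#c.c, nolog
        estimates store inter

        * Likelihood-ratio test of both interaction coefficients jointly
        lrtest main inter

        * Wald version of the same joint test, based on the interaction model
        test 0.b#c.c 1.b#c.c
        ```

        Either test asks whether the slope of c differs across the levels of b at all, which is the question the two separate interaction p-values only answer piecemeal.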

        There remains the question whether c = 0 is a useful or interesting value of c in your data. If not, consider re-running your model with c centered at some interesting value so that your results have more meaning and utility.
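
        Centering could look something like the sketch below. The centering value 50 and the variable name c_ctr are hypothetical; choose a value of c that is meaningful in the actual data.

        ```stata
        * 50 is a hypothetical centering value; replace it with a meaningful value of c
        generate c_ctr = c - 50

        * Same model with the centered variable; 0.b and 1.b now refer to c = 50
        logit a c_ctr ib2.b ib2.b#c.c_ctr, nolog
        ```

        Centering leaves the slope of c and the interaction coefficients unchanged; only the 0.b, 1.b, and _cons estimates shift, because they are now evaluated at the centering value instead of at c = 0.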



        • #5
          OK. Thank you very much for the explanation Clyde!
