  • Significant coefficient in logistic regression, but overlapping margins

    Dear members,

    I am running a logistic regression of a binary variable on a set of independent variables. The coefficient of my key independent variable y is significant at the 5% level (p=0.018). To assess the substantive effect of the variable, I run margins after the estimation to calculate predicted probabilities for different levels of the independent variable (using the at() option of margins) and then plot them using marginsplot. The problem is that none of the margins are statistically different from one another: the 95% confidence intervals for the predicted probabilities always overlap. I could not find a single pair of margins whose intervals do not overlap.

    Is there any statistical explanation for this? Common sense tells me that there cannot be a significant effect when all the predictions of the outcome overlap. Any idea what the reason could be?

    I clustered standard errors in the logistic regression and used the vce(unconditional) and asobserved options of margins, because I want to make inferences from a survey sample to the general population (as suggested in the Stata manual). I calculated the margins over the observed range of the independent variable y, from the minimum to the maximum value.

    Thanks for your help.

    Best regards,
    Felix

  • #2
    This is a good example of the problems that arise from using the concept of statistical significance, and one of the reasons why the American Statistical Association has recommended it be abandoned. See https://www.tandfonline.com/doi/full...5.2019.1583913 for the "executive summary" and
    https://www.tandfonline.com/toc/utas20/73/sup1 for all 43 supporting articles. Or https://www.nature.com/articles/d41586-019-00857-9 for the tl;dr.

    By recasting the continuous effect estimates (and even the continuous p-values) into a false dichotomy of significant vs non-significant, the illusion is created that these correspond to "effect" and "no effect," which intuition suggests should be consistently arrived at when different measures are used. That, of course, is all wrong.

    There are several factors at play here. First of all, even in simple things like t-tests, looking at the difference between two means is not the same as looking at whether the confidence intervals of those means overlap. When the confidence intervals do not overlap, the difference is always statistically significant, but the reverse is not always true. Your situation is this in a different metric: logistic regression coefficient (a single measure of difference) vs overlap of confidence intervals of two measures of the corresponding levels. See https://journals.sagepub.com/doi/pdf...10581001900316 for a full explanation. In any case, it is perfectly possible for a data set to provide a very precise estimate of the difference between two things while providing only vague estimates of the two things themselves. That is what you are seeing. (And you might see it in your probability estimates too if you ran -margins- again with the -pwcompare- option.)
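
    A minimal sketch of the first point, using Stata's immediate t-test command with made-up summary statistics (two hypothetical groups of 50 observations, means 10 and 11, standard deviation 2 in each; none of these numbers come from this thread):

    ```stata
    * ttesti takes #obs, mean, and SD for each group (all values hypothetical)
    ttesti 50 10 2 50 11 2
    * The two 95% CIs overlap (about 9.43 to 10.57 and 10.43 to 11.57),
    * yet the test of the difference gives |t| = 2.5 on 98 df, p < 0.05.
    ```

    The intervals overlap because each one is built from its own standard error (about 0.28), while the standard error of the difference (0.4 here) is smaller than the sum of the two separate standard errors (about 0.57).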

    The fact that you are working in a logistic model adds another complication. A logistic regression coefficient is the logarithm of the (adjusted) odds ratio. But when the baseline probabilities are high, the odds ratios greatly exaggerate the effects compared to what is seen when looking at the corresponding probabilities. A very large odds ratio can correspond to a very tiny difference in probability between groups. For example, if one group has an outcome probability of .97, an odds ratio of 4 (which is huge for an odds ratio between two groups) means that the other group has a probability of about .9923. That's a difference of less than 0.03, which in many contexts would be meaninglessly small. Since you don't show your actual outputs, I can't say whether something like this is happening in your situation.
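
    The arithmetic in this example can be checked with Stata's logit() and invlogit() functions. This is only a sketch using the illustrative numbers above (baseline probability .97, odds ratio 4), not the poster's data:

    ```stata
    * Move a baseline probability of .97 up by an odds ratio of 4 on the log-odds scale
    display invlogit(logit(.97) + ln(4))
    * Same calculation written out with odds: .97/.03 = 32.33, times 4 = 129.33
    display (4*.97/.03) / (1 + 4*.97/.03)
    ```

    Both lines return approximately .9923, so the two groups differ by a little over .02 on the probability scale.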

    So what you need to do is carefully review what your research goal was in the first place. Why did you gather the data? What question are you trying to answer, and how will you put your results to use? Which is more important: the odds ratio, which is more a theoretical measure of the strength of association between y and the outcome, or the actual probabilities? Will the results be used for decision making? If so, the probabilities will be more useful than the odds ratio.
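
    The -pwcompare- suggestion earlier in this post might look something like the following sketch. It uses Stata's shipped auto dataset purely as a stand-in; the variables and the at() values are placeholders, not the poster's model:

    ```stata
    * Stand-in data and model (assumption: replace with your own logit specification)
    sysuse auto, clear
    logit foreign mpg, vce(robust)
    * Predictive margins at three illustrative values of mpg, with pairwise
    * differences, their standard errors, tests, and confidence intervals
    margins, at(mpg=(15 25 35)) pwcompare(effects)
    ```

    The output reports each difference between margins directly, which is the quantity to examine rather than whether the separate intervals overlap.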


    • #3
      Without showing your output you’re much less likely to get a helpful response. I have some ideas, but I can’t afford to be a detective.


      • #4
        Felix Scholl, here is another article on overlapping confidence intervals that you might find helpful.
        --
        Bruce Weaver
        Email: [email protected]
        Version: Stata/MP 18.5 (Windows)


        • #5
          @all Thanks a lot, this is very helpful. I attach the output below, if that is of any help.

          @Clyde Schechter: I assume the same is true if the baseline probability is very low, right? This would rather be the problem in my case, since the sample probability is only 0.035.

          Here is the relevant output from the logistic and margins commands (I only show three variables here, but there are many more in the model).

          Code:
          Iteration 0:   log pseudolikelihood = -2695.6077  
          Iteration 1:   log pseudolikelihood = -2623.2003  
          Iteration 2:   log pseudolikelihood = -2616.6522  
          Iteration 3:   log pseudolikelihood = -2616.6386  
          Iteration 4:   log pseudolikelihood = -2616.6386  
          
          Logistic regression                             Number of obs     =     17,571
                                                          Wald chi2(36)     =     207.25
                                                          Prob > chi2       =     0.0000
          Log pseudolikelihood = -2616.6386               Pseudo R2         =     0.0293
          
                                             (Std. Err. adjusted for 198 clusters in int_date)
          ----------------------------------------------------------------------------------------
                                 |               Robust
                         problem | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -----------------------+----------------------------------------------------------------
                   medex7d_print |   1.070994   .0311648     2.36   0.018     1.011621    1.133852
                   conversations |   .8283887     .12231    -1.28   0.202     .6202344    1.106401
                        interest |   2.035035   .3810907     3.79   0.000     1.409846     2.93746
          
          
          Predictive margins                              Number of obs     =     17,571
          
          Expression   : Pr(problem), predict()
          
          1._at        : medex7d_pr~t    =           0
          
          2._at        : medex7d_pr~t    =           1
          
          3._at        : medex7d_pr~t    =           2
          
          4._at        : medex7d_pr~t    =           3
          
          5._at        : medex7d_pr~t    =           4
          
          6._at        : medex7d_pr~t    =           5
          
          7._at        : medex7d_pr~t    =           6
          
          8._at        : medex7d_pr~t    =           7
          
          9._at        : medex7d_pr~t    =           8
          
          10._at       : medex7d_pr~t    =           9
          
          11._at       : medex7d_pr~t    =          10
          
          12._at       : medex7d_pr~t    =          11
          
          13._at       : medex7d_pr~t    =          12
          
          14._at       : medex7d_pr~t    =          13
          
          15._at       : medex7d_pr~t    =          14
          
          16._at       : medex7d_pr~t    =          15
          
          17._at       : medex7d_pr~t    =          16
          
          18._at       : medex7d_pr~t    =          17
          
          19._at       : medex7d_pr~t    =          18
          
          20._at       : medex7d_pr~t    =          19
          
          21._at       : medex7d_pr~t    =          20
          
          22._at       : medex7d_pr~t    =          21
          
          23._at       : medex7d_pr~t    =          22
          
          24._at       : medex7d_pr~t    =          23
          
          25._at       : medex7d_pr~t    =          24
          
          26._at       : medex7d_pr~t    =          25
          
          27._at       : medex7d_pr~t    =          26
          
          28._at       : medex7d_pr~t    =          27
          
          29._at       : medex7d_pr~t    =          28
          
          30._at       : medex7d_pr~t    =          28
          
                                   (Std. Err. adjusted for 198 clusters in pre_intdatum)
          ------------------------------------------------------------------------------
                       |            Unconditional
                       |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   _at |
                    1  |   .0333932   .0021651    15.42   0.000     .0291497    .0376368
                    2  |   .0356589   .0020639    17.28   0.000     .0316137     .039704
                    3  |   .0380708   .0024198    15.73   0.000      .033328    .0428135
                    4  |   .0406374   .0032113    12.65   0.000     .0343434    .0469313
                    5  |   .0433675   .0043217    10.03   0.000     .0348971    .0518379
                    6  |   .0462702   .0056787     8.15   0.000     .0351402    .0574002
                    7  |    .049355   .0072534     6.80   0.000     .0351385    .0635715
                    8  |   .0526317   .0090391     5.82   0.000     .0349154    .0703481
                    9  |   .0561104   .0110388     5.08   0.000     .0344747    .0777462
                   10  |   .0598015   .0132608     4.51   0.000     .0338108    .0857923
                   11  |   .0637157    .015716     4.05   0.000      .032913    .0945185
                   12  |   .0678639   .0184168     3.68   0.000     .0317677    .1039601
                   13  |   .0722572   .0213767     3.38   0.001     .0303596    .1141548
                   14  |   .0769069   .0246097     3.13   0.002     .0286727    .1251411
                   15  |   .0818245   .0281301     2.91   0.004     .0266905    .1369585
                   16  |   .0870214   .0319521     2.72   0.006     .0243964    .1496464
                   17  |   .0925091   .0360898     2.56   0.010     .0217744    .1632438
                   18  |    .098299   .0405567     2.42   0.015     .0188093    .1777888
                   19  |   .1044025   .0453659     2.30   0.021      .015487    .1933181
                   20  |   .1108307   .0505293     2.19   0.028     .0117951    .2098663
                   21  |   .1175943   .0560578     2.10   0.036      .007723    .2274655
                   22  |   .1247036   .0619609     2.01   0.044     .0032625    .2461446
                   23  |   .1321686   .0682463     1.94   0.053    -.0015916    .2659288
                   24  |   .1399985   .0749197     1.87   0.062    -.0068415    .2868385
                   25  |   .1482018   .0819848     1.81   0.071    -.0124855    .3088892
                   26  |   .1567863   .0894425     1.75   0.080    -.0185178    .3320904
                   27  |   .1657586   .0972909     1.70   0.088    -.0249281    .3564453
                   28  |   .1751243    .105525     1.66   0.097    -.0317009    .3819495
                   29  |   .1848878   .1141364     1.62   0.105    -.0388155    .4085911
                   30  |   .1848878   .1141364     1.62   0.105    -.0388155    .4085911
          ------------------------------------------------------------------------------
          
            Variables that uniquely identify margins: medex7d_print _atopt
            Multiple at() options specified:
                _atoption=1: medex7d_print==(0(1)28)
                _atoption=2: medex7d_print==28



          • #6
            I assume the same is true if the baseline probability is very low, right? This would rather be the problem in my case, since the sample probability is only 0.035.
            Yes, at either end, a large odds ratio can correspond to a small change in probability.
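
            As a rough illustration with the numbers posted in #5 (treating the sample proportion of .035 as the baseline and ignoring the other covariates, which is a simplifying assumption):

            ```stata
            * Change in probability implied by the reported odds ratio of 1.070994
            * when the baseline probability is .035
            display invlogit(logit(.035) + ln(1.070994)) - .035
            ```

            This comes to roughly .0024, so a one-unit increase in medex7d_print moves the probability by about a quarter of a percentage point at that baseline.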


            • #7
              Clyde has answered your question. Once I saw the issue I thought of Clyde’s example of comparing means. It’s not hard to find an example where the means are statistically different with p < 0.05 and yet the 95% CIs overlap.

              Out of curiosity, did you compute the average marginal effect on the probability?


              • #8
                Jeff Wooldridge: When I calculate average marginal effects, I get these results:

                Code:
                 qui: logit problem medex7d_print `controlvars', or vce(cluster date)
                
                  margins, post dydx(medex7d_print)  vce(unconditional)
                
                Average marginal effects                        Number of obs     =     17,571
                
                Expression   : Pr(problem), predict()
                dy/dx w.r.t. : medex7d_print
                
                                          (Std. Err. adjusted for 198 clusters in date)
                -------------------------------------------------------------------------------
                              |            Unconditional
                              |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
                --------------+----------------------------------------------------------------
                medex7d_print |   .0023257   .0009817     2.37   0.018     .0004016    .0042499
                -------------------------------------------------------------------------------
                Hence, the p-value is identical to the p-value of the coefficient in the regression.
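
                The two tests can be reproduced from the reported figures. This is only a back-of-the-envelope check, assuming the usual Wald z-test on the log-odds scale for the regression table:

                ```stata
                * Regression table: z for the log odds ratio; by the delta method,
                * the standard error of ln(OR) is SE(OR)/OR
                display 2*normal(-abs(ln(1.070994) / (.0311648/1.070994)))
                * Average marginal effect: z = dy/dx divided by its standard error
                display 2*normal(-abs(.0023257/.0009817))
                ```

                Both p-values round to 0.018, although the underlying z statistics (2.36 and 2.37) differ slightly.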


                • #9
                  Dear Members,
                  I have two regression equations. After each regression, I run margins and marginsplot. Now I would like to create one figure from the margins estimates of the two regressions.
                  Does anyone have an idea of how I could combine the two plots/graphs?


                  • #10
                    This seems to be a different query. Please start a new thread.
                    Best regards,

                    Marcos


                    • #11
                      Sorry Marcos, I'm new here and did not know this. Thank you


                      • #12
                        For those who are interested, Stella's new thread can be seen here:
                        --
                        Bruce Weaver
                        Email: [email protected]
                        Version: Stata/MP 18.5 (Windows)
