  • Correspondence between the statistically significant effect of coefficients in a table and marginal effects in a graph

    Dear members of the list,

    For a sample of highly educated young adults in the 24 countries that participated in the PIAAC survey, I estimate the probability of attaining a master-level degree instead of a bachelor-level one. Thus, my dependent variable is dichotomous. My key independent variable is father's education, which has three categories, corresponding to basic, intermediate and higher education. Controlling for gender and age, I want to estimate the effect of father's education on the attainment of a higher-level degree (master) instead of a lower-level one among the individuals in the sample. This is my model:

    Code:
    xtmelogit univ i.edufath female age || cntryid3:

    In principle, my results show that father's education has a statistically significant effect on the probability of attaining a master-level degree instead of a bachelor-level one. See the coefficients for ISCED 3/4 and ISCED 5/6 in the following table:

    Code:

    Fitting comparison model:

    Iteration 0:   log likelihood = -12825.158
    Iteration 1:   log likelihood =  -12512.74
    Iteration 2:   log likelihood = -12511.102
    Iteration 3:   log likelihood = -12511.102

    Fitting full model:

    tau =  0.0     log likelihood = -12511.102
    tau =  0.1     log likelihood = -11306.365
    tau =  0.2     log likelihood = -11278.014
    tau =  0.3     log likelihood = -11276.239
    tau =  0.4     log likelihood = -11317.484

    Iteration 0:   log likelihood =  -11258.67
    Iteration 1:   log likelihood = -11162.029
    Iteration 2:   log likelihood = -11153.553
    Iteration 3:   log likelihood = -11153.329
    Iteration 4:   log likelihood = -11153.329

    Random-effects logistic regression              Number of obs     =     19,663
    Group variable: cntryid3                        Number of groups  =         24

    Random effects u_i ~ Gaussian                   Obs per group:
                                                                  min =        320
                                                                  avg =      819.3
                                                                  max =      3,302

    Integration method: mvaghermite                 Integration pts.  =         12

                                                    Wald chi2(4)      =     340.00
    Log likelihood  = -11153.329                    Prob > chi2       =     0.0000

    ------------------------------------------------------------------------------
          master |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       edufather |
      ISCED 3/4  |   .1350686   .0466318     2.90   0.004     .0436719    .2264653
      ISCED 5/6  |   .6315997   .0445544    14.18   0.000     .5442747    .7189247
                 |
          female |  -.0768024   .0331973    -2.31   0.021    -.1418679    -.011737
             age |   .0336828    .003584     9.40   0.000     .0266582    .0407073
           _cons |  -2.447529   .3277209    -7.47   0.000     -3.08985   -1.805208
    -------------+----------------------------------------------------------------
        /lnsig2u |   .6325732    .304019                      .0367069    1.228439
    -------------+----------------------------------------------------------------
         sigma_u |   1.372023   .2085606                      1.018523    1.848214
             rho |   .3639468   .0703772                      .2397336    .5093969
    ------------------------------------------------------------------------------
    LR test of rho=0: chibar2(01) = 2715.55                Prob >= chibar2 = 0.000

    Next, I proceed to estimate the average marginal effect of different categories of father's education on the probability of attaining a master level degree instead of a bachelor one:

    Code:
    margins edufather, predict(mu fixedonly) vsquish level(95) post

    I do not understand why, if the effect is statistically significant in the table above, the confidence intervals in the graph overlap (see the attached graph).

    Is there anyone who could help me to understand the correspondence between the statistical significance of the coefficients in the table and the overlap of the confidence intervals in the graph? Which one of these results should I credit?

    Many thanks for your attention

    Kind regards

    Luis Ortiz
    Attached Files
    Last edited by Luis Ortiz; 12 Dec 2019, 12:58.

  • #2
    There are several reasons for this. It is entirely possible for the difference between two estimates, each of which is rather imprecisely estimated by the data, to nevertheless be very precisely estimated. In the language of statistical significance this translates to: there is nothing surprising about overlapping confidence intervals for things that have a statistically significant difference. It happens frequently. The regression coefficients you are seeing in the output are, in a different metric, estimates of the differences.
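
    A minimal numerical sketch of this point, in Python rather than Stata so that it runs without the data. The two estimates and their standard errors below are made-up values, not results from this thread, and the correlation of 0.8 is likewise an assumption chosen only to show the direction of the effect:

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def ci95(est, se):
    """Normal-approximation 95% confidence interval."""
    half = 1.959964 * se
    return (est - half, est + half)

# Hypothetical predicted probabilities and standard errors (illustrative only).
p_a, se_a = 0.30, 0.05
p_b, se_b = 0.45, 0.05

ci_a = ci95(p_a, se_a)
ci_b = ci95(p_b, se_b)
intervals_overlap = ci_a[1] > ci_b[0]  # upper end of A exceeds lower end of B

def diff_test(corr):
    """z statistic and p-value for p_b - p_a, given the correlation of the two estimates."""
    se_diff = math.sqrt(se_a**2 + se_b**2 - 2 * corr * se_a * se_b)
    z = (p_b - p_a) / se_diff
    return z, two_sided_p(z)

z_indep, p_indep = diff_test(0.0)  # independent estimates
z_corr, p_corr = diff_test(0.8)    # positively correlated estimates (assumed value)

print("95% CIs overlap:", intervals_overlap)
print(f"independent: z = {z_indep:.2f}, p = {p_indep:.4f}")
print(f"corr = 0.8:  z = {z_corr:.2f}, p = {p_corr:.6f}")
```

    The standard error of a difference is never larger than the sum of the two standard errors, so the intervals can overlap while the difference is still estimated precisely. Positive correlation between the estimates, which is plausible when both predictions share the same constant and covariates, shrinks it further.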

    "In a different metric" is also in play here. The predicted margins are probabilities, the coefficients are log odds ratios. They are related to each other rather distantly. First there is a non-linear transformation from the coefficients to predicted values for individual observations, and then there is a lot of averaging of those individual predicted values so as to take into account the base outcome probabilities. A very large regression coefficient (or odds ratio) can correspond to a very small probability difference if we are starting from a very large or very small probability.

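    As a rough illustration of the non-linear transformation, the Python sketch below takes the ISCED 5/6 coefficient from the table above and applies it at three baseline probabilities. The baselines (0.01, 0.50, 0.99) are hypothetical values chosen to show the extremes; they are not estimates from the model:

```python
import math

def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1 - p))

def invlogit(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1 / (1 + math.exp(-x))

# Coefficient for ISCED 5/6 taken from the regression table (log odds ratio).
b_isced56 = 0.6315997

def prob_change(base_p, coef):
    """Change in probability when coef is added on the log-odds scale."""
    return invlogit(logit(base_p) + coef) - base_p

# Hypothetical baseline probabilities, for illustration only.
for base in (0.01, 0.50, 0.99):
    print(f"baseline {base:.2f}: change in probability = {prob_change(base, b_isced56):.4f}")
```

    The same log odds ratio moves the probability by roughly 15 percentage points from a baseline of 0.50, but by less than one percentage point from a baseline near 0 or 1.
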
    Finally, by doing the margins to predict only the fixed portion of the model, you are adding yet more distance between the margins output and the regression results.
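
    One way to see what a fixed-only prediction leaves out is to compare it with a probability averaged over the random intercept. The Python sketch below uses sigma_u from the table; the linear predictor of -1.0 is a hypothetical value, and the simple trapezoid integration is only a stand-in for illustration, not what Stata does internally:

```python
import math

def invlogit(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1 / (1 + math.exp(-x))

def normal_pdf(u, sd):
    """Density of N(0, sd^2) at u."""
    return math.exp(-0.5 * (u / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def population_averaged(eta, sd, n=4001, width=8.0):
    """Average invlogit(eta + u) over u ~ N(0, sd^2) using the trapezoid rule."""
    lo, hi = -width * sd, width * sd
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        u = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * invlogit(eta + u) * normal_pdf(u, sd)
    return total * h

sigma_u = 1.372023  # random-intercept standard deviation from the table
eta = -1.0          # hypothetical fixed-portion linear predictor (assumption)

fixed_only = invlogit(eta)                    # random intercept set to zero
averaged = population_averaged(eta, sigma_u)  # averaged over the country effects

print(f"fixed-only probability:         {fixed_only:.4f}")
print(f"population-averaged probability: {averaged:.4f}")
```

    With a country-level standard deviation this large, the two quantities differ noticeably, and the averaged probability is pulled toward 0.5 relative to the fixed-only one.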

    In short, there are many reasons why the two things you are looking at are so different, and there is little reason to expect them to come out in a similar way.

    All that said, you are also passing this through the filter of statistical significance. This imposes an arbitrary dichotomous classification on inherently continuous p-values and, in general, leads people to all sorts of paradoxes. This is just one of the many reasons that the American Statistical Association has recommended that the concept of statistical significance be abandoned. See for the "executive summary" and for all 43 supporting articles. Or for the tl;dr.


    • #3
      Thanks for such a rich answer, Clyde. It is really informative.

      Your answer made me realize the distance between the coefficients in my table and the predictive margins that I plot. Could using 'margins, contrast' be a better way of approximating the difference in the predicted probability of attaining an MA vs a BA between the categories of the key independent variable? For instance:

      Code:
      xtmelogit master i.edufath female age  || cntryid3:
      margins r.edufather

      Kind regards