  • Significance of gologit2 model

    Hello,

    I'm writing a paper for my Bachelor's degree and I have to build a model from my data. My dependent variable is "wsk", which takes the values {0,1,2,3,4}. At first I decided to use ologit and got these results:
    Code:
    . ologit $ylist $xlist
    
    Iteration 0:  Log likelihood = -14692.098  
    Iteration 1:  Log likelihood = -13999.216  
    Iteration 2:  Log likelihood = -13991.354  
    Iteration 3:  Log likelihood = -13991.343  
    Iteration 4:  Log likelihood = -13991.343  
    
    Ordered logistic regression                            Number of obs =  12,058
                                                           LR chi2(14)   = 1401.51
                                                           Prob > chi2   =  0.0000
    Log likelihood = -13991.343                            Pseudo R2     =  0.0477
    
    -------------------------------------------------------------------------------
              wsk | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    --------------+----------------------------------------------------------------
             wiek |   .0585921    .013604     4.31   0.000     .0319286    .0852555
                  |
    c.wiek#c.wiek |  -.0007287   .0001479    -4.93   0.000    -.0010187   -.0004388
                  |
           dzieci |
               0  |          0  (base)
               1  |  -.0489784   .0514094    -0.95   0.341    -.1497389    .0517821
               2  |  -.1758044   .0555996    -3.16   0.002    -.2847776   -.0668312
               3  |  -.2241205   .0871638    -2.57   0.010    -.3949584   -.0532826
              4+  |   -.586104   .1625936    -3.60   0.000    -.9047817   -.2674264
                  |
            wyksz |
          Podst.  |          0  (base)
         Średnie  |   .3281631   .0906859     3.62   0.000      .150422    .5059043
          Zawod.  |   .6079925   .1064372     5.71   0.000     .3993794    .8166056
          Wyższe  |    1.66815   .0946209    17.63   0.000     1.482696    1.853603
                  |
              msc |
            Wies  |          0  (base)
     Miasto <100  |    .307847   .0680011     4.53   0.000     .1745672    .4411267
     Miasto <500  |    .208038    .099792     2.08   0.037     .0124493    .4036268
     Miasto >500  |  -.0575705   .0454743    -1.27   0.206    -.1466984    .0315574
                  |
          cywilny |
               0  |          0  (base)
               1  |   .0558947   .0490111     1.14   0.254    -.0401652    .1519547
                  |
              pow |  -.0056968   .0008245    -6.91   0.000    -.0073127   -.0040808
    --------------+----------------------------------------------------------------
            /cut1 |   -2.82994   .3166887                     -3.450639   -2.209242
            /cut2 |  -.2598244   .3085078                     -.8644885    .3448397
            /cut3 |   2.159625   .3091364                      1.553729    2.765521
            /cut4 |    4.54477   .3115532                      3.934137    5.155403
    -------------------------------------------------------------------------------
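    For reference, here is a sketch of the model behind this output. It assumes Stata's documented ologit parameterization, in which the /cut values are thresholds κ1 to κ4 and x_i holds the 14 covariate terms listed above. Each term gets one coefficient shared by all four cumulative splits of wsk, and that sharing is the proportional odds (parallel lines) assumption.

    ```latex
    \Pr(\text{wsk}_i > j \mid \mathbf{x}_i)
      = \frac{\exp(\mathbf{x}_i\boldsymbol{\beta} - \kappa_{j+1})}
             {1 + \exp(\mathbf{x}_i\boldsymbol{\beta} - \kappa_{j+1})},
      \qquad j = 0, 1, 2, 3
    ```

    With 14 slopes (the df of LR chi2(14)) and 4 cutpoints, the model has k = 18 free parameters.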
    Then I ran the Brant test of the proportional odds assumption:
    Code:
    . brant
    
    Brant test of parallel regression assumption
    
                    |       chi2     p>chi2      df
     ---------------+------------------------------
                All |     270.18      0.000      42
     ---------------+------------------------------
               wiek |      18.64      0.000       3
      c.wiek#c.wiek |      28.18      0.000       3
           1.dzieci |       3.32      0.345       3
           2.dzieci |      17.53      0.001       3
           3.dzieci |      20.01      0.000       3
           4.dzieci |      17.02      0.001       3
            1.wyksz |      15.15      0.002       3
            2.wyksz |       7.80      0.050       3
            3.wyksz |      23.32      0.000       3
              1.msc |       0.20      0.977       3
              2.msc |       4.86      0.182       3
              3.msc |       5.79      0.122       3
          1.cywilny |      13.81      0.003       3
                pow |      18.02      0.000       3
    
    A significant test statistic provides evidence that the parallel
    regression assumption has been violated.
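    The degrees of freedom in this table can be reconstructed as follows, on my reading of how the test is built. Brant compares each term's coefficients across the J - 1 = 4 binary logits implied by the J = 5 categories of wsk. Each term therefore contributes J - 2 = 3 equality restrictions, and the K = 14 terms together give the 42 df of the "All" row.

    ```latex
    \begin{aligned}
    \text{df}_{\text{term}} &= J - 2 = 5 - 2 = 3 \\
    \text{df}_{\text{All}}  &= K\,(J - 2) = 14 \times 3 = 42
    \end{aligned}
    ```

    For a single term such as 1.cywilny, the 5% critical value of chi2(3) is about 7.81, which is why 13.81 is flagged.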
    The assumption is violated, so I looked into gologit2. I ran it with autofit and got these results:
    Code:
    . gologit2 $ylist $xlist, autofit
    
    ------------------------------------------------------------------------------
    Testing parallel lines assumption using the .05 level of significance...
    
    Step  1:  Constraints for parallel lines imposed for 1.msc (P Value = 0.8959)
    Step  2:  Constraints for parallel lines imposed for 1.dzieci (P Value = 0.2974)
    Step  3:  Constraints for parallel lines imposed for 3.msc (P Value = 0.1227)
    Step  4:  Constraints for parallel lines imposed for 2.msc (P Value = 0.0570)
    Step  5:  Constraints for parallel lines are not imposed for
              wiek (P Value = 0.00179)
              c.wiek#c.wiek (P Value = 0.00003)
              2.dzieci (P Value = 0.00278)
              3.dzieci (P Value = 0.00164)
              4.dzieci (P Value = 0.00209)
              1.wyksz (P Value = 0.00218)
              2.wyksz (P Value = 0.04976)
              3.wyksz (P Value = 0.00001)
              1.cywilny (P Value = 0.00285)
              pow (P Value = 0.00068)
    
    Wald test of parallel lines assumption for the final model:
    
     ( 1)  [0]1.msc - [1]1.msc = 0
     ( 2)  [0]2.msc - [1]2.msc = 0
     ( 3)  [0]3.msc - [1]3.msc = 0
     ( 4)  [0]1.dzieci - [1]1.dzieci = 0
     ( 5)  [0]1.msc - [2]1.msc = 0
     ( 6)  [0]2.msc - [2]2.msc = 0
     ( 7)  [0]3.msc - [2]3.msc = 0
     ( 8)  [0]1.dzieci - [2]1.dzieci = 0
     ( 9)  [0]1.msc - [3]1.msc = 0
     (10)  [0]2.msc - [3]2.msc = 0
     (11)  [0]3.msc - [3]3.msc = 0
     (12)  [0]1.dzieci - [3]1.dzieci = 0
    
               chi2( 12) =   17.53
             Prob > chi2 =    0.1307
    
    An insignificant test statistic indicates that the final model
    does not violate the proportional odds/ parallel lines assumption
    
    If you re-estimate this exact same model with gologit2, instead
    of autofit you can save time by using the parameter
    
    pl(0b.msc 1.msc 2.msc 3.msc 0b.dzieci 1.dzieci 0b.wyksz 0b.cywilny)
    
    ------------------------------------------------------------------------------
    
    Generalized Ordered Logit Estimates                    Number of obs =  12,058
                                                           LR chi2(44)   = 1661.95
                                                           Prob > chi2   =  0.0000
    Log likelihood = -13861.125                            Pseudo R2     =  0.0566
    
     ( 1)  [0]1.msc - [1]1.msc = 0
     ( 2)  [0]2.msc - [1]2.msc = 0
     ( 3)  [0]3.msc - [1]3.msc = 0
     ( 4)  [0]1.dzieci - [1]1.dzieci = 0
     ( 5)  [1]1.msc - [2]1.msc = 0
     ( 6)  [1]2.msc - [2]2.msc = 0
     ( 7)  [1]3.msc - [2]3.msc = 0
     ( 8)  [1]1.dzieci - [2]1.dzieci = 0
     ( 9)  [2]1.msc - [3]1.msc = 0
     (10)  [2]2.msc - [3]2.msc = 0
     (11)  [2]3.msc - [3]3.msc = 0
     (12)  [2]1.dzieci - [3]1.dzieci = 0
    -------------------------------------------------------------------------------
              wsk | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    --------------+----------------------------------------------------------------
    0             |
             wiek |   .1738873     .04645     3.74   0.000     .0828469    .2649277
                  |
    c.wiek#c.wiek |  -.0022763   .0004668    -4.88   0.000    -.0031913   -.0013613
                  |
           dzieci |
               1  |  -.0259317   .0520372    -0.50   0.618    -.1279229    .0760594
               2  |  -.0630619   .2823788    -0.22   0.823    -.6165142    .4903905
               3  |   -.293491   .4333872    -0.68   0.498    -1.142914    .5559324
              4+  |  -1.670009   .4422319    -3.78   0.000    -2.536767   -.8032501
                  |
            wyksz |
         Średnie  |   1.068135   .2314692     4.61   0.000      .614464    1.521807
          Zawod.  |   1.056128   .3375122     3.13   0.002     .3946162     1.71764
          Wyższe  |   2.468986   .3297876     7.49   0.000     1.822615    3.115358
                  |
              msc |
     Miasto <100  |   .3133377   .0679789     4.61   0.000     .1801015    .4465739
     Miasto <500  |   .2194838   .0994639     2.21   0.027     .0245383    .4144294
     Miasto >500  |  -.0549805   .0453297    -1.21   0.225    -.1438251    .0338642
                  |
          cywilny |
               1  |    .643272   .1860395     3.46   0.001     .2786414    1.007903
                  |
              pow |  -.0043335   .0030898    -1.40   0.161    -.0103894    .0017224
            _cons |   .0226806   1.103362     0.02   0.984    -2.139868     2.18523
    --------------+----------------------------------------------------------------
    1             |
             wiek |   .0893436   .0185294     4.82   0.000     .0530267    .1256604
                  |
    c.wiek#c.wiek |  -.0011449   .0001954    -5.86   0.000    -.0015279   -.0007619
                  |
           dzieci |
               1  |  -.0259317   .0520372    -0.50   0.618    -.1279229    .0760594
               2  |  -.3578178   .0795758    -4.50   0.000    -.5137835   -.2018521
               3  |  -.5521508   .1222399    -4.52   0.000    -.7917367   -.3125649
              4+  |  -1.004846   .1966145    -5.11   0.000    -1.390204    -.619489
                  |
            wyksz |
         Średnie  |   .3820885   .1117411     3.42   0.001       .16308     .601097
          Zawod.  |   .6302509   .1415816     4.45   0.000     .3527561    .9077457
          Wyższe  |   1.316392   .1234092    10.67   0.000     1.074514    1.558269
                  |
              msc |
     Miasto <100  |   .3133377   .0679789     4.61   0.000     .1801015    .4465739
     Miasto <500  |   .2194838   .0994639     2.21   0.027     .0245383    .4144294
     Miasto >500  |  -.0549805   .0453297    -1.21   0.225    -.1438251    .0338642
                  |
          cywilny |
               1  |   .1059954    .070322     1.51   0.132    -.0318332     .243824
                  |
              pow |  -.0082419   .0010485    -7.86   0.000    -.0102968   -.0061869
            _cons |  -.0808419   .4303625    -0.19   0.851    -.9243369    .7626531
    --------------+----------------------------------------------------------------
    2             |
             wiek |    .026262   .0152546     1.72   0.085    -.0036365    .0561605
                  |
    c.wiek#c.wiek |  -.0003398   .0001653    -2.06   0.040    -.0006638   -.0000159
                  |
           dzieci |
               1  |  -.0259317   .0520372    -0.50   0.618    -.1279229    .0760594
               2  |  -.0949569   .0608385    -1.56   0.119    -.2141982    .0242844
               3  |  -.0900274   .0966691    -0.93   0.352    -.2794954    .0994406
              4+  |  -.2983819   .1844015    -1.62   0.106    -.6598022    .0630385
                  |
            wyksz |
         Średnie  |   .2322069   .1124417     2.07   0.039     .0118251    .4525886
          Zawod.  |    .530006   .1281385     4.14   0.000     .2788592    .7811528
          Wyższe  |   1.561165   .1144265    13.64   0.000     1.336894    1.785437
                  |
              msc |
     Miasto <100  |   .3133377   .0679789     4.61   0.000     .1801015    .4465739
     Miasto <500  |   .2194838   .0994639     2.21   0.027     .0245383    .4144294
     Miasto >500  |  -.0549805   .0453297    -1.21   0.225    -.1438251    .0338642
                  |
          cywilny |
               1  |   .0413257    .055093     0.75   0.453    -.0666546     .149306
                  |
              pow |  -.0040054   .0009171    -4.37   0.000    -.0058029    -.002208
            _cons |  -1.540582   .3511412    -4.39   0.000    -2.228806   -.8523577
    --------------+----------------------------------------------------------------
    3             |
             wiek |   .0358637   .0310803     1.15   0.249    -.0250526      .09678
                  |
    c.wiek#c.wiek |  -.0003326   .0003356    -0.99   0.322    -.0009903    .0003251
                  |
           dzieci |
               1  |  -.0259317   .0520372    -0.50   0.618    -.1279229    .0760594
               2  |   .0053097   .1051348     0.05   0.960    -.2007508    .2113702
               3  |   .0402445   .1737061     0.23   0.817    -.3002133    .3807023
              4+  |  -.1932874   .3986984    -0.48   0.628    -.9747219    .5881471
                  |
            wyksz |
         Średnie  |   1.203682    .585968     2.05   0.040     .0552054    2.352158
          Zawod.  |   1.943424    .601285     3.23   0.001     .7649276    3.121921
          Wyższe  |   3.354312   .5816843     5.77   0.000     2.214232    4.494393
                  |
              msc |
     Miasto <100  |   .3133377   .0679789     4.61   0.000     .1801015    .4465739
     Miasto <500  |   .2194838   .0994639     2.21   0.027     .0245383    .4144294
     Miasto >500  |  -.0549805   .0453297    -1.21   0.225    -.1438251    .0338642
                  |
          cywilny |
               1  |  -.1435708   .1047545    -1.37   0.171    -.3488859    .0617443
                  |
              pow |  -.0053998   .0019135    -2.82   0.005    -.0091501   -.0016494
            _cons |   -5.73104   .8956403    -6.40   0.000    -7.486463   -3.975618
    -------------------------------------------------------------------------------
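    As a sketch of what gologit2 estimated here, following the gologit2 parameterization as I understand it: each panel 0 to 3 above is a binary logit for wsk > j with its own intercept α_j (_cons). Split x_i into two groups. The first, x_i^c, holds the four constrained terms (1.msc, 2.msc, 3.msc, 1.dzieci), whose coefficients are identical across panels. The second, x_i^u, holds the ten unconstrained terms:

    ```latex
    \begin{aligned}
    \Pr(\text{wsk}_i > j \mid \mathbf{x}_i)
      &= \frac{\exp(\alpha_j + \mathbf{x}^{c}_i\boldsymbol{\beta} + \mathbf{x}^{u}_i\boldsymbol{\gamma}_j)}
              {1 + \exp(\alpha_j + \mathbf{x}^{c}_i\boldsymbol{\beta} + \mathbf{x}^{u}_i\boldsymbol{\gamma}_j)},
      \qquad j = 0, 1, 2, 3 \\
    k_{\text{slopes}} &= 4 + 10 \times 4 = 44, \qquad k = 44 + 4 = 48
    \end{aligned}
    ```

    The 44 slopes match the LR chi2(44) in the header. The 12 constraints listed there are the 4 constrained terms times 3 equalities each. Compared with ologit's 14 slopes, gologit2 estimates 30 extra coefficients, each from less information. That is likely one reason more of them are individually insignificant (compare the standard error of wiek, .0136 in ologit versus .0153 to .0465 here).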
    But now when I look at the p-values, many more coefficients are insignificant in gologit2. So which model should I use? I also graphed the predicted probabilities, because I find them much easier to interpret, and the results differ between the two models. Here are the graphs of the probability of each level of wsk against the variable wiek (age):
    GOLOGIT:
    [Image: gologit.jpg]

    OLOGIT:
    [Image: ologit.jpg]

    And here is what my actual data look like (the percentage of observations at each level):
    [Image: data (percentage).JPG]

    I'm not sure which model is closer to reality. Which one do you think I should use? Is the violation of the parallel lines assumption actually a problem, and if not, how do I justify my choice of model?
    Last edited by Jan Kaminski; 14 Mar 2025, 12:42.

  • #2
    The partial proportional odds model (gologit2) is probably more "correct", and judging by the pictures it also looks closer to your actual data. What you give up with the more complex model is ease of interpretation, since the ologit model is simpler to interpret. I agree that predictive margins are the most straightforward way to interpret the models and to do hypothesis testing for a given research question, rather than trying to interpret the regression output of the partial proportional odds model directly.
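    A formal version of the "more correct" comparison can be sketched from the two log likelihoods reported above. Ologit is gologit2 with all parallel-lines constraints imposed, so the models are nested and a likelihood-ratio test applies. Its df is the difference in slope counts, 44 versus 14, as in the LR chi2(44) and LR chi2(14) headers. This should correspond to what Stata's lrtest reports after estimates store of both models, though I have not run it on these data.

    ```latex
    \begin{aligned}
    LR &= 2\,\bigl(\ln L_{\text{gologit2}} - \ln L_{\text{ologit}}\bigr)
        = 2\,\bigl(-13861.125 - (-13991.343)\bigr) = 260.436 \\
    \text{df} &= 44 - 14 = 30, \qquad \chi^2_{30,\,0.95} \approx 43.77
    \end{aligned}
    ```

    So the ologit constraints are rejected by a wide margin. With N = 12,058, though, the size of this statistic partly reflects how precisely the differences are estimated, not only how large they are.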

    Another option is a sensitivity analysis, as long as you report it transparently in your paper. You could run and report both models for a given research question. If the answer is substantively the same in both, you could choose the simpler model and state three things: that you ran a sensitivity analysis, that the results of the partial proportional odds model were within a certain range of those of the simpler model, and that you therefore chose the simpler model for ease of interpretation. How persuasive that is may depend on your audience.

    Another way to see which model fits better is to compare the AIC and BIC fit statistics of the two models.
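    Using the log likelihoods already shown, the comparison can be worked by hand. This sketch assumes N = 12,058 and assumes k counts slopes plus cutpoints or intercepts. That gives 18 for ologit (14 + 4) and 48 for gologit2 (44 + 4). Running estat ic after each model should give the same numbers if its df matches these counts, which is worth checking.

    ```latex
    \begin{aligned}
    \mathrm{AIC} &= -2\ln L + 2k, \qquad \mathrm{BIC} = -2\ln L + k\ln N, \qquad \ln 12058 \approx 9.3975 \\
    \text{ologit } (k = 18) &:\ \mathrm{AIC} = 27982.686 + 36 = 28018.686, \quad
      \mathrm{BIC} \approx 27982.686 + 169.155 = 28151.841 \\
    \text{gologit2 } (k = 48) &:\ \mathrm{AIC} = 27722.250 + 96 = 27818.250, \quad
      \mathrm{BIC} \approx 27722.250 + 451.079 = 28173.329
    \end{aligned}
    ```

    Lower is better for both criteria. On these numbers AIC favors gologit2 by about 200. BIC, whose per-parameter penalty of ln N ≈ 9.4 is much heavier than AIC's 2, favors ologit by about 21.5. The two criteria disagree, which is consistent with an "it depends" answer.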

    As a counterpoint, some prominent folks, such as Frank Harrell in the post linked here, have argued that the proportional odds assumption may not be as big a deal as some people make it out to be, at least in certain instances: https://www.fharrell.com/post/po/

    Hope that gives you some options. I think this is an "it depends" scenario! I'm interested in what others have to say.



    • #3
      Originally posted by Erik Reinbergs
      As a counterpoint, some prominent folks, such as Frank Harrell in the post linked here, have argued that the proportional odds assumption may not be as big a deal as some people make it out to be, at least in certain instances: https://www.fharrell.com/post/po/
      Adding to Erik's comment, I suspect that the Brant test is overpowered when n > 12,000. That is, I suspect it can flag small deviations from the PO assumption that are not practically important as statistically significant.
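      Here is a rough way to see why n matters. It uses standard large-sample theory for Wald-type tests rather than anything specific to the brant implementation. Let δ be a fixed true difference between cumulative-logit coefficients, and let V_1/N be the sampling covariance of its estimate. Then the test statistic for q restrictions grows linearly in N:

      ```latex
      \begin{aligned}
      W &= \hat{\boldsymbol{\delta}}^{\top}\bigl(\mathbf{V}_1 / N\bigr)^{-1}\hat{\boldsymbol{\delta}}
         = N\,\hat{\boldsymbol{\delta}}^{\top}\mathbf{V}_1^{-1}\hat{\boldsymbol{\delta}} \\
      \mathbb{E}[W] &\approx q + N\,\boldsymbol{\delta}^{\top}\mathbf{V}_1^{-1}\boldsymbol{\delta}
      \end{aligned}
      ```

      As a crude illustration, treat the excess of the 1.cywilny statistic over its df (13.81 - 3 = 10.81) as the noncentrality. The same true deviation in a sample one quarter as large (about 3,000) would then give an expected statistic near 3 + 10.81/4 ≈ 5.7, below the 5% critical value of 7.81 for 3 df.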
      --
      Bruce Weaver
      Version: Stata/MP 18.5 (Windows)



      • #4
        Originally posted by Bruce Weaver

        Adding to Erik's comment, I suspect that the Brant test is overpowered when n > 12,000. That is, I suspect it can flag small deviations from the PO assumption that are not practically important as statistically significant.
        I agree. With so many variables and cases, it is easy for trivial violations of assumptions to be statistically significant. Or, by chance alone, violations could show up as significant when there isn't any violation at all.

        I would at least specify autofit(.01). Indeed, with this many cases autofit(.001) may be reasonable, and it appears that would greatly increase the number of variables that meet PO, which would make interpretation easier. You could do sensitivity analyses such as Erik suggests to see whether the more constrained model can be justified.
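        To put a number on the "by chance alone" point, here is a back-of-the-envelope sketch. It assumes, unrealistically, that the 14 per-term tests are independent and that PO truly holds for every term:

        ```latex
        \begin{aligned}
        \Pr(\text{at least one } p < .05) &= 1 - 0.95^{14} \approx 1 - 0.488 = 0.512 \\
        \text{Bonferroni threshold} &= .05 / 14 \approx .0036
        \end{aligned}
        ```

        The tests are correlated and autofit re-tests after each constraint it adds, so this is only a gauge. For the stricter cutoff, only c.wiek#c.wiek, 3.wyksz and pow have p < .001 in the Step 5 list above. Because of the re-testing, though, the final autofit(.001) model could constrain a different set.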

        I don't know what your variables are, but I might also suggest combining some categories if that seems reasonable.

        Whatever model you run (ologit, gologit, or even mlogit), I am a big fan of using marginal effects, adjusted predictions, and predicted values for prototypical cases. They are generally much easier to understand than the raw coefficients. See

        https://www3.nd.edu/~rwilliam/xsoc73994/Margins05.pdf

        and the earlier handouts if necessary at https://www3.nd.edu/~rwilliam/xsoc73994/index.html

        I especially love the mtable command that is part of Long & Freese's spost13 package, which Jan probably already has installed since Jan is using the brant command.
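        As a reminder of where adjusted predictions like those in Jan's graphs come from: both models describe cumulative probabilities Pr(wsk > j), and each category probability is a difference of adjacent cumulative ones. A sketch of the standard identities:

        ```latex
        \begin{aligned}
        \Pr(\text{wsk}_i = 0) &= 1 - \Pr(\text{wsk}_i > 0) \\
        \Pr(\text{wsk}_i = j) &= \Pr(\text{wsk}_i > j - 1) - \Pr(\text{wsk}_i > j), \qquad j = 1, 2, 3 \\
        \Pr(\text{wsk}_i = 4) &= \Pr(\text{wsk}_i > 3)
        \end{aligned}
        ```

        Under ologit the shared slopes keep these differences positive. In gologit2, unconstrained slopes can make the cumulative curves cross at extreme covariate values and produce negative predicted probabilities. That is worth checking over the plotted range of wiek.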

        Finally, I think many people use gologit2 much as they use, say, vce(robust): as a way of dealing with some nuisance problem in the data. But the gologit2 results may be of substantive significance and deserve more discussion. See

        https://www.tandfonline.com/doi/full...X.2015.1112384

        ABSTRACT

        When outcome variables are ordinal rather than continuous, the ordered logit model, aka the proportional odds model (ologit/po), is a popular analytical method. However, generalized ordered logit/partial proportional odds models (gologit/ppo) are often a superior alternative. Gologit/ppo models can be less restrictive than proportional odds models and more parsimonious than methods that ignore the ordering of categories altogether. However, the use of gologit/ppo models has itself been problematic or at least sub-optimal. Researchers typically note that such models fit better but fail to explain why the ordered logit model was inadequate or the substantive insights gained by using the gologit alternative. This paper uses both hypothetical examples and data from the 2012 European Social Survey to address these shortcomings.


        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 19.5 MP (2 processor)

        WWW: https://www3.nd.edu/~rwilliam
