Significance of gologit2 model

Jan Kaminski

Join Date: Mar 2025
Posts: 4


14 Mar 2025, 12:31

Hello,

I'm writing a paper for my Bachelor's degree and I have to build a model from my data. My dependent variable is "wsk", which takes the values {0,1,2,3,4}. I first used ologit and got these results:

Code:

. ologit $ylist $xlist

Iteration 0:  Log likelihood = -14692.098  
Iteration 1:  Log likelihood = -13999.216  
Iteration 2:  Log likelihood = -13991.354  
Iteration 3:  Log likelihood = -13991.343  
Iteration 4:  Log likelihood = -13991.343  

Ordered logistic regression                            Number of obs =  12,058
                                                       LR chi2(14)   = 1401.51
                                                       Prob > chi2   =  0.0000
Log likelihood = -13991.343                            Pseudo R2     =  0.0477

-------------------------------------------------------------------------------
          wsk | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
--------------+----------------------------------------------------------------
         wiek |   .0585921    .013604     4.31   0.000     .0319286    .0852555
              |
c.wiek#c.wiek |  -.0007287   .0001479    -4.93   0.000    -.0010187   -.0004388
              |
       dzieci |
           0  |          0  (base)
           1  |  -.0489784   .0514094    -0.95   0.341    -.1497389    .0517821
           2  |  -.1758044   .0555996    -3.16   0.002    -.2847776   -.0668312
           3  |  -.2241205   .0871638    -2.57   0.010    -.3949584   -.0532826
          4+  |   -.586104   .1625936    -3.60   0.000    -.9047817   -.2674264
              |
        wyksz |
      Podst.  |          0  (base)
     Średnie  |   .3281631   .0906859     3.62   0.000      .150422    .5059043
      Zawod.  |   .6079925   .1064372     5.71   0.000     .3993794    .8166056
      Wyższe  |    1.66815   .0946209    17.63   0.000     1.482696    1.853603
              |
          msc |
        Wies  |          0  (base)
 Miasto <100  |    .307847   .0680011     4.53   0.000     .1745672    .4411267
 Miasto <500  |    .208038    .099792     2.08   0.037     .0124493    .4036268
 Miasto >500  |  -.0575705   .0454743    -1.27   0.206    -.1466984    .0315574
              |
      cywilny |
           0  |          0  (base)
           1  |   .0558947   .0490111     1.14   0.254    -.0401652    .1519547
              |
          pow |  -.0056968   .0008245    -6.91   0.000    -.0073127   -.0040808
--------------+----------------------------------------------------------------
        /cut1 |   -2.82994   .3166887                     -3.450639   -2.209242
        /cut2 |  -.2598244   .3085078                     -.8644885    .3448397
        /cut3 |   2.159625   .3091364                      1.553729    2.765521
        /cut4 |    4.54477   .3115532                      3.934137    5.155403
-------------------------------------------------------------------------------

Then I ran the Brant test of the proportional odds assumption:

Code:

. brant

Brant test of parallel regression assumption

                |       chi2     p>chi2      df
 ---------------+------------------------------
            All |     270.18      0.000      42
 ---------------+------------------------------
           wiek |      18.64      0.000       3
  c.wiek#c.wiek |      28.18      0.000       3
       1.dzieci |       3.32      0.345       3
       2.dzieci |      17.53      0.001       3
       3.dzieci |      20.01      0.000       3
       4.dzieci |      17.02      0.001       3
        1.wyksz |      15.15      0.002       3
        2.wyksz |       7.80      0.050       3
        3.wyksz |      23.32      0.000       3
          1.msc |       0.20      0.977       3
          2.msc |       4.86      0.182       3
          3.msc |       5.79      0.122       3
      1.cywilny |      13.81      0.003       3
            pow |      18.02      0.000       3

A significant test statistic provides evidence that the parallel
regression assumption has been violated.

The test indicates that the assumption is violated, so I looked for alternatives and found gologit2. I ran it with autofit and got this result:

Code:

. gologit2 $ylist $xlist, autofit

------------------------------------------------------------------------------
Testing parallel lines assumption using the .05 level of significance...

Step  1:  Constraints for parallel lines imposed for 1.msc (P Value = 0.8959)
Step  2:  Constraints for parallel lines imposed for 1.dzieci (P Value = 0.2974)
Step  3:  Constraints for parallel lines imposed for 3.msc (P Value = 0.1227)
Step  4:  Constraints for parallel lines imposed for 2.msc (P Value = 0.0570)
Step  5:  Constraints for parallel lines are not imposed for
          wiek (P Value = 0.00179)
          c.wiek#c.wiek (P Value = 0.00003)
          2.dzieci (P Value = 0.00278)
          3.dzieci (P Value = 0.00164)
          4.dzieci (P Value = 0.00209)
          1.wyksz (P Value = 0.00218)
          2.wyksz (P Value = 0.04976)
          3.wyksz (P Value = 0.00001)
          1.cywilny (P Value = 0.00285)
          pow (P Value = 0.00068)

Wald test of parallel lines assumption for the final model:

 ( 1)  [0]1.msc - [1]1.msc = 0
 ( 2)  [0]2.msc - [1]2.msc = 0
 ( 3)  [0]3.msc - [1]3.msc = 0
 ( 4)  [0]1.dzieci - [1]1.dzieci = 0
 ( 5)  [0]1.msc - [2]1.msc = 0
 ( 6)  [0]2.msc - [2]2.msc = 0
 ( 7)  [0]3.msc - [2]3.msc = 0
 ( 8)  [0]1.dzieci - [2]1.dzieci = 0
 ( 9)  [0]1.msc - [3]1.msc = 0
 (10)  [0]2.msc - [3]2.msc = 0
 (11)  [0]3.msc - [3]3.msc = 0
 (12)  [0]1.dzieci - [3]1.dzieci = 0

           chi2( 12) =   17.53
         Prob > chi2 =    0.1307

An insignificant test statistic indicates that the final model
does not violate the proportional odds/ parallel lines assumption

If you re-estimate this exact same model with gologit2, instead
of autofit you can save time by using the parameter

pl(0b.msc 1.msc 2.msc 3.msc 0b.dzieci 1.dzieci 0b.wyksz 0b.cywilny)

------------------------------------------------------------------------------

Generalized Ordered Logit Estimates                    Number of obs =  12,058
                                                       LR chi2(44)   = 1661.95
                                                       Prob > chi2   =  0.0000
Log likelihood = -13861.125                            Pseudo R2     =  0.0566

 ( 1)  [0]1.msc - [1]1.msc = 0
 ( 2)  [0]2.msc - [1]2.msc = 0
 ( 3)  [0]3.msc - [1]3.msc = 0
 ( 4)  [0]1.dzieci - [1]1.dzieci = 0
 ( 5)  [1]1.msc - [2]1.msc = 0
 ( 6)  [1]2.msc - [2]2.msc = 0
 ( 7)  [1]3.msc - [2]3.msc = 0
 ( 8)  [1]1.dzieci - [2]1.dzieci = 0
 ( 9)  [2]1.msc - [3]1.msc = 0
 (10)  [2]2.msc - [3]2.msc = 0
 (11)  [2]3.msc - [3]3.msc = 0
 (12)  [2]1.dzieci - [3]1.dzieci = 0
-------------------------------------------------------------------------------
          wsk | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
--------------+----------------------------------------------------------------
0             |
         wiek |   .1738873     .04645     3.74   0.000     .0828469    .2649277
              |
c.wiek#c.wiek |  -.0022763   .0004668    -4.88   0.000    -.0031913   -.0013613
              |
       dzieci |
           1  |  -.0259317   .0520372    -0.50   0.618    -.1279229    .0760594
           2  |  -.0630619   .2823788    -0.22   0.823    -.6165142    .4903905
           3  |   -.293491   .4333872    -0.68   0.498    -1.142914    .5559324
          4+  |  -1.670009   .4422319    -3.78   0.000    -2.536767   -.8032501
              |
        wyksz |
     Średnie  |   1.068135   .2314692     4.61   0.000      .614464    1.521807
      Zawod.  |   1.056128   .3375122     3.13   0.002     .3946162     1.71764
      Wyższe  |   2.468986   .3297876     7.49   0.000     1.822615    3.115358
              |
          msc |
 Miasto <100  |   .3133377   .0679789     4.61   0.000     .1801015    .4465739
 Miasto <500  |   .2194838   .0994639     2.21   0.027     .0245383    .4144294
 Miasto >500  |  -.0549805   .0453297    -1.21   0.225    -.1438251    .0338642
              |
      cywilny |
           1  |    .643272   .1860395     3.46   0.001     .2786414    1.007903
              |
          pow |  -.0043335   .0030898    -1.40   0.161    -.0103894    .0017224
        _cons |   .0226806   1.103362     0.02   0.984    -2.139868     2.18523
--------------+----------------------------------------------------------------
1             |
         wiek |   .0893436   .0185294     4.82   0.000     .0530267    .1256604
              |
c.wiek#c.wiek |  -.0011449   .0001954    -5.86   0.000    -.0015279   -.0007619
              |
       dzieci |
           1  |  -.0259317   .0520372    -0.50   0.618    -.1279229    .0760594
           2  |  -.3578178   .0795758    -4.50   0.000    -.5137835   -.2018521
           3  |  -.5521508   .1222399    -4.52   0.000    -.7917367   -.3125649
          4+  |  -1.004846   .1966145    -5.11   0.000    -1.390204    -.619489
              |
        wyksz |
     Średnie  |   .3820885   .1117411     3.42   0.001       .16308     .601097
      Zawod.  |   .6302509   .1415816     4.45   0.000     .3527561    .9077457
      Wyższe  |   1.316392   .1234092    10.67   0.000     1.074514    1.558269
              |
          msc |
 Miasto <100  |   .3133377   .0679789     4.61   0.000     .1801015    .4465739
 Miasto <500  |   .2194838   .0994639     2.21   0.027     .0245383    .4144294
 Miasto >500  |  -.0549805   .0453297    -1.21   0.225    -.1438251    .0338642
              |
      cywilny |
           1  |   .1059954    .070322     1.51   0.132    -.0318332     .243824
              |
          pow |  -.0082419   .0010485    -7.86   0.000    -.0102968   -.0061869
        _cons |  -.0808419   .4303625    -0.19   0.851    -.9243369    .7626531
--------------+----------------------------------------------------------------
2             |
         wiek |    .026262   .0152546     1.72   0.085    -.0036365    .0561605
              |
c.wiek#c.wiek |  -.0003398   .0001653    -2.06   0.040    -.0006638   -.0000159
              |
       dzieci |
           1  |  -.0259317   .0520372    -0.50   0.618    -.1279229    .0760594
           2  |  -.0949569   .0608385    -1.56   0.119    -.2141982    .0242844
           3  |  -.0900274   .0966691    -0.93   0.352    -.2794954    .0994406
          4+  |  -.2983819   .1844015    -1.62   0.106    -.6598022    .0630385
              |
        wyksz |
     Średnie  |   .2322069   .1124417     2.07   0.039     .0118251    .4525886
      Zawod.  |    .530006   .1281385     4.14   0.000     .2788592    .7811528
      Wyższe  |   1.561165   .1144265    13.64   0.000     1.336894    1.785437
              |
          msc |
 Miasto <100  |   .3133377   .0679789     4.61   0.000     .1801015    .4465739
 Miasto <500  |   .2194838   .0994639     2.21   0.027     .0245383    .4144294
 Miasto >500  |  -.0549805   .0453297    -1.21   0.225    -.1438251    .0338642
              |
      cywilny |
           1  |   .0413257    .055093     0.75   0.453    -.0666546     .149306
              |
          pow |  -.0040054   .0009171    -4.37   0.000    -.0058029    -.002208
        _cons |  -1.540582   .3511412    -4.39   0.000    -2.228806   -.8523577
--------------+----------------------------------------------------------------
3             |
         wiek |   .0358637   .0310803     1.15   0.249    -.0250526      .09678
              |
c.wiek#c.wiek |  -.0003326   .0003356    -0.99   0.322    -.0009903    .0003251
              |
       dzieci |
           1  |  -.0259317   .0520372    -0.50   0.618    -.1279229    .0760594
           2  |   .0053097   .1051348     0.05   0.960    -.2007508    .2113702
           3  |   .0402445   .1737061     0.23   0.817    -.3002133    .3807023
          4+  |  -.1932874   .3986984    -0.48   0.628    -.9747219    .5881471
              |
        wyksz |
     Średnie  |   1.203682    .585968     2.05   0.040     .0552054    2.352158
      Zawod.  |   1.943424    .601285     3.23   0.001     .7649276    3.121921
      Wyższe  |   3.354312   .5816843     5.77   0.000     2.214232    4.494393
              |
          msc |
 Miasto <100  |   .3133377   .0679789     4.61   0.000     .1801015    .4465739
 Miasto <500  |   .2194838   .0994639     2.21   0.027     .0245383    .4144294
 Miasto >500  |  -.0549805   .0453297    -1.21   0.225    -.1438251    .0338642
              |
      cywilny |
           1  |  -.1435708   .1047545    -1.37   0.171    -.3488859    .0617443
              |
          pow |  -.0053998   .0019135    -2.82   0.005    -.0091501   -.0016494
        _cons |   -5.73104   .8956403    -6.40   0.000    -7.486463   -3.975618
-------------------------------------------------------------------------------

But when I look at the p-values now, many more coefficients are insignificant in gologit2. So which model should I use? I also graphed the predicted probabilities, because they are much easier for me to interpret, and the results differ between the two models. Here are graphs of the predicted probability of each level of wsk against the variable wiek (age):
GOLOGIT:

OLOGIT:

And my actual data looks like this (the percentage of each level)

[Image: data (percentage).JPG]

I'm not sure which model is closer to reality. Which one do you think I should use? Is the violation of the parallel lines assumption actually bad and if not, how do I justify the model?


Last edited by Jan Kaminski; 14 Mar 2025, 12:42.

Tags: logit, margins, ordered logit

Erik Reinbergs

Join Date: Oct 2022

Posts: 35
#2

14 Mar 2025, 17:51

The partial proportional odds model / gologit2 is probably more "correct". Judging by the pictures, it also looks closer to your actual data. What you give up with the more complex model is ease of interpretation, since the ologit model is simpler to explain. I agree that predictive margins are the most straightforward way to interpret the models and do hypothesis testing for a given research question, rather than trying to interpret the partial proportional odds regression output directly.

Another option is a sensitivity analysis (as long as you report it transparently in your paper). You could run and report both models for a given research question. If the answer is substantively the same in both, you could choose the simpler model, stating that you ran a sensitivity analysis, that the results of the partial proportional odds model were within a certain range of those of the simpler model, and that you are therefore choosing the simpler model for ease of interpretation, particularly depending on your audience.
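One way such a sensitivity check might look in a do-file is sketched below. It assumes the $ylist and $xlist globals from the first post are still defined, and the choice of outcome 4 and the new variable names (p_po4, p_ppo4, diff4) are purely illustrative.

```stata
* Sketch only: assumes $ylist and $xlist are defined as in the first post.
* Outcome 4 and the variable names below are hypothetical choices.
quietly ologit $ylist $xlist
predict double p_po4, outcome(4)      // P(wsk = 4) under proportional odds

quietly gologit2 $ylist $xlist, autofit
predict double p_ppo4, outcome(4)     // P(wsk = 4) under partial proportional odds

* Case-by-case gap between the two models' predicted probabilities
generate double diff4 = p_ppo4 - p_po4
summarize diff4, detail
```

If the differences are small across the whole distribution, and for the other outcome levels as well, that would support reporting the simpler model.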

Another option to look at which model fits better is to get the AIC and BIC fit statistics for each model.
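As a rough sketch of that comparison (again assuming the $ylist and $xlist globals from the first post; the stored-estimate names po and ppo are made up for illustration):

```stata
* Sketch only: fit both models, store them, and list AIC/BIC side by side.
quietly ologit $ylist $xlist
estimates store po

quietly gologit2 $ylist $xlist, autofit
estimates store ppo

estimates stats po ppo    // reports N, log likelihood, df, AIC, and BIC for each model
```

Keep in mind that BIC penalizes extra parameters more heavily than AIC, so with a model that adds many coefficients the two criteria can point in different directions.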

As a counterpoint, some prominent folks - like this link by Frank Harrell - have argued that the proportional odds assumption might not be as big of a deal as some people make it out to be at least in certain instances: https://www.fharrell.com/post/po/

Hope that helps with some options - I think this is an "it depends" scenario! Interested in what others have to say.
Bruce Weaver

Join Date: May 2014

Posts: 1134
#3

14 Mar 2025, 19:23

Originally posted by Erik Reinbergs

As a counterpoint, some prominent folks - like this link by Frank Harrell - have argued that the proportional odds assumption might not be as big of a deal as some people make it out to be at least in certain instances: https://www.fharrell.com/post/po/

Adding to Erik's comment, I suspect that the Brant test is over-powered when n > 12,000. I.e., I suspect it can detect as statistically significant small deviations from the PO assumption that are not practically important.

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)
Richard Williams

Join Date: Apr 2014

Posts: 5008
#4

15 Mar 2025, 09:47

Originally posted by Bruce Weaver

Adding to Erik's comment, I suspect that the Brant test is over-powered when n > 12,000. I.e., I suspect it can detect as statistically significant small deviations from the PO assumption that are not practically important.

I agree. With so many variables and cases it is easy to have trivial violations of assumptions that are statistically significant. Or, just by chance alone, violations could show up as significant when there isn't any violation at all.

I would at least specify autofit(.01). Indeed, with this many cases autofit(.001) may be reasonable, and it appears that would greatly increase the number of variables that meet PO, which would make interpretation easier. You could do sensitivity analyses such as Erik suggests to see whether the more constrained model can be justified.
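A minimal sketch of trying the stricter thresholds, assuming the same $ylist and $xlist globals as in the first post (the stored-estimate names are hypothetical):

```stata
* Sketch only: refit with stricter significance levels for relaxing
* the parallel lines constraints, then compare fit statistics.
gologit2 $ylist $xlist, autofit(0.01)
estimates store ppo01

gologit2 $ylist $xlist, autofit(0.001)
estimates store ppo001

estimates stats ppo01 ppo001
```

The autofit log printed at the top of each run shows which variables end up constrained to meet the parallel lines assumption at each level.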

I don't know what your variables are but I might also suggest combining some categories if that seems reasonable.

Whatever model you run (ologit, gologit, or even mlogit) I am a big fan of using marginal effects, adjusted predictions, and values for prototypical cases. They are generally much easier to understand than the raw coefficients. See

https://www3.nd.edu/~rwilliam/xsoc73994/Margins05.pdf

and the earlier handouts if necessary at https://www3.nd.edu/~rwilliam/xsoc73994/index.html
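For example, adjusted predictions and an average marginal effect could be obtained along these lines. This is only a sketch: it assumes the $ylist and $xlist globals from the first post, and the age grid 20 to 70 and the focus on outcome 4 are illustrative guesses, not values taken from the data.

```stata
* Sketch only: the age grid and the outcome level are hypothetical choices.
quietly gologit2 $ylist $xlist, autofit

* Adjusted predictions of P(wsk = 4) across age; the c.wiek#c.wiek term
* is handled automatically because factor-variable notation was used
margins, at(wiek=(20(10)70)) predict(outcome(4))
marginsplot

* Average marginal effect of pow on P(wsk = 4)
margins, dydx(pow) predict(outcome(4))
```

Repeating the margins call with the other outcome levels gives the full picture for each category of wsk.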

I especially love the mtable command that is part of Long & Freese's spost13 package, which Jan probably already has installed since Jan is using the brant command.
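A minimal mtable sketch follows, assuming spost13 is installed, that mtable runs after gologit2 on your setup, and that wyksz is coded 0 to 3 (inferred from the 1.wyksz, 2.wyksz, 3.wyksz labels in the Brant output, so please check against your data):

```stata
* Sketch only: assumes spost13 is installed and wyksz takes the values 0-3.
quietly gologit2 $ylist $xlist, autofit

* Predicted probabilities of each wsk level by education,
* holding the other covariates at their means
mtable, at(wyksz=(0 1 2 3)) atmeans
```

The result is one compact table with a row per education level and a column per outcome, which is often easier to put in a paper than the full coefficient output.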

Finally I think many people use gologit2 much like they use, say, vce(robust). It is a way of dealing with some nuisance problem in the data. But, the gologit2 results may be of substantive significance and should be discussed more. See

https://www.tandfonline.com/doi/full...X.2015.1112384

ABSTRACT

When outcome variables are ordinal rather than continuous, the ordered logit model, aka the proportional odds model (ologit/po), is a popular analytical method. However, generalized ordered logit/partial proportional odds models (gologit/ppo) are often a superior alternative. Gologit/ppo models can be less restrictive than proportional odds models and more parsimonious than methods that ignore the ordering of categories altogether. However, the use of gologit/ppo models has itself been problematic or at least sub-optimal. Researchers typically note that such models fit better but fail to explain why the ordered logit model was inadequate or the substantive insights gained by using the gologit alternative. This paper uses both hypothetical examples and data from the 2012 European Social Survey to address these shortcomings.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam