Hello,
I'm writing a paper for my Bachelor's degree and I have to build a model from my data. My dependent variable is "wsk" and it takes the values {0,1,2,3,4}. At first I decided to use ologit; its results are in the first block of Stata output below.
Then I ran the Brant test of the proportional odds assumption (second output block).
The assumption was violated, so I turned to gologit2. I ran it with the autofit option and got new results (third output block).
But when I look at the p-values there, many more coefficients are not statistically significant in the gologit2 model than in the ologit model. So which model should I use? I also graphed the predicted probabilities, because they are much easier for me to interpret, and the results differ between the two models. Here are graphs of the predicted probability of each level of wsk against the variable wiek (age):
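GOLOGIT: [graph: gologit2 predicted probability of each wsk level by wiek; image not included]
OLOGIT: [graph: ologit predicted probability of each wsk level by wiek; image not included]
And my actual data looks like this (percentage of observations at each level of wsk): [image not included]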
I'm not sure which model is closer to reality. Which one do you think I should use? Is violating the parallel lines assumption actually a problem, and if not, how do I justify my choice of model?
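One way to put a number on "which model fits better" is a likelihood-ratio test. ologit is the special case of gologit2 in which every variable is constrained to parallel lines, so the two models are nested and differ by 44 - 14 = 30 slope parameters (both have 4 cutpoints/intercepts). The Python sketch below recomputes that test, plus AIC and BIC, from the log likelihoods printed in the ologit and gologit2 outputs below (-13991.343 and -13861.125, N = 12,058). It assumes both fits use exactly the same observations, as the matching "Number of obs" lines suggest; since the gologit2 specification was itself picked by autofit's search, treat the p-value as approximate. The helper names are illustrative, not part of any Stata command.

```python
import math

# Figures read off the ologit and gologit2 output in this post.
N_OBS = 12058
LL_OLOGIT, K_OLOGIT = -13991.343, 14 + 4    # 14 slopes + 4 cutpoints
LL_GOLOGIT, K_GOLOGIT = -13861.125, 44 + 4  # 44 free slopes + 4 intercepts


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability P(X > x), valid for even df only.

    Uses the identity Q(x; 2m) = exp(-x/2) * sum_{i=0}^{m-1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this shortcut needs a positive even df")
    term = total = math.exp(-x / 2)
    for i in range(1, df // 2):
        term *= (x / 2) / i
        total += term
    return total


def aic_bic(ll: float, k: int, n: int) -> tuple[float, float]:
    """AIC = -2 ll + 2 k and BIC = -2 ll + k ln(n)."""
    return -2 * ll + 2 * k, -2 * ll + k * math.log(n)


# ologit is nested in gologit2 (all variables constrained to parallel lines).
lr_stat = 2 * (LL_GOLOGIT - LL_OLOGIT)
lr_df = K_GOLOGIT - K_OLOGIT
lr_p = chi2_sf_even_df(lr_stat, lr_df)
aic_o, bic_o = aic_bic(LL_OLOGIT, K_OLOGIT, N_OBS)
aic_g, bic_g = aic_bic(LL_GOLOGIT, K_GOLOGIT, N_OBS)

print(f"LR chi2({lr_df}) = {lr_stat:.2f}, p = {lr_p:.2g}")
print(f"ologit:   AIC = {aic_o:.1f}  BIC = {bic_o:.1f}")
print(f"gologit2: AIC = {aic_g:.1f}  BIC = {bic_g:.1f}")
```

Run as written, it gives LR chi2(30) = 260.44, which equals 1661.95 - 1401.51 (the difference between the two reported LR chi2 values), with a p-value far below 0.001. AIC is lower for gologit2, while BIC is lower for ologit: BIC charges ln(12,058), about 9.4, per extra parameter instead of 2. As a sanity check, the same helper reproduces the final Wald test in the gologit2 log: `chi2_sf_even_df(17.53, 12)` is about 0.1307.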
Code:
. ologit $ylist $xlist
Iteration 0: Log likelihood = -14692.098
Iteration 1: Log likelihood = -13999.216
Iteration 2: Log likelihood = -13991.354
Iteration 3: Log likelihood = -13991.343
Iteration 4: Log likelihood = -13991.343
Ordered logistic regression                     Number of obs =  12,058
                                                LR chi2(14)   = 1401.51
                                                Prob > chi2   =  0.0000
Log likelihood = -13991.343                     Pseudo R2     =  0.0477
-------------------------------------------------------------------------------
          wsk | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
--------------+----------------------------------------------------------------
         wiek |   .0585921    .013604     4.31   0.000     .0319286    .0852555
              |
c.wiek#c.wiek |  -.0007287   .0001479    -4.93   0.000    -.0010187   -.0004388
              |
       dzieci |
            0 |          0  (base)
            1 |  -.0489784   .0514094    -0.95   0.341    -.1497389    .0517821
            2 |  -.1758044   .0555996    -3.16   0.002    -.2847776   -.0668312
            3 |  -.2241205   .0871638    -2.57   0.010    -.3949584   -.0532826
           4+ |   -.586104   .1625936    -3.60   0.000    -.9047817   -.2674264
              |
        wyksz |
       Podst. |          0  (base)
      Średnie |   .3281631   .0906859     3.62   0.000      .150422    .5059043
       Zawod. |   .6079925   .1064372     5.71   0.000     .3993794    .8166056
       Wyższe |    1.66815   .0946209    17.63   0.000     1.482696    1.853603
              |
          msc |
         Wies |          0  (base)
  Miasto <100 |    .307847   .0680011     4.53   0.000     .1745672    .4411267
  Miasto <500 |    .208038    .099792     2.08   0.037     .0124493    .4036268
  Miasto >500 |  -.0575705   .0454743    -1.27   0.206    -.1466984    .0315574
              |
      cywilny |
            0 |          0  (base)
            1 |   .0558947   .0490111     1.14   0.254    -.0401652    .1519547
              |
          pow |  -.0056968   .0008245    -6.91   0.000    -.0073127   -.0040808
--------------+----------------------------------------------------------------
        /cut1 |   -2.82994   .3166887                     -3.450639   -2.209242
        /cut2 |  -.2598244   .3085078                     -.8644885    .3448397
        /cut3 |   2.159625   .3091364                      1.553729    2.765521
        /cut4 |    4.54477   .3115532                      3.934137    5.155403
-------------------------------------------------------------------------------
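The predicted probabilities in the ologit graph come from the cutpoints and the linear predictor xb: P(wsk = j) = F(cut_{j+1} - xb) - F(cut_j - xb), where F is the logistic CDF, cut_0 = -infinity and cut_5 = +infinity. The Python sketch below applies that formula to the estimates above for a hypothetical person in the base level of every factor (dzieci = 0, wyksz = Podst., msc = Wies, cywilny = 0); `pow_value` is a placeholder to replace with a realistic value of pow from the data, and the function names are illustrative. It also computes where the quadratic age effect turns over, -b_wiek / (2 * b_wiek^2).

```python
import math

# ologit estimates from the output above.
CUTS = [-2.82994, -0.2598244, 2.159625, 4.54477]  # /cut1 .. /cut4
B_WIEK = 0.0585921
B_WIEK_SQ = -0.0007287   # c.wiek#c.wiek
B_POW = -0.0056968


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ologit_probs(xb: float, cuts=CUTS) -> list[float]:
    """P(wsk = 0), ..., P(wsk = 4) given the linear predictor xb."""
    # Cumulative P(wsk <= j) = F(cut_{j+1} - xb), padded with 0 and 1 at the ends.
    cum = [0.0] + [logistic(c - xb) for c in cuts] + [1.0]
    return [cum[j + 1] - cum[j] for j in range(len(cuts) + 1)]


def xb_base_profile(wiek: float, pow_value: float) -> float:
    """xb for a hypothetical person in the base level of every factor variable."""
    return B_WIEK * wiek + B_WIEK_SQ * wiek ** 2 + B_POW * pow_value


# The age term is a downward parabola in wiek; xb is highest at -b1 / (2 * b2).
PEAK_AGE = -B_WIEK / (2 * B_WIEK_SQ)

for age in (20, 30, 40, 50, 60, 70):
    probs = ologit_probs(xb_base_profile(age, pow_value=0.0))  # placeholder pow
    print(age, [round(p, 3) for p in probs])
print(f"xb peaks at wiek = {PEAK_AGE:.1f}")
```

Because the cutpoints are shared, changing xb shifts all four cumulative probabilities together, which is exactly the parallel-lines assumption. PEAK_AGE comes out at about 40.2, so under ologit the probability of a high wsk rises up to roughly age 40 and falls after that, whatever a person's other characteristics.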
Code:
. brant
Brant test of parallel regression assumption
               |       chi2     p>chi2      df
---------------+------------------------------
           All |     270.18      0.000      42
---------------+------------------------------
          wiek |      18.64      0.000       3
 c.wiek#c.wiek |      28.18      0.000       3
      1.dzieci |       3.32      0.345       3
      2.dzieci |      17.53      0.001       3
      3.dzieci |      20.01      0.000       3
      4.dzieci |      17.02      0.001       3
       1.wyksz |      15.15      0.002       3
       2.wyksz |       7.80      0.050       3
       3.wyksz |      23.32      0.000       3
         1.msc |       0.20      0.977       3
         2.msc |       4.86      0.182       3
         3.msc |       5.79      0.122       3
     1.cywilny |      13.81      0.003       3
           pow |      18.02      0.000       3
A significant test statistic provides evidence that the parallel
regression assumption has been violated.
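The gologit2 autofit run that follows reports its search step by step. Judging from that log, the procedure is a backward, one-variable-at-a-time search: test parallel lines for every still-unconstrained variable, constrain the one with the largest p-value if it is above the .05 level, refit, and repeat until every remaining variable rejects the assumption. The Python sketch below mirrors only that loop; it is not gologit2's code. `pvalues_for` is a hypothetical callback standing in for the refit and Wald tests, and in the demo it simply returns the p-values printed in the log, held fixed, whereas gologit2 recomputes them after every refit.

```python
from typing import Callable, Dict, List, Set


def autofit_order(
    variables: List[str],
    pvalues_for: Callable[[Set[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> List[str]:
    """Return the variables in the order they receive parallel-lines constraints.

    pvalues_for(constrained) is a hypothetical stand-in for "refit the model with
    these variables constrained and Wald-test each remaining variable".
    """
    constrained: Set[str] = set()
    order: List[str] = []
    while len(constrained) < len(variables):
        pvals = pvalues_for(constrained)
        if not pvals:
            break
        best = max(pvals, key=pvals.get)  # least significant violation
        if pvals[best] <= alpha:
            break  # every remaining variable rejects parallel lines at alpha
        constrained.add(best)
        order.append(best)
    return order


# P values copied from the autofit log; held fixed here for illustration only.
LOGGED_P = {
    "wiek": 0.00179, "c.wiek#c.wiek": 0.00003, "1.dzieci": 0.2974,
    "2.dzieci": 0.00278, "3.dzieci": 0.00164, "4.dzieci": 0.00209,
    "1.wyksz": 0.00218, "2.wyksz": 0.04976, "3.wyksz": 0.00001,
    "1.msc": 0.8959, "2.msc": 0.0570, "3.msc": 0.1227,
    "1.cywilny": 0.00285, "pow": 0.00068,
}


def static_pvalues(constrained: Set[str]) -> Dict[str, float]:
    """Hypothetical stub: logged p-values of the variables not yet constrained."""
    return {v: p for v, p in LOGGED_P.items() if v not in constrained}


print(autofit_order(list(LOGGED_P), static_pvalues))
```

It prints `['1.msc', '1.dzieci', '3.msc', '2.msc']`, the same four constraints in the same order as Steps 1 to 4 of the log. 2.wyksz (P Value = 0.04976) falls just below .05, so its coefficients stay free to differ across equations.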
Code:
. gologit2 $ylist $xlist, autofit
------------------------------------------------------------------------------
Testing parallel lines assumption using the .05 level of significance...
Step 1: Constraints for parallel lines imposed for 1.msc (P Value = 0.8959)
Step 2: Constraints for parallel lines imposed for 1.dzieci (P Value = 0.2974)
Step 3: Constraints for parallel lines imposed for 3.msc (P Value = 0.1227)
Step 4: Constraints for parallel lines imposed for 2.msc (P Value = 0.0570)
Step 5: Constraints for parallel lines are not imposed for
        wiek (P Value = 0.00179)
        c.wiek#c.wiek (P Value = 0.00003)
        2.dzieci (P Value = 0.00278)
        3.dzieci (P Value = 0.00164)
        4.dzieci (P Value = 0.00209)
        1.wyksz (P Value = 0.00218)
        2.wyksz (P Value = 0.04976)
        3.wyksz (P Value = 0.00001)
        1.cywilny (P Value = 0.00285)
        pow (P Value = 0.00068)
Wald test of parallel lines assumption for the final model:
( 1) [0]1.msc - [1]1.msc = 0
( 2) [0]2.msc - [1]2.msc = 0
( 3) [0]3.msc - [1]3.msc = 0
( 4) [0]1.dzieci - [1]1.dzieci = 0
( 5) [0]1.msc - [2]1.msc = 0
( 6) [0]2.msc - [2]2.msc = 0
( 7) [0]3.msc - [2]3.msc = 0
( 8) [0]1.dzieci - [2]1.dzieci = 0
( 9) [0]1.msc - [3]1.msc = 0
(10) [0]2.msc - [3]2.msc = 0
(11) [0]3.msc - [3]3.msc = 0
(12) [0]1.dzieci - [3]1.dzieci = 0
chi2( 12) = 17.53
Prob > chi2 = 0.1307
An insignificant test statistic indicates that the final model
does not violate the proportional odds/ parallel lines assumption
If you re-estimate this exact same model with gologit2, instead
of autofit you can save time by using the parameter
pl(0b.msc 1.msc 2.msc 3.msc 0b.dzieci 1.dzieci 0b.wyksz 0b.cywilny)
------------------------------------------------------------------------------
Generalized Ordered Logit Estimates             Number of obs =  12,058
                                                LR chi2(44)   = 1661.95
                                                Prob > chi2   =  0.0000
Log likelihood = -13861.125                     Pseudo R2     =  0.0566
( 1) [0]1.msc - [1]1.msc = 0
( 2) [0]2.msc - [1]2.msc = 0
( 3) [0]3.msc - [1]3.msc = 0
( 4) [0]1.dzieci - [1]1.dzieci = 0
( 5) [1]1.msc - [2]1.msc = 0
( 6) [1]2.msc - [2]2.msc = 0
( 7) [1]3.msc - [2]3.msc = 0
( 8) [1]1.dzieci - [2]1.dzieci = 0
( 9) [2]1.msc - [3]1.msc = 0
(10) [2]2.msc - [3]2.msc = 0
(11) [2]3.msc - [3]3.msc = 0
(12) [2]1.dzieci - [3]1.dzieci = 0
-------------------------------------------------------------------------------
          wsk | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
--------------+----------------------------------------------------------------
0             |
         wiek |   .1738873     .04645     3.74   0.000     .0828469    .2649277
              |
c.wiek#c.wiek |  -.0022763   .0004668    -4.88   0.000    -.0031913   -.0013613
              |
       dzieci |
            1 |  -.0259317   .0520372    -0.50   0.618    -.1279229    .0760594
            2 |  -.0630619   .2823788    -0.22   0.823    -.6165142    .4903905
            3 |   -.293491   .4333872    -0.68   0.498    -1.142914    .5559324
           4+ |  -1.670009   .4422319    -3.78   0.000    -2.536767   -.8032501
              |
        wyksz |
      Średnie |   1.068135   .2314692     4.61   0.000      .614464    1.521807
       Zawod. |   1.056128   .3375122     3.13   0.002     .3946162     1.71764
       Wyższe |   2.468986   .3297876     7.49   0.000     1.822615    3.115358
              |
          msc |
  Miasto <100 |   .3133377   .0679789     4.61   0.000     .1801015    .4465739
  Miasto <500 |   .2194838   .0994639     2.21   0.027     .0245383    .4144294
  Miasto >500 |  -.0549805   .0453297    -1.21   0.225    -.1438251    .0338642
              |
      cywilny |
            1 |    .643272   .1860395     3.46   0.001     .2786414    1.007903
              |
          pow |  -.0043335   .0030898    -1.40   0.161    -.0103894    .0017224
        _cons |   .0226806   1.103362     0.02   0.984    -2.139868     2.18523
--------------+----------------------------------------------------------------
1             |
         wiek |   .0893436   .0185294     4.82   0.000     .0530267    .1256604
              |
c.wiek#c.wiek |  -.0011449   .0001954    -5.86   0.000    -.0015279   -.0007619
              |
       dzieci |
            1 |  -.0259317   .0520372    -0.50   0.618    -.1279229    .0760594
            2 |  -.3578178   .0795758    -4.50   0.000    -.5137835   -.2018521
            3 |  -.5521508   .1222399    -4.52   0.000    -.7917367   -.3125649
           4+ |  -1.004846   .1966145    -5.11   0.000    -1.390204    -.619489
              |
        wyksz |
      Średnie |   .3820885   .1117411     3.42   0.001       .16308     .601097
       Zawod. |   .6302509   .1415816     4.45   0.000     .3527561    .9077457
       Wyższe |   1.316392   .1234092    10.67   0.000     1.074514    1.558269
              |
          msc |
  Miasto <100 |   .3133377   .0679789     4.61   0.000     .1801015    .4465739
  Miasto <500 |   .2194838   .0994639     2.21   0.027     .0245383    .4144294
  Miasto >500 |  -.0549805   .0453297    -1.21   0.225    -.1438251    .0338642
              |
      cywilny |
            1 |   .1059954    .070322     1.51   0.132    -.0318332     .243824
              |
          pow |  -.0082419   .0010485    -7.86   0.000    -.0102968   -.0061869
        _cons |  -.0808419   .4303625    -0.19   0.851    -.9243369    .7626531
--------------+----------------------------------------------------------------
2             |
         wiek |    .026262   .0152546     1.72   0.085    -.0036365    .0561605
              |
c.wiek#c.wiek |  -.0003398   .0001653    -2.06   0.040    -.0006638   -.0000159
              |
       dzieci |
            1 |  -.0259317   .0520372    -0.50   0.618    -.1279229    .0760594
            2 |  -.0949569   .0608385    -1.56   0.119    -.2141982    .0242844
            3 |  -.0900274   .0966691    -0.93   0.352    -.2794954    .0994406
           4+ |  -.2983819   .1844015    -1.62   0.106    -.6598022    .0630385
              |
        wyksz |
      Średnie |   .2322069   .1124417     2.07   0.039     .0118251    .4525886
       Zawod. |    .530006   .1281385     4.14   0.000     .2788592    .7811528
       Wyższe |   1.561165   .1144265    13.64   0.000     1.336894    1.785437
              |
          msc |
  Miasto <100 |   .3133377   .0679789     4.61   0.000     .1801015    .4465739
  Miasto <500 |   .2194838   .0994639     2.21   0.027     .0245383    .4144294
  Miasto >500 |  -.0549805   .0453297    -1.21   0.225    -.1438251    .0338642
              |
      cywilny |
            1 |   .0413257    .055093     0.75   0.453    -.0666546     .149306
              |
          pow |  -.0040054   .0009171    -4.37   0.000    -.0058029    -.002208
        _cons |  -1.540582   .3511412    -4.39   0.000    -2.228806   -.8523577
--------------+----------------------------------------------------------------
3             |
         wiek |   .0358637   .0310803     1.15   0.249    -.0250526      .09678
              |
c.wiek#c.wiek |  -.0003326   .0003356    -0.99   0.322    -.0009903    .0003251
              |
       dzieci |
            1 |  -.0259317   .0520372    -0.50   0.618    -.1279229    .0760594
            2 |   .0053097   .1051348     0.05   0.960    -.2007508    .2113702
            3 |   .0402445   .1737061     0.23   0.817    -.3002133    .3807023
           4+ |  -.1932874   .3986984    -0.48   0.628    -.9747219    .5881471
              |
        wyksz |
      Średnie |   1.203682    .585968     2.05   0.040     .0552054    2.352158
       Zawod. |   1.943424    .601285     3.23   0.001     .7649276    3.121921
       Wyższe |   3.354312   .5816843     5.77   0.000     2.214232    4.494393
              |
          msc |
  Miasto <100 |   .3133377   .0679789     4.61   0.000     .1801015    .4465739
  Miasto <500 |   .2194838   .0994639     2.21   0.027     .0245383    .4144294
  Miasto >500 |  -.0549805   .0453297    -1.21   0.225    -.1438251    .0338642
              |
      cywilny |
            1 |  -.1435708   .1047545    -1.37   0.171    -.3488859    .0617443
              |
          pow |  -.0053998   .0019135    -2.82   0.005    -.0091501   -.0016494
        _cons |   -5.73104   .8956403    -6.40   0.000    -7.486463   -3.975618
-------------------------------------------------------------------------------
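In gologit2 each cut has its own equation for P(wsk > j), so the predicted probabilities are differences of four separately fitted logistic curves: P(wsk = 0) = 1 - F(xb_0), P(wsk = j) = F(xb_{j-1}) - F(xb_j) for j = 1, 2, 3, and P(wsk = 4) = F(xb_3). Because the wiek coefficients differ across equations, nothing forces these curves to stay ordered, and where they cross, a category probability turns negative. The Python sketch below computes the probabilities from the estimates above for a hypothetical person in the base level of every factor (dzieci = 0, wyksz = Podst., msc = Wies, cywilny = 0); `pow_value` is a placeholder to replace with a realistic value of pow, and the function names are illustrative.

```python
import math

# gologit2 estimates from the output above, one equation per cut.
# Equation j models P(wsk > j); tuple = (_cons, wiek, c.wiek#c.wiek, pow).
EQUATIONS = [
    (0.0226806, 0.1738873, -0.0022763, -0.0043335),   # equation 0
    (-0.0808419, 0.0893436, -0.0011449, -0.0082419),  # equation 1
    (-1.540582, 0.026262, -0.0003398, -0.0040054),    # equation 2
    (-5.73104, 0.0358637, -0.0003326, -0.0053998),    # equation 3
]


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gologit_probs(wiek: float, pow_value: float) -> list[float]:
    """P(wsk = 0..4) for a hypothetical person in the base level of every factor."""
    # above[k] = P(wsk > k - 1): 1 before the first cut, 0 after the last one.
    above = [1.0]
    for cons, b_age, b_age_sq, b_pow in EQUATIONS:
        xb = cons + b_age * wiek + b_age_sq * wiek ** 2 + b_pow * pow_value
        above.append(logistic(xb))
    above.append(0.0)
    return [above[k] - above[k + 1] for k in range(len(above) - 1)]


def has_negative(probs: list[float]) -> bool:
    """True when the cumulative curves cross, i.e. some category gets P < 0."""
    return any(p < 0 for p in probs)


for age in (20, 40, 60, 80):
    probs = gologit_probs(age, pow_value=0.0)  # pow_value=0.0 is only a placeholder
    flag = "  <- negative probability" if has_negative(probs) else ""
    print(age, [round(p, 3) for p in probs], flag)
```

With the placeholder pow_value = 0.0, the sketch returns a negative P(wsk = 1) at wiek = 80, but that profile may lie outside the sample; the check is only meaningful for combinations of wiek, pow and the factors that actually occur in the data.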