Hello,
I'm writing a paper for my Bachelor's degree and I have to build a model from my data. My dependent variable is "wsk" and it takes the values {0,1,2,3,4}. I first decided to use ologit and got these results:
Then I ran the Brant test of the proportional odds assumption:
The assumption was violated, so I looked into gologit2. I ran it with the autofit option and got a new result:
But now, when I look at the p-values, many more coefficients are insignificant in the gologit2 model. So which model should I use? I also graphed the predicted probabilities, because they are much easier for me to interpret, and the results differ between the two models. Here are the graphs for the variable wiek (age), showing the predicted probabilities for each level of wsk:
GOLOGIT:
OLOGIT:
And my actual data look like this (the percentage of observations at each level):
I'm not sure which model is closer to reality. Which one do you think I should use? Is the violation of the parallel lines assumption actually bad and if not, how do I justify the model?
Code:
. ologit $ylist $xlist

Iteration 0:  Log likelihood = -14692.098
Iteration 1:  Log likelihood = -13999.216
Iteration 2:  Log likelihood = -13991.354
Iteration 3:  Log likelihood = -13991.343
Iteration 4:  Log likelihood = -13991.343

Ordered logistic regression                     Number of obs = 12,058
                                                LR chi2(14)   = 1401.51
                                                Prob > chi2   = 0.0000
Log likelihood = -13991.343                     Pseudo R2     = 0.0477

--------------------------------------------------------------------------------
           wsk | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
---------------+----------------------------------------------------------------
          wiek |   .0585921    .013604     4.31   0.000     .0319286    .0852555
               |
 c.wiek#c.wiek |  -.0007287   .0001479    -4.93   0.000    -.0010187   -.0004388
               |
        dzieci |
             0 |          0  (base)
             1 |  -.0489784   .0514094    -0.95   0.341    -.1497389    .0517821
             2 |  -.1758044   .0555996    -3.16   0.002    -.2847776   -.0668312
             3 |  -.2241205   .0871638    -2.57   0.010    -.3949584   -.0532826
            4+ |   -.586104   .1625936    -3.60   0.000    -.9047817   -.2674264
               |
         wyksz |
        Podst. |          0  (base)
       Średnie |   .3281631   .0906859     3.62   0.000      .150422    .5059043
        Zawod. |   .6079925   .1064372     5.71   0.000     .3993794    .8166056
        Wyższe |    1.66815   .0946209    17.63   0.000     1.482696    1.853603
               |
           msc |
          Wies |          0  (base)
   Miasto <100 |    .307847   .0680011     4.53   0.000     .1745672    .4411267
   Miasto <500 |    .208038    .099792     2.08   0.037     .0124493    .4036268
   Miasto >500 |  -.0575705   .0454743    -1.27   0.206    -.1466984    .0315574
               |
       cywilny |
             0 |          0  (base)
             1 |   .0558947   .0490111     1.14   0.254    -.0401652    .1519547
               |
           pow |  -.0056968   .0008245    -6.91   0.000    -.0073127   -.0040808
---------------+----------------------------------------------------------------
         /cut1 |   -2.82994   .3166887                     -3.450639   -2.209242
         /cut2 |  -.2598244   .3085078                     -.8644885    .3448397
         /cut3 |   2.159625   .3091364                      1.553729    2.765521
         /cut4 |    4.54477   .3115532                      3.934137    5.155403
--------------------------------------------------------------------------------
Code:
. brant

Brant test of parallel regression assumption

               |       chi2     p>chi2      df
---------------+------------------------------
           All |     270.18      0.000      42
---------------+------------------------------
          wiek |      18.64      0.000       3
 c.wiek#c.wiek |      28.18      0.000       3
      1.dzieci |       3.32      0.345       3
      2.dzieci |      17.53      0.001       3
      3.dzieci |      20.01      0.000       3
      4.dzieci |      17.02      0.001       3
       1.wyksz |      15.15      0.002       3
       2.wyksz |       7.80      0.050       3
       3.wyksz |      23.32      0.000       3
         1.msc |       0.20      0.977       3
         2.msc |       4.86      0.182       3
         3.msc |       5.79      0.122       3
     1.cywilny |      13.81      0.003       3
           pow |      18.02      0.000       3

A significant test statistic provides evidence that the parallel
regression assumption has been violated.
Code:
. gologit2 $ylist $xlist, autofit

------------------------------------------------------------------------------
Testing parallel lines assumption using the .05 level of significance...

Step  1:  Constraints for parallel lines imposed for 1.msc (P Value = 0.8959)
Step  2:  Constraints for parallel lines imposed for 1.dzieci (P Value = 0.2974)
Step  3:  Constraints for parallel lines imposed for 3.msc (P Value = 0.1227)
Step  4:  Constraints for parallel lines imposed for 2.msc (P Value = 0.0570)
Step  5:  Constraints for parallel lines are not imposed for
          wiek (P Value = 0.00179)
          c.wiek#c.wiek (P Value = 0.00003)
          2.dzieci (P Value = 0.00278)
          3.dzieci (P Value = 0.00164)
          4.dzieci (P Value = 0.00209)
          1.wyksz (P Value = 0.00218)
          2.wyksz (P Value = 0.04976)
          3.wyksz (P Value = 0.00001)
          1.cywilny (P Value = 0.00285)
          pow (P Value = 0.00068)

Wald test of parallel lines assumption for the final model:

 ( 1)  [0]1.msc - [1]1.msc = 0
 ( 2)  [0]2.msc - [1]2.msc = 0
 ( 3)  [0]3.msc - [1]3.msc = 0
 ( 4)  [0]1.dzieci - [1]1.dzieci = 0
 ( 5)  [0]1.msc - [2]1.msc = 0
 ( 6)  [0]2.msc - [2]2.msc = 0
 ( 7)  [0]3.msc - [2]3.msc = 0
 ( 8)  [0]1.dzieci - [2]1.dzieci = 0
 ( 9)  [0]1.msc - [3]1.msc = 0
 (10)  [0]2.msc - [3]2.msc = 0
 (11)  [0]3.msc - [3]3.msc = 0
 (12)  [0]1.dzieci - [3]1.dzieci = 0

          chi2( 12) =   17.53
        Prob > chi2 =    0.1307

An insignificant test statistic indicates that the final model
does not violate the proportional odds/ parallel lines assumption

If you re-estimate this exact same model with gologit2, instead
of autofit you can save time by using the parameter

pl(0b.msc 1.msc 2.msc 3.msc 0b.dzieci 1.dzieci 0b.wyksz 0b.cywilny)

------------------------------------------------------------------------------

Generalized Ordered Logit Estimates             Number of obs = 12,058
                                                LR chi2(44)   = 1661.95
                                                Prob > chi2   = 0.0000
Log likelihood = -13861.125                     Pseudo R2     = 0.0566

 ( 1)  [0]1.msc - [1]1.msc = 0
 ( 2)  [0]2.msc - [1]2.msc = 0
 ( 3)  [0]3.msc - [1]3.msc = 0
 ( 4)  [0]1.dzieci - [1]1.dzieci = 0
 ( 5)  [1]1.msc - [2]1.msc = 0
 ( 6)  [1]2.msc - [2]2.msc = 0
 ( 7)  [1]3.msc - [2]3.msc = 0
 ( 8)  [1]1.dzieci - [2]1.dzieci = 0
 ( 9)  [2]1.msc - [3]1.msc = 0
 (10)  [2]2.msc - [3]2.msc = 0
 (11)  [2]3.msc - [3]3.msc = 0
 (12)  [2]1.dzieci - [3]1.dzieci = 0
--------------------------------------------------------------------------------
           wsk | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
---------------+----------------------------------------------------------------
0              |
          wiek |   .1738873     .04645     3.74   0.000     .0828469    .2649277
               |
 c.wiek#c.wiek |  -.0022763   .0004668    -4.88   0.000    -.0031913   -.0013613
               |
        dzieci |
             1 |  -.0259317   .0520372    -0.50   0.618    -.1279229    .0760594
             2 |  -.0630619   .2823788    -0.22   0.823    -.6165142    .4903905
             3 |   -.293491   .4333872    -0.68   0.498    -1.142914    .5559324
            4+ |  -1.670009   .4422319    -3.78   0.000    -2.536767   -.8032501
               |
         wyksz |
       Średnie |   1.068135   .2314692     4.61   0.000      .614464    1.521807
        Zawod. |   1.056128   .3375122     3.13   0.002     .3946162     1.71764
        Wyższe |   2.468986   .3297876     7.49   0.000     1.822615    3.115358
               |
           msc |
   Miasto <100 |   .3133377   .0679789     4.61   0.000     .1801015    .4465739
   Miasto <500 |   .2194838   .0994639     2.21   0.027     .0245383    .4144294
   Miasto >500 |  -.0549805   .0453297    -1.21   0.225    -.1438251    .0338642
               |
       cywilny |
             1 |    .643272   .1860395     3.46   0.001     .2786414    1.007903
               |
           pow |  -.0043335   .0030898    -1.40   0.161    -.0103894    .0017224
         _cons |   .0226806   1.103362     0.02   0.984    -2.139868     2.18523
---------------+----------------------------------------------------------------
1              |
          wiek |   .0893436   .0185294     4.82   0.000     .0530267    .1256604
               |
 c.wiek#c.wiek |  -.0011449   .0001954    -5.86   0.000    -.0015279   -.0007619
               |
        dzieci |
             1 |  -.0259317   .0520372    -0.50   0.618    -.1279229    .0760594
             2 |  -.3578178   .0795758    -4.50   0.000    -.5137835   -.2018521
             3 |  -.5521508   .1222399    -4.52   0.000    -.7917367   -.3125649
            4+ |  -1.004846   .1966145    -5.11   0.000    -1.390204    -.619489
               |
         wyksz |
       Średnie |   .3820885   .1117411     3.42   0.001       .16308     .601097
        Zawod. |   .6302509   .1415816     4.45   0.000     .3527561    .9077457
        Wyższe |   1.316392   .1234092    10.67   0.000     1.074514    1.558269
               |
           msc |
   Miasto <100 |   .3133377   .0679789     4.61   0.000     .1801015    .4465739
   Miasto <500 |   .2194838   .0994639     2.21   0.027     .0245383    .4144294
   Miasto >500 |  -.0549805   .0453297    -1.21   0.225    -.1438251    .0338642
               |
       cywilny |
             1 |   .1059954    .070322     1.51   0.132    -.0318332     .243824
               |
           pow |  -.0082419   .0010485    -7.86   0.000    -.0102968   -.0061869
         _cons |  -.0808419   .4303625    -0.19   0.851    -.9243369    .7626531
---------------+----------------------------------------------------------------
2              |
          wiek |    .026262   .0152546     1.72   0.085    -.0036365    .0561605
               |
 c.wiek#c.wiek |  -.0003398   .0001653    -2.06   0.040    -.0006638   -.0000159
               |
        dzieci |
             1 |  -.0259317   .0520372    -0.50   0.618    -.1279229    .0760594
             2 |  -.0949569   .0608385    -1.56   0.119    -.2141982    .0242844
             3 |  -.0900274   .0966691    -0.93   0.352    -.2794954    .0994406
            4+ |  -.2983819   .1844015    -1.62   0.106    -.6598022    .0630385
               |
         wyksz |
       Średnie |   .2322069   .1124417     2.07   0.039     .0118251    .4525886
        Zawod. |    .530006   .1281385     4.14   0.000     .2788592    .7811528
        Wyższe |   1.561165   .1144265    13.64   0.000     1.336894    1.785437
               |
           msc |
   Miasto <100 |   .3133377   .0679789     4.61   0.000     .1801015    .4465739
   Miasto <500 |   .2194838   .0994639     2.21   0.027     .0245383    .4144294
   Miasto >500 |  -.0549805   .0453297    -1.21   0.225    -.1438251    .0338642
               |
       cywilny |
             1 |   .0413257    .055093     0.75   0.453    -.0666546     .149306
               |
           pow |  -.0040054   .0009171    -4.37   0.000    -.0058029    -.002208
         _cons |  -1.540582   .3511412    -4.39   0.000    -2.228806   -.8523577
---------------+----------------------------------------------------------------
3              |
          wiek |   .0358637   .0310803     1.15   0.249    -.0250526      .09678
               |
 c.wiek#c.wiek |  -.0003326   .0003356    -0.99   0.322    -.0009903    .0003251
               |
        dzieci |
             1 |  -.0259317   .0520372    -0.50   0.618    -.1279229    .0760594
             2 |   .0053097   .1051348     0.05   0.960    -.2007508    .2113702
             3 |   .0402445   .1737061     0.23   0.817    -.3002133    .3807023
            4+ |  -.1932874   .3986984    -0.48   0.628    -.9747219    .5881471
               |
         wyksz |
       Średnie |   1.203682    .585968     2.05   0.040     .0552054    2.352158
        Zawod. |   1.943424    .601285     3.23   0.001     .7649276    3.121921
        Wyższe |   3.354312   .5816843     5.77   0.000     2.214232    4.494393
               |
           msc |
   Miasto <100 |   .3133377   .0679789     4.61   0.000     .1801015    .4465739
   Miasto <500 |   .2194838   .0994639     2.21   0.027     .0245383    .4144294
   Miasto >500 |  -.0549805   .0453297    -1.21   0.225    -.1438251    .0338642
               |
       cywilny |
             1 |  -.1435708   .1047545    -1.37   0.171    -.3488859    .0617443
               |
           pow |  -.0053998   .0019135    -2.82   0.005    -.0091501   -.0016494
         _cons |   -5.73104   .8956403    -6.40   0.000    -7.486463   -3.975618
--------------------------------------------------------------------------------
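For a rough comparison of fit, the log likelihoods and model degrees of freedom printed in the ologit and gologit2 outputs above can be plugged into a likelihood-ratio statistic and AIC/BIC. What follows is only a minimal sketch in plain Python (not Stata), doing arithmetic on the reported numbers. It assumes ologit can be treated as the fully constrained special case of the gologit2 model, and the parameter counts (14 slopes + 4 cutpoints for ologit, 44 slopes + 4 constants for gologit2) are my reading of the output, so treat them as assumptions. The p-value lookup is left out to keep the snippet dependency-free.

```python
import math

# Values copied from the Stata output above.
n_obs = 12058
ll_ologit = -13991.343    # ologit log likelihood
ll_gologit = -13861.125   # gologit2 (autofit) log likelihood

# Assumed parameter counts: slopes from the "LR chi2(df)" line
# plus 4 cutpoints (ologit) or 4 constants (gologit2).
k_ologit = 14 + 4
k_gologit = 44 + 4


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2*logL (lower is better)."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion: k*ln(n) - 2*logL (lower is better)."""
    return k * math.log(n) - 2 * log_lik


# Likelihood-ratio statistic for the constrained (ologit) model
# against the less constrained (gologit2) model.
lr_stat = 2 * (ll_gologit - ll_ologit)
lr_df = k_gologit - k_ologit

print(f"LR statistic = {lr_stat:.3f} on {lr_df} df")
print(f"AIC ologit   = {aic(ll_ologit, k_ologit):.2f}")
print(f"AIC gologit2 = {aic(ll_gologit, k_gologit):.2f}")
print(f"BIC ologit   = {bic(ll_ologit, k_ologit, n_obs):.2f}")
print(f"BIC gologit2 = {bic(ll_gologit, k_gologit, n_obs):.2f}")
```

The LR statistic would be compared with a chi-squared distribution on `lr_df` degrees of freedom. AIC and BIC need not agree here, since BIC penalizes the extra gologit2 parameters more heavily at this sample size.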