Ordinal regression using complementary log log link function

Jean Torgigial

Join Date: Feb 2016

Posts: 23
#16

14 Apr 2018, 12:35

Originally posted by Weiwen Ng:

That helps. Would you run the Stata code I typed and let us see what the results are?

Code:

oglm B5_new age ib10.B7_new i.B2B_2 i.B2B_5 ib3.strong_internetB, link(loglog)

That ought to make the results more directly comparable between the Stata and SPSS outputs. I think that in both programs, the coefficients for the independent variables all correspond to the odds of a higher response on the dependent variable - can anyone confirm or refute?

Do note that as per the link Marcos shared with us in post #3, Stata parameterizes the cut points/threshold parameters differently than SPSS does - Stata's /cut1 corresponds to the odds of responding 0 or lower on the dependent variable, whereas SPSS's correspond to the probability of responding at 0 or higher.

If the SE for one of the cutpoints in Stata is missing, then I'm honestly not sure what to do.

Dear Weiwen, the output is now the following:

Thanks again,

Kind Regards,

Jean
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#17

14 Apr 2018, 12:59

[Attachment n1439426 from post #16: Stata output image, not reproduced here]

The "convergence not achieved" message is what I was worried about. Was there any similar message in SPSS?

I'm not familiar with ordinal models. In general, some models have likelihoods where it can be challenging to achieve convergence - e.g. I know that latent class models and models with multiple random effects are difficult. It could be that ordinal response models with many categories are a problem. It could be that there are few responses in the lower categories of B7_new and/or B5_new. I am not sure, and perhaps Richard can provide guidance. The magnitudes of the coefficients for the lower levels of both these variables are very different from SPSS, that's for sure.

Can you describe the iteration log for both models? Did you get a lot of "not concave" messages? If so, adding the -difficult- option might help, but I'm not certain.
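One way to bring up the iteration log is sketched below. It assumes that oglm accepts the log option and Stata's standard maximization options such as iterate(), and that it stores e(converged) as other ml-based estimators do; check help oglm for your installed version. The cap of 50 iterations is an arbitrary illustrative value, and the variables from this thread are assumed to be in memory.

```stata
* Assumes oglm is installed (ssc install oglm) and the thread's data are loaded.
* log asks for the iteration log; iterate(50) is an arbitrary illustrative cap
* so that a non-converging run stops early.
oglm B5_new age ib10.B7_new i.B2B_2 i.B2B_5 ib3.strong_internetB, link(loglog) log iterate(50)

* Assumption: oglm leaves e(converged) behind (1 = converged, 0 = not converged).
display "Converged: " e(converged)
```

A long run of "not concave" lines in that log would be the signal for trying the difficult option.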

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Jean Torgigial

Join Date: Feb 2016

Posts: 23
#18

14 Apr 2018, 13:14


Dear Weiwen,

I have only this message in SPSS:

for the iteration part, this is the SPSS set up I used:

But I don't see any information about an iteration log in either the Stata output or the SPSS output.

Moreover, I don't get any message of the type "not concave".

I am now running the same code, with the difficult option, i.e.:

Code:

oglm B5_new age ib10.B7_new i.B2B_2 i.B2B_5 ib3.strong_internetB, link(loglog) difficult

It is running, but I have many messages saying "flat or discontinuous region encountered
numerical derivatives are approximate"

Thanks again for your answers,

Kind Regards,

Jean
Richard Williams

Join Date: Apr 2014

Posts: 5008
#19

14 Apr 2018, 16:01

Try Weiwen's code. If you are still having trouble, are you free to post your data? You only have 196 cases. You could use dataex with the count option to include all 196 cases. Or, if you don't want to share it with the world, you could email it to me. There is always a chance there is a problem with oglm, but I hope not!

I agree that the missing SEs seem odd. But it would be easier to diagnose with the data.

Anyway, try Weiwen's code and hope that that solves all problems.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#20

14 Apr 2018, 16:07

[Attachments from post #18, images not reproduced here: n1439432 (the SPSS message) and n1439433 (the SPSS iteration settings)]

Are you familiar with the issue of perfect prediction in logistic regression? That is, for some combinations of your independent variables, all the respondents score 0 or score 1. SPSS gave you an error message indicating that you are facing a similar issue here. I'm a bit surprised that Stata didn't issue the same message, but you should have it as well. In retrospect, you have 196 observations, and you have 2 binary variables, 1 categorical variable with 3 values, and one categorical variable with 8 values. This is not actually surprising.

I'm less certain what to do about it. If your rarely-endorsed categories of the outcome variable and maybe of B7 are at the ends of the scale, it may be worth collapsing them. It may be worth running this as an OLS model. Neither is a great solution, but with your current model, I do not believe the results are valid. I hope Richard chimes in, though.
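A quick way to look for the sparse categories and near-perfect prediction described above is to tabulate the outcome on its own and against each categorical predictor. This is only a sketch using the variable names from this thread; it assumes the data are already in memory.

```stata
* One-way frequencies: look for categories with only a handful of cases
tabulate B5_new
tabulate B7_new

* Two-way tables of the outcome against each categorical predictor:
* a predictor level whose cases all fall in one outcome category is a warning sign
tabulate B5_new B7_new
tabulate B5_new strong_internetB
tabulate B5_new B2B_2
tabulate B5_new B2B_5
```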

Richard Williams

Join Date: Apr 2014

Posts: 5008
#21

14 Apr 2018, 16:09

OK, both Stata and SPSS are saying the results cannot be trusted.

Now I go into generic advice mode. Simplify the models, adding variables one or two at a time. Check the frequencies for your dependent variable. If some categories have very few cases, you may need to combine some adjacent categories.
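As a sketch of that generic advice, using the variable names from this thread: the order in which predictors are added here is arbitrary and only illustrative, and oglm is assumed to be installed (ssc install oglm).

```stata
* Check how many cases fall in each outcome category
tabulate B5_new

* Build the model up a little at a time and note where trouble starts
* (missing standard errors, convergence warnings)
oglm B5_new age, link(loglog)
oglm B5_new age i.B2B_2 i.B2B_5, link(loglog)
oglm B5_new age i.B2B_2 i.B2B_5 ib3.strong_internetB, link(loglog)
oglm B5_new age ib10.B7_new i.B2B_2 i.B2B_5 ib3.strong_internetB, link(loglog)
```

If the problems only appear once a particular variable enters, that variable's sparse categories are the first candidates for combining.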


Jean Torgigial

Join Date: Feb 2016
Posts: 23

#22

15 Apr 2018, 00:56


Dear Richard,

Thanks for your time. I am posting my data here, and at the same time I will try what you suggested in your last message, i.e. combining some adjacent categories, and see what happens.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(age B2B_2 B2B_5 strong_internetB B5_new B7_new)
20 0 0 2 10 10
41 0 0 1  8  5
28 1 1 3  7  7
35 1 1 1  8  8
24 1 0 1  7  6
45 1 1 3  8  8
34 1 1 3  9 10
40 1 0 2  6  6
43 1 0 1  8  7
43 1 1 3  9  8
32 1 1 3  8  8
32 0 1 3 10 10
31 1 1 3  8  8
30 0 1 1  9  9
25 1 0 1  9  9
25 1 0 1  3  0
41 1 1 2  8  8
20 1 1 2  8  8
24 1 1 2  8  6
36 1 1 3  8  8
32 1 1 3  9  9
34 1 1 1  9  9
25 1 1 1 10 10
25 1 1 1 10 10
29 1 1 3 10 10
29 0 0 2  7  7
36 0 0 3 10 10
23 0 1 3  9  9
32 1 1 3 10 10
20 1 1 1  8  8
28 1 1 3 10 10
19 0 0 2  9  9
26 1 1 3 10  7
29 1 1 1  9  6
21 1 0 1  8  6
26 1 1 1 10 10
42 1 0 3  9  9
23 1 1 1  8  8
37 1 0 1  8  9
28 1 1 3  7  7
22 1 1 1  9  9
23 1 1 1  6  7
40 1 1 1  8  9
23 1 1 1  8  7
33 1 0 3  8  8
38 1 1 2  7  8
19 0 1 1  8  8
20 1 1 1  9  9
36 1 1 2  8  7
29 0 1 2  8  8
31 1 1 3 10  9
44 1 0 2 10  7
24 1 1 2 10 10
38 0 1 1  7  8
30 0 0 3  8  8
30 1 1 3 10 10
21 1 1 1 10 10
21 1 1 1  8  8
32 1 1 1 10 10
31 0 0 1 10 10
35 1 1 1 10 10
27 0 0 2  9  9
44 1 1 3 10 10
38 0 0 2  7  6
28 0 1 2  8  5
25 1 1 2  9 10
43 1 1 3  8  8
28 1 1 3  9  8
42 1 0 2  0  0
42 1 1 3  9  8
29 1 1 1  7  7
43 1 0 1  6  5
22 1 1 2  3  0
21 1 1 2 10 10
27 0 1 3 10  9
25 1 1 3  9  8
45 1 1 2  8  8
33 0 0 2  5  4
29 0 0 1  9  6
28 1 1 3  9  8
38 1 0 1  8  7
32 0 1 2  7  7
35 1 1 1  9  9
44 1 1 3  8 10
20 1 1 3  9  7
23 1 1 1 10 10
21 1 1 1 10 10
27 1 1 1  7  7
33 1 1 2  5  5
24 1 1 2  9  9
21 0 0 3 10 10
21 0 0 1  8  7
39 1 1 1 10 10
30 1 1 1  8  9
38 1 1 3  7  7
39 0 1 3  8  8
38 1 0 3 10  8
32 1 1 3 10 10
22 0 0 1  9  9
38 0 0 1  6  6
25 1 1 3  7  6
21 1 1 2  8  8
43 0 0 1  6  8
42 1 1 3 10 10
44 1 0 3  8  8
37 1 1 2 10 10
23 1 1 1  8  9
34 1 0 1  5  6
42 1 1 3  9  9
29 1 1 3 10 10
40 1 0 2  8  7
34 1 1 3 10 10
21 1 1 3  8  8
32 0 0 1  9  9
36 1 1 2  7  8
27 0 0 1  3  3
31 0 1 1 10 10
21 0 1 1  5  5
43 0 0 1  7  7
23 0 0 1 10 10
27 1 1 1 10  9
39 1 1 1  5  8
31 1 1 3  9  9
40 1 1 3  7  5
27 1 1 3  9  9
19 1 1 3 10 10
42 1 1 3 10 10
44 1 1 3  8  8
31 1 1 1 10 10
21 1 1 1 10  8
34 1 1 3 10 10
34 1 1 3  9  9
27 1 0 3  8  8
40 1 0 2  9  7
32 1 1 3  9  9
37 1 1 3 10 10
19 0 1 3  7  8
40 1 1 3 10 10
35 1 1 3  9  9
27 1 1 3 10  9
26 1 1 1  8  8
20 1 1 1  6  7
31 1 0 1  5  6
31 0 0 2  7  7
22 0 0 3  9  8
27 1 0 1 10 10
41 1 1 3 10 10
42 1 1 1  9  9
26 1 1 1 10 10
33 1 1 1  8  8
28 0 0 1  8  8
20 0 1 3  6  5
42 1 1 3  7  6
29 1 1 1  5  5
26 1 1 1  9 10
43 0 0 2 10 10
37 1 1 1  9  9
25 1 1 3  9  8
22 1 1 3  7  7
19 1 1 3 10 10
19 0 1 1  8  9
24 1 1 1 10 10
19 1 1 1  8 10
23 1 1 3  8  8
21 1 1 3 10  5
22 0 1 2  8  7
22 1 1 3 10 10
23 1 1 1 10 10
21 1 1 2 10 10
23 1 1 1 10 10
40 1 1 3  9  9
44 1 1 3 10 10
18 1 1 3  9  8
24 0 1 1 10 10
41 1 1 3 10 10
31 1 1 3 10 10
38 0 1 1 10 10
30 1 1 1  6  6
33 1 1 3  8  8
32 0 0 2  7  7
20 1 1 3 10  9
33 1 1 1  9  9
20 1 1 2  8  8
22 1 1 1 10  9
35 1 0 1  8  8
18 1 1 3  9 10
28 0 0 1 10 10
26 1 1 3 10 10
20 1 1 1  8  8
18 1 1 1  9  8
21 1 1 3 10  9
30 1 1 3 10 10
38 1 0 1 10 10
45 1 1 1  8  8
29 1 1 3 10 10
21 1 1 3 10 10
end

Kind Regards,

Jean


Jean Torgigial

Join Date: Feb 2016

Posts: 23
#23

15 Apr 2018, 00:59


Dear Weiwen

Thanks again for your time;

I will try what Richard said, i.e. combining some adjacent categories. Moreover, I posted a message to answer Richard, with my data in dataex format, but I got a message saying "Message has triggered the spam filter and must be manually approved. To expedite the approval process, click the 'CONTACT US' link at the lower right and reference this thread."

So, I guess that in some time, this message will be approved and my data will be published.

Kind Regards,

Jean
Jean Torgigial

Join Date: Feb 2016

Posts: 23
#24

15 Apr 2018, 03:30

Dear Richard, Dear Weiwen,

I followed your advice and merged some categories.

Initially, I had:

B5_new:

Category   Count (observations)
0           1
3           3
5           7
6           8
7          20
8          49
9          41
10         67

B7_new:

Category   Count (observations)
0           3
3           1
4           1
5           9
6          13
7          24
8          48
9          36
10         61

So I decided to merge, for both B5_new and B7_new, all the categories from 5 and below.
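A merge like this can be sketched with recode. The thread does not show the actual recoding commands, so coding the merged category as 5 is an assumption made here for illustration; the new variable names follow the ones used in this post.

```stata
* Collapse every category at 5 and below into a single category
* (coded 5 here, which is an assumption; any code below 6 keeps the ordering)
recode B5_new (min/5 = 5), generate(B5_recoded)
recode B7_new (min/5 = 5), generate(B7_recoded)

* Confirm the new distributions
tabulate B5_recoded
tabulate B7_recoded
```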

I recoded them into B5_recoded and B7_recoded, so I now have:

I ran this model in Stata and SPSS:

Code:

oglm B5_recoded age ib10.B7_recoded i.B2B_2 i.B2B_5 ib3.strong_internetB, link(loglog)

and the outputs are the following:

Stata:

SPSS:

We can see that the parameter estimates are the same, as is the R-squared. But why are the p-values different? Also, I have no error in Stata, nothing about convergence etc., but in SPSS I still have this message:

Does either of you know why? I mean, why is there nothing like this in Stata when there is such a message in SPSS? I think I am very close to getting the same results, but there is still something that I cannot understand.

Thanks for your help and for your time,

Kind Regards,

Jean

Richard Williams

Join Date: Apr 2014

Posts: 5008
#25

15 Apr 2018, 06:59

I wouldn't worry about the SPSS warning. It is saying that, if you did some sort of mega crosstab of all your variables, 815 cells would have zero frequencies. You only have 196 cases, so sure, not all possible combos of values can have a non-zero count.
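For anyone who wants to reproduce such a count in Stata, here is a rough sketch. It assumes that SPSS defines a cell as one observed combination of predictor values crossed with one level of the dependent variable; that reading of the warning is an assumption, not something confirmed in this thread, so the result may not match the SPSS figure exactly. The helper variables pattern and cell are hypothetical names introduced only for this example.

```stata
* Observed combinations of predictor values (covariate patterns)
egen pattern = group(age B7_recoded B2B_2 B2B_5 strong_internetB)
quietly summarize pattern
local npatterns = r(max)

* Number of distinct outcome levels
quietly levelsof B5_recoded, local(levels)
local nlevels : word count `levels'

* Pattern-by-outcome cells that actually contain at least one case
egen cell = group(pattern B5_recoded)
quietly summarize cell
local nonempty = r(max)

display "Total cells: " `npatterns' * `nlevels'
display "Empty cells: " `npatterns' * `nlevels' - `nonempty'
```

Because age takes many distinct values, most patterns contain a single case, so most cells are necessarily empty.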

You'd probably have to read the PLUM technical documentation to get more of an explanation. I wouldn't worry too much about the small differences in standard errors and p-values. But if you are determined to get to the bottom of it, you can try wading through

ftp://public.dhe.ibm.com/software/an...Algorithms.pdf

For oglm, you could also try adding vce(robust), and then see how PLUM and oglm compare. Or maybe vce(opg). The two programs may just be using different approaches to calculating the standard errors.
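A sketch of that comparison, using the recoded model from post #24. The stored-estimate names (m_default, m_robust, m_opg) are arbitrary labels chosen for this example, and vce(opg) is included only because it is suggested above; whether your oglm version accepts it is worth checking in help oglm.

```stata
* Default standard errors
oglm B5_recoded age ib10.B7_recoded i.B2B_2 i.B2B_5 ib3.strong_internetB, link(loglog)
estimates store m_default

* Robust (sandwich) standard errors
oglm B5_recoded age ib10.B7_recoded i.B2B_2 i.B2B_5 ib3.strong_internetB, link(loglog) vce(robust)
estimates store m_robust

* Outer-product-of-gradients standard errors
oglm B5_recoded age ib10.B7_recoded i.B2B_2 i.B2B_5 ib3.strong_internetB, link(loglog) vce(opg)
estimates store m_opg

* Coefficients should agree across columns; only the standard errors change
estimates table m_default m_robust m_opg, b(%9.4f) se(%9.4f)
```

Whichever column sits closest to the PLUM standard errors hints at the variance estimator SPSS is using.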

Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#26

15 Apr 2018, 07:45


Richard, curiosity on my end: doesn't this cause quasi-separation or complete separation? If so, are the results still valid for ordinal regressions?

Otherwise, I totally agree, the coefficients are identical (within 3 decimal places, anyway) between Stata and SPSS, the SEs don't materially differ, and the difference is likely to be just due to the way the programs estimate variance.

Jean Torgigial

Join Date: Feb 2016

Posts: 23
#27

15 Apr 2018, 07:48


Thanks a lot for your help, and thanks also to Weiwen and the other people who helped me. It is nice to see that there is such a community, and that after only one day my problem is solved. Incredible!

Have a nice day,

Kind Regards,

Jean
Richard Williams

Join Date: Apr 2014

Posts: 5008
#28

15 Apr 2018, 08:39

Originally posted by Weiwen Ng:

Richard, curiosity on my end: doesn't this cause quasi-separation or complete separation? If so, are the results still valid for ordinal regressions?

I really don't understand why SPSS reports this. Stata does not. Maybe there would be circumstances where you would be concerned about zero frequency cells. But it is a warning, not a fatal error, so I think it is cautioning you just in case.

Joseph Coveney

Join Date: Apr 2014

Posts: 4421
#29

15 Apr 2018, 18:01

Originally posted by Richard Williams:

I really don't understand why SPSS reports this. . . . Maybe there would be circumstances where you would be concerned about zero frequency cells.

I think that it hearkens back to ANOVA days, where SPSS would warn users of empty cells in interaction terms (think SAS Type IV sums of squares), though I last used SPSS ca. 1980, and so this might be a fanciful recollection.