Ordinal regression using complementary log log link function

Jean Torgigial

Join Date: Feb 2016

Posts: 23
#1


14 Apr 2018, 03:38

Hi,

I don't use Stata very often. I ran an ordinal regression in SPSS, and I would like to redo it in Stata to compare the outputs and results.

I am trying to predict customer satisfaction from a survey about the telecom industry. For this purpose, I ran an ordinal regression in SPSS using the complementary log-log (cloglog) link function, because in my data the higher categories of customer satisfaction are more probable.

I have 6 variables:

- B5_new: ordinal, from 0 to 10; this is my target variable, the customer satisfaction
- B7_new: ordinal, from 0 to 10; how likely people are to recommend their operator to a friend
- age: continuous
- B2B_2: a dummy variable indicating that they chose their operator over others for its quality
- B2B_5: also a dummy, indicating that they chose their operator over others for its after-sales service
- strong_internetB: a categorical variable with 3 levels and no order

So in SPSS, B5_new is my target (dependent) variable and the others are my independent variables.

My question is the following: I spent hours and hours trying to redo this in Stata with the cloglog link function, including with oglm. I cannot work out how to handle the categorical and ordinal predictors while also fitting the ordinal regression with the complementary log-log link. Does anyone know how to do it? It would help me a lot! I have nice outputs in SPSS, but I am frustrated that I cannot reproduce them in Stata.

Thanks in advance,

Jean
Joseph Coveney

Join Date: Apr 2014

Posts: 4421
#2

14 Apr 2018, 04:03

See the helpfile for gsem.

Code:

gsem (B5 <- c.age i.(B7 B2B_2 B2B_5), ocloglog)

Last edited by Joseph Coveney; 14 Apr 2018, 04:13. Reason: Needs a close parenthesis after B2B_5, and it's B2B_5 and not B2B-5.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#3

14 Apr 2018, 04:22

You may wish to read this clarifying note about the differences between the two packages concerning ordinal regression models.

This being so, and tricky as it may seem, you need the loglog link in Stata when you wish to match the cloglog link in SPSS.

I suspect Richard Williams's -oglm- may be helpful to you, since it provides this link.
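To see why the names are swapped, here is a short derivation. It assumes SPSS PLUM's model for the cumulative probability gamma_j = Pr(Y <= j), link(gamma_j) = theta_j - beta x, and the latent-response convention of Stata's ordinal commands, Pr(Y > j) = G(beta x - kappa_j), where G is the inverse link of the matching binary model. The symbols are chosen for illustration and are not taken from the linked note.

```latex
\begin{aligned}
\text{SPSS cloglog:}\quad
  & \log\{-\log(1-\gamma_j)\} = \theta_j - \beta x
  \;\Longrightarrow\; \gamma_j = 1 - \exp\{-\exp(\theta_j - \beta x)\} \\
\text{Stata loglog:}\quad
  & G(z) = \exp\{-\exp(-z)\}
  \;\Longrightarrow\; \Pr(Y \le j) = 1 - G(\beta x - \kappa_j)
    = 1 - \exp\{-\exp(\kappa_j - \beta x)\} \\
\text{Stata cloglog:}\quad
  & G(z) = 1 - \exp\{-\exp(z)\}
  \;\Longrightarrow\; \Pr(Y \le j) = \exp\{-\exp(\beta x - \kappa_j)\}
\end{aligned}
```

The first two lines agree when theta_j = kappa_j, with the same sign on beta. The third line is SPSS's negative log-log form, -log(-log gamma_j) = theta_j - beta x, which is why Stata's cloglog goes by a different name in SPSS.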

Last edited by Marcos Almeida; 14 Apr 2018, 04:27.

Best regards,

Marcos
Jean Torgigial

Join Date: Feb 2016

Posts: 23
#4

14 Apr 2018, 06:13

Originally posted by Marcos Almeida View Post

You may wish to read this clarifying note about the differences between the two packages concerning ordinal regression models.

This being so, and tricky as it may seem, you need the loglog link in Stata when you wish to match the cloglog link in SPSS.

I suspect Richard Williams's -oglm- may be helpful to you, since it provides this link.

Thanks Marcos. I tried oglm with the loglog link, but it does not work, and I know that is because of my limited knowledge of how to implement it in Stata. The help file says:

"oglm depvar [indepvars] [weight] [if exp] [in range] [, link(logit/probit/cloglog/loglog/cauchit/log) force lrforce store(name) constraints(clist) robust cluster(varname) level(#) or irr rrr eform hr log hetero(varlist) scale(varlist) eq2(varlist) hc ls flip maximize_options ]"

So I am trying this: "oglm B5_new [c.age i.(B7 B2B_2 B2B_5)][, link(loglog) ]", but it does not work. Switching from SPSS to Stata is pretty hard for me, but I really want to do it.

Do you know how to code it properly?

Kind regards,

Jean
Jean Torgigial

Join Date: Feb 2016

Posts: 23
#5

14 Apr 2018, 06:15

Originally posted by Joseph Coveney View Post

See the helpfile for gsem.

Code:

gsem (B5 <- c.age i.(B7 B2B_2 B2B_5), ocloglog)

Thank you too, Joseph. I am impressed by how quickly you and Marcos answered.

When I try what you wrote, I get this output:

Code:

note: The following observed variable names will be treated as latent variables: B2B_2, B2B_5,
B5_new, B7_new. If this is not your intention use the nocapslatent option, or identify the
latent variable names in the latent() option.
note: Latent variable B5_new was specified with option family(ordinal), but family(gaussian) is
the only option allowed. Assuming family(gaussian) for B5_new.
note: Latent variable B5_new was specified with option link(cloglog),but link(identity) is the
only option allowed. Assuming (identity) for B5_new.
model not identified;
no paths from latent variable B2B_2 to observed variables
r(503);

Kind regards,

Jean
Richard Williams

Join Date: Apr 2014

Posts: 5008
#6

14 Apr 2018, 07:00

SPSS wrote me several years ago about this. I didn't know they had written a FAQ about it. I wrote oglm to be consistent with Stata's other programs, e.g. it will produce the same results when you estimate the same models with logit, probit, ologit, oprobit, and cloglog. At least I hope it does. I'll have to try it with gsem now (which didn't exist when I wrote oglm).

To achieve that consistency, oglm had to make some breaks with PLUM. The oglm help says

WARNING: Programs differ in the names used for some links. Stata's loglog link corresponds to SPSS PLUM's cloglog link; and Stata's cloglog link is called nloglog in SPSS.
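As a sketch, the warning can be checked numerically. The check assumes that SPSS PLUM models gamma_j = Pr(Y <= j) through link(gamma_j) = theta_j - xb. It also assumes that Stata-style ordinal models write Pr(Y <= j) = 1 - G(xb - kappa_j), where G is the binary inverse link: for cloglog, G(z) = 1 - exp(-exp(z)), and for loglog, G(z) = exp(-exp(-z)). The function names are hypothetical; they only evaluate these formulas and do not call Stata or SPSS.

```python
import math

# SPSS PLUM (assumed form): link(gamma_j) = theta_j - xb, gamma_j = P(Y <= j).
def spss_cloglog_cum(theta, xb):
    """SPSS 'cloglog': log(-log(1 - gamma)) = theta - xb, solved for gamma."""
    return 1.0 - math.exp(-math.exp(theta - xb))

def spss_nloglog_cum(theta, xb):
    """SPSS 'nloglog': -log(-log(gamma)) = theta - xb, solved for gamma."""
    return math.exp(-math.exp(-(theta - xb)))

# Assumed Stata-style convention: P(Y > j) = G(xb - kappa_j), G = binary inverse link.
def stata_cum(link, kappa, xb):
    z = xb - kappa
    if link == "cloglog":
        g = 1.0 - math.exp(-math.exp(z))   # complementary log-log
    elif link == "loglog":
        g = math.exp(-math.exp(-z))        # log-log
    else:
        raise ValueError(f"unsupported link: {link}")
    return 1.0 - g                         # P(Y <= j)

grid = [(k, xb) for k in (-2.0, -0.5, 0.0, 1.3) for xb in (-1.0, 0.0, 0.7)]
# Stata loglog vs SPSS cloglog, then Stata cloglog vs SPSS nloglog
print(all(math.isclose(stata_cum("loglog", k, xb), spss_cloglog_cum(k, xb)) for k, xb in grid))   # True
print(all(math.isclose(stata_cum("cloglog", k, xb), spss_nloglog_cum(k, xb)) for k, xb in grid))  # True
```

Under these assumptions, identical cutpoints and coefficients give identical cumulative probabilities once the link names are swapped. Using the same name in both programs (cloglog in each) does not give identical probabilities.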

So if you used cloglog in SPSS you should use loglog in Stata. When you say "it does not work" it is not clear what you mean. Do you get syntax errors? Are the results different from SPSS? Give us code and output so we can see what you mean. Use code tags. See pt 12 of the FAQ.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Richard Williams

Join Date: Apr 2014

Posts: 5008
#7

14 Apr 2018, 07:12

oglm and gsem produce the same results with this code:

Code:

webuse nhanes2f, clear
oglm health i.female height weight, link(cloglog)
gsem (health <- i.female height weight), ocloglog

I think oglm should do what you want but without seeing code and output it is hard to diagnose what the problem is.

Jean Torgigial

Join Date: Feb 2016

Posts: 23
#8

14 Apr 2018, 07:38

Dear Richard,

Thanks for your interest in my question; I tried this:

Code:

oglm B5_new age i.(B7_new B2B_2 B2B_5 strong_internetB), link(loglog)

The first time I made a mistake. Now it is running (it has been running for 10 minutes), and I will tell you whether the output matches my SPSS output. Following your remarks, I am now using the loglog link to be consistent with the cloglog link in SPSS.

Also, I am wondering how, with my code, Stata knows that strong_internetB is categorical (coded 1 to 3, with no order, just a kind of label) and that B7_new is ordinal. I think I must adapt my code. I read about it, but I am a bit confused by Stata syntax; in SPSS you can directly declare a variable as ordinal in the variable options. I hope my question is not too basic. I am not an expert in statistics, so I feel some pressure knowing that you wrote the oglm command.

Anyway, thanks for your attention. I hope I will be able to get my Stata output for this ordinal regression.

Kind Regards,

Jean
Richard Williams

Join Date: Apr 2014

Posts: 5008
#9

14 Apr 2018, 07:56

It won't know whether the variables are ordered or not. It is just going to break them into dummy variables. If SPSS has some special way of treating ordinal independent variables, that feature is not replicated in oglm. But see what the results look like.
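To illustrate what "break them into dummy variables" means, here is a conceptual Python sketch of indicator coding with an omitted base level. It is not Stata's implementation; `expand_factor` and the sample values are hypothetical. The sketch shows that the numeric codes serve only as labels, so an ordinal predictor such as B7_new is treated the same way as a nominal one such as strong_internetB.

```python
def expand_factor(values, base):
    """Indicator (dummy) coding of a categorical variable, omitting the base level.

    Conceptual sketch only: one 0/1 column per non-base level, similar in spirit
    to factor-variable notation such as i.strong_internetB or ib3.strong_internetB.
    """
    levels = sorted(set(values))
    if base not in levels:
        raise ValueError(f"base level {base!r} not observed")
    return {
        lvl: [1 if v == lvl else 0 for v in values]
        for lvl in levels
        if lvl != base
    }

# Hypothetical data: strong_internetB coded 1..3, with no order implied by the codes.
strong_internetB = [1, 3, 2, 2, 1, 3]
print(expand_factor(strong_internetB, base=1))  # {2: [0, 0, 1, 1, 0, 0], 3: [0, 1, 0, 0, 0, 1]}
print(expand_factor(strong_internetB, base=3))  # {1: [1, 0, 0, 0, 1, 0], 2: [0, 0, 1, 1, 0, 0]}
```

Choosing a different base level (the lowest level, which is Stata's default for i., versus ib3.) only changes which column is dropped. The information in the model stays the same.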

Jean Torgigial

Join Date: Feb 2016

Posts: 23
#10

14 Apr 2018, 09:06

Dear Richard,

I now have output, but it does not match the SPSS output, and I don't understand why:

Stata output:

And this is what I have in my SPSS output:

Thanks for your attention,

Kind regards,

Jean
Attached Files

Last edited by Jean Torgigial; 14 Apr 2018, 09:37.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#11

14 Apr 2018, 10:15

Originally posted by Jean Torgigial View Post

Dear Richard,

I now have output, but it does not match the SPSS output, and I don't understand why:

Stata output: (attached images)

Originally posted by Jean Torgigial View Post

...
When I try what you wrote, I have this as output "note: The following observed variable names will be treated as latent variables: B2B_2, B2B_5,
B5_new, B7_new. If this is not your intention use the nocapslatent option, or identify the
latent variable names in the latent() option.
...

FYI, this error occurs because, as -gsem- said, it treats variable names that start with a capital letter (here B) as latent variables. You could have avoided it by typing:

Code:

gsem (B5 <- c.age i.(B7 B2B_2 B2B_5), ocloglog), nocapslatent

I see a couple of immediate differences between the SPSS output and the Stata one. First, the variables SPSS labels as Threshold variables are what Stata labels /cut. Second, Stata omits the base levels for categorical variables from the output, whereas SPSS labels them with the suffix a.

The variable B7_new looks like it has the values 0, 3, 4, 5, ... 10. It looks like SPSS chose 10 as the base level, and Stata chose 0. To make Stata use the same base levels as SPSS, you'd type

Code:

oglm B5_new age ib10.B7_new i.B2B_2 i.B2B_5 ib3.strong_internetB, link(loglog)

(Note: I am assuming that SPSS treats 0 as the base value for the binary variables B2B_2 and _5 - the coefficients' signs match in both outputs.)

However, Stata couldn't estimate a standard error for B7_new = 3, and the coding schemes for B7_new and B5_new seem a bit odd. Does 0 mean the respondent marked N/A or don't know, or left the response missing? If Stata couldn't estimate an SE, that points to a possible convergence issue, which is a bit worrying.

In general, the magnitudes of the coefficients for age, B2B_5, and strong_internetB look consistent between the programs. Jean, could you please tell us more about the coding scheme for B5_new and B7_new?

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Richard Williams

Join Date: Apr 2014

Posts: 5008
#12

14 Apr 2018, 10:41

Notice that the log likelihood, the LR chi2, and the Pseudo R2 are exactly the same in both Stata and SPSS, so they are almost certainly estimating the same model. I think the differences are due to choosing different base levels, and my guess is Weiwen's code will fix that. But in any event, if the log likelihood, LR chi2, and Pseudo R2 are the same, it is very likely you are looking at the same model, perhaps parameterized differently.
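Richard's reasoning can be illustrated with a small sketch. Changing the base level of a categorical predictor shifts every level's effect by the same constant, and the cutpoints absorb that constant, so the predicted probabilities and the log likelihood do not change. All numbers, names, and data below are hypothetical. The sketch assumes the loglog form Pr(Y <= j) = 1 - exp(-exp(kappa_j - eta)), with eta the linear predictor.

```python
import math

def loglog_cum(kappa, eta):
    """Assumed loglog form (Stata naming): P(Y <= j) = 1 - exp(-exp(kappa_j - eta))."""
    return 1.0 - math.exp(-math.exp(kappa - eta))

def category_probs(cuts, eta):
    """Probability of each ordered category from cutpoints and the linear predictor."""
    cum = [loglog_cum(k, eta) for k in cuts] + [1.0]
    return [cum[0]] + [cum[i] - cum[i - 1] for i in range(1, len(cum))]

# Hypothetical estimates with level 0 as base (effect of each predictor level).
b_base0 = {0: 0.0, 5: 0.8, 10: 1.9}
cuts_base0 = [-1.0, 0.5, 2.0]

# Re-express with level 10 as base (as ib10. would): every effect shifts by -b[10],
# and the cutpoints shift by the same constant, so kappa_j - eta is unchanged.
shift = b_base0[10]
b_base10 = {lvl: b - shift for lvl, b in b_base0.items()}
cuts_base10 = [k - shift for k in cuts_base0]

# Hypothetical observations: (predictor level, observed outcome category index).
data = [(0, 0), (5, 2), (10, 3), (5, 1), (10, 2)]

def loglik(b, cuts):
    return sum(math.log(category_probs(cuts, b[lvl])[y]) for lvl, y in data)

# The two log likelihoods agree up to floating-point rounding.
print(loglik(b_base0, cuts_base0), loglik(b_base10, cuts_base10))
```

In this sketch, only the reported coefficients and cutpoints change between the two parameterizations. The fitted probabilities are the same, which is why matching log likelihoods suggest the same underlying model.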

Jean Torgigial

Join Date: Feb 2016

Posts: 23
#13

14 Apr 2018, 10:42

Originally posted by Weiwen Ng View Post

Originally posted by Jean Torgigial View Post

Dear Richard,

I have now an output, but it does not match the one from SPSS, I don't get why:

Stata output: (attached images)

FYI, this error is because, as -gsem- said, it is treating variables starting with B as latent variables. You could have corrected this by typing:

Code:

gsem (B5 <- c.age i.(B7 B2B_2 B2B_5), ocloglog), nocapslatent

I see a couple of immediate differences between the SPSS output and the Stata one. First, the variables SPSS labels as Threshold variables are what Stata labels /cut. Second, Stata omits the base levels for categorical variables from the output, whereas SPSS labels them with the suffix a.

The variable B7_new looks like it has the values 0, 3, 4, 5, ... 10. It looks like SPSS chose 10 as the base level, and Stata chose 0. To make Stata use the same base levels as SPSS, you'd type

Code:

oglm B5_new age ib10.B7_new i.B2B_2 i.B2B_5 ib3.strong_internetB, link(loglog)

(Note: I am assuming that SPSS treats 0 as the base value for the binary variables B2B_2 and _5 - the coefficients' signs match in both outputs.)

However, Stata couldn't estimate a standard error for B7_new = 3, and the coding schemes for B7_new and B5_new seem a bit odd. Does 0 mean the respondent marked N/A or don't know, or left the response missing? If Stata couldn't estimate an SE, that points to a possible convergence issue, which is a bit worrying.

In general, the magnitudes of the coefficients for age, B2B_5, and strong_internetB look consistent between the programs. Jean, could you please tell us more about the coding scheme for B5_new and B7_new?

Dear Weiwen,

Thanks for your answer.

B5_new has eight distinct values in the survey responses (196 answers): 0 stands for "extremely dissatisfied" and 10 for "extremely satisfied".
B7_new has nine distinct values: 0 stands for "would certainly not recommend" and 10 for "would certainly recommend".

So the zeros in both variables are not missing values; they represent the lowest level of satisfaction or recommendation.

There are 196 observations, with no missing values and no N/A or don't-know responses.

Thanks again for taking your time to answer my question,

Kind Regards,

Jean
Jean Torgigial

Join Date: Feb 2016

Posts: 23
#14

14 Apr 2018, 10:44

Originally posted by Richard Williams View Post

Notice that the log likelihood, the LR chi2, and the Pseudo R2 are exactly the same in both Stata and SPSS, so they are almost certainly estimating the same model. I think the differences are due to choosing different base levels, and my guess is Weiwen's code will fix that. But in any event, if the log likelihood, LR chi2, and Pseudo R2 are the same, it is very likely you are looking at the same model, perhaps parameterized differently.

Dear Richard,

I don't know how it happened, but I posted the same picture twice at the end of my message; sorry about that. Actually, the pseudo R-squared and the log likelihood are different in SPSS.

Kind regards,

Jean
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#15

14 Apr 2018, 12:13

Originally posted by Jean Torgigial View Post

Dear Weiwen,

Thanks for your answer.

B5_new has eight distinct values in the survey responses (196 answers): 0 stands for "extremely dissatisfied" and 10 for "extremely satisfied".
B7_new has nine distinct values: 0 stands for "would certainly not recommend" and 10 for "would certainly recommend".

So the zeros in both variables are not missing values; they represent the lowest level of satisfaction or recommendation.

There are 196 observations, with no missing values and no N/A or don't-know responses.

Thanks again for taking your time to answer my question,

Kind Regards,

Jean

That helps. Would you run the Stata code I typed and let us see what the results are?

Code:

oglm B5_new age ib10.B7_new i.B2B_2 i.B2B_5 ib3.strong_internetB, link(loglog)

That ought to make the results more directly comparable between the Stata and SPSS outputs. I think that in both programs, the coefficients for the independent variables all correspond to the odds of a higher response on the dependent variable - can anyone confirm or refute?

Do note that as per the link Marcos shared with us in post #3, Stata parameterizes the cut points/threshold parameters differently than SPSS does - Stata's /cut1 corresponds to the odds of responding 0 or lower on the dependent variable, whereas SPSS's correspond to the probability of responding at 0 or higher.

If the SE for one of the cutpoints in Stata is missing, then I'm honestly not sure what to do.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.