  • Ordinal regression using complementary log log link function

    Hi,

    I don't use Stata very often. I ran an ordinal regression in SPSS and would like to redo it in Stata to compare the output and results.

    I am trying to predict customer satisfaction from a survey of the telecom industry. For this purpose, I ran an ordinal regression in SPSS with the complementary log-log link function, because in my data the higher categories of customer satisfaction are more probable.

    I have 6 variables:

    - B5_new: ordinal, from 0 to 10; my target variable, customer satisfaction
    - B7_new: ordinal, from 0 to 10; how likely respondents are to recommend their operator to a friend
    - age: continuous
    - B2B_2: a dummy variable indicating that they chose their operator over others for its quality
    - B2B_5: also a dummy, indicating that they chose their operator over others for its after-sales service
    - strong_internetB: a categorical variable with 3 unordered levels


    So B5_new is my target and the others are my independent variables in SPSS.

    My question is the following: I have spent hours trying to redo this in Stata with the cloglog link function, using oglm. I have not managed to handle the categorical and ordinal predictors while also fitting the ordinal regression with the complementary log-log link. Does anyone know how to do it? It would help me a lot! I have nice output in SPSS, but I am frustrated that I cannot reproduce it in Stata.

    Thanks in advance,

    Jean




  • #2
    See the helpfile for gsem.
    Code:
    gsem (B5 <- c.age i.(B7 B2B_2 B2B_5), ocloglog)
    Last edited by Joseph Coveney; 14 Apr 2018, 04:13. Reason: Needs a close parenthesis after B2B_5, and it's B2B_5 and not B2B-5.



    • #3
      You may wish to read this clarifying note about the differences between the two packages concerning ordinal regression models.

      That being so, and trickily enough, you need the loglog link in Stata when you wish to match the cloglog link in SPSS.

      I suspect Richard Williams's -oglm- may be helpful to you, since it provides this link.
      Last edited by Marcos Almeida; 14 Apr 2018, 04:27.
      Best regards,

      Marcos



      • #4
        Originally posted by Marcos Almeida View Post
        You may wish to read this clarifying note about the differences between the two packages concerning ordinal regression models.

        That being so, and trickily enough, you need the loglog link in Stata when you wish to match the cloglog link in SPSS.

        I suspect Richard Williams's -oglm- may be helpful to you, since it provides this link.
        Thanks Marcos. I tried oglm with the loglog link, but it does not work, and I know that is because I don't know how to implement it properly in Stata. The help file gives the syntax as

        "oglm depvar [indepvars] [weight] [if exp] [in range] [, link(logit/probit/cloglog/loglog/cauchit/log) force lrforce store(name) constraints(clist) robust cluster(varname) level(#) or irr rrr eform hr log hetero(varlist) scale(varlist) eq2(varlist) hc ls flip maximize_options ]"

        So I am trying this: "oglm B5_new [c.age i.(B7 B2B_2 B2B_5)][, link(loglog) ]", but it does not work. Switching from SPSS to Stata is pretty hard for me, but I really want to do it.

        Do you know how to code it properly?

        Kind regards,

        Jean
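
        A note on the syntax diagram quoted above: in Stata help files, square brackets mark optional elements and are not typed, and a slash-separated list such as link(logit/probit/...) means "pick one". A minimal sketch of the intended call, assuming the variable names listed in the first post and that oglm is installed (it is community-contributed; ssc install oglm should fetch it):

        ```stata
        * Square brackets from the syntax diagram are dropped; options follow a single comma.
        * i.(...) marks the listed predictors as categorical; age enters as continuous.
        oglm B5_new age i.(B7_new B2B_2 B2B_5), link(loglog)
        ```

        Whether loglog is the right link depends on how the two packages name their links, which the later replies address.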



        • #5
          Originally posted by Joseph Coveney View Post
          See the helpfile for gsem.
          Code:
          gsem (B5 <- c.age i.(B7 B2B_2 B2B_5), ocloglog)
          Thank you too, Joseph. I am surprised by how quickly you and Marcos answered.

          When I try what you wrote, I get this output: "note: The following observed variable names will be treated as latent variables: B2B_2, B2B_5,
          B5_new, B7_new. If this is not your intention use the nocapslatent option, or identify the
          latent variable names in the latent() option.
          note: Latent variable B5_new was specified with option family(ordinal), but family(gaussian) is
          the only option allowed. Assuming family(gaussian) for B5_new.
          note: Latent variable B5_new was specified with option link(cloglog),but link(identity) is the
          only option allowed. Assuming (identity) for B5_new.
          model not identified;
          no paths from latent variable B2B_2 to observed variables
          r(503);
          "

          Kind regards,

          Jean



          • #6
            SPSS wrote me several years ago about this. I didn't know they had written a FAQ about it. I wrote oglm to be consistent with Stata's other programs, e.g. it will produce the same results when you estimate the same models with logit, probit, ologit, oprobit, and cloglog. At least I hope it does. I'll have to try it with gsem now (which didn't exist when I wrote oglm).

            To achieve that consistency, oglm had to make some breaks with PLUM. The oglm help says

            WARNING: Programs differ in the names used for some links. Stata's loglog link corresponds to SPSS PLUM's cloglog link; and Stata's cloglog link is called nloglog in SPSS.

            So if you used cloglog in SPSS you should use loglog in Stata. When you say "it does not work" it is not clear what you mean. Do you get syntax errors? Are the results different from SPSS? Give us code and output so we can see what you mean. Use code tags. See pt 12 of the FAQ.
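
            A sketch of why the names cross over, assuming both programs model the cumulative probability $\gamma_j = \Pr(Y \le j)$ through a linear predictor $\kappa_j - x\beta$. This notation is introduced here for illustration and is not taken from either manual. The first two lines are the two PLUM links as I understand their documented definitions; the third is the probability fitted by Stata's binary cloglog command.

            ```latex
            \begin{align*}
            \text{PLUM cloglog:}\quad & \log\bigl(-\log(1-\gamma_j)\bigr) = \kappa_j - x\beta
              \;\Longrightarrow\; \gamma_j = 1 - \exp\bigl(-e^{\kappa_j - x\beta}\bigr) \\
            \text{PLUM nloglog:}\quad & -\log\bigl(-\log \gamma_j\bigr) = \kappa_j - x\beta
              \;\Longrightarrow\; \gamma_j = \exp\bigl(-e^{-(\kappa_j - x\beta)}\bigr) \\
            \text{Stata binary cloglog:}\quad & \Pr(Y = 1) = 1 - \exp\bigl(-e^{x\beta}\bigr)
            \end{align*}
            ```

            With a two-category outcome, $\Pr(Y = 1) = 1 - \gamma_0$, and only the nloglog form reduces to the binary cloglog expression (with constant $-\kappa_0$). That is consistent with the warning in the oglm help: a command built to agree with Stata's cloglog ends up giving that name to what PLUM calls nloglog, and the name loglog to what PLUM calls cloglog.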

            -------------------------------------------
            Richard Williams, Notre Dame Dept of Sociology
            StataNow Version: 19.5 MP (2 processor)

            EMAIL: [email protected]
            WWW: https://www3.nd.edu/~rwilliam



            • #7
              oglm and gsem produce the same results with this code:

              Code:
              webuse nhanes2f, clear
              oglm health i.female height weight, link(cloglog)
              gsem (health <- i.female height weight), ocloglog
              I think oglm should do what you want but without seeing code and output it is hard to diagnose what the problem is.
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology


              • #8
                Dear Richard,

                Thanks for your interest in my question; I tried this:

                Code:
                oglm B5_new  age i.(B7_new B2B_2 B2B_5 strong_internetB), link(loglog)
                The first time I made a mistake. Now it is running (it has been for 10 minutes); I will let you know whether the output matches my SPSS output. Following your remarks, I am now using the loglog link to be consistent with SPSS's cloglog.

                Also, I am wondering how, with my code, Stata knows that strong_internetB is categorical (coded 1 to 3, but with no order, just labels) and that B7_new is ordinal. I think I must adapt my code. I have read about it, but I am a bit confused by Stata syntax; in SPSS you can set the measurement level (ordinal, etc.) directly in the variable options. I hope my question is not too basic. I am not an expert in statistics, so I feel some pressure given that you wrote the oglm procedure.

                Anyway, thanks for your attention. I hope I will be able to get my Stata output for this ordinal regression.

                Kind Regards,

                Jean



                • #9
                  It won't know whether the variables are ordered or not. It is just going to break them into dummy variables. If SPSS has some special way of treating ordinal independent variables, that feature is not replicated in oglm. But see what the results look like.
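
                  To illustrate with the variables from this thread (a sketch only; the choice between the two specifications is the analyst's, and neither reproduces any special SPSS handling of ordinal predictors): prefixing a predictor with i. expands it into one indicator per observed level, while c. enters it as a single numeric term.

                  ```stata
                  * i.B7_new: one indicator per observed level, no ordering imposed
                  oglm B5_new age i.B7_new i.B2B_2 i.B2B_5 i.strong_internetB, link(loglog)

                  * c.B7_new: the 0-10 score enters as one continuous term with a single slope
                  oglm B5_new age c.B7_new i.B2B_2 i.B2B_5 i.strong_internetB, link(loglog)
                  ```

                  Because strong_internetB has no natural order, it stays under i. in both versions.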
                  -------------------------------------------
                  Richard Williams, Notre Dame Dept of Sociology


                  • #10
                    Dear Richard,

                    I now have output, but it does not match the SPSS output, and I don't see why:

                    Stata output:
                    [Image: Screen Shot 2018-04-14 at 16.44.25.png]
                    [Image: Screen Shot 2018-04-14 at 16.44.14.png]

                    And this is what I have in my SPSS output:
                    [Image: Screen Shot 2018-04-14 at 16.47.25.png]

                    Thanks for your attention,

                    Kind regards,

                    Jean
                    Last edited by Jean Torgigial; 14 Apr 2018, 09:37.



                    • #11
                      Originally posted by Jean Torgigial View Post

                      ...
                      When I try what you wrote, I have this as output "note: The following observed variable names will be treated as latent variables: B2B_2, B2B_5,
                      B5_new, B7_new. If this is not your intention use the nocapslatent option, or identify the
                      latent variable names in the latent() option.
                      ...
                      FYI, this error arises because, as -gsem- said, it is treating the variables whose names start with a capital letter (here, B) as latent variables. You could have corrected this by typing:

                      Code:
                      gsem (B5 <- c.age i.(B7 B2B_2 B2B_5), ocloglog), nocapslatent
                      I see a couple of immediate differences between the SPSS output and the Stata one. First, the variables SPSS labels as Threshold variables are what Stata labels /cut. Second, Stata omits the base levels for categorical variables from the output, whereas SPSS labels them with the suffix a.

                      The variable B7_new looks like it has values 0, 3, 4, 5, ... 10. It looks like SPSS chose 10 as the base level, and Stata chose 0. To make Stata behave the same way as SPSS with the base values, you'd type

                      Code:
                      oglm B5_new age ib10.B7_new i.B2B_2 i.B2B_5 ib3.strong_internetB, link(loglog)
                      (Note: I am assuming that SPSS treats 0 as the base value for the binary variables B2B_2 and _5 - the coefficients' signs match in both outputs.)
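
                      A hedged sketch of how one might confirm the observed levels and set the base without hard-coding values, assuming SPSS uses the highest category as its reference, as its output here suggests. ib(last). is Stata's factor-variable operator for taking the last (highest) level as the base.

                      ```stata
                      * List the levels actually observed before choosing a base
                      tabulate B7_new
                      tabulate strong_internetB

                      * ib(last). uses the highest observed level as the base category
                      oglm B5_new age ib(last).B7_new i.B2B_2 i.B2B_5 ib(last).strong_internetB, link(loglog)
                      ```

                      Changing the base level only reparameterizes the model: the log likelihood should be unchanged, while the coefficients on the indicators are re-expressed relative to the new reference.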

                      However, Stata couldn't estimate standard errors for B7_new = 3, and the coding scheme for B7_new and B5_new seems a bit odd. Does 0 mean the respondent marked N/A or don't know, or left the response missing? If Stata couldn't estimate an SE, that points to a possible convergence issue, and it is a bit worrying.

                      In general, the magnitudes of the coefficients for age, B2B_5, and strong_internetB look consistent between the programs. Jean, could you please tell us more about the coding scheme for B5_new and B7_new?
                      Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                      When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



                      • #12
                        Notice that the log likelihood, the LR chi2, and the Pseudo R2 are exactly the same in both Stata and SPSS, so they are almost certainly estimating the same model. I think the differences are due to choosing different base levels, and my guess is Weiwen's code will fix that. But in any event, if log likelihood, LR chi2, and Pseudo R2 are the same, it is very likely you are looking at the same model, perhaps parameterized differently.
                        -------------------------------------------
                        Richard Williams, Notre Dame Dept of Sociology


                        • #13
                          Originally posted by Weiwen Ng View Post
                          FYI, this error arises because, as -gsem- said, it is treating the variables whose names start with a capital letter (here, B) as latent variables. You could have corrected this by typing:

                          Code:
                          gsem (B5 <- c.age i.(B7 B2B_2 B2B_5), ocloglog), nocapslatent
                          I see a couple of immediate differences between the SPSS output and the Stata one. First, the variables SPSS labels as Threshold variables are what Stata labels /cut. Second, Stata omits the base levels for categorical variables from the output, whereas SPSS labels them with the suffix a.

                          The variable B7_new looks like it has values 0, 3, 4, 5, ... 10. It looks like SPSS chose 10 as the base level, and Stata chose 0. To make Stata behave the same way as SPSS with the base values, you'd type

                          Code:
                          oglm B5_new age ib10.B7_new i.B2B_2 i.B2B_5 ib3.strong_internetB, link(loglog)
                          (Note: I am assuming that SPSS treats 0 as the base value for the binary variables B2B_2 and _5 - the coefficients' signs match in both outputs.)

                          However, Stata couldn't estimate standard errors for B7_new = 3, and the coding scheme for B7_new and B5_new seems a bit odd. Does 0 mean the respondent marked N/A or don't know, or left the response missing? If Stata couldn't estimate an SE, that points to a possible convergence issue, and it is a bit worrying.

                          In general, the magnitudes of the coefficients for age, B2B_5, and strong_internetB look consistent between the programs. Jean, could you please tell us more about the coding scheme for B5_new and B7_new?
                          Dear Weiwen,

                          Thanks for your answer.

                          B5_new: takes eight distinct values among the survey answers (196 answers). 0 stands for "extremely dissatisfied" and 10 for "extremely satisfied".
                          B7_new: takes nine distinct values. 0 stands for "would certainly not recommend" and 10 stands for "would certainly recommend".

                          So the zeros in both variables are not missing values, but represent the lowest level of satisfaction or recommendation.

                          There are 196 observations, and there are no missing values, N/A, or "don't know" responses.

                          Thanks again for taking the time to answer my question,

                          Kind Regards,

                          Jean



                          • #14
                            Originally posted by Richard Williams View Post
                            Notice that the log likelihood, the LR chi2, and the Pseudo R2 are exactly the same in both Stata and SPSS, so they are almost certainly estimating the same model. I think the differences are due to choosing different base levels, and my guess is Weiwen's code will fix that. But in any event, if log likelihood, LR chi2, and Pseudo R2 are the same, it is very likely you are looking at the same model, perhaps parameterized differently.
                            Dear Richard,

                            I don't know how I did this, but I reposted the same picture at the end of my message; sorry about that. In fact, the pseudo R-squared and log likelihood are different in SPSS.

                            Kind regards,

                            Jean



                            • #15
                              Originally posted by Jean Torgigial View Post

                              Dear Weiwen,

                              Thanks for your answer.

                              B5_new: takes eight distinct values among the survey answers (196 answers). 0 stands for "extremely dissatisfied" and 10 for "extremely satisfied".
                              B7_new: takes nine distinct values. 0 stands for "would certainly not recommend" and 10 stands for "would certainly recommend".

                              So the zeros in both variables are not missing values, but represent the lowest level of satisfaction or recommendation.

                              There are 196 observations, and there are no missing values, N/A, or "don't know" responses.

                              Thanks again for taking the time to answer my question,

                              Kind Regards,

                              Jean
                              That helps. Would you run the Stata code I typed and let us see what the results are?

                              Code:
                              oglm B5_new age ib10.B7_new i.B2B_2 i.B2B_5 ib3.strong_internetB, link(loglog)
                              That ought to make the results more directly comparable between the Stata and SPSS outputs. I think that in both programs, the coefficients for the independent variables all correspond to the odds of a higher response on the dependent variable - can anyone confirm or refute?

                              Do note that as per the link Marcos shared with us in post #3, Stata parameterizes the cut points/threshold parameters differently than SPSS does - Stata's /cut1 corresponds to the odds of responding 0 or lower on the dependent variable, whereas SPSS's correspond to the probability of responding at 0 or higher.

                              If the SE for one of the cutpoints in Stata is missing, then I'm honestly not sure what to do.