  • Explain categorical variable - OrderedProbitModel

    Hi,
I am currently evaluating my experimental results, where I want to test which variable better explains a categorical outcome in two types of subjects.

The categorical outcome takes the values 1 to 10, so every integer between 1 and 10 is possible for y.
I have two candidate explanatory variables, one called Ra and the other Bo. Both are on the same scale and are also categorical, from 1 to 10. I now want to test whether Ra or Bo is the better predictor of y, and I want to compare this relationship between two types of subjects, so I included a dummy variable prosocial: if prosocial = 1 a person is prosocial, and if prosocial = 0 they are not.

My first attempt was to use an ordered probit model with the -oprobit- command, but as far as I understand the model, the coefficients do not give me the marginal effects, and therefore they do not tell me whether Ra or Bo has the greater influence on y.

Can you help me specify the oprobit model to get the results I need?
    Thanks a lot!

  • #2
    Rene:
your chances of getting helpful replies are conditional on posting what you typed and what Stata gave you back (as per the FAQ).
    Please, let us know some more details concerning your first try with -oprobit-. Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      A couple of questions and related comments:

1) You say that Ra and Bo are categorical, and your description seems to imply that they are *ordered* categorical. An ordered regression model (probit, logit, etc.) will not enable you to treat these predictors as ordered. You would have to represent them as a set of indicator ("dummy") variables, ignoring their ordered character, or treat them as continuous, which is possibly undesirable. For a truly ordinal-ordinal measure of association, you might consider Somers' D (-somersd-, from SSC) or polychoric correlation (-findit polychoric-). Neither will give you a marginal effect, though.

      2) You do not mention whether Ra and Bo are themselves associated. If they are, you would need to use the regression approach, since Somers' D is only a bivariate measure.

3) You say the model coefficients do not give you the marginal effects. True, they do not directly tell you the derivative of the predicted probability with respect to either Ra or Bo. If that derivative is what you want, -margins- would be helpful. If you want some other marginal effect, you'll need to explain to us a bit more about what you want.
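
A minimal sketch of that -margins- approach, assuming the variable names from the original post (y, Ra, Bo, prosocial) and, for the moment, treating Ra and Bo as continuous (which, per the caveats above, may not be appropriate for ordinal predictors):

Code:
* Sketch only -- c.Ra and c.Bo treat the ordinal predictors as continuous
oprobit y c.Ra c.Bo i.prosocial

* Average marginal effects of Ra and Bo on Pr(y == 10);
* repeat with outcome(1) ... outcome(9) for the other categories
margins, dydx(Ra Bo) predict(outcome(10))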

      Regards, Mike

      • #4
        I'm confused by the question. If Ra and Bo are ordinal variables, what does marginal effect even mean? My understanding of a marginal effect is the change in outcome associated with a 1-unit difference in the predictor. But if the predictor is ordinal and not interval level, a 1-unit difference has no consistent meaning or interpretation. What am I missing here? It seems to me that if you are thinking about marginal effects you are at least pretending that the predictor is an interval level ("continuous") variable.

        • #5
          I share Clyde's confusion. I am not sure what "explaining better" means. If Ra and Bo were continuous vars measured the same way then you might just see which has the larger coefficient. Given that they are not, perhaps you just want to see which has the more statistically significant effects. Maybe something like

Code:
oprobit y i.Ra i.Bo
testparm i.Ra
testparm i.Bo

          But one way or another, you need to define what "explaining better" means.

Incidentally, if Bo and Ra are ordinal 10-point scales, I might not feel too bad about treating them as continuous, at least if the intervals can reasonably be seen as evenly spaced. There are ways to test whether it is OK to treat a variable as continuous or whether you need to break it up into dummies instead.
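
One common version of such a check (a sketch, not from the thread, again using the original post's variable names): fit the model with Ra coded continuously and with a full set of indicators, and compare the nested models with a likelihood-ratio test.

Code:
* Linear (continuous) coding of Ra -- restricted model
oprobit y c.Ra i.prosocial
estimates store linear

* Full set of indicators for Ra -- unrestricted model
oprobit y i.Ra i.prosocial
estimates store dummies

* If the LR test does not reject, the linear coding may be adequate
lrtest linear dummies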
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          EMAIL: [email protected]
          WWW: https://www3.nd.edu/~rwilliam

          • #6
re: Mike Lacy's (post #3) first point - this is not correct; there are other options. See, e.g., S. D. Walter, A. R. Feinstein, and C. K. Wells (1987), "Coding Ordinal Independent Variables in Multiple Regression Analyses," American Journal of Epidemiology, vol. 125, pp. 319-323. Note that I have coded this up in a program called -cascade-, which is from the STB and can be found with -search cascade-.

            • #7
              On Mike's second point: Somers' d has applications when there are several predictors. The very first example in help somersd (SSC, SJ, etc.) is of this form.

              • #8
                Rich G., the contrast command lets you do so many different types of contrasts. Do any of them match up with what cascade does?
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                StataNow Version: 19.5 MP (2 processor)

                EMAIL: [email protected]
                WWW: https://www3.nd.edu/~rwilliam

                • #9
Thanks to all for adding to and correcting my errors. The coding scheme from the Am J Epi article is an interesting idea, and I'm surprised I hadn't heard of it, since it's a solution to a problem that would be useful to many people. Google Scholar claims it has received 108 or so cites since being published 30 yr. ago, but it sounds like it deserves more use than that. Re Nick Cox's comment: I understand that -somersd- can give multiple bivariate results, but I'm not seeing how it adjusts for another covariate when two predictors are used in the command. I was looking at:
                  Code:
                  sysuse auto
                  somersd foreign mpg weight, tr(z) // from the help file example
                  // and comparing to
                  somersd foreign mpg, tr(z)
We must be thinking of different things here, or I've missed something, given the identical Somers' D values for foreign as a predictor of mpg with and without the inclusion of weight in the variable list.

                  Regards, Mike

                  • #10
                    Richard, it seems contrast's ar. operator gives you results identical to those of cascade.

                    Best
                    Daniel

                    • #11
Richard W - first, you have an answer from Daniel Klein (thanks, Daniel); note also that in the original STB article I showed a different way (long before -contrast- existed) to match the two sets.

                      • #12
                        Mike L., given that Stata supports ar., I suspect the method is far more prominent than the 108 citations of the one article would suggest. I am vaguely aware that all these contrast options exist but I won't claim to understand the rationale for all of them.

Rich G., thanks for your program. Even though -contrast- exists, it might be nice if factor-variable notation could be extended to handle such things more directly. Part of the reason I asked is that whenever you compute the variables yourself, you run the risk that -margins- won't handle things correctly, since it doesn't realize how the variables are inter-related.
                        -------------------------------------------
                        Richard Williams, Notre Dame Dept of Sociology
                        StataNow Version: 19.5 MP (2 processor)

                        EMAIL: [email protected]
                        WWW: https://www3.nd.edu/~rwilliam

                        • #13
Riches W & G.: I see that -contrast, ar.- would enable a test, but I like the idea of -cascade-, which enables one to use these contrasts so as to focus on estimates of effects.

                          Regards, Mike

                          • #14
                            Mike, that is why I said I would like factor variables to be more flexible. But contrast does more than just do tests; it can give you the values for contrasts. Unless I am missing something, here is how I think you can do the same thing cascade does without using cascade. Compare the logit coefficients produced using cascade with the results from the contrast command.

                            Code:
                            . sysuse nhanes2f, clear
                            
                            . cascade health, gen(hlth)
                            
                            . logit diabetes hlth2-hlth5 i.race
                            
                            Iteration 0:   log likelihood = -1999.0668  
                            Iteration 1:   log likelihood =  -1959.938  
                            Iteration 2:   log likelihood = -1782.8724  
                            Iteration 3:   log likelihood = -1782.2412  
                            Iteration 4:   log likelihood = -1782.2391  
                            Iteration 5:   log likelihood = -1782.2391  
                            
                            Logistic regression                               Number of obs   =      10335
                                                                              LR chi2(6)      =     433.66
                                                                              Prob > chi2     =     0.0000
                            Log likelihood = -1782.2391                       Pseudo R2       =     0.1085
                            
                            ------------------------------------------------------------------------------
                                diabetes |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                            -------------+----------------------------------------------------------------
                                   hlth2 |  -.7359927   .1264595    -5.82   0.000    -.9838488   -.4881366
                                   hlth3 |  -.8125519   .1214278    -6.69   0.000    -1.050546   -.5745577
                                   hlth4 |  -.9733558   .1747996    -5.57   0.000    -1.315957   -.6307549
                                   hlth5 |  -.5581509   .2543835    -2.19   0.028    -1.056733   -.0595684
                                         |
                                    race |
                                  Black  |   .2584663   .1278144     2.02   0.043     .0079546     .508978
                                  Other  |   .0582342   .3520786     0.17   0.869    -.6318271    .7482956
                                         |
                                   _cons |  -1.536722   .0997803   -15.40   0.000    -1.732288   -1.341156
                            ------------------------------------------------------------------------------
                            
                            . logit diabetes i.health i.race
                            
                            Iteration 0:   log likelihood = -1999.0668  
                            Iteration 1:   log likelihood =  -1959.938  
                            Iteration 2:   log likelihood = -1782.8724  
                            Iteration 3:   log likelihood = -1782.2412  
                            Iteration 4:   log likelihood = -1782.2391  
                            Iteration 5:   log likelihood = -1782.2391  
                            
                            Logistic regression                               Number of obs   =      10335
                                                                              LR chi2(6)      =     433.66
                                                                              Prob > chi2     =     0.0000
                            Log likelihood = -1782.2391                       Pseudo R2       =     0.1085
                            
                            ------------------------------------------------------------------------------
                                diabetes |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                            -------------+----------------------------------------------------------------
                                  health |
                                   fair  |  -.7359927   .1264595    -5.82   0.000    -.9838488   -.4881366
                                average  |  -1.548545   .1307553   -11.84   0.000     -1.80482   -1.292269
                                   good  |    -2.5219   .1788701   -14.10   0.000    -2.872479   -2.171322
                              excellent  |  -3.080051   .2270389   -13.57   0.000    -3.525039   -2.635063
                                         |
                                    race |
                                  Black  |   .2584663   .1278144     2.02   0.043     .0079546     .508978
                                  Other  |   .0582342   .3520786     0.17   0.869    -.6318271    .7482956
                                         |
                                   _cons |  -1.536722   .0997803   -15.40   0.000    -1.732288   -1.341156
                            ------------------------------------------------------------------------------
                            
                            . contrast ar.health, effects
                            
                            Contrasts of marginal linear predictions
                            
                            Margins      : asbalanced
                            
                            --------------------------------------------------------
                                                 |         df        chi2     P>chi2
                            ---------------------+----------------------------------
                                          health |
                                 (fair vs poor)  |          1       33.87     0.0000
                              (average vs fair)  |          1       44.78     0.0000
                              (good vs average)  |          1       31.01     0.0000
                            (excellent vs good)  |          1        4.81     0.0282
                                          Joint  |          4      355.48     0.0000
                            --------------------------------------------------------
                            
                            --------------------------------------------------------------------------------------
                                                 |   Contrast   Std. Err.      z    P>|z|     [95% Conf. Interval]
                            ---------------------+----------------------------------------------------------------
                                          health |
                                 (fair vs poor)  |  -.7359927   .1264595    -5.82   0.000    -.9838488   -.4881366
                              (average vs fair)  |  -.8125519   .1214278    -6.69   0.000    -1.050546   -.5745577
                              (good vs average)  |  -.9733558   .1747996    -5.57   0.000    -1.315957   -.6307549
                            (excellent vs good)  |  -.5581509   .2543835    -2.19   0.028    -1.056733   -.0595684
                            --------------------------------------------------------------------------------------
                            
                            .
                            -------------------------------------------
                            Richard Williams, Notre Dame Dept of Sociology
                            StataNow Version: 19.5 MP (2 processor)

                            EMAIL: [email protected]
                            WWW: https://www3.nd.edu/~rwilliam
