  • Explain categorical variable - OrderedProbitModel

    Hi,
I am currently evaluating my experimental results, where I want to test which variable better explains a categorical outcome in two types of subjects.

The categorical outcome takes the values 1 to 10, so every integer between 1 and 10 is possible for y.
I have two candidate explanatory variables, one called Ra and the other Bo. Both are on the same scale and are also categorical, from 1 to 10. I now want to test whether Ra or Bo is the better predictor of y, and I want to compare this relationship between two types of subjects, so I included a dummy variable prosocial: if prosocial = 1 a person is prosocial, and if prosocial = 0 they are not.

My first attempt was to use an ordered probit model with the -oprobit- command, but as far as I understand the model, the coefficients do not give me the marginal effects, and therefore they do not tell me whether Ra or Bo has the greater influence on y.

Can you help me specify the oprobit model to get the results I need?
    Thanks a lot!

  • #2
    Rene:
your chances of getting helpful replies are conditional on posting what you typed and what Stata gave you back (as per the FAQ).
    Please, let us know some more details concerning your first try with -oprobit-. Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      A couple of questions and related comments:

1) You say that Ra and Bo are categorical, and your description seems to imply that they are *ordered* categorical. An ordered regression model (probit, logit, etc.) will not enable you to treat these predictors as ordered. You would have to represent them as a set of indicator ("dummy") variables, ignoring their ordered character, or treat them as continuous, which is possibly undesirable. For a truly ordinal-ordinal measure of association, you might consider Somers' D (-somersd-, from SSC) or polychoric correlation (-findit polychoric-). Neither will give you a marginal effect, though.

      2) You do not mention whether Ra and Bo are themselves associated. If they are, you would need to use the regression approach, since Somers' D is only a bivariate measure.

3) You say the model coefficients do not give you the marginal effects. True, they do not directly tell you the derivative of the predicted probability with respect to either Ra or Bo. If that derivative is what you want, -margins- would be helpful. If you want some other marginal effect, you'll need to explain to us a bit more about what you want.
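
A minimal sketch of that -margins- approach, assuming the variable names from the original post (y, Ra, Bo, prosocial) and, for the moment, treating Ra and Bo as continuous (which, per the caveats above, may not be appropriate for ordinal predictors):

Code:
* Sketch only -- c.Ra and c.Bo treat the ordinal predictors as continuous
oprobit y c.Ra c.Bo i.prosocial

* Average marginal effects of Ra and Bo on Pr(y == 10);
* repeat with outcome(1) ... outcome(9) for the other categories
margins, dydx(Ra Bo) predict(outcome(10))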

      Regards, Mike

      • #4
        I'm confused by the question. If Ra and Bo are ordinal variables, what does marginal effect even mean? My understanding of a marginal effect is the change in outcome associated with a 1-unit difference in the predictor. But if the predictor is ordinal and not interval level, a 1-unit difference has no consistent meaning or interpretation. What am I missing here? It seems to me that if you are thinking about marginal effects you are at least pretending that the predictor is an interval level ("continuous") variable.

        • #5
          I share Clyde's confusion. I am not sure what "explaining better" means. If Ra and Bo were continuous vars measured the same way then you might just see which has the larger coefficient. Given that they are not, perhaps you just want to see which has the more statistically significant effects. Maybe something like

Code:
oprobit y i.Ra i.Bo
testparm i.Ra
testparm i.Bo

          But one way or another, you need to define what "explaining better" means.

Incidentally, if Bo and Ra are ordinal 10-point scales, I might not feel too bad about treating them as continuous, at least if the intervals can reasonably be seen as evenly spaced. There are ways to test whether it is OK to treat a variable as continuous or whether you need to break it up into dummies instead.
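
One common version of such a check (a sketch, not from the thread, again using the original post's variable names): fit the model with Ra coded continuously and with a full set of indicators, and compare the nested models with a likelihood-ratio test.

Code:
* Linear (continuous) coding of Ra -- restricted model
oprobit y c.Ra i.prosocial
estimates store linear

* Full set of indicators for Ra -- unrestricted model
oprobit y i.Ra i.prosocial
estimates store dummies

* If the LR test does not reject, the linear coding may be adequate
lrtest linear dummies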
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          EMAIL: [email protected]
          WWW: https://www3.nd.edu/~rwilliam

          • #6
re: Mike Lacy's (post #3) first point - this is not correct; there are other options. See, e.g., S. D. Walter, A. R. Feinstein, and C. K. Wells (1987), "Coding Ordinal Independent Variables in Multiple Regression Analyses," American Journal of Epidemiology, vol. 125, pp. 319-323. Note that I have coded this up in a program called -cascade-, which is from the STB and can be found with -search cascade-.

            • #7
              On Mike's second point: Somers' d has applications when there are several predictors. The very first example in help somersd (SSC, SJ, etc.) is of this form.

              • #8
                Rich G., the contrast command lets you do so many different types of contrasts. Do any of them match up with what cascade does?
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                StataNow Version: 19.5 MP (2 processor)

                EMAIL: [email protected]
                WWW: https://www3.nd.edu/~rwilliam

                • #9
Thanks to all for adding to and correcting my errors. The coding scheme from the Am J Epi article is an interesting idea, and I'm surprised I hadn't heard of it, since it's a solution to a problem that would be useful to many people. Google Scholar claims it has received 108 or so cites since being published 30 yr. ago, but it sounds like it deserves more use than that. Re Nick Cox's comment: I understand that -somersd- can give multiple bivariate results, but I'm not seeing how it adjusts for another covariate when two predictors are used in the command. I was looking at:
                  Code:
                  sysuse auto
                  somersd foreign mpg weight, tr(z) // from the help file example
                  // and comparing to
                  somersd foreign mpg, tr(z)
We must be thinking of different things here, or I've missed something, given the identical Somers' D values for foreign as a predictor of mpg with and without the inclusion of weight in the variable list.

                  Regards, Mike

                  • #10
                    Richard, it seems contrast's ar. operator gives you results identical to those of cascade.

                    Best
                    Daniel

                    • #11
Richard W - first, you have an answer from Daniel Klein (thanks, Daniel); note also that in the original STB article I showed a different way (long before -contrast- existed) to match the two sets.

                      • #12
                        Mike L., given that Stata supports ar., I suspect the method is far more prominent than the 108 citations of the one article would suggest. I am vaguely aware that all these contrast options exist but I won't claim to understand the rationale for all of them.

Rich G., thanks for your program. Even though -contrast- exists, it might be nice if factor-variable notation could be extended to handle such things more directly. Part of the reason I asked is that whenever you compute the variables yourself, you run the risk that -margins- won't handle things correctly, since it doesn't realize how the variables are inter-related.
                        -------------------------------------------
                        Richard Williams, Notre Dame Dept of Sociology
                        StataNow Version: 19.5 MP (2 processor)

                        EMAIL: [email protected]
                        WWW: https://www3.nd.edu/~rwilliam

                        • #13
Riches W & G.: I see that -contrast, ar.- would enable a test, but I like the idea of -cascade-, which enables one to use these contrasts so as to focus on estimates of effects.

                          Regards, Mike

                          • #14
                            Mike, that is why I said I would like factor variables to be more flexible. But contrast does more than just do tests; it can give you the values for contrasts. Unless I am missing something, here is how I think you can do the same thing cascade does without using cascade. Compare the logit coefficients produced using cascade with the results from the contrast command.

                            Code:
                            . sysuse nhanes2f, clear
                            
                            . cascade health, gen(hlth)
                            
                            . logit diabetes hlth2-hlth5 i.race
                            
                            Iteration 0:   log likelihood = -1999.0668  
                            Iteration 1:   log likelihood =  -1959.938  
                            Iteration 2:   log likelihood = -1782.8724  
                            Iteration 3:   log likelihood = -1782.2412  
                            Iteration 4:   log likelihood = -1782.2391  
                            Iteration 5:   log likelihood = -1782.2391  
                            
                            Logistic regression                               Number of obs   =      10335
                                                                              LR chi2(6)      =     433.66
                                                                              Prob > chi2     =     0.0000
                            Log likelihood = -1782.2391                       Pseudo R2       =     0.1085
                            
                            ------------------------------------------------------------------------------
                                diabetes |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                            -------------+----------------------------------------------------------------
                                   hlth2 |  -.7359927   .1264595    -5.82   0.000    -.9838488   -.4881366
                                   hlth3 |  -.8125519   .1214278    -6.69   0.000    -1.050546   -.5745577
                                   hlth4 |  -.9733558   .1747996    -5.57   0.000    -1.315957   -.6307549
                                   hlth5 |  -.5581509   .2543835    -2.19   0.028    -1.056733   -.0595684
                                         |
                                    race |
                                  Black  |   .2584663   .1278144     2.02   0.043     .0079546     .508978
                                  Other  |   .0582342   .3520786     0.17   0.869    -.6318271    .7482956
                                         |
                                   _cons |  -1.536722   .0997803   -15.40   0.000    -1.732288   -1.341156
                            ------------------------------------------------------------------------------
                            
                            . logit diabetes i.health i.race
                            
                            Iteration 0:   log likelihood = -1999.0668  
                            Iteration 1:   log likelihood =  -1959.938  
                            Iteration 2:   log likelihood = -1782.8724  
                            Iteration 3:   log likelihood = -1782.2412  
                            Iteration 4:   log likelihood = -1782.2391  
                            Iteration 5:   log likelihood = -1782.2391  
                            
                            Logistic regression                               Number of obs   =      10335
                                                                              LR chi2(6)      =     433.66
                                                                              Prob > chi2     =     0.0000
                            Log likelihood = -1782.2391                       Pseudo R2       =     0.1085
                            
                            ------------------------------------------------------------------------------
                                diabetes |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                            -------------+----------------------------------------------------------------
                                  health |
                                   fair  |  -.7359927   .1264595    -5.82   0.000    -.9838488   -.4881366
                                average  |  -1.548545   .1307553   -11.84   0.000     -1.80482   -1.292269
                                   good  |    -2.5219   .1788701   -14.10   0.000    -2.872479   -2.171322
                              excellent  |  -3.080051   .2270389   -13.57   0.000    -3.525039   -2.635063
                                         |
                                    race |
                                  Black  |   .2584663   .1278144     2.02   0.043     .0079546     .508978
                                  Other  |   .0582342   .3520786     0.17   0.869    -.6318271    .7482956
                                         |
                                   _cons |  -1.536722   .0997803   -15.40   0.000    -1.732288   -1.341156
                            ------------------------------------------------------------------------------
                            
                            . contrast ar.health, effects
                            
                            Contrasts of marginal linear predictions
                            
                            Margins      : asbalanced
                            
                            --------------------------------------------------------
                                                 |         df        chi2     P>chi2
                            ---------------------+----------------------------------
                                          health |
                                 (fair vs poor)  |          1       33.87     0.0000
                              (average vs fair)  |          1       44.78     0.0000
                              (good vs average)  |          1       31.01     0.0000
                            (excellent vs good)  |          1        4.81     0.0282
                                          Joint  |          4      355.48     0.0000
                            --------------------------------------------------------
                            
                            --------------------------------------------------------------------------------------
                                                 |   Contrast   Std. Err.      z    P>|z|     [95% Conf. Interval]
                            ---------------------+----------------------------------------------------------------
                                          health |
                                 (fair vs poor)  |  -.7359927   .1264595    -5.82   0.000    -.9838488   -.4881366
                              (average vs fair)  |  -.8125519   .1214278    -6.69   0.000    -1.050546   -.5745577
                              (good vs average)  |  -.9733558   .1747996    -5.57   0.000    -1.315957   -.6307549
                            (excellent vs good)  |  -.5581509   .2543835    -2.19   0.028    -1.056733   -.0595684
                            --------------------------------------------------------------------------------------
                            
                            .
                            -------------------------------------------
                            Richard Williams, Notre Dame Dept of Sociology
                            StataNow Version: 19.5 MP (2 processor)

                            EMAIL: [email protected]
                            WWW: https://www3.nd.edu/~rwilliam
