Running multinomial probit regressions with multiple dependent variable options

Max Hammond

Join Date: Sep 2022

Posts: 5
#1

17 Dec 2022, 22:37

Hi,

This might be more of an advanced econometrics question than a Stata question. I have 13 dependent variables [wd_1, wd_2, ..., wd_13], each with five levels (0, 1, 2, 3, 99). These levels are ordinal (except missing). I also have one independent variable [to_whom] with seven levels (0, 1, 2, 3, 4, 5, 6). These levels are also ordinal, and there are no missing values.

Is it appropriate to use a multinomial probit regression with this many dependent variables, each with this many levels? Or would a different model make more sense? I believe this model makes sense, and I wanted to know whether the correct code for all 13 dependent variables is simply to list their names after global ylist.

I would not actually try to run this many regressions, as the number of iterations necessary is quite high. But I wanted to post the code to see whether this is a correct setup for what I am trying to accomplish.

* Dependent variable has 5 categories denoted 0,1,2,3,99
global ylist wd_1 wd_2 wd_3 wd_4 wd_5 wd_6 wd_7 wd_8 wd_9 wd_10 wd_11 wd_12 wd_13
global xlist to_whom
describe $ylist $xlist
summarize $ylist $xlist
tabulate $ylist
* Multinomial probit model with base outcome the most frequent alternative
mprobit $ylist $xlist
* Multinomial probit with base outcome alternative 2
mprobit $ylist $xlist, baseoutcome(2)
* Multinomial probit marginal effects
margins, dydx(*) atmeans predict(pr outcome(0))
margins, dydx(*) atmeans predict(pr outcome(1))
margins, dydx(*) atmeans predict(pr outcome(2))
margins, dydx(*) atmeans predict(pr outcome(3))
margins, dydx(*) atmeans predict(pr outcome(99))

predict p1, outcome(0)
predict p2, outcome(1)
predict p3, outcome(2)
predict p4, outcome(3)
predict p5, outcome(99)

Thanks!
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#2

18 Dec 2022, 00:08

Well, mprobit $ylist $xlist isn't going to give you what you think it will, and tabulate $ylist isn't going to give you anything at all, except an error message.
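
To make that concrete: mprobit takes a single dependent variable, so every name after the first in $ylist is read as a regressor, and tabulate accepts at most two variables. A minimal sketch of one table per variable and one model per outcome might look like the following. It assumes the variables are named wd_1 through wd_13 and to_whom as in the post and are stored next to each other in the dataset. Entering to_whom as a factor variable (i.to_whom) is an assumption of this sketch, not something stated in the post.

```stata
* Sketch only; assumes wd_1-wd_13 and to_whom exist as described in the post
* and that wd_1 through wd_13 are contiguous in the dataset

* One-way frequency table for each outcome
* (tabulate itself takes at most two variables)
tab1 wd_1-wd_13, missing

* One multinomial probit per outcome, because mprobit takes a single dependent variable
foreach y of varlist wd_1-wd_13 {
    display as text _newline "Outcome: `y'"
    * i.to_whom enters trust as a set of indicators (an assumption of this sketch)
    * capture noisily lets the loop continue if one model fails to fit
    capture noisily mprobit `y' i.to_whom
}
```

This only shows the command structure. Whether thirteen separate models answer the research question is a different matter.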

I'm not sure what your advanced econometrics question is—you haven't stated any research question at all—but if it has something to do with the joint association of various outcomes as measured by individual four-level ordered-categorical variables and a to-whom explanatory variable that is likewise ordered-categorical (?!), then what would be a fundamental objection to something like the following as a first approximation to the answer?

Code:

local wd
forvalues i = 1/13 {
    local wd `wd' wd_`i'
}
mvdecode `wd' to_whom, mv(99)
assert !mi(to_whom)
misstable patterns `wd'
egen double awd = rowmean(`wd')
dotplot awd, over(to_whom) median bar center
Max Hammond

Join Date: Sep 2022

Posts: 5
#3

18 Dec 2022, 09:24

Hi Joseph,

Thanks for the reply. Sorry for my incomplete post.

-My research question is: How does trust (“to_whom”) predict the person that is identified (“wd_1... wd_13”). I will also want to see whether trust can predict other ordered-categorical outcomes, but I was posting here to see whether the probit is a good idea and to get a general idea of what the Stata code structure might look like.
-As you assumed, the "to_whom" variable is ordered-categorical (1-6), with 1 being least trusted and 6 being most trusted. There are no missing values in "to_whom", so I will not decode them.
-The 13 wd variables are ordered-categorical (1-3) and I can decode the missing values with nothing if that is the best thing to do.

Thanks again. Any help provided would be appreciated.
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#4

19 Dec 2022, 01:44

Originally posted by Max Hammond

How does trust (“to_whom”) predict the person that is identified (“wd_1... wd_13”).
. . .
-The 13 wd variables are ordered-categorical (1-3) and I can decode the missing values with nothing if that is the best thing to do.

Sorry, I don't understand your data setup. Your description reads as if each wd_# outcome variable identifies one of thirteen people, and so there would be twelve missing-valued outcome variables for each row of data and one with some kind of trait or characteristic of the identified person that's measured on an ordered-categorical scale (I can't tell: originally you said 0–3, but now you're saying 1–3; ditto for the to-whom trust explanatory variable: 0–6 is now 1–6). Or something.
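
A quick way to check that reading of the data could be sketched as below. It assumes the outcomes are named wd_1 through wd_13 and are stored next to each other in the dataset; the variable name n_identified is made up for illustration. Note that a 99 code counts as nonmissing here unless it has first been converted to a missing value.

```stata
* Sketch only; assumes wd_1-wd_13 exist and are contiguous in the dataset
* n_identified is a hypothetical name chosen for this example

* Count how many of the thirteen outcomes are nonmissing in each row
egen byte n_identified = rownonmiss(wd_1-wd_13)

* If each row identifies exactly one person, every row should show a count of 1
tabulate n_identified
```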

Maybe others on the list who are more familiar with this line of research can suss out what you've got and what you want to do with it.

Max Hammond

Join Date: Sep 2022
Posts: 5

19 Dec 2022, 22:47

Hi again Joseph. Here is a listing of the first 20 observations to give you an idea of what the data setup looks like. My deepest apologies! If you are confused, you are right to be: I made errors in my initial post because I wrote it without referring to my codebook.

Code:

     +--------------------------------------------------------------------------------------------------------+
     | wd_1   wd_2   wd_3   wd_4   wd_5   wd_6   wd_7   wd_8   wd_9   wd_10   wd_11   wd_12   wd_13   to_whom |
     |--------------------------------------------------------------------------------------------------------|
  1. |    .      .      .      1      .      .      .      .      .       .       .       .       .         2 |
  2. |    .      .      .      .      .      .      .      .      3       .       .       .       1         0 |
  3. |    .      .      .      .      .      .      .      .      .       2       .       .       .         0 |
  4. |    3      .      .      .      .      .      .      .      .       .       .       .       .         1 |
  5. |    2      .      2      .      .      .      .      .      2       2       .       .       .         0 |
     |--------------------------------------------------------------------------------------------------------|
  6. |    3      .      .      .      .      .      .      .      .       .       .       .       .         1 |
  7. |    3      .      .      .      .      0      .      .      .       .       .       2       .         1 |
  8. |    .      .      .      .      .      .      .      .      .       0       .       .       .         6 |
  9. |    3      .      .      3      .      .      .      .      .       .       .       .       .         1 |
 10. |    3      .      .      .      .      .      .      .      .       .       .       .       .         5 |
     |--------------------------------------------------------------------------------------------------------|
 11. |    3      .      .      .      .      .      .      .      .       .       .       .       1         1 |
 12. |    3      .      .      1      .      .      .      .      .       .       .       .       .         3 |
 13. |    3      .      .      .      .      .      .      .      .       .       .       .       .         1 |
 14. |    2      .      .      .      .      2      .      .      2       2       2       .       .         3 |
 15. |    3      .      .      .      .      .      .      .      .       .       .       .       .         5 |
     |--------------------------------------------------------------------------------------------------------|
 16. |    .      .      .      .      .      .      .      .      .       .       .       .       .         3 |
 17. |    2      1      2      .      .      2      2      .      2       2       .       2       .         3 |
 18. |    3      .      .      .      .      .      .      .      .       .       .       .       .         0 |
 19. |    2      .      .      .      .      .      .      .      .       1       .       .       .         3 |
 20. |    2      .      2      .      .      0      .      .      .       1       .       .       .         3 |
     +--------------------------------------------------------------------------------------------------------+

To be clear, each of the 13 wd variables represents a person identified and takes an ordered-categorical value (0-3) based on level of involvement, from least involved (0) to most involved (3). The to_whom variable, representing trust, is also ordered-categorical (1-6), from least trustworthy (1) to most trustworthy (6).

I wanted to use the correct method (which I presume to be a multinomial probit) to see whether trust can predict the person identified and the level at which they are identified (0-3), but I was not sure whether that is the correct method for this data setup. Does this make more sense?
