
  • cmp: Bivariate ordered probit and conditional probabilities

    Dear all,

    I estimate a bivariate ordered probit model using cmp. After the regression, I would like to compute the conditional probabilities Pr(y2=x|y1=z), where y1 is the first equation, y2 the second, and x, z are two values; likewise for conditional expectations.

    I know that cmp allows one to compute such things (see http://www.statalist.org/forums/foru...d-expectations or the help section in Stata).

    I used the command "predict pr(3 3) eq(result) cond(2 2, eq(Wintg))" (with result = y2 and Wintg = y1), but it failed (error message: "something required"). I think I've missed a point (maybe more!) in the syntax.

    Does anybody have any advice?
    Thanks,
    Olivier

  • #2
    Oh, this is an interesting question. What you need to do is condition on the hypothesized latent variable in the Wintg equation being between the associated cut points that cmp estimated. Assuming the Wintg equation is second in the cmp command line, do something like this:
    Code:
    predict X, outcome(#3) eq(result) cond(_b[/cut_2_1] _b[/cut_2_2], eq(Wintg))
    Here is a full example:
    Code:
    webuse supDem
    egen priceO = cut(price), at(25 27 31 33 35 37 39) icodes
    egen quantityO = cut(quantity), at(5 7 9 11 13 15 17 19 21) icodes
    
    cmp (price: priceO = pcompete income) (quantity: quantityO = praw), ind($cmp_oprobit $cmp_oprobit) nolr qui
    
    predict X, outcome(#2) eq(price) cond(_b[/cut_2_1] _b[/cut_2_2], eq(quantity))
    Last edited by David Roodman; 17 Apr 2016, 06:07.



    • #3
      It works! Thank you so much...
      I hope your answer will help someone else in the future.

      Don't forget to modify the label after running the command: the probability takes the label "Pr(y2=x)" and not "Pr(y2=x|...)".
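      For example, assuming the predicted variable is called X, something like the following keeps track of the conditioning (the label text is only a placeholder):
      Code:
      label variable X "Pr(y2=3 | cut_2_1 < y1* < cut_2_2)"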

      For the last cut point (which is assumed to be +infinity), I used a dot, for example (assuming cut_2_3 = +infinity):

      Code:
      predict X, outcome(#1) eq(result) cond(_b[/cut_1_2] ., eq(Wintg))
      I think it's OK, but could you confirm it?



      • #4
        Yes, that's exactly right, Olivier. "." will mean plus or minus infinity here, as appropriate.
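        For instance, for the lowest Wintg category the interval runs from minus infinity up to the first cut point, so the same idea could look like this (a sketch only, assuming Wintg is the first equation so that its first cut point is stored as _b[/cut_1_1]):
        Code:
        predict Xlow, outcome(#1) eq(result) cond(. _b[/cut_1_1], eq(Wintg))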



        • #5
          I have noticed something troubling.
          Assume y1 = j if c_1_1 < y1* < c_1_2.

          I wanted Pr[y2=k|y1=j], which, translated into cmp syntax, is Pr[y2=k | c_1_1 < y1* < c_1_2]. However, cmp did not change the value of y1 in the y2 equation.

          More clearly, if I try to compute Pr[y2=k|y1=j] while y1 = a in the data, using the command above, I condition on "j" but y1, as a regressor in the y2 equation, still takes the value "a".

          If I manually change y1 = a into y1 = j before computing the probability, there is a big difference between the two probabilities (the one given by cmp with no change in y1, and the one I compute after manually changing y1). I think the first one is correct (given the results I produced), but I would rather report this issue, which could be confusing.

          I hope you'll have an answer.

          Thank you in advance.



          • #6
            Oh, if y1 is in the y2 equation (I don't think that was clear from the earlier post), then you're right, there is a problem. cmp will still view the system as seemingly unrelated rather than two-stage. I think manually editing y1 after estimation is the right thing to do; editing y2 shouldn't matter.
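            A minimal sketch of that manual edit, reusing the equation and cut-point names from #2 and treating y1 as the observed regressor appearing in the result equation (the category value 2 stands in for j and is only a placeholder):
            Code:
            clonevar y1_orig = y1     // keep the original values of y1
            replace y1 = 2            // set y1 to the category j being conditioned on
            predict pcond, outcome(#3) eq(result) cond(_b[/cut_2_1] _b[/cut_2_2], eq(Wintg))
            replace y1 = y1_orig      // restore the original data
            drop y1_orig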



            • #7
              I have an issue in calculating the conditional probabilities for my model. It's a case similar to Olivier's, except that I have a (probit oprobit) combination.
              I want to predict the probability of being in the labour force (which is 0 or 1, probit) given the latent health condition (0 1 2 3 4, oprobit).
              I have modified the code you gave but am not getting any results.
              This is what I did: "predict pr(0 1) eq(lfp) cond(_b[/cut_2_1], eq(gh))", where lfp = labour force participation and gh = general latent health.

              I hope to get some assistance.

              Thank you.
              Oliver
              David Roodman



              • #8
                Oliver, "not getting any results" is rather vague. Without know precisely what output you get, it is rather hard to diagnose the problem.

                Not that the cond() suboption requires a pair of numbers, not just one number before the comma.
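                For example, since gh is an oprobit outcome with categories 0 1 2 3 4, each category corresponds to an interval between consecutive cut points, with "." standing in for minus or plus infinity at the ends. A sketch of the full mapping, assuming gh is the second equation so its cut points are _b[/cut_2_1] through _b[/cut_2_4] (the new variable names are placeholders):
                Code:
                predict p_gh0, pr(0 1) eq(lfp) cond(. _b[/cut_2_1], eq(gh))               // gh = 0
                predict p_gh1, pr(0 1) eq(lfp) cond(_b[/cut_2_1] _b[/cut_2_2], eq(gh))    // gh = 1
                predict p_gh2, pr(0 1) eq(lfp) cond(_b[/cut_2_2] _b[/cut_2_3], eq(gh))    // gh = 2
                predict p_gh3, pr(0 1) eq(lfp) cond(_b[/cut_2_3] _b[/cut_2_4], eq(gh))    // gh = 3
                predict p_gh4, pr(0 1) eq(lfp) cond(_b[/cut_2_4] ., eq(gh))               // gh = 4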



                • #9
                  Please, I estimated this model:
                  Code:
                  cmp (lfp = gh# child04 child514 marriedchild04 marriedchild514 age55pls age age_sqrd married degree emphist unemphist bluecollar spouse_lf capcity pmnnltinc || xwaveid:) (gh = lfp# hthcond phys smoker hevydrnk physfunx age age_sqrd married degree emphist unemphist bluecollar spouse_lf capcity pmnnltinc || xwaveid:) if sex==1, ind($cmp_probit $cmp_oprobit)

                  And I want to predict the probabilities of joining the labour force given the health condition.
                  Please find attached the cut-off points and the kind of probabilities I want to generate.

                  Thank you
                  Attached Files



                  • #10
                    The cut-off points correspond to health (oprobit): 0 = poor, 1 = good, ..., 4 = excellent.
                    Last edited by Oliver Mann; 22 May 2019, 12:11.



                    • #11
                      I tried: predict pr(0 1) eq(lfp) cond(_b[/cut_2_1] _b[/cut_2_2], eq(gh))
                      However, it gives me either probabilities of 0 and 1, or of 1 and "."; I am not getting values between 0 and 1.

                      Is there something wrong with my model or my postestimation code?

                      How can I fix this?

                      Thank you
                      Oliver

                      David Roodman



                      • #12
                        What you are doing here is very ambitious: simultaneous equations with dependence on latent variables, bivariate random effects. I am impressed it converges!

                        At any rate, I did something similar on a simulated data set, and it seems to be working for me:
                        Code:
                        set seed 1234567
                        drop _all
                        scalar T = 10
                        mat V1 = 1, .5, 1
                        mat V2 = 1, .5, 1
                        scalar N = 100
                        set obs `=N'
                        gen int id = _n
                        drawnorm e1_1 e1_2, cov(V1) cstor(lower)
                        expand T
                        bysort id: gen int year =_n
                        xtset id year
                        drawnorm e2_1 e2_2, cov(V2) cstor(lower)
                        gen x1 = 4*runiform()-2
                        gen x2 = 4*runiform()-2
                        gen byte y1 = 1*x1 + e1_1 + e2_1 > 0
                        gen double ystar2 = 1*x2 + e1_2 + e2_2
                        egen byte y2 = cut(ystar2), group(4) icode
                        
                        cmp setup
                        cmp (y1 = y2# x1 || id: ) (y2 = y1# x2 || id:), ind($cmp_probit $cmp_oprobit) nolr
                        
                        predict p, pr(0 1) eq(y1) cond(_b[/cut_2_1] _b[/cut_2_2], eq(y2))
                        
                        sum p, detail
                        Here's what I get:
                        Code:
                        . set seed 1234567
                        
                        . drop _all
                        
                        . scalar T = 10
                        
                        . mat V1 = 1, .5, 1
                        
                        . mat V2 = 1, .5, 1
                        
                        . scalar N = 100
                        
                        . set obs `=N'
                        number of observations (_N) was 0, now 100
                        
                        . gen int id = _n
                        
                        . drawnorm e1_1 e1_2, cov(V1) cstor(lower)
                        
                        . expand T
                        (900 observations created)
                        
                        . bysort id: gen int year =_n
                        
                        . xtset id year
                               panel variable:  id (strongly balanced)
                                time variable:  year, 1 to 10
                                        delta:  1 unit
                        
                        . drawnorm e2_1 e2_2, cov(V2) cstor(lower)
                        
                        . gen x1 = 4*runiform()-2
                        
                        . gen x2 = 4*runiform()-2
                        
                        . gen byte y1 = 1*x1 + e1_1 + e2_1 > 0
                        
                        . gen double ystar2 = 1*x2 + e1_2 + e2_2
                        
                        . egen byte y2 = cut(ystar2), group(4) icode
                        
                        . 
                        . cmp setup
                        $cmp_out      = 0
                        $cmp_missing  = .
                        $cmp_cont     = 1
                        $cmp_left     = 2
                        $cmp_right    = 3
                        $cmp_probit   = 4
                        $cmp_oprobit  = 5
                        $cmp_mprobit  = 6
                        $cmp_int      = 7
                        $cmp_trunc    = 8  (deprecated)
                        $cmp_roprobit = 9
                        $cmp_frac     = 10
                        
                        . cmp (y1 = y2# x1 || id: ) (y2 = y1# x2 || id:), ind($cmp_probit $cmp_oprobit) nolr
                        
                        For quadrature, defaulting to technique(bhhh) for speed.
                        
                        Fitting individual models as starting point for full model fit.
                        Note: For programming reasons, these initial estimates may deviate from your specification.
                              For exact fits of each equation alone, run cmp separately on each.
                        
                        Iteration 0:   log likelihood = -692.49904  
                        Iteration 1:   log likelihood = -525.40634  
                        Iteration 2:   log likelihood = -524.63774  
                        Iteration 3:   log likelihood =  -524.6373  
                        Iteration 4:   log likelihood =  -524.6373  
                        
                        Probit regression                               Number of obs     =      1,000
                                                                        LR chi2(1)        =     335.72
                                                                        Prob > chi2       =     0.0000
                        Log likelihood =  -524.6373                     Pseudo R2         =     0.2424
                        
                        ------------------------------------------------------------------------------
                                  y1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                                  x1 |   .7304777   .0439394    16.62   0.000      .644358    .8165974
                               _cons |  -.0975815   .0447102    -2.18   0.029     -.185212   -.0099511
                        ------------------------------------------------------------------------------
                        
                        Iteration 0:   log likelihood = -1386.2944  
                        Iteration 1:   log likelihood = -1163.8982  
                        Iteration 2:   log likelihood = -1162.9176  
                        Iteration 3:   log likelihood = -1162.9175  
                        Iteration 4:   log likelihood = -1162.9175  
                        
                        Ordered probit regression                       Number of obs     =      1,000
                                                                        LR chi2(1)        =     446.75
                                                                        Prob > chi2       =     0.0000
                        Log likelihood = -1162.9175                     Pseudo R2         =     0.1611
                        
                        ------------------------------------------------------------------------------
                             _cmp_y2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                                  x2 |   .6938482   .0339824    20.42   0.000     .6272439    .7604525
                        -------------+----------------------------------------------------------------
                               /cut1 |  -.8644666   .0494438                     -.9613746   -.7675586
                               /cut2 |   .0257159   .0436871                     -.0599094    .1113411
                               /cut3 |   .9120487   .0499275                      .8141927    1.009905
                        ------------------------------------------------------------------------------
                        
                        Fitting full model.
                        Random effects/coefficients modeled with Gauss-Hermite quadrature with 12 integration points.
                        
                        Iteration 0:   log likelihood = -1497.4008  
                        Iteration 1:   log likelihood = -1400.4725  
                        Iteration 2:   log likelihood = -1387.3448  
                        Iteration 3:   log likelihood = -1385.2232  
                        
                        Performing Naylor-Smith adaptive quadrature.
                        Iteration 4:   log likelihood = -1385.7041  (backed up)
                        Iteration 5:   log likelihood = -1385.2916  
                        Iteration 6:   log likelihood = -1385.1946  
                        Iteration 7:   log likelihood = -1385.1693  
                        Iteration 8:   log likelihood = -1385.1604  
                        Iteration 9:   log likelihood = -1385.1567  
                        Iteration 10:  log likelihood = -1385.1549  
                        Iteration 11:  log likelihood = -1385.1539  
                        
                        Adaptive quadrature points fixed.
                        Iteration 12:  log likelihood = -1385.1533  
                        Iteration 13:  log likelihood =  -1385.153  
                        Iteration 14:  log likelihood = -1385.1527  
                        Iteration 15:  log likelihood = -1385.1525  
                        Iteration 16:  log likelihood = -1385.1524  
                        Iteration 17:  log likelihood = -1385.1524  
                        Iteration 18:  log likelihood = -1385.1523  
                        Iteration 19:  log likelihood = -1385.1523  
                        Iteration 20:  log likelihood = -1385.1523  
                        Iteration 21:  log likelihood = -1385.1522  
                        Iteration 22:  log likelihood = -1385.1522  
                        
                        Mixed-process multilevel regression             Number of obs     =      1,000
                                                                        Wald chi2(2)      =     619.57
                        Log likelihood = -1385.1522                     Prob > chi2       =     0.0000
                        
                        -------------------------------------------------------------------------------
                                      |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        --------------+----------------------------------------------------------------
                        y1            |
                                  y2# |   .0185934   .0443063     0.42   0.675    -.0682454    .1054323
                                   x1 |   1.081506   .0725697    14.90   0.000     .9392716    1.223739
                                _cons |  -.1392884   .1166638    -1.19   0.233    -.3679452    .0893684
                        --------------+----------------------------------------------------------------
                        y2            |
                                  y1# |   .0659277   .0343982     1.92   0.055    -.0014915    .1333469
                                   x2 |   1.074331   .0501005    21.44   0.000     .9761358    1.172526
                        --------------+----------------------------------------------------------------
                             /cut_2_1 |  -1.317102   .1294313   -10.18   0.000    -1.570782   -1.063421
                             /cut_2_2 |   .0345621   .1199894     0.29   0.773    -.2006129     .269737
                             /cut_2_3 |   1.376171   .1290529    10.66   0.000     1.123232     1.62911
                           /lnsig_1_1 |   .0232638   .1092033     0.21   0.831    -.1907707    .2372984
                           /lnsig_1_2 |   .0723423   .0883167     0.82   0.413    -.1007551    .2454398
                        /atanhrho_~12 |   .5479727   .1340553     4.09   0.000     .2852292    .8107163
                         /atanhrho_12 |    .503686   .0936402     5.38   0.000     .3201545    .6872175
                        -------------------------------------------------------------------------------
                        ------------------------------------------------------------------------------------
                        Random effects parameters           |  Estimate    Std. Err.    [95% Conf. Interval]
                        ------------------------------------+-----------------------------------------------
                        Level: id                           |
                          y1                                |
                            Standard deviations             | 
                              _cons                         |  1.023537    .1117736      .826322    1.267819
                          y2                                |
                            Standard deviations             | 
                              _cons                         |  1.075023    .0949425     .9041544    1.278183
                         Cross-eq correlation               | 
                          y1              y2                |
                            _cons           _cons           |  .4989993    .1006755     .2777379    .6699852
                        ------------------------------------+-----------------------------------------------
                        Level: Observations                 |
                         Standard deviations                | 
                          y1                                |         1  (constrained)
                          y2                                |         1  (constrained)
                         Cross-eq correlation               | 
                          y1              y2                |   .465011    .0733919     .3096466    .5961915
                        ------------------------------------------------------------------------------------
                        
                        . 
                        . predict p, pr(0 1) eq(y1) cond(_b[/cut_2_1] _b[/cut_2_2], eq(y2))
                        
                        . 
                        . sum p, detail
                        
                                                      p
                        -------------------------------------------------------------
                              Percentiles      Smallest
                         1%     .3749889       .3721346
                         5%     .3787938       .3727448
                        10%     .3814831       .3729718       Obs               1,000
                        25%     .3872677       .3735883       Sum of Wgt.       1,000
                        
                        50%     .3956834                      Mean           .3952094
                                                Largest       Std. Dev.       .009846
                        75%     .4037716        .412547
                        90%      .407833       .4126066       Variance       .0000969
                        95%     .4098255       .4128706       Skewness      -.1777721
                        99%     .4121519       .4129514       Kurtosis       2.000912
                        If you send me your precise data and code, I can look at the problem more closely.
                        --David



                        • #13
                          Hi David, I tried your example and it worked. But I still don't get results with my data, so the issue is probably with the data. How can I send my data and code to you? David Roodman

                          Thank you David,

                          Oliver
                          Last edited by Oliver Mann; 23 May 2019, 10:28.



                          • #14
                            https://davidroodman.com/contact/



                            • #15
                              Hi David Roodman,

                              I have sent the code and data, as requested.

                              Kind regards,
                              Oliver
