
  • cmp: Bivariate ordered probit and conditional probabilities

    Dear all,

    I estimate a bivariate ordered probit model using cmp. After the regression, I would like to compute the conditional probabilities Pr(y2=x|y1=z), where y1 is the first equation, y2 the second, and x, z are two values; likewise for conditional expectations.

    I know that cmp allows one to compute such things (see http://www.statalist.org/forums/foru...d-expectations or the help section in Stata).

    I used the command "predict pr(3 3) eq(result) cond(2 2, eq(Wintg))" (with result = y2 and Wintg = y1), but it failed (error message: "something required"). I think I've missed a point (maybe more!) in the syntax.

    Does anybody have any advice?
    Thanks,
    Olivier

  • #2
    Oh, this is an interesting question. What you need to do is condition on the hypothesized latent variable in the Wintg equation being between the associated cut points that cmp estimated. Assuming the Wintg equation is second in the cmp command line, do something like this:
    Code:
    predict X, outcome(#3) eq(result) cond(_b[/cut_2_1] _b[/cut_2_2], eq(Wintg))
    Here is a full example:
    Code:
    webuse supDem
    egen priceO = cut(price), at(25 27 31 33 35 37 39) icodes
    egen quantityO = cut(quantity), at(5 7 9 11 13 15 17 19 21) icodes
    
    cmp (price: priceO = pcompete income) (quantity: quantityO = praw), ind($cmp_oprobit $cmp_oprobit) nolr qui
    
    predict X, outcome(#2) eq(price) cond(_b[/cut_2_1] _b[/cut_2_2], eq(quantity))
    Last edited by David Roodman; 17 Apr 2016, 06:07.



    • #3
      It works! Thank you so much...
      I hope your answer will help someone else in the future.

      Don't forget to modify the label after running the command: the probability takes the label "Pr(y2=x)" and not "Pr(y2=x|...)".
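      For example, assuming the predicted variable is called X, something like the following keeps track of the conditioning (the label text is only a placeholder):
      Code:
      label variable X "Pr(y2=3 | cut_2_1 < y1* < cut_2_2)"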

      For the last cut point (which is assumed to be +infinity), I used a dot, for example (assuming cut_2_3 = +infinity):

      Code:
      predict X, outcome(#1) eq(result) cond(_b[/cut_1_2] ., eq(Wintg))
      I think it's OK, but could you confirm it?



      • #4
        Yes, that's exactly right, Olivier. "." will mean plus or minus infinity here, as appropriate.
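        For instance, for the lowest Wintg category the interval runs from minus infinity up to the first cut point, so the same idea could look like this (a sketch only, assuming Wintg is the first equation so that its first cut point is stored as _b[/cut_1_1]):
        Code:
        predict Xlow, outcome(#1) eq(result) cond(. _b[/cut_1_1], eq(Wintg))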



        • #5
          I have noticed something troubling.
          Assume y1 = j if c_1_1 < y1* < c_1_2.

          I wanted Pr[y2=k|y1=j], which, translated into cmp syntax, is Pr[y2=k | c_1_1 < y1* < c_1_2]. However, cmp did not change the value of y1 in the y2 equation.

          More clearly, if I try to compute Pr[y2=k|y1=j] while y1 = a in the data, using the command above, I condition on "j" but y1, as a regressor in the y2 equation, still takes the value "a".

          If I manually change y1 = a into y1 = j before computing the probability, there is a big difference between the two probabilities (the one given by cmp with no change in y1, and the one I compute after manually changing y1). I think the first one is correct (given the results I produced), but I would rather report this issue, which could be confusing.

          I hope you'll have an answer.

          Thank you in advance.



          • #6
            Oh, if y1 is in the y2 equation (I don't think that was clear from the earlier post), then you're right, there is a problem. cmp will still view the system as seemingly unrelated rather than two-stage. I think manually editing y1 after estimation is the right thing to do; editing y2 shouldn't matter.
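            A minimal sketch of that manual edit, reusing the equation and cut-point names from #2 and treating y1 as the observed regressor appearing in the result equation (the category value 2 stands in for j and is only a placeholder):
            Code:
            clonevar y1_orig = y1     // keep the original values of y1
            replace y1 = 2            // set y1 to the category j being conditioned on
            predict pcond, outcome(#3) eq(result) cond(_b[/cut_2_1] _b[/cut_2_2], eq(Wintg))
            replace y1 = y1_orig      // restore the original data
            drop y1_orig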



            • #7
              I have an issue in calculating the conditional probabilities for my model. It's a case similar to Olivier's, except that I have a (probit oprobit) combination.
              I want to predict the probability of being in the labour force (which is 0 or 1, probit) given the latent health condition (0 1 2 3 4, oprobit).
              I have modified the code you gave but am not getting any results.
              This is what I did: "predict pr(0 1) eq(lfp) cond(_b[/cut_2_1], eq(gh))", where lfp = labour force participation and gh = general latent health.

              I hope to get some assistance.

              Thank you.
              Oliver
              David Roodman



              • #8
                Oliver, "not getting any results" is rather vague. Without know precisely what output you get, it is rather hard to diagnose the problem.

                Not that the cond() suboption requires a pair of numbers, not just one number before the comma.
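                For example, since gh is an oprobit outcome with categories 0 1 2 3 4, each category corresponds to an interval between consecutive cut points, with "." standing in for minus or plus infinity at the ends. A sketch of the full mapping, assuming gh is the second equation so its cut points are _b[/cut_2_1] through _b[/cut_2_4] (the new variable names are placeholders):
                Code:
                predict p_gh0, pr(0 1) eq(lfp) cond(. _b[/cut_2_1], eq(gh))               // gh = 0
                predict p_gh1, pr(0 1) eq(lfp) cond(_b[/cut_2_1] _b[/cut_2_2], eq(gh))    // gh = 1
                predict p_gh2, pr(0 1) eq(lfp) cond(_b[/cut_2_2] _b[/cut_2_3], eq(gh))    // gh = 2
                predict p_gh3, pr(0 1) eq(lfp) cond(_b[/cut_2_3] _b[/cut_2_4], eq(gh))    // gh = 3
                predict p_gh4, pr(0 1) eq(lfp) cond(_b[/cut_2_4] ., eq(gh))               // gh = 4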



                • #9
                  Please, I estimated this model:
                  Code:
                  cmp (lfp = gh# child04 child514 marriedchild04 marriedchild514 age55pls age age_sqrd married degree emphist unemphist bluecollar spouse_lf capcity pmnnltinc || xwaveid:) (gh = lfp# hthcond phys smoker hevydrnk physfunx age age_sqrd married degree emphist unemphist bluecollar spouse_lf capcity pmnnltinc || xwaveid:) if sex==1, ind($cmp_probit $cmp_oprobit)

                  And I want to predict the probabilities of joining the labour force given the health condition.
                  Please find attached the cut-off points and the kind of probabilities I want to generate.

                  Thank you
                  Attached Files



                  • #10
                    The cut-off points correspond to health (oprobit): 0 = poor, 1 = good, ..., 4 = excellent.
                    Last edited by Oliver Mann; 22 May 2019, 12:11.



                    • #11
                      I tried: predict pr(0 1) eq(lfp) cond(_b[/cut_2_1] _b[/cut_2_2], eq(gh))
                      However, it gives me either probabilities of 0 and 1, or of 1 and "."; I am not getting values between 0 and 1.

                      Is there something wrong with my model or my postestimation code?

                      How can I fix this?

                      Thank you
                      Oliver

                      David Roodman



                      • #12
                        What you are doing here is very ambitious: simultaneous equations with dependence on latent variables, bivariate random effects. I am impressed it converges!

                        At any rate, I did something similar on a simulated data set, and it seems to be working for me:
                        Code:
                        set seed 1234567
                        drop _all
                        scalar T = 10
                        mat V1 = 1, .5, 1
                        mat V2 = 1, .5, 1
                        scalar N = 100
                        set obs `=N'
                        gen int id = _n
                        drawnorm e1_1 e1_2, cov(V1) cstor(lower)
                        expand T
                        bysort id: gen int year =_n
                        xtset id year
                        drawnorm e2_1 e2_2, cov(V2) cstor(lower)
                        gen x1 = 4*runiform()-2
                        gen x2 = 4*runiform()-2
                        gen byte y1 = 1*x1 + e1_1 + e2_1 > 0
                        gen double ystar2 = 1*x2 + e1_2 + e2_2
                        egen byte y2 = cut(ystar2), group(4) icode
                        
                        cmp setup
                        cmp (y1 = y2# x1 || id: ) (y2 = y1# x2 || id:), ind($cmp_probit $cmp_oprobit) nolr
                        
                        predict p, pr(0 1) eq(y1) cond(_b[/cut_2_1] _b[/cut_2_2], eq(y2))
                        
                        sum p, detail
                        Here's what I get:
                        Code:
                        . set seed 1234567
                        
                        . drop _all
                        
                        . scalar T = 10
                        
                        . mat V1 = 1, .5, 1
                        
                        . mat V2 = 1, .5, 1
                        
                        . scalar N = 100
                        
                        . set obs `=N'
                        number of observations (_N) was 0, now 100
                        
                        . gen int id = _n
                        
                        . drawnorm e1_1 e1_2, cov(V1) cstor(lower)
                        
                        . expand T
                        (900 observations created)
                        
                        . bysort id: gen int year =_n
                        
                        . xtset id year
                               panel variable:  id (strongly balanced)
                                time variable:  year, 1 to 10
                                        delta:  1 unit
                        
                        . drawnorm e2_1 e2_2, cov(V2) cstor(lower)
                        
                        . gen x1 = 4*runiform()-2
                        
                        . gen x2 = 4*runiform()-2
                        
                        . gen byte y1 = 1*x1 + e1_1 + e2_1 > 0
                        
                        . gen double ystar2 = 1*x2 + e1_2 + e2_2
                        
                        . egen byte y2 = cut(ystar2), group(4) icode
                        
                        . 
                        . cmp setup
                        $cmp_out      = 0
                        $cmp_missing  = .
                        $cmp_cont     = 1
                        $cmp_left     = 2
                        $cmp_right    = 3
                        $cmp_probit   = 4
                        $cmp_oprobit  = 5
                        $cmp_mprobit  = 6
                        $cmp_int      = 7
                        $cmp_trunc    = 8  (deprecated)
                        $cmp_roprobit = 9
                        $cmp_frac     = 10
                        
                        . cmp (y1 = y2# x1 || id: ) (y2 = y1# x2 || id:), ind($cmp_probit $cmp_oprobit) nolr
                        
                        For quadrature, defaulting to technique(bhhh) for speed.
                        
                        Fitting individual models as starting point for full model fit.
                        Note: For programming reasons, these initial estimates may deviate from your specification.
                              For exact fits of each equation alone, run cmp separately on each.
                        
                        Iteration 0:   log likelihood = -692.49904  
                        Iteration 1:   log likelihood = -525.40634  
                        Iteration 2:   log likelihood = -524.63774  
                        Iteration 3:   log likelihood =  -524.6373  
                        Iteration 4:   log likelihood =  -524.6373  
                        
                        Probit regression                               Number of obs     =      1,000
                                                                        LR chi2(1)        =     335.72
                                                                        Prob > chi2       =     0.0000
                        Log likelihood =  -524.6373                     Pseudo R2         =     0.2424
                        
                        ------------------------------------------------------------------------------
                                  y1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                                  x1 |   .7304777   .0439394    16.62   0.000      .644358    .8165974
                               _cons |  -.0975815   .0447102    -2.18   0.029     -.185212   -.0099511
                        ------------------------------------------------------------------------------
                        
                        Iteration 0:   log likelihood = -1386.2944  
                        Iteration 1:   log likelihood = -1163.8982  
                        Iteration 2:   log likelihood = -1162.9176  
                        Iteration 3:   log likelihood = -1162.9175  
                        Iteration 4:   log likelihood = -1162.9175  
                        
                        Ordered probit regression                       Number of obs     =      1,000
                                                                        LR chi2(1)        =     446.75
                                                                        Prob > chi2       =     0.0000
                        Log likelihood = -1162.9175                     Pseudo R2         =     0.1611
                        
                        ------------------------------------------------------------------------------
                             _cmp_y2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                                  x2 |   .6938482   .0339824    20.42   0.000     .6272439    .7604525
                        -------------+----------------------------------------------------------------
                               /cut1 |  -.8644666   .0494438                     -.9613746   -.7675586
                               /cut2 |   .0257159   .0436871                     -.0599094    .1113411
                               /cut3 |   .9120487   .0499275                      .8141927    1.009905
                        ------------------------------------------------------------------------------
                        
                        Fitting full model.
                        Random effects/coefficients modeled with Gauss-Hermite quadrature with 12 integration points.
                        
                        Iteration 0:   log likelihood = -1497.4008  
                        Iteration 1:   log likelihood = -1400.4725  
                        Iteration 2:   log likelihood = -1387.3448  
                        Iteration 3:   log likelihood = -1385.2232  
                        
                        Performing Naylor-Smith adaptive quadrature.
                        Iteration 4:   log likelihood = -1385.7041  (backed up)
                        Iteration 5:   log likelihood = -1385.2916  
                        Iteration 6:   log likelihood = -1385.1946  
                        Iteration 7:   log likelihood = -1385.1693  
                        Iteration 8:   log likelihood = -1385.1604  
                        Iteration 9:   log likelihood = -1385.1567  
                        Iteration 10:  log likelihood = -1385.1549  
                        Iteration 11:  log likelihood = -1385.1539  
                        
                        Adaptive quadrature points fixed.
                        Iteration 12:  log likelihood = -1385.1533  
                        Iteration 13:  log likelihood =  -1385.153  
                        Iteration 14:  log likelihood = -1385.1527  
                        Iteration 15:  log likelihood = -1385.1525  
                        Iteration 16:  log likelihood = -1385.1524  
                        Iteration 17:  log likelihood = -1385.1524  
                        Iteration 18:  log likelihood = -1385.1523  
                        Iteration 19:  log likelihood = -1385.1523  
                        Iteration 20:  log likelihood = -1385.1523  
                        Iteration 21:  log likelihood = -1385.1522  
                        Iteration 22:  log likelihood = -1385.1522  
                        
                        Mixed-process multilevel regression             Number of obs     =      1,000
                                                                        Wald chi2(2)      =     619.57
                        Log likelihood = -1385.1522                     Prob > chi2       =     0.0000
                        
                        -------------------------------------------------------------------------------
                                      |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        --------------+----------------------------------------------------------------
                        y1            |
                                  y2# |   .0185934   .0443063     0.42   0.675    -.0682454    .1054323
                                   x1 |   1.081506   .0725697    14.90   0.000     .9392716    1.223739
                                _cons |  -.1392884   .1166638    -1.19   0.233    -.3679452    .0893684
                        --------------+----------------------------------------------------------------
                        y2            |
                                  y1# |   .0659277   .0343982     1.92   0.055    -.0014915    .1333469
                                   x2 |   1.074331   .0501005    21.44   0.000     .9761358    1.172526
                        --------------+----------------------------------------------------------------
                             /cut_2_1 |  -1.317102   .1294313   -10.18   0.000    -1.570782   -1.063421
                             /cut_2_2 |   .0345621   .1199894     0.29   0.773    -.2006129     .269737
                             /cut_2_3 |   1.376171   .1290529    10.66   0.000     1.123232     1.62911
                           /lnsig_1_1 |   .0232638   .1092033     0.21   0.831    -.1907707    .2372984
                           /lnsig_1_2 |   .0723423   .0883167     0.82   0.413    -.1007551    .2454398
                        /atanhrho_~12 |   .5479727   .1340553     4.09   0.000     .2852292    .8107163
                         /atanhrho_12 |    .503686   .0936402     5.38   0.000     .3201545    .6872175
                        -------------------------------------------------------------------------------
                        ------------------------------------------------------------------------------------
                        Random effects parameters           |  Estimate    Std. Err.    [95% Conf. Interval]
                        ------------------------------------+-----------------------------------------------
                        Level: id                           |
                          y1                                |
                            Standard deviations             | 
                              _cons                         |  1.023537    .1117736      .826322    1.267819
                          y2                                |
                            Standard deviations             | 
                              _cons                         |  1.075023    .0949425     .9041544    1.278183
                         Cross-eq correlation               | 
                          y1              y2                |
                            _cons           _cons           |  .4989993    .1006755     .2777379    .6699852
                        ------------------------------------+-----------------------------------------------
                        Level: Observations                 |
                         Standard deviations                | 
                          y1                                |         1  (constrained)
                          y2                                |         1  (constrained)
                         Cross-eq correlation               | 
                          y1              y2                |   .465011    .0733919     .3096466    .5961915
                        ------------------------------------------------------------------------------------
                        
                        . 
                        . predict p, pr(0 1) eq(y1) cond(_b[/cut_2_1] _b[/cut_2_2], eq(y2))
                        
                        . 
                        . sum p, detail
                        
                                                      p
                        -------------------------------------------------------------
                              Percentiles      Smallest
                         1%     .3749889       .3721346
                         5%     .3787938       .3727448
                        10%     .3814831       .3729718       Obs               1,000
                        25%     .3872677       .3735883       Sum of Wgt.       1,000
                        
                        50%     .3956834                      Mean           .3952094
                                                Largest       Std. Dev.       .009846
                        75%     .4037716        .412547
                        90%      .407833       .4126066       Variance       .0000969
                        95%     .4098255       .4128706       Skewness      -.1777721
                        99%     .4121519       .4129514       Kurtosis       2.000912
                        If you send me your precise data and code, I can look at the problem more closely.
                        --David



                        • #13
                          Hi David, I tried your example and it worked. But I still don't get results with my data, so the issue is probably with the data. How can I send my data and code to you? David Roodman

                          Thank you David,

                          Oliver
                          Last edited by Oliver Mann; 23 May 2019, 10:28.



                          • #14
                            https://davidroodman.com/contact/



                            • #15
                              Hi David Roodman,

                              I have sent the code and data, as requested.

                              Kind regards,
                              Oliver
