Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • cmp yields different results in Mac and Windows

    Hi everyone,

    I am using cmp to run an ordered probit model with a right-censored dependent variable and complex data. I get different results in a Stata 11.2 (Mac) and Stata 12.1 (Windows), particularly in the arguments of the F test. Here a simplified version of the model with only one explanatory variable. Variable educ2 takes 3 values, it is censored for those still studying (stud==1), status and age1 are dummy variables.

    Using Stata 11.2 in a Mac:



    Code:
    . cmp (educ2 = status), ind("cond(stud==1, $cmp_right, $cmp_oprobit)") svy subpop(age1) qui
    
    Fitting individual models as starting point for full model fit.
    Note: For programming reasons, these initial estimates may deviate from your specification.
          For exact fits of each equation alone, run cmp separately on each.
    
    Fitting full model.
    
    Mixed-process regression
    
    Number of strata   =         8                  Number of obs      =     46705
    Number of PSUs     =      4396                  Population size    =  14309427
                                                    Subpop. no. of obs =     10789
                                                    Subpop. size       = 3602626.7
                                                    Design df          =      4388
                                                    F(   0,   4388)    =         .
                                                    Prob > F           =         .
    
    ------------------------------------------------------------------------------
                 |             Linearized
           educ2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    educ2        |
          status |   1.09e+07    5593122     1.95   0.051    -60085.23    2.19e+07
    -------------+----------------------------------------------------------------
        /cut_1_1 |  -1.28e+08    2469636   -51.94   0.000    -1.33e+08   -1.23e+08
        /cut_1_2 |   -5705095    2910510    -1.96   0.050    -1.14e+07    972.5065
        /lnsig_1 |   18.70615   .0058611  3191.60   0.000     18.69466    18.71764
    -------------+----------------------------------------------------------------
           sig_1 |   1.33e+08     779749                      1.32e+08    1.35e+08
    ------------------------------------------------------------------------------


    Using Stata 12.2 in a Windows:

    Code:
    . cmp (educ2 = status), ind("cond(stud==1, $cmp_right, $cmp_oprobit)") svy subpop(age1) qui
    
    Fitting individual models as starting point for full model fit.
    Note: For programming reasons, these initial estimates may deviate from your specification.
          For exact fits of each equation alone, run cmp separately on each.
    
    Fitting full model.
    
    Mixed-process regression
    
    Number of strata   =         8                  Number of obs      =     46705
    Number of PSUs     =      4396                  Population size    =  14309427
                                                    Subpop. no. of obs =     10789
                                                    Subpop. size       = 3602626.7
                                                    Design df          =     4388
                                                    F(   1,   4388)    =     3.80
                                                    Prob > F           =      0.0513
    
    ------------------------------------------------------------------------------
                 |             Linearized
           educ2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    educ2        |
          status |   1.11e+07    5704049    1.95   0.051    -61287.73    2.23e+07
    -------------+----------------------------------------------------------------
        /cut_1_1 |  -1.31e+08    2404031   -54.41   0.000    -1.36e+08   -1.26e+08
        /cut_1_2 |   -5818216   2957274    -1.97    0.049    -1.16e+07    -20465.59
        /lnsig_1 |   18.72579   .0064207  2916.49   0.000     18.7132    18.73838
    -------------+----------------------------------------------------------------
           sig_1 |   1.36e+08     871138.6                      1.34e+08    1.37e+08
    ------------------------------------------------------------------------------


    In both computers, this is the version of the command :

    Code:
    . which cmp
    /Users/celiavera/Library/Application Support/Stata/ado/plus/c/cmp.ado
    *! cmp 6.9.8 27 February 2016
    *! Copyright (C) 2007-16 David Roodman
    *! Version history at bottom


    If I remove the subpop() option or if I use $cmp_cont instead of $cmp_oprobit, the discrepancy disappears.
    I appreciate very much your help.

    Celia P.


  • #2
    What happens if you rescale status (divide by 1000)? Looks like you may be running into some precision problems, although it seems strange that the F-test should differ in df.
    Stata/MP 14.1 (64-bit x86-64)
    Revision 19 May 2016
    Win 8.1

    Comment


    • #3
      You have confounding in your example. It is possible that the underlying algorithms to fit the model are different across the two versions of Stata.

      Comment


      • #4
        Nah, it is exactly the same code on both machines, assuming there has been no problem with updates that would cause an old mlib file to be lying around. I think Carole is on the right track. I have seen a difference like this before with the poisson command. In that case, the solution was dividing the weights by a million, and the underlying problem was differences in Stata's precision. It appears that your data are weighted here too, but I guess in way where scaling down the weights would change the results? In that case, definitely look at rescaling variables.

        Comment


        • #5
          Thanks everyone for your inputs. I rescaled status (divided by 1000 as Carole suggested). It took more than 8 hours in both machines. Convergence was not achieved in either machine.

          Code:
            . gen status1=status/1000
            
            . cmp (educ2 = status1), ind("cond(stud==1, $cmp_right, $cmp_oprobit)") svy subpop(edad1) qui 
            
            Fitting individual models as starting point for full model fit.
            Note: For programming reasons, these initial estimates may deviate from your specification.
            For exact fits of each equation alone, run cmp separately on each.
            
            Fitting full model.
            convergence not achieved
            
            Mixed-process regression
            
            Number of strata   =         8                  Number of obs      =     46705
            Number of PSUs     =      4396                Population size    =  14309427
                                                                       Subpop. no. of obs =     10789
                                                                       Subpop. size       = 3602626.7
                                                                       Design df          =      4388
                                                                       F(   0,   4388)    =         .
                                                                       Prob > F           =         .
            
            
            Linearized
            educ2        Coef.    Std. Err.       t     P>t      [95% Conf. Interval]
            
            educ2        
            status1    5.79e+09          .        .       .            .           .
            
            /cut_1_1   -6.81e+07   3.55e+07    -1.92   0.055    -1.38e+08     1460502
            /cut_1_2    -3028494    2545896    -1.19   0.234     -8019736     1962748
            /lnsig_1    18.07286   .5122942    35.28   0.000      17.0685    19.07721
            
            sig_1    7.06e+07   3.62e+07                      2.59e+07    1.93e+08
            
            convergence not achieved
            convergence not achieved
            r(430);


          I have also tried with the additional explanatory variables (without rescaling them) and convergence is not achieved. I know I could add the tolerance options to facilitate convergence but given the amount of time it takes I wonder if there is a mistake in my code. My dependent variable is level of education (3 ordered leves). This variable is right-censored if the individual is still studying (stud=1).
          Any other suggestion on how to estimate an ordered probit with censored variable ?
          I appreciate your help.

          Celia P.

          Comment


          • #6
            Well, there is no guarantee of convergence in a lot of these models. It is possible for the command to be correctly fitting the model and for it not to converge. This would indicate that the model is not a good fit to the data. The fact that the computers are now agreeing reinforces this view. Possibly adding more covariates would help. And if there's anyway to reduce the sample while still generating the behavior, then you can do some trial and error more quickly. If you take out the "qui" option then you can observe the iteration log and probably start to get a sense of whether its converging before it actually does.
            Last edited by David Roodman; 21 Mar 2016, 07:30.

            Comment


            • #7
              Dear Celia,

              I also think this is a numerical problem caused by the scale of status, but I would suggest multiplying it by 1000 or even 1000000. My reasoning is that in the original model the coefficient on status is of order 1e+07, which is huge. If you multiply status by a large number its coefficient should be more reasonable.

              Anyway, showing us some descriptive statistics for status using the same weights used in the estimation would allow us to say more.

              All the best,

              Joao

              Comment


              • #8
                Yeah, it can't hurt to try dividing by a million. And you can also experiment with nrtol and nonrtol options, as you say.

                Comment

                Working...
                X