cmp yields different results in Mac and Windows

Celia Vera

Join Date: Feb 2016
Posts: 19

cmp yields different results in Mac and Windows

19 Mar 2016, 14:27

Hi everyone,

I am using cmp to run an ordered probit model with a right-censored dependent variable and complex data. I get different results in a Stata 11.2 (Mac) and Stata 12.1 (Windows), particularly in the arguments of the F test. Here a simplified version of the model with only one explanatory variable. Variable educ2 takes 3 values, it is censored for those still studying (stud==1), status and age1 are dummy variables.

Using Stata 11.2 in a Mac:

Code:

. cmp (educ2 = status), ind("cond(stud==1, $cmp_right, $cmp_oprobit)") svy subpop(age1) qui

Fitting individual models as starting point for full model fit.
Note: For programming reasons, these initial estimates may deviate from your specification.
      For exact fits of each equation alone, run cmp separately on each.

Fitting full model.

Mixed-process regression

Number of strata   =         8                  Number of obs      =     46705
Number of PSUs     =      4396                  Population size    =  14309427
                                                Subpop. no. of obs =     10789
                                                Subpop. size       = 3602626.7
                                                Design df          =      4388
                                                F(   0,   4388)    =         .
                                                Prob > F           =         .

------------------------------------------------------------------------------
             |             Linearized
       educ2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ2        |
      status |   1.09e+07    5593122     1.95   0.051    -60085.23    2.19e+07
-------------+----------------------------------------------------------------
    /cut_1_1 |  -1.28e+08    2469636   -51.94   0.000    -1.33e+08   -1.23e+08
    /cut_1_2 |   -5705095    2910510    -1.96   0.050    -1.14e+07    972.5065
    /lnsig_1 |   18.70615   .0058611  3191.60   0.000     18.69466    18.71764
-------------+----------------------------------------------------------------
       sig_1 |   1.33e+08     779749                      1.32e+08    1.35e+08
------------------------------------------------------------------------------

Using Stata 12.2 in a Windows:

Code:

. cmp (educ2 = status), ind("cond(stud==1, $cmp_right, $cmp_oprobit)") svy subpop(age1) qui

Fitting individual models as starting point for full model fit.
Note: For programming reasons, these initial estimates may deviate from your specification.
      For exact fits of each equation alone, run cmp separately on each.

Fitting full model.

Mixed-process regression

Number of strata   =         8                  Number of obs      =     46705
Number of PSUs     =      4396                  Population size    =  14309427
                                                Subpop. no. of obs =     10789
                                                Subpop. size       = 3602626.7
                                                Design df          =     4388
                                                F(   1,   4388)    =     3.80
                                                Prob > F           =      0.0513

------------------------------------------------------------------------------
             |             Linearized
       educ2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ2        |
      status |   1.11e+07    5704049    1.95   0.051    -61287.73    2.23e+07
-------------+----------------------------------------------------------------
    /cut_1_1 |  -1.31e+08    2404031   -54.41   0.000    -1.36e+08   -1.26e+08
    /cut_1_2 |   -5818216   2957274    -1.97    0.049    -1.16e+07    -20465.59
    /lnsig_1 |   18.72579   .0064207  2916.49   0.000     18.7132    18.73838
-------------+----------------------------------------------------------------
       sig_1 |   1.36e+08     871138.6                      1.34e+08    1.37e+08
------------------------------------------------------------------------------

In both computers, this is the version of the command :

Code:

. which cmp
/Users/celiavera/Library/Application Support/Stata/ado/plus/c/cmp.ado
*! cmp 6.9.8 27 February 2016
*! Copyright (C) 2007-16 David Roodman
*! Version history at bottom

If I remove the subpop() option or if I use $cmp_cont instead of $cmp_oprobit, the discrepancy disappears.
I appreciate very much your help.

Celia P.

Tags: None

Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#2

19 Mar 2016, 14:34

What happens if you rescale status (divide by 1000)? Looks like you may be running into some precision problems, although it seems strange that the F-test should differ in df.

Stata/MP 14.1 (64-bit x86-64)
Revision 19 May 2016
Win 8.1
1 like
Comment
wbuchanan

Join Date: Mar 2014

Posts: 1362
#3

19 Mar 2016, 18:08

You have confounding in your example. It is possible that the underlying algorithms to fit the model are different across the two versions of Stata.
Comment
David Roodman

Join Date: Jul 2014

Posts: 471
#4

19 Mar 2016, 18:56

Nah, it is exactly the same code on both machines, assuming there has been no problem with updates that would cause an old mlib file to be lying around. I think Carole is on the right track. I have seen a difference like this before with the poisson command. In that case, the solution was dividing the weights by a million, and the underlying problem was differences in Stata's precision. It appears that your data are weighted here too, but I guess in way where scaling down the weights would change the results? In that case, definitely look at rescaling variables.
Comment

Celia Vera

Join Date: Feb 2016
Posts: 19

21 Mar 2016, 01:15

Thanks everyone for your inputs. I rescaled status (divided by 1000 as Carole suggested). It took more than 8 hours in both machines. Convergence was not achieved in either machine.

Code:

  . gen status1=status/1000
  
  . cmp (educ2 = status1), ind("cond(stud==1, $cmp_right, $cmp_oprobit)") svy subpop(edad1) qui 
  
  Fitting individual models as starting point for full model fit.
  Note: For programming reasons, these initial estimates may deviate from your specification.
  For exact fits of each equation alone, run cmp separately on each.
  
  Fitting full model.
  convergence not achieved
  
  Mixed-process regression
  
  Number of strata   =         8                  Number of obs      =     46705
  Number of PSUs     =      4396                Population size    =  14309427
                                                             Subpop. no. of obs =     10789
                                                             Subpop. size       = 3602626.7
                                                             Design df          =      4388
                                                             F(   0,   4388)    =         .
                                                             Prob > F           =         .
  
  
  Linearized
  educ2        Coef.    Std. Err.       t     P>t      [95% Conf. Interval]
  
  educ2        
  status1    5.79e+09          .        .       .            .           .
  
  /cut_1_1   -6.81e+07   3.55e+07    -1.92   0.055    -1.38e+08     1460502
  /cut_1_2    -3028494    2545896    -1.19   0.234     -8019736     1962748
  /lnsig_1    18.07286   .5122942    35.28   0.000      17.0685    19.07721
  
  sig_1    7.06e+07   3.62e+07                      2.59e+07    1.93e+08
  
  convergence not achieved
  convergence not achieved
  r(430);

I have also tried with the additional explanatory variables (without rescaling them) and convergence is not achieved. I know I could add the tolerance options to facilitate convergence but given the amount of time it takes I wonder if there is a mistake in my code. My dependent variable is level of education (3 ordered leves). This variable is right-censored if the individual is still studying (stud=1).
Any other suggestion on how to estimate an ordered probit with censored variable ?
I appreciate your help.

Celia P.

Comment

David Roodman

Join Date: Jul 2014

Posts: 471
#6

21 Mar 2016, 07:27

Well, there is no guarantee of convergence in a lot of these models. It is possible for the command to be correctly fitting the model and for it not to converge. This would indicate that the model is not a good fit to the data. The fact that the computers are now agreeing reinforces this view. Possibly adding more covariates would help. And if there's anyway to reduce the sample while still generating the behavior, then you can do some trial and error more quickly. If you take out the "qui" option then you can observe the iteration log and probably start to get a sense of whether its converging before it actually does.

Last edited by David Roodman; 21 Mar 2016, 07:30.
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#7

21 Mar 2016, 13:25

Dear Celia,

I also think this is a numerical problem caused by the scale of status, but I would suggest multiplying it by 1000 or even 1000000. My reasoning is that in the original model the coefficient on status is of order 1e+07, which is huge. If you multiply status by a large number its coefficient should be more reasonable.

Anyway, showing us some descriptive statistics for status using the same weights used in the estimation would allow us to say more.

All the best,

Joao
Comment
David Roodman

Join Date: Jul 2014

Posts: 471
#8

21 Mar 2016, 14:16

Yeah, it can't hurt to try dividing by a million. And you can also experiment with nrtol and nonrtol options, as you say.
Comment

Announcement

cmp yields different results in Mac and Windows

Comment

Comment

Comment

Comment

Comment

Comment

Comment