cmp: Simulated Maximum Likelihood + Tobit

Tim Sadler

Join Date: Jun 2023

Posts: 3
#1

29 Jun 2023, 06:58

Hello Stata-Experts,

I am trying to estimate a left-censored random-effects Tobit Model with cmp, but I do have a couple of questions on how to set it up correctly.

Dataset: I am using a panel dataset with 500 households over 100 weeks, each household having a binary indicator for each week (1, if a purchase has happened, 0 else) and a continuous variable of spendings in $, if a purchase has happened.

I have used the following command

Code:

cmp (incidence = number inventory pi season_xmas season_q2 season_q3 season_q4 season_2018 season_2019 || household: number pi) (spendings = number inventory pi season_xmas season_q2 season_q3 season_q4 season_2018 season_2019 || household: number pi), ind($cmp_cont, $cmp_left) cl(household) redraws(50)

Now, I do have a couple of questions regarding cmp and the way the random effects models work:

1. I was under the impression that when I use redraws(#), Simulated Maximum Likelihood according to Train (2009) is used. According to the cmp documentation, redraws "sets the number of draws per observation at each level". Level here refers to each equation in "cmp () ()", right? In the case above that would be 500 households x 100 weeks = 50,000 observations, each with 50 draws in each of the 2 equations (50,000 x 50 x 2 draws). According to the implementation Train provides on his website (https://eml.berkeley.edu/~train/software.html), draws should be taken per class (here: 500 households x 50 draws). Is it possible to obtain the latter?

2. The table with results only shows me the coefficients of variables specified on the left side of the || operator. I would expect a mean coefficient and its standard deviation for parameters specified via "|| household: number pi". Am I missing something in the cmp command?

Thank you very much for your help!
David Roodman

Join Date: Jul 2014

Posts: 473
#2

30 Jun 2023, 10:25

This model has two equations. Separately, it has two levels: household-week (observation) and household. Only the household-level random effect needs to be integrated out for likelihood computation. This is done either through simulation (computing the integrand at lots of points and giving all evaluations equal weight) or quadrature (using fewer, specially chosen points and weights). I implemented adaptive quadrature later and I've found that it is usually faster. So draws are taken only at the household level. But on top of that you've got two random coefficients in each equation. So altogether 6 latent household-level variables need separate draws.
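
As a sketch of what "integrating out" means here, in notation that is assumed for illustration rather than taken from the cmp documentation: let u_i be the vector of the 6 household-level latent variables for household i, Sigma their covariance matrix, and f the joint density (or probability) of the two outcomes in one household-week given u_i. The household's likelihood contribution and its two approximations can then be written roughly as:

```latex
% Exact contribution of household i (T_i weeks observed)
L_i(\theta) = \int_{\mathbb{R}^6}
  \Big[ \prod_{t=1}^{T_i} f\big(y_{it} \mid x_{it}, u_i; \theta\big) \Big]
  \, \phi(u_i; 0, \Sigma) \, du_i

% Simulation: R draws u_i^{(r)} from N(0, \Sigma), all weighted equally
L_i(\theta) \approx \frac{1}{R} \sum_{r=1}^{R}
  \prod_{t=1}^{T_i} f\big(y_{it} \mid x_{it}, u_i^{(r)}; \theta\big)

% Quadrature: K specially chosen points u_k with weights w_k
L_i(\theta) \approx \sum_{k=1}^{K} w_k
  \prod_{t=1}^{T_i} f\big(y_{it} \mid x_{it}, u_k; \theta\big)
```

Under this reading, redraws(50) corresponds to R = 50 draws of the 6-dimensional u_i per household, not per household-week.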

Note that by default, the household- and observation-level errors are both allowed to be correlated across the equations. The two correlations are identified in theory but are often hard to estimate in practice. You can set one or the other to 0 using the cov() option. Likewise, all the household-level random coefficients and effects are allowed to be correlated within and across equations. That's a lot of correlations to estimate. You can quash within-equation correlations by inserting cov() options inside each equation's specification.
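
A minimal sketch of where the two kinds of cov() options would go, using the variable names from the first post with the seasonal regressors dropped for brevity. The ordering of the command-level cov() entries (household level first, then observation level) is an assumption to verify against the cmp help file, and the ind() settings are copied from the first post without the comma, not endorsed:

```stata
* Sketch only; not a recommended specification.
cmp setup

* cov(indep) inside an equation: the household-level random intercept and
*   random coefficients of that equation are uncorrelated with each other.
* cov(indep unstruct) at the end: one entry per level (assumed order:
*   household, then observation). Here the household-level effects are
*   uncorrelated across equations, while the observation-level errors
*   keep a freely estimated cross-equation correlation.
cmp (incidence = number inventory pi || household: number pi, cov(indep))  ///
    (spendings = number inventory pi || household: number pi, cov(indep)), ///
    ind($cmp_cont $cmp_left)                                               ///
    cov(indep unstruct)
```

Swapping the two command-level entries would, under the same assumption about ordering, instead set the observation-level cross-equation correlation to 0.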

But if you're getting results despite having all those correlations to estimate, that's a good sign.

The hierarchical results should be there in a bottom section. It would help if you showed output.

There should be no comma in the ind() option.

Tim Sadler

Join Date: Jun 2023
Posts: 3

05 Jul 2023, 02:14

Thank you David, this clarified a lot already. I have run into some more problems, however, so please allow me to post follow-up questions along with some more output.

As you have repeatedly recommended in this forum, I have tried to start with a simple model with only two random effects and a very simple specification for the seasonal effects:

Code:

cmp (custweek_incidence = bundleweek_all_bundles_mc week_pi i.month i.year || cid: bundleweek_all_bundles_mc week_pi, cov(indep)) ///
    (log_custweek_spendings = bundleweek_all_bundles_mc week_pi i.month i.year || cid: bundleweek_all_bundles_mc week_pi, cov(indep)), ///
    ind($cmp_cont $cmp_left) cl(cid) redraws(50) nonrtolerance cov(indep unstruct) difficult

custweek_incidence and log_custweek_spendings vary across weeks and households.
bundleweek_all_bundles_mc and week_pi vary only across weeks.

The dataset consists of 56,852 observations of 378 households by up to 156 weeks.

Already in the first step, both components of the Type II Tobit Model appear to be "ill-conditioned":

Code:

Note: 50 is not prime. Prime draw counts are more reliable.


Fitting individual models as starting point for full model fit.
Note: For programming reasons, these initial estimates may deviate from your specification.
      For exact fits of each equation alone, run cmp separately on each.

      Source |       SS           df       MS      Number of obs   =    56,852
-------------+----------------------------------   F(15, 56836)    =     75.17
       Model |  124.333143        15  8.28887621   Prob > F        =    0.0000
    Residual |  6267.27853    56,836  .110269522   R-squared       =    0.0195
-------------+----------------------------------   Adj R-squared   =    0.0192
       Total |  6391.61168    56,851  .112427427   Root MSE        =    .33207

-------------------------------------------------------------------------------------------
       custweek_incidence | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------------------+----------------------------------------------------------------
bundleweek_all_bundles_mc |   .0003109   .0002119     1.47   0.142    -.0001043    .0007262
                  week_pi |  -2.708824   .0905154   -29.93   0.000    -2.886235   -2.531414
                          |
                    month |
                       2  |   .0067774   .0070252     0.96   0.335    -.0069921    .0205468
                       3  |  -.0117956   .0073852    -1.60   0.110    -.0262707    .0026795
                       4  |  -.0052734   .0072796    -0.72   0.469    -.0195414    .0089946
                       5  |  -.0128774   .0071091    -1.81   0.070    -.0268114    .0010565
                       6  |  -.0137209   .0078481    -1.75   0.080    -.0291032    .0016613
                       7  |   .0120197   .0072946     1.65   0.099    -.0022778    .0263171
                       8  |   .0008018   .0069926     0.11   0.909    -.0129038    .0145074
                       9  |  -.0212205   .0070737    -3.00   0.003     -.035085    -.007356
                      10  |  -.0474497   .0071187    -6.67   0.000    -.0614024   -.0334971
                      11  |  -.0235828   .0086197    -2.74   0.006    -.0404773   -.0066882
                      12  |  -.0257096   .0082393    -3.12   0.002    -.0418587   -.0095606
                          |
                     year |
                    2018  |   .0034241   .0039207     0.87   0.382    -.0042606    .0111088
                    2019  |  -.0046598   .0042696    -1.09   0.275    -.0130281    .0037086
                          |
                    _cons |   2.810958   .0884333    31.79   0.000     2.637628    2.984287
-------------------------------------------------------------------------------------------

Warning: regressor matrix for custweek_incidence equation appears ill-conditioned. (Condition number = 74.430989.)
This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly
collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
See cmp tips.

Tobit regression                                    Number of obs     =  8,121
                                                           Uncensored =  8,119
Limits: Lower = 1.91                                    Left-censored =      1
        Upper = 7.05                                   Right-censored =      1

                                                    LR chi2(15)       =  59.70
                                                    Prob > chi2       = 0.0000
Log likelihood = -5655.5942                         Pseudo R2         = 0.0053

-------------------------------------------------------------------------------------------
   log_custweek_spendings | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------------------+----------------------------------------------------------------
bundleweek_all_bundles_mc |   .0012468   .0008386     1.49   0.137    -.0003971    .0028908
                  week_pi |   -1.13137   .2881939    -3.93   0.000    -1.696304   -.5664361
                          |
                    month |
                       2  |  -.0106429    .025754    -0.41   0.679    -.0611273    .0398415
                       3  |  -.0356476   .0267805    -1.33   0.183    -.0881444    .0168491
                       4  |  -.0175593   .0273022    -0.64   0.520    -.0710786      .03596
                       5  |  -.0309604   .0269647    -1.15   0.251    -.0838182    .0218973
                       6  |   .0132194   .0304966     0.43   0.665    -.0465617    .0730004
                       7  |   .0183191   .0267891     0.68   0.494    -.0341945    .0708326
                       8  |   .0159804   .0261597     0.61   0.541    -.0352994    .0672602
                       9  |   .0421779   .0280409     1.50   0.133    -.0127895    .0971453
                      10  |   .0097051   .0282889     0.34   0.732    -.0457483    .0651585
                      11  |   .0053585   .0337013     0.16   0.874    -.0607047    .0714217
                      12  |    -.07248   .0329587    -2.20   0.028    -.1370876   -.0078724
                          |
                     year |
                    2018  |   .0354235    .015263     2.32   0.020     .0055042    .0653428
                    2019  |   .0311758   .0168398     1.85   0.064    -.0018346    .0641861
                          |
                    _cons |   5.767486   .2820898    20.45   0.000     5.214517    6.320454
--------------------------+----------------------------------------------------------------
                   /sigma |   .4853533   .0038092                      .4778862    .4928204
-------------------------------------------------------------------------------------------

Warning: regressor matrix for log_custweek_spendings equation appears ill-conditioned. (Condition number = 63.857888.)
This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly
collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
See cmp tips.

Fitting full model.
Random effects/coefficients simulated.
    Sequence type = halton
    Number of draws per observation = 50
    Include antithetic draws = no
    Scramble = no
    Prime bases = 2 3 5 7 11 13
Each observation gets different draws, so changing the order of observations in the data set would change the results.

Even though this "simplest" model converges, it is not correctly specified, as the output reveals:

Code:

Iteration 28: Log likelihood = -17872.483  

Mixed-process multilevel regression                     Number of obs = 56,852
                                                        Wald chi2(15) = 599.10
Log likelihood = -17872.483                             Prob > chi2   = 0.0000

-------------------------------------------------------------------------------------------
                          |               Robust
                          | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
--------------------------+----------------------------------------------------------------
custweek_incidence        |
bundleweek_all_bundles_mc |   .0002913    .000217     1.34   0.179     -.000134    .0007166
                  week_pi |  -2.703717   .1371864   -19.71   0.000    -2.972598   -2.434837
                          |
                    month |
                       2  |   .0066355   .0064844     1.02   0.306    -.0060737    .0193447
                       3  |  -.0120574   .0068406    -1.76   0.078    -.0254648    .0013499
                       4  |  -.0056558    .006501    -0.87   0.384    -.0183976    .0070859
                       5  |   -.013314   .0063529    -2.10   0.036    -.0257653   -.0008626
                       6  |  -.0141679   .0068739    -2.06   0.039    -.0276405   -.0006953
                       7  |   .0114598   .0072772     1.57   0.115    -.0028032    .0257228
                       8  |  -.0000129   .0066198    -0.00   0.998    -.0129875    .0129617
                       9  |  -.0220982   .0061317    -3.60   0.000    -.0341161   -.0100803
                      10  |  -.0484126   .0065868    -7.35   0.000    -.0613226   -.0355026
                      11  |  -.0246042   .0077291    -3.18   0.001    -.0397528   -.0094555
                      12  |   -.026974   .0076933    -3.51   0.000    -.0420527   -.0118954
                          |
                     year |
                    2018  |   .0014865   .0039747     0.37   0.708    -.0063038    .0092768
                    2019  |  -.0079592   .0049992    -1.59   0.111    -.0177574     .001839
                          |
                    _cons |   2.818973   .1343059    20.99   0.000     2.555738    3.082207
--------------------------+----------------------------------------------------------------
log_custweek_spendings    |
bundleweek_all_bundles_mc |  -.0020806          .        .       .            .           .
                  week_pi |  -2.133892          .        .       .            .           .
                          |
                    month |
                       2  |  -1.089851          .        .       .            .           .
                       3  |  -.9672192          .        .       .            .           .
                       4  |  -.9049909          .        .       .            .           .
                       5  |  -.9505842          .        .       .            .           .
                       6  |  -.8517807          .        .       .            .           .
                       7  |  -.8907744          .        .       .            .           .
                       8  |  -1.054489          .        .       .            .           .
                       9  |  -.8553853          .        .       .            .           .
                      10  |  -.9348447          .        .       .            .           .
                      11  |  -1.135238          .        .       .            .           .
                      12  |  -1.195883          .        .       .            .           .
                          |
                     year |
                    2018  |  -.9235099          .        .       .            .           .
                    2019  |  -.9479096          .        .       .            .           .
                          |
                    _cons |   4.784037          .        .       .            .           .
--------------------------+----------------------------------------------------------------
             /lnsig_1_1_1 |  -6.474712   .0000357 -1.8e+05   0.000    -6.474783   -6.474642
             /lnsig_2_1_1 |  -3.395408   .2691784   -12.61   0.000    -3.922988   -2.867828
               /lnsig_1_1 |  -2.901496   .2272796   -12.77   0.000    -3.346956   -2.456036
             /lnsig_1_1_2 |  -3.588943          .        .       .            .           .
             /lnsig_2_1_2 |   8.027939          .        .       .            .           .
               /lnsig_1_2 |   26.50355          .        .       .            .           .
                 /lnsig_1 |  -1.115321   .0088662  -125.79   0.000    -1.132698   -1.097943
                 /lnsig_2 |  -.3147458          .        .       .            .           .
             /atanhrho_12 |   -.481693          .        .       .            .           .
-------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
Random effects parameters           |  Estimate    Std. Err.    [95% Conf. Interval]
------------------------------------+-----------------------------------------------
Level: cid                          |
  custweek_inci~e                   |
    Standard deviations             |
      bundleweek_all_bundl~c        |  .0015419    5.51e-08     .0015418    .0015421
      week_pi                       |  .0335269    .0090247     .0197819    .0568222
      _cons                         |   .054941     .012487     .0351913    .0857743
  log_custweek_~s                   |
    Standard deviations             |
      bundleweek_all_bundl~c        |  .0276275          .
      week_pi                       |  3065.418          .
      _cons                         |  3.24e+11          .
------------------------------------+-----------------------------------------------
Level: Observations                 |
 Standard deviations                |
  custweek_incidence                |  .3278102    .0029064     .3221629    .3335564
  log_custweek_spendings            |  .7299744          .
 Cross-eq correlation               |
  custweek_inci~e log_custweek_~s   | -.4475985          .
------------------------------------------------------------------------------------

Here are my questions:

1. Is it plausible that such a simple model appears ill-conditioned, when the correlation between the two true independent variables is only 0.082 and the factor variables cannot be correlated by design? Or is there something wrong with the cmp specification?

2. Is it possible that the structure of the dataset is not suitable for a Tobit Type II Model: time series with a binary "selection" equation and a continuous "outcome" equation, with independent variables varying only across weeks?

3. Is it possible that the imbalanced class sizes of the binary variable prevent this kind of model from being fit (only 7,333, or 13%, of the observations have a value of 1 in custweek_incidence)?

4. Do you have any other idea of what might be wrong or what else I could try?

Thank you very much once again for your help.

Last edited by Tim Sadler; 05 Jul 2023, 02:23.


David Roodman

Join Date: Jul 2014

Posts: 473
#4

05 Jul 2023, 09:18

I would try dropping the redraws() option so it will use quadrature rather than simulation. I think this default generally works better. But I'm not expecting it to help much.

I think a two-equation tobit model with random effects and coefficients is ambitious. It's possible to write down all sorts of models with cmp that will work if they are correct and if data are infinite, yet which will be hard to fit in reality.

Ill-conditioning happens when different variables are on very different scales. You can try rescaling some variables and see if it helps.
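
One possible way to try this, sketched with the regressor names from the command above. The z_ variable names are hypothetical, and standardizing also centers the variables, which goes a step beyond pure rescaling:

```stata
* Inspect the scales of the two continuous regressors
summarize bundleweek_all_bundles_mc week_pi

* Standardized copies (mean 0, standard deviation 1); names are hypothetical
egen double z_bundles = std(bundleweek_all_bundles_mc)
egen double z_week_pi = std(week_pi)

* Confirm the new scales before substituting them into the cmp command
summarize z_bundles z_week_pi
```

Coefficients and random-coefficient standard deviations on the standardized variables are then per one standard deviation of the original variable, so they are not directly comparable with the earlier output.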

Your ind() specification says that every observation of the outcome in the second equation is left-censored at whatever value is reported for that observation. I doubt that's what you mean. Look at the tobit examples in the help file and make sure you understand them. To put this another way: you could make a much simpler version of this model, which has only the second equation and no random effects or coefficients. The results should then match what you get from the tobit command. Do they?
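
A sketch of that check, assuming a known lower censoring limit with censored observations recorded at that limit. The value 0 below is purely hypothetical (the thread does not state the limit), and the cond() form of the indicator follows the pattern of the tobit examples in the cmp help file as I recall them:

```stata
cmp setup

* Hypothetical censoring limit; replace with the actual one
local limit = 0

* Plain tobit on the second equation only
tobit log_custweek_spendings bundleweek_all_bundles_mc week_pi i.month i.year, ///
    ll(`limit')

* Same model in cmp: observations above the limit are treated as uncensored
* ($cmp_cont), all others as left-censored at their recorded value ($cmp_left)
cmp (log_custweek_spendings = bundleweek_all_bundles_mc week_pi i.month i.year), ///
    ind("cond(log_custweek_spendings > `limit', $cmp_cont, $cmp_left)")
```

If the indicator is right, the coefficients should agree, and exponentiating the lnsig parameter that cmp reports should reproduce tobit's sigma.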

There's a limit to how much I can engage with the specifics of people's modeling challenges.