
  • Using cmp for discrete/continuous estimation

    Hi everyone,

    I had another thread about this sort of problem and received good advice there about using the cmp command. The context is this: I am estimating a discrete-continuous choice model in which individuals choose where to live and, given where they live, how much to work, consume, and "use" their house. Each location has a different amenity (pollution), which also affects their utility.

    However, leisure, consumption, and housing are all endogenous, so I also instrument for each of them. cmp is extremely useful because it allows for this while still estimating the discrete choice -- truly a remarkable command.

    However, I ran into some problems when estimating it that I could not fully diagnose. I am copying a subset of the output (not the estimates) below; in particular, I want to draw attention to the warnings about ill-conditioned regressor matrices and collinear regressors. All of these variables work fine when I use reg3 -- there is no obvious reason why they should be ill-conditioned or collinear. The only thing I can think of is that it's an extremely tough problem to optimize, and I haven't even added the fixed effects in yet...

    cmp (lwage_hourly = lleisure lcons_nondur lpoll $aqX $aqstX) (lleisure=$ivweather) (lcons_nondur = $ivcons) (lhprice = lcons_house lcons_nondur lpoll $aqX $aqstX) (lcons_house = $ivhouse) (move = lleisure lcons_nondur lcons_house $aqX $aqstX) (location = lleisure lcons_nondur lcons_house $aqX $aqstX) [w=perwt],indicators($cmp_cont $cmp_cont $cmp_cont $cmp_cont $cmp_cont $cmp_probit $cmp_oprobit) cluster(county)
    (sampling weights assumed)

    Fitting individual models as starting point for full model fit.
    Note: For programming reasons, these initial estimates may deviate from your specification.
    For exact fits of each equation alone, run cmp separately on each.

    -------------------------------------------------------------------------------

    Warning: regressor matrix for lwage_hourly equation appears ill-conditioned. (Condition number = 1526747.4.)
    This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
    See cmp tips.


    ----------------------------------------------------------------------------------

    Warning: regressor matrix for lleisure equation appears ill-conditioned. (Condition number = 196065.85.)
    This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
    See cmp tips.


    --------------------------------------------------------------------------------------

    Warning: regressor matrix for lcons_nondur equation appears ill-conditioned. (Condition number = 2253.5663.)
    This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
    See cmp tips.

          Source |       SS           df        MS        Number of obs    =  235581188
    -------------+------------------------------------    F( 34,235581153) =          .
           Model |  32507335.3         34   956098.096    Prob > F         =     0.0000
        Residual |  99403557.6  235581153   .421950382    R-squared        =     0.2464
    -------------+------------------------------------    Adj R-squared    =     0.2464
           Total |   131910893  235581187   .559938145    Root MSE         =     .64958


    -------------------------------------------------------------------------------

    Warning: regressor matrix for lhprice equation appears ill-conditioned. (Condition number = 1263661.2.)
    This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
    See cmp tips.

          Source |       SS           df        MS        Number of obs    =  238163538
    -------------+------------------------------------    F(  1,238163536) =   80511.68
           Model |  10561.4387          1   10561.4387    Prob > F         =     0.0000
        Residual |  31242044.7  238163536   .131178959    R-squared        =     0.0003
    -------------+------------------------------------    Adj R-squared    =     0.0003
           Total |  31252606.2  238163537   .131223304    Root MSE         =     .36219

    ------------------------------------------------------------------------------
     lcons_house |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       ltrantime |   .0088839   .0000313   283.75   0.000      .0088225    .0089453
           _cons |   9.577865   .0001006  9.5e+04   0.000      9.577668    9.578062
    ------------------------------------------------------------------------------

    Iteration 0: log likelihood = -1.279e+08
    Iteration 1: log likelihood = -1.004e+08
    Iteration 2: log likelihood = -98963302
    Iteration 3: log likelihood = -98953638
    Iteration 4: log likelihood = -98953634

    Probit regression Number of obs = 1947523
    LR chi2(34) = 5.79e+07
    Prob > chi2 = 0.0000
    Log likelihood = -98953634 Pseudo R2 = 0.2263


    -------------------------------------------------------------------------------

    Warning: regressor matrix for moved equation appears ill-conditioned. (Condition number = 1110405.6.)
    This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
    See cmp tips.

    Iteration 0: log likelihood = -2.679e+08
    Iteration 1: log likelihood = -2.405e+08
    Iteration 2: log likelihood = -2.392e+08
    Iteration 3: log likelihood = -2.392e+08
    Iteration 4: log likelihood = -2.392e+08


    Note: 14 observations completely determined. Standard errors questionable.

    Warning: regressor matrix for _cmp_y7 equation appears ill-conditioned. (Condition number = 1110405.6.)
    This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
    See cmp tips.

    Fitting full model.

    cmp_lnL(): 3499 halton2() not found
    <istmt>: - function returned error
    Mata run-time error
    Mata run-time error
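
    For reference, the two remedies that the warning messages themselves point to would look roughly like this (a sketch only; _rmcoll is Stata's built-in utility for flagging collinear terms, and nrtolerance(#)/nonrtolerance are the maximize options named in the warnings):

    * first remedy: check whether the regressors in, say, the lwage_hourly equation are nearly collinear
    _rmcoll lleisure lcons_nondur lpoll $aqX $aqstX
    display "`r(varlist)'"    // terms dropped as collinear come back flagged in the returned list

    * second remedy: loosen the convergence check by appending a maximize option to the cmp call, e.g.
    * ..., nrtolerance(1e-4)   (or nonrtolerance)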

  • #2
    The warnings are just warnings. They give a tip that might be useful if you are having convergence problems.
    The error relating to halton2() looks like an installation problem. Try "ssc install ghk2, replace". Do you have more than one copy of ghk2.mlib on your computer?
    --David
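
    A minimal sketch of the reinstall-and-check David suggests (standard Stata commands; hunting for duplicate copies of the library is then done by eye in the listed directories):

    ssc install ghk2, replace    // refresh ghk2 from SSC, overwriting any old copy
    mata: mata mlib index        // have Stata rebuild its index of Mata .mlib libraries
    adopath                      // list the directories Stata searches; look through them
                                 // for stray copies of the ghk2 library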



    • #3
      Hey David, fantastic program, and thank you for the reply! Yes, it could well be something to do with the installation.

      I originally couldn't install the program -- my Stata hasn't been able to run ssc install for a while, so I install packages manually -- but the manual installation did not work. Long story short, I used Stata on the university server, grabbed the ado file using ssc install, and replaced the ado file in my Stata directory, which is what allowed me to produce the results above. It's odd that it says the matrices are ill-conditioned, though.

      I can play around with the convergence and tolerance options, but I have two questions to double-check:

      (1) Do you foresee any problems estimating fixed-effects models with 2+ million observations? The fixed effects are on county and year, generally speaking, so there aren't too many of them, but quite a bit of optimization is still involved.
      (2) When using instruments, you just write (endog = Z), right? That is, you wouldn't also include the controls by doing (endog X = Z X)? I am just trying to think of any possible misunderstandings on my end that might be contributing to the lack of reliable convergence.

      Edit:
      (3) How do you output results? After using est sto, and assuming you want to output with esttab, would you just refer to the equation like you would with reg3? For example, "keep(first:x)" would keep the coefficient on x from the first equation written.

      Last edited by Christos Makridis; 28 May 2016, 20:55.



      • #4
        1) If there aren't too many fixed effects, I'm optimistic it will work. However, since you're triggering use of the GHK algorithm to calculate cumulative normal probabilities above dimension 2, the computational burden is already pretty high. I'd build up to the full model and data set. Also, start with a modest number of GHK draws, maybe 10 or 20; don't take the default on that (sketches for this and point 2 follow the list).
        2) No, do (endog X = Z X). See the ivregress/ivreg example in the help file.
        3) That question pertains to multi-equation estimators in general, e.g., sureg, and it might depend on the specific post-estimation command, so I won't try to answer it.
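
        On point 1, a rough sketch of what starting with a modest number of GHK draws might look like, assuming cmp's ghkdraws() option (the three-probit system and the variable names are placeholders, not the model from this thread):

        * with three probit equations, cumulative normal probabilities above dimension 2
        * are simulated via GHK, so cap the number of draws at something small to start
        * (ghkdraws() is the option name as I understand it -- check help cmp)
        cmp (y1 = x1 x2) (y2 = x1 x3) (y3 = x1 x4), ///
            indicators($cmp_probit $cmp_probit $cmp_probit) ghkdraws(10)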
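
        On point 2, the (endog X = Z X) pattern written out with placeholder names (y, endog, x, z1, z2), modeled on the ivregress comparison in the help file:

        ivregress 2sls y x (endog = z1 z2)    // the 2SLS analogue, for comparison
        * in cmp, the exogenous control x appears in both the structural equation and the first stage
        cmp (y = endog x) (endog = z1 z2 x), indicators($cmp_cont $cmp_cont)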



        • #5
          Thank you! Will keep working on it.



          • #6
            Hey David,

            There are three things that I was hoping to clarify further.

            (1) Adding ghk(10) gave a syntax error. I looked around the documentation and it seems right, but there must be something I'm overlooking.
            (2) I added three constraints, which work when I use reg3 on just the continuous-variable equations, but not here. It seems as though the constraints are not even being recognized, although I'm inputting them the usual way: "constr def 1 ..." and then, in the options of the command, "..., constr(1 2 3)" (a rough sketch of the pattern follows this list).
            (3) The t-stats are huge even though I am clustering at the right level. I suppose it could be due to incorrect convergence.
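
            For reference, a generic sketch of the constraint pattern described in (2); the equation and coefficient names are placeholders, not the actual constraints in this model:

            * cross-equation constraints are written with equation-name prefixes; for most
            * multi-equation estimators the equation name defaults to the dependent variable's name
            constraint define 1 [lwage_hourly]lleisure = [lhprice]lcons_house
            * ... and then passed to the estimator through its constraints() option, e.g.
            * cmp (...) (...), indicators(...) constraints(1 2 3) cluster(county)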

            Thoughts?



            • #7
              Bump, just in case anyone else has any ideas!

