Roodman's cmp command

Ebenezer Kondo

Join Date: Aug 2017

Posts: 2
#16

06 Apr 2018, 15:16

I am modelling a triple-hurdle market participation decision. The first model is the production decision, the second is the market participation decision, and the third is the intensity of participation. In my data set, non-producers have no values for total output or for the market participation decision. I am using Roodman's cmp command to fit the three models simultaneously, but the output is "no observations" r(2000). How can I correct this?

Code:

cmp (cpp = age gender hhsize depratio educy exp fsize distmkt pos omt ami income ofi ownmob ownrad landown credit ext c_price yendi mion kum) (MP = age gender hhsize depratio educy exp fsize distmkt pos omt ami t_output income ownmob ownrad landown credit c_price yendi mion kum) (output = age gender hhsize depratio educy exp fsize income t_output ownmob ownrad landown credit ext c_price yendi mion kum), indicators("cpp*$cmp_probit" "MP*$cmp_probit" "output*$cmp_trunc") difficult nonrtolerance qui
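One common way to write a triple-hurdle model of this kind is sketched below. This is an illustrative formulation and does not come from the post. The symbols d1, d2, y3, x_k and β_k are hypothetical notation, and the joint-normal error assumption is the one cmp imposes. The key feature is that each stage is observed only for households that cleared the previous one.

```latex
\begin{aligned}
d_{1i} &= \mathbf{1}\left[x_{1i}'\beta_1 + \varepsilon_{1i} > 0\right] && \text{production: observed for every household}\\
d_{2i} &= \mathbf{1}\left[x_{2i}'\beta_2 + \varepsilon_{2i} > 0\right] && \text{participation: observed only if } d_{1i} = 1\\
y_{3i} &= x_{3i}'\beta_3 + \varepsilon_{3i} && \text{intensity: observed only if } d_{2i} = 1\\
(\varepsilon_{1i}, \varepsilon_{2i}, \varepsilon_{3i})' &\sim N(0, \Sigma) && \text{joint normality, as cmp assumes}
\end{aligned}
```

In cmp, the observation rule on each line is what the corresponding indicators() expression expresses. An indicator of 0 ($cmp_out) removes an observation from that one equation and keeps it in the others.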
Pintu Batra

Join Date: May 2025

Posts: 6
#17

06 Sep 2025, 00:37

Hello All

We have a binary dependent variable, a binary independent variable, and a continuous instrumental variable. We are therefore planning to use cmp instead of IV-TSLS. When we ran the cmp model, we found 'Atanhrho_12' to be 0.30 and significant at the 1% level. Our argument for preferring cmp over IV-TSLS is stated in question 1 below.
My questions are as follows:
1. We prefer the cmp model over IV-TSLS because the former uses the information about the limited nature of the first-stage dependent variable, and it should be more efficient if the error terms in the two equations are correlated. Is this argument correct?
2. Since 'Atanhrho_12' is statistically significant, can we argue that the error terms of the two equations are correlated?

Our cmp command is as follows:

Code:

cmp ( binary_outcome = binary_treatment ) ( binary_treatment = continous_IV) [pw=weight], indicators($cmp_probit $cmp_probit) vce(cluster cluster_id) iterate(100)
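cmp reports the cross-equation error correlation on the inverse hyperbolic tangent (atanh) scale. This lets the optimizer search over an unbounded parameter. Applying tanh to the reported value recovers the implied correlation. Using the 0.30 from the post:

```latex
\rho_{12} = \tanh(0.30) = \frac{e^{2(0.30)} - 1}{e^{2(0.30)} + 1} = \frac{e^{0.6} - 1}{e^{0.6} + 1} \approx \frac{0.822}{2.822} \approx 0.29
```

Because tanh is strictly increasing and tanh(0) = 0, testing atanhrho_12 = 0 tests the same null hypothesis as rho_12 = 0.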
David Roodman

Join Date: Jul 2014

Posts: 477
#18

07 Sep 2025, 15:33

2 is correct and 1 is partly correct.

The missing point in 1 is that cmp also introduces the assumption that the underlying error process is normally distributed. Suppose the entire structural model is correct: there is normal error, and then a censoring process produces the observed 0/1. In that case the estimator that uses this information (cmp) will be more precise. However, if the error is not normal (and it is impossible to be certain), then the pure linear model could be less biased, since it does not require normality. The linear model will probably also be less precise, since it uses less information. So there could be a bias-variance trade-off.
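The sketch below makes the structural model concrete for the command in #17. It shows the recursive bivariate probit that cmp fits next to the linear model that IV-2SLS estimates. The symbols Y, T and Z stand for binary_outcome, binary_treatment and continous_IV. They and the coefficient names are notation introduced here for illustration.

```latex
\begin{aligned}
&\textbf{cmp:} && Y_i = \mathbf{1}\left[\alpha_0 + \alpha_1 T_i + v_i > 0\right], \quad T_i = \mathbf{1}\left[\gamma_0 + \gamma_1 Z_i + u_i > 0\right],\\
& && (v_i, u_i)' \sim N\!\left(0, \begin{pmatrix} 1 & \rho_{12} \\ \rho_{12} & 1 \end{pmatrix}\right)\\
&\textbf{IV-2SLS:} && Y_i = a_0 + a_1 T_i + e_i, \quad T_i = c_0 + c_1 Z_i + w_i, \quad E[e_i] = E[Z_i e_i] = 0
\end{aligned}
```

The 2SLS line places no distributional assumption on e_i. The cmp lines, by contrast, require (v_i, u_i) to be jointly normal. That extra assumption is the source of both the efficiency gain and the risk of bias described above.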

The help file mentions this briefly:
"if C is continuous, B is a sometimes-left-censored determinant of C, and A is an instrument, then the effect of B on C can be consistently estimated with 2SLS (Kelejian 1971). However, a cmp estimate that uses the information that B is censored will be more efficient if it is based on a more accurate model."
This issue also comes up in Roodman and Morduch (2014).

This is a general theme in econometrics. A model that makes more assumptions (that is, uses more information) is more accurate if the assumptions are correct. But often we can't be sure.
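The bias-variance trade-off can be stated with the standard decomposition of the mean squared error of an estimator β̂ of a parameter β:

```latex
\operatorname{MSE}(\hat\beta) = E\left[(\hat\beta - \beta)^2\right] = \underbrace{\left(E[\hat\beta] - \beta\right)^2}_{\text{bias}^2} + \underbrace{\operatorname{Var}(\hat\beta)}_{\text{variance}}
```

A correctly specified cmp model tends to shrink the variance term. If its normality assumption fails, the bias term can grow. Which estimator has the smaller total depends on the data.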

Last edited by David Roodman; 07 Sep 2025, 15:35.
Pintu Batra

Join Date: May 2025

Posts: 6
#19

08 Sep 2025, 20:39

Thank you for your response, David. Is there a way to provide some evidence that the error terms are normally distributed? Our sample has more than one lakh (100,000) observations from a cross-sectional round. Can we show that the predicted residuals from the first-stage equation follow a normal distribution? Can we show the same for the predicted residuals from the structural (second-stage) equation? Would that support our argument that we should use cmp instead of 2SLS in our context?

Last edited by Pintu Batra; 08 Sep 2025, 20:55.