Multinomial Probit model with continuous endogenous regressors

Lok Man Tong

Join Date: Jan 2015

Posts: 15
#1

21 Sep 2020, 01:47

Hi there

Are there Stata commands for estimating a multinomial probit model with continuous endogenous regressors?
I would like to address an endogeneity problem in a multinomial probit model. Thank you.
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#2

21 Sep 2020, 02:30

Check whether the user-written command -cmp- by David Roodman does what you want.
Lok Man Tong

Join Date: Jan 2015

Posts: 15
#3

21 Sep 2020, 04:04

Thank you.

Should it look like this?

cmp (y=x1 x2) (x2=x1 z), ind($cmp_mprobit $cmp_cont)
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#4

21 Sep 2020, 04:32

Looks about right, but read the help file, and most importantly try it to see what Stata would say.

Lok Man Tong

Join Date: Jan 2015

Posts: 15
#5

02 Oct 2020, 19:54

Hi Joro

May I ask a follow-up question?

My original mprobit model sets initial values, say "from(x1=0)".

I don't know how to incorporate this into cmp. I tried the following, but it doesn't work. Stata reported "= invalid name / invalid syntax".

cmp (y=x1 x2 x3) (x2=x1 x3 z), ind($cmp_mprobit $cmp_cont) from(x1=0)

Looking forward to hearing from you. Thank you very much.
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#6

03 Oct 2020, 02:44

Why would you like to do this?

Reading the help of -mprobit-, I see that

"maximize options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are seldom used."

If -cmp- converges, do not fiddle with the optimisation.

Commands do not generally accept an option just because another command does. When you pass this optimisation option from -mprobit- to -cmp-, -cmp- does not know what you are talking about.

Lok Man Tong

Join Date: Jan 2015

Posts: 15
#7

03 Oct 2020, 03:24

I need to set up from(init specs) in the original mprobit model, otherwise it does not converge.

I tried to remove it in -cmp-, but the model does not converge.

Are there alternative functions / options in -cmp-?

Thank you.
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#8

03 Oct 2020, 03:35

This is bad... Try the -difficult- option.

From the help file, this is what -cmp- accepts:

"ml_opts: cmp accepts the following standard ml options, which affect the full-model and initial, single-equation fits: trace gradient hessian showstep technique(algorithm_specs) vce(oim|opg|robust|cluster) iterate(#) tolerance(#) ltolerance(#) gtolerance(#) nrtolerance(#) nonrtolerance shownrtolerance difficult constraints(numlist|matname)"

Lok Man Tong

Join Date: Jan 2015

Posts: 15
#9

05 Oct 2020, 05:08

Very bad.

I added the "difficult" option, but the model still does not converge.

So I also included "init(vector)", but the processing time is incredibly long. Stata has been running for hours without producing any results or errors.

Is it possible to speed up the process? Many thanks.
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#10

05 Oct 2020, 05:19

If -cmp- accepted init(vector), then you are doing what you want to do: passing on the initial values.

I would suggest that you make sure you understand how init(vector) works on a simple example which converges, just to make sure you are not passing on the initial values in the wrong way.

Another suggestion is to drop one or two observations at a time, a different one or two on each attempt, and see whether the model converges.

And I do not think that you can speed up the process if it is calculating. There is always the possibility that it is calculating some nonsense because you have passed init(vector) incorrectly, or just because the optimiser got stuck in some flat/nonconcave region.

But the only thing you can do for calculations that take too long (and you have run out of options) is to let this run in the evening and go to bed, and see what has come out in the morning.
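
One way to run such a check is sketched below, under stated assumptions: -cmp- and its dependency -ghk2- are installed from SSC, the data are simulated, and the binary outcome d is a hypothetical stand-in chosen only because a probit with one endogenous regressor fits quickly. The idea is to fit the simple model once, save e(b), and pass that vector back through init(), so the second fit should start at the solution and stop almost immediately.

```stata
* Assumes: ssc install cmp  and  ssc install ghk2  have been run.
* Simulated data for illustration only; d is a hypothetical binary outcome.
clear
set seed 2020
set obs 1000
generate double x1 = rnormal()
generate double z  = rnormal()
generate double u  = rnormal()
generate double x2 = 0.5*x1 + z + u                 // x2 is endogenous through u
generate byte   d  = (0.5*x1 + 0.5*x2 - 0.5*u + rnormal() > 0)

cmp setup                                           // defines the $cmp_* globals

* First fit: let cmp choose its own starting values
cmp (d = x1 x2) (x2 = x1 z), indicators($cmp_probit $cmp_cont) nolrtest
matrix b0 = e(b)                                    // full parameter vector
matrix list b0                                      // shows the order and names init() expects

* Second fit: start from the stored solution
cmp (d = x1 x2) (x2 = x1 z), indicators($cmp_probit $cmp_cont) nolrtest init(b0)
```

Listing b0 also shows that the vector includes the ancillary parameters as well as the slope coefficients, which is worth knowing before building an init() vector by hand for the multinomial model.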

Hong Il Yoo

Join Date: Jan 2015

Posts: 292
#11

06 Oct 2020, 11:18

Originally posted by Lok Man Tong View Post

Thank you.

Should it look like this?

cmp (y=x1 x2) (x2=x1 z), ind($cmp_mprobit $cmp_cont)

You may want to replace (y = x1 x2) with (y = x1 x2, iia). As to why, please refer to the Keane [1992] reference in the -cmp- help file.
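
For reference, a minimal sketch of what the full call might look like with the iia suboption. It assumes -cmp- and its dependency -ghk2- are installed from SSC, and it uses simulated data whose variable names (y, x1, x2, z) simply mirror the placeholders in this thread. Note that `cmp setup` has to be run first so that the $cmp_mprobit and $cmp_cont globals are defined; nolrtest is added only to skip the constant-only fit. Convergence and run time on real data are not guaranteed.

```stata
* Assumes: ssc install cmp  and  ssc install ghk2  have been run.
* Simulated three-alternative choice data, for illustration only.
clear
set seed 12345
set obs 1000
generate double x1 = rnormal()
generate double z  = rnormal()
generate double u  = rnormal()
generate double x2 = 0.5*x1 + z + u                        // endogenous regressor
generate double v1 = rnormal()                             // latent utility, alternative 1
generate double v2 = 0.5*x1 + 0.5*x2 - 0.5*u + rnormal()   // alternative 2
generate double v3 = -0.5*x1 + 0.3*x2 + rnormal()          // alternative 3
generate byte y = 1
replace y = 2 if v2 > v1 & v2 >= v3
replace y = 3 if v3 > v1 & v3 > v2

cmp setup                                                  // defines the $cmp_* globals

* Equation 1: multinomial probit for y with the iia suboption
* Equation 2: linear equation for the endogenous regressor x2
cmp (y = x1 x2, iia) (x2 = x1 z), indicators($cmp_mprobit $cmp_cont) nolrtest
```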
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2149
#12

07 Oct 2020, 17:05

This looks like a job for the control function approach. It requires a simple first step, and then, I guess, the cmmprobit command. But I've never used cmmprobit. So here is a guess:

Code:

reg x2 x1 z
predict v2h, resid
cmmprobit y x1 x2 v2h

This entails a generated regressor, v2h, and so you should adjust the standard errors. However, the t test on v2h is a valid test of the null that x2 is exogenous. If cmmprobit does not take too long to run, I would just bootstrap the two-step procedure. The analytical standard errors are more difficult to work out.

To obtain marginal effects of x1 and x2, average out v2h across the sample. Again, a bootstrap can provide proper standard errors.
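
A rough sketch of the two steps and the averaging, under stated assumptions: -mprobit- is used in place of cmmprobit so that the example runs on versions before Stata 16, and the data are simulated with the placeholder names y, x1, x2 and z. By default -margins- averages over the observed values of every covariate, which is what averages v2h out across the sample.

```stata
* Simulated three-alternative choice data, for illustration only.
clear
set seed 2020
set obs 2000
generate double x1 = rnormal()
generate double z  = rnormal()
generate double u  = rnormal()
generate double x2 = 0.5*x1 + z + u                        // endogenous regressor
generate double v1 = rnormal()
generate double v2 = 0.5*x1 + 0.5*x2 - 0.5*u + rnormal()
generate double v3 = -0.5*x1 + 0.3*x2 + rnormal()
generate byte y = 1
replace y = 2 if v2 > v1 & v2 >= v3
replace y = 3 if v3 > v1 & v3 > v2

* Step 1: first-stage OLS; the residual is the control function
regress x2 x1 z
predict double v2h, residuals

* Step 2: multinomial probit including the first-stage residual
mprobit y x1 x2 v2h, baseoutcome(1)

* Average partial effects of x1 and x2 on Pr(y == 2), averaging over the
* sample values of all covariates, v2h included
margins, dydx(x1 x2) predict(outcome(2))
```

The standard errors reported by both commands here ignore the first-stage estimation, so they would still need to be replaced by bootstrap standard errors for the whole two-step procedure.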

JW
Lok Man Tong

Join Date: Jan 2015

Posts: 15
#13

21 Oct 2020, 08:21

Thank you, Joro.
I tried my best to deal with the initial values. Unfortunately, another error came up, even though my computer has 32 GB of RAM, which does not sound too small.
Fitting constant-only model for LR test of overall model fit.
#: 3900 out of memory
halton2(): - function returned error
ghk2setup(): - function returned error
cmp_model::cmp_init(): - function returned error
<istmt>: - function returned error
Mata run-time error
r(3900);
Lok Man Tong

Join Date: Jan 2015

Posts: 15
#14

21 Oct 2020, 08:22

Thank you, Hong Il.
I tried it, but it didn't work.
Lok Man Tong

Join Date: Jan 2015

Posts: 15
#15

21 Oct 2020, 08:37

Thank you, Jeff.
Yes, I need to work on the control function approach. Could you check whether the following code is on the right track? cmmprobit is a new feature of Stata 16, and I haven't upgraded yet, so I first replaced cmmprobit with mprobit to see how it goes. That worked. I would still like to combine the two bootstrap programs to save estimation time, but I have not managed to do so. Any suggestions would be appreciated. Many thanks.

*Bootstrap standard errors of mprobit
capture program drop mprobendo
program mprobendo, eclass

    //step 1 OLS: the first-stage residual is the control function//
    reg x2 x1 z
    capture drop v2h
    predict v2h, residuals

    //step 2 CMMPROBIT//
    cmmprobit y x1 x2 v2h
    tempvar bsbse
    tempname bsb
    matrix `bsb'=e(b)
    //mark the estimation sample; the name must match the tempvar declared above//
    quietly gen byte `bsbse' = e(sample)
    ereturn post `bsb', esample(`bsbse')
end

set seed 1
bootstrap _b, reps(50): mprobendo

*Bootstrap marginal effects (rclass)
capture program drop mprobme
program mprobme, rclass

    //step 1 OLS//
    reg x2 x1 z
    capture drop v2h
    predict v2h, residuals

    //step 2 CMMPROBIT//
    cmmprobit y x1 x2 v2h

    //marginal effects for outcomes 1 to 3, returned as margin<k>_<j>//
    forvalues k=1/3 {
        margins, dydx(*) at((mean)v2h) predict(outcome(`k'))
        matrix list r(b)
        tempname M
        matrix `M'=r(b)
        local M_cols=colsof(`M')
        forvalues j=1/`M_cols' {
            return scalar margin`k'_`j'=`M'[1,`j']
        }
    }
end

set seed 1
bootstrap m11=r(margin1_1) m12=r(margin1_2) m21=r(margin2_1) m22=r(margin2_2) m31=r(margin3_1) m32=r(margin3_2), reps(50): mprobme