
  • What does the ivprobit command actually do?

    Dear statalisters,
    I've read the ivprobit manual dozens of times and searched for papers published in top journals that use Stata's ivprobit command with MLE estimation.

    Based on the manual, the formal model is:
    y*_1i = y_2i β + x_1i γ + u_i
    y_2i = x_1i Π_1 + x_2i Π_2 + v_i

    where i = 1, ..., N; y_2i is a 1 × p vector of endogenous variables; x_1i is a 1 × k_1 vector of exogenous variables; x_2i is a 1 × k_2 vector of additional instruments; and the equation for y_2i is written in reduced form. By assumption, (u_i, v_i) ~ N(0, Σ), where σ_11 is normalized to one to identify the model. β and γ are vectors of structural parameters, and Π_1 and Π_2 are matrices of reduced-form parameters.

    This is a recursive model: y_2i appears in the equation for y*_1i, but y*_1i does not appear in the equation for y_2i. We do not observe y*_1i; instead, we observe y_1i = 0 if y*_1i < 0 and y_1i = 1 if y*_1i ≥ 0. The order condition for identification of the structural parameters requires that k_2 ≥ p. Presumably, Σ is not block diagonal between u_i and v_i.

    In brief, three "schools of thought" seem to exist:
    1. The second-stage model includes, in its u_i errors, the v_i errors estimated by the first-stage model
    2. The second-stage model includes, instead of the actual values of the variable y_2i, its fitted values from the first-stage regression
    3. The β coefficient in the second-stage model is calculated taking the results of the first-stage regression into account

    Where is the truth?

    Thank you very much

  • #2
    I forgot to mention that I am using Stata 14; I don't know whether the command has changed since that version. Thanks again for your help.



    • #3
      Newey's two-step estimator uses the residual obtained in the first step as a regressor in the second step, and its coefficient estimates are straightforward to reproduce. I do not know the algorithm employed by the conditional MLE estimator, but if you are very curious, you can study the code of the command.

      Code:
      viewsource ivprobit.ado



      • #4
        Thanks for your feedback. I am indeed very curious, but also very unfamiliar with Stata command code, so I can hardly understand it...



        • #5
          I just looked through the Methods and formulas section of the manual, and it illustrates how to derive the likelihood function. Once you have the log-likelihood, it is just standard maximization. You can start at

          Code:
          help ml
          and learn how to do maximum likelihood estimation in Stata, then build up from there.
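          For reference, in the single-endogenous-regressor case the likelihood comes from the conditional decomposition f(y_1, y_2) = f(y_1 | y_2) f(y_2). This is my transcription of roughly what the Methods and formulas section derives (check the manual for the exact expression), with σ² = Var(v_i) and ρ = Corr(u_i, v_i):

          ```latex
          \ln L = \sum_{i=1}^{N} \Big[\, y_{1i}\ln\Phi(m_i)
                + (1-y_{1i})\ln\{1-\Phi(m_i)\}
                - \tfrac{1}{2}\Big(\frac{y_{2i}-x_{1i}\Pi_1-x_{2i}\Pi_2}{\sigma}\Big)^{2}
                - \ln\big(\sqrt{2\pi}\,\sigma\big) \Big],
          \qquad
          m_i = \frac{y_{2i}\beta + x_{1i}\gamma
                + (\rho/\sigma)\,(y_{2i}-x_{1i}\Pi_1-x_{2i}\Pi_2)}{\sqrt{1-\rho^{2}}}
          ```

          Note that the probit part is evaluated at a mean shifted by the first-stage residual, which is where the two equations are tied together in the maximization.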



          • #6
            Thanks again,
            I tried hard to read the code as you suggested, and came to the conclusion that the answer is in the following section (I highlighted what I think is the first-stage regression in blue, and the second-stage regression in green).

            Code:
                                   
             local lhsname `lhs'
                    if _caller() < 11 {
                            local lhsstr : subinstr local lhsname "." "_"
                    }
                    else    local lhsstr : copy local lhs
                    local exogname `exog'
                    local end1name `end1'
                    local instname `inst'
            [...]
            Code:
            cap sureg (`end1' = `exog' `inst')
                                    if _rc {
                                            di as error "could not find initial values"
                                            exit 498
                                    }
                                    mat `V0' = e(Sigma)
                                    cap mat `cholV' = cholesky(`V0')
                                    if _rc {
                                            di as error "could not find initial values"
                                            exit 498
                                    }
                                    loc nchol = `end1_ct'*(`end1_ct' + 1) / 2
                                    mat `V0' = J(1, `nchol', 0)
                                    loc m = 1
                                    forv i = 1/`end1_ct' {
                                            forv j = `i'/`end1_ct' {
                                                    mat `V0'[1, `m'] = `cholV'[`i',`j']
                                                    loc m = `m' + 1
                                            }
                                    }
                                    if `end1_ct' == 1 {
                                            mat `bfrom' = `b0', e(b), 0, ln(`V0'[1,1])
                                    }
                                    else {
                                            mat `bfrom' = ///
                                                    `b0', e(b), J(1,`end1_ct', 0), `V0'
                                    }
                                    local init "`bfrom', copy"
                            }
                            else {
                                    local init "`from'"
                            }
                            loc iveqns ""   // Holds IV equations
                            loc covterms ""   // Holds like /s21 /s31 /s32 for cov mat
                            loc testcmd ""  // To give to -test- for exog. test
                            loc i = 1
                            foreach var of varlist `end1' {
                                    loc iveqns "`iveqns' (`var' : `var' = `exog' `inst')"
                                    loc ip1 = `i' + 1
                                    // Only for multiple endog vars:
                                    loc covterms "`covterms' /s`ip1'1"
                                    loc testcmd "`testcmd' [s`ip1'1]_b[_cons]"
                                    loc i = `i' + 1
                            }
                            if `end1_ct' > 1 {
                                    forv j = 1/`end1_ct' {
                                            loc jp1 = `j' + 1
                                            forv i = `j'/`end1_ct' {
                                                    loc ip1 = `i' + 1
                                                    loc covterms "`covterms' /s`ip1'`jp1'"
                                            }
                                    }
                            }
                            else {  // Fix things up for the one endog var model
                                    loc covterms "/athrho /lnsigma "
                                    loc testcmd "[athrho]_b[_cons]"
                                    loc dip diparm(athrho, tanh label("rho"))       ///
                                            diparm(lnsigma, exp label("sigma"))
                            }
                            qui `noi' di as text _n "Fitting full model"
                            // sort so that we can get the cov terms from the
                            // last obs. in dataset in lf
                            tempvar currsort
                            gen `c(obs_t)' `currsort' = _n
                            sort `touse' `currsort'
                            glo IV_NEND = `end1_ct'
                            if `end1_ct' == 1 {
                                    `vv' ///
                                    ml model lf ivprob_1_lf                         ///
                                            (`lhsstr' : `lhs' = `end1' `exog')      ///
                                            `iveqns' `covterms'                     ///
                                            `wgt' if `touse' ,                      ///
                                    title(Probit model with endogenous regressors)  ///
                                            maximize `mlopts' `robust' `clusopt'    ///
                                            search(off) init(`init') `log'          ///
                                            `scoreml' nrtolerance(`nrtolerance')    ///
                                            collinear `dip'
                            }
                            else {                                    
                                    `vv' ///
                                    ml model lf ivprob_lf                           ///
                                            (`lhsstr' : `lhs' = `end1' `exog')      ///
                                            `iveqns' `covterms'                     ///
                                            `wgt' if `touse',                       ///
                                    title(Probit model with endogenous regressors)  ///
                                            maximize `mlopts' `robust' `clusopt'    ///
                                            search(off) init(`init') `log'          ///
                                            `scoreml' nrtolerance(`nrtolerance')    ///
                                            collinear
                            }
            Still, I am missing something: does the following code mean that we are using the end1 estimated by cap sureg, or the original data in the variable?
            Code:
              
             `lhs' = `end1' `exog'



            • #7
              "(I highlighted what I think is the first stage regression in blue)"
              Yes; with a single equation, sureg is equivalent to OLS:

              Code:
              sureg (`end1' = `exog' `inst')
              regress `end1'  `exog' `inst'

              "Still, I am missing something: does the following code mean that we are using the end1 estimated by cap sureg, or the original data in the variable?"
              Unless the local is redefined, which I cannot see that it has been, `end1' is the same as in the sureg command.



              • #8
                Therefore, to return to my original question, interpretation "2" seems to be the right one:
                2. The second stage model includes, instead of the actual values of the variable y2i , an estimation of it from the first stage regression



                • #9
                  I do not see any evidence for #2, either in how the equations are specified in your post #1 or in the code pasted above (taken from ivprobit.ado). As I said, `end1' refers to the original endogenous variables in the sureg command, and these are not redefined subsequently. Therefore, in the ml model command, `end1' still refers to these original variables, unless you can point me to where this local is redefined. You have to remember that maximum likelihood is an estimation technique, not a model, so if you want the theoretical background, read Newey's paper on efficient estimation of limited dependent variable models with endogenous explanatory variables.

                  Reference
                  Newey, W. K. (1987). "Efficient estimation of limited dependent variable models with endogenous explanatory variables." Journal of Econometrics, 36: 231-250.



                  • #10
                    I am grateful for your patient support! Indeed, I could not find any redefinition of `end1'; I was mistaken.

                    Yet, looking at the code, I really can't see how the first-stage regression affects the second-stage one...



                    • #11
                      This is easy to illustrate using Newey's two-step method, where the residuals from the first stage are used as regressors in the second stage.

                      Code:
                      webuse laborsup, clear
                      *IVPROBIT
                      ivprobit fem_work fem_educ kids (other_inc = male_educ), two first
                      *FIRST STAGE REGRESSION
                      sureg (other_inc =  fem_educ kids male_educ)
                      *PREDICT RESIDUALS
                      predict res, r
                      *SECOND STAGE PROBIT
                      probit fem_work other_inc fem_educ kids res
                      For the maximum likelihood procedure, if you look at the code closely, you will see that inputs from the first stage are used to create matrices that are subsequently included in the ml maximization.
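                      The same control-function logic can also be sketched outside Stata on simulated data. Everything below — sample size, parameter values, the hand-rolled probit — is illustrative and not part of ivprobit itself; the point is that with a positive u–v correlation, the first-stage residual enters the second-stage probit with a positive coefficient while y2 itself stays as the original variable:

                      ```python
                      import numpy as np
                      from scipy.optimize import minimize
                      from scipy.stats import norm

                      # Simulate a recursive model: y2 is endogenous because u and v are correlated
                      rng = np.random.default_rng(0)
                      n = 5000
                      x1 = rng.normal(size=n)                      # exogenous regressor
                      z = rng.normal(size=n)                       # excluded instrument
                      v = rng.normal(size=n)                       # first-stage error
                      rho = 0.5
                      u = rho * v + np.sqrt(1 - rho**2) * rng.normal(size=n)  # Corr(u, v) = rho
                      y2 = 0.5 * x1 + 1.0 * z + v                  # reduced form for y2
                      y1 = (1.0 * y2 + 0.5 * x1 + u >= 0).astype(float)  # observed binary outcome

                      # First stage: OLS of y2 on the exogenous variables and the instrument
                      X1 = np.column_stack([np.ones(n), x1, z])
                      pi_hat, *_ = np.linalg.lstsq(X1, y2, rcond=None)
                      vhat = y2 - X1 @ pi_hat                      # first-stage residuals

                      # Second stage: probit of y1 on the ORIGINAL y2, x1, and the residual vhat
                      X2 = np.column_stack([np.ones(n), y2, x1, vhat])

                      def negloglik(b):
                          p = np.clip(norm.cdf(X2 @ b), 1e-10, 1 - 1e-10)
                          return -(y1 * np.log(p) + (1 - y1) * np.log(1 - p)).sum()

                      def grad(b):
                          xb = X2 @ b
                          p = np.clip(norm.cdf(xb), 1e-10, 1 - 1e-10)
                          return -X2.T @ ((y1 - p) * norm.pdf(xb) / (p * (1 - p)))

                      res = minimize(negloglik, np.zeros(4), jac=grad, method="BFGS")
                      b_y2, b_vhat = res.x[1], res.x[3]
                      print(f"coef on y2: {b_y2:.3f}, coef on vhat: {b_vhat:.3f}")
                      ```

                      As with Newey's two-step estimator, these second-stage coefficients are scaled relative to the structural ones (by the residual standard deviation of u given v), which is one reason the reported ivprobit ML output is not simply this probit.
                      
                      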

