2 Stage Probit Least Squares (2SPLS) vs. Two-stage least-squares regression for panel data

Denis Viktorovich

Join Date: Apr 2017

Posts: 7
#1

26 Apr 2017, 02:35

Dear Stata users and professionals!

I'm currently working on my Master's thesis and have run into a question that I find interesting.
Here is a brief description of my sample and research:

I'm writing about the choice of a zero-debt capital structure among firms from emerging countries. The dependent variable is dichotomous (1 if the firm has no debt, 0 otherwise). I also have a fairly large set of independent variables (around 10-15, depending on the model). The time frame is 2010 to 2015. The total number of observations is 24149 (27150 including the lags for 2010).
For the first step of my research I use a logit model:
xtset id year /* id identifies the firm */
xtlogit depvar indepvar1 L.indepvar2 L.indepvar3 (so on) i.coutntry i.industry, vce(cluster id)
margins, dydx(indepvar1 L.indepvar2 L.indepvar3 (so on)) nose

For the second step I need to control for endogeneity in the model. Based on several articles, I've concluded that payout policy and the capital structure choice are interdependent. I found the 2 Stage Probit Least Squares model (http://www.stata-journal.com/article...article=st0038), implemented in the cdsimeq command, but this model isn't designed for panel data. I've come up with two ways to deal with this issue, and I'd like to hear your comments. The endogenous variable is Dividends (continuous) and the main dependent variable is ZD (dichotomous). Here are the equations (non-recursive):
Dividends = b0+b1ZD+b2....+b(n)+e (1 - 1st stage)
ZD=a0+a1Dividends+a2...+a(n)+e (2 - 2nd stage)
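
Written out more formally, the two-equation system above might look as follows. This is only a sketch: the subscripts i (firm) and t (year), the regressor vectors x_{1,it} and x_{2,it}, and the latent index ZD* are notation assumed here for illustration, not taken from the post.

```latex
\begin{align}
\text{Dividends}_{it} &= b_0 + b_1\, ZD_{it} + x_{1,it}'\beta + e_{1,it} \tag{1} \\
ZD_{it}^{*} &= a_0 + a_1\, \text{Dividends}_{it} + x_{2,it}'\alpha + e_{2,it},
\qquad ZD_{it} = \mathbf{1}\!\left[ ZD_{it}^{*} > 0 \right] \tag{2}
\end{align}
```

In a system like this, estimating a_1 generally requires at least one variable in x_1 that is excluded from x_2, and estimating b_1 requires at least one variable in x_2 that is excluded from x_1.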

1a: Here is the script for the 2SPLS model
cdsimeq (Dividends exog3 exog2 exog1 exog4 i.year i.coutntry i.industry) ( ZD exog1 exog2 exog5 exog6 exog7 i.year i.coutntry i.industry)
The margins command doesn't work after it.
I included i.year to control, at least roughly, for the panel structure of the data.

1b: The manual 2 stage least squares (I've taken it from http://www.stata.com/support/faqs/st...es-regression/ with some corrections for my purposes)

xtset id year
xtreg Dividends indepvar1 L.indepvar2 L.indepvar3 (so on) i.coutntry i.industry, vce(cluster id)
predict double Dividends_hat
xtlogit ZD Dividends_hat indepvar1 L.indepvar2 L.indepvar3 (so on) i.coutntry i.industry, vce(cluster id)
Correction of the standard errors:
rename Dividends_hat Dividends_hold
rename Dividends Dividends_hat
predict double res, residual
rename Dividends_hat Dividends /* put back real y2 */
rename Dividends_hold Dividends_hat
replace res = res^2
sum res
scalar realmse = r(mean)*r(N)/e(df_r) /* much ado about small sample */
matrix bmatrix = e(b)
matrix Vmatrix = e(V)
matrix Vmatrix = e(V) * realmse / e(rmse)^2
ereturn post bmatrix Vmatrix, noclear
ereturn display
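
For reference, the rescaling performed by the lines above can be written as follows. This is a sketch of what the code computes, assuming a linear second stage as in the Stata FAQ it was adapted from; the symbols are introduced here for illustration. The residuals \hat u_i are those computed with the actual Dividends in place of Dividends_hat, df_r stands for e(df_r), and \hat\sigma_{\text{2nd stage}} stands for e(rmse).

```latex
\begin{align}
\hat\sigma^2_{\text{real}} &= \frac{1}{df_r} \sum_{i=1}^{N} \hat u_i^{\,2}, \\
\hat V_{\text{corrected}} &= \hat V_{\text{2nd stage}} \cdot
  \frac{\hat\sigma^2_{\text{real}}}{\hat\sigma^2_{\text{2nd stage}}}
\end{align}
```

The calculation relies on e(rmse) and e(df_r), which linear regression commands store. It would be worth checking with ereturn list whether they are available after xtlogit before relying on the corrected matrix.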

Here is the question: which of the two approaches is more correct, and does either of them help me analyse the panel data properly?

Thank you very much in advance for your thoughts and comments.
Best regards, Denis

Last edited by Denis Viktorovich; 26 Apr 2017, 03:04. Reason: Add details
Tags: 2SLS, 2spls, xtlogit, xtregress
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#2

26 Apr 2017, 12:02

Dear Denis,

If I understand correctly, your second method is based on a "forbidden regression" and therefore invalid. Your first method may suffer from an incidental parameters problem because you have a lot of fixed effects, but others in this forum may be able to provide more advice.

Best wishes,

Joao
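
As a point of comparison for the plug-in approach in 1b, here is a minimal sketch of Stata's ivprobit, which fits a probit model with a continuous endogenous regressor without manually substituting fitted values into a nonlinear second stage. The assumptions here are not from the thread: that exog3 and exog4 (which appear only in the Dividends equation of the cdsimeq call in #1) are valid excluded instruments, and that a pooled probit with firm-clustered standard errors is an acceptable stand-in for a panel estimator.

```stata
* Pooled IV probit: ZD is binary, Dividends is the continuous endogenous regressor.
* exog3 and exog4 are ASSUMED (not verified) to be valid excluded instruments.
* Variable names follow the cdsimeq call in post #1.
ivprobit ZD exog1 exog2 exog5 exog6 exog7 i.year (Dividends = exog3 exog4), vce(cluster id)
```

This does not model the reverse effect of ZD on Dividends, and it does not include firm effects, so it addresses only the endogeneity of Dividends in the ZD equation.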
Denis Viktorovich

Join Date: Apr 2017

Posts: 7
#3

26 Apr 2017, 15:09

Originally posted by Joao Santos Silva

Dear Denis,

If I understand correctly, your second method is based on a "forbidden regression" and therefore invalid. Your first method may suffer from an incidental parameters problem because you have a lot of fixed effects, but others in this forum may be able to provide more advice.

Best wishes,

Joao

Dear Joao,

I know about the "forbidden regression", but I thought it applies only when the first-stage depvar is dichotomous: you predict the residuals and then use them in a second stage with a continuous depvar.
In my case, method (1b) is a bit different:
=> 1st stage: Dividends (continuous variable) on all the other explanatory variables
=> predict residuals
=> 2nd stage: ZD (dichotomous variable) on its own explanatory variables and the previously predicted residuals.

If this is also a "forbidden regression", then I agree with you that the method is not appropriate.
But from the Stata forum and other web resources I understood that the "forbidden regression" arises only when you use predicted residuals from an equation with a dichotomous (dummy) depvar in an equation with a continuous depvar.
I'll edit my post regarding 1b to make things clearer.

Thank you very much for your comment.
I appreciate your help!

Best regards, Denys

Last edited by Denis Viktorovich; 26 Apr 2017, 15:37.
Denis Viktorovich

Join Date: Apr 2017

Posts: 7
#4

26 Apr 2017, 15:23

Here are some small corrections:

1b: The manual 2 stage least squares (I've taken it from http://www.stata.com/support/faqs/st...es-regression/ with some corrections for panel data)

xtset id year
1st stage (Dividends is a continuous variable):
xtreg Dividends indepvar1 L.indepvar2 L.indepvar3 (so on) i.coutntry i.industry, vce(cluster id)
predict double Dividends_hat
2nd stage (ZD is a dichotomous variable):
xtlogit ZD Dividends_hat indepvar1 L.indepvar2 L.indepvar3 (so on) i.coutntry i.industry, vce(cluster id)
Correction of the standard errors:
rename Dividends_hat Dividends_hold
rename Dividends Dividends_hat
predict double res, residual
rename Dividends_hat Dividends /* put back real y2 */
rename Dividends_hold Dividends_hat
replace res = res^2
sum res
scalar realmse = r(mean)*r(N)/e(df_r) /* much ado about small sample */
matrix bmatrix = e(b)
matrix Vmatrix = e(V)
matrix Vmatrix = e(V) * realmse / e(rmse)^2
ereturn post bmatrix Vmatrix, noclear
ereturn display

Thank you very much in advance for your thoughts and comments.
Best regards, Denis
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#5

26 Apr 2017, 16:07

Denis,

See here.

Joao