Dear Stata users and professionals!
I'm currently working on my Master's thesis and have run into a question that I find interesting.
A brief explanation of my sample and research:
I'm writing about the choice of a zero-debt capital structure among firms from emerging countries. The dependent variable is dichotomous (it takes 1 if the firm has no debt and 0 otherwise). I also have a fairly large set of independent variables (around 10-15, depending on the model). The time frame is 2010 to 2015. The total number of observations is 24149 (27150 including the lags for 2010).
As the first step of my research I use a logit model:
xtset id year   // id = firm identifier
xtlogit depvar indepvar1 L.indepvar2 L.indepvar3 (so on) i.coutntry i.industry, vce(cluster id)
margins, dydx(indepvar1 L.indepvar2 L.indepvar3 (so on)) nose
For the second step I need to control for endogeneity in the model. Based on several articles, I have identified an interdependency between payout policy and capital structure choice. I found the Two-Stage Probit Least Squares (2SPLS) model, implemented in the cdsimeq command (http://www.stata-journal.com/article...article=st0038), but this model is not designed for panel data. I've come up with two ways to deal with this issue, and I'd like to hear your comments. The endogenous variable is Dividends (continuous) and the main dependent variable is ZD (dichotomous). Here are the equations (non-recursive):
Dividends = b0+b1ZD+b2....+b(n)+e (1 - 1st stage)
ZD=a0+a1Dividends+a2...+a(n)+e (2 - 2nd stage)
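Written out more formally, and assuming the latent-variable formulation in which two-stage probit least squares is usually presented, the system above would look roughly as follows. The subscripts i (firm) and t (year), the latent index ZD*, the exogenous regressors x_1 and x_2, and the separate error terms e_1 and e_2 are notation introduced here for illustration only; they are not taken from the cdsimeq article.

```latex
\begin{aligned}
\text{Dividends}_{it} &= b_0 + b_1\, ZD^{*}_{it} + \sum_{k=2}^{n} b_k\, x_{1k,it} + e_{1,it} \\
ZD^{*}_{it} &= a_0 + a_1\, \text{Dividends}_{it} + \sum_{k=2}^{n} a_k\, x_{2k,it} + e_{2,it},
\qquad ZD_{it} = \mathbf{1}\!\left[ ZD^{*}_{it} > 0 \right]
\end{aligned}
```

For the two equations to be identified, each needs at least one exogenous variable that is excluded from the other, which is why the regressor lists in the two equations differ.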
1a: Here is the script for the 2SPLS model:
cdsimeq (Dividends exog3 exog2 exog1 exog4 i.year i.coutntry i.industry) ( ZD exog1 exog2 exog5 exog6 exog7 i.year i.coutntry i.industry)
The margins command doesn't work after it.
I included i.year to control, at least to some extent, for the panel structure of the data.
1b: Manual two-stage least squares (taken from http://www.stata.com/support/faqs/st...es-regression/, with some adjustments for my purpose):
xtset id year
xtreg Dividends indepvar1 L.indepvar2 L.indepvar3 (so on) i.coutntry i.industry, vce(cluster id)
predict double Dividends_hat
xtlogit ZD Dividends_hat indepvar1 L.indepvar2 L.indepvar3 (so on) i.coutntry i.industry, vce(cluster id)
Correction of the standard errors:
rename Dividends_hat Dividends_hold
rename Dividends Dividends_hat
predict double res, residual
rename Dividends_hat Dividends /* put back real y2 */
rename Dividends_hold Dividends_hat
replace res = res^2
sum res
scalar realmse = r(mean)*r(N)/e(df_r) /* much ado about small sample */
matrix bmatrix = e(b)
matrix Vmatrix = e(V)
matrix Vmatrix = e(V) * realmse / e(rmse)^2
ereturn post bmatrix Vmatrix, noclear
ereturn display
Here is my question: which of the two approaches is more appropriate, and does either of them actually help me analyse the panel data?
Thank you very much in advance for your thoughts and comments.
Best regards, Denis