Dear Statalist,
I am using Stata 14. I have unbalanced panel data with T = 17 and N = 18. I ran OLS, FE, and RE and concluded, using a robust Hausman test, that FE with clustered standard errors is the best of the three models. I suspect there is some multicollinearity, but most of the papers I have read do not test for it, and I have also read here that the problem is oversold. I tested for time fixed effects using testparm and found them jointly insignificant.
From the literature, there is most likely reverse causality involving two of my control variables (z1 and z2) and my independent variable of interest, for which I use three proxies in separate runs (M1, M2, and M3). Most papers use the two-step GMM technique to account for this problem, and their results do not change much, or even improve. Below are the results of the clustered FE model. I clustered because of serial correlation and heteroskedasticity.
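For concreteness, a minimal sketch of the kind of time-effects test meant here. It assumes the time variable is called time (as the xtabond2 output further down reports) and that the interaction term is named m1_z2 as in the output tables. It is not necessarily the exact set of commands that was run.

```stata
* Sketch only: joint test of year dummies in the clustered FE model.
* Assumes the panel is already declared, e.g. xtset country time.
xtreg lny lnz1 x1 x2 x3 x4 x5 m1 m1sq z2 x6 x7 m1_z2 i.time, fe cluster(country)

* H0: all time dummies are jointly zero
testparm i.time
```

With only 18 clusters, the cluster-robust variance matrix has limited rank, so a joint test of many time dummies should be read with some caution.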
xtreg lny lnz1 x1 x2 x3 x4 x5 m1 m1sq z2 x6 x7 m1_z2, fe cluster(country)
Fixed-effects (within) regression               Number of obs     =        190
Group variable: country                         Number of groups  =         18

R-sq:                                           Obs per group:
     within  = 0.7369                                         min =          1
     between = 0.0005                                         avg =       10.6
     overall = 0.1111                                         max =         17

                                                F(12,17)          =      99.23
corr(u_i, Xb)  = -0.6006                        Prob > F          =     0.0000

                               (Std. Err. adjusted for 18 clusters in country)
------------------------------------------------------------------------------
             |               Robust
         lny |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnz1 |    .030846   .0132167     2.33   0.032     .0029613    .0587307
          x1 |   .0006796   .0004414     1.54   0.142    -.0002518    .0016109
          x2 |  -.0026081   .0009207    -2.83   0.011    -.0045505   -.0006657
          x3 |  -.0156356   .0083621    -1.87   0.079     -.033278    .0020069
          x4 |   -.000387   .0001654    -2.34   0.032     -.000736   -.0000379
          x5 |  -.0000584   .0002823    -0.21   0.839    -.0006539    .0005371
          m1 |   .0722394   .0158132     4.57   0.000     .0388764    .1056023
        m1sq |  -.0151598   .0036502    -4.15   0.001    -.0228609   -.0074586
          z2 |  -.0636088   .0188611    -3.37   0.004    -.1034023   -.0238153
          x6 |   .0482388   .0224125     2.15   0.046     .0009526    .0955249
          x7 |   .0003355   .0002021     1.66   0.115    -.0000909    .0007619
       m1_z2 |   .0191029   .0070425     2.71   0.015     .0042446    .0339612
       _cons |   2.845722   .3811176     7.47   0.000     2.041634     3.64981
-------------+----------------------------------------------------------------
     sigma_u |  .18251934
     sigma_e |  .00809359
         rho |  .99803749   (fraction of variance due to u_i)
------------------------------------------------------------------------------
.
end of do-file
.
However, when I run GMM, the results change dramatically: most, if not all, of the variables lose significance, although the model appears valid according to the AR(2), Sargan, and Hansen tests, as follows:
xtabond2 lny l.lny lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7, ///
    gmm(l.lny, lag(1 3) collapse) iv(lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7) twostep cluster(country) nodiffsargan
Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: country                         Number of obs      =       177
Time variable : time                            Number of groups   =        17
Number of instruments = 17                      Obs per group: min =         1
Wald chi2(13) =  1.40e+10                                      avg =     10.41
Prob > chi2   =     0.000                                      max =        16
                                (Std. Err. adjusted for clustering on country)
------------------------------------------------------------------------------
             |              Corrected
         lny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lny |
         L1. |   1.041343   .0277953    37.46   0.000     .9868654    1.095821
             |
        lnz1 |   .0005234   .0033601     0.16   0.876    -.0060623     .007109
          x1 |   .0000164   .0000536     0.31   0.760    -.0000887    .0001214
          x2 |   .0000259   .0001718     0.15   0.880    -.0003108    .0003625
          x3 |   .0002729   .0005714     0.48   0.633    -.0008469    .0013927
          x4 |  -.0000124   .0000428    -0.29   0.772    -.0000962    .0000714
          x5 |  -.0001993   .0002734    -0.73   0.466    -.0007351    .0003365
          m1 |   .0052374   .0120197     0.44   0.663    -.0183208    .0287955
        m1sq |  -.0018499   .0028094    -0.66   0.510    -.0073563    .0036564
       m1_z2 |  -.0002276   .0034992    -0.07   0.948    -.0070859    .0066308
          z2 |  -.0019458   .0062422    -0.31   0.755    -.0141803    .0102886
          x6 |   -.002183   .0010165    -2.15   0.032    -.0041753   -.0001908
          x7 |   .0000406   .0000986     0.41   0.680    -.0001525    .0002338
       _cons |  -.1274088   .1413448    -0.90   0.367    -.4044396     .149622
------------------------------------------------------------------------------
Instruments for first differences equation
  Standard
    D.(lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7)
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(1/3).L.lny collapsed
Instruments for levels equation
  Standard
    lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    D.L.lngini collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -3.09  Pr > z =  0.002
Arellano-Bond test for AR(2) in first differences: z =  -0.89  Pr > z =  0.375
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(3)    =   0.53  Prob > chi2 =  0.913
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(3)    =   0.49  Prob > chi2 =  0.921
  (Robust, but weakened by many instruments.)
.
end of do-file
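As a hand check on the output above, the reported instrument count and the degrees of freedom of the overidentification tests can be reproduced by simple counting. This is only an arithmetic sketch. The local macro names are made up for illustration, and the counts are read off the "Instruments for ..." listing in the output.

```stata
* Hand count of instruments for the collapsed system GMM run above.
local gmm_diff = 3     // L(1/3).L.lny, collapsed: one column per lag
local gmm_lev  = 1     // lagged difference of the dependent variable, collapsed
local iv_std   = 12    // the 12 regressors listed in iv()
local cons     = 1     // _cons
local n_instr  = `gmm_diff' + `gmm_lev' + `iv_std' + `cons'

* Parameters estimated: L.lny + 12 regressors + _cons
local n_param  = 14

display "instruments = `n_instr'"
display "overid df   = " `n_instr' - `n_param'
```

This gives 17 instruments and 3 degrees of freedom, matching "Number of instruments = 17" and the chi2(3) reported for the Sargan and Hansen tests. Only the three extra moment conditions from the lags of lny are actually being tested.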
So what is wrong with my command, please? I tried putting the endogenous variables in the gmm-style part, but unfortunately it makes little difference to the significance problem. I also tried changing the number of lags, but that did not help much either. I also have other questions:
2- Is it necessary for the coefficient on L.lny to be less than 1, as I have read? Does it really mean that the model is unstable if it is not?
3- I read that it is better to add "collapse" to reduce the number of instruments. When I do not, the number of instruments becomes very high and the Sargan and Hansen p-values become 1, indicating an instrument proliferation problem. So is it correct to add it?
Any advice is much appreciated!
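On question 2, one thing that can be checked directly is whether the estimate is statistically distinguishable from 1. The reported 95% interval for L.lny (.9868654 to 1.095821) includes 1, so the data do not clearly separate a coefficient just below 1 from one at or above it. A minimal sketch, assuming it is run immediately after the xtabond2 command above:

```stata
* Run right after the xtabond2 estimation, while its results are in memory.
* H0: the coefficient on lagged lny equals 1
test L.lny = 1

* Same hypothesis, shown as an estimate with a confidence interval
lincom L.lny - 1
```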
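For reference, here is a sketch of what moving the endogenous variables into the gmm-style part could look like. Which variables are treated as endogenous (lnz1, z2, m1 and the terms built from them) and the lag range lag(2 3) are assumptions made for illustration, not the exact specification that was tried.

```stata
* Sketch only: endogenous regressors instrumented GMM-style with collapsed lags.
* The variable grouping and lag limits below are illustrative assumptions.
xtabond2 lny l.lny lnz1 x1 x2 x3 x4 x5 m1 m1sq m1_z2 z2 x6 x7, ///
    gmm(l.lny, lag(1 3) collapse)                              ///
    gmm(lnz1 z2 m1 m1sq m1_z2, lag(2 3) collapse)              ///
    iv(x1 x2 x3 x4 x5 x6 x7)                                   ///
    twostep cluster(country) nodiffsargan
```

Even with collapse, a specification like this would use roughly 27 instruments, well above the 17 groups in the estimation sample, so the instrument count needs watching.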