I know one method can be:
Code:
regress y x1 x2 i.id, vce(robust)
But the problem is that my data set has a lot of ids, and estimating one dummy per id kills my Stata.
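For orientation, the output that follows runs three commands on nlswork.dta that absorb the id effects instead of estimating one dummy per id. Applied to the variable names from the question (y, x1, x2, id), a minimal sketch could look like this. It assumes you want standard errors clustered on id, as in the examples that follow; that is an assumption, not something the question states.
Code:
* areg: absorbs id without reporting a coefficient per id
areg y x1 x2, absorb(id) vce(cluster id)
* xtreg, fe: within estimator; needs the panel variable declared first
xtset id
xtreg y x1 x2, fe vce(cluster id)
* reghdfe: community-contributed command, must be installed separately
reghdfe y x1 x2, absorb(id) vce(cluster id)
If you only want heteroskedasticity-robust standard errors, as in the original regress call, vce(robust) can be substituted. Note that xtreg, fe treats vce(robust) as clustering on the panel variable, so the results will not match areg with vce(robust).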
. use "https://www.stata-press.com/data/r16/nlswork.dta"
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
. areg ln_wage age, absorb(idcode) vce(cluster idcode)

Linear regression, absorbing indicators         Number of obs     =     28,510
Absorbed variable: idcode                       No. of categories =      4,710
                                                F(   1,   4709)   =     738.02
                                                Prob > F          =     0.0000
                                                R-squared         =     0.6636
                                                Adj R-squared     =     0.5970
                                                Root MSE          =     0.3035

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0181349   .0006675    27.17   0.000     .0168262    .0194436
       _cons |   1.148214   .0193889    59.22   0.000     1.110202    1.186225
------------------------------------------------------------------------------
. xtset idcode year
       panel variable:  idcode (unbalanced)
        time variable:  year, 68 to 88, but with gaps
                delta:  1 unit

. xtreg ln_wage age, fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1026                                         min =          1
     between = 0.0877                                         avg =        6.1
     overall = 0.0774                                         max =         15

                                                F(1,4709)         =     884.05
corr(u_i, Xb)  = 0.0314                         Prob > F          =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0181349   .0006099    29.73   0.000     .0169392    .0193306
       _cons |   1.148214   .0177153    64.81   0.000     1.113483    1.182944
-------------+----------------------------------------------------------------
     sigma_u |  .40635023
     sigma_e |  .30349389
         rho |  .64192015   (fraction of variance due to u_i)
------------------------------------------------------------------------------
. reghdfe ln_wage age, absorb(idcode) vce(cluster idcode)
(dropped 551 singleton observations)
(converged in 1 iterations)

HDFE Linear regression                            Number of obs   =     27,959
Absorbing 1 HDFE group                            F(   1,   4158) =     884.06
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.6540
                                                  Adj R-squared   =     0.5936
                                                  Within R-sq.    =     0.1026
Number of clusters (idcode)  =      4,159         Root MSE        =     0.3035

                             (Std. Err. adjusted for 4,159 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0181349   .0006099    29.73   0.000     .0169391    .0193307
------------------------------------------------------------------------------

Absorbed degrees of freedom:
---------------------------------------------------------------+
 Absorbed FE |  Num. Coefs.  =   Categories  -   Redundant     |
-------------+-------------------------------------------------|
      idcode |            0            4159           4159 *   |
---------------------------------------------------------------+
* = fixed effect nested within cluster; treated as redundant for DoF computation
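A note on the reghdfe run above: it reports 27,959 observations and 4,159 clusters rather than 28,510 and 4,710 because it dropped the 551 singleton observations, as its first message says. If you want the same sample as areg and xtreg, a sketch along these lines may help. It assumes reghdfe and its dependency ftools are installed from SSC and that your installed version supports the keepsingletons option; check help reghdfe to confirm.
Code:
* one-time installation of the community-contributed packages (assumed source: SSC)
ssc install ftools
ssc install reghdfe
* keep singleton groups so the estimation sample matches areg / xtreg
reghdfe ln_wage age, absorb(idcode) vce(cluster idcode) keepsingletons
As far as I know, reghdfe warns when singletons are kept, because they can distort the clustered standard errors, so the default of dropping them is usually the safer choice.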
. use "https://www.stata-press.com/data/r16/nlswork.dta"
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
.
. regress ln_wage age, absorb(idcode) vce(cluster idcode)

Linear regression, absorbing indicators         Number of obs     =     28,510
                                                F(0, 4709)        =          .
                                                Prob > F          =          .
                                                R-squared         =     0.6636
                                                Adj R-squared     =     0.5970
                                                Root MSE          =     .30349

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0181349   .0006675    27.17   0.000     .0168262    .0194436
       _cons |   1.148214   .0193889    59.22   0.000     1.110202    1.186225
------------------------------------------------------------------------------
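Across all of the runs above the coefficient on age is identical (.0181349); only the standard errors differ (.0006675 from areg and regress with absorb(), .0006099 from xtreg and reghdfe). My understanding is that this comes from the degrees-of-freedom adjustment: areg counts the absorbed indicators as estimated parameters, while xtreg, fe and reghdfe do not when the fixed effects are nested within the clusters, which is what the reghdfe footnote describes. To see the two sets of results side by side, something like the following should work; the stored names fe_areg and fe_xtreg are just illustrative labels.
Code:
use "https://www.stata-press.com/data/r16/nlswork.dta", clear
xtset idcode year
* fit both models quietly and store the results
quietly areg ln_wage age, absorb(idcode) vce(cluster idcode)
estimates store fe_areg
quietly xtreg ln_wage age, fe vce(cluster idcode)
estimates store fe_xtreg
* coefficients and standard errors in adjacent columns, plus sample size
estimates table fe_areg fe_xtreg, b(%9.7f) se(%9.7f) stats(N)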