Dear Statalisters, I would appreciate your opinion on the issue below.
I have around 1 million observations from a few thousand manufacturers. My outcome variable is production quantity, and a large share of its values (about 20%) are zeros.
When I run ppmlhdfe and xtpoisson to model this variable, I get AIC figures in the billions (!) and a high pseudo R2. I have never come across AIC and BIC figures this large. When I rescale the outcome variable by dividing it by 1,000, the AIC/BIC figures look more “usual”. However, the outcome variable contains values smaller than 1,000, which become fractions after rescaling, so I am concerned that this transformation might affect its count nature.
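For reference, the rescaling described above can be sketched as follows. The variable name dv1_k is hypothetical, and the sketch assumes that only the outcome is rescaled while the lagged regressor l1.dv1 stays in its original units. The d option from my original call is left out here.

```stata
* Rescale the outcome by 1,000 (dv1_k is a hypothetical name)
generate double dv1_k = dv1/1000

* Same specification as the original ppmlhdfe run, with the rescaled outcome
ppmlhdfe dv1_k l1.dv1 l1.z_cv2 l1.z_cv3 l1.z_cv4 l1.z_cv5, a(eproducer year) cluster(eproducer) exp(lagcv1)

* Information criteria for the rescaled model
estat ic
```

Comparing the slope coefficients from this run with those from the original run is one way to see whether the rescaling changes anything beyond the likelihood-based statistics.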
On the other hand, when I run xtnbreg, I get lower AIC and BIC figures, although they are still in the millions. However, xtnbreg has been sensitive to model specification and often fails to converge when I add or remove predictors.
Should I be concerned about using fixed-effects Poisson for this variable, given the large AIC figures? Does overdispersion make xtnbreg a better choice? In any case, ΔAIC across models is in the millions (always six zeros), so I am also not sure how to present the results.
Code:
. sum dv1
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         dv1 |  1,128,675    17211.52    223438.7          0   1.35e+07
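As a rough, unconditional check of the overdispersion I ask about, the variance-to-mean ratio can be computed from the results that summarize stores. This is only a sketch: it ignores the fixed effects and covariates, so it is not a formal test.

```stata
* Unconditional variance-to-mean ratio of the outcome.
* A Poisson variable has a ratio of 1; much larger values point to overdispersion.
quietly summarize dv1
display "variance/mean = " %12.1f r(Var)/r(mean)
```

With the mean and standard deviation shown above, this ratio is in the millions.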
Code:
. ppmlhdfe dv1 l1.dv1 l1.z_cv2 l1.z_cv3 l1.z_cv4 l1.z_cv5 ,a(eproducer year) cluster(eproducer) exp(lagcv1) d
(dropped 51273 observations that are either singletons or separated by a fixed effect)
Iteration 1: deviance = 1.2802e+10 eps = . iters = 6 tol = 1.0e-04 min(eta) = -9.11 P
Iteration 2: deviance = 5.7570e+09 eps = 1.22e+00 iters = 6 tol = 1.0e-04 min(eta) = -9.70
Iteration 3: deviance = 4.3877e+09 eps = 3.12e-01 iters = 5 tol = 1.0e-04 min(eta) = -10.07
Iteration 4: deviance = 4.0823e+09 eps = 7.48e-02 iters = 5 tol = 1.0e-04 min(eta) = -10.59
Iteration 5: deviance = 4.0171e+09 eps = 1.63e-02 iters = 4 tol = 1.0e-04 min(eta) = -11.49
Iteration 6: deviance = 4.0040e+09 eps = 3.25e-03 iters = 3 tol = 1.0e-04 min(eta) = -12.30
Iteration 7: deviance = 4.0015e+09 eps = 6.25e-04 iters = 2 tol = 1.0e-04 min(eta) = -12.96
Iteration 8: deviance = 4.0011e+09 eps = 1.15e-04 iters = 2 tol = 1.0e-04 min(eta) = -13.96
Iteration 9: deviance = 4.0010e+09 eps = 2.10e-05 iters = 2 tol = 1.0e-04 min(eta) = -14.96
Iteration 10: deviance = 4.0010e+09 eps = 4.28e-06 iters = 2 tol = 1.0e-05 min(eta) = -15.94
Iteration 11: deviance = 4.0010e+09 eps = 9.03e-07 iters = 2 tol = 1.0e-06 min(eta) = -16.91 S
Iteration 12: deviance = 4.0010e+09 eps = 1.88e-07 iters = 2 tol = 1.0e-07 min(eta) = -17.82 S
Iteration 13: deviance = 4.0010e+09 eps = 3.81e-08 iters = 2 tol = 1.0e-07 min(eta) = -18.61 S
Iteration 14: deviance = 4.0010e+09 eps = 6.80e-09 iters = 2 tol = 1.0e-08 min(eta) = -19.14 S O
------------------------------------------------------------------------------------------------------------
(legend: p: exact partial-out s: exact solver h: step-halving o: epsilon below tolerance)
Converged in 14 iterations and 45 HDFE sub-iterations (tol = 1.0e-08)
HDFE PPML regression No. of obs = 942,909
Absorbing 2 HDFE groups Residual df = 4,129
Statistics robust to heteroskedasticity Wald chi2(5) = 84.28
Deviance = 4000971161 Prob > chi2 = 0.0000
Log pseudolikelihood = -2003530151 Pseudo R2 = 0.9694
Number of clusters (eproducer)= 4,130
(Std. err. adjusted for 4,130 clusters in eproducer)
------------------------------------------------------------------------------
| Robust
dv1 | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
dv1 |
L1. | 1.70e-07 2.86e-08 5.95 0.000 1.14e-07 2.26e-07
|
z_cv2 |
L1. | .1590041 .2122519 0.75 0.454 -.257002 .5750101
|
z_cv3 |
L1. | -.0043667 .0082502 -0.53 0.597 -.0205368 .0118033
|
z_cv4 |
L1. | -.2516427 .1229716 -2.05 0.041 -.4926626 -.0106228
|
z_cv5 |
L1. | .0085431 .0072125 1.18 0.236 -.0055932 .0226794
|
_cons | 5.650281 .1897047 29.78 0.000 5.278467 6.022095
ln(lagcv1) | 1 (exposure)
------------------------------------------------------------------------------
Absorbed degrees of freedom:
-----------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
-------------+---------------------------------------|
eproducer | 4130 4130 0 *|
year | 46 0 46 |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
. estat ic
Akaike's information criterion and Bayesian information criterion
-----------------------------------------------------------------------------
       Model |          N   ll(null)  ll(model)      df        AIC        BIC
-------------+---------------------------------------------------------------
           . |    942,909  -6.55e+10  -2.00e+09       6   4.01e+09   4.01e+09
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] BIC note.
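To show where the AIC figure comes from, here is the arithmetic behind estat ic, using the standard definitions AIC = -2*ll + 2*k and BIC = -2*ll + k*ln(N). The log pseudolikelihood, df, and N are copied from the ppmlhdfe output above.

```stata
* AIC = -2*ll + 2*k, with ll = -2003530151 and k = 6
display "AIC = " %15.0f -2*(-2003530151) + 2*6

* BIC = -2*ll + k*ln(N), with N = 942,909
display "BIC = " %15.0f -2*(-2003530151) + 6*ln(942909)
```

The first line gives 4007060314, which is the 4.01e+09 reported by estat ic. The figure is driven almost entirely by the size of the log pseudolikelihood; the penalty term is negligible by comparison.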
Code:
. xtnbreg dv1 l1.dv1 l1.z_cv2 l1.z_cv3 l1.z_cv4 l1.z_cv5 i.year ,fe exp(lagcv1)
note: 135 groups (135 obs) dropped because of only one obs per group
note: 629 groups (49921 obs) dropped because of all zero outcomes
Iteration 0: log likelihood = -1.225e+09 (not concave)
Iteration 1: log likelihood = -4.990e+08
Iteration 2: log likelihood = -8027179.1
Iteration 3: log likelihood = -7995697.9 (not concave)
Iteration 4: log likelihood = -7517186.9
Iteration 5: log likelihood = -7057044.2 (backed up)
Iteration 6: log likelihood = -6798324.5
Iteration 7: log likelihood = -6734128
Iteration 8: log likelihood = -6504148.3
Iteration 9: log likelihood = -6355948.2
Iteration 10: log likelihood = -6330436.9
Iteration 11: log likelihood = -6328037
Iteration 12: log likelihood = -6327554.1
Iteration 13: log likelihood = -6327459.7
Iteration 14: log likelihood = -6327439.4
Iteration 15: log likelihood = -6327434.6
Iteration 16: log likelihood = -6327433.5
Iteration 17: log likelihood = -6327433.2
Iteration 18: log likelihood = -6327433.2
Iteration 19: log likelihood = -6327433.2
Iteration 20: log likelihood = -6327433.2
Iteration 21: log likelihood = -6327433.2
Iteration 22: log likelihood = -6327433.2
Iteration 23: log likelihood = -6327433.2
Conditional FE negative binomial regression Number of obs = 944,126
Group variable: eproducer Number of groups = 4,130
Obs per group:
min = 2
avg = 228.6
max = 553
Wald chi2(51) = 1033911.20
Log likelihood = -6327433.2 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
dv1 | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
dv1 |
L1. | 1.11e-07 5.15e-10 214.94 0.000 1.10e-07 1.12e-07
|
z_cv2 |
L1. | -.6988149 .0015773 -443.05 0.000 -.7019063 -.6957235
|
z_cv3 |
L1. | -.0279359 .0004649 -60.09 0.000 -.0288471 -.0270246
|
z_cv4 |
L1. | .4380268 .0156258 28.03 0.000 .4074009 .4686528
|
z_cv5 |
L1. | .016793 .0038206 4.40 0.000 .0093048 .0242811
|
year |
|
_cons | -1.826474 .0291059 -62.75 0.000 -1.88352 -1.769427
ln(lagcv1) | 1 (exposure)
------------------------------------------------------------------------------
. estat ic
Akaike's information criterion and Bayesian information criterion
-----------------------------------------------------------------------------
       Model |          N   ll(null)  ll(model)      df        AIC        BIC
-------------+---------------------------------------------------------------
           . |    944,126          .   -6327433      52   1.27e+07   1.27e+07
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] BIC note.
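The xtnbreg figure can be reproduced with the usual formula AIC = -2*ll + 2*k, using the log likelihood and df reported above. This is only a sketch of the arithmetic, not a claim that the two AIC values are comparable: the ppmlhdfe and xtnbreg runs use different estimation samples (942,909 versus 944,126 observations) and different likelihoods.

```stata
* AIC for the xtnbreg model: ll = -6327433.2, k = 52
display "AIC (xtnbreg) = " %15.1f -2*(-6327433.2) + 2*52

* Difference in estimation sample size between the two runs
display "N difference  = " 944126 - 942909
```

The first line gives 12654970.4, which matches the 1.27e+07 in the table.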
Thank you!
