
  • Extremely large AIC & BIC figures (xtpoisson, ppmlhdfe, xtnbreg)

    Dear Statalisters, I would appreciate your opinion on the issue below.

    I have around 1 million observations on a few thousand manufacturers. The outcome variable is production quantity, which has a large share of zeroes (about 20%).

    Code:
    . sum dv1
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
             dv1 |  1,128,675    17211.52    223438.7          0   1.35e+07
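
    The zero share quoted above can be checked directly (a minimal sketch; it assumes dv1 has no missing values):

    Code:
    * count the zero outcomes and express them as a share of all observations
    . count if dv1 == 0
    . display r(N)/_N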
    When I run ppmlhdfe and xtpoisson to predict this variable, I get AIC figures in the billions (!) and a very high pseudo R2. I have never come across AIC and BIC figures this large. When I rescale the outcome variable by dividing it by 1,000, the AIC/BIC figures become more “usual”. However, the outcome variable contains values smaller than 1,000, so the rescaled variable is no longer integer-valued, and I am concerned this might affect its count nature.

    Code:
    . ppmlhdfe dv1 l1.dv1   l1.z_cv2 l1.z_cv3 l1.z_cv4 l1.z_cv5 ,a(eproducer year) cluster(eproducer) exp(lagcv1) d
    (dropped 51273 observations that are either singletons or separated by a fixed effect)
    Iteration 1:   deviance = 1.2802e+10  eps = .         iters = 6    tol = 1.0e-04  min(eta) =  -9.11  P   
    Iteration 2:   deviance = 5.7570e+09  eps = 1.22e+00  iters = 6    tol = 1.0e-04  min(eta) =  -9.70      
    Iteration 3:   deviance = 4.3877e+09  eps = 3.12e-01  iters = 5    tol = 1.0e-04  min(eta) = -10.07      
    Iteration 4:   deviance = 4.0823e+09  eps = 7.48e-02  iters = 5    tol = 1.0e-04  min(eta) = -10.59      
    Iteration 5:   deviance = 4.0171e+09  eps = 1.63e-02  iters = 4    tol = 1.0e-04  min(eta) = -11.49      
    Iteration 6:   deviance = 4.0040e+09  eps = 3.25e-03  iters = 3    tol = 1.0e-04  min(eta) = -12.30      
    Iteration 7:   deviance = 4.0015e+09  eps = 6.25e-04  iters = 2    tol = 1.0e-04  min(eta) = -12.96      
    Iteration 8:   deviance = 4.0011e+09  eps = 1.15e-04  iters = 2    tol = 1.0e-04  min(eta) = -13.96      
    Iteration 9:   deviance = 4.0010e+09  eps = 2.10e-05  iters = 2    tol = 1.0e-04  min(eta) = -14.96      
    Iteration 10:  deviance = 4.0010e+09  eps = 4.28e-06  iters = 2    tol = 1.0e-05  min(eta) = -15.94      
    Iteration 11:  deviance = 4.0010e+09  eps = 9.03e-07  iters = 2    tol = 1.0e-06  min(eta) = -16.91   S  
    Iteration 12:  deviance = 4.0010e+09  eps = 1.88e-07  iters = 2    tol = 1.0e-07  min(eta) = -17.82   S  
    Iteration 13:  deviance = 4.0010e+09  eps = 3.81e-08  iters = 2    tol = 1.0e-07  min(eta) = -18.61   S  
    Iteration 14:  deviance = 4.0010e+09  eps = 6.80e-09  iters = 2    tol = 1.0e-08  min(eta) = -19.14   S O
    ------------------------------------------------------------------------------------------------------------
    (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
    Converged in 14 iterations and 45 HDFE sub-iterations (tol = 1.0e-08)
    
    HDFE PPML regression                              No. of obs      =    942,909
    Absorbing 2 HDFE groups                           Residual df     =      4,129
    Statistics robust to heteroskedasticity           Wald chi2(5)    =      84.28
    Deviance             =   4000971161               Prob > chi2     =     0.0000
    Log pseudolikelihood =  -2003530151               Pseudo R2       =     0.9694
    
    Number of clusters (eproducer)=     4,130
                              (Std. err. adjusted for 4,130 clusters in eproducer)
    ------------------------------------------------------------------------------
                 |               Robust
             dv1 | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             dv1 |
             L1. |   1.70e-07   2.86e-08     5.95   0.000     1.14e-07    2.26e-07
                 |
           z_cv2 |
             L1. |   .1590041   .2122519     0.75   0.454     -.257002    .5750101
                 |
           z_cv3 |
             L1. |  -.0043667   .0082502    -0.53   0.597    -.0205368    .0118033
                 |
           z_cv4 |
             L1. |  -.2516427   .1229716    -2.05   0.041    -.4926626   -.0106228
                 |
           z_cv5 |
             L1. |   .0085431   .0072125     1.18   0.236    -.0055932    .0226794
                 |
           _cons |   5.650281   .1897047    29.78   0.000     5.278467    6.022095
      ln(lagcv1) |          1  (exposure)
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
       eproducer |      4130        4130           0    *|
            year |        46           0          46     |
    -----------------------------------------------------+
    * = FE nested within cluster; treated as redundant for DoF computation
    
    . estat ic
    
    Akaike's information criterion and Bayesian information criterion
    
    -----------------------------------------------------------------------------
           Model |          N   ll(null)  ll(model)      df        AIC        BIC
    -------------+---------------------------------------------------------------
               . |    942,909  -6.55e+10  -2.00e+09       6   4.01e+09   4.01e+09
    -----------------------------------------------------------------------------
    Note: BIC uses N = number of observations. See [R] BIC note.
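
    For reference, the size of the AIC follows mechanically from the log pseudolikelihood reported above (a back-of-the-envelope check, with the figures copied from the output):

    Code:
    * AIC = -2*ll(model) + 2*df = -2*(-2003530151) + 2*6, roughly 4.01e+09
    . display -2*(-2003530151) + 2*6

    So the AIC is in the billions simply because the log-likelihood itself is huge, growing with both the sample size and the scale of the counts.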
    On the other hand, when I run xtnbreg, I get smaller (though still in the millions) AIC and BIC figures. However, xtnbreg is sensitive to the model specification and often fails to converge when I add or remove predictors (a sketch of standard convergence options follows after the output below).

    Code:
    
    . xtnbreg dv1 l1.dv1   l1.z_cv2 l1.z_cv3 l1.z_cv4 l1.z_cv5 i.year ,fe exp(lagcv1)
    note: 135 groups (135 obs) dropped because of only one obs per group
    note: 629 groups (49921 obs) dropped because of all zero outcomes
    
    Iteration 0:   log likelihood = -1.225e+09  (not concave)
    Iteration 1:   log likelihood = -4.990e+08  
    Iteration 2:   log likelihood = -8027179.1  
    Iteration 3:   log likelihood = -7995697.9  (not concave)
    Iteration 4:   log likelihood = -7517186.9  
    Iteration 5:   log likelihood = -7057044.2  (backed up)
    Iteration 6:   log likelihood = -6798324.5  
    Iteration 7:   log likelihood =   -6734128  
    Iteration 8:   log likelihood = -6504148.3  
    Iteration 9:   log likelihood = -6355948.2  
    Iteration 10:  log likelihood = -6330436.9  
    Iteration 11:  log likelihood =   -6328037  
    Iteration 12:  log likelihood = -6327554.1  
    Iteration 13:  log likelihood = -6327459.7  
    Iteration 14:  log likelihood = -6327439.4  
    Iteration 15:  log likelihood = -6327434.6  
    Iteration 16:  log likelihood = -6327433.5  
    Iteration 17:  log likelihood = -6327433.2  
    Iteration 18:  log likelihood = -6327433.2  
    Iteration 19:  log likelihood = -6327433.2  
    Iteration 20:  log likelihood = -6327433.2  
    Iteration 21:  log likelihood = -6327433.2  
    Iteration 22:  log likelihood = -6327433.2  
    Iteration 23:  log likelihood = -6327433.2  
    
    Conditional FE negative binomial regression      Number of obs    =    944,126
    Group variable: eproducer                        Number of groups =      4,130
    
                                                     Obs per group:
                                                                  min =          2
                                                                  avg =      228.6
                                                                  max =        553
    
                                                     Wald chi2(51)    = 1033911.20
    Log likelihood = -6327433.2                      Prob > chi2      =     0.0000
    
    ------------------------------------------------------------------------------
             dv1 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             dv1 |
             L1. |   1.11e-07   5.15e-10   214.94   0.000     1.10e-07    1.12e-07
                 |
           z_cv2 |
             L1. |  -.6988149   .0015773  -443.05   0.000    -.7019063   -.6957235
                 |
           z_cv3 |
             L1. |  -.0279359   .0004649   -60.09   0.000    -.0288471   -.0270246
                 |
           z_cv4 |
             L1. |   .4380268   .0156258    28.03   0.000     .4074009    .4686528
                 |
           z_cv5 |
             L1. |    .016793   .0038206     4.40   0.000     .0093048    .0242811
                 |
            year |  (year dummy coefficients omitted)
                 |
           _cons |  -1.826474   .0291059   -62.75   0.000     -1.88352   -1.769427
      ln(lagcv1) |          1  (exposure)
    ------------------------------------------------------------------------------
    
    . estat ic
    
    Akaike's information criterion and Bayesian information criterion
    
    -----------------------------------------------------------------------------
           Model |          N   ll(null)  ll(model)      df        AIC        BIC
    -------------+---------------------------------------------------------------
               . |    944,126          .   -6327433      52   1.27e+07   1.27e+07
    -----------------------------------------------------------------------------
    Note: BIC uses N = number of observations. See [R] BIC note.
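
    Where xtnbreg stalls, Stata's standard maximization options can sometimes help (a sketch, untested on these data; the options are documented under help maximize):

    Code:
    * difficult switches to a more robust stepping algorithm in flat regions;
    * iterate() caps the number of iterations
    . xtnbreg dv1 l1.dv1 l1.z_cv2 l1.z_cv3 l1.z_cv4 l1.z_cv5 i.year, fe exp(lagcv1) difficult iterate(100)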
    Do you think the large AIC figures should discourage me from using FE Poisson to predict this variable? Does overdispersion make xtnbreg a better choice? In any case, the ΔAIC across models is in the millions (always six zeros), so I am also unsure how to present the results.
    Thank you!

  • #2
    Dear Salva Montenegro,

    As far as I understand, you are estimating models by pseudo maximum likelihood, and in this context both the AIC/BIC criteria and the pseudo-R2 are not very informative because they assume that the distribution is correctly specified. So, I would simply ignore those. Additionally, I would not use xtnbreg with FE because that is not a proper FE model and its validity depends on very strong assumptions; this contrasts with Poisson with FE, which is very robust. Also, changing the scale of your dependent variable should only affect the intercept. Finally, keep in mind that to get reliable predictions with models that include fixed effects you need enough observations to estimate each fixed effect reliably.
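
    For instance, a quick way to see the scale result (a sketch reusing the variable names above; dv1_k is a hypothetical rescaled variable, and the models are fit without the fixed effects and exposure for simplicity):

    Code:
    * Poisson PML slopes are unchanged by rescaling the outcome;
    * only _cons falls, by ln(1000) = 6.91
    . generate double dv1_k = dv1/1000
    . poisson dv1   l1.z_cv2 l1.z_cv3, vce(cluster eproducer)
    . poisson dv1_k l1.z_cv2 l1.z_cv3, vce(cluster eproducer)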

    Best wishes,

    Joao

    • #3
      Dear Joao Santos Silva, thank you very much for your response. How else could I assess model fit when using xtpoisson? AIC/BIC seems to be the convention in the published work I am familiar with. Incidentally, I do see consistent improvement in the AIC/BIC figures when comparing nested xtpoisson models. I was concerned about the scale of those figures, but you seem to suggest that rescaling the DV (which greatly reduces them) would not be an issue.

      Regarding your final note on the number of observations per fixed effect: the number of observations per panel ranges from 2 to 553 (as shown above). Do you see an issue with this? Should I drop the panels with few observations?

      • #4
        Dear Salva Montenegro,

        I would simply compare models using significance tests. As for the predictions, it is OK not to drop observations as long as you keep in mind that for the units with few observations the predictions are unreliable.
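
        For example, nested specifications can be compared with a Wald test on the added regressors (a sketch reusing the commands above; with clustered standard errors a Wald test is the natural choice, and a likelihood-ratio test would not be valid):

        Code:
        * fit the larger model, then jointly test the extra regressors
        . ppmlhdfe dv1 l1.dv1 l1.z_cv2 l1.z_cv3 l1.z_cv4 l1.z_cv5, a(eproducer year) cluster(eproducer) exp(lagcv1)
        . test l1.z_cv2 l1.z_cv3 l1.z_cv4 l1.z_cv5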

        Best wishes,

        Joao

        • #5
          Dear Joao Santos Silva, thank you for your responses. I appreciate your support.
