  • Is there an nbreg alternative for ppmlhdfe? (presence of overdispersion)

    Hi,
    I'm running a regression on count data with 4 fixed effects.
    Because I include 4 fixed effects, it is not possible to use a simple poisson or nbreg (I set matsize at the maximum and set emptycells drop). I used the reghdfe package by Sergio Correia, which includes ppmlhdfe, a Poisson regression with multiple fixed effects. However, I think my count data is overdispersed and I should use nbreg, but there is no such thing in the reghdfe package.

    1) Is there a test to run after ppmlhdfe to check for overdispersion?
    2) Or is there any alternative way to run nbreg with multiple fixed effects?

    I ran the following code with results:
    Code:
    .  ppmlhdfe teamsize internetdummy invt_network_size invt_pat_count invt_career_age mobile_invt, vce(robust) absorb(cbsacode appyear uspc invt_id) d
    (dropped 99460 observations that are either singletons or separated by a fixed effect)
    note: 1 variable omitted because of collinearity: invt_career_age
    Iteration 1:   deviance = 1.454e+05                  itol = 1.0e-04  subiters = 30   min(eta) = -1.28   [p  ]
    Iteration 2:   deviance = 1.407e+05  eps = 3.39e-02  itol = 1.0e-04  subiters = 19   min(eta) = -1.94   [   ]
    Iteration 3:   deviance = 1.406e+05  eps = 1.91e-04  itol = 1.0e-04  subiters = 10   min(eta) = -2.03   [   ]
    Iteration 4:   deviance = 1.406e+05  eps = 1.43e-07  itol = 1.0e-04  subiters = 3    min(eta) = -2.03   [   ]
    Iteration 5:   deviance = 1.406e+05  eps = 1.72e-07  itol = 1.0e-08  subiters = 62   min(eta) = -2.02   [ s ]
    Iteration 6:   deviance = 1.406e+05  eps = 7.80e-11  itol = 1.0e-08  subiters = 95   min(eta) = -2.02   [ps ]
    Iteration 7:   deviance = 1.406e+05  eps = 2.10e-14  itol = 1.0e-10  subiters = 116  min(eta) = -2.02   [pso]
    Iteration 8:   deviance = 1.406e+05  eps = 2.39e-14  itol = 1.0e-10  subiters = 117  min(eta) = -2.02   [pso]
    ------------------------------------------------------------------------------------------------------------
    (legend: p: exact partial-out   s: exact solver   o: epsilon below tolerance)
    Converged in 8 iterations and 452 HDFE sub-iterations (tol = 1.0e-08)
    
    HDFE PPML regression                              No. of obs      =    362,605
    Absorbing 4 HDFE groups                           Residual df     =    280,588
                                                      Wald chi2(4)    =    1354.34
    Deviance             =   140641.425               Prob > chi2     =     0.0000
    Log pseudolikelihood = -564388.4936               Pseudo R2       =     0.1827
    -----------------------------------------------------------------------------------
                      |               Robust
             teamsize |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ------------------+----------------------------------------------------------------
        internetdummy |  -.0025327   .0053141    -0.48   0.634    -.0129481    .0078827
    invt_network_size |   .0122162   .0003325    36.74   0.000     .0115645     .012868
       invt_pat_count |   -.001211   .0000895   -13.53   0.000    -.0013864   -.0010355
      invt_career_age |          0  (omitted)
          mobile_invt |  -.0057639   .0086128    -0.67   0.503    -.0226447     .011117
                _cons |   .9506205   .0060948   155.97   0.000     .9386748    .9625661
    -----------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
        cbsacode |       481           0         481     |
         appyear |         6           1           5     |
            uspc |       412           1         411    ?|
         invt_id |     81188          72       81116    ?|
    -----------------------------------------------------+
    ? = number of redundant parameters may be higher
    I could use a simple Poisson regression with only 2 fixed effects, but that would affect the robustness of the results.

    Any suggestions?

    Thanks
    Ludo

  • #2
    Hi Ludo,
    The short answer is no: there is no Stata command that offers the ability to do HDFE negative binomial regressions (at least not yet). There is an R package called "FENmlm" that does offer this capability, and you may find success with it. However, with the size of your data, it's not obvious to me that you should be all that concerned about overdispersion, since PPML should still be a consistent estimator. Furthermore, in panel data settings with fixed effects, the negative binomial model does not always outperform PPML when there is overdispersion; it only does so when there is a particular type of overdispersion.

    Also, your z-scores with robust standard errors are huge. It looks to me like your model is basically panel data with invt_id as the panel id (the other FE dimensions are small by comparison). Typically we would expect robust standard errors to be downward-biased in this type of setting. The standard remedy would be, at a minimum, to use cluster-robust standard errors clustered by the panel id, but you can also cluster at higher levels if you want to be more conservative. For that matter, it's also worth pointing out that, in a PML setting, if you have appropriately chosen robust standard errors, the heteroskedasticity correction they provide should mean that your inferences are generally robust to overdispersion.
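
    For example (just adapting the command from your post; I have not run this on your data), clustering on the inventor id would look something like:
    Code:
    ppmlhdfe teamsize internetdummy invt_network_size invt_pat_count invt_career_age mobile_invt, ///
        vce(cluster invt_id) absorb(cbsacode appyear uspc invt_id) d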

    Finally, we unfortunately do not have an overdispersion test built into ppmlhdfe, because typically that would entail also estimating a negative binomial model (there may be a better way I'm not aware of, but I am not an expert on negative binomial models, as they are not often used in my field of research).
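
    That said, if you wanted a rough manual check, one option is the regression-based overdispersion test of Cameron and Trivedi (1990), which only needs the fitted means. Here is a very rough, untested sketch (it assumes predict can return the conditional mean after your ppmlhdfe run, which is one reason the d option matters):
    Code:
    * run right after the ppmlhdfe command, so e(sample) and predict refer to it
    predict double muhat if e(sample), mu     // mu should be the fitted conditional mean
    gen double ct_lhs = ((teamsize - muhat)^2 - teamsize) / muhat
    * NB2-form auxiliary regression: test whether the slope on muhat is positive
    regress ct_lhs muhat, noconstant vce(robust)

    A significantly positive slope would be consistent with overdispersion, though as I said above, with appropriately clustered standard errors your inferences should already be reasonably robust to it.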

    I hope you find this helpful!

    Regards,
    Tom
    Last edited by Tom Zylkin; 12 Aug 2019, 07:37.

    • #3
      Thank you for the clarification, Tom!
