negative binomial regression with many group fixed effects

Vincenzo Alfano Viola

Join Date: Aug 2020

Posts: 14
#1

02 Jul 2022, 11:59

Hello everyone,

I have a large panel dataset where individuals are observed for multiple days, and I am trying to estimate a negative binomial regression controlling for region x month fixed effects, clustering the standard errors at the individual level. However, the group level dummies are too many and the regression takes too long to run. Do you have any suggestions on how I could speed up the computation? Or alternative commands to nbreg that would be more useful in this scenario?

Thanks,
Vincenzo

Last edited by Vincenzo Alfano Viola; 02 Jul 2022, 12:01.
Maxence Morlet

Join Date: Mar 2021

Posts: 653
#2

02 Jul 2022, 12:16

You might want a negative binomial generalised linear mixed model. I believe that's the

Code:

meglm

command with

Code:

family(nbinomial)

.

You might try a two-way fixed effects model as a baseline, and then expand to more complex models to compare against it.
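As a rough sketch of the mixed-model route, assuming a count outcome y, a single regressor x, and a numeric region-by-month identifier reg_month (x and reg_month are placeholder names, not taken from this thread). Note that this fits a random intercept per region-month cell rather than fixed effects, so it is not the same model as nbreg with dummies:

```stata
* Negative binomial mixed model with a random intercept for each region-month cell.
* x and reg_month are hypothetical variable names.
menbreg y x || reg_month:

* The same specification written through meglm, which should give equivalent results
meglm y x || reg_month:, family(nbinomial) link(log)
```

Whether this is any faster than nbreg on several million observations is an open question; it is shown only to make the syntax concrete.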
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17708
#3

02 Jul 2022, 12:19

Vincenzo:
you might be interested in http://www.jstor.org/stable/3186160

Kind regards,
Carlo
(Stata 19.0)
Maxence Morlet

Join Date: Mar 2021

Posts: 653
#4

02 Jul 2022, 12:22

I'm basing my response on this article: https://www.nature.com/articles/s41598-020-73883-7

So I'm relatively sure that a negative binomial generalised linear mixed model is something you might want to consider, though I'm not too sure of the Stata command.
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#5

02 Jul 2022, 12:24

On my post I said

Either way, I don't understand the issue. Are there actually "too many" groups, or does the regression just take a while to run? If there were legitimately too many groups/fe, the model wouldn't even estimate (I don't think), so I presume it simply takes a long time.

But no, to answer the question, I'm not aware of any simple fixes to this issue. It seems like it'll be a waiting game.

Is it really true that the glm nbreg quickens computation?
Andrew Musau

Join Date: Oct 2014

Posts: 10194
#6

02 Jul 2022, 16:54

Forget about the fixed effects negative binomial estimator (which isn't even a true fixed effects estimator) based on #3 of https://www.statalist.org/forums/for...-poisson-model and switch to FE Poisson. You can install ppmlhdfe from SSC. I promise you that the estimation will be a matter of seconds.
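A minimal sketch of that switch, assuming the outcome is y, the regressor is x (a placeholder name), and region and month are stored as separate numeric variables called region and month (also assumptions). ppmlhdfe depends on ftools and reghdfe, so they are installed as well:

```stata
* One-time installation from SSC
ssc install ftools
ssc install reghdfe
ssc install ppmlhdfe

* Poisson pseudo-maximum likelihood with region x month fixed effects absorbed
* instead of entered as dummies; standard errors clustered by individual.
* x, region and month are hypothetical variable names.
ppmlhdfe y x, absorb(region#month) vce(cluster id)
```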
Vincenzo Alfano Viola

Join Date: Aug 2020

Posts: 14
#7

03 Jul 2022, 04:34

Thank you all for your help!

First of all, sorry for the imprecise question. The dataset has more than 5 million observations and, depending on the region definition, the number of FE dummies ranges from around 2,600 to 25,000. However, even with the least granular definition the regression still takes a very long time to run.

After reading the paper Carlo suggested and the thread linked by Andrew, I decided to try ppmlhdfe, which is indeed quite fast, as Andrew pointed out. However, I have now run into a new problem. I want to check for heterogeneous treatment effects based on 5 dummies, so what I care about is plotting the estimated coefficients of the dummy variables (d1 - d5) and of the interactions between the dummy variables and the treatment (treatd1 - treatd5). The command I used is the following:

Code:

ppmlhdfe y d1 - d5 treatd1 - treatd5 controlvar treatxcontrolvar, absorb(reg_month_FE) cluster(id)

However, this drops the variable d5 because of perfect collinearity and keeps the constant. Is there a way to drop the constant instead, so that none of the dummies or interactions is dropped?
Maxence Morlet

Join Date: Mar 2021

Posts: 653
#8

03 Jul 2022, 10:37

Originally posted by Carlo Lazzaro

Vincenzo:
you might be interested in http://www.jstor.org/stable/3186160

I've read the paper (very informative) you suggested, and at the very end, concerning negative binomial regression with dummy variables included as fixed effects, it states "bias in standard error estimates can be virtually eliminated by using a correction factor based on the deviance".

Are you aware of a procedure in Stata that implements this standard error correction for negative binomial regression with fixed-effects?
Joao Santos Silva

Join Date: Apr 2014

Posts: 3011
#9

03 Jul 2022, 10:47

See also

Guimarães, Paulo, 2008. "The fixed effects negative binomial model revisited," Economics Letters, Elsevier, vol. 99(1), pages 63-66.

Andrew Musau

Join Date: Oct 2014
Posts: 10194

#10

03 Jul 2022, 18:35

Originally posted by Vincenzo Alfano Viola

Thus what I care about is plotting the estimated coefficients of the dummy variables and of the interactions between the dummy variables and the treatment (called treatd1 - treatd5). The command I have used is the following:

Code:

 ppmlhdfe y d1 - d5 treatd1 - treatd5 controlvar treatxcontrolvar, absorb(reg_month_FE) cluster(id)

I'd recommend that you do not create the dummies and interactions by hand. Use factor variable notation and then run margins.

Code:

use "http://fmwww.bc.edu/RePEc/bocode/e/EXAMPLE_TRADE_FTA_DATA" if category=="TOTAL", clear
egen imp = group(isoimp)
egen exp = group(isoexp)
set seed 07042022
gen catvar= runiformint(1,5)
ppmlhdfe trade fta i.catvar, a(imp#year exp#year imp#exp) cluster(imp#exp) d
margins i.catvar, predict(xb) post
*THE COEFFICIENT ON 2.CATVAR IS:
di _b[2.catvar]- _b[1.catvar]
*THE CORRESPONDING P VALUE IS
test _b[2.catvar]= _b[1.catvar]

Res.:

Code:

HDFE PPML regression                              No. of obs      =      5,950
Absorbing 3 HDFE groups                           Residual df     =      1,189
Statistics robust to heteroskedasticity           Wald chi2(5)    =      24.21
Deviance             =  377081432.2               Prob > chi2     =     0.0002
Log pseudolikelihood = -188585396.6               Pseudo R2       =     0.9938

Number of clusters (imp#exp)=      1,190
                            (Std. Err. adjusted for 1,190 clusters in imp#exp)
------------------------------------------------------------------------------
             |               Robust
       trade |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         fta |   .1943317   .0420242     4.62   0.000     .1119658    .2766975
             |
      catvar |
          2  |  -.0115804   .0128758    -0.90   0.368    -.0368165    .0136557
          3  |  -.0102952   .0122621    -0.84   0.401    -.0343284     .013738
          4  |  -.0073287   .0136207    -0.54   0.591    -.0340248    .0193674
          5  |  -.0141208   .0124976    -1.13   0.259    -.0386156    .0103741
             |
       _cons |   16.46473   .0232622   707.79   0.000     16.41913    16.51032
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
    imp#year |       175           0         175     |
    exp#year |       175           5         170     |
     imp#exp |      1190        1190           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

.
. margins i.catvar, predict(xb) post

Predictive margins                              Number of obs     =      5,950
Model VCE    : Robust

Expression   : Linear prediction, predict(xb)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      catvar |
          1  |   16.52652   .0114395  1444.68   0.000      16.5041    16.54894
          2  |   16.51494   .0129782  1272.52   0.000      16.4895    16.54038
          3  |   16.51623   .0122695  1346.12   0.000     16.49218    16.54027
          4  |   16.51919   .0127013  1300.59   0.000      16.4943    16.54409
          5  |    16.5124   .0111809  1476.84   0.000     16.49049    16.53432
------------------------------------------------------------------------------

.
. *THE COEFFICIENT ON 2.CATVAR IS:

.
. di _b[2.catvar]- _b[1.catvar]
-.01158039

.
. *THE CORRESPONDING P VALUE IS

.
. test _b[2.catvar]= _b[1.catvar]

 ( 1)  - 1bn.catvar + 2.catvar = 0

           chi2(  1) =    0.81
         Prob > chi2 =    0.3684

With an interaction term, you include this as

Code:

i.treatment##i.catvar

and the corresponding margins command is

Code:

margins i.treatment##i.catvar, predict(xb) post
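Applied to the model in #7, this could look roughly like the sketch below. It assumes that d1 - d5 are mutually exclusive and exhaustive categories (exactly one of them equals 1 for each observation) and that there is a 0/1 treatment indicator; dcat and treat are hypothetical names, while y, controlvar, reg_month_FE and id are taken from #7.

```stata
* Collapse the five dummies into one categorical variable.
* Only valid if exactly one of d1-d5 equals 1 per observation (an assumption).
gen byte dcat = d1 + 2*d2 + 3*d3 + 4*d4 + 5*d5

* Full factorial interaction of treatment and category, plus the control and its
* interaction with treatment. The d option is kept from the example above so that
* predictions are available afterwards.
ppmlhdfe y i.treat##i.dcat c.controlvar i.treat#c.controlvar, absorb(reg_month_FE) cluster(id) d

* Linear prediction for each treatment-by-category cell, then a plot of the cells
margins i.treat#i.dcat, predict(xb)
marginsplot
```

With factor-variable notation one category still serves as the base level, but margins reports a value for every cell, so nothing appears to be "dropped" in the plot.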
