
  • negative binomial regression with many group fixed effects

    Hello everyone,

    I have a large panel dataset where individuals are observed for multiple days, and I am trying to estimate a negative binomial regression controlling for region x month fixed effects, clustering the standard errors at the individual level. However, the group level dummies are too many and the regression takes too long to run. Do you have any suggestions on how I could speed up the computation? Or alternative commands to nbreg that would be more useful in this scenario?

    Thanks,
    Vincenzo
    Last edited by Vincenzo Alfano Viola; 02 Jul 2022, 12:01.

  • #2
    You might want a negative binomial generalised linear mixed model. I believe that's the
    Code:
    meglm
    command with
    Code:
    family(nbinomial)
    .

    You might try a two-way fixed effects as baseline, and then expand to complex models to compare to this baseline.
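
    A minimal sketch of the mixed-model idea, assuming hypothetical variable names (y, x1, x2, reg_month). In meglm the negative binomial family is spelled family(nbinomial). Note that this fits a random intercept per group rather than group fixed effects, so it is a different model from one with group dummies.

    ```stata
    * Hypothetical names: y (count outcome), x1 x2 (covariates),
    * reg_month (region-by-month group identifier).
    * Negative binomial GLMM with a random intercept for each region-month group.
    meglm y x1 x2 || reg_month:, family(nbinomial) link(log)

    * menbreg fits the same kind of model with a dedicated command.
    menbreg y x1 x2 || reg_month:
    ```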



    • #3
      Vincenzo:
      you might be interested in http://www.jstor.org/stable/3186160
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        I'm basing my response on this article: https://www.nature.com/articles/s41598-020-73883-7

        So I'm relatively sure that a negative binomial generalised linear mixed model is something you might want to consider, though I'm not sure of the Stata command.



        • #5
          In my post I said:
          Either way, I don't understand the issue. Are there actually "too many" groups, or does the regression just take a while to run? If there were legitimately too many groups/fe, the model wouldn't even estimate (I don't think), so I presume it simply takes a long time.

          But no, to answer the question, I'm not aware of any simple fixes to this issue. Seems like it'll be a waiting game
          Is it really true that the glm nbreg quickens computation?



          • #6
            Based on #3 of https://www.statalist.org/forums/for...-poisson-model, forget about the fixed effects negative binomial estimator (which isn't even a true fixed effects estimator) and switch to FE Poisson. You can install ppmlhdfe from SSC. I promise you that the estimation will be a matter of seconds.
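
            As a rough sketch of what that could look like, assuming hypothetical variable names (y, x1, x2, region, month, id), the region x month fixed effects are absorbed rather than entered as dummies, and standard errors are clustered by individual:

            ```stata
            * One-time installation from SSC; ppmlhdfe relies on ftools and reghdfe.
            ssc install ftools, replace
            ssc install reghdfe, replace
            ssc install ppmlhdfe, replace

            * Hypothetical names: y (count outcome), x1 x2 (covariates),
            * region and month (grouping variables), id (individual identifier).
            * Absorb region-by-month fixed effects; cluster standard errors by individual.
            ppmlhdfe y x1 x2, absorb(region#month) vce(cluster id)
            ```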



            • #7
              Thank you all for your help!

              First of all, sorry for the imprecise question. The dataset has more than 5 million observations and, depending on the region definition, the number of FE dummies ranges from around 2,600 up to 25,000. However, even with the least granular definition it still takes a very long time to run.

              After reading the paper Carlo suggested and the thread Andrew linked, I decided to try ppmlhdfe, which is indeed quite fast, as Andrew pointed out. However, I have now run into a new problem: I am trying to check for heterogeneous treatment effects based on 5 dummies. What I care about is plotting the estimated coefficients of the dummy variables and of the interactions between the dummy variables and the treatment (called treatd1 - treatd5). The command I used is the following:
              Code:
               ppmlhdfe y d1 - d5  treatd1 - treatd5 controlvar treatxcontrolvar, absorb(reg_month_FE) cluster(id)
              However, this drops the variable d5 because of perfect collinearity and keeps the constant. Is there a way to drop the constant instead, so that none of the dummies or interactions is dropped?



              • #8
                Originally posted by Carlo Lazzaro:
                Vincenzo:
                you might be interested in http://www.jstor.org/stable/3186160
                I've read the paper (very informative) you suggested, and at the very end, concerning negative binomial regression with dummy variables included as fixed effects, it states "bias in standard error estimates can be virtually eliminated by using a correction factor based on the deviance".

                Are you aware of a procedure in Stata that implements this standard error correction for negative binomial regression with fixed-effects?
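
                One possible reading of that correction, offered only as a sketch and not as the paper's own procedure: glm has a scale(dev) option that sets the scale parameter to the deviance divided by the residual degrees of freedom. The variable names below (y, x1, x2, group) are hypothetical, and with thousands of group dummies this will be slow.

                ```stata
                * Hypothetical names: y (count outcome), x1 x2 (covariates),
                * group (fixed-effect identifier entered as dummies).
                * Unconditional negative binomial with group dummies; alpha estimated by ML.
                * scale(dev) sets the scale parameter to deviance / residual df,
                * which rescales the reported standard errors by its square root.
                glm y x1 x2 i.group, family(nbinomial ml) link(log) scale(dev)
                ```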



                • #9
                  See also

                  Guimarães, Paulo, 2008. "The fixed effects negative binomial model revisited," Economics Letters, Elsevier, vol. 99(1), pages 63-66.




                  • #10
                    Originally posted by Vincenzo Alfano Viola:
                    Thus what I care about is plotting the estimated coefficients of the dummy variables and of the interactions between the dummy variables and the treatment (called treatd1 - treatd5). The command I have used is the following:
                    Code:
                     ppmlhdfe y d1 - d5 treatd1 - treatd5 controlvar treatxcontrolvar, absorb(reg_month_FE) cluster(id)
                    I'd recommend that you do not create the dummies and interactions by hand. Use factor variable notation and then run margins.

                    Code:
                    use "http://fmwww.bc.edu/RePEc/bocode/e/EXAMPLE_TRADE_FTA_DATA" if category=="TOTAL", clear
                    egen imp = group(isoimp)
                    egen exp = group(isoexp)
                    set seed 07042022
                    gen catvar= runiformint(1,5)
                    ppmlhdfe trade fta i.catvar, a(imp#year exp#year imp#exp) cluster(imp#exp) d
                    margins i.catvar, predict(xb) post
                    *THE COEFFICIENT ON 2.CATVAR IS:
                    di _b[2.catvar]- _b[1.catvar]
                    *THE CORRESPONDING P VALUE IS
                    test _b[2.catvar]= _b[1.catvar]
                    Results:

                    Code:
                    HDFE PPML regression                              No. of obs      =      5,950
                    Absorbing 3 HDFE groups                           Residual df     =      1,189
                    Statistics robust to heteroskedasticity           Wald chi2(5)    =      24.21
                    Deviance             =  377081432.2               Prob > chi2     =     0.0002
                    Log pseudolikelihood = -188585396.6               Pseudo R2       =     0.9938
                    
                    Number of clusters (imp#exp)=      1,190
                                                (Std. Err. adjusted for 1,190 clusters in imp#exp)
                    ------------------------------------------------------------------------------
                                 |               Robust
                           trade |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                             fta |   .1943317   .0420242     4.62   0.000     .1119658    .2766975
                                 |
                          catvar |
                              2  |  -.0115804   .0128758    -0.90   0.368    -.0368165    .0136557
                              3  |  -.0102952   .0122621    -0.84   0.401    -.0343284     .013738
                              4  |  -.0073287   .0136207    -0.54   0.591    -.0340248    .0193674
                              5  |  -.0141208   .0124976    -1.13   0.259    -.0386156    .0103741
                                 |
                           _cons |   16.46473   .0232622   707.79   0.000     16.41913    16.51032
                    ------------------------------------------------------------------------------
                    
                    Absorbed degrees of freedom:
                    -----------------------------------------------------+
                     Absorbed FE | Categories  - Redundant  = Num. Coefs |
                    -------------+---------------------------------------|
                        imp#year |       175           0         175     |
                        exp#year |       175           5         170     |
                         imp#exp |      1190        1190           0    *|
                    -----------------------------------------------------+
                    * = FE nested within cluster; treated as redundant for DoF computation
                    
                    .
                    . margins i.catvar, predict(xb) post
                    
                    Predictive margins                              Number of obs     =      5,950
                    Model VCE    : Robust
                    
                    Expression   : Linear prediction, predict(xb)
                    
                    ------------------------------------------------------------------------------
                                 |            Delta-method
                                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                          catvar |
                              1  |   16.52652   .0114395  1444.68   0.000      16.5041    16.54894
                              2  |   16.51494   .0129782  1272.52   0.000      16.4895    16.54038
                              3  |   16.51623   .0122695  1346.12   0.000     16.49218    16.54027
                              4  |   16.51919   .0127013  1300.59   0.000      16.4943    16.54409
                              5  |    16.5124   .0111809  1476.84   0.000     16.49049    16.53432
                    ------------------------------------------------------------------------------
                    
                    .
                    . *THE COEFFICIENT ON 2.CATVAR IS:
                    
                    .
                    . di _b[2.catvar]- _b[1.catvar]
                    -.01158039
                    
                    .
                    . *THE CORRESPONDING P VALUE IS
                    
                    .
                    . test _b[2.catvar]= _b[1.catvar]
                    
                     ( 1)  - 1bn.catvar + 2.catvar = 0
                    
                               chi2(  1) =    0.81
                             Prob > chi2 =    0.3684
                    With an interaction term, you include this as

                    Code:
                    i.treatment##i.catvar
                    and the corresponding margins command is

                    Code:
                    margins i.treatment##i.catvar, predict(xb) post
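
                    Putting the pieces together for the model in #7, a sketch might look like the following. It assumes a 0/1 variable named treatment and a 5-level categorical variable named catvar in place of the hand-made dummies (both names are placeholders), and it reuses y, controlvar, reg_month_FE and id from #7. The margins call here requests only the cell-level predictions, which is a choice made for this sketch.

                    ```stata
                    * Placeholder names: treatment (0/1), catvar (1-5); other names are taken from #7.
                    * The d option stores the sum of the absorbed fixed effects, as in the example above.
                    ppmlhdfe y i.treatment##i.catvar controlvar, absorb(reg_month_FE) cluster(id) d

                    * Linear predictions for every treatment-by-category cell.
                    margins treatment#catvar, predict(xb) post

                    * Treatment effect within category 2 on the linear-prediction scale.
                    di _b[1.treatment#2.catvar] - _b[0.treatment#2.catvar]

                    * The corresponding test.
                    test _b[1.treatment#2.catvar] = _b[0.treatment#2.catvar]
                    ```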

