  • Is there an nbreg alternative for ppmlhdfe? (presence of overdispersion)

    Hi,
    I'm running a regression on count data with 4 fixed effects.
    Because I include 4 fixed effects, it is not possible to use a simple poisson or nbreg (I set matsize at the maximum and set emptycells drop). I used the reghdfe package by Sergio Correia, which includes ppmlhdfe, a Poisson regression with multiple fixed effects. However, I think my count data is overdispersed and I should use nbreg, but there is no such thing in the reghdfe package.

    1) Is there a test to run after ppmlhdfe to check for overdispersion?
    2) Or is there any alternative way to run nbreg with multiple fixed effects?

    I ran the following code with results:
    Code:
    .  ppmlhdfe teamsize internetdummy invt_network_size invt_pat_count invt_career_age mobile_invt, vce(robust) absorb(cbsacode appyear uspc invt_id) d
    (dropped 99460 observations that are either singletons or separated by a fixed effect)
    note: 1 variable omitted because of collinearity: invt_career_age
    Iteration 1:   deviance = 1.454e+05                  itol = 1.0e-04  subiters = 30   min(eta) = -1.28   [p  ]
    Iteration 2:   deviance = 1.407e+05  eps = 3.39e-02  itol = 1.0e-04  subiters = 19   min(eta) = -1.94   [   ]
    Iteration 3:   deviance = 1.406e+05  eps = 1.91e-04  itol = 1.0e-04  subiters = 10   min(eta) = -2.03   [   ]
    Iteration 4:   deviance = 1.406e+05  eps = 1.43e-07  itol = 1.0e-04  subiters = 3    min(eta) = -2.03   [   ]
    Iteration 5:   deviance = 1.406e+05  eps = 1.72e-07  itol = 1.0e-08  subiters = 62   min(eta) = -2.02   [ s ]
    Iteration 6:   deviance = 1.406e+05  eps = 7.80e-11  itol = 1.0e-08  subiters = 95   min(eta) = -2.02   [ps ]
    Iteration 7:   deviance = 1.406e+05  eps = 2.10e-14  itol = 1.0e-10  subiters = 116  min(eta) = -2.02   [pso]
    Iteration 8:   deviance = 1.406e+05  eps = 2.39e-14  itol = 1.0e-10  subiters = 117  min(eta) = -2.02   [pso]
    ------------------------------------------------------------------------------------------------------------
    (legend: p: exact partial-out   s: exact solver   o: epsilon below tolerance)
    Converged in 8 iterations and 452 HDFE sub-iterations (tol = 1.0e-08)
    
    HDFE PPML regression                              No. of obs      =    362,605
    Absorbing 4 HDFE groups                           Residual df     =    280,588
                                                      Wald chi2(4)    =    1354.34
    Deviance             =   140641.425               Prob > chi2     =     0.0000
    Log pseudolikelihood = -564388.4936               Pseudo R2       =     0.1827
    -----------------------------------------------------------------------------------
                      |               Robust
             teamsize |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ------------------+----------------------------------------------------------------
        internetdummy |  -.0025327   .0053141    -0.48   0.634    -.0129481    .0078827
    invt_network_size |   .0122162   .0003325    36.74   0.000     .0115645     .012868
       invt_pat_count |   -.001211   .0000895   -13.53   0.000    -.0013864   -.0010355
      invt_career_age |          0  (omitted)
          mobile_invt |  -.0057639   .0086128    -0.67   0.503    -.0226447     .011117
                _cons |   .9506205   .0060948   155.97   0.000     .9386748    .9625661
    -----------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
        cbsacode |       481           0         481     |
         appyear |         6           1           5     |
            uspc |       412           1         411    ?|
         invt_id |     81188          72       81116    ?|
    -----------------------------------------------------+
    ? = number of redundant parameters may be higher
    I could use a simple Poisson regression with only 2 fixed effects, but that would affect the robustness of the results.

    Any suggestions?

    Thanks
    Ludo

  • #2
    Hi Ludo,
    The short answer is no: there is no Stata command that offers the ability to do HDFE negative binomial regressions (at least not yet). There is an R package called "FENmlm" that does offer this capability, and you may find success with it. However, with the size of your data, it's not obvious to me that you should be all that concerned about overdispersion, since PPML should still be a consistent estimator. Furthermore, in panel data settings with fixed effects, the negative binomial model does not always outperform PPML when there is overdispersion; it only does so when there is a particular type of overdispersion.

    Also, your z-scores with robust standard errors are huge. It looks to me like your model is basically panel data with invt_id as the panel id (the other FE dimensions are small by comparison). Typically we would expect robust standard errors to be downward-biased in this type of setting. The standard remedy would be, at a minimum, to use cluster-robust standard errors clustered by the panel id, but you can also cluster at higher levels if you want to be more conservative. For that matter, it's also worth pointing out that, in a PML setting, if you have appropriately chosen robust standard errors, the heteroskedasticity correction they provide should mean that your inferences are generally robust to overdispersion.
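
    For example (just adapting the command from your post; I have not run this on your data), clustering on the inventor id would look something like:
    Code:
    ppmlhdfe teamsize internetdummy invt_network_size invt_pat_count invt_career_age mobile_invt, ///
        vce(cluster invt_id) absorb(cbsacode appyear uspc invt_id) d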

    Finally, we unfortunately do not have an overdispersion test built into ppmlhdfe, because typically that would entail also estimating a negative binomial model (there may be a better way I'm not aware of, but I am not an expert on negative binomial models, as they are not often used in my field of research).
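
    That said, if you wanted a rough manual check, one option is the regression-based overdispersion test of Cameron and Trivedi (1990), which only needs the fitted means. Here is a very rough, untested sketch (it assumes predict can return the conditional mean after your ppmlhdfe run, which is one reason the d option matters):
    Code:
    * run right after the ppmlhdfe command, so e(sample) and predict refer to it
    predict double muhat if e(sample), mu     // mu should be the fitted conditional mean
    gen double ct_lhs = ((teamsize - muhat)^2 - teamsize) / muhat
    * NB2-form auxiliary regression: test whether the slope on muhat is positive
    regress ct_lhs muhat, noconstant vce(robust)

    A significantly positive slope would be consistent with overdispersion, though as I said above, with appropriately clustered standard errors your inferences should already be reasonably robust to it.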

    I hope you find this helpful!

    Regards,
    Tom
    Last edited by Tom Zylkin; 12 Aug 2019, 07:37.

    • #3
      Thank you for the clarification, Tom!
