  • Singletons dropped for ppmlhdfe (Poisson regression) but not reghdfe (linear regression)

    I am running a fixed effects regression with two time periods and 3,492 units. When I run a linear regression using reghdfe, I have no issues and no observations are dropped. When I run a Poisson regression with ppmlhdfe, Stata drops "6632 observations that are either singletons or separated by a fixed effect." My question is: why would Stata drop singletons for a Poisson fixed effects regression but not a linear fixed effects regression?

    Also, it seems that most of the dropped observations in the Poisson regression are the units that experience no change in the dependent variable from period 1 to period 2. Why would Stata drop these observations? It seems to me that they still provide useful information.

    Here is the code and output for reghdfe, which does not drop observations:

    Code:
    reghdfe event treat_any, absorb(cell_id) cluster(cell_id)
    
    (MWFE estimator converged in 1 iterations)
    
    HDFE Linear regression                            Number of obs   =      6,984
    Absorbing 1 HDFE group                            F(   1,   3491) =      12.42
    Statistics robust to heteroskedasticity           Prob > F        =     0.0004
                                                      R-squared       =     0.6370
                                                      Adj R-squared   =     0.2738
                                                      Within R-sq.    =     0.0070
    Number of clusters (cell_id) =      3,492         Root MSE        =     0.3213
    
                                (Std. Err. adjusted for 3,492 clusters in cell_id)
    ------------------------------------------------------------------------------
                 |               Robust
           event |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       treat_any |  -.0765679   .0217247    -3.52   0.000    -.1191623   -.0339735
           _cons |   .0614419   .0033764    18.20   0.000      .054822    .0680619
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
         cell_id |      3492        3492           0    *|
    -----------------------------------------------------+
    * = FE nested within cluster; treated as redundant for DoF computation

    Here is the code and output for ppmlhdfe, which does drop observations:

    Code:
    ppmlhdfe event treat_any, absorb(cell_id) cluster(cell_id)
    
    (dropped 6632 observations that are either singletons or separated by a fixed effect)
    Iteration 1:   deviance = 3.3482e+02  eps = .         iters = 1    tol = 1.0e-04  min(eta) =  -1.40  P   
    Iteration 2:   deviance = 3.1968e+02  eps = 4.74e-02  iters = 1    tol = 1.0e-04  min(eta) =  -1.54      
    Iteration 3:   deviance = 3.1938e+02  eps = 9.43e-04  iters = 1    tol = 1.0e-04  min(eta) =  -1.56      
    Iteration 4:   deviance = 3.1938e+02  eps = 9.97e-07  iters = 1    tol = 1.0e-04  min(eta) =  -1.56      
    Iteration 5:   deviance = 3.1938e+02  eps = 3.57e-12  iters = 1    tol = 1.0e-05  min(eta) =  -1.56   S O
    ------------------------------------------------------------------------------------------------------------
    (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
    Converged in 5 iterations and 5 HDFE sub-iterations (tol = 1.0e-08)
    
    HDFE PPML regression                              No. of obs      =        352
    Absorbing 1 HDFE group                            Residual df     =        175
    Statistics robust to heteroskedasticity           Wald chi2(1)    =      15.97
    Deviance             =  319.3822431               Prob > chi2     =     0.0001
    Log pseudolikelihood = -392.1367457               Pseudo R2       =     0.2276
    
    Number of clusters (cell_id)=        176
                                  (Std. Err. adjusted for 176 clusters in cell_id)
    ------------------------------------------------------------------------------
                 |               Robust
           event |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       treat_any |   -.894175   .2237562    -4.00   0.000    -1.332729    -.455621
           _cons |   .4578038   .0352768    12.98   0.000     .3886626    .5269451
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
         cell_id |       176         176           0    *|
    -----------------------------------------------------+
    * = FE nested within cluster; treated as redundant for DoF computation

  • #2
    Dear Jimmy Graham,

    My guess is that the observations being dropped are mostly zeros that are perfectly predicted by fixed effects. The authors of the command have written about this issue and I suggest you check the relevant documents.

    Best wishes,

    Joao








    • #3
      Thanks for your reply. I did some more digging, and the issue is related to separation; it was not actually singletons that were dropped. As with logit and probit models, Poisson fixed effects models can't handle groups with no variation in the dependent variable. This may be the case with maximum likelihood models in general. See here for reference: https://github.com/sergiocorreia/ppm...tion_primer.md.
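
To make the separation idea concrete, here is a minimal sketch in plain Python (not ppmlhdfe's actual algorithm). It evaluates the Poisson log-likelihood of a single two-period group as a function of its fixed effect. The linear-index values in `xb` and the grid of `alphas` are made up for illustration. For a group whose outcome is zero in both periods, the log-likelihood keeps rising as the fixed effect goes to minus infinity, so no finite estimate exists; a group with at least one positive outcome has an interior maximum.

```python
import math

def group_loglik(alpha, y, xb):
    """Poisson log-likelihood of one group, given its fixed effect alpha."""
    return sum(
        yt * (alpha + xbt) - math.exp(alpha + xbt) - math.lgamma(yt + 1)
        for yt, xbt in zip(y, xb)
    )

# Hypothetical values of x*beta for the two periods (illustration only).
xb = [0.0, -0.5]
alphas = [0, -2, -5, -10, -20]  # fixed effect pushed toward minus infinity

zero_group = [0, 0]   # no events in either period
mixed_group = [0, 1]  # one event in the second period

zero_ll = [group_loglik(a, zero_group, xb) for a in alphas]
mixed_ll = [group_loglik(a, mixed_group, xb) for a in alphas]

for a, z, m in zip(alphas, zero_ll, mixed_ll):
    print(f"alpha={a:>4}  all-zero group: {z: .6f}  mixed group: {m: .6f}")
```

In the printed table, the all-zero column approaches 0 from below without ever reaching a maximum, while the mixed column falls as alpha becomes very negative.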



      • #4
        Dear Jimmy Graham,

        I am well aware of the issue; see here and here.

        The problem is not exactly as you describe it. In short, some observations contain no information on the parameters of interest and can, and should, be dropped.
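
One way to see the "no information" point is the following sketch. It relies on the standard result that, with a single fixed effect, the Poisson slope estimate can be obtained from the conditional (multinomial) likelihood, whose score for a group is the sum over periods of x_t * (y_t - n * p_t), with n the group's total count and p_t = exp(x_t * beta) / sum_s exp(x_s * beta). The toy groups below are invented for illustration. Groups with n = 0 add exactly zero to the score at any beta, so dropping them cannot change the estimate. A group with the same positive count in both periods still contributes, so it is all-zero outcomes, rather than lack of change as such, that make a group uninformative.

```python
import math

def conditional_score(beta, groups):
    """Score in beta of the Poisson fixed-effects conditional log-likelihood."""
    score = 0.0
    for y, x in groups:
        n = sum(y)  # total count in the group
        weights = [math.exp(beta * xt) for xt in x]
        total = sum(weights)
        for yt, xt, wt in zip(y, x, weights):
            score += xt * (yt - n * wt / total)
    return score

# Toy data (hypothetical): each group is (outcomes, regressor) over two periods.
informative = [([0, 1], [0, 1]), ([1, 0], [0, 1]), ([2, 1], [0, 1])]
all_zero = [([0, 0], [0, 1]), ([0, 0], [0, 0])]

score_without = conditional_score(0.3, informative)
score_with = conditional_score(0.3, informative + all_zero)
print(score_without, score_with)  # identical: all-zero groups add nothing

# A group with a constant positive outcome still moves the score.
constant_positive = conditional_score(0.5, [([1, 1], [0, 1])])
print(constant_positive)
```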

        Best wishes,

        Joao
