
  • ppmlhdfe: e(N) = e(N_full) < e(sample)

    Dear list members,

    a quick question on ppmlhdfe (ssc describe ppmlhdfe).

    Inspecting my regression output, I noticed that a small number of observations (≈1%) in my analytic sample (which has no missing data) are flagged by e(sample) as used in estimation but are not counted in the sample size stored in e(N). Note that e(N) = e(N_full), so the gap should not be due to singletons or 'separated' observations. Exactly the same number of observations have missing values for the variable _ppmlhdfe_d, which is created by the option d and stores "the sum of fixed effects". Note also that these observations do not all share the same (binary) outcome value.

    I readily confess that I only partially understand the underlying IRLS maximization procedure, which I gather involves some approximation in "absorbing" the FEs. I might eventually grasp it, but I don't have much time at the moment and would appreciate a helping hand in making sense of these missing values, which I suppose are related to the fact that e(N) is smaller than the number of observations flagged by e(sample). Perhaps the authors of the command can enlighten me? Paulo Guimaraes, Tom Zylkin
    I'm using StataNow/MP 18.5

  • #2
    This turned out to be completely unrelated to the command: it stemmed from my not knowing that e(N) takes pweights into account, and I am using pweights (something that could not be inferred from the question, mea culpa). The discrepancy between e(N) and the number of observations flagged by e(sample) exactly matches the number of observations with a sampling weight of zero.
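
    For anyone who runs into the same gap, here is a minimal sketch of the check, not my actual specification. It uses the auto dataset that ships with Stata, and the weight variable wt and the choice of regressors are assumptions made up for illustration. If the behaviour I describe above holds, the second and third counts should each equal the first count minus e(N).

    ```stata
    * Requires the community-contributed command: ssc install ppmlhdfe
    sysuse auto, clear

    * Hypothetical sampling weight: zero for the first five cars, one otherwise
    generate double wt = cond(_n <= 5, 0, 1)

    * Option d stores the sum of the fixed effects in _ppmlhdfe_d
    ppmlhdfe price weight length [pweight = wt], absorb(rep78) d

    * Sample size as reported by the command
    display "e(N) = " e(N)

    * Observations flagged by e(sample)
    count if e(sample)

    * Flagged observations with a zero sampling weight
    count if e(sample) & wt == 0

    * Flagged observations with a missing sum of fixed effects
    count if e(sample) & missing(_ppmlhdfe_d)
    ```

    With real data, replace price, weight, length, rep78 and wt with your own outcome, regressors, absorbed variable and sampling weight.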
    I'm using StataNow/MP 18.5
