  • Why do -xtpoisson, fe- drop panels that have only one obs per group?

    Dear Statalist,

    I am puzzled as to why some regressions drop panels that have only one obs per group. When I run the code at the bottom, only -xtpoisson, fe- produces the message "note: 552 groups (552 obs) dropped because of only one obs per group," while in -xtreg, fe- those panels are included. In my own data sample, the note means that the dependent variable of the dropped groups is identical over the sample period.

    Both regressions use fixed effects, although the models differ (linear vs. non-linear). Is there any way I can understand why one regression includes all panels while the other omits panels that have the same value of the dependent variable for the entire period?

    ----------------------------------------------------------------------------------------------------------------
    webuse nlswork
    xtset idcode
    xi: xtreg ln_wage ttl_exp tenure i.year , fe
    xi: xtpoisson ln_wage ttl_exp tenure i.year , fe

    ----------------------------------------------------------------------------------------------------------------

    Regards,

    Paul

  • #2
    All of the fixed-effects estimators estimate within-panel regressions. If there is only one observation in a panel, then there is no within-panel variation, so a singleton panel is uninformative. In most of the estimators, the formulas involved "blow up" with an n = 1 panel, so Stata removes such panels first. In -xtreg, fe-, the de-meaning approach can still be carried out. But, as was discussed recently on this forum in a thread initiated by Alfonso Sanchez-Penalver (the link to which I cannot find just now), the singleton panel(s), though not removed from the estimation sample, do not actually contribute to the substantive results. Try this:

    Code:
    set more off
    webuse nlswork, clear
    
    xtset
    
    //    MODIFY DATA SO idcode 12 HAS ONLY ONE OBSERVATION
    drop if idcode == 12 & year == 88
    count if idcode == 12
    assert r(N) == 1
    
    xtreg ln_wage c.age i.union, fe
    
    //    SHOW WE GET SUBSTANTIVELY THE SAME RESULTS
    //    OMITTING THE SINGLETON PANEL
    drop if idcode == 12
    xtreg ln_wage c.age i.union, fe
    and you will see that the results are identical for everything except the constant term, sigma_u, sigma_e, and rho. But the constant term is just an artifact of the way Stata's estimation algorithm identifies the inherently unidentified fixed effects, as are sigma_u and sigma_e (and rho, which is calculated from those). So in terms of substance, the inclusion or exclusion of those singleton panels affects nothing. Stata doesn't literally exclude them from analysis, but they don't impact anything "real" in the analysis either.
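
    The de-meaning argument can also be checked by hand. The following is only a sketch under assumptions that are not from the posts above: it uses the grunfeld example data rather than nlswork, and the variable suffixes _bar and _dm are illustrative names. It shows that the de-meaned values of a singleton panel are all zero, so the panel cannot move the slope estimates.

    ```stata
    * Sketch: the within (de-meaning) transformation done by hand
    * (dataset choice and the _bar / _dm names are illustrative assumptions)
    webuse grunfeld, clear
    xtset company year

    * Make company 10 a singleton panel by keeping only its last year
    drop if company == 10 & year < 1954

    * De-mean the outcome and the regressors within each company
    foreach v of varlist mvalue kstock invest {
        bysort company: egen double `v'_bar = mean(`v')
        generate double `v'_dm = `v' - `v'_bar
    }

    * For the singleton, every de-meaned value is zero
    summarize mvalue_dm kstock_dm invest_dm if company == 10

    * OLS on the de-meaned data gives the same slopes as -xtreg, fe-
    * (standard errors differ: -regress- does not count the absorbed
    *  company means as estimated parameters)
    regress mvalue_dm kstock_dm invest_dm, noconstant
    xtreg mvalue kstock invest, fe
    ```

    Because the singleton's row in the de-meaned data is a row of zeros, it adds nothing to the cross-products that determine the slopes.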

    Added: I found that link. It wasn't a few days ago, it was back on 9/12. My, how time flies! http://www.statalist.org/forums/foru...-fixed-effects.
    Last edited by Clyde Schechter; 23 Sep 2016, 21:03.


    • #3
      Dear Clyde,

      Let me apologize for my misleading title, which is taken from Stata's note. What I meant by "one obs" per panel was identical values of the dependent variable for the entire period within a panel, not a single observation per panel. As you mentioned, fixed-effects estimation requires at least two observations per panel, but the problem I described is that only -xtpoisson- and other Poisson-family estimators drop panels whose dependent variable is identical throughout the sample period. Since linear estimators, like -xtreg-, include those panels, I am curious about the difference.

      Regards,

      Paul


      • #4
        Dear Paul,

        Thank you for clarifying your question. The reason for the difference is that the way to eliminate the fixed effects is different in the two cases.

        As Clyde Schechter noted, in the linear model the fixed effects are eliminated by demeaning. In that case, panels where the dependent variable is constant over time can still be informative.

        For the Poisson case, the fixed effects are eliminated by conditioning on the sum over time of the dependent variable and in that case panels where the dependent variable is constant are not informative.
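
        The two elimination strategies can be written out as follows. This is a sketch of the standard textbook derivation, not something taken from the posts; the notation (alpha_i for the fixed effect, T_i for the number of observations in panel i, n_i for the within-panel sum of the outcome) is assumed here for illustration.

        ```latex
        % Linear model: de-meaning removes \alpha_i
        y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}
        \quad\Longrightarrow\quad
        y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)

        % Poisson model: conditioning on n_i = \sum_t y_{it} removes \alpha_i
        y_{it} \mid x_{it}, \alpha_i \sim \mathrm{Poisson}\!\left(e^{\alpha_i + x_{it}'\beta}\right),
        \qquad
        \Pr\!\left(y_{i1}, \dots, y_{iT_i} \mid n_i\right)
          = \frac{n_i!}{\prod_{t} y_{it}!} \prod_{t} p_{it}^{\,y_{it}},
        \qquad
        p_{it} = \frac{e^{x_{it}'\beta}}{\sum_{s} e^{x_{is}'\beta}}

        % Two cases in which the conditional probability equals 1 for every \beta:
        T_i = 1 \;\Rightarrow\; p_{i1} = 1,
        \qquad
        n_i = 0 \;\Rightarrow\; y_{it} = 0 \ \text{for all } t
        ```

        In this sketch, a panel with a single observation or with a zero outcome in every period contributes a factor of 1 to the conditional likelihood, so it carries no information about beta.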

        All the best,

        Joao


        • #5
          Dear Joao and Clyde,

          Thank you for your comments!


          • #6
            Hi everyone,

            Can I bring up this topic again? I'm currently running an -xtpoisson- model on an unbalanced panel and, as in this question, Stata drops all groups with only one observation and all groups where the dependent variable does not change within the panel.

            I understand why this happens, but I'd like to get a bit more detail on the validity of inferences made when these drops occur. Is xtpoisson still a valid choice when running a fixed effects model on count data, even in the presence of this observation-dropping behavior? How does the validity of the inference compare to a standard xtreg model on a continuous outcome, or more specifically to running xtreg on the same count data?


            • #7
              The fixed-effects estimators, whether -xtreg-, -xtlogit-, -xtpoisson-, etc., are purely estimating within-panel effects. If a panel exhibits no outcome variation, then it provides no information about how other variables affect within-panel outcomes. So it is not possible for any such estimator to provide information about singleton panels. Singleton panels are simply not in the universe for inferences about within-panel effects.

              When you run -xtreg, fe-, Stata does not drop the singleton panels, nor the panels with constant outcome. But they actually contribute nothing to the main estimation. See this example:

              Code:
              . clear*
              
              . webuse grunfeld
              
              . 
              . //      CREATE A COMPANY WITH SINGLETON OBSERVATION
              . sort company year
              
              . drop if company == company[_N] & year < year[_N]
              (19 observations deleted)
              
              . 
              . tab company
              
                  company |      Freq.     Percent        Cum.
              ------------+-----------------------------------
                        1 |         20       11.05       11.05
                        2 |         20       11.05       22.10
                        3 |         20       11.05       33.15
                        4 |         20       11.05       44.20
                        5 |         20       11.05       55.25
                        6 |         20       11.05       66.30
                        7 |         20       11.05       77.35
                        8 |         20       11.05       88.40
                        9 |         20       11.05       99.45
                       10 |          1        0.55      100.00
              ------------+-----------------------------------
                    Total |        181      100.00
              
              . 
              . xtset company year
                     panel variable:  company (unbalanced)
                      time variable:  year, 1935 to 1954
                              delta:  1 year
              
              . 
              . //      DO A FIXED-EFFECTS REGRESSION
              . xtreg mvalue kstock invest, fe
              
              Fixed-effects (within) regression               Number of obs     =        181
              Group variable: company                         Number of groups  =         10
              
              R-sq:                                           Obs per group:
                   within  = 0.4117                                         min =          1
                   between = 0.8073                                         avg =       18.1
                   overall = 0.7340                                         max =         20
              
                                                              F(2,169)          =      59.14
              corr(u_i, Xb)  = 0.6856                         Prob > F          =     0.0000
              
              ------------------------------------------------------------------------------
                    mvalue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                    kstock |  -.5078778   .1480372    -3.43   0.001     -.800118   -.2156376
                    invest |    2.85625   .3243214     8.81   0.000     2.216007    3.496493
                     _cons |   882.5545   37.12442    23.77   0.000     809.2672    955.8418
              -------------+----------------------------------------------------------------
                   sigma_u |  907.11622
                   sigma_e |  283.41664
                       rho |  .91106473   (fraction of variance due to u_i)
              ------------------------------------------------------------------------------
              F test that all u_i=0: F(9, 169) = 98.32                     Prob > F = 0.0000
              
              . //      NOW DO IT EXPLICITLY OMITTING THE SINGLETON
              . xtreg mvalue kstock invest if company != 10, fe
              
              Fixed-effects (within) regression               Number of obs     =        180
              Group variable: company                         Number of groups  =          9
              
              R-sq:                                           Obs per group:
                   within  = 0.4117                                         min =         20
                   between = 0.8038                                         avg =       20.0
                   overall = 0.7337                                         max =         20
              
                                                              F(2,169)          =      59.14
              corr(u_i, Xb)  = 0.6850                         Prob > F          =     0.0000
              
              ------------------------------------------------------------------------------
                    mvalue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                    kstock |  -.5078778   .1480372    -3.43   0.001     -.800118   -.2156376
                    invest |    2.85625   .3243214     8.81   0.000     2.216007    3.496493
                     _cons |   887.1755   37.29272    23.79   0.000     813.5559    960.7951
              -------------+----------------------------------------------------------------
                   sigma_u |   920.3347
                   sigma_e |  283.41664
                       rho |  .91338138   (fraction of variance due to u_i)
              ------------------------------------------------------------------------------
              F test that all u_i=0: F(8, 169) = 110.29                    Prob > F = 0.0000
              Notice that the main results are identical either way. The only things that change when the singleton is dropped are the estimates of the _cons term, sigma_u, and rho (the last being calculated from the second). Those things change because the singleton panel is informative about the fixed effect for that company, and hence also about sigma_u. So those observations are retained so that their minor contributions to the estimation of these ancillary parameters are taken into account.

              The reason that -xtlogit- and -xtpoisson- are explicit about removing those panels from the estimation sample is that those models condition out the fixed effects, so there is no constant term, nor any estimate of sigma_u: those parameters aren't even part of the model. So, in this situation the singleton or no-outcome-variation panels literally contribute nothing at all and Stata doesn't waste its time doing any calculations with them.
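
              The same comparison can be tried for -xtpoisson, fe-. The following is a sketch on simulated data, so everything in it (the seed, the variable names id, t, x, and y, the true coefficient of 0.5, and the stored-estimate names) is an assumption made for illustration. It builds one all-zero panel and one singleton panel and then fits the model with and without them.

              ```stata
              * Sketch on simulated data (all names and values are illustrative)
              clear
              set seed 12345
              set obs 50
              generate id = _n
              expand 5
              bysort id: generate t = _n
              generate x = rnormal()
              generate y = rpoisson(exp(0.5*x))

              * Panel 1: outcome is zero in every period
              replace y = 0 if id == 1
              * Panel 2: a singleton
              drop if id == 2 & t > 1

              xtset id t

              * Stata reports the groups it drops before iterating
              xtpoisson y x, fe
              estimates store m_full

              * Same model, excluding the two uninformative panels explicitly
              xtpoisson y x if !inlist(id, 1, 2), fe
              estimates store m_sub

              * Coefficients and standard errors side by side
              estimates table m_full m_sub, b(%12.8f) se(%12.8f)
              ```

              The two columns of the final table should agree, since the excluded panels add nothing to the conditional likelihood.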

              So I think the question you need to ask yourself is whether your research questions are appropriately answered by a purely within-panel analysis. If they are, then you need have no concerns about the omitted panels. If a purely within-panel analysis is not appropriate to your research questions, then you simply shouldn't use a fixed-effects estimator in the first place.


              • #8
                Thanks, Clyde, that is very helpful.


                • #9
                  Can you refer me to a text that I could cite to explain this in a paper?


                  • #10
                    Philip:
                    among many others, you may want to consider:
                    http://www.stata.com/bookstore/micro...ons/index.html
                    http://www.stata.com/bookstore/micro...ata/index.html
                    Kind regards,
                    Carlo
                    (Stata 19.0)
