
  • Using e(sample) does not exclude singletons

    Dear Stata-Users,

    I am working with panel data and have run three regressions, say regressions 1, 2 and 3. Regression 1 includes fixed effects, so singletons are dropped there. I would like to use the same sample for the other two regressions. However, when I restrict them with e(sample), the observations that regression 1 dropped because of missing values are excluded, but the singletons are not. I know this is not a new topic per se, but none of the code suggested on this forum seems to work.

    I could provide a data example, but because it would only be a small extract, it might not contain any singletons, so I am not sure it would get the point across.

    Thank you in advance.

  • #2
    I can't follow exactly what you're doing here.

    Run regression 1 first and save an indicator for what was used.

    Code:
    gen mysample = e(sample)
    and then use if mysample in later regressions.
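
A minimal, self-contained sketch of that pattern. The auto dataset and the variables price, mpg and rep78 are assumptions chosen only for illustration; they are not from the question. Because rep78 has missing values, the first regression uses fewer observations than the dataset holds, and the saved indicator carries that restriction into the second regression.

```stata
* Assumption: Stata's shipped auto data stands in for the real panel
sysuse auto, clear

* First model: rep78 has missing values, so some observations are dropped
regress price mpg rep78

* Save a 0/1 indicator of the observations actually used
gen byte mysample = e(sample)

* Later model restricted to the same observations
regress price mpg if mysample
```

Note that this only carries over whatever e(sample) flags after the first command; it says nothing about observations the estimator flags but then effectively ignores.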



    • #3
      Thank you for your answer, Nick.

      Here is what I did (in pseudo-code):

      Code:
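      * Regression 1: fixed-effects IV (quoted parts are placeholders)
      sort id wave
      xtset id wave
      xtivreg2 age (instrumented variable = "some instruments") "some covariates...", cluster(id) fe
      * Save a 0/1 indicator of the estimation sample
      gen sample2 = e(sample)

      * Regression 2: OLS restricted to the sample flagged above
      regress age "some covariates, same as above" if sample2 == 1, cluster(id)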
      The problem is that regression 1 uses 33,776 observations while regression 2 uses 40,298, which I don't understand, since I specifically tried to use the same observations in both regressions.

      Please let me know if you need a data example. As I said before, I am not sure that it would show the problem.
      Last edited by Floor vd Sanden; 23 May 2022, 01:55.



      • #4
        You need to provide sample data on which the problem can be reproduced.

        I myself was curious what Stata does with cross-sectional units for which there is only one time-series observation (this is how I understood your term "singletons"). Fixed effects should drop such units, because after the within transformation all their variables are identically 0.

        From the example below, it seems to me that Stata still reports them as if they were in the fixed-effects regression, although they cannot be there after the within transformation.

        In the Grunfeld data, I dropped every year except the first for the first company. This company will drop out of the within regression, and yet it is reported as in sample.

        It seems to me that Stata sets e(sample) on the basis of the observations on which you attempt the calculation, not on the basis of the observations that can actually be used in it.

        Code:
        . webuse grunfeld, clear
        
        . drop in 2/20
        (19 observations deleted)
        
        . xtset company
        
        Panel variable: company (unbalanced)
        
        . xtreg invest mvalue kstock i.time, fe
        
        Fixed-effects (within) regression               Number of obs     =        181
        Group variable: company                         Number of groups  =         10
        
        R-squared:                                      Obs per group:
             Within  = 0.4804                                         min =          1
             Between = 0.6620                                         avg =       18.1
             Overall = 0.5161                                         max =         20
        
                                                        F(21,150)         =       6.60
        corr(u_i, Xb) = 0.3430                          Prob > F          =     0.0000
        
        ------------------------------------------------------------------------------
              invest | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
              mvalue |   .0632423   .0176809     3.58   0.000     .0283065    .0981781
              kstock |   .0992392   .0383902     2.59   0.011     .0233838    .1750947
                     |
                time |
                  2  |    8.03892   18.54319     0.43   0.665    -28.60068    44.67851
                  3  |   13.06524   19.85738     0.66   0.512    -26.17106    52.30154
                  4  |   -6.05266   18.53825    -0.33   0.745    -42.68249    30.57717
                  5  |  -18.40918   19.01444    -0.97   0.335    -55.97992    19.16156
                  6  |   1.704416   19.14986     0.09   0.929    -36.13389    39.54272
                  7  |   26.58242    19.0676     1.39   0.165    -11.09336    64.25819
                  8  |   17.67125   18.96896     0.93   0.353    -19.80962    55.15212
                  9  |   2.660505   19.25748     0.14   0.890    -35.39045    40.71146
                 10  |   .3084851   19.31558     0.02   0.987    -37.85728    38.47425
                 11  |  -2.370672   19.64233    -0.12   0.904    -41.18206    36.44071
                 12  |   20.71734   19.99239     1.04   0.302    -18.78574    60.22042
                 13  |   23.71787   19.82973     1.20   0.234    -15.46381    62.89954
                 14  |   33.16299   20.24243     1.64   0.103     -6.83414    73.16011
                 15  |    12.8115   20.64038     0.62   0.536    -27.97194    53.59493
                 16  |   11.04173   21.09191     0.52   0.601    -30.63389    52.71735
                 17  |   42.49747   21.97615     1.93   0.055    -.9253147    85.92026
                 18  |   47.96371   23.02584     2.08   0.039     2.466829    93.46059
                 19  |   47.97907   24.75416     1.94   0.054    -.9327974    96.89094
                 20  |   20.07698   25.85124     0.78   0.439    -31.00263    71.15659
                     |
               _cons |     11.238   15.37786     0.73   0.466     -19.1472    41.62321
        -------------+----------------------------------------------------------------
             sigma_u |   92.00528
             sigma_e |   38.29996
                 rho |  .85230489   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        F test that all u_i=0: F(9, 150) = 77.47                     Prob > F = 0.0000
        
        . gen fesample = e(sample)
        
        . xtreg invest mvalue kstock i.time if fesample, re
        
        Random-effects GLS regression                   Number of obs     =        181
        Group variable: company                         Number of groups  =         10
        
        R-squared:                                      Obs per group:
             Within  = 0.4786                                         min =          1
             Between = 0.6747                                         avg =       18.1
             Overall = 0.5371                                         max =         20
        
                                                        Wald chi2(21)     =     150.16
        corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
        
        ------------------------------------------------------------------------------
              invest | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
              mvalue |   .0763347   .0150811     5.06   0.000     .0467763     .105893
              kstock |   .0997447   .0375636     2.66   0.008     .0261214     .173368
                     |
                time |
                  2  |   3.737645   18.19877     0.21   0.837    -31.93128    39.40657
                  3  |   5.841745   19.07753     0.31   0.759    -31.54953    43.23302
                  4  |   -9.72013   18.25427    -0.53   0.594    -45.49784    26.05758
                  5  |  -23.26453   18.58377    -1.25   0.211    -59.68805      13.159
                  6  |   -3.45358   18.67728    -0.18   0.853    -40.06038    33.15322
                  7  |   21.95232   18.66081     1.18   0.239     -14.6222    58.52683
                  8  |      14.14   18.68128     0.76   0.449    -22.47464    50.75464
                  9  |  -1.363734   18.91043    -0.07   0.943     -38.4275    35.70003
                 10  |  -3.719048   18.96621    -0.20   0.845    -40.89214    33.45404
                 11  |  -7.361339   19.17243    -0.38   0.701    -44.93861    30.21593
                 12  |   14.88459   19.40224     0.77   0.443    -23.14309    52.91227
                 13  |   19.88895   19.48411     1.02   0.307    -18.29919     58.0771
                 14  |   29.36567   19.88719     1.48   0.140    -9.612506    68.34384
                 15  |   9.348591   20.30538     0.46   0.645    -30.44923    49.14641
                 16  |   6.786962   20.66427     0.33   0.743    -33.71427     47.2882
                 17  |    36.4429   21.30517     1.71   0.087    -5.314459    78.20026
                 18  |   41.34865   22.25438     1.86   0.063    -2.269133    84.96643
                 19  |   40.00478   23.74889     1.68   0.092    -6.542185    86.55175
                 20  |   11.71962   24.77135     0.47   0.636    -36.83134    60.27058
                     |
               _cons |   12.52635   32.79416     0.38   0.702    -51.74902    76.80171
        -------------+----------------------------------------------------------------
             sigma_u |  90.073697
             sigma_e |   38.29996
                 rho |   .8468828   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        
        . dis _N
        181
        
        .
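
A short check of the point made in this post, as a sketch under the same setup (the Grunfeld data with company 1 trimmed to one year). The variable names fesample and n_used are hypothetical helpers. It counts how many observations flagged by e(sample) belong to a panel that contributes only one observation.

```stata
* Same setup as in the post: company 1 keeps a single year
webuse grunfeld, clear
drop in 2/20
xtset company
quietly xtreg invest mvalue kstock i.time, fe
gen byte fesample = e(sample)

* Panel size among the observations flagged by e(sample)
bysort company: egen n_used = total(fesample)

* Observations in e(sample) that sit in a one-observation panel
count if fesample & n_used == 1
```

If the count is nonzero, e(sample) is flagging singleton observations, which is consistent with the 181 observations reported in both outputs above.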



        • #5
          Dear Joro,

          I think you are right that e(sample) does not leave out the observations for ids that have been observed only once. Thank you for your answer; I think I will need to find another way than using e(sample).
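
One possible way forward, sketched on the Grunfeld example from this thread rather than on the original data: start from e(sample), count the flagged observations per panel, and keep only panels with more than one. The names insample, n_used and nosingle are hypothetical, and this has not been checked against xtivreg2, so treat it as a starting point rather than a confirmed fix.

```stata
* Assumption: Grunfeld data with company 1 trimmed to one year
webuse grunfeld, clear
drop in 2/20
xtset company
quietly xtreg invest mvalue kstock i.time, fe

* Start from e(sample), then remove one-observation panels
gen byte insample = e(sample)
bysort company: egen n_used = total(insample)
gen byte nosingle = insample & n_used > 1

* Both models now run on the same, singleton-free observations
xtreg invest mvalue kstock i.time if nosingle, fe
regress invest mvalue kstock if nosingle, vce(cluster company)
```

Counting within e(sample), rather than over the raw data, matters because a panel can become a singleton only after observations with missing values have been excluded.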
