List of observations in regression

OLMABA JALA

Join Date: Jan 2021

Posts: 68
#1


22 May 2021, 05:25

Hello,

I am running a regression in Stata with xtgee:

xtgee Profit number_employees company_size , link(log) corr(ar1) iterate(100) family(igaussian) vce(robust)

When I run the regression, Stata tells me:

note: observations not equally spaced
modal spacing is delta Year = 1 unit
12 groups omitted from estimation
note: some groups have fewer than 2 observations
not possible to estimate correlations for those groups
25 groups omitted from estimation

That is totally fine with me, but I would like to know which observations ended up in the estimation sample. Is there a way to find out?

Thanks for the help!!!
Rich Goldstein

Join Date: Mar 2014

Posts: 4459
#2

22 May 2021, 05:33

One of the results that Stata saves after estimation is "e(sample)". It is effectively a dummy that equals 1 if the observation was included in the estimation and 0 if not. Because it is "temporary" (the next estimation command replaces it), I recommend generating a new variable equal to e(sample) immediately after estimation and then using that variable to see which observations were included and which were excluded.
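
To illustrate why copying e(sample) right away matters, here is a minimal sketch using the auto dataset that ships with Stata. The dataset and the variable name used_first are only assumptions for illustration and have nothing to do with the original poster's data.

```stata
* Sketch only: auto.dta and the name used_first are illustrative assumptions
sysuse auto, clear

* First model: observations with missing rep78 are dropped from estimation
regress price mpg rep78
generate byte used_first = e(sample)   // keep a permanent copy

* Second model: e(sample) now describes this model, not the first one
regress price mpg

* Compare the saved copy with the current e(sample)
count if used_first
count if e(sample)
```

After the second regress, e(sample) no longer marks the first model's estimation sample, while used_first still does.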

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17706

22 May 2021, 05:39

Olmaba:
-e(sample)- is probably the answer:

Code:

. use http://www.stata-press.com/data/r16/nlswork2.dta
. xtgee ln_w grade age c.age#c.age, corr(indep) nmp

Iteration 1: tolerance = 8.741e-13

GEE population-averaged model                   Number of obs     =     16,085
Group variable:                     idcode      Number of groups  =      3,913
Link:                             identity      Obs per group:
Family:                           Gaussian                    min =          1
Correlation:                   independent                    avg =        4.1
                                                              max =          9
                                                Wald chi2(3)      =    4241.04
Scale parameter:                  .1408958      Prob > chi2       =     0.0000

Pearson chi2(16081):               2265.75      Deviance          =    2265.75
Dispersion (Pearson):             .1408958      Dispersion        =   .1408958

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       grade |   .0724483   .0014229    50.91   0.000     .0696594    .0752372
         age |   .1064874   .0083644    12.73   0.000     .0900935    .1228812
             |
 c.age#c.age |  -.0016931   .0001655   -10.23   0.000    -.0020174   -.0013688
             |
       _cons |  -.8681487   .1024896    -8.47   0.000    -1.069025   -.6672728
------------------------------------------------------------------------------

. gen flag = e(sample)

. sum flag

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        flag |     16,094    .9994408    .0236418          0          1

. tab flag

       flag |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |          9        0.06        0.06
          1 |     16,085       99.94      100.00
------------+-----------------------------------
      Total |     16,094      100.00

.
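
Going one step further than the tabulation above, the following sketch lists which observations and panels were used. It assumes the same nlswork2 example data (loaded here with webuse); the variable names insample and tag_used are hypothetical. For the original question, the same steps should apply after the xtgee call on Profit, with the poster's own panel and time identifiers.

```stata
* Sketch only: insample and tag_used are hypothetical variable names
webuse nlswork2, clear
xtset idcode year

xtgee ln_wage grade age c.age#c.age, corr(indep) nmp

* Copy e(sample) immediately, before another estimation command replaces it
generate byte insample = e(sample)

* Show the identifiers of the excluded observations
list idcode year if !insample

* Show the identifiers of included observations among the first 20 rows
list idcode year if insample in 1/20

* Count the observations used in estimation
count if insample

* Flag one observation per panel among those used, then count the panels
egen byte tag_used = tag(idcode) if insample
count if tag_used == 1
```

The observation count from count if insample should match the "Number of obs" reported in the xtgee header, which is a quick check that the copy was taken from the right model.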

PS: I'm happy to see that my reply is consistent with Rich's helpful (and faster) advice!

Kind regards,
Carlo
(Stata 19.0)
