descriptive statistics on subsample used for a regression analysis

Michael Eichinger

Join Date: Mar 2015

Posts: 27
#1


22 Apr 2015, 02:36

Dear Statalisters,

I have conducted a multilevel regression analysis with xtmixed, using the default behavior that observations with missing values on one or more variables are not used in the regression. For an abstract, I want to give descriptive statistics (e.g., mean age) of the subsample used in the regression analysis, i.e., the subsample with no missing values. Is there a simple way to restrict, e.g., the command summarize to that subsample, or any other way to retrieve this information? Thanks a lot for any hints.

Best, Michael
Maarten Buis

Join Date: Mar 2014

Posts: 3496
#2

22 Apr 2015, 02:48

Estimation commands leave the estimation sample behind in e(sample). So, after the regression command you can type

Code:

summarize var1 var2 if e(sample)

You can also restrict the summary statistics like this:

Code:

summarize var1 var2 if !missing(y, var1, var2, var3)

Assuming your model contains the variables y, var1, var2, and var3.
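A quick way to check that the two approaches pick out the same observations, sketched with the auto data that ship with Stata (the model of price on mpg and rep78 is an arbitrary illustration, not from this thread). rep78 has five missing values in that dataset, so both counts should come out at 69. The !missing() version only reproduces e(sample) when the estimation command drops observations solely because of missing values, e.g. no if/in qualifier and no other automatic exclusions.

```stata
* Illustration only: auto.dta ships with Stata; the model is arbitrary
sysuse auto, clear
regress price mpg rep78

* Approach 1: the estimation sample stored by -regress-
summarize price mpg rep78 if e(sample)
count if e(sample)

* Approach 2: rebuild the sample by hand from the model's variables
summarize price mpg rep78 if !missing(price, mpg, rep78)
count if !missing(price, mpg, rep78)

* Sanity check: both definitions agree on every observation
assert e(sample) == !missing(price, mpg, rep78)
```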

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Charlie Joyez

Join Date: Dec 2014

Posts: 421
#3

22 Apr 2015, 03:32

The first command Maarten gave you is actually the best, but it must be run right after the relevant regression (be careful if you run several models).

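Otherwise, there are plenty of ways to do this. Instead of repeating all the variables that mustn't be missing, a simple alternative is to use the predicted values and exclude observations whose predicted value is missing. Restrict -predict- to the estimation sample with if e(sample), because by default it also produces predictions for observations outside that sample (e.g., those with a missing dependent variable).

Code:

reg y var1 var2 var3               /* model 1 */
predict p1 if e(sample)            /* p1 is missing outside model 1's sample */
reg y var1 var2 var3 i.dummy       /* model 2 */
predict p2 if e(sample)            /* p2 is missing outside model 2's sample */
summarize var1 var2 var3 if p1 != .
summarize var1 var2 var3 if p2 != .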

This makes it easy to compare the samples of alternative models.
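As an alternative to predicted values, the e(sample) marker of each model can be saved in its own variable immediately after that model is fitted; this avoids any question of what -predict- fills in outside the sample. A sketch with the auto data, where model 2 adds i.rep78 and therefore loses the five cars with missing rep78 (the variable names s1 and s2 are arbitrary choices for illustration):

```stata
* Illustration only: two nested models on auto.dta
sysuse auto, clear

regress price mpg weight            // model 1
generate byte s1 = e(sample)        // 1 = used by model 1, 0 = not used

regress price mpg weight i.rep78    // model 2
generate byte s2 = e(sample)        // 1 = used by model 2, 0 = not used

* Cross-tabulate the two markers to see which observations differ
tabulate s1 s2

* Descriptive statistics for each model's estimation sample
summarize price mpg weight if s1
summarize price mpg weight if s2
```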
Richard Williams

Join Date: Apr 2014

Posts: 5043
#4

22 Apr 2015, 05:44

The estat sum command is also handy for things like this, e.g.

Code:

sysuse auto, clear
reg price weight if foreign
estat sum

From the help: estat summarize summarizes the variables used by the command and automatically restricts the sample to e(sample); it also summarizes the weight variable and cluster structure, if specified.

-------------------------------------------
Richard Williams
Professor Emeritus of Sociology
University of Notre Dame
StataNow Version: 19.5 MP (2 processor)
WWW: https://academicweb.nd.edu/~rwilliam/
Michael Eichinger

Join Date: Mar 2015

Posts: 27
#5

22 Apr 2015, 11:31

Thanks a lot for your help. It was really useful indeed.

Best, Michael
Natalia Remel

Join Date: Mar 2017

Posts: 23
#6

08 May 2017, 05:47

Dear Statalisters,

I have a similar question: I regress time-to-peak on launch order and other variables.

Code:

reg log_timetopeak1 lorder price nu_brands pr_prm pr_undet, robust

I also want descriptive statistics of the subsample for each level of the launch-order variable (1, 2, 3, 4, etc.). Using

Code:

tabstat timetopeak1, by(lorder) s(n sd mean)

I get results for all observations. How can I get descriptive statistics for each level of launch order, restricted to the subsample used in the regression model?
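A minimal sketch of one way to do this without dropping data: add if e(sample) to -tabstat- right after the regression and keep the by() option. With the variables from the post that would be tabstat timetopeak1 if e(sample), by(lorder) s(n sd mean). The runnable illustration below uses auto.dta instead, with foreign standing in for lorder (an arbitrary substitution):

```stata
* Illustration only: foreign plays the role of the grouping variable
sysuse auto, clear
regress price mpg rep78, vce(robust)

* Statistics by group, restricted to the regression's estimation sample
tabstat price if e(sample), by(foreign) statistics(n sd mean)

* Saving the marker (name is arbitrary) keeps the sample available
* even after later estimation commands overwrite e(sample)
generate byte insample = e(sample)
tabstat price if insample, by(foreign) statistics(n sd mean)
```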
Natalia Remel

Join Date: Mar 2017

Posts: 23
#7

08 May 2017, 06:39

Sorry for posting a question and then coming up with the answer myself; maybe it will be helpful for those who have the same question in the future.
First, I generated a variable marking the observations used in the regression model (1 = yes, 0 = no). After dropping the unused observations, any kind of descriptive statistics can be computed.

gen byte used=e(sample)
drop if used==0

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17854

08 May 2017, 06:52

Natalia:
thanks for sharing your solution.
Just one minor aside: -drop-ping observations is often regretting-prone.
Why not:

Code:

. sysuse auto.dta
(1978 Automobile Data)
. reg price i.foreign mpg if mpg>=20

      Source |       SS           df       MS      Number of obs   =        39
-------------+----------------------------------   F(2, 36)        =      1.84
       Model |  18465953.6         2  9232976.78   Prob > F        =    0.1734
    Residual |   180598483        36  5016624.53   R-squared       =    0.0928
-------------+----------------------------------   Adj R-squared   =    0.0424
       Total |   199064437        38  5238537.81   Root MSE        =    2239.8

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |
    Foreign  |   889.7939   761.1811     1.17   0.250    -653.9529    2433.541
         mpg |  -144.0107   79.56934    -1.81   0.079    -305.3848    17.36339
       _cons |   8554.849   1978.991     4.32   0.000      4541.27    12568.43
------------------------------------------------------------------------------

. g double used=e(sample)

. tab used

       used |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         35       47.30       47.30
          1 |         39       52.70      100.00
------------+-----------------------------------
      Total |         74      100.00


. tabstat price mpg if used==1, stat(count mean sd p50 min max)

   stats |     price       mpg
---------+--------------------
       N |        39        39
    mean |  5279.667   25.4359
      sd |  2288.785   4.80567
     p50 |      4499        24
     min |      3291        20
     max |     15906        41
------------------------------

Kind regards,
Carlo
(Stata 19.0)


Natalia Remel

Join Date: Mar 2017

Posts: 23
#9

12 May 2017, 13:55

Originally posted by Carlo Lazzaro

Natalia:
thanks for sharing your solution.
Just one minor aside: -drop-ping observations is often regretting-prone.

Dear Carlo, thank you! This was very helpful to me.