  • Descriptive statistics for variables in a regression

    Good day

    I am running regressions with several independent variables, for example PRICE = V1 + V2 + V3 + V4 + V5 + V6. The independent variables have vastly differing numbers of observations, so when I summarize each variable I get descriptive statistics based on:

    V1 n=932
    V2 n=546
    V3 n= 872
    V4 n=340
    V5 n=316
    V6 n=543

    The means, medians, standard deviations, etc. are all based on each variable's own number of observations. However, when I run the regression, Stata drops every observation in which any of the variables is missing. So the regression is based only on complete observations where PRICE, V1, V2, V3, V4, V5 and V6 are all non-missing, and the table of results for the final regression is based on n=294.

    My problem is that I have to draw up a table of descriptive statistics for each of the variables (V1 - V6) using ONLY the observations that were used in the regression (i.e. n=294), not all of the observations available for each variable.

    I have been going through each variable and using "summarize V1 if PRICE~=. & V1~=. & V2~=." etc. But I have to produce these tables for over 30 different regressions, and it is taking a lot of time. Does anyone know how to produce descriptive statistics restricted to the observations that were used in the regression?

    Any help is greatly appreciated.

    Many Thanks
    Sean

  • #2
    Note the following use of e(sample), which you can read up about in the manuals:
    Code:
    . sysuse auto
    (1978 Automobile Data)
    
    . clonevar newprice = price
    
    . replace newprice = . in 1/10
    (10 real changes made, 10 to missing)
    
    . clonevar newweight = weight
    
    . replace newweight = . in 50/60
    (11 real changes made, 11 to missing)
    
    . reg mpg newprice newweight
    
          Source |       SS       df       MS              Number of obs =      53
    -------------+------------------------------           F(  2,    50) =   50.35
           Model |  1378.92558     2   689.46279           Prob > F      =  0.0000
        Residual |   684.62159    50  13.6924318           R-squared     =  0.6682
    -------------+------------------------------           Adj R-squared =  0.6550
           Total |  2063.54717    52  39.6835994           Root MSE      =  3.7003
    
    ------------------------------------------------------------------------------
             mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        newprice |  -.0000552   .0001959    -0.28   0.779    -.0004488    .0003383
       newweight |  -.0060305   .0007596    -7.94   0.000    -.0075562   -.0045049
           _cons |   40.13789   1.952008    20.56   0.000     36.21717    44.05862
    ------------------------------------------------------------------------------
    
    . su mpg newprice newweight if e(sample)
    
        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
             mpg |        53    21.32075    6.299492         12         41
        newprice |        53    6296.113    3241.408       3291      15906
       newweight |        53    3062.642    836.0832       1760       4840
    
    . su mpg newprice newweight
    
        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
             mpg |        74     21.2973    5.785503         12         41
        newprice |        64    6266.484    3065.138       3291      15906
       newweight |        63    3095.714     798.397       1760       4840
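
    Since the original question also asks for medians, which summarize only shows with the detail option, here is a minimal sketch that combines e(sample) with tabstat. It reuses the auto data and the newprice/newweight variables constructed above. The marker variable name used1 is a hypothetical choice for illustration. Saving the marker matters because e(sample) is overwritten by the next estimation command.

    ```stata
    * Rebuild the example data from the session above
    sysuse auto, clear
    clonevar newprice = price
    replace newprice = . in 1/10
    clonevar newweight = weight
    replace newweight = . in 50/60

    * Fit the model quietly; e(sample) now flags the observations used
    quietly regress mpg newprice newweight

    * Keep a permanent 0/1 marker (used1 is an illustrative name)
    generate byte used1 = e(sample)

    * Descriptives, including the median, for the estimation sample only
    tabstat mpg newprice newweight if used1, ///
        statistics(n mean median sd min max) columns(statistics)
    ```

    With one marker per regression (used1, used2, and so on), the same tabstat call can be repeated later without refitting each model.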



    • #3
      See also http://www.stata-journal.com/sjpdf.h...iclenum=dm0030 for a nice introduction to this question.



      • #4
        I recently discovered estat summarize (abbreviated estat sum below), which I find really handy. It is slightly easier than Stephen's approach and perhaps less error-prone.

        Code:
        . webuse nhanes2f, clear
        
        . reg tgresult weight
        
              Source |       SS       df       MS              Number of obs =    5044
        -------------+------------------------------           F(  1,  5042) =  308.34
               Model |  2707181.95     1  2707181.95           Prob > F      =  0.0000
            Residual |  44268463.9  5042  8779.94127           R-squared     =  0.0576
        -------------+------------------------------           Adj R-squared =  0.0574
               Total |  46975645.8  5043  9315.01999           Root MSE      =  93.701
        
        ------------------------------------------------------------------------------
            tgresult |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              weight |   1.493946   .0850789    17.56   0.000     1.327154    1.660738
               _cons |   36.25774    6.27086     5.78   0.000     23.96413    48.55135
        ------------------------------------------------------------------------------
        
        . estat sum
        
          Estimation sample regress              Number of obs =   5044
        
          -------------------------------------------------------------
              Variable |        Mean     Std. Dev.       Min        Max
          -------------+-----------------------------------------------
              tgresult |    143.9064     96.51435         16       2238
                weight |    72.05661     15.50884      30.84     158.53
          -------------------------------------------------------------
        
        . reg health weight
        
              Source |       SS       df       MS              Number of obs =   10335
        -------------+------------------------------           F(  1, 10333) =   18.08
               Model |  26.2659433     1  26.2659433           Prob > F      =  0.0000
            Residual |  15008.7554 10333  1.45250706           R-squared     =  0.0017
        -------------+------------------------------           Adj R-squared =  0.0017
               Total |  15035.0214 10334   1.4549082           Root MSE      =  1.2052
        
        ------------------------------------------------------------------------------
              health |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              weight |  -.0032831   .0007721    -4.25   0.000    -.0047965   -.0017698
               _cons |   3.649905   .0567655    64.30   0.000     3.538634    3.761176
        ------------------------------------------------------------------------------
        
        . estat sum
        
          Estimation sample regress              Number of obs =  10335
        
          -------------------------------------------------------------
              Variable |        Mean     Std. Dev.       Min        Max
          -------------+-----------------------------------------------
                health |    3.413836     1.206196          1          5
                weight |    71.90313     15.35578      30.84     175.88
          -------------------------------------------------------------
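
        For the case in the original question, where the same table is needed after some 30 regressions, the two steps above could be wrapped in a loop. This is only a sketch: the local macro names spec1 and spec2 are hypothetical, and the two specifications are simply the ones from the session above. Note that estat summarize reports the mean, standard deviation, minimum and maximum, but not the median.

        ```stata
        webuse nhanes2f, clear

        * One local macro per specification (names are illustrative)
        local spec1 "tgresult weight"
        local spec2 "health weight"

        forvalues i = 1/2 {
            * Fit each model quietly, then summarize its estimation sample
            quietly regress `spec`i''
            display as text _newline "Model `i': regress `spec`i''"
            estat summarize
        }
        ```

        Adding further specifications only requires defining more spec macros and raising the upper limit of the forvalues loop.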
        Last edited by Richard Williams; 02 Oct 2014, 07:38.
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 19.5 MP (2 processor)

        WWW: https://www3.nd.edu/~rwilliam

