  • Summary Statistics for the Sample Used in Regression

    I have a data set with 303,706 observations; however, the regression model uses only 177,714 of them (because some of the variables have missing values). Is there a way to get summary statistics for only those observations that are used in the regression, i.e., only the 177,714 observations? Thanks in advance.
    Regards
    --------------------------------------------------
    Attaullah Shah, PhD.
    Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
    FinTechProfessor.com
    https://asdocx.com
    Check out my asdoc program, which sends outputs to MS Word.
    For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

  • #2
    Attaullah may want to consider the -e(sample) option:
    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . sum price mpg rep78
    
        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
           price |        74    6165.257    2949.496       3291      15906
             mpg |        74     21.2973    5.785503         12         41
           rep78 |        69    3.405797    .9899323          1          5
    
    . reg price mpg rep78
    
          Source |       SS       df       MS              Number of obs =      69
    -------------+------------------------------           F(  2,    66) =   11.06
           Model |   144754063     2  72377031.7           Prob > F      =  0.0001
        Residual |   432042896    66  6546104.48           R-squared     =  0.2510
    -------------+------------------------------           Adj R-squared =  0.2283
           Total |   576796959    68  8482308.22           Root MSE      =  2558.5
    
    ------------------------------------------------------------------------------
           price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |  -271.6425   57.77115    -4.70   0.000    -386.9864   -156.2987
           rep78 |   666.9568   342.3559     1.95   0.056     -16.5789    1350.492
           _cons |   9657.754    1346.54     7.17   0.000       6969.3    12346.21
    ------------------------------------------------------------------------------
    
    . sum  price mpg rep78 if e(sample)
    
        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
           price |        69    6146.043     2912.44       3291      15906
             mpg |        69    21.28986    5.866408         12         41
           rep78 |        69    3.405797    .9899323          1          5
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      Or better yet, estat sum, which I just recently discovered.

      Code:
      . sysuse auto, clear
      (1978 Automobile Data)
      
      . reg price mpg rep78
      
            Source |       SS       df       MS              Number of obs =      69
      -------------+------------------------------           F(  2,    66) =   11.06
             Model |   144754063     2  72377031.7           Prob > F      =  0.0001
          Residual |   432042896    66  6546104.48           R-squared     =  0.2510
      -------------+------------------------------           Adj R-squared =  0.2283
             Total |   576796959    68  8482308.22           Root MSE      =  2558.5
      
      ------------------------------------------------------------------------------
             price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               mpg |  -271.6425   57.77115    -4.70   0.000    -386.9864   -156.2987
             rep78 |   666.9568   342.3559     1.95   0.056     -16.5789    1350.492
             _cons |   9657.754    1346.54     7.17   0.000       6969.3    12346.21
      ------------------------------------------------------------------------------
      
      . estat sum
      
        Estimation sample regress              Number of obs =     69
      
        -------------------------------------------------------------
            Variable |        Mean     Std. Dev.       Min        Max
        -------------+-----------------------------------------------
               price |    6146.043      2912.44       3291      15906
                 mpg |    21.28986     5.866408         12         41
               rep78 |    3.405797     .9899323          1          5
        -------------------------------------------------------------
      The e(sample) qualifier is useful in many other cases, e.g. when you want to do things besides summary statistics.
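      As a rough sketch of such uses, assuming the same auto data set (the choice of foreign and make below is only illustrative, not part of the original question), the qualifier can restrict any command to the estimation sample, and its negation shows which observations were dropped:

      ```stata
      * Assumes Stata's bundled auto data set; foreign and make are illustrative choices
      sysuse auto, clear
      regress price mpg rep78

      * Tabulate a variable that was not in the model, estimation sample only
      tabulate foreign if e(sample)

      * List the cars excluded from the regression because rep78 is missing
      list make price mpg rep78 if !e(sample)
      ```

      The second command is a quick way to inspect why the regression used fewer observations than the data set contains.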
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      WWW: https://www3.nd.edu/~rwilliam

      • #4
        Wow, that is a wonderful solution. Thank you, Carlo Lazzaro.
        Regards
        --------------------------------------------------
        Attaullah Shah, PhD.
        Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
        FinTechProfessor.com
        https://asdocx.com
        Check out my asdoc program, which sends outputs to MS Word.
        For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

        • #5
          Thanks Richard,
          I wasn't aware of -estat sum-.
          Kind regards,
          Carlo
          (Stata 19.0)

          • #6
            Strictly, e(sample) is not an option. It's a function. It works even if no estimation results are in memory, but not usefully...
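            A small sketch of that point, assuming a session in which clear all has removed any estimation results: the first count should find no observations, because e(sample) evaluates to 0 everywhere, while the second should match the number of observations used by the regression.

            ```stata
            * Start with no estimation results in memory
            clear all
            sysuse auto

            * e(sample) still evaluates, but it is 0 for every observation
            count if e(sample)

            * After an estimation command it flags the estimation sample
            quietly regress price mpg rep78
            count if e(sample)
            ```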

            • #7
              Nick,
              sorry for my earlier mistake, and thanks for the clarification.
              Kind regards,
              Carlo
              (Stata 19.0)

              • #8
                Vince Wiggins explained e(sample) way back when.

                http://www.stata.com/statalist/archi.../msg00635.html

                Key phrase: "-e(sample)- is nothing more than a cleverly hidden variable exposed through the -e(sample)- function"

                If you want to make sure you are working with the same cases throughout an analysis, you may want to run an analysis that includes all variables of interest and then generate a new variable equal to e(sample). This would be useful if, say, you want cases dropped that are missing on any of the variables of interest, not just the subset of variables used in a particular analysis.
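                A minimal sketch of that workflow, assuming the auto data set; the variable name insample and the inclusion of weight are hypothetical choices made only for illustration:

                ```stata
                sysuse auto, clear

                * Fit one model that contains every variable of interest
                quietly regress price mpg rep78 weight

                * Store the estimation-sample flag in a permanent variable
                * (insample is a hypothetical name)
                generate byte insample = e(sample)

                * Later analyses on subsets of the variables use the same cases
                summarize price mpg if insample
                regress price mpg if insample
                ```

                Because insample is an ordinary variable, it survives later estimation commands, whereas e(sample) is overwritten each time a new model is fit.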
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                StataNow Version: 19.5 MP (2 processor)

                WWW: https://www3.nd.edu/~rwilliam
