  • descriptive statistics on subsample used for a regression analysis

    Dear Statalisters,

    I have conducted a multilevel regression analysis with xtmixed, using the default behavior that observations with one or more missing variables are excluded from the regression. For an abstract I want to give descriptive statistics (e.g. mean age) of the subsample used in the regression analysis (i.e. the subsample with no missing values on any model variable). Is there a simple way to restrict a command such as summarize to that subsample, or any other way to retrieve this information? Thanks a lot for any hints.

    Best, Michael

  • #2
    Estimation commands leave the estimation sample behind in e(sample). So, after the regression command you can type

    Code:
    summarize var1 var2 if e(sample)
    You can also restrict the summary statistics like this:

    Code:
    summarize var1 var2 if !missing(y, var1, var2, var3)
    Assuming your model contains the variables y, var1, var2 and var3.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------
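
As a quick check that the two approaches in #2 select the same observations, here is a minimal sketch using the auto dataset shipped with Stata. The dataset and the variables price, weight and rep78 are stand-ins chosen for illustration, not taken from the original question. The two counts should agree as long as the model was fitted without an if or in qualifier and missing values on the listed variables are the only reason observations were dropped.

```stata
* Illustration only: auto.dta stands in for the poster's data
sysuse auto, clear
regress price weight rep78

* Observations actually used by the most recent estimation command
count if e(sample)

* Observations with no missing values on the model variables
count if !missing(price, weight, rep78)
```

If the counts differ, e(sample) is the one to trust, because it reflects exactly what the estimation command used.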



    • #3
      The first command Maarten gave you is actually the best, but it must be run right after the correct regression (be careful if you run several models).

      Otherwise, there are plenty of ways to do this. Instead of listing every variable that must not be missing, a simple alternative is to save the predicted values for the estimation sample and exclude observations whose predicted value is missing (and hence were not in the model sample). Note that predict on its own also fills in predictions for observations outside the estimation sample whenever their covariates are nonmissing, so restrict it with if e(sample).
      Code:
      reg y var1 var2 var3 /* model 1 */
      predict p1 if e(sample)
      reg y var1 var2 var3 i.dummy /* model 2 */
      predict p2 if e(sample)
      summarize var1 var2 var3 if !missing(p1)
      summarize var1 var2 var3 if !missing(p2)
      This makes it easy to compare the samples of alternative models.
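
A related way to keep the samples of several models apart is to store each e(sample) in its own indicator variable immediately after the corresponding regression. The sketch below assumes the auto dataset shipped with Stata as a stand-in; the variable names in_m1 and in_m2 are hypothetical.

```stata
* Illustration only: auto.dta stands in for the poster's data
sysuse auto, clear

* Model 1: no missing values on these variables
regress price weight mpg
generate byte in_m1 = e(sample)

* Model 2: rep78 has missing values, so fewer observations are used
regress price weight mpg rep78
generate byte in_m2 = e(sample)

* Cross-tabulate the two samples, then summarize each one
tabulate in_m1 in_m2
summarize price weight mpg if in_m1
summarize price weight mpg if in_m2
```

Because the indicators are ordinary variables, they stay available after later estimation commands overwrite e(sample).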



      • #4
        The estat sum command is also handy for things like this, e.g.

        Code:
        sysuse auto, clear
        reg price weight if foreign
        estat sum
        From the help: estat summarize summarizes the variables used by the command and automatically restricts the sample to e(sample); it also summarizes the weight variable and cluster structure, if specified.
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        Stata Version: 17.0 MP (2 processor)

        EMAIL: [email protected]
        WWW: https://www3.nd.edu/~rwilliam



        • #5
          Thanks a lot for your help. It was really useful indeed.

          Best, Michael



          • #6
            Dear Statalisters,

            I have a similar question. Namely, I regress time-to-peak on launch order and other variables.
            reg log_timetopeak1 lorder price nu_brands pr_prm pr_undet, robust
            I also want to get descriptive statistics of the subsample for all levels of the variable launch order (1, 2, 3, 4, etc.).
            Using the command tabstat timetopeak1, by(lorder) s(n sd mean), I get results for all observations. How can I get descriptive statistics for all levels of launch order for the subsample used in the regression model?



            • #7
              I am sorry that I had to post a question before coming up with the answer myself.
              Maybe it will be helpful for those who have the same question in the future.
              First I generated a variable marking the observations used in the regression model (1 yes, 0 no). After dropping the unused observations, all kinds of descriptive analyses can be performed.

              gen byte used=e(sample)
              drop if used==0



              • #8
                Natalia:
                thanks for sharing your solution.
                Just one minor aside: -drop-ping observations is often regret-prone.
                Why not:
                Code:
                . sysuse auto.dta
                (1978 Automobile Data)
                . reg price i.foreign mpg if mpg>=20
                
                      Source |       SS           df       MS      Number of obs   =        39
                -------------+----------------------------------   F(2, 36)        =      1.84
                       Model |  18465953.6         2  9232976.78   Prob > F        =    0.1734
                    Residual |   180598483        36  5016624.53   R-squared       =    0.0928
                -------------+----------------------------------   Adj R-squared   =    0.0424
                       Total |   199064437        38  5238537.81   Root MSE        =    2239.8
                
                ------------------------------------------------------------------------------
                       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                     foreign |
                    Foreign  |   889.7939   761.1811     1.17   0.250    -653.9529    2433.541
                         mpg |  -144.0107   79.56934    -1.81   0.079    -305.3848    17.36339
                       _cons |   8554.849   1978.991     4.32   0.000      4541.27    12568.43
                ------------------------------------------------------------------------------
                
                . g double used=e(sample)
                
                . tab used
                
                       used |      Freq.     Percent        Cum.
                ------------+-----------------------------------
                          0 |         35       47.30       47.30
                          1 |         39       52.70      100.00
                ------------+-----------------------------------
                      Total |         74      100.00
                
                
                . tabstat price mpg if used==1, stat(count mean sd p50 min max)
                
                   stats |     price       mpg
                ---------+--------------------
                       N |        39        39
                    mean |  5279.667   25.4359
                      sd |  2288.785   4.80567
                     p50 |      4499        24
                     min |      3291        20
                     max |     15906        41
                ------------------------------
                Kind regards,
                Carlo
                (Stata 18.0 SE)
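
The same if-qualifier approach also covers the by-group statistics asked about in #6. A minimal sketch, again assuming the auto dataset shipped with Stata as a stand-in, with foreign playing the role of the grouping variable and used as the indicator name from the posts above:

```stata
* Illustration only: auto.dta stands in for the poster's data
sysuse auto, clear
regress price weight mpg rep78, vce(robust)

* Mark the estimation sample instead of dropping the other observations
generate byte used = e(sample)

* Descriptive statistics by group, restricted to the estimation sample
tabstat price if used, by(foreign) statistics(n mean sd)
```

With the variables from #6 this would presumably read tabstat timetopeak1 if used, by(lorder) s(n sd mean), run after the regression. That line is untested here because the data are not available.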



                • #9
                  Originally posted by Carlo Lazzaro
                  Natalia:
                  thanks for sharing your solution.
                  Just one minor aside: -drop-ping observations is often regret-prone.
                  Dear Carlo, thank you! Very helpful to me.
