  • Creating subset of population sample and removing observations outside of the sample

    Hi
    I ran a descriptive statistics command on my subset of the population sample, and the number of observations was what I expected. However, when I ran a regression with my predictors, the number of observations did not match. The counts were inconsistent, including for the outcome. Is there a command I should use to tell Stata to drop all observations outside my population sample, so that the analyses and tests I run pertain only to that sample? Thank you.

  • #2
    Lena:
    If one or more of your predictors (or the outcome) have missing values, Stata excludes those observations from the estimation (listwise deletion).
    If that is not your case, an -if- qualifier is the simplest way to go. Unlike -drop-, it keeps data that may be useful again later:
    Code:
    . sysuse auto.dta
    (1978 Automobile Data)
    
    . sum displacement
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
    displacement |         74    197.2973    91.83722         79        425
    
    . g flag=1 if displacement>=r(mean)
    (38 missing values generated)
    
    . replace flag=0 if flag==.
    (38 real changes made)
    
    
    . sum price if flag==1
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
           price |         36    6836.361    3434.593       3299      15906
    
    . regress displacement i.foreign if flag==1
    note: 0.foreign omitted because of collinearity
    
          Source |       SS           df       MS      Number of obs   =        36
    -------------+----------------------------------   F(0, 35)        =      0.00
           Model |           0         0           .   Prob > F        =         .
        Residual |      132716        35  3791.88571   R-squared       =    0.0000
    -------------+----------------------------------   Adj R-squared   =    0.0000
           Total |      132716        35  3791.88571   Root MSE        =    61.578
    
    ------------------------------------------------------------------------------
    displacement |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         foreign |
       Domestic  |          0  (omitted)
           _cons |   277.3333   10.26305    27.02   0.000     256.4982    298.1684
    ------------------------------------------------------------------------------
    
    .
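    To check the first possibility (missing values in the model variables), something along these lines may help. This is only a sketch on the auto data: the variable list (price mpg rep78) and the name nmiss are placeholders for illustration, so substitute your own outcome and predictors.
    Code:
    sysuse auto, clear
    * how many missing values each model variable has
    misstable summarize price mpg rep78
    * number of model variables that are missing in each observation
    egen nmiss = rowmiss(price mpg rep78)
    * observations that -regress- would exclude because of missing values
    count if nmiss > 0
    Any observation with nmiss > 0 is left out of a regression that uses all of those variables, even if it belongs to your subsample.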
    If needed:
    Code:
    summarize if e(sample)
    after the regression can help you sniff out the culprit(s) (if any).
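    A possible way to see exactly which observations of the subsample were not used is to store e(sample) in a variable and cross-tabulate it with the flag. Again a sketch only: the model (price mpg rep78) and the name used are my assumptions for illustration, not your actual specification.
    Code:
    sysuse auto, clear
    summarize displacement, meanonly
    * 1 = in the subsample, 0 = outside (displacement has no missing values here)
    generate byte flag = displacement >= r(mean)
    regress price mpg rep78 if flag == 1
    * 1 = observation actually used in the estimation
    generate byte used = e(sample)
    tabulate flag used
    * subsample observations excluded from the estimation
    list make price mpg rep78 if flag == 1 & !used
    Observations with flag == 1 and used == 0 are in your subsample but were excluded, typically because of a missing value in one of the model variables.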
    Last edited by Carlo Lazzaro; 11 Nov 2021, 12:15.
    Kind regards,
    Carlo
    (Stata 19.0)
