  • Dropping missings in a varlist

    Hi,

    What is an elegant way to keep only the respondents who have non-missing values on all of these variables, without handling each variable one by one?
    Code:
    . misstable summarize $student3 // checking missing
                                                                   Obs<.
                                                    +------------------------------
                   |                                | Unique
          Variable |     Obs=.     Obs>.     Obs<.  | values        Min         Max
      -------------+--------------------------------+------------------------------
       W1suspendMP |     1,358              12,181  |      2          1           2
         W1expelMP |     1,338              12,201  |      2          1           2
        W1ethgrpYP |        21              13,518  |      8          1           8
        W1truantYP |     1,033              12,506  |      2          1           2
        W1cignowYP |       820              12,719  |      2          1           2
       W1canntryYP |       562              12,977  |      2          1           2
         W1sprayYP |       466              13,073  |      2          1           2
         W1smashYP |       651              12,888  |      2          1           2
          W1shopYP |       617              12,922  |      2          1           2
         W1fightYP |       674              12,865  |      2          1           2
       W1inc1estMP |     3,292              10,247  |     33          0          32
         W1ch0_2HH |       139              13,400  |      4          0           3
        W1ch3_11HH |       139              13,400  |      8          0           7
       W1ch12_15HH |       139              13,400  |      5          0           4
       W1hiqualgMP |       541              12,998  |      7          1           7
      W1SOCMajorMP |     1,408              12,131  |      9          1           9
       W1englangYP |       191              13,348  |      4          1           4
          W1yys4YP |       735              12,804  |      4          1           4
          W1yys8YP |       782              12,757  |      4          1           4
          W1yys9YP |     1,081              12,458  |      4          1           4
         W1yys10YP |       681              12,858  |      4          1           4
        W1hwndayYP |     1,385              12,154  |      6          0           5
       W1heposs9YP |       777              12,762  |      4          1           4
       W1ssclubfYP |     1,363              12,176  |      5          1           5
       W1ssportfYP |     2,649              10,890  |      5          1           5
         W1yys15YP |       410              13,129  |      5          1           5
         W1yys16YP |       419              13,120  |      5          1           5
         W1yys19YP |       425              13,114  |      5          1           5
            urbind |         8              13,531  |      8          1           8
      -----------------------------------------------------------------------------

  • #2
    Sofiya:
    you're implicitly asking for a complete case analysis, which Stata's estimation commands apply automatically via listwise deletion.
    So, no action is required on your side.
    Obviously, if data are not missing completely at random (see -mi glossary- if unfamiliar with this term), your resulting sample may have only a tenuous relationship with your original one.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Originally posted by Carlo Lazzaro
      Sofiya:
      you're implicitly asking for a complete case analysis, which Stata's estimation commands apply automatically via listwise deletion.
      So, no action is required on your side.
      Obviously, if data are not missing completely at random (see -mi glossary- if unfamiliar with this term), your resulting sample may have only a tenuous relationship with your original one.
      Thank you, Carlo! However, how can I find out how many respondents I am left with? Is there a command?

      When I click the summarize tab, it shows a different number of observations for each variable. I don't think that is quite right for descriptive statistics, for example.
      Last edited by Sofiya Volvakova; 05 Nov 2023, 07:09.


      • #4
        Sofiya:
        -summarize- reports, separately for each variable, the number of observations with non-missing values:
        Code:
        . use "C:\Program Files\Stata17\ado\base\a\auto.dta"
        (1978 automobile data)
        
        . sum
        
            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
                make |          0
               price |         74    6165.257    2949.496       3291      15906
                 mpg |         74     21.2973    5.785503         12         41
               rep78 |         69    3.405797    .9899323          1          5
            headroom |         74    2.993243    .8459948        1.5          5
        -------------+---------------------------------------------------------
               trunk |         74    13.75676    4.277404          5         23
              weight |         74    3019.459    777.1936       1760       4840
              length |         74    187.9324    22.26634        142        233
                turn |         74    39.64865    4.399354         31         51
        displacement |         74    197.2973    91.83722         79        425
        -------------+---------------------------------------------------------
          gear_ratio |         74    3.014865    .4562871       2.19       3.89
             foreign |         74    .2972973    .4601885          0          1
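        In the -auto.dta- dataset, the only variable with missing values is -rep78- (5 missing).

        The number of observations actually used also depends on the specification you run, because each estimation command drops only the observations that are missing on the variables in that model: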
        Code:
        . regress price trunk
        
              Source |       SS           df       MS      Number of obs   =        74
        -------------+----------------------------------   F(1, 72)        =      7.89
               Model |  62747229.9         1  62747229.9   Prob > F        =    0.0064
            Residual |   572318166        72  7948863.42   R-squared       =    0.0988
        -------------+----------------------------------   Adj R-squared   =    0.0863
               Total |   635065396        73  8699525.97   Root MSE        =    2819.4
        
        ------------------------------------------------------------------------------
               price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
               trunk |   216.7482   77.14554     2.81   0.006     62.96142     370.535
               _cons |   3183.504   1110.728     2.87   0.005     969.3088    5397.699
        ------------------------------------------------------------------------------
        
        . regress price i.rep78
        
              Source |       SS           df       MS      Number of obs   =        69
        -------------+----------------------------------   F(4, 64)        =      0.24
               Model |  8360542.63         4  2090135.66   Prob > F        =    0.9174
            Residual |   568436416        64     8881819   R-squared       =    0.0145
        -------------+----------------------------------   Adj R-squared   =   -0.0471
               Total |   576796959        68  8482308.22   Root MSE        =    2980.2
        
        ------------------------------------------------------------------------------
               price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
               rep78 |
                  2  |   1403.125   2356.085     0.60   0.554    -3303.696    6109.946
                  3  |   1864.733   2176.458     0.86   0.395    -2483.242    6212.708
                  4  |       1507   2221.338     0.68   0.500    -2930.633    5944.633
                  5  |     1348.5   2290.927     0.59   0.558    -3228.153    5925.153
                     |
               _cons |     4564.5   2107.347     2.17   0.034     354.5913    8774.409
        ------------------------------------------------------------------------------
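
        To see how many respondents a given model actually used, one option is the e(sample) function, which flags the observations included by the most recent estimation command. The following is only a sketch on the same -auto.dta- data; the variable name insample is a hypothetical choice, not something from the thread.

        ```stata
        * Sketch only: reuses the shipped auto data from the examples above
        sysuse auto, clear
        quietly regress price i.rep78

        * e(sample) equals 1 for observations used by the last estimation command
        count if e(sample)

        * Store the flag (insample is a hypothetical name) before another
        * estimation command overwrites e(sample)
        generate byte insample = e(sample)

        * Descriptive statistics restricted to the estimation sample
        summarize price mpg if insample
        ```

        Here -count- should report 69, matching the "Number of obs" in the second regression above.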
        In addition, the -rowmiss()- function of -egen- can be helpful; it counts, for each observation, how many of the listed variables are missing:
        Code:
        . egen wanted=rowmiss( make- foreign)
        
        
        . list in 1/10
        
             +----------------------------------------------------------------------------------------------------------------------------+
             | make             price   mpg   rep78   headroom   trunk   weight   length   turn   displa~t   gear_r~o    foreign   wanted |
             |----------------------------------------------------------------------------------------------------------------------------|
          1. | AMC Concord      4,099    22       3        2.5      11    2,930      186     40        121       3.58   Domestic        0 |
          2. | AMC Pacer        4,749    17       3        3.0      11    3,350      173     40        258       2.53   Domestic        0 |
          3. | AMC Spirit       3,799    22       .        3.0      12    2,640      168     35        121       3.08   Domestic        1 |
          4. | Buick Century    4,816    20       3        4.5      16    3,250      196     40        196       2.93   Domestic        0 |
          5. | Buick Electra    7,827    15       4        4.0      20    4,080      222     43        350       2.41   Domestic        0 |
             |----------------------------------------------------------------------------------------------------------------------------|
          6. | Buick LeSabre    5,788    18       3        4.0      21    3,670      218     43        231       2.73   Domestic        0 |
          7. | Buick Opel       4,453    26       .        3.0      10    2,230      170     34        304       2.87   Domestic        1 |
          8. | Buick Regal      5,189    20       3        2.0      16    3,280      200     42        196       2.93   Domestic        0 |
          9. | Buick Riviera   10,372    16       3        3.5      17    3,880      207     43        231       2.93   Domestic        0 |
         10. | Buick Skylark    4,082    19       3        3.5      13    3,400      200     42        231       3.08   Domestic        0 |
             +----------------------------------------------------------------------------------------------------------------------------+
        
        .
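
        Building on the -rowmiss()- count above, the complete cases can be counted and, if really wanted, kept in one step. This is a minimal sketch on -auto.dta-; nmiss is a hypothetical variable name, and with the original poster's data the varlist would presumably be $student3 rather than price-foreign.

        ```stata
        * Sketch only: shipped auto data, numeric variables price through foreign
        sysuse auto, clear

        * nmiss (hypothetical name) = number of missing values per observation
        egen nmiss = rowmiss(price-foreign)

        * How many observations are complete on every listed variable?
        count if nmiss == 0

        * Keep only those complete cases; this discards data in memory,
        * so work on a copy or use preserve/restore if the rest is still needed
        keep if nmiss == 0
        ```

        Because the deletion is done once on the whole varlist, no variable-by-variable handling is needed.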
        Given the current state of your dataset, -summarize- reports an available case analysis (that is, each variable is summarized on its own non-missing observations, so some variables have fewer observations than the full sample).
        Unless you want to compute statistics on complete cases only (that is, considering only those observations with no missing values on any variable), which I do not recommend, you should deal with the missing data first.
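
        If complete-case descriptives are nonetheless wanted for comparison, a marker variable gives every variable the same N without dropping anything. The sketch below uses the built-in -mark- and -markout- commands on -auto.dta-; the name touse and the choice of variables are assumptions for illustration.

        ```stata
        * Sketch only: shipped auto data, an arbitrary subset of variables
        sysuse auto, clear

        * mark creates touse (hypothetical name) equal to 1 for all observations
        mark touse

        * markout sets touse to 0 where any listed variable is missing
        markout touse price mpg rep78 headroom

        * Descriptives on the common sample; the data themselves are untouched
        summarize price mpg rep78 headroom if touse
        ```

        Comparing this with an unrestricted -summarize- shows how much the sample shrinks under listwise deletion.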
        Kind regards,
        Carlo
        (Stata 19.0)
