
  • Comparing those with complete data

    Hi all
    I am wondering if somebody can help me.
    I have done a complete case analysis.

    I used -keep if var != .- to create a dataset that only has participants with full data on all variables.

    Now, I'd like to compare the characteristics of participants included in my analysis, against those who I excluded due to missing data.

    Can anybody help me with the code to do this?

    Thank you so much. I've been having a lot of trouble (I even tried -drop if xyz != .- to create another dataset, then merged the two and ran simple descriptive statistics, but that didn't seem to work).

    Any help would be tremendously appreciated.
    Al

  • #2
    If you have already used the -keep- command, the dropped observations are no longer in memory, so you'll need to reload the original dataset. Once you have reloaded the dataset with all observations, you can do a few things:

    1) generate a new variable that identifies observations with and without missing values on var:
    Code:
    gen not_miss=0
    replace not_miss=1 if !mi(var)
    2) You can condition your various commands on -if not_miss==1- or -if not_miss==0-
    Code:
    tab x if not_miss==1
    3) You can use the -by- prefix or the -over()- option with not_miss (where the commands you are using allow them):
    Code:
    by not_miss, sort: sum x
    4) If you must drop or keep observations, you can use -preserve- before -keep- (or -drop-) and -restore- when you want the full dataset back (see: help preserve)

    Code:
    gen not_miss=0
    replace not_miss=1 if !mi(var)
    preserve
    drop if not_miss==0
    
    *commands that apply only to non-missing cases
    
    restore
    *commands that refer to both groups
    Stata/MP 14.1 (64-bit x86-64)
    Revision 19 May 2016
    Win 8.1
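    Because the original question is about completeness on several variables rather than one, here is a sketch extending the approach above. -missing()- accepts several arguments and returns 1 if any of them is missing, so a single line can flag complete cases. The sketch assumes the auto data shipped with Stata; the variable list, the name -complete- and its value labels are illustrative, so substitute the variables used in your own analysis. With only a handful of excluded observations, the -exact- option of -tabulate- (Fisher's exact test) may be preferable to -chi2-.

```stata
* Illustrative data: auto dataset shipped with Stata (rep78 has 5 missing values)
sysuse auto, clear

* 1 if all listed analysis variables are non-missing, 0 otherwise
* (hypothetical variable list: replace with the variables in your analysis)
generate byte complete = !missing(rep78, price, mpg, foreign)
label define complete 0 "excluded (missing data)" 1 "included (complete)"
label values complete complete

* Number of observations in each group
tabulate complete

* Continuous characteristics side by side
tabstat price mpg weight, by(complete) statistics(n mean sd)

* Means with confidence intervals by group
mean price mpg weight, over(complete)

* Categorical characteristics, with a chi-squared test
tabulate foreign complete, column chi2
```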



    • #3
      Alan:
      Carole gave helpful advice.
      As far as the comparison you have in mind is concerned, the first step is to create a categorical variable that splits your dataset into two subsamples (those with complete data vs those with at least one missing value on any variable): let's call it -missing-.
      The next steps depend on the aim of your comparison. For instance, if you're interested in comparing the mean of a continuous variable between those with complete data and those with at least one missing value on any variable, you can consider using a bootstrapped
      Code:
      ttest <variable>, by(missing) unequal
      (see example under -bootstrap- entry, Stata .pdf manual).
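      A sketch of what the bootstrapped -ttest- could look like, loosely following the achieved-significance-level idea (shift both groups to a common mean so the null holds, resample within groups, and count how often the bootstrap |t| reaches the observed one); the manual example may differ in detail. Assumptions: the auto data, price as the variable compared, rep78 as the source of missingness, and the names -anymiss- (the -missing- variable above, renamed to avoid confusion with the missing() function), -price0-, the seed and 1,000 replications are all illustrative. Run it from a do-file, since it uses a -tempfile- and -///- continuation. With only five excluded cars, the result is purely illustrative.

```stata
* Illustrative data: anymiss = 1 if rep78 is missing (excluded group)
sysuse auto, clear
generate byte anymiss = missing(rep78)

* Observed Welch t statistic (unequal variances)
ttest price, by(anymiss) unequal
scalar t_obs = r(t)

* Impose the null of equal means: shift each group to the overall mean
summarize price, meanonly
scalar grand = r(mean)
generate double price0 = .
forvalues g = 0/1 {
    summarize price if anymiss == `g', meanonly
    replace price0 = price - r(mean) + scalar(grand) if anymiss == `g'
}

* Resample within each group and recompute the t statistic
set seed 12345
tempfile bsres
bootstrap t=r(t), reps(1000) strata(anymiss) saving(`bsres') nodots: ///
    ttest price0, by(anymiss) unequal

* Bootstrap p-value: share of replicates with |t| at least the observed |t|
use `bsres', clear
generate byte extreme = abs(t) >= abs(scalar(t_obs)) if !missing(t)
summarize extreme, meanonly
display "bootstrap p-value = " %6.4f r(mean)
```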
      However, the comparison you have in mind implies that you will then face another issue, namely the mechanism and the pattern underlying the missingness in your data (both of which can differ across variables), in order to deal with them, especially if you plan to submit your paper to a technical journal: see the -mi- suite entries on that.
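      As a first look at the pattern of missingness, and at whether it relates to observed characteristics, something like the following sketch may help. It assumes the auto data and an indicator named -anymiss-; the covariates in the -logit- call are illustrative. An association between missingness and observed variables is evidence against data being missing completely at random; a null result does not establish it, and no such check can tell "missing at random" apart from "missing not at random".

```stata
* Illustrative data: auto dataset (only rep78 has missing values)
sysuse auto, clear

* How many values are missing in each variable
misstable summarize

* Which combinations of variables are missing together
misstable patterns

* Indicator for observations excluded from the complete-case analysis
generate byte anymiss = missing(rep78)

* Is missingness associated with observed characteristics?
logit anymiss price mpg weight
```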
      Kind regards,
      Carlo
      (Stata 18.0 SE)



      • #4
        Carole gave excellent advice, but one detail can be simplified. You can and arguably should just write

        Code:
        gen not_miss = !missing(var)
        The main deal here is that true-or-false operations of the form

        gen newvar = 1 if something_is_true
        replace newvar = 0 if newvar == .

        can typically be written cleanly as

        gen newvar = something_is_true

        as true or false statements are automatically evaluated as 1 if true and 0 if false. More at https://www.stata.com/support/faqs/d...rue-and-false/
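        A quick check that the two forms produce the same variable, assuming the auto data and hypothetical variable names. It also shows a related caution: Stata treats numeric missing values as larger than any number, so a condition such as rep78 > 3 is true (1) when rep78 is missing, in both forms. An -if- qualifier keeps those observations missing instead.

```stata
sysuse auto, clear

* Long form: 1 where the condition is true, then 0 everywhere else
generate byte long_form = 1 if rep78 > 3
replace long_form = 0 if long_form == .

* Short form: the condition itself evaluates to 1 (true) or 0 (false)
generate byte short_form = rep78 > 3

* Both forms give identical results
assert long_form == short_form

* Missing rep78 counts as "greater than 3"; exclude it explicitly
generate byte short_safe = rep78 > 3 if !missing(rep78)
```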

        The lesser deal here is more a matter of personal taste. While mi() is perfectly legal as a synonym or abbreviation for missing(), I lean towards the latter as more transparent to those learning Stata (and who isn't?).

        Conversely, the first form can be defended as just personal taste too. The reader gets to see which conditions are coded 1 and 0, and the code can be defended as more transparent. True, but I see code where the same device is used again and again and is bloated correspondingly. If you're worried that readers won't understand the more concise form, add a comment to your code the first time it's used.
        Last edited by Nick Cox; 29 Jul 2018, 02:03.
