Comparing those with complete data

Alan Jeddi

Join Date: Jun 2018

Posts: 42
#1

28 Jul 2018, 22:18

Hi all
I am wondering if somebody can help me.
I have done a complete case analysis.

I used -keep if var != .- to create a dataset that contains only participants with complete data on all variables.

Now I'd like to compare the characteristics of the participants included in my analysis with those of the participants I excluded because of missing data.

Can anybody help me with the code to do this?

Thank you so much. I've been having a lot of trouble. I even tried -drop if xyz != .- to create another dataset, merging the two and then running simple descriptive statistics, but that didn't seem to work.

Any help would be tremendously appreciated
Al
Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#2

28 Jul 2018, 22:53

If you have already used the -keep- command, that data is gone. You'll need to reload the original data. Once you have reloaded the dataset that has all observations, you can do a few things:

1) generate a new variable that identifies observations with and without missing values on var:

Code:

gen not_miss=0
replace not_miss=1 if !mi(var)

2) You can condition your various commands on -if not_miss==1- or -if not_miss==0-

Code:

tab x if not_miss==1

3) You can use -by- or -over- options with not_miss (if allowed by the commands you are using):

Code:

by not_miss, sort: sum x

4) If you must drop/keep the data, you can use the -preserve- command before -keep- and -restore- when you want the full dataset (see: help preserve)

Code:

gen not_miss=0
replace not_miss=1 if !mi(var)
preserve
drop if not_miss==0
*commands that apply only to non-missing cases
restore
*commands that refer to both groups
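To make the comparison itself concrete, here is a minimal sketch of how the indicator could be used to put the two groups side by side. It is only an illustration: it assumes the -auto- dataset shipped with Stata stands in for your own data, with rep78 playing the role of var and price, mpg and weight playing the role of the characteristics you want to compare.

```stata
* Sketch only: the auto dataset and its variables stand in for your own data
sysuse auto, clear

* 1 = rep78 observed, 0 = rep78 missing
gen byte not_miss = !missing(rep78)

* Sizes of the two groups
tab not_miss

* Count, mean and SD of some characteristics, shown separately for each group
tabstat price mpg weight, by(not_miss) statistics(n mean sd)
```

Because nothing is dropped, both groups stay in memory and no -preserve-/-restore- is needed for purely descriptive comparisons like this one.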

Stata/MP 14.1 (64-bit x86-64)
Revision 19 May 2016
Win 8.1
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#3

29 Jul 2018, 01:29

Alan:
Carole gave helpful advice.
As far as the comparison you have in mind is concerned, the first step is to create a categorical variable that splits your dataset into two subsamples (those with complete data vs those with at least one missing value in any variable): let's call it -missing-.
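One possible way to build such a variable when several variables are involved is sketched below. It is an assumption-laden illustration rather than the only route: the -auto- dataset and the variable list price mpg rep78 weight are placeholders for your own analysis variables, and nmiss is a hypothetical helper name.

```stata
* Sketch only: auto data and this variable list are placeholders
sysuse auto, clear

* Number of missing values in each observation across the analysis variables
egen nmiss = rowmiss(price mpg rep78 weight)

* 1 = at least one missing value, 0 = complete data on all listed variables
gen byte missing = nmiss > 0

* Check the size of each subsample
tab missing
```

The same indicator could also be written in one line as -gen byte missing = missing(price, mpg, rep78, weight)-, since missing() accepts several arguments and returns 1 if any of them is missing.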
The next steps depend on the aim of your comparison. For instance, if you're interested in comparing the mean of a continuous variable between those with complete data and those with at least one missing value in any variable, you can consider using a bootstrapped

Code:

ttest <variable>, by(missing) unequal

(see example under -bootstrap- entry, Stata .pdf manual).
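As a rough sketch of what the bootstrapped version of that call might look like, the lines below wrap -ttest- in the -bootstrap- prefix. Everything specific here is an assumption for illustration: the -auto- dataset, rep78 as the variable with missing values, price as the continuous variable, the seed and the number of replications. The sketch only shows the call pattern for resampling the t statistic; the manual entry should be consulted for how to turn the replicates into a formal test.

```stata
* Sketch only: auto data, variable choices, seed and reps are illustrative
sysuse auto, clear

* 1 = rep78 missing, 0 = rep78 observed
gen byte missing = missing(rep78)

* Make the resampling reproducible
set seed 12345

* Resample within each group and collect the t statistic from each replicate
bootstrap t = r(t), reps(1000) strata(missing) nodots: ///
    ttest price, by(missing) unequal
```

The strata() option keeps the size of each group fixed across replicates, which matters when one group is small.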
However, the comparison you have in mind seems to imply that you should then face another issue: the mechanism and the pattern underlying the missingness of your data (both of which can differ across variables), and how to deal with them, especially if you plan to submit your paper to a technical journal. See the -mi- suite entries on that.

Kind regards,
Carlo
(Stata 19.0)
Nick Cox

Join Date: Mar 2014

Posts: 35724
#4

29 Jul 2018, 01:59

Carole gave excellent advice, but one detail can be simplified. You can and arguably should just write

Code:

gen not_miss = !missing(var)

The main deal here is that true-or-false operations of the form

gen newvar = 1 if something_is_true
replace newvar = 0 if newvar == .

can typically be written cleanly as

gen newvar = something_is_true

as true or false statements are automatically evaluated as 1 if true and 0 if false. More at https://www.stata.com/support/faqs/d...rue-and-false/
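A small check of this equivalence, plus one caveat worth knowing, can be sketched as follows. The -auto- dataset and the variable names not_miss_long, not_miss_short, high and high_ok are assumptions chosen for illustration.

```stata
* Sketch only: auto data and the new variable names are illustrative
sysuse auto, clear

* Long form: set to 0, then replace with 1 where the condition holds
gen byte not_miss_long = 0
replace not_miss_long = 1 if !mi(rep78)

* Short form: the logical expression itself evaluates to 1 or 0
gen byte not_miss_short = !missing(rep78)

* The two forms agree in every observation
assert not_miss_long == not_miss_short

* Caveat: numeric missing counts as larger than any number,
* so this is 1 where rep78 is missing as well
gen byte high = rep78 > 3

* Adding an if qualifier leaves those observations missing instead
gen byte high_ok = rep78 > 3 if !missing(rep78)
```

The caveat does not affect indicators built with missing() itself, which always return 1 or 0.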

The lesser deal here is more personal taste. While mi() is perfectly legal as a synonym or abbreviation for missing() I lean towards the latter as more transparent to those learning Stata (and who isn't?).

Conversely, the first form can be defended as just personal taste too. The reader gets to see which conditions are coded 1 and 0 and the code can be defended as more transparent. True, but I see code where the same device is used again and again and is bloated correspondingly. If you're worried that readers won't understand the more concise form, add a comment to your code the first time it's used.

Last edited by Nick Cox; 29 Jul 2018, 02:03.