Dropping Observations with Missings

Ally Smith

Join Date: Nov 2016

Posts: 8
#1

30 Dec 2016, 14:58

Hi,

I am currently preparing my dataset for further analyses and I have six different kinds of missing values (.a, .b, .c, .d, .e, .f).
I would like to drop all observations that have missings (no matter what kind of missings) for certain specific variables.
Of course, I can check for all variables what kind of missings they have (for example, the variable "age" only has missings of type .a; variable "change" has missings of type .a and .c) and then drop the observations with missings using

drop if age==.a
drop if change==.a | change==.c

As this is kind of arduous, I wondered whether there is an easier way to do it.
I was thinking about something like

drop if change==.*

(which, unfortunately, does not work).

If somebody can help me with that problem, I would be very happy.
Thanks in advance!

Ally
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

30 Dec 2016, 15:37

Hello Ally,

Welcome to the Stata Forum,

Indeed, there are commands to drop all missings.

However, we should keep a few things in mind. First, several estimation methods can be performed with missing data (I mean, they don't "suffer" from listwise deletion), therefore you may keep the missing data there, safe and sound, because they won't be a bother. Second, missing data may be somewhat "informative" as well, because they "tell" much about the process of data collection and the reactions of the individuals, for example. Third, should you wish to perform, say, a sensitivity analysis, and choose a specific variable for that, you may face low power, since all observations that have at least one missing value were deleted. Last but not least, dropping missing data is not strictly necessary, even for estimation commands that perform listwise deletion, because Stata can handle the task appropriately.

In particular, and I hope you won't take it amiss, I fear I see no purpose in deleting all missing data "for further analysis", apart from hiding the evidence that data are missing in the first place.
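
As a minimal sketch of the point about listwise deletion, here is what happens with Stata's bundled auto dataset (chosen only for illustration, it is not Ally's data). The variable rep78 has some missing values there, and -regress- leaves those observations out of the estimation sample without anything being dropped from memory.

```stata
* Illustration only: auto.dta ships with Stata; rep78 has missing values
sysuse auto, clear

* regress silently omits observations with a missing value in any model variable
regress price mpg rep78

* e(N) is the number of observations actually used in the estimation
display "observations used:      " e(N)
display "observations in memory: " _N

* e(sample) marks the observations that entered the estimation
count if !e(sample)
```

The dataset itself is unchanged afterwards, so the omitted observations remain available for other analyses.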

Best,

Marcos

Last edited by Marcos Almeida; 30 Dec 2016, 15:43.

Best regards,

Marcos
Clyde Schechter

Join Date: Apr 2014

Posts: 30084
#3

30 Dec 2016, 16:36

While I agree with Marcos that you probably shouldn't do this in the first place, you should familiarize yourself with Stata's -missing()- function. See -help missing()-. It will enable you to treat all of the different missing values in the same way, and it works equally well with string and numeric variables. There are plenty of situations where you want to do something conditional on some variable(s) being (not) missing. The -missing()- function saves you from having to check for each specific missing value.
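
A minimal sketch of how that might look, using a small made-up dataset with the variable names from the original question (the values are invented purely for illustration):

```stata
* Hypothetical toy data: age and change with several kinds of missing values
clear
input age change
25 1
.a 2
40 .c
33 .a
51 0
end

* missing() is true if any of its arguments is missing, whatever the type
count if missing(age, change)

* Drop every observation with a missing value in either variable
drop if missing(age, change)

list
```

For a single numeric variable, -drop if age >= .- has the same effect, because all numeric missing values (. and .a through .z) sort above every nonmissing number.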
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

30 Dec 2016, 17:19

To extend the excellent advice from both Marcos and Clyde, my approach is often to do something like

Code:

generate no_miss = !missing(y, x1, x2, x3)

and then at the start of each do-file after I use my dataset, I include

Code:

keep if no_miss

when that is appropriate, and omit it if I want to dig more deeply into other aspects of my data, including missing value patterns and the like.

As Marcos suggests, it's always better to keep data than to discard it. It's one thing to omit, say, children from the dataset when analyzing, say, income from wages. They were never part of the universe. But adults with missing wages would be in the universe, and the analyst needs to be assured that wages are missing in ways that are not correlated with other important variables. And beyond that, Stata has tools for multiple imputation to handle missing values analytically, but that's probably far beyond where you want to take your analysis.
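
A rough sketch of that workflow, again assuming Stata's bundled auto dataset as a stand-in (the variable list is only an example): the indicator is created once, the missing-value patterns can be inspected with -misstable-, and the analysis is restricted with an -if- qualifier rather than by discarding anything.

```stata
* Illustration only: auto.dta stands in for the real dataset
sysuse auto, clear

* 1 if all listed variables are nonmissing, 0 otherwise
generate no_miss = !missing(price, mpg, rep78)

* Inspect how many values are missing, and in which combinations
misstable summarize price mpg rep78
misstable patterns price mpg rep78

* Restrict the analysis to complete cases without dropping observations
regress price mpg rep78 if no_miss
```

Using -if no_miss- instead of -keep if no_miss- leaves the incomplete observations in memory, which is convenient when the missing-value patterns themselves are of interest.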
Ally Smith

Join Date: Nov 2016

Posts: 8
#5

31 Dec 2016, 04:32

Thank you very much for your help