
  • Dropping Observations with Missings

    Hi,

    I am currently preparing my dataset for further analyses and I have six different kinds of missing values (.a, .b, .c, .d, .e, .f).
    I would like to drop all observations that have missings (no matter what kind of missings) for certain specific variables.
    Of course, I can check for all variables what kind of missings they have (for example, the variable "age" only has missings of type .a; variable "change" has missings of type .a and .c) and then drop the observations with missings using

    drop if age==.a
    drop if change==.a | change==.c

    As this is kind of arduous, I wondered whether there is an easier way to do it.
    I was thinking about something like

    drop if change==.*

    (which, unfortunately, does not work).

    If somebody can help me with that problem, I would be very happy.
    Thanks in advance!

    Ally

  • #2
    Hello Ally,

    Welcome to the Stata Forum,

    Indeed, there are commands to drop all missings.

    However, a few points are worth keeping in mind. First, several estimation methods can work with missing data (that is, they do not apply listwise deletion), so you may leave the missing values in place, safe and sound, because they won't get in the way. Second, missing data can be "informative": they tell you a good deal about the data collection process and about how individuals responded, for example. Third, should you wish to perform, say, a sensitivity analysis on a specific variable, you may face low power, because every observation with at least one missing value will already have been deleted. Last but not least, dropping missing data is not strictly necessary even for estimation commands that apply listwise deletion, because Stata can handle that task appropriately.
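
    As a minimal sketch of that last point, the lines below use the auto dataset that ships with Stata (an illustrative choice, not Ally's data). They are only meant to show that an estimation command leaves out incomplete observations by itself, and that e(sample) records which observations were used.

    ```stata
    * Illustration only: auto.dta ships with Stata; rep78 has some missing values
    sysuse auto, clear
    * regress omits observations with a missing value in any model variable
    regress price mpg rep78
    * e(sample) marks the observations actually used in the estimation
    generate byte used = e(sample)
    tabulate used
    ```

    The dataset itself is left untouched, so the incomplete observations remain available for other analyses.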

    In particular, and I hope you won't take it amiss, apart from hiding the evidence of the missing data itself, I fear I see no purpose in deleting all missing data "for further analysis".

    Best,

    Marcos
    Last edited by Marcos Almeida; 30 Dec 2016, 15:43.


    • #3
      While I agree with Marcos that you probably shouldn't do this in the first place, you should familiarize yourself with Stata's -missing()- function. See -help missing()-. It will enable you to treat all of the different missing values in the same way, and it works equally well with string and numeric variables. There are plenty of situations where you want to do something conditional on some variable(s) being (not) missing. The -missing()- function saves you from having to check for each specific missing value.
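
      A minimal sketch of how this could look for the two variables mentioned in the question. The toy data are made up for illustration; only the variable names age and change come from the original post.

      ```stata
      * Hypothetical toy data with extended missing values
      clear
      input age change
      25 1
      .a 2
      40 .c
      33 .a
      51 3
      end
      * How many observations have a missing value in age or change?
      count if missing(age, change)
      * Drop them, whatever the type of missing value (., .a, ..., .z)
      drop if missing(age, change)
      list
      ```

      With several arguments, -missing()- is true if any one of them is missing. For a single numeric variable, -drop if change >= .- is an equivalent form, because Stata sorts every missing value above all nonmissing numbers, whereas -drop if change == .- would catch only the system missing value.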


      • #4
        To extend the excellent advice from both Marcos and Clyde, my approach is often to do something like
        Code:
        generate no_miss = !missing(y, x1, x2, x3)
        and then at the start of each do-file after I use my dataset, I include
        Code:
        keep if no_miss
        when that is appropriate, and omit it if I want to dig more deeply into other aspects of my data, including missing value patterns and the like.

        As Marcos suggests, it's always better to keep data than to discard it. It's one thing to omit, say, children from the dataset when analyzing, say, income from wages. They were never part of the universe. But adults with missing wages would be in the universe, and the analyst needs to be assured that wages are missing in ways that are not correlated with other important variables. And beyond that, Stata has tools for multiple imputation to handle missing values analytically, but that's probably far beyond where you want to take your analysis.
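
        For digging into missing value patterns while keeping every observation, something along these lines may be a useful starting point. It is only a sketch on Stata's shipped auto dataset, with price, mpg and rep78 standing in for y, x1, x2 and so on.

        ```stata
        * Illustration only: auto.dta ships with Stata; variable choice is arbitrary
        sysuse auto, clear
        * Complete-case flag, as in the approach described above
        generate byte no_miss = !missing(price, mpg, rep78)
        * Counts of system (.) and extended (.a-.z) missing values per variable
        misstable summarize price mpg rep78
        * Which variables tend to be missing together
        misstable patterns price mpg rep78
        * Restrict one command to complete cases without dropping anything
        summarize price if no_miss
        ```

        Using the flag in an -if- qualifier instead of -keep if- leaves the incomplete observations in memory, which can be convenient when different analyses need different sets of variables.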


        • #5
          Thank you very much for your help
