  • Dropping observations from dataset with many variables

    Hi all, I've got a very large panel data set (about 610,000 observations and 1,096 variables) and I need to drop every observation that has missing information. While I could do the arduous task of typing -drop if mi(varname)- for each variable, I can't justify doing that 1,096 times. Is there a way I can type one command that drops all observations with missing information in any variable? Thank you!

  • #2
    Code:
    egen int mcount = rowmiss(_all)
    drop if mcount > 0
    This does it in two commands. Of course, -egen- is just a wrapper program; internally it loops over all the variables.

    Note, by the way, that the -missing()- function does not take a Stata varlist. It takes a series of arguments, separated by commas, each argument being the name of a variable. If -missing()- did take a varlist, you could accomplish this task with -drop if missing(_all)-. But that will just get you a syntax error.
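
    For illustration only, here is a minimal sketch of the kind of explicit loop that achieves the same result without -egen-. It is not a copy of -egen-'s internal code, and the flag name anymiss is a hypothetical choice (it must not already exist in the dataset). It relies on -missing()- accepting a single variable name, numeric or string, as its argument.

    ```stata
    * Flag every observation that has a missing value in at least one variable.
    * anymiss is a hypothetical name; pick one not already used in your data.
    generate byte anymiss = 0

    * _all also includes anymiss itself, which is harmless: it is never missing.
    foreach v of varlist _all {
        quietly replace anymiss = 1 if missing(`v')
    }

    * Drop the flagged observations, then remove the helper variable.
    drop if anymiss
    drop anymiss
    ```

    Running -count if anymiss- before the -drop- shows how many observations would be lost, which is worth checking on a dataset with this many variables.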


    • #3
      Clyde's method is fine.

      There are several tools in this territory. https://www.statalist.org/forums/for...aging-missings points to another.


      • #4
        Note also that a great many Stata routines automatically drop observations with missing data, so, depending on what you're doing, directly dropping observations may be unnecessary.
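
        A quick way to see this behavior is sketched below, assuming the auto example dataset that ships with Stata (where rep78 has some missing values) rather than the poster's panel data. After an estimation command, -e(sample)- marks the observations that were actually used.

        ```stata
        * Load an example dataset in which rep78 contains missing values.
        sysuse auto, clear

        * regress silently omits observations with a missing value in any model variable.
        regress price mpg rep78

        * Observations used in the estimation.
        count if e(sample)

        * Observations excluded because at least one model variable is missing.
        count if missing(price, mpg, rep78)
        ```

        The two counts should add up to the total number of observations, confirming that the omission is limited to the variables named in the command and not applied across the whole dataset.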


        • #5
          Anna:
          As an aside to the previous helpful advice, you may want to test on an excerpt of your dataset whether Stata's listwise deletion automatically does what you're after.
          Kind regards,
          Carlo
          (StataNow 18.5)