
  • Dropping observations from dataset with many variables

    Hi all, I've got a very large panel dataset (about 610,000 observations and 1,096 variables), and I need to drop every observation that has missing information. While I could do the arduous task of drop if mi(varlist), I can't justify doing that 1,096 times. Is there a way to type one command that drops all observations with missing information across all variables? Thank you!

  • #2
    Code:
    egen int mcount = rowmiss(_all)
    drop if mcount > 0
    does it in 2 commands. Of course, -egen- is just a wrapper program, and internally it is looping over all the variables.

    Note, by the way, that the -missing()- function does not take a Stata varlist. It takes a series of arguments, separated by commas, each argument being the name of a variable. If -missing()- did take a varlist, you could accomplish this task with -drop if missing(_all)-. But that will just get you a syntax error.
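For readers who want to see what "looping over all the variables" amounts to, here is a hand-written equivalent of the two-line -egen- solution. It is a minimal sketch, not -egen-'s actual source: it uses the auto dataset that ships with Stata as a stand-in for the real data, and the names allvars and anymiss are hypothetical. Because -missing()- takes one variable per argument, the loop calls it once per variable instead of passing a varlist.

```stata
* Stand-in data (assumption): the auto dataset shipped with Stata
sysuse auto, clear

* Record the variable names first, so the flag created below is not looped over
unab allvars : _all

* anymiss (hypothetical name) becomes 1 if any variable is missing in that row
generate byte anymiss = 0
foreach v of local allvars {
    * missing() accepts numeric and string variables, one per argument
    quietly replace anymiss = 1 if missing(`v')
}

drop if anymiss
drop anymiss
```

On 1,096 variables this runs 1,096 -replace- commands, which is why the -egen- one-liner is the more convenient choice.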



    • #3
      Clyde's method is fine.

      There are several tools in this territory. https://www.statalist.org/forums/for...aging-missings points to another.
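One more built-in tool in this territory is -mark-/-markout-, the pair Stata programmers use to build a sample marker. The sketch below again assumes the shipped auto data as a stand-in, and touse and allvars are hypothetical names. The strok option lets -markout- accept string variables, treating empty strings as missing.

```stata
* Stand-in data (assumption): the auto dataset shipped with Stata
sysuse auto, clear
unab allvars : _all

* touse (hypothetical name) starts as 1 for every observation
mark touse

* Set touse to 0 wherever any listed variable is missing; strok admits string variables
markout touse `allvars', strok

keep if touse
drop touse
```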



      • #4
        Note also that a great many Stata routines automatically drop observations with missing data, so, depending on what you're doing, directly dropping observations may be unnecessary.
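As an illustration of that automatic behavior, the sketch below runs a regression on the shipped auto data (an assumption standing in for the real dataset; the model itself is arbitrary) and uses e(sample) to see which observations were actually used, without dropping anything.

```stata
* Stand-in data (assumption): the auto dataset shipped with Stata
sysuse auto, clear

* regress uses only observations complete on price, mpg, and rep78
regress price mpg rep78

* e(sample) flags the observations the estimation actually used
count if e(sample)
display "observations used: " e(N) "   observations in memory: " _N
```

Note that listwise deletion only considers the variables named in the command, so it can keep rows that have missing values in other variables. That is a different rule from dropping on all 1,096 variables.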



        • #5
          Anna:
          as an aside to the helpful advice above, you could test on an excerpt of your dataset whether Stata's listwise deletion already does (automatically) what you're after.
          Kind regards,
          Carlo
          (Stata 18.0 SE)
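One way to run the test Carlo suggests is to work on a subset inside -preserve-/-restore-, so the full dataset is untouched afterwards. In this sketch the auto data, the first-40-rows excerpt, and the regression model are all assumptions chosen for illustration. It compares how many rows the model used with how many rows have no missing values at all.

```stata
* Stand-in data (assumption): the auto dataset shipped with Stata
sysuse auto, clear
preserve

* Hypothetical excerpt: the first 40 observations
keep in 1/40

* Rows the model would use under listwise deletion
quietly regress price mpg rep78
display "excerpt rows: " _N "   used by regress: " e(N)

* Rows with no missing values in any variable
egen int mcount = rowmiss(_all)
count if mcount == 0

* Bring back the full dataset
restore
```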
