  • Drop variables with missing values

    I have a dataset with several variables and I want to remove all the variables that have more than 400 observations with missing data. Can you help me with this code?
    Thank you very much

  • #2
    Code:
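    foreach var of varlist * {
        * count the observations in which this variable is missing
        quietly count if missing(`var')
        * drop the variable if more than 400 observations are missing
        if r(N) > 400 drop `var'
    }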


    • #3
      Hemanshu Kumar gave a fine answer.

      In addition, missings from the Stata Journal offers a systematic way of reporting on missing values.

      An otherwise unpredictable search term for sources or mentions here is dm0085. (Rightly or wrongly, I didn't want a long command name such as missingsreport -- yet the name I used, missings, will trigger many hits if used as a search term.)

      Code:
      . search dm0085, entry
      
      Search of official help files, FAQs, Examples, and Stata Journals
      
      SJ-20-4 dm0085_2  . . . . . . . . . . . . . . . . Software update for missings
              (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
              Q4/20   SJ 20(4):1028--1030
              sorting has been extended for missings report
      
      SJ-17-3 dm0085_1  . . . . . . . . . . . . . . . . Software update for missings
              (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
              Q3/17   SJ 17(3):779
              identify() and sort options have been added
      
      SJ-15-4 dm0085  Speaking Stata: A set of utilities for managing missing values
              (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
              Q4/15   SJ 15(4):1174--1185
              provides command, missings, as a replacement for, and extension
              of, previous commands nmissing and dropmiss

      missings doesn't offer a direct route to dropping variables or observations unless all observations or all variables respectively are missing. But it helps guide your decision-making.
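
      For the all-missing case mentioned here, the relevant subcommands are dropvars and dropobs. The lines below are a minimal sketch, assuming missings is installed. The force option is included on the assumption that it is required whenever the data in memory have unsaved changes; check help missings for the exact rule.

      ```stata
      webuse nlswork, clear

      * drop any variables that are missing in every observation
      missings dropvars, force

      * drop any observations that are missing on every variable
      missings dropobs, force
      ```

      With nlswork neither command should find anything to drop, since no variable there is missing in every observation.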


      Code:
      . webuse nlswork, clear
      (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
      
      . missings report
      
      Checking missings in all variables:
      15082 observations with missing values
      
      -------------------
                |      #
      ----------+--------
            age |     24
            msp |     16
        nev_mar |     16
          grade |      2
       not_smsa |      8
         c_city |      8
          south |      8
       ind_code |    341
       occ_code |    121
          union |   9296
         wks_ue |   5704
         tenure |    433
          hours |     67
       wks_work |    703
      -------------------
      
      . missings report, sort min(400)
      
      Checking missings in all variables:
      15082 observations with missing values
      
      -------------------
                |      #
      ----------+--------
          union |   9296
         wks_ue |   5704
       wks_work |    703
         tenure |    433
      -------------------
      
      . return list
      
      scalars:
                        r(N) =  28534
      
      macros:
                  r(varlist) : "union wks_ue tenure wks_work"
      .
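
      The list left behind in r(varlist) can be passed to drop. This is a minimal sketch, not part of missings itself, and is best run from a do-file so that the local macro persists between lines. It assumes that min() is an inclusive lower bound, so min(401) is used to match the original request for more than 400 missing values. The local name todrop is made up for illustration.

      ```stata
      webuse nlswork, clear

      * list only variables with more than 400 missing values
      * (assumption: min() means "at least", hence 401)
      missings report, min(401)

      * copy the returned list at once, before another command overwrites r()
      local todrop `r(varlist)'

      * drop only when the list is non-empty, as drop needs something to act on
      if "`todrop'" != "" {
          drop `todrop'
      }
      ```

      With nlswork this should remove union, wks_ue, tenure and wks_work, the four variables reported above.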


      • #4
        It worked. Thank you very much for your message.
