  • Count how many variables have more than a certain number of missing values

    Hi Statalists!
    I'm stuck on a very simple task: I would like to count how many variables in my dataset have more than 50% missing values.
    Do you have any suggestions on how to do it? Thanks!

  • #2
    -codebook- would show you the answer for each variable, one by one, but if you just want a count across all variables, you could do this:

    Code:
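    * running count of variables that exceed the threshold
    local misscount = 0
    * threshold: half the number of observations
    local np50 = 0.5 * _N
    foreach v of varlist * {
       * missing() handles both numeric and string variables
       quietly count if missing(`v')
       * the comparison evaluates to 1 if true, 0 otherwise
       local misscount = `misscount' + (r(N) > `np50')
    }
    di "`misscount' variables had more than 50% missing values."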
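
    If you also want to know which variables they are, a minimal variation on the loop above is to collect the names in a local macro. This is only a sketch: the macro name manymiss is made up for illustration, and the threshold is the same strict "more than half" rule.

    Code:
    * manymiss is a hypothetical macro name; start with it empty
    local manymiss
    local np50 = 0.5 * _N
    foreach v of varlist * {
       quietly count if missing(`v')
       if r(N) > `np50' {
          * append the name of each qualifying variable
          local manymiss `manymiss' `v'
       }
    }
    di wordcount("`manymiss'") " variables had more than 50% missing values: `manymiss'"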


    • #3
      See also missings -- now from the Stata Journal -- but first announced here at https://www.statalist.org/forums/for...aging-missings

      As the program name is too close in one sense to a common keyword, here is an otherwise unpredictable tip: dm0085 is the magic keyword that finds publications in the Journal. dm0085 was the first paper and at the time of writing dm0085_2 finds the latest public update.

      Code:
      . search dm0085, entry
      
      Search of official help files, FAQs, Examples, and Stata Journals
      
      SJ-20-4 dm0085_2  . . . . . . . . . . . . . . . . Software update for missings
              (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
              Q4/20   SJ 20(4):1028--1030
              sorting has been extended for missings report
      
      SJ-17-3 dm0085_1  . . . . . . . . . . . . . . . . Software update for missings
              (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
              Q3/17   SJ 17(3):779
              identify() and sort options have been added
      
      SJ-15-4 dm0085  Speaking Stata: A set of utilities for managing missing values
              (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
              Q4/15   SJ 15(4):1174--1185
              provides command, missings, as a replacement for, and extension
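              of, previous commands nmissing and dropmiss

      In the sandbox dataset I use with this command, no variable is missing in _N/2 or more observations, so the example below uses _N/10 as the threshold instead. It shows the main idea: just change 10 to 2 to get the variables missing in at least half of the observations.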

      Code:
      . webuse nlswork, clear
      (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
      
      . missings report, min(`=_N/10')
      
      Checking missings in all variables:
      15082 observations with missing values
      
      -----------------
              |      #
      --------+--------
        union |   9296
       wks_ue |   5704
      -----------------
      
      . ret li
      
      scalars:
                        r(N) =  28534
      
      macros:
                  r(varlist) : "union wks_ue"
      
      . di wordcount("`r(varlist)'")
      2
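
      Spelled out for the original question, the change from 10 to 2 might look like the sketch below. It assumes missings is already installed. Note that min() means "at least this many", so the second call raises the threshold to the next integer to get strictly more than half. No output is shown here because, as noted, no variable in this dataset qualifies.

      Code:
      * assumes missings is already installed
      webuse nlswork, clear
      * variables missing in at least half of the observations
      missings report, min(`=_N/2')
      di wordcount("`r(varlist)'")
      * variables missing in strictly more than half of the observations
      missings report, min(`=floor(_N/2) + 1')
      di wordcount("`r(varlist)'")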


      • #4
        Thank you very much, very clear!
