Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Missing values

    Dear Stata users,
    Can anyone help me how to find out variables with the highest missing values in my data set.
    Basically what i am using is (tab var1, miss) to figure out missing values in a single variable, but i wonder if i have to repeat the same command 100 times for each variable in my data-set and then to look for top 5 variables with the highest missing values.
    I assume there must be easy ways, would much appreciate if you could suggest any.


    Thanks



  • #2
    Well, for one thing, there is the -misstable summarize- command which will show you the number of missing values for each numeric variable in the data set in a single command.

    If you want to do something a little fancier to get a list of all your variables ranked in order by number of missings you could do this:

    Code:
    tempfile holding
    capture posutil clear
    postfile handle str32 varname long miss_count using `holding'
    
    foreach v of varlist _all {
        quietly count if missing(`v')
        post handle ("`v'") (`r(N)')
    }
    postclose handle
    
    preserve
    use `holding', clear
    gsort -miss_count varname
    list, clean
    restore

    Comment


    • #3
      There might be user-written commands to conveniently do what you want, but you could loop through all of your variables counting the missings; something like that below.
      Code:
      tempfile missing_tallies
      tempname fh
      postfile `fh' str32 varname int missing using `missing_tallies'
      foreach var of varlist _all {
          quietly count if mi(`var')
          post `fh' ("`var'") (r(N))
      }
      postclose `fh'
      // Save your dataset if needed
      use `missing_tallies', clear
      gsort -missing
      list in 1/5, noobs
      See also
      Code:
      help misstable

      Comment


      • #4
        A little complicated for the first use, but really nice.
        Thanks Clyde.

        Comment


        • #5
          Following Clyde's and Joseph's valuable and experienced comments - and considering that you want something quick and easy - a simple "summ" command won't do the work for you?

          If not, and you want to see separately the missing observations "you can use as a trick" the procedure for multiple imputation. Follow the procedure as indicated in the very informative and user friendly video created by STATA and to the steps up to the command: . mi misstable summarize, all. If you are not interested to fill in the missing variables using multiple imputation, you do not have to follow the remaining of this video.

          https://www.youtube.com/watch?v=i6SOlq0mjuc


          The point up to which you should follow the video provided by STATA and to get what you want is shown in the picture attached.


          Best,
          Andreas
          Attached Files

          Comment


          • #6
            Thank you all, just found a new command which you may like it.

            code:
            nmissing _all
            Last edited by Fahim Ahmad; 26 Apr 2017, 04:43.

            Comment


            • #7
              You can get some of the way with missings (SJ). See http://www.statalist.org/forums/foru...aging-missings

              With missings report you can get the full list of numbers of missing values by variables, and then with a min() option you can select a smaller number of variables according to a minimum number of missing values.

              What isn't there yet are options to sort the output by number missing and select the # highest variables on number of missing values.

              This is an example of what you can do if you install missings.

              Code:
              .  webuse nlswork, clear
              (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
              
              .  missings report
              
              Checking missings in all variables:
              15082 observations with missing values
              
              age           24
              msp           16
              nev_mar       16
              grade          2
              not_smsa       8
              c_city         8
              south          8
              ind_code     341
              occ_code     121
              union       9296
              wks_ue      5704
              tenure       433
              hours         67
              wks_work     703
              
              .  missings report, min(100)
              
              Checking missings in all variables:
              15082 observations with missing values
              
              ind_code     341
              occ_code     121
              union       9296
              wks_ue      5704
              tenure       433
              wks_work     703
              I did a quick hack and privately I can do this too:

              Code:
              .  missings2 report, sort
              
              Checking missings in all variables:
              15082 observations with missing values
              
                             1          2
                   +-----------------------+
                 1 |     union       9296  |
                 2 |    wks_ue       5704  |
                 3 |  wks_work        703  |
                 4 |    tenure        433  |
                 5 |  ind_code        341  |
                 6 |  occ_code        121  |
                 7 |     hours         67  |
                 8 |       age         24  |
                 9 |   nev_mar         16  |
                10 |       msp         16  |
                11 |  not_smsa          8  |
                12 |     south          8  |
                13 |    c_city          8  |
                14 |     grade          2  |
                   +-----------------------+
              
              .  missings2 report, sort show(5)
              
              Checking missings in all variables:
              15082 observations with missing values
              
                            1          2
                  +-----------------------+
                1 |     union       9296  |
                2 |    wks_ue       5704  |
                3 |  wks_work        703  |
                4 |    tenure        433  |
                5 |  ind_code        341  |
                  +-----------------------+
              Working out how better to display the results may be trickier. At present, I am just spitting out a matrix from Mata.
              Last edited by Nick Cox; 26 Apr 2017, 05:21.

              Comment


              • #8
                Fahim: Thanks for the mention but nmissing is not new (born 1999) and no longer recommended. My previous gives positive advice.
                Code:
                . search nmissing, sj historical
                
                Search of official help files, FAQs, Examples, SJs, and STBs
                
                SJ-15-4 dm0085  Speaking Stata: A set of utilities for managing missing values
                        (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                        Q4/15   SJ 15(4):1174--1185
                        provides command, missings, as a replacement for, and extension
                        of, previous commands nmissing and dropmiss
                
                SJ-15-4 dm67_4  . . . . . . . . . . . . . . . . . Software update for nmissing
                        (help nmissing if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                        Q4/15   SJ 15(4):1186--1187
                        nmissing command has been superseded by a new command, missings,
                        which offers various utilities for managing variables that may
                        have missing values
                
                SJ-5-4  dm67_3  . . . . . . . . . .  Software update for nmissing and npresent
                        (help nmissing if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                        Q4/05   SJ 5(4):607
                        now produces saved results
                
                SJ-3-4  sg67_2  . . . . . . . . . .  Software update for nmissing and npresent
                        (help nmissing, npresent if installed)  . . . . . . . . . .  N. J. Cox
                        Q4/03   SJ 3(4):449
                        updated to include support for by, options for checking
                        string values that contain spaces or periods, documentation
                        of extended missing values .a to .z, and improved output
                
                STB-60  dm67.1  . . . .  Enhancements to numbers of missing and present values
                        (help nmissing if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                        3/01    pp.2--3; STB Reprints Vol 10, pp.7--9
                        updated with option for reporting on observations
                
                STB-49  dm67  . . . . . . . . . . . . .  Numbers of missing and present values
                        (help nmissing if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                        5/99    pp.7--8; STB Reprints Vol 9, pp.26--27
                        commands to list the numbers of missing values and nonmissing
                        values in each variable in varlist

                Comment


                • #9
                  Nick, I was looking how to sort variables with highest missing values.
                  As you suggest missings report would provide us numbers of missing values in each variable and missings2 report, sort will sort it from highest to lowest.

                  I can install missings command through this [ ssc install missings ] but don't know how to install missings2 command.
                  Would much appreciate if you guide me.



                  Comment


                  • #10
                    Sorry, but you can't install missings2: as said, it is private, meaning on my machine only, as I just hacked it this morning (British time).

                    missings should be installed from the Stata Journal files. The first result from the search in #8 points to a download location.

                    Comment


                    • #11
                      This is now possible with missings as updated on SSC. (Thanks to Kit Baum as always for the update; an update to the Stata Journal version will follow in due course.)

                      Previously missings just looped through variables and printed results one at a time. Wanting to see sorted results (and possibly just some of the sorted results) implied a different structure for holding the results before sorting, etc.

                      Here is a simple example.

                      Code:
                      . webuse nlswork, clear
                      (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
                      
                      . missings report
                      
                      Checking missings in all variables:
                      15082 observations with missing values
                      
                      -------------------
                                |      #
                      ----------+--------
                            age |     24
                            msp |     16
                        nev_mar |     16
                          grade |      2
                       not_smsa |      8
                         c_city |      8
                          south |      8
                       ind_code |    341
                       occ_code |    121
                          union |   9296
                         wks_ue |   5704
                         tenure |    433
                          hours |     67
                       wks_work |    703
                      -------------------
                      
                      . missings report, min(100)
                      
                      Checking missings in all variables:
                      15082 observations with missing values
                      
                      -------------------
                                |      #
                      ----------+--------
                       ind_code |    341
                       occ_code |    121
                          union |   9296
                         wks_ue |   5704
                         tenure |    433
                       wks_work |    703
                      -------------------
                      
                      . missings report, sort
                      
                      Checking missings in all variables:
                      15082 observations with missing values
                      
                      -------------------
                                |      #
                      ----------+--------
                          union |   9296
                         wks_ue |   5704
                       wks_work |    703
                         tenure |    433
                       ind_code |    341
                       occ_code |    121
                          hours |     67
                            age |     24
                        nev_mar |     16
                            msp |     16
                         c_city |      8
                          south |      8
                       not_smsa |      8
                          grade |      2
                      -------------------
                      
                      . missings report, sort show(5)
                      
                      Checking missings in all variables:
                      15082 observations with missing values
                      
                      -------------------
                                |      #
                      ----------+--------
                          union |   9296
                         wks_ue |   5704
                       wks_work |    703
                         tenure |    433
                       ind_code |    341
                      -------------------
                      Thanks again to Fahim Ahmad for the good question.

                      Comment


                      • #12
                        Nick, this really a cool way for sorting variables by number of missing values.

                        Thanks a lot for going in details and nice explanation.


                        Comment

                        Working...
                        X