  • How to delete empty variables in a large data set?

    Hi Stata Lovers,



    I am analyzing the NFHS-4 and NFHS-5 datasets for a research article. During data management and analysis, I created several (>200) variables (using the forval command). However, several of these
    are entirely empty, i.e., without any observations. I have the following two questions:


    1. Is there a command or way to find the number of missing observations for all the variables in the dataset? (I know about the 'mdesc' command.)

    2. Is there a way to drop all the 'empty variables', i.e., those without any observations, in bulk (i.e., simultaneously)?



  • #2
    Code:
    foreach var of varlist * {
        qui count if !missing(`var')
        if r(N) == 0 {
            di "dropping `var'"
            drop `var'
        }
    }
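
    For the first question (how many values are missing in each variable), the same loop pattern can be adapted. This is only a minimal sketch and not part of the original reply; the column position 35 is an arbitrary choice for readability.

    Code:
    foreach var of varlist * {
        * count observations in which this variable is missing
        qui count if missing(`var')
        di "`var'" _col(35) r(N)
    }

    A variable whose count equals _N is entirely empty.
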
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Originally posted by Maarten Buis
      Code:
      foreach var of varlist * {
          qui count if !missing(`var')
          if r(N) == 0 {
              di "dropping `var'"
              drop `var'
          }
      }

      Thank you very very much................. It worked.
      You saved me several hours of hard work.
      Many thanks ........ Namaste!



      • #4
        There is also a community-contributed command called missings (available from SSC) that accomplishes these tasks without the need for a loop. Once the command is installed, you could just do

        Code:
        missings dropvars * , force
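
        A possible sequence, sketched under the assumption (as stated above) that missings can be installed from SSC: install it once, report the missing counts per variable, then drop the variables that are missing in every observation.

        Code:
        * one-time installation of the community-contributed command
        ssc install missings
        * list variables that have missing values, with counts
        missings report
        * drop variables that are missing in all observations
        missings dropvars * , force
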



        • #5
          Thanks for the mention of missings, which should please be referenced as published through the Stata Journal.

          SJ-15-4 dm0085 Speaking Stata: A set of utilities for managing missing values
          (help missings if installed) . . . . . . . . . . . . . . . N. J. Cox
          Q4/15 SJ 15(4):1174--1185
          provides command, missings, as a replacement for, and extension
          of, previous commands nmissing and dropmiss


          was the original and there are updates, which I will post later.



          • #6
            You may generalize the approach shown by Maarten to cover variables that are constant throughout, but not necessarily missing. Such variables are likely uninformative in any analysis.

            Code:
            foreach var of varlist * {
              * drop variables that equal a specific constant value (here 0)
              qui cap assert `var'==0
              if !_rc {
                di "dropping `var'"
                drop `var'
                * already dropped, so skip the second check
                continue
              }
             
              * drop variables that equal any constant value
              qui cap assert `var'[1]==`var'[_n]
              if !_rc {
                di "dropping `var'"
                drop `var'
              }
            }



            • #7
              Code:
              findname, all(@ == @[1]) 
              finds constant variables, because if all values are equal, then they are all equal to the first value. findname is from the Stata Journal and functionally (if not syntactically) a superset of ds, so that it can do everything ds can do and more.
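
              As a rough sketch of how the result might be used: findname leaves the names it found in r(varlist), so the constant variables could be dropped in a second step. Because missing values compare as equal in Stata, variables that are missing throughout are also matched.

              Code:
              findname, all(@ == @[1])
              * drop only if at least one variable was found
              if "`r(varlist)'" != "" {
                  drop `r(varlist)'
              }
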



              • #8
                missings first published 2015, last public revision 2020 -- but use dm0085 in searches

                Code:
                . search dm0085, entry
                
                Search of official help files, FAQs, Examples, and Stata Journals
                
                SJ-20-4 dm0085_2  . . . . . . . . . . . . . . . . Software update for missings
                        (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                        Q4/20   SJ 20(4):1028--1030
                        sorting has been extended for missings report
                
                SJ-17-3 dm0085_1  . . . . . . . . . . . . . . . . Software update for missings
                        (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                        Q3/17   SJ 17(3):779
                        identify() and sort options have been added
                
                SJ-15-4 dm0085  Speaking Stata: A set of utilities for managing missing values
                        (help missings if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                        Q4/15   SJ 15(4):1174--1185
                        provides command, missings, as a replacement for, and extension
                        of, previous commands nmissing and dropmiss

                findname first published 2010, last update 2020

                Code:
                . search findname, sj
                
                Search of official help files, FAQs, Examples, and Stata Journals
                
                SJ-20-2 dm0048_4  . . . . . . . . . . . . . . . . Software update for findname
                        (help findname if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                        Q2/20   SJ 20(2):504
                        new options include columns()
                
                SJ-15-2 dm0048_3  . . . . . . . . . . . . . . . . Software update for findname
                        (help findname if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                        Q2/15   SJ 15(2):605--606
                        updated to be able to find strL variables
                
                SJ-12-1 dm0048_2  . . . . . . . . . . . . . . . . Software update for findname
                        (help findname if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                        Q1/12   SJ 12(1):167
                        correction for handling embedded double quote characters
                
                SJ-10-4 dm0048_1  . . . . . . . . . . . . . . . . Software update for findname
                        (help findname if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                        Q4/10   SJ 10(4):691
                        update for not option
                
                SJ-10-2 dm0048  . . . . . . . . . . . . . .  Speaking Stata: Finding variables
                        (help findname if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                        Q2/10   SJ 10(2):281--296
                        produces a list of variable names showing which variables
                        have specific properties, such as being of string type, or
                        having value labels attached, or having a date format
