Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Drop columns using non-missing observations as criteria.

    I'm working with long data; I have about 100 columns (variables), with a lot of missing observations. I want to loop through the variables and delete columns with less than "some number", e.g., 32, of non-missing values. (This questoin is cross-listed on Stackexchange under the same title).

    I tried the following:
    Code:
     foreach var of varlist cid-lpop {   2. bysort `var': drop if _N < 32   3. }
    but it deleted my observations instead dropping any columns.
    Last edited by John Michael Perdue; 28 Aug 2017, 01:55.

  • #2
    Code:
    foreach var of varlist cid-lpop {
        count if !missing(`var')
        drop if r(N) < 32
    }
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    Comment


    • #3
      Thank you for your response Maarten! I applied the code and initially thought that it had worked according to my needs. Unfortunately, it deleted all of my observations instead of dropping any of the columns. What I need to do is drop columns where the number of non-missing observations are greater than 32.
      Last edited by John Michael Perdue; 28 Aug 2017, 02:08.

      Comment


      • #4
        Maarten meant
        Code:
        drop `var' if r(N) < 32

        Comment


        • #5
          Thankyou for your comment Joseph. I applied what you suggested but I am getting an r(198) error message: invalid syntax.


          Code:
          foreach var of varlist cid-lpop {
          count if !missing(`var')
          drop `var' if r(N) < 32
          }
          Last edited by John Michael Perdue; 28 Aug 2017, 02:29.

          Comment


          • #6
            I think the reason you are seeing error r(198) is because drop is designed to either drop a variable/s or observations and not both at one go. To confirm this, you may read
            Code:
            help drop
            . Moreover, @Maarten Buis #2 reply, although correct, does not apparently give you what you need. Therefore, I would suggest you try to look at @Nick Cox's
            HTML Code:
            missings
            "Various utilities for managing missing values".
            Code:
            ssc inst missings

            Comment


            • #7
              Ah! Abdul is right. Try
              Code:
              foreach var of varlist cid-lpop {
                  count if !missing(`var')
                  if r(N) < 32 drop `var'
              }

              Comment


              • #8
                Joseph Coveney So far it seems to have worked! Thank you for your help on this one! :-)

                Comment


                • #9
                  To push forward Abdul's suggestion:

                  1. missings was written up in Stata Journal 15(4) 2015

                  2. An update is forthcoming in Stata Journal 17(3) 2017.

                  With missings you can ask for a report on the variables with at least so many missing values and then use its saved results to drop what has too many missings by your personal criterion.

                  Code:
                  . clear
                  
                  . set obs 100 
                  number of observations (_N) was 0, now 100
                  
                  . * 30 missings 
                  . gen y = 1 in 1/70 
                  (30 missing values generated)
                  
                  . * 70 missings 
                  . gen z = 1 in 1/30 
                  (70 missing values generated)
                  
                  . summarize 
                  
                      Variable |        Obs        Mean    Std. Dev.       Min        Max
                  -------------+---------------------------------------------------------
                             y |         70           1           0          1          1
                             z |         30           1           0          1          1
                  
                  . 
                  . missings report, min(68) 
                  
                  Checking missings in all variables:
                  70 observations with missing values
                  
                  ----------
                     |    #
                  ---+------
                   z |   70
                  ----------
                  
                  . 
                  . drop `r(varlist)' 
                  
                  . summarize
                  
                      Variable |        Obs        Mean    Std. Dev.       Min        Max
                  -------------+---------------------------------------------------------
                             y |         70           1           0          1          1

                  Comment


                  • #10
                    Nick Cox: Thank you for providing this example.

                    Comment

                    Working...
                    X