Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Delete variables that have the same number repeated across all observations with some missing values.

    I have a dataset with ~2500 variables named NEIS* and I want to:
    - Delete all variables that have the same number repeated across all observations regardless of whether there are missing values in the column.

    Below is the chunk of code I was using and the error message and a picture highlighting the variables I want to delete.



    Code:
    foreach myvar of varlist NEIS* {
      sort `myvar'
      summarize `myvar'
      scalar sd_ = r(sd)      *columns with the same number will have a std deviation of 0
      if (`myvar'[1] == `myvar'[_N]) drop `myvar'  *Delete all variables that have the same number
      if ( `sd_' == 0) drop `myvar' *this is where I think my code fails
      scalar drop _all  *drop scalar to prevent reuse by another loop
      }
    Stata version 15.1 on Windows 10


    Variable | Obs Mean Std. Dev. Min Max
    -------------+---------------------------------------------------------
    NEIS0001 | 6 27 0 27 27
    ==0 invalid name
    r(198);

    NEIS0001 NEIS2267 NEIS3000 NEIS3169 NEIS3170 NEIS3171 NEIS3172 NEIS3173
    2 17 1 99992 2 99991 8 999920
    2 2 99992 99991 8 999920
    2 2 3 5 8 999920
    2 56 1 99991 8
    2 2 8 999920
    2 2 456 99991 8 999920
    Click image for larger version

Name:	Capture.PNG
Views:	1
Size:	7.7 KB
ID:	1482498

  • #2
    Code:
    foreach v of varlist NEIS* {
        sort `v'
        capture assert `v' == `v'[1] | missing(`v')
        if c(rc) == 0 {
            drop `v'
        }
    }
    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Comment


    • #3
      Thank you Clyde. Code works perfectly! I will have to read up on -assert-. For the record, here is my dataex and my own hack that worked as well. turned out I needed to add else to the second if statement.
      Code:
       foreach myvar of varlist NEIS*  {
        sort `myvar'
        summarize `myvar'
        tempname my_means
        scalar `my_means' = r(mean)
        
        if (`myvar'[1] == `myvar'[_N]) drop `myvar' 
        else if ( `my_means' == `myvar'[1]) drop `myvar' 
        macro drop `my_means'
        }
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(NEIS0059 NEIS0071 NEIS0288 NEIS0367 NEIS0475 NEIS0599 NEIS0834 NEIS0985 NEIS01080 NEIS01132 NEIS01156 NEIS1439 NEIS1456 NEIS1557 NEIS1577 NEIS1794 NEIS1795 NEIS1829 NEIS1844 NEIS2113)
      1803    10     93    13    76 1 999912  . 999915     5 999964 511  . 1801 91 999915     9  . 45 49
         9    10     93 99993 99999 .     10 90      6 99994 999964  20 22 1809  .      . 99991 10 45 49
         9 99991 999913    13    76 1     10 90      6   316    371  20 22 1801 91 999915     9 10  .  .
      end

      Comment


      • #4
        In #3, comparing min & max is a better choice than using mean.
        Code:
        foreach v of varlist NEIS* {
            sum `v', meanonly
            if `r(min)' == `r(max)' {
            drop `v'
            }
        }
        Last edited by Romalpa Akzo; 07 Feb 2019, 17:26.

        Comment


        • #5
          Cool. Thank you, Romalpa.

          Comment

          Working...
          X