  • Dropping variables based on values of another variable

    Hello,

    I have a dataset and would like to drop variables whose names do not appear as values of another variable. (My variable names are actually numbers; since Stata does not allow variable names that begin with a digit, I prefix them with an underscore.) For example, in the mock data below, I would like to drop variables _1, _4, _8, _12, _14 and _16 because those numbers do not appear in the variable number.

    Example dataset:
    Code:
    clear
    input float(number _1 _2 _3 _4 _5 _6 _7 _8 _9 _10 _11 _12 _13 _14 _15 _16)
     2 96.535      .   96.7 96.865  96.37   96.7 96.865 96.205  95.71 97.855  97.03 97.195  97.03 97.195  98.68 98.845
     3 96.535  97.36      . 97.195   96.7  97.03 97.195 96.535  96.04 97.855  97.03 97.195  97.03 97.195  99.01 99.175
     5 95.875  96.37  96.04 96.205      . 96.205  96.37  95.71 95.215 96.865 96.535   96.7 96.535   96.7  98.02 98.185
     6  96.37 96.865 96.535   96.7 96.205      .   96.7  96.04 95.545  97.36  97.03 97.195  97.03  97.03 98.515  98.68
     7 96.205   96.7  96.37 96.535  96.04  96.37      . 95.875  95.38 97.195 96.865  97.03 96.865 96.865  98.35 98.515
     9 96.205  96.37  96.37 96.535  95.71  96.04 96.205 95.875      . 97.195 96.865  97.03 96.865 96.865  98.02 98.185
    10  96.04 96.205 96.205  96.37 95.545 95.875  96.04  95.71 95.215      .   96.7 96.865   96.7   96.7 97.855  98.02
    11 95.875  96.04  96.04 96.205  95.38  95.71 95.875 95.545  95.05 96.865      .   96.7 96.535 96.535  97.69 97.855
    13  96.04 96.205 96.205  96.37 95.545 95.875  96.04  95.71 95.215  97.03   96.7 96.865      .   96.7 97.855  98.02
    15 95.545  96.37  96.37 96.535  96.04  96.37 96.205 96.535 95.545 96.865  96.37 96.535  96.37 96.535      . 97.855
    end

    I've been struggling with this for a few hours, and I considered doing something like:

    Code:
    levelsof number, local(id)
    
    foreach i in `id'{
        keep _`i'
    }
    This does not work: the first iteration correctly keeps variable _2, but in doing so it drops (correctly) _1 and (incorrectly) every variable after _2, so the next iteration fails because _3 no longer exists.

    Any ideas on how to achieve that?

  • #2
    There are probably other strategies that would work, but I would use the loop to build a macro with the list of variables to keep.

    Code:
    levelsof number, local(id)
    
    foreach i in `id' {
        local kplist `kplist' _`i'
    }
    
    keep `kplist'
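
    Note that the keep command above also drops number itself, and it will stop with an error if some value of number has no matching variable. If either of those matters, the same idea might be sketched as below. This is only a sketch under those assumptions: it starts the list with number and uses confirm variable with the exact option to skip values that have no corresponding _# variable.

    ```stata
    levelsof number, local(id)

    * start the keep list with number itself (assumption: you want to retain it)
    local kplist number

    foreach i of local id {
        * only add _`i' if a variable with exactly that name exists
        capture confirm variable _`i', exact
        if _rc == 0 {
            local kplist `kplist' _`i'
        }
    }

    keep `kplist'
    ```

    Building the list first and calling keep once is what avoids the problem in #1, since keep drops everything not named in that single call.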


    • #3
      Code:
      foreach v of varlist _* {
          local w: subinstr local v "_" ""
          count if number == `w'
          if r(N) == 0 {
              drop `v'
          }
      }
      Added: Crossed with #2 which offers a different solution. I like Sarah's solution better than mine.
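
      One difference between the two approaches: the loop above leaves number in the dataset, whereas keep with a list of only _# variables does not. A variant of the loop that collects the names and drops them in one step could look like the sketch below. The macro name droplist is just an illustrative choice, and the guard is there because drop with an empty list is an error.

      ```stata
      * collect the variables whose number never appears in number
      local droplist
      foreach v of varlist _* {
          local w : subinstr local v "_" ""
          quietly count if number == `w'
          if r(N) == 0 {
              local droplist `droplist' `v'
          }
      }

      * drop only if at least one variable was collected
      if "`droplist'" != "" {
          drop `droplist'
      }
      ```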


      • #4
        Thank you very much for both solutions! I really appreciate your help!
