Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Subsetting dataset using variable names

    Hi,

    Here is a sample dataset from where I want to keep variables ending in _1 and the ones that are not numbered e.g. age, state. I just want to remove those with _2, _3 and so on. I am happy with the following:

    Code:
    drop *_2 *_3 *_4
    but this is a bit cumbersome for large datasets, and something tells me that there might a more elegant solution on Statalist.

    Please let me know.

    Thank you.

    Code:
     clear
    input int(age state polio_1) float polio_2 int(bcg_1 bcg_2 bcg_3)
     4296 21 3 2.5 16 2130 161
     9690 17 5   3 15 2830 189
     3829 22 4   3  9 2580 169
     3895 26 3   3 10 1830 142
     5189 20 3   2 16 3280 200
     5788 18 3   4 21 3670 218
     4082 19 3 3.5 13 3400 200
     5899 18 5 2.5 14 2410 174
     3667 24 2   2  7 2750 179
    15906 21 3   3 13 4290 204
     4647 28 3   2 11 3260 170
    12990 14 . 3.5 14 3420 192
     4589 35 5   2  8 2020 165
     4425 34 5 2.5 11 1800 157
     6303 14 4   3 16 4130 217
     6342 17 2 4.5 21 3740 220
     5719 18 5   2 11 2670 175
     3798 35 5 2.5 11 2050 164
     3984 30 5   2  8 2120 163
     3299 29 3 2.5  9 2110 163
     4934 18 1 1.5  7 3470 198
     9735 25 4 2.5 12 2650 177
    13466 14 3 3.5 15 3830 201
     4890 18 4   4 20 3690 218
    10372 16 3 3.5 17 3880 207
     5798 18 4   4 20 3700 214
     5222 19 3   2 16 3210 201
     4516 18 3   3 15 3370 198
     4723 19 3 3.5 17 3200 199
     5705 16 4   4 20 3690 212
     4499 28 4 2.5  5 1760 149
     6850 25 4   2 16 1990 156
     5397 41 5   3 15 2040 155
     4697 25 4   3 15 1930 155
     6486 26 . 1.5  8 2520 182
     3748 31 5   3  9 2200 165
     4187 21 3   2 10 2650 179
    end
    [/CODE]

  • #2
    I think you would need to loop. It would be dangerous to assume that anything that follows an underscore is a number other than one as words can follow underscores. Zeros are not considered below, but you state no rule for them immediately following an underscore.

    Code:
    foreach var of varlist *{
        if ustrregexm("`var'", "_[2-9]$|_[2-9][\d]+$"){
            drop `var'
        }
    }

    Comment


    • #3
      Code:
      ds
      keep `= ustrregexra("`r(varlist)'", "\b[\p{letter}_][\p{letter}_0-9]*_[2-9]\d*\b","" ) '

      Comment


      • #4
        Originally posted by Andrew Musau View Post
        I think you would need to loop. It would be dangerous to assume that anything that follows an underscore is a number other than one as words can follow underscores. Zeros are not considered below, but you state no rule for them immediately following an underscore.

        Code:
        foreach var of varlist *{
        if ustrregexm("`var'", "_[2-9]$|_[2-9][\d]+$"){
        drop `var'
        }
        }

        Hi Andrew,

        Thanks so much for your reply!

        I tried this method, and it correctly removes the variables ending in _2 to _9; however it ignores the higher ones e.g. _10. The list goes up _21 for certain indicators, and replacing 9 with 21 dropped all the variables. So it'd very helpful to know a little about how the command functions.

        Also, some of the variables start with _0 which I renamed by dropping the zeros. Maybe this method is applicable to those variables as well!

        Please let me know.





        Comment


        • #5
          Originally posted by Sonnen Blume View Post
          I tried this method, and it correctly removes the variables ending in _2 to _9; however it ignores the higher ones e.g. _10.
          You are correct. You need

          Code:
          foreach var of varlist *{
              if ustrregexm("`var'", "_[2-9]$|_[1-9][\d]+$"){
                  drop `var'
              }
          }

          Comment


          • #6
            Originally posted by Bjarte Aagnes View Post
            Code:
            ds
            keep `= ustrregexra("`r(varlist)'", "\b[\p{letter}_][\p{letter}_0-9]*_[2-9]\d*\b","" ) '
            Hi Bjarte, thank you for the solution! It works well for variables with suffixes up to _9, the same as the one proposed by Andrew.

            Comment


            • #7
              Originally posted by Andrew Musau View Post

              You are correct. You need

              Code:
              foreach var of varlist *{
              if ustrregexm("`var'", "_[2-9]$|_[1-9][\d]+$"){
              drop `var'
              }
              }
              Thank you! This is doing the work!

              Comment


              • #8
                #3 modified:
                Code:
                ds
                keep `= ustrregexra("`r(varlist)'", "\b[\p{L}_][\p{L}_\d]*_(?:[02-9]{1}|\d{2,})\b","") '
                or
                Code:
                ds
                keep `= ustrregexra("`r(varlist)'", "\P{Zs}+_(?:[02-9]{1}|\d{2,})\b","") '
                Last edited by Bjarte Aagnes; 29 Dec 2022, 03:28.

                Comment


                • #9
                  Originally posted by Bjarte Aagnes View Post
                  #3 modified:
                  Code:
                  ds
                  keep `= ustrregexra("`r(varlist)'", "\b[\p{L}_][\p{L}_\d]*_(?:[02-9]{1}|\d{2,})\b","") '
                  or
                  Code:
                  ds
                  keep `= ustrregexra("`r(varlist)'", "\P{Zs}+_(?:[02-9]{1}|\d{2,})\b","") '
                  Both of the codes work neatly. Thanks so much!

                  Comment


                  • #10
                    Originally posted by Bjarte Aagnes View Post
                    #3 modified:
                    Code:
                    ds
                    keep `= ustrregexra("`r(varlist)'", "\b[\p{L}_][\p{L}_\d]*_(?:[02-9]{1}|\d{2,})\b","") '
                    or
                    Code:
                    ds
                    keep `= ustrregexra("`r(varlist)'", "\P{Zs}+_(?:[02-9]{1}|\d{2,})\b","") '
                    Both of the codes work neatly. Slightly faster than the loop. Thanks so much!

                    Comment

                    Working...
                    X