  • Dropping multiple variables ending within a range of numbers

    Dear Statlist,

    I am currently trying to organise data from the LISS dataset. A number of variables included in this dataset are of no use to me and I am looking to drop them. The variable names each end in a numeric value from 1 to 500, and I am looking to drop a number of these all at once. I was hoping to achieve this by telling Stata to drop variables ending in (for example) *100 to *200.

    I have been trying to do this with a loop and a varlist but have not had any luck. Please find a dataex example below. I have only included 4 variables, but in effect I would be looking to drop variables cw15h498 to cw15h500 without manually repeating drop *498 *499 *500 and so on. Thank you for your help.

    Kind Regards,

    Hugo

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double(cw15h497 cw15h498 cw15h499 cw15h500)
    4 3 4 4
    3 3 3 3
    end
    label values cw15h497 cw15h497
    label def cw15h497 3 "3", modify
    label def cw15h497 4 "4", modify
    label values cw15h498 cw15h498
    label def cw15h498 3 "3", modify
    label values cw15h499 cw15h499
    label def cw15h499 3 "3", modify
    label def cw15h499 4 "4", modify
    label values cw15h500 cw15h500
    label def cw15h500 3 "3", modify
    label def cw15h500 4 "4", modify



  • #2
    Hugo:
    Code:
    foreach var of varlist cw15h497-cw15h500 {
        drop cw15h*
    }
    works for me.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      This will drop variables ending in 400-999 (or in other words, a variable ending in a digit between 4 and 9 followed by 2 digits).

      Code:
      foreach var of varlist * {
          if ustrregexm("`var'", "[4-9][0-9]{2}$") {
              drop `var'
          }
      }


      • #4
        Dear Carlo and Andrew,

        Thank you for your reply and the two different ways of approaching this hurdle.

        Starting with your solution, Carlo: I have tried this syntax but have been getting the error message "variable cw15h* not found, r(111)". Do you know why this might be the case? After running this command, I find that the entire database has been dropped (bar the one variable not beginning with cw15h).

        Thank you Andrew as well. With this code I was able to drop all the variables over 500 that I was looking to drop, but is it possible to define more precisely where the range of endings starts and stops? So for example, rather than dropping 400-999, are you able to define it as 427-534?


        For a more precise example, the largest range of uninterrupted variables I am looking to drop runs from variable name cw15h190 to cw15h287.


        • #5
          So for example, rather than dropping 400-999, are you able to define it as 427-534?
          Code:
          foreach var of varlist *{
              if inrange(`=real(substr("`var'", -3, 3))', 427, 534){
                  drop `var'
              }
          }
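
To see why this condition picks out the right variables, it may help to evaluate its pieces by hand. The lines below are only an illustration: cw15h498 is a name from the example data, cw15h190 is taken from the range mentioned in #4, and cw15hab is a made-up name that does not end in digits.

```stata
* The last three characters of the variable name
display substr("cw15h498", -3, 3)

* Converted to a number, then tested against the bounds 427 and 534
display real(substr("cw15h498", -3, 3))
display inrange(real(substr("cw15h498", -3, 3)), 427, 534)

* A name ending in 190 falls outside the bounds, so the condition is 0
display inrange(real(substr("cw15h190", -3, 3)), 427, 534)

* A name that does not end in three digits gives missing from real(),
* and inrange() then returns 0, so such a variable is kept
display inrange(real(substr("cw15hab", -3, 3)), 427, 534)
```

Note that the test only looks at the last three characters, so it assumes every numeric suffix of interest has exactly three digits.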


          • #6
            Andrew, thanks so much. That does the job!


            • #7
              Hugo:
              I'm too late to the party.
              My bad: my previous code should have been:
              Code:
              foreach var of varlist cw15h497-cw15h500 {
                  drop `var'
              }
              Happy to read that Andrew's code fixed the issue.
              Kind regards,
              Carlo
              (Stata 19.0)
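
If the variables to be dropped sit next to each other in the dataset, a loop may not be needed at all, because drop accepts a hyphenated varlist directly. The sketch below reuses the example data from #1 and drops cw15h498 to cw15h500, the range asked about there. It assumes the variables are stored contiguously in that order, since a hyphenated varlist follows the order of variables in the dataset rather than the numbers in their names.

```stata
* Example data from #1 (value labels omitted)
clear
input double(cw15h497 cw15h498 cw15h499 cw15h500)
4 3 4 4
3 3 3 3
end

* Drop every variable stored from cw15h498 through cw15h500
drop cw15h498-cw15h500

* List what remains
ds
```

Running describe beforehand is a quick way to check that no unrelated variable is stored between the two end points.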


              • #8
                I would loop to collect the names of variables that qualify and then drop them all once the loop is done.
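
A minimal sketch of that idea, reusing the example data from #1 and the last-three-characters test from #5. The local macro names lo, hi and todrop are illustrative choices, and the bounds 498 and 500 would be changed to, say, 190 and 287 for the range mentioned in #4.

```stata
* Example data from #1 (value labels omitted)
clear
input double(cw15h497 cw15h498 cw15h499 cw15h500)
4 3 4 4
3 3 3 3
end

* Bounds on the three-digit suffix (illustrative values)
local lo 498
local hi 500

* First pass: only collect the names of qualifying variables
local todrop
foreach var of varlist * {
    if inrange(real(substr("`var'", -3, 3)), `lo', `hi') {
        local todrop `todrop' `var'
    }
}

* Second step: drop everything collected in a single call,
* but only if at least one name was collected
if "`todrop'" != "" {
    drop `todrop'
}
```

Collecting first means the list can be inspected (for example with display "`todrop'") before anything is removed, and the dataset is not modified while the loop is still running over its variables.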
