Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Drop multiple variables that end the same but start different

    Hello everyone,

    I have a dataset with 167 variables and 351 observations. About 50 variables are such that they record the same information for different time periods. For example, I have
    Rainfall_1995
    Rainfall_1998
    Rainfall_2002
    Rainfall_2006
    Aridity_1995
    Aridity_1998
    Aridity_2002
    Aridity_2004
    Aridity_2006

    and so on.

    From this, I essentially want to keep only the variables that end in 2006 and drop the rest leaving me with only 50 variables (one each for year 2006).

    I know I could just specify the range of variables under each category use -drop- but I'm not sure if this is the quickest way to do this. Wondering if there's a way to just keep variables that end in _2006.

    Thank you in advance for any suggestions!

    Best,
    Mansha
    Last edited by Mansha Mahajan; 29 Jun 2021, 12:03.

  • #2
    Well, the following code will work, on the assumption that for every such series of variables there is a 2006 variable, and that for every series that include a variable ending in _2006, you do want to eliminate all other years.

    Code:
    ds *_2006
    local 2006_vars `r(varlist)'
    local other_years: subinstr local 2006_vars "2006" "*", all
    ds `other_years'
    local other_years `r(varlist)'
    local other_years: list other_years - 2006_vars
    
    drop `other_years'
    That said, depending on the actual arrangement of the variables in your data set, there may be simpler ways of doing it. The above code is robust to any such peculiarities, but , correspondingly, forsakes all opportunities to be more efficient by exploiting particularities.

    Comment


    • #3
      Thanks so much Clyde! Your assumption is correct and this worked quite well.

      Comment


      • #4
        Here is an alternative approach using a regular expression which drops all variables ending with an underscore and four numbers (other than 2006):

        Code:
        ds
        keep `=ustrregexra("`r(varlist)'","(\S+_2006)|\S+_[0-9]{4}","$1")'

        Comment


        • #5
          Awesome! Thanks Ali

          Comment

          Working...
          X