  • manipulate list of words ( --> strings) - keep subset of strings with similar pattern

    I have a local containing a list of strings and want to keep the subset of strings that share the same pattern. I can't work out how to do it in a simple way with the built-in string and macro list functions.

    Problem
    I have the following local (list).

    local my_large_string_list "a1 a_2 a_x zx zc zv a"

    I want to create the list
    local my_small_a_list "a1 a_2 a_x a"

    Working with variables, I would write "keep a*". How do I do something similar to the * (star) wildcard here? Alternatively, how do I create a 'not' list that I can subtract from the full list?
    (Note: my list is too large to do the above manually. This is a simplification of my actual task.)

    Additional explanation
    I tried to do this in the framework of
    local my_small_a_list : { command } local my_large_string_list { command }.
    For example to remove zx I would use: local no_zx_list : subinstr local my_large_string_list "zx" "", all
    Essentially, I want to do the opposite: not remove, but keep. And I want a shortcut for keeping everything that contains a particular substring.

    Can someone help?

    Thanks,
    Boris



  • #2
    Code:
    local my_large_string_list "a1 a_2 a_x zx zc zv a"
    local remove= trim(itrim(ustrregexra(" "+ "`my_large_string_list'" + " ", "[^ ]*a[^ ]*", " ")))
    local wanted: list my_large_string_list - remove
    di "`remove'"
    di "`wanted'"
    Res.:

    Code:
     
    . di "`remove'"
    zx zc zv
    
    . 
    . di "`wanted'"
    a1 a_2 a_x a
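
A note on the pattern, as a hedged sketch rather than part of the original answer: in "[^ ]*a[^ ]*", `[^ ]` matches any single character that is not a space and `*` means "zero or more", so the whole expression matches any word that contains an "a" anywhere. If only words that start with "a" should be kept (closer to `keep a*`), one possible variant is to match a space followed by "a" instead. The pattern " a[^ ]*" below is an assumption of this sketch, not something taken from the thread.

```stata
local my_large_string_list "a1 a_2 a_x zx zc zv a"

* Pad the list with spaces so every word, including the first, is preceded by a space.
* " a[^ ]*" matches a space, then "a", then the rest of that word.
* Replacing each match with a single space leaves only the words NOT starting with "a".
local remove = trim(itrim(ustrregexra(" " + "`my_large_string_list'" + " ", " a[^ ]*", " ")))

* Subtract the unwanted words from the full list, as in the code above.
local wanted : list my_large_string_list - remove

di "`remove'"
di "`wanted'"
```

With the example list both versions keep the same words; they would differ only for a word such as "za", which contains "a" but does not start with it.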



    • #3
      Thanks, Andrew!

      Can you point me to the documentation or page where I can read up on the syntax used in the expression "[^ ]*a[^ ]*"?

      I can't find it.



      • #4
        Stata's -ustrregex*()- functions use the ICU regular expression engine (see Hua Peng's post #16 at https://www.statalist.org/forums/for...ressions/page2). The official documentation is at https://unicode-org.github.io/icu/us...gs/regexp.html. If you find the proposed solution unintuitive, note that it is possible to solve the problem without using regular expressions.

        Code:
        local wanted
        local my_large_string_list "a1 a_2 a_x zx zc zv a"
        foreach w of local my_large_string_list{
            if strpos("`w'", "a")>0{
                local wanted "`wanted' `w'"
            }
        }
        display "`wanted'"
        Res.:

        Code:
        . display "`wanted'"
         a1 a_2 a_x a
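
Since the original question asked for something like the * wildcard in `keep a*`, the same loop can be sketched with Stata's `strmatch()` function, which tests a string against a pattern containing `*` and `?`. This is a minimal illustration under the assumption that "starts with a" is the intended rule; `strpos("`w'", "a")>0` above instead keeps any word containing "a" anywhere.

```stata
local my_large_string_list "a1 a_2 a_x zx zc zv a"
local wanted

foreach w of local my_large_string_list {
    * strmatch() returns 1 when the word matches the wildcard pattern "a*"
    if strmatch("`w'", "a*") {
        local wanted `wanted' `w'
    }
}

display "`wanted'"
```

Changing the pattern to "*a*" should reproduce the contains-anywhere behaviour of the `strpos()` version.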

