
  • foreach loops: wildcard asterisks (*), nested macros, string variables

    In a foreach loop, I am searching for values within string variables, using strmatch() with asterisk (*) wildcards. I need the asterisks because the words I'm searching for can appear anywhere in the string.

    These search terms are nested in local macros. However, using * in foreach does not work in Stata if it is part of a nested (descendant) macro. Is this because:
    • A) wildcards within strings can never be used in foreach in Stata with nested macros, or

      B) it isn't wildcarding as such, but the * (asterisk) character itself that produces the error in foreach?
    If B), is it possible to define a different character to mean 'wildcard' instead of *, so that I can still use nested macros to organize my concepts before running foreach?

    Note: I'm working with a large dataset, so using the strmatch() function without the foreach loop is not an option, unless there is an alternative to foreach.


    Here's an example for drug class Q (the parent/ancestor macro), with individual drug lists (descendant macros):
    Code:
    *chem term list
    local drug_list1 " "A*B" "B*A" "A" "
    local drug_list2 " "C*D" "D" "

    *search term list
    local drugclassQ " "drug_list1" "drug_list2" "

    *check macro data successfully stored
    di `drugclassQ'
    * (information successfully stored)

    Code:
    *Search all drug terms in descriptions

    foreach searchterm in "drugclassQ" {

        gen byte `searchterm' = 0
        di "Making column called `searchterm'"

        foreach chemterm in ``searchterm'' {
            di "Checking individual search terms in: `chemterm'"

            foreach indiv in ``chemterm'' {
                di "Searching all columns for *`indiv'*"

                foreach codeterm in lower(variable) {
                    di "`searchterm': Checking `codeterm' column for *`indiv'*"

                    replace `searchterm' = 1 if strmatch(`codeterm', "*`indiv'*")

                }
            }
        }
    }
    
    gen keep_term = .
    replace keep_term=1 if drugclassQ==1  
    keep if keep_term==1
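    A minimal self-contained sketch of the same search, assuming the string variable is called chemical (as in the example below) and using a hypothetical five-row data example built from those values. It stores the nested lists with compound double quotes (`" ... "'), Stata's standard way of keeping inner double quotes inside a macro, and walks them with foreach ... of local. In an in or of local list the asterisks are ordinary characters; only strmatch() treats them as wildcards. This is a sketch under those assumptions, not a tested fix for the real data.

```stata
* Hypothetical data example: a few values of the string variable chemical
clear
input str40 chemical
"Amg / Fmg /B"
"A/B"
"A/ B/R"
"Amg/dose / Emg/dose / Bmg/dose"
"A"
end

* Term lists stored with compound double quotes so the inner quotes survive
local drug_list1 `" "A*B" "B*A" "A" "'
local drug_list2 `" "C*D" "D" "'
local drugclassQ `" "drug_list1" "drug_list2" "'

* One indicator for the class; no loop is needed to create it
generate byte drugclassQ = 0

* Level 1: the names of the term lists held in local drugclassQ
foreach chemterm of local drugclassQ {
    * Level 2: the individual terms; * is just a character at this point
    foreach indiv of local `chemterm' {
        display `"Checking chemical for *`indiv'*"'
        replace drugclassQ = 1 if strmatch(chemical, "*`indiv'*")
    }
}

keep if drugclassQ == 1
```

    Because drug_list1 also contains the bare term "A", the pattern *A* flags the fifth value ("A") too, so every row here ends up with drugclassQ = 1. The expected output below is keyed on "A*B" alone.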
    Here's an example of what I would want the foreach loop to find, searching within the string variable 'chemical'.

    For example, searching on "A*B" within the parent macro drugclassQ should find drugs whose values of the string variable 'chemical' look like the following:

    Code:
    Amg / Fmg /B
    A/B
    A/ B/R
    Amg/dose / Emg/dose / Bmg/dose
    (Note: mg = milligrams. This illustrates my point about needing to store the variable as a string, since the drugs are entered into the database in different ways.)

    Example output identifying strings with A and B anywhere within values of 'chemical',
    where 0 means the observation doesn't fit the search, so I don't keep that observation:

        obs   chemical                          match
          1   Amg / Fmg /B                      1
          2   A/B                               1
          3   A/ B/R                            1
          4   Amg/dose / Emg/dose / Bmg/dose    1
          5   A                                 0
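    A sketch of the "A*B" case alone, assuming a hypothetical variable chemical holding the five values above and hypothetical indicator names match_AB and match_lower. In strmatch(), * stands for zero or more characters, so the pattern *A*B* flags the first four values and not the fifth. The second indicator illustrates that strmatch() is case-sensitive: applying lower() to the data while the pattern keeps uppercase letters finds nothing, which may matter for the lower(variable) step in the loop above if the real search terms contain capitals.

```stata
* Hypothetical data example built from the strings shown above
clear
input str40 chemical
"Amg / Fmg /B"
"A/B"
"A/ B/R"
"Amg/dose / Emg/dose / Bmg/dose"
"A"
end

* In strmatch(), * matches any run of characters, including none,
* so "*A*B*" means: an A somewhere, followed later by a B
generate byte match_AB = strmatch(chemical, "*A*B*")

* strmatch() is case-sensitive: lowercasing only the data finds nothing
generate byte match_lower = strmatch(lower(chemical), "*A*B*")

list chemical match_AB match_lower, noobs
```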
    My code works when I do not use asterisks, but that defeats the purpose of how I'm using foreach,
    i.e., using foreach with wildcards stored within nested macros.

    Any solutions?

    Thanks!
    Last edited by Exxxx Anonymous; 17 Feb 2023, 01:54.

  • #2
    Please note our longstanding request to use full real names. https://www.statalist.org/forums/help#realnames and #3 at https://www.statalist.org/forums/help#adviceextras explain at length.

    There is no data example here that I can interpret easily, a more crucial problem for many readers. https://www.statalist.org/forums/help#stata

    Beyond that, you've also posted this on Reddit (https://www.reddit.com/r/stata/comme...macros_string/) and Stack Overflow (https://stackoverflow.com/questions/...ables-resolvin). We have a request that people tell us about cross-posting (see https://www.statalist.org/forums/help#crossposting), and in any forum alerting people to cross-posting saves people who might answer the irritation of finding that they have posted something already said, and helps people who would be interested in an answer too.

    I have seen in other places comments to the effect that cross-posting in multiple forums is offensive to people in any forum -- you don't trust people to come up with an answer and you don't care about wasting anyone's time. I don't go that far -- I doubt that anyone intends offence -- but the attitude is common.

    I suppose some people reading this will regard these reminders as justification for anonymity, and they're welcome to post anywhere else on the internet!

    In terms of trying to answer your multiple questions:

    Any attempt to create a variable (you say column) with an asterisk * in the name will fail as only underscores, letters and numbers may appear within variable names.
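    A minimal sketch of that rule, using the hypothetical string A*B: confirm names rejects it, and strtoname() is one way to build a legal name from such a string (each disallowed character becomes an underscore). The local names rc and vname are illustrative only.

```stata
* Hypothetical check: is A*B a legal Stata name?
capture confirm names A*B
local rc = _rc
display "confirm names returned " `rc'

* strtoname() maps a string to a legal name, turning each
* disallowed character (here *) into an underscore
local vname = strtoname("A*B")
display "`vname'"
```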

    But before that, note that

    Code:
    foreach searchterm in "drugclassQ"  {
    gen `searchterm' = 0
    }
    creates one and only one variable, named drugclassQ, as the " " will be stripped. If you intended a reference there to the local macro drugclassQ, that requires different punctuation.

    It seems that that may be what you want, in which case

    Code:
    gen drugclassQ = 0
    gets you there directly, and the loop is a distraction to people trying to work out what you're doing. As you're asking for help with code, it's vital that your code is as clear and concise as possible.
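    A small sketch of the punctuation point, assuming drugclassQ holds the two list names as in the original post (stored here with compound double quotes). A quoted word in a foreach ... in list is taken literally, so the first loop runs once with the word drugclassQ; foreach ... of local drugclassQ (or, for this content, foreach ... in `drugclassQ') loops over what the macro contains. The counters n_in and n_of are hypothetical, added only to make the difference checkable.

```stata
local drugclassQ `" "drug_list1" "drug_list2" "'

* A quoted word in an in list is just text: one pass, x is drugclassQ
local n_in = 0
foreach x in "drugclassQ" {
    display "in list: `x'"
    local ++n_in
}

* of local loops over the contents of the macro: two passes
local n_of = 0
foreach x of local drugclassQ {
    display "of local: `x'"
    local ++n_of
}
```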


    At the end of the code

    Code:
    gen keep_term = .
    replace keep_term=1 if drugclassQ==1  
    keep if keep_term==1
    boils down to
    Code:
    keep if drugclassQ == 1
    and there are other places where your code seems roundabout and indirect, and so is hard to follow.

    I tried rewriting your code more fully to see what other code problems might exist, but found it too hard without knowing the underlying goal.

    As said, using a full real name is a request, not a rule, but posing a clearer question based on a data example is in everyone's interests, especially yours.
    Last edited by Nick Cox; 17 Feb 2023, 02:54.



    • #4
      #3 just echoes the links I supplied on your behalf. Broadly, Exxxx, your choices include trying to engage with #2 or hoping for a better answer from someone else.

      People using Reddit must speak for themselves, but it seems that no-one posting there has noticed the cross-posting.
      Last edited by Nick Cox; 17 Feb 2023, 07:17.
