  • Searching observations for key words in key variables: using macros and looping with both single and compound words

    I have a list of keywords, including both single words and compound words or phrases. I want to search a couple of variables to determine which observations include any of my keywords. I will keep the observations that contain any of the keywords and drop those that contain none. I am building the basic code using Stata's auto dataset. I am using a macro to store the list of keywords, and then using word count and a while loop to sort out my observations. My problem is that I can't get the loop to complete over all the keywords, though I think it's set up properly. I suspect I am not using double quotes exactly right, but I don't know if that is my only problem. Here is my code:

    version 10
    sysuse auto
    gen dummy=0
    gen string="AMC"
    local kw Buick Ford "Audi Fox" Impala
    local N: word count `"`kw'"'
    local i=1
    while `i'<=`N' {
      local n: word `i' of `kw'
      gen X1=regexm(make,`"`n'"')
      gen X2=regexm(string,`"`n'"')
      replace dummy=1 if X1==1 | X2==1
      drop X1 X2
      local i =`i'+1
    }
    drop if dummy==0


  • #2
    If you post in the General forum and not the Sandbox forum, you have a much better chance of getting an answer. Per the on-screen descriptions below the names of the fora, this "sandbox" forum is for learning to use the forum, not for questions about Stata. Anyway, you're on the right track here as regards logic, but you've made some difficult syntax choices. Here's how I would do what you want:
    Code:
    version 10
    sysuse auto
    gen keeper = 0
    gen other_string = "AMC"
    * Flag an observation if any keyword occurs in either string variable
    foreach w in "Buick" "Ford" "Audi Fox" "Impala" {
      replace keeper = 1 if (strpos(make, "`w'") > 0) | (strpos(other_string, "`w'") > 0)
    }
    * Keep only the flagged observations
    keep if (keeper == 1)
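
    If you would rather keep the keywords in a local macro, as in the original post, here is a minimal sketch of that approach. It assumes the same auto dataset and variable names; the macro names N and N1 are only for illustration. It also shows what is most likely stopping the original -while- loop after the first keyword: wrapping the macro in compound double quotes makes -word count- treat the whole list as a single word.

    ```stata
    version 10
    sysuse auto, clear
    gen keeper = 0
    gen other_string = "AMC"

    * The inner double quotes bind "Audi Fox" into a single keyword
    local kw Buick Ford "Audi Fox" Impala

    * Pass the macro contents directly to -word count-: four words
    local N : word count `kw'
    display `N'

    * Enclosing the macro in compound double quotes gives one word only
    local N1 : word count `"`kw'"'
    display `N1'

    * -foreach ... of local- walks the list, treating "Audi Fox" as one element
    foreach w of local kw {
      replace keeper = 1 if strpos(make, `"`w'"') > 0 | strpos(other_string, `"`w'"') > 0
    }
    keep if keeper == 1
    ```

    The two -display- lines show 4 and 1. If that diagnosis is right, changing the original line to local N: word count `kw' should let the -while- loop reach every keyword.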

    Some comments:
    1. regexm() is rarely needed. strpos() is easier and faster.
    2. I'm a fan of -while- loops, but they're not very commonly used in Stata programs. -forvalues- and -foreach- are much more common.
    3. You may not need to drop the observations with keeper == 0, since Stata statistical procedures permit usages of -if- like this:
    Code:
    summarize weight if (keeper ==1)
    4. You'd benefit from reading -help string functions-, -help foreach-, and -help forvalues-. It looks like you've been reading the help on the *hard* stuff, but not on the easy stuff. <grin>
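
    To illustrate comment 1, here is a small sketch of how the two functions behave on literal text. The example strings are made up for illustration. strpos() returns the position of the first match, or 0 if there is none, which is why the code above tests for > 0. regexm() reads its second argument as a regular expression, so a keyword containing a character such as a period can match more than intended.

    ```stata
    * strpos(s1, s2): position of the first occurrence of s2 in s1, or 0 if absent
    display strpos("Audi Fox", "Fox")
    display strpos("Audi Fox", "Ford")

    * In a regular expression "." matches any character, so "Chev." matches "Chevy"
    display regexm("Chevy Impala", "Chev.")

    * strpos() looks for the literal period and finds none
    display strpos("Chevy Impala", "Chev.")
    ```

    The four lines display 6, 0, 1, and 0, respectively.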


    • #3
      Thank you so much for this, Mr. Lacy. I appreciate it. CS