  • Selecting various last words from a string

    Dear Statalist members,
    I'm using Stata 14. I have a string variable that contains addresses in the form "Neighborhood Municipality".
    I need to extract the municipality from the string.
    The problem is that both the municipality name and the neighborhood name may consist of more than one word, and no character separates them, so this may be a little complicated.
    Data looks something like this:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str56 origin
    "neighborhoodA municipalityA"                            
    "neighborhoodBA neighborhoodBB municipalityA"            
    "neighborhoodA municipalityBA municipalityBB"                            
    "neighborhoodBA neighborhoodBB municipalityBA municipalityBB"
    end


    I have the list of municipalities so I'm using it to identify the municipality name in each string.
    So far I've started identifying municipalities with a one-word name. Now I want to move on to municipalities with two-word names and so on.
    So basically, I think I need to check the last word of the string, then the last two words, and so on, and see whether they match an entry in my list of municipalities.
    I've tried the regex functions, but I still have problems using them. Any ideas?
    Thanks in advance!

  • #2
    So let's say that you have your list of municipality names contained in a local macro, which I'll call municipalities.
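
    As a minimal sketch of what such a macro might look like (the names are just the placeholders from the example data, not a real list): each name is wrapped in its own double quotes, and the whole list in compound double quotes, so that a two-word name is treated as a single element by -foreach-.

    ```stata
    * Hypothetical list; each name is quoted so multi-word names stay together
    local municipalities `""municipalityA" "municipalityBA municipalityBB""'

    * Count the elements; a quoted multi-word name counts as one word
    local n : word count `municipalities'
    display "Number of municipalities: `n'"

    * Show each element as -foreach- will see it
    foreach m of local municipalities {
        display `"`m'"'
    }
    ```

    If the list lives in a dataset rather than being typed by hand, it would have to be assembled into the macro in some other way; that step is not shown here.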

    Code:
    gen municipality = ""
    foreach m of local municipalities {
        replace municipality = `"`m'"' if strpos(origin, `"`m'"')
    }
    This code will work if the words in the municipality names cannot appear as part of the neighborhood name. If that restriction doesn't hold, then you need to match on position at the end of the string, which is just a tad more complicated:

    Code:
    gen municipality = ""
    foreach m of local municipalities {
        replace municipality = `"`m'"' if strpos(reverse(origin), reverse(`"`m'"')) == 1
    }
    Note: None of this code is tested, so beware of typos, unbalanced braces, etc.
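
    Two things may be worth guarding against with the end-of-string check. First, a name can match in the middle of a word (a string ending in "Newtown" also ends in "town"). Second, if one municipality name is the tail of another, whichever comes last in the loop wins. The following is a sketch of one way to handle both, not the only way: it requires a space before the matched name and keeps the longest match. The macro contents and the variables clean and neighborhood are assumptions made up for the illustration.

    ```stata
    clear
    input str56 origin
    "neighborhoodA municipalityA"
    "neighborhoodBA neighborhoodBB municipalityA"
    "neighborhoodA municipalityBA municipalityBB"
    "neighborhoodBA neighborhoodBB municipalityBA municipalityBB"
    end

    * Hypothetical list of known municipality names
    local municipalities `""municipalityA" "municipalityBA municipalityBB""'

    * Remove leading and trailing blanks before matching
    gen clean = strtrim(origin)
    gen municipality = ""

    foreach m of local municipalities {
        local len = strlen(`"`m'"')
        * Match only if the string ends with a space followed by the name,
        * and only overwrite an earlier match with a longer one
        replace municipality = `"`m'"' ///
            if substr(clean, -(`len' + 1), .) == " " + `"`m'"' ///
            & `len' > strlen(municipality)
    }

    * Whatever precedes the matched municipality is the neighborhood
    gen neighborhood = strtrim(substr(clean, 1, strlen(clean) - strlen(municipality))) ///
        if municipality != ""

    list origin neighborhood municipality
    ```

    Requiring the leading space means a string that consists of the municipality name alone is left unmatched, which seems reasonable if every address has a neighborhood part. Observations where municipality is still empty afterwards are the ones to inspect by hand.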
