
  • Extract last part of a string

    Hello,

    I have a string variable named affil1, which is the full affiliation of an author appearing on a paper:
    Code:
                 storage   display    value
    variable name   type    format     label      variable label
    ----------------------------------------------------------------------------------------------------------
    affil1          str255  %255s
    the contents in this variable always take the following structure:

    Department, University Name, ZIP Code, Region, Country

    I would like to extract the portion containing the "country" and create a new variable (Country) with this information.
    So, basically I need to extract the portion after the last comma (or the portion after the first comma, if we count from the right).
    I've made some attempts by combining substr and strpos, but results are not satisfactory so far.

    Code:
    gen country = substr(affil1, strpos(affil1, ","), .)
    Also, note that some countries contain more than one word (e.g. united states).
    Any help would be highly appreciated.

    Thanks a lot.

  • #2
    This is not completely clear to me; however, if the country name never contains a comma, then I think you are better off using strrpos() to get the position of the last comma and taking everything after that as the country. Help for strrpos() can be found in the help for string functions (sorry, but the forum software does not like this function name and keeps changing it on me; I hope it comes through).
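A minimal sketch of this suggestion, assuming Stata 14 or later (where strrpos() is available) and assuming that country names never contain a comma. The sample observation and the helper variable lastcomma are made up for illustration:

```stata
* Sketch only: strrpos() requires Stata 14 or later.
clear
input str100 affil1
"Department, University Name, ZIP Code, Region, United States"
end

* Position of the last comma (0 if the string has no comma)
gen lastcomma = strrpos(affil1, ",")

* Everything after the last comma, with leading/trailing blanks trimmed
gen country = trim(substr(affil1, lastcomma + 1, .)) if lastcomma > 0

list country, clean noobs
```

Passing . as the third argument of substr() takes the rest of the string, so multi-word countries are kept whole.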



    • #3
      Thanks for the response!

      I agree that strpos will be useful to find the position of the last comma. But how do I do it?
      If I use strpos in this way, I get the position of the first comma:
      Code:
      strpos(affil1, ",")
      but, how can I find the position of the last comma?



      • #4
        Richard has already answered that. There is a function strrpos() (distinct from strpos()) for that purpose.

        Before it existed, the usual trick was to reverse the string first.



        • #5
          Thanks for the answer. You're right, I didn't spot the spelling difference between the two functions.
          However, it seems that the function strrpos() is not available in Stata 13.
          I have tried to install it with "findit" and "ssc install" without success. Any solution?



          • #6
            Code:
            . local txt "here, yes, right here, is some text, really"
            
            . di strlen(`"`txt'"')-strpos(strreverse(`"`txt'"'),",")+1
            36
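Applied to the original question, the same position can be passed to substr(). This is a minimal sketch that should run in Stata 13, assuming a variable named affil1 as in #1 and no commas inside country names. The sample data and the helper variables revpos and lastcomma are hypothetical:

```stata
clear
input str100 affil1
"Department, University Name, ZIP Code, Region, United States"
"No comma here"
end

* Position of the first comma in the reversed string (0 if no comma)
gen revpos = strpos(strreverse(affil1), ",")

* Convert it to the position of the last comma in the original string
gen lastcomma = strlen(affil1) - revpos + 1 if revpos > 0

* Take everything after the last comma and trim surrounding blanks
gen country = trim(substr(affil1, lastcomma + 1, .)) if revpos > 0

list affil1 country, clean noobs
```

Observations without any comma are left with an empty country rather than a copy of the whole string.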



            • #7
              strrpos() is part of the built-in official code in Stata 14 and cannot be installed from anywhere. Sergiy has already given you one solution: as I mentioned, reversing the string first was the previous trick.

              FAQ Advice Section 11

              State the version of Stata used

              The current version of Stata is 14.0. Please specify if you are using an earlier version; otherwise, the answer to your question is likely to refer to commands or features unavailable to you. Moreover, as bug fixes and new features are issued frequently by StataCorp, make sure that you update your Stata before posting a query, as your problem may already have been solved.



              • #8
                For anyone awaiting a solution involving regular expressions ...
                Code:
                clear
                input str100 addr
                "Department, University Name, ZIP Code, Region, Country"
                end
                gen str50 cty = regexs(1) if regexm(addr,", *([^,]*)$")
                list, clean noobs
                Code:
                                                                      addr       cty  
                    Department, University Name, ZIP Code, Region, Country   Country
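The pattern matches a comma, any blanks after it, and then captures the run of non-comma characters up to the end of the string, so multi-word countries should come through intact. A small sketch to check this, with made-up sample rows and the same variable names as above:

```stata
clear
input str100 addr
"Department, University Name, ZIP Code, Region, United States"
"No comma here"
end

* ", *"      a comma followed by optional blanks
* "([^,]*)$" capture everything that is not a comma, up to the end
gen str50 cty = regexs(1) if regexm(addr, ", *([^,]*)$")

list, clean noobs
```

Rows with no comma do not match, so cty stays empty for them.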



                • #9
                  Hi,

                  Thanks a lot for all responses. Solved.
