  • Recoding string variable - leaving capital letters only

    Hi All,

    I have a string variable with a few words per observation. I would like to generate an acronym variable containing only the capital letters from the string variable, as in the example below:

    Observation   StringVariable             AcronymVariable
    OBS001        Some Phrase as a String    SPS

    Needless to say, as always, I will be grateful for any help.
    Kind regards,
    Konrad
    Version: Stata/IC 13.1

  • #2
    I can't think of a solution that does not require looping, but here's a way to do it using regular expressions:

    Code:
    clear
    input str50 s
    "Some Phrase as a String"
    "Some Other String"
    end
    
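    * Start from the full string, then strip runs of non-capitals one at a time
    gen abbr = s
    local more 1
    while `more' {
        * regexr() replaces only the first match in each value, hence the loop
        replace abbr = regexr(abbr,"[^A-Z]+","")
        * Keep going while any observation still contains a non-capital character
        count if regexm(abbr,"[^A-Z]+")
        local more = r(N)
    }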
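
    A loop-free variant may be possible on newer releases. The sketch below assumes Stata 14 or later (not the Stata/IC 13.1 reported in the first post), where the Unicode regular expression function ustrregexra() replaces every match in a single pass. The variable name abbr2 is hypothetical and chosen only for illustration.

    ```stata
    * Assumption: Stata 14+; s is the string variable from the example above
    * ustrregexra() replaces all matches, so no while loop is needed
    gen abbr2 = ustrregexra(s, "[^A-Z]", "")
    ```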



    • #3
      Konrad,

      There may be a more elegant and efficient way to do this, but this brute force method is the first thing that came to mind:

      Code:
      gen Acronym=""
      forvalues i=1/20 {
        replace Acronym=Acronym+substr(StringVariable,`i',1) if inrange(substr(StringVariable,`i',1),"A","Z")
      }
      Change the 20 to the length of the string. If you need this to be more general, there is a way to compute this automatically.

      Regards,
      Joe



      • #4
        Robert,

        Thank you very much for the prompt reply, it worked perfectly.


        Originally posted by Joe Canner:
        Change the 20 to the length of the string. If you need this to be more general, there is a way to compute this automatically.
        Thanks very much. The strings are of variable length. My initial thinking is that I could get the string length and pass it as a local macro instead of 20, but then I would end up writing a loop, as I don't know of an alternative solution.
        Last edited by Konrad Zdeb; 16 May 2014, 07:33.
        Kind regards,
        Konrad
        Version: Stata/IC 13.1



        • #5
          Variable string lengths won't bite. It's just the longest string value you need to measure (i.e. count characters).
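
          A minimal sketch of that measurement, feeding the result into the loop from #3. It assumes the variable is named StringVariable as in that post; the helper variable len and the local macro maxlen are names chosen here for illustration.

          ```stata
          * Measure the length of every value and keep the maximum
          gen len = length(StringVariable)
          summarize len, meanonly
          local maxlen = r(max)
          drop len

          * Same loop as in #3, with the hard-coded 20 replaced by the measured maximum
          gen Acronym = ""
          forvalues i = 1/`maxlen' {
              replace Acronym = Acronym + substr(StringVariable, `i', 1) if inrange(substr(StringVariable, `i', 1), "A", "Z")
          }
          ```

          For values shorter than the maximum, substr() returns an empty string past the end, which fails the inrange() test, so nothing is appended for those positions.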



          • #6
            Here's another technique that does not use regular expressions.

            Code:
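            * Characters to strip: lowercase letters (c(alpha)) and the digits
            local chars `c(alpha)' 0 1 2 3 4 5 6 7 8 9
            * Remove spaces first; any punctuation is left untouched
            gen acronym = subinstr(s," ", "", .)
            foreach c in `chars' {
                replace acronym = subinstr(acronym, "`c'", "", .)
            }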
