  • strip: defining the character(s)

    Dear List
    I’d appreciate help using strip (from http://fmwww.bc.edu/RePEc/bocode/s, in Stata 13.1 64-bit) to specify more than a single letter, if that is possible. I think my problem is that I’m not yet fully competent at combining characters with quotes and apostrophes.

    I have a list of names followed by (x) (where x is one of the 26 capital letters). When I run the code below it removes all capitals but I only want to loop through and remove the full "(x)".

    Examples:
    Albany (A)
    Bathurst Region (B)
    Blue Mountains (R)

    I want to end up with:
    Albany
    Bathurst Region
    Blue Mountains

    (There are also places with, eg, ...(DC) etc. I'll deal with these if I get the basic code to work.)

    I have tried the code below but am stripping all capital letters from the data. I have tried various combinations of ` ' " but have not hit the right one by trial-and-error:

    Code:
    foreach char  in A B C D E F G H I J K L M N O P Q R S T U V W X Y Z {
     strip lgatemp, of("(`char')") gen(outtemp)
     drop lgatemp
     rename outtemp lgatemp
    }
    Any help appreciated.
    Laurence

  • #2
    You are misunderstanding -strip-. It removes each of the specified characters individually, wherever it occurs; it does not remove them only when they occur together as a literal substring. -strip- is the wrong command for you. A loop will do the job, but use -subinstr()- to replace the substrings with empty strings. Your use of local macros is fine. The capitals A...Z are accessible in c(ALPHA).
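
    A minimal sketch of the loop suggested above, not code from the thread: it assumes the variable is called lgatemp as in #1 and that each suffix is preceded by a single space, and it uses a made-up three-row dataset for illustration.

    ```stata
    * Toy data for illustration only (assumption: same layout as in #1)
    clear
    input str25 lgatemp
    "Albany (A)"
    "Bathurst Region (B)"
    "Blue Mountains (R)"
    end

    * c(ALPHA) holds the 26 capital letters, separated by spaces
    foreach char in `c(ALPHA)' {
        * remove the literal substring " (X)", leading space included
        replace lgatemp = subinstr(lgatemp, " (`char')", "", .)
    }

    list
    ```

    With . as the last argument, subinstr() removes every occurrence, so a " (A)" in the middle of a name would go too. Two-letter suffixes such as "(DC)" are left untouched by this loop.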



    • #3
      No need for user-written commands. Try:

      Code:
      clear
      set more off
      
      input ///
      str25 region
      "Albany (A)"
      "Bathurst Region (B)"
      "Blue Mountains (R)"
      end
      
      list
      
      gen region2 = reverse(substr(reverse(region), 5, .))
      
      list
      See -help string functions-.
      You should:

      1. Read the FAQ carefully.

      2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

      3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

      4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.



      • #4
        Thanks Nick & Roberto -- both work (with some adjustment for messy data).
        Laurence



        • #5
          Originally posted by Laurence Lester
          There are also places with, eg, ...(DC) etc. I'll deal with these if I get the basic code to work.
          Originally posted by Laurence Lester
          Thanks Nick & Roberto -- both work (with some adjustment for messy data).
          I'm not sure what else you mean by messy data (besides things like "(DC)"), but note that much (if not all) of this may be handled by an appropriate regular expression. For example,
          Code:
          gen outtemp = regexr(lgatemp,"[ ]*\([A-Z][A-Z]?\)[ ]*$","")
          will strip "(C)" or "(CC)" off the end of the string (where C is any capital letter), including any optional surrounding spaces. And, unlike using substr(), this approach will work for entries without the trailing "(C)" (it will just leave them alone).
          Last edited by Phil Schumm; 15 Sep 2014, 03:12.
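
          As a hedged illustration of the claim above, the following sketch runs the same regular expression on a made-up dataset. The rows "Canberra (DC)" and "Sydney" are hypothetical, added only to show a two-letter suffix and an entry with no suffix.

          ```stata
          * Toy data for illustration only; the last two rows are hypothetical
          clear
          input str30 lgatemp
          "Albany (A)"
          "Blue Mountains (R)"
          "Canberra (DC)"
          "Sydney"
          end

          * strip a trailing "(C)" or "(CC)" plus any surrounding spaces
          gen outtemp = regexr(lgatemp, "[ ]*\([A-Z][A-Z]?\)[ ]*$", "")

          list
          ```

          Under those assumptions the first three rows should lose their suffix, and "Sydney" should come through unchanged because the pattern does not match it.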



          • #6

            Thanks Phil for your contribution.



            • #7
              The following may be - or not be - a useful alternative:

              Code:
              clear
              input str30 place
              "Albany (A)"
              "Bathurst Region (B)"
              "Blue Mountains (R)"
              end
              
              generate pos = strpos(place," (")
              generate place1 = substr(place,1,pos)
              
              . list
                   +----------------------------------------------+
                   |               place   pos             place1 |
                   |----------------------------------------------|
                1. |          Albany (A)     7            Albany  |
                2. | Bathurst Region (B)    16   Bathurst Region  |
                3. |  Blue Mountains (R)    15    Blue Mountains  |
                   +----------------------------------------------+
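
              Two details of the approach above may matter with messier data: substr(place,1,pos) keeps the space found at position pos, and for an entry without " (" pos is 0, so place1 comes out empty. A possible variant, offered as a sketch rather than as part of the original post, guards against both; the row "Sydney" is hypothetical.

              ```stata
              * Toy data for illustration only; "Sydney" is a hypothetical row
              clear
              input str30 place
              "Albany (A)"
              "Bathurst Region (B)"
              "Sydney"
              end

              * position of the " (" that starts the suffix; 0 if it is absent
              generate pos = strpos(place, " (")

              * cut just before the space when " (" is found, else keep the name whole
              generate place1 = cond(pos > 0, substr(place, 1, pos - 1), place)

              list
              ```

              Like the original, this cuts at the first " (" in the string, so it assumes no place name contains an earlier opening parenthesis.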
