  • Find string before different types of symbols

    Hi Statalist community,

    I have a sample dataset below. It comprises all the different ways that participants describe their race.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str123 Race
    "African American/Black"                                                                                                    
    "White                         "                                                                                            
    "I do not wish to answer."                                                                                                  
    "African American/Black,Hawaiian/Other Pacific Islander"                                                                    
    "Asian "                                                                                                                    
    "American Indian/Alaskan Native"                                                                                            
    "African American/Black,American Indian/Alaskan Native"                                                                      
    "African American/Black,American Indian/Alaskan Native,White                         "                                      
    "African American/Black,White                         "                                                                      
    "American Indian/Alaskan Native,White                         "                                                              
    "African American/Black,Asian "                                                                                              
    "White                         ,American Indian/Alaskan Native"                                                              
    "White                         ,Hawaiian/Other Pacific Islander"                                                            
    "African American/Black,I do not wish to answer."                                                                            
    "African American/Black,American Indian/Alaskan Native,Hawaiian/Other Pacific Islander"                                      
    "Asian ,White                         "                                                                                      
    "American Indian/Alaskan Native,African American/Black,Asian ,White                         "                                
    "American Indian/Alaskan Native,Asian "                                                                                      
    "African American/Black,Asian ,Hawaiian/Other Pacific Islander"                                                              
    "White                         ,African American/Black"                                                                      
    "Hawaiian/Other Pacific Islander"                                                                                            
    "Hawaiian/Other Pacific Islander,White                         "                                                            
    "African American/Black,American Indian/Alaskan Native,Asian ,Hawaiian/Other Pacific Islander,White                         "
    "African American/Black,American Indian/Alaskan Native,Asian "                                                              
    "African American/Black,American Indian/Alaskan Native,Asian ,White                         "                                
    "Asian ,Hawaiian/Other Pacific Islander"                                                                                    
    "American Indian/Alaskan Native,Asian ,White                         "                                                      
    "African American/Black,American Indian/Alaskan Native,Hawaiian/Other Pacific Islander,White                         "      
    "American Indian/Alaskan Native,Hawaiian/Other Pacific Islander,White                         "                              
    "White                         ,I do not wish to answer."                                                                    
    "Hawaiian/Other Pacific Islander,African American/Black"                                                                    
    "African American/Black,Asian ,White                         "                                                              
    "African American/Black,Asian ,Hawaiian/Other Pacific Islander,White                         "                              
    "American Indian/Alaskan Native,Hawaiian/Other Pacific Islander"                                                            
    "White                         ,African American/Black,American Indian/Alaskan Native,Hawaiian/Other Pacific Islander"      
    end
    You can see that some individuals list multiple races. I want the first race in each individual's record. I followed the thread linked below and tried the code suggested there.

    https://www.statalist.org/forums/for...rt-of-a-string

    In my situation, the strings contain several kinds of symbols, such as "/", "," and ".", as well as trailing blanks. So I wrote the following code, but the syntax is wrong. Is there something that I'm missing? Thanks for your help.

    Code:
    gen wanted = substr(Race, 1, strpos(Race, "/" | "," |"." | " ") - 1)

  • #2
    Consider this:

    Code:
    gen wanted = ustrtrim(ustrregexs(1)) if ustrregexm(Race,"^([a-zA-Z\s]+)")
    which produces:

    Code:
      +----------------------------------------------------------------------+
      | Race                                                          wanted |
      |----------------------------------------------------------------------|
      | African American/Black                              African American |
      | White                                                          White |
      | I do not wish to answer.                     I do not wish to answer |
      | African American/Black,Hawaiian/Other Pa..          African American |
      | Asian                                                          Asian |
      | American Indian/Alaskan Native                       American Indian |
      | African American/Black,American Indian/A..          African American |
      | African American/Black,American Indian/A..          African American |
      | African American/Black,White            ..          African American |
      | American Indian/Alaskan Native,White    ..           American Indian |
      | African American/Black,Asian                        African American |
      | White                         ,American ..                     White |
      | White                         ,Hawaiian/..                     White |
      | African American/Black,I do not wish to ..          African American |
      | African American/Black,American Indian/A..          African American |
      | Asian ,White                                                   Asian |
      | American Indian/Alaskan Native,African A..           American Indian |
      | American Indian/Alaskan Native,Asian                 American Indian |
      | African American/Black,Asian ,Hawaiian/O..          African American |
      | White                         ,African A..                     White |
      | Hawaiian/Other Pacific Islander                             Hawaiian |
      | Hawaiian/Other Pacific Islander,White   ..                  Hawaiian |
      | African American/Black,American Indian/A..          African American |
      | African American/Black,American Indian/A..          African American |
      | African American/Black,American Indian/A..          African American |
      | Asian ,Hawaiian/Other Pacific Islander                         Asian |
      | American Indian/Alaskan Native,Asian ,Wh..           American Indian |
      | African American/Black,American Indian/A..          African American |
      | American Indian/Alaskan Native,Hawaiian/..           American Indian |
      | White                         ,I do not ..                     White |
      | Hawaiian/Other Pacific Islander,African ..                  Hawaiian |
      | African American/Black,Asian ,White     ..          African American |
      | African American/Black,Asian ,Hawaiian/O..          African American |
      | American Indian/Alaskan Native,Hawaiian/..           American Indian |
      | White                         ,African A..                     White |
      +----------------------------------------------------------------------+
    My regular expression looks for a permitted set of characters at the start of the string: letters (lowercase and uppercase) and whitespace. It stops at the first character outside that set, such as "/", "," or ".". I think this should work well for you, if all the strings are in English.
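
    As a rough sketch of the same idea seen from the other side, one could instead name the delimiters to stop at, which is closer to what the strpos() attempt in #1 was aiming for. The variable name wanted2 is made up for illustration, and this assumes that "/", "," and "." are the only separators in the data.

    Code:
    * Hypothetical alternative: capture everything before the first "/", "," or "."
    * [^/,.] is a negated character class: any character except those three
    gen wanted2 = ustrtrim(ustrregexs(1)) if ustrregexm(Race, "^([^/,.]+)")
    * Count observations where the two approaches disagree
    count if wanted2 != wanted

    On the example data the two patterns should give the same result. They would differ for strings containing digits or other punctuation, which the letters-and-whitespace pattern stops at and this one does not.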



    • #3
      @Hemanshu Kumar

      Thank you so much. This is exactly what I want. Is it possible for you to explain what the following code does? I can't decipher it, but it does exactly what I want. Thanks again.
      Code:
       
       ^([a-zA-Z\s]+)



      • #4
        It might be helpful for you to go through some discussions of regular expressions. A couple of good places to start might be Asjad Naqvi's post here or Rose Ann Madeiros' slides here.

        To break down the exact pattern I used:

        ^ says to look for the pattern at the start of the string
        ( ) mark the part of the match that I want to reference later; this is what ustrregexs(1) picks up
        [ ] say to match any one character listed inside the square brackets
        a-z matches any lowercase letter
        A-Z matches any uppercase letter
        \s matches a whitespace character
        + matches one or more instances of whatever precedes it, in this case anything within the [ ], that is, any run of lowercase letters, uppercase letters, and whitespace (but nothing else!)
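
        To see these pieces at work, here is a small sketch that tries the pattern on two literal strings with display. The test strings are chosen only for illustration.

        Code:
        * A string that starts with letters: the match succeeds, so this displays 1
        display ustrregexm("African American/Black", "^([a-zA-Z\s]+)")
        * The text captured by ( ), that is, everything before the "/"
        display ustrregexs(1)
        * A string that starts with "/": nothing permitted at the start, so this displays 0
        display ustrregexm("/Black", "^([a-zA-Z\s]+)")

        The second command should show African American, because the + keeps consuming letters and spaces until it reaches the "/", which is not in the bracketed set.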

        I also like to use the website https://regex101.com/ to test my regular expressions before using them.



        • #5
          @Hemanshu Kumar

          Thank you so much for the resources. This explains so much. I appreciate your help. Thanks again.
