Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generate a new variable by deleting everything after certain character ("/")

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str28 country
    "UK"                         
    "France"                     
    "France / Singapore / UAE"   
    "Switzerland"                
    "Spain / US"                 
    "Italy"                      
    "Switzerland"                
    "France"                     
    "Netherlands"                
    "UK"                         
    "FR / GB / DE / ES / IT / PL"
    end
    Code:
        +-----------------------------+
         |                     country |
         |-----------------------------|
      1. |                          UK |
      2. |                      France |
      3. |    France / Singapore / UAE |
      4. |                 Switzerland |
      5. |                  Spain / US |
         |-----------------------------|
      6. |                       Italy |
      7. |                 Switzerland |
      8. |                      France |
      9. |                 Netherlands |
     10. |                          UK |
         |-----------------------------|
     11. | FR / GB / DE / ES / IT / PL |
         +-----------------------------+
    I want to generate a new variable, say, home_country by keeping only the first country in the country variable i.e. starting from "/", delete everything. For example, for observation 1, 2, and 3 home_country will be UK, France, and France respectively.

  • #2
    Code:
    gen home_country = country
    replace home_country = subinstr(home_country, "/", " / ", .)
    replace home_country = word(home_country, 1)
    Note: the middle line with the -subinstr()- function is there only to guarantee that every / is surrounded by blanks, which is necessary for the final line to work properly. If you are 100% certain that all your /'s are already surrounded by blanks, you can omit that line--but frankly, I'm never that trusting of my data and it really doesn't hurt to leave it in.

    Comment


    • #3
      Thanks a lot Clyde Schechter! The code works smoothly. Moreover, I find the "Note" very helpful. Thank you.

      Comment


      • #4
        Clyde Schechter's solution wouldn't work if the country name contained spaces like "Hong Kong" Here the local macro is used only to make the code a little more readable.


        Code:
        clear
        input str28 country
        "UK"                        
        "France"                    
        "France / Singapore / UAE"  
        "Switzerland"                
        "Spain / US"                
        "Italy"                      
        "Switzerland"                
        "France"                    
        "Netherlands"                
        "UK"                        
        "FR / GB / DE / ES / IT / PL"
        "Hong Kong"
        end
        
        * essential not to use = sign in defining the macro
        local slash strpos(country, "/")
        gen wanted = trim(cond(`slash', substr(country, 1, `slash' -1), country))
        
        list
        
             +-------------------------------------------+
             |                     country        wanted |
             |-------------------------------------------|
          1. |                          UK            UK |
          2. |                      France        France |
          3. |    France / Singapore / UAE        France |
          4. |                 Switzerland   Switzerland |
          5. |                  Spain / US         Spain |
             |-------------------------------------------|
          6. |                       Italy         Italy |
          7. |                 Switzerland   Switzerland |
          8. |                      France        France |
          9. |                 Netherlands   Netherlands |
         10. |                          UK            UK |
             |-------------------------------------------|
         11. | FR / GB / DE / ES / IT / PL            FR |
         12. |                   Hong Kong     Hong Kong |
             +-------------------------------------------+

        Comment


        • #5
          Another way to go.
          Code:
          split country, p(/) l(1)

          Comment


          • #6
            Many thanks Nick Cox and Romalpa Akzo !

            Comment


            • #7
              In addition to "/", if variable also contains ",", how does the code change?
              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str29 country
              "UK"                           
              "France"                       
              "France / Singapore / UAE"     
              "Switzerland"                  
              "Spain / US"                   
              "Italy"                        
              "Switzerland"                  
              "France"                       
              "Netherlands"                  
              "UK"                           
              "FR / GB / DE / ES / IT / PL"  
              "France, U.K., Italy , Germany"
              end

              Comment


              • #8
                Use Romalpa Akzo's solution, and change -p(/)- to -p(/ ,)-. N.B.: the space between / and , is essential!

                Comment


                • #9
                  Code:
                  * I added a couple more obs separated by commas
                  dataex country // Example shared via  -dataex-.
                  clear
                  input str29 country
                  "UK"                          
                  "France"                      
                  "France / Singapore / UAE"    
                  "Switzerland"                  
                  "Spain / US"                  
                  "Italy"                        
                  "Switzerland"                  
                  "France"                      
                  "Netherlands"                  
                  "UK"                          
                  "FR / GB / DE / ES / IT / PL"  
                  "France, U.K., Italy , Germany"
                  "France, UK"                  
                  "Italy, US"                    
                  "FR, GB"                      
                  end
                  Code:
                  split country, p(/ ,) limit(1)
                  split country, p("/" ",") limit(1) gen(new)  // just showing that can include "" in p()
                  
                  . list
                  
                       +-----------------------------------------------------------+
                       |                       country      country1          new1 |
                       |-----------------------------------------------------------|
                    1. |                            UK            UK            UK |
                    2. |                        France        France        France |
                    3. |      France / Singapore / UAE       France        France  |
                    4. |                   Switzerland   Switzerland   Switzerland |
                    5. |                    Spain / US        Spain         Spain  |
                       |-----------------------------------------------------------|
                    6. |                         Italy         Italy         Italy |
                    7. |                   Switzerland   Switzerland   Switzerland |
                    8. |                        France        France        France |
                    9. |                   Netherlands   Netherlands   Netherlands |
                   10. |                            UK            UK            UK |
                       |-----------------------------------------------------------|
                   11. |   FR / GB / DE / ES / IT / PL           FR            FR  |
                   12. | France, U.K., Italy , Germany        France        France |
                   13. |                    France, UK        France        France |
                   14. |                     Italy, US         Italy         Italy |
                   15. |                        FR, GB            FR            FR |
                       +-----------------------------------------------------------+
                  
                  *NOTE: You will probably want to trim() the results because split() keeps the trailing space before the "/"
                  * i.e. "FR / GB" becomes "FR "
                  replace country1 = itrim(trim(country1))

                  Comment


                  • #10
                    Here's a solution using regular expressions:
                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input str28 country str14(country1 country2)
                    "UK"                           "UK"            "UK"           
                    "France"                       "France"        "France"       
                    "France / Singapore / UAE"     "France"        "France"       
                    "Switzerland"                  "Switzerland"   "Switzerland"  
                    "Spain / US"                   "Spain"         "Spain"        
                    "Italy"                        "Italy"         "Italy"        
                    "Switzerland"                  "Switzerland"   "Switzerland"  
                    "France"                       "France"        "France"       
                    "Netherlands"                  "Netherlands"   "Netherlands"  
                    "UK"                           "UK"            "UK"           
                    "FR / GB / DE / ES / IT / PL"  "FR"            "FR"           
                    "FR / GB / DE / ES / IT / CA," "FR"            "FR"           
                    "Italy/Canada"                 "Italy"         "Italy"        
                    "France, U.K., Italy , German" "France"        "France"       
                    "Great Britain / New Zealand"  "Great Britain" "Great Britain"
                    "Great Britain, France"        "Great Britain" "Great Britain"
                    end
                    
                    split country, p(/ ,) l(1)
                    replace country1 = ustrtrim(country1)
                    
                    gen country2 = ustrtrim(ustrregexs(1)) if ustrregexm(country, "^([^/,]*)", 1)
                    N.B.: both methods require a -trim- of (at least) trailing whitespace.

                    Comment


                    • #11
                      Nesting cond() is an alternative:
                      Code:
                      #delim ;
                      
                      gen country_new = trim(
                      
                          cond(
                              
                              strpos(country, "/"),  
                          
                              substr(country, 1, -1 + strpos(country, "/")),        
                                  
                              cond(
                              
                                  strpos(country, ","),  
                                  
                                  substr(country, 1, -1 + strpos(country, ",")),
                                      
                                  country    
                              )
                          )
                      )
                      ;
                      #delim cr
                      Last edited by Bjarte Aagnes; 19 Jan 2019, 05:43.

                      Comment


                      • #12
                        Thank you all for the codes! I Your enthusiasm and dedication keep me motivated

                        Comment

                        Working...
                        X