  • Split string variable

    Hi everyone,

    I have a string variable and would like to extract the last word of each value. Say the variable contains the following:
    name
    adam smith
    julia jig
    beffy mark jabcos
    william adam beg tiffy


    I want the last part of each value, that is "smith", "jig", "jabcos" and "tiffy", so the desired result is:
    name                      newname
    adam smith                smith
    julia jig                 jig
    beffy mark jabcos         jabcos
    william adam beg tiffy    tiffy

    I have tried the command below, but it does not work:
    -- egen newvar=ends(name) trim[last]

    Could anyone help me solve this issue? Sorry if this question is a basic one. I have searched on Google but have not found a solution yet.

    Thank you a lot
    Kind regards
    Linh

    (edited: added more examples)
    Last edited by Linh mt; 12 Jun 2019, 03:33.

  • #2
    Linh:
    I do hope that the following toy-example will be useful:
    Code:
    . set obs 1
    number of observations (_N) was 0, now 1
    
    . g name="Stan Smith"
    
    . split name
    variables created as string:
    name1  name2
    
    . list
    
         +----------------------------+
         |       name   name1   name2 |
         |----------------------------|
      1. | Stan Smith    Stan   Smith |
         +----------------------------+
    PS: Despite being a(n) (health) economist, actually, my education owes more to Stan Smith (https://en.wikipedia.org/wiki/Stan_Smith) than to Adam Smith (https://en.wikipedia.org/wiki/Adam_Smith)!
    Last edited by Carlo Lazzaro; 12 Jun 2019, 03:23.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      With the example given,

      Code:
      gen wanted = word(name, 2)
      would also work.



      • #4
        Hi Carlo,

        Thank you for your reply. Your advice is correct. However, I would like the last part of every name in the dataset to go into one new variable. For more detail, please see the example dataset I have added. I am sorry for changing the example a bit; otherwise it might have caused misunderstanding.

        Best regards
        Linh



        • #5
          Dear Nick,

          Thank you for your reply. My apologies for posting an example that does not cover other cases. I have just edited the example so that name contains 3, 4 or more words. In this case, what command should I use so that the last part of every name goes into one new variable? Could you please review my updated example in the first post?

          Thank you very much
          Kind regards
          Linh



          • #6
            Code:
            help word()
            tells you about the function I used in #3.


            word(s,n)
            Description: the nth word in s; missing ("") if n is missing

            Positive numbers count words from the beginning of s, and negative numbers count words
            from the end of s. (1 is the first word in s, and -1 is the last word in s.) A word is a
            set of characters that start and terminate with spaces. This is different from a Unicode
            word, which is a language unit based on either a set of word-boundary rules or
            dictionaries for several languages (Chinese, Japanese, and Thai).
            Domain s: strings
            Domain n: integers
            Hence

            Code:
            gen wanted = word(name, -1)
            is a more general suggestion.
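
            As a minimal sketch of the difference between the two calls, using the four names from the first post (the variable names second and wanted are only illustrative):

            ```stata
            clear
            input str40 name
            "adam smith"
            "julia jig"
            "beffy mark jabcos"
            "william adam beg tiffy"
            end

            * word(name, 2) returns the second word, which is the last word only for two-word names
            generate second = word(name, 2)

            * word(name, -1) counts from the end, so it returns the last word whatever the word count
            generate wanted = word(name, -1)

            list, noobs
            ```

            For the three- and four-word names, second holds a middle word while wanted holds the last one.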



            • #7
              Dear Nick,

              It works now. My issue has been solved.
              However, I had earlier tried another syntax, "egen name1=ends(ten), last punct(" ")". Some observations are correct but some are blank, for example:

              name                            newname
              adam smith                      smith
              julia jig                       jig
              beffy mark jabcos
              william adam beg tiffy tiffy    tiffy

              I do not know why this happens: the results are correct for most observations but not for some. Could you help me find where the problem is, please?

              P.S. I do not know how to post an example in Stata format, so I typed it manually. If you do not mind, could you instruct me on this? I really appreciate your time and enthusiasm.

              Thank you so much
              Regards
              Linh
              Last edited by Linh mt; 12 Jun 2019, 04:47.



              • #8
                If I understood right, you wish something like:


                Code:
                . egen lastpart = ends(name), last
                
                . list
                
                     +-----------------------------------+
                     |                   name   lastpart |
                     |-----------------------------------|
                  1. |             adam smith      smith |
                  2. |              julia jig        jig |
                  3. |      beffy mark jabcos     jabcos |
                  4. | william adam beg tiffy      tiffy |
                     +-----------------------------------+
                Hopefully that helps.
                Best regards,

                Marcos



                • #9
                  Please read the FAQ Advice at https://www.statalist.org/forums/help That gives the details you seek.
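
                  A rough sketch of one way to produce such an example, assuming the dataex command is available (it is bundled with recent Stata releases; on older ones it can be installed with ssc install dataex). The toy data and the variable newname are just the ones from this thread:

                  ```stata
                  clear
                  input str40 name
                  "adam smith"
                  "julia jig"
                  "beffy mark jabcos"
                  "william adam beg tiffy"
                  end
                  generate newname = word(name, -1)

                  * dataex prints the listed variables as input code that can be pasted into a post
                  dataex name newname
                  ```

                  The output is meant to be copied between the forum's code delimiters so that others can recreate the data exactly, blanks included.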



                  • #10
                    Hi Marcos,
                    You understood correctly. However, I have tried that syntax and some observations of lastpart are blank, whereas the syntax (gen lastpart = word(name, -1)) works perfectly. I do not know where the syntax (egen lastpart = ends(name), last) goes wrong.

                    Thank you for your reply
                    Regards
                    Linh



                    • #11
                      Linh:
                      you probably have leading or trailing blanks in some of your observations.
                      Kind regards,
                      Carlo
                      (Stata 19.0)
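
                      One possible way to check for such blanks, as a sketch on made-up data (the variables has_blank and shown are hypothetical helpers, not part of the original dataset):

                      ```stata
                      clear
                      input str40 name
                      "adam smith"
                      "william adam beg tiffy   "
                      "  julia jig"
                      end

                      * flag observations whose value changes when leading/trailing blanks are removed
                      generate byte has_blank = name != trim(name)

                      * wrap each value in bars so that the blanks become visible in the listing
                      generate shown = "|" + name + "|"

                      list shown has_blank, noobs
                      count if has_blank
                      ```

                      Any observation flagged here is a candidate for a blank result from egen's ends() function with the last option.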



                      • #12
                        As Carlo pointed out, you must have blank spaces. Therefore, you need to trim before applying the code above.

                        Look at the example below, before and after trimming:

                        Code:
                        . input str40 name
                        
                                                                 name
                          1. "adam smith"
                          2. "julia jig"
                          3. "beffy mark jabcos"
                          4. "william adam beg tiffy"
                          5. "william adam beg tiffy   "
                          6. end
                        
                        . egen lastpartnotrim = ends(name), last
                        (1 missing value generated)
                        
                        . list
                        
                             +--------------------------------------+
                             |                      name   lastpa~m |
                             |--------------------------------------|
                          1. |                adam smith      smith |
                          2. |                 julia jig        jig |
                          3. |         beffy mark jabcos     jabcos |
                          4. |    william adam beg tiffy      tiffy |
                          5. | william adam beg tiffy               |
                             +--------------------------------------+
                        
                        . gen name2 = trim(name)
                        
                        . egen lastparttrimmed1 = ends(name2), last
                        
                        . list
                        
                             +--------------------------------------------------------------------------+
                             |                      name   lastpa~m                    name2   lastpa~1 |
                             |--------------------------------------------------------------------------|
                          1. |                adam smith      smith               adam smith      smith |
                          2. |                 julia jig        jig                julia jig        jig |
                          3. |         beffy mark jabcos     jabcos        beffy mark jabcos     jabcos |
                          4. |    william adam beg tiffy      tiffy   william adam beg tiffy      tiffy |
                          5. | william adam beg tiffy                 william adam beg tiffy      tiffy |
                             +--------------------------------------------------------------------------+
                        Hopefully that helps.
                        Best regards,

                        Marcos
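
                        As an alternative sketch, the help for egen documents a trim option for ends() that removes leading and trailing spaces, so a separate trimmed copy may not be needed. This also shows where the command in the first post went wrong: options follow a comma, and the square brackets in the syntax diagram only mark optional parts. The toy data below are an assumption for illustration:

                        ```stata
                        clear
                        input str40 name
                        "adam smith"
                        "william adam beg tiffy   "
                        end

                        * options go after the comma: last picks the final word, trim deals with outer blanks
                        egen lastpart = ends(name), last trim

                        list, noobs
                        ```

                        If the option behaves as documented, the second observation should no longer come out blank.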



                        • #13
                          Hi Marcos and Carlo,

                          Yes, that is exactly what you pointed out. My problem is solved. However, after TRIM, name2 looks no different from name in my dataset, as the listing below shows. Anyway, my aim is achieved.
                          Code:
                          ti     h      xa    hoso    name                    name2
                          2    30    919    200    Triệu  Văn Huyện        Triệu  Văn Huyện
                          2    30    919    200    Triệu  Văn Hồng          Triệu  Văn Hồng
                          2    30    919    201    Nguyễn Như Thế         Nguyễn Như Thế
                          2    30    919    201    Tấn Thị Hoa                 Tấn Thị Hoa
                          2    30    919    201    Nguyễn  Văn Ngọc       Nguyễn  Văn Ngọc
                          2    30    919    201    Nguyễn  Thị Phương    Nguyễn  Thị Phương 
                          Thank you all
                          Linh
                          Last edited by Linh mt; 12 Jun 2019, 23:10.



                          • #14
                            Linh:
                            exploiting Marcos' helpful code, what if you -trim- -name- in place instead of creating -name2-?
                            Code:
                            . input str40 name
                            
                                                                     name
                              1.   "adam smith"
                              2.  "julia jig"
                              3. "beffy mark jabcos"
                              4. "william adam beg tiffy"
                              5.  "william adam beg tiffy   "
                              6.  end
                            
                            . replace name = trim(name)
                            (1 real change made)
                            
                            . egen lastparttrimmed1 = ends(name), last
                            
                            . list
                            
                                 +-----------------------------------+
                                 |                   name   lastpa~1 |
                                 |-----------------------------------|
                              1. |             adam smith      smith |
                              2. |              julia jig        jig |
                              3. |      beffy mark jabcos     jabcos |
                              4. | william adam beg tiffy      tiffy |
                              5. | william adam beg tiffy      tiffy |
                                 +-----------------------------------+
                            Kind regards,
                            Carlo
                            (Stata 19.0)



                            • #15
                              Carlo already clarified the issue.

                              That said, when you say that there is no difference between name and name2, I believe you meant the names. But if you observe attentively, you will see differences concerning blank spaces throughout the variables: before, after and in-between.
                              Best regards,

                              Marcos
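
                              A small sketch of how the in-between blanks could be handled as well: trim() only removes leading and trailing blanks, while itrim() collapses runs of internal blanks to a single blank (in Stata 14 and later the documented names are strtrim() and stritrim()). The two names are taken from the listing in #13, with blanks added as an assumption; name_clean and changed are hypothetical variable names:

                              ```stata
                              clear
                              input str40 name
                              "Triệu  Văn Huyện"
                              "  Nguyễn  Thị Phương "
                              end

                              * remove outer blanks, then collapse repeated internal blanks to one
                              generate name_clean = itrim(trim(name))

                              * flag observations where the cleaned value differs from the original
                              generate byte changed = name != name_clean

                              list, noobs
                              ```

                              Comparing name with name_clean in this way makes differences visible that are hard to spot by eye in a listing.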
