  • Loop to Get Last String

    Hi everyone,

    I have a variable called leader, which contains names, surnames, and sometimes titles, as follows:

    leader
    john hampton
    dr. jeff jordan
    mr. jeff huntington
    dr. mr. david jones barr

    I want to retrieve the last word of each leader's name. To do this, I am using the split function.

    split leader, p(" ")

    I am getting leader1, leader2, leader3, ...., leader6

    and then creating an empty string variable

    gen leader_lastname=""

    Then writing a loop to get the last names

    forvalues i=2(1)6{
    replace leader_lastname=leader`i' if missing(leader`i+1') & missing(leader_lastname) //if leader has less than 5 names
    replace leader_lastname=leader`i' if missing(leader_lastname) & `i'==6 //if the leader has 5 names & titles
    }


    However, I do not get anything. Are there any suggestions for solving this problem? I appreciate your help in advance.

    Best,
    Ulas

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str24 leader
    "john hampton"            
    "dr. jeff jordan"         
    "mr. jeff huntington"     
    "dr. mr. david jones barr"
    end
    
    gen last_name = reverse(word(reverse(leader), 1))
    
    list, noobs clean
    In the future, please use the -dataex- command to post example data. Please read FAQ #12 for instructions on installing and using -dataex-.


    • #3
      Clyde gives excellent advice as always. Another way is to do it with word(leader, -1)

      Code:
      . di word("frog toad newt", -1)
      newt


      This works with string variables too.
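As a minimal sketch of that last point, the block below applies word() with a negative index to the example data from #2. The variable names last_name and last_name2 are only illustrative, and the wordcount() line is an equivalent spelling added for comparison.

```stata
clear
input str24 leader
"john hampton"
"dr. jeff jordan"
"mr. jeff huntington"
"dr. mr. david jones barr"
end

* a negative second argument counts words from the end of the string
gen last_name = word(leader, -1)

* equivalent: spell out the position of the last word explicitly
gen last_name2 = word(leader, wordcount(leader))

list leader last_name last_name2, noobs clean
```

Both new variables should hold the same last word in every observation, with no need to split the string first.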


      • #4
        Thanks a lot for your suggestions. They helped me a lot. I will be careful to use the dataex command in the future.

        By the way, do you have any idea why my loop did not work?


        • #5
          I have not looked at the code in a deep and detailed way, but I see one clear error:

          Code:
          replace leader_lastname=leader`i' if missing(leader`i+1') & missing(leader_lastname) //if leader has less than 5 names
          `i+1' is wrong. For example, the first time through the loop, i = 2, but Stata still sees leader`i+1'. There is no local macro named i+1, so `i+1' expands to an empty string and leader`i+1' reduces to just leader, which is not what you need. In particular, since the variable leader exists and is never missing, the -if- condition of the -replace- command is never satisfied, and the command has no effect.

          There are two problems with `i+1'. First, i itself never becomes 2, 3, ..., 6, because it is not immediately enclosed in `'. Second, even if you had written ``i'+1', the code would still fail, because `2+1' does not evaluate to 3. (Try -display `2+1'- and you will see that it, too, is an empty string.)

          To use this approach, the syntax would be like this:
          Code:
          replace leader_lastname=leader`i' if missing(leader`=`i'+1') & missing(leader_lastname) //if leader has less than 5 names
          When Stata encounters that at i = 2, leader`=`i'+1' becomes leader`=2+1' which, due to the presence of the equals sign, next becomes leader3, which is the variable you actually do want to test for missingness.
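The expansion rules described above can be checked directly with a small sketch like the following. The local j is a hypothetical helper name, shown as an alternative to the inline `=...' form.

```stata
local i = 2

* `i+1' asks for a local macro named "i+1", which is undefined, so this shows []
display "[`i+1']"

* `=`i'+1' first expands `i' to 2, then evaluates 2+1, so this shows [3]
display "[`=`i'+1']"

* alternative: compute the next index once and reuse it
local j = `i' + 1
display "[`j']"
```

Either the inline form or the separate local works inside a forvalues loop; the separate local can be easier to read when the index is used more than once.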

          As noted earlier, I haven't gone over the code in depth, and I can't guarantee that there aren't still other problems with it.
          Last edited by Clyde Schechter; 17 Oct 2017, 17:25.


          • #6
            Two things:

            1) In the data snippet you provide, splitting on the space character generates five new variables, not six (I guess you've given just part of a larger dataset, and in your real data it does make six).
            2) Inside your loop, `i+1' does not give you the value of `i' plus 1. For that, you need to include an = operator, which tells Stata to expand the local and then evaluate the expression.

            So, using Clyde's dataex example, your code works as intended when you do this

            Code:
            clear
            input str24 leader
            "john hampton"            
            "dr. jeff jordan"         
            "mr. jeff huntington"     
            "dr. mr. david jones barr"
            end
            
            split leader, p(" ")
            gen leader_lastname = ""
            
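            forvalues i=2/4{
            replace leader_lastname=leader`i' if missing(leader`=`i'+1') & missing(leader_lastname)
            }
            * names with five words: leader5 is the last variable that split creates here
            replace leader_lastname=leader5 if missing(leader_lastname)
            The changes are the `=`i'+1' in the first condition, the loop range, and the final replace. The loop only needs to run up to 4, because the condition looks one variable ahead and leader6 does not exist in this example. A test such as `i'==5 inside the loop could never be true, since i stops at 4, so the five-word case is handled after the loop instead.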


            • #7
              Ulas,

              1. The condition
              Code:
              & missing(leader_lastname) //if leader has less than 5 names
              is not the right condition; instead, it should be
              Code:
              !missing(leader`i')

              Then, a reasonable loop that you can use is:
              Code:
              gen leader_lastname = leader5
              
              forvalues i = 1/4 {
              local j = `i' + 1
              replace leader_lastname=leader`i' if missing(leader`j') & !missing(leader`i')
              }
              But above all, the loop is not necessary at all, since the best way to get there is a one-line command, as below:
              Code:
              egen str1 leader_lastname = rowlast(leader*)
              Good luck!

              Romalpa
              Last edited by Romalpa Akzo; 17 Oct 2017, 19:30.


              • #8
                One more comment for the loop (although it should not be necessary with the availability of the egen command).

                The simplest and most direct loop could be:

                Code:
                gen leader_lastname = leader1
                
                forvalues i = 2/5 {
                replace leader_lastname = leader`i' if !missing(leader`i')
                }
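Since the example data yield five variables while the original question mentions six, here is a hedged variant of the same loop that does not hard-code the upper bound. It assumes the count is taken from r(nvars), the saved result that split leaves behind; the local name n is only illustrative.

```stata
clear
input str24 leader
"john hampton"
"dr. jeff jordan"
"mr. jeff huntington"
"dr. mr. david jones barr"
end

split leader, parse(" ")

* r(nvars) is the number of variables split just created; copy it at once,
* before another command overwrites the r() results
local n = r(nvars)

* start from the first word, then overwrite with each later nonmissing word
gen leader_lastname = leader1
forvalues i = 2/`n' {
    replace leader_lastname = leader`i' if !missing(leader`i')
}

list leader leader_lastname, noobs clean
```

With this form the same code should run unchanged on a dataset where the longest name has six words or more.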

                Best,

                Romalpa
                Last edited by Romalpa Akzo; 17 Oct 2017, 20:11.


                • #9
                  Thanks a lot for your contributions. I learnt a lot from you.


                  • #10
                    If there is to be a small debate about the best way to do it, I favour

                    Code:
                    gen leader_lastname = word(leader, -1)
                    over

                    Code:
                     egen str1 leader_lastname = rowlast(leader*)
                    They may seem similar but a glance at source code

                    Code:
                    viewsource egen.ado
                    viewsource _growlast.ado
                    shows that whereas the first really is one line of Stata code, the second entails a few dozen such lines.

                    Further, this solution depends on a prior use of the split command (not function, as in #1). I have nothing against split, but it is not needed here.


                    • #11
                      I fully agree with Nick: the word() command is the best solution here. egen is just the best way to replace the loop.

                      Thanks for your brilliant instruction.


                      • #12
                        Thanks for the very nice post, but I have to add that word() is a function, not a command!
