
  • How to find the longest string in an array of variables?

    Hi all,

    I am looking for a command to detect the longest string or numeric variable in an array, and to generate a new variable with this string.

    Thanks for your help.

  • #2
    If you are referring to a column array then you could use the following procedure (-carryforward- is available from SSC).

    Code:
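    * Length of each value of the string variable var
    gen length_var = length(var)
    * Longest values first
    gsort -length_var
    * summarize leaves the maximum length in r(max)
    sum length_var
    gen longest_var = var if length_var == `r(max)'
    * Fill the longest value down into the remaining observations
    carryforward longest_var, gen(newvar)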


    • #3
      How do you decide whether a string or numeric variable is longer? Are you presuming a decimal representation and if so, how do you handle fractional parts?


      • #4
        By "longer" I mean the number of characters a variable contains.


        • #5
          Aaditya: Your interpretation of the problem lies in a different direction from mine. You seem to be assuming that the question is about the longest value in a string variable. I am assuming "array" is an unStataish description of a set of variables, some numeric, some string, but am still waiting for clarification of how to think of the length of a numeric variable.

          That aside, your code will not work for numeric variables, because length() accepts only string arguments.

          For string variables alone, you can get there with

          Code:
            
          gen length_var = length(var)
          sort length_var
          gen longest_var = var[_N]
          with some modification no doubt for strLs.


          • #6
            Vincenz #4: Sorry, but that doesn't help me. What is a character? Do you mean a byte?
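
            To illustrate why the distinction matters, here is a minimal sketch, assuming Stata 14 or later with Unicode support: strlen() counts bytes, whereas ustrlen() counts Unicode characters. The two agree for plain ASCII values such as those in this thread, but can differ for accented or non-Latin text.

            ```stata
            * Plain ASCII: bytes and characters coincide (both display 7)
            display strlen("Ref.AB5")
            display ustrlen("Ref.AB5")

            * One accented letter stored in UTF-8: 2 bytes but 1 character
            display strlen("é")
            display ustrlen("é")
            ```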


            • #7
              Originally posted by Nick Cox
              Code:
              gen longest_var = var[_N]
              Very neat, thanks!


              • #8
                Thank you for your help so far. I don't think I communicated my actual problem clearly, so let me try again.
                I have a dataset much like the following:
                Var1 Var2 Var3 Var4
                Ref. 54322 Ref. 5432 Ref. 543
                Ref. AB562 Ref. AB56 Ref. AB5
                Ref. 1234 Ref.123 Ref.12
                I need to generate a new variable that holds, for each observation, the value with the most characters among these variables:
                VarNew
                Ref. 54322
                Ref. AB562
                Ref. 1234
                Thank you for your help!


                • #9
                  By "character" I mean a letter, a digit, or any other symbol.


                  • #10
                    Have a look at the FAQs on how to use dataex to present data examples. Employing Nick's code in #5:

                    Code:
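                    * Example data: strL variables, blank where a value is absent
                    input strL(var1 var2 var3 var4)
                    "Ref.54322" "Ref.5432" "Ref.543" ""
                    "Ref.AB562" "Ref.AB56" "Ref.AB5" ""
                    "" "Ref.1234" "Ref.123" "Ref.12"
                    end

                    * Go long so that each id has one row per variable
                    gen id=_n
                    reshape long var, i(id)
                    * Within id, sort by length so the longest value comes last
                    by id: gen length_var = length(var)
                    sort id length_var
                    by id: gen longest_var = var[_N]
                    drop length_var
                    * Return to wide layout; longest_var is constant within id
                    reshape wide var, i(id) j(_j)
                    l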
                    Code:
                    . l
                    
                         +----------------------------------------------------------+
                         | id        var1       var2      var3     var4   longest~r |
                         |----------------------------------------------------------|
                      1. |  1   Ref.54322   Ref.5432   Ref.543            Ref.54322 |
                      2. |  2   Ref.AB562   Ref.AB56   Ref.AB5            Ref.AB562 |
                      3. |  3               Ref.1234   Ref.123   Ref.12    Ref.1234 |
                         +----------------------------------------------------------+
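
                    If reshaping is inconvenient, the same row-wise result can probably be reached with a loop over the variables. This is only a sketch, assuming var1-var4 are all string variables as in the example data; the name longest_row is hypothetical, chosen so that it does not clash with longest_var. Where two values tie for longest, the loop keeps the first one it meets.

                    ```stata
                    * Start with an empty string in every observation
                    gen strL longest_row = ""

                    * Overwrite whenever the current variable holds a strictly longer value
                    foreach v of varlist var1-var4 {
                        replace longest_row = `v' if strlen(`v') > strlen(longest_row)
                    }

                    list var1-var4 longest_row
                    ```

                    Note that strlen() counts bytes in Stata 14 or later; swapping in ustrlen() would count Unicode characters instead, which only matters for values outside plain ASCII.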


                    • #11
                      Post #8 also posted at http://www.statalist.org/forums/foru...n-one-variable with a different answer given.
