  • Combining String Variables alphabetically

    Hello,

    I have the following stylized dataset:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str12 name1 str11 name2 str12(name3 name4 name5) str61 var6
    "Ami Tall"     "Peter Brown" "Ben Williams" "Tim Small"    "Marie Miller" "Ami Tall Ben Williams Marie Miller Peter Brown Tim Small"    
    "Peter Brown"  "Ami Tall"    "Ben Williams" "Marie Miller" "Tim Small"    "Ami Tall Ben Williams Marie Miller Peter Brown Tim Small"    
    "Marie Miller" "Ami Tall"    "Ben Williams" "Peter Miller" "Samuel Brown" "Ami Tall Ben Williams Marie Miller Peter Miller Samuel Brown"
    end

    I want to create var6, which combines the five names (strings) in alphabetical order.

    I thought of something like:

    egen var6 = concat(name1 name2 name3 name4 name5), punct(" ")

    but it misses the alphabetical order.

    I also thought of

    gen var6 = cond(name1<name2, name1 + " " + name2, name2 + " " + name1) // and so on for all 5 names

    but, if I am correct, the condition would become impractically long with all 5 names.

    Bests
    Julian

  • #2
    Sort rowwise first.

    Code:
    SJ-9-1  pr0046  . . . . . . . . . . . . . . . . . . .  Speaking Stata: Rowwise
            (help rowsort, rowranks if installed) . . . . . . . . . . .  N. J. Cox
            Q1/09   SJ 9(1):137--157
            shows how to exploit functions, egen functions, and Mata
            for working rowwise; rowsort and rowranks are introduced
    
    
    
    rowsort name?, gen(sname1-sname5)
    I am aware of the scope for reshape long -- sort -- reshape wide, but I am guessing that you have other variables too, so doing it in place is preferable. Further, sorting on family names might be desired....
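
    The reshape route mentioned above, followed by the concatenation step from #1, might look like the sketch below. It assumes the example data from #1 are in memory. The names id, j, sname and allnames are made up for illustration, and allnames is used because var6 already exists in the example. Strings sort by character code, so upper case sorts before lower case.

    ```stata
    * Sketch only: id, j, sname and allnames are hypothetical names
    gen long id = _n                      // row identifier needed by reshape
    reshape long name, i(id) j(j)         // one observation per name
    bysort id (name): replace j = _n      // renumber alphabetically within each row
    rename name sname
    reshape wide sname, i(id) j(j)        // back to one observation per row
    egen allnames = concat(sname?), punct(" ")
    * with the example data, this should match the var6 supplied in #1
    assert allnames == var6
    ```

    With the Stata Journal version of rowsort installed, only the egen line should be needed after the rowsort call. The reshape route replaces name1-name5 by their sorted versions, so keep a copy of the originals if they are still needed.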


    • #3
      Perfect, thanks!


      • #4
        Hi, Nick, I tried your code but got the following error message:
        Code:
        . rowsort name?, gen(sname1-sname5)
        string variables not allowed in varlist;
        name1 is a string variable
        r(109);
        Ho-Chuan (River) Huang
        Stata 19.0, MP(4)


        • #5

          River:

          Thanks for your interest in rowsort.

          The reference given for rowsort in #2 was Stata Journal 9(1). So, that is the download site to use.
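
          One way to fetch the Stata Journal version is sketched below. It assumes that the copy currently installed came from SSC under the package name rowsort; if it came from elsewhere, the uninstall line will need adjusting.

          ```stata
          * show which rowsort.ado Stata finds first, and its version line
          which rowsort
          * remove the SSC copy (assumes it was installed as package rowsort)
          ado uninstall rowsort
          * point to Stata Journal 9(1), package pr0046, and install it
          net sj 9-1 pr0046
          net install pr0046
          ```

          Running which rowsort again afterwards is a quick check that the newer file is the one being picked up.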

          At a guess you installed from SSC. This is the start of the description for that version, which makes plain that the version there is for integer numeric variables only.

          . ssc desc rowsort

          -------------------------------------------------------------------------------------------------------------
          package rowsort from http://fmwww.bc.edu/repec/bocode/r
          -------------------------------------------------------------------------------------------------------------

          TITLE
          'ROWSORT': module to row sort a set of integer variables

          DESCRIPTION/AUTHOR(S)

          rowsort creates new variables sort_1, ... , sort_p corresponding
          to var_1, ... , var_p in varlist such that sort_1, ... , sort_p
          contain the sorted (ordered) values in each observation of
          varlist. varlist should contain all numeric variables with
          integer values. Missing values are allowed. rowsort loops over
          observations and may be relatively slow. It may be faster to
          reshape, sort within blocks, and reshape again.

          Why then is an out-of-date version still visible at SSC? When rowsort was rewritten for my 2009 column, the rewrite entailed new code in Mata and thus a requirement of version 9. So, the version on SSC requiring only version 7 remained accessible.

          Another way to understand this is to know that I maintain a help file listing my publicly available programs,

          Code:
          ssc inst njc_stuff

          and

          Code:
          help njc_stuff
          explains the difference (sup() means superseded by)


          Code:
              rowsort
              SSC (NJC)
              row sort a set of integer variables
              sup(SJ 9-1)
          
              rowsort
              pr0046 sj9-1 (NJC)
              row sort a set of variables
              +
          Further, remember our protocol for questions about user-written (community contributed) programs, which is to tell us where you got the program from.

          If anyone wants to comment that this is why people use GitHub, they have a good point, but having one version accessible on SJ and the previous on SSC is not rocket surgery either.
