Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Rowsort: tag a variable along the variable that is rowsorted

    Dear all,
    I have a question regarding the rowsort command. I am trying to sort the values of some variables horizontally (so in a row) but each of this variables to sort has another variable that describes the variable and should therefore be sorted accordingly. Here is an example:

    Imagine my data look like this:

    var1 | var1_desc | var2 | var2_desc | var3 | var3_desc | etc.
    2 blue 3 red 1 yellow

    I would like to rowsort var1-var3 but don't lose the information in the var_desc variable, i.e. the value of var1 is only described by blue, the value of var 2 with red etc. So ideally I would like to look my data like this:

    Var1 | var1_desc_sorted | Var2| var2_desc_sorted | Var3 | var3_desc_sorted
    1 yellow 2 blue 3 red

    When I just code:
    rowsort var*, generate(var1_sorted-var3_sorted)

    I get the following:

    Var1 | var1_desc | Var2| var2_desc | Var3 | var3_desc
    1 blue 2 red 3 yellow

    The information in the var_desc variables are matched to the wrong variable. Does anyone know how to solve this?

    Thank you very much for your help.

    Best,
    Mara

  • #2
    rowsort is from the Stata Journal, as you're asked to explain. (Whenever a post refers to community-contributed commands it helps people to know that they need to be installed and where to get them from: see FAQ Advice #12.)

    Here the issue is not that rowsort is wrong, but that it doesn't extend to doing what you want.

    Let's assume there is some kind of identifier: if not you can create one. What you want is something like this

    Code:
     
    clear
    input var1 var2 var3 str6 (var1_desc var2_desc var3_desc)
    2 3 1 blue red yellow
    1 3 2 pink green magenta
    end
    
    gen long id = _n
    rename (var*_desc) (vardesc*)
    reshape long var vardesc , i(id)
    bysort id (var) : replace _j = _n
    rename (var vardesc) sorted=
    reshape wide sortedvar sortedvardesc, i(id) j(_j)
    
    list
    In practice you might want to keep just those variables and when done merge back with the original dataset.

    You can copy and paste the code above to a new Stata and add commands to see step by step what is done. Conversely, although your example is clear using dataex as requested in the FAQ Advice would have been even clearer.
    Last edited by Nick Cox; 07 Mar 2018, 03:54.

    Comment


    • #3
      Note that you can still use rowsort provided you combine the value with the description in a string variable. Because you are sorting strings, you have to make sure that the values sort correctly so I use a right-justified format with leading zeros.

      Code:
      clear
      input var1 var2 var3 str6 (var1_desc var2_desc var3_desc)
      2 3 1 blue red yellow
      1 3 2 pink green magenta
      end
      
      forvalues i=1/3 {
          gen tosort`i' = string(var`i',"%05.0f") + " " + var`i'_desc
      }
      rowsort tosort1-tosort3, generate(sorted1-sorted3)
      forvalues i=1/3 {
          split sorted`i'
      }
      destring sorted*1, replace
      list var1-var3_desc sorted11-sorted32
      and the results
      Code:
      . list var1-var3_desc sorted11-sorted32
      
           +-----------------------------------------------------------------------------------------------------------------------+
           | var1   var2   var3   var1_d~c   var2_d~c   var3_d~c   sorted11   sorted12   sorted21   sorted22   sorted31   sorted32 |
           |-----------------------------------------------------------------------------------------------------------------------|
        1. |    2      3      1       blue        red     yellow          1     yellow          2       blue          3        red |
        2. |    1      3      2       pink      green     magent          1       pink          2     magent          3      green |
           +-----------------------------------------------------------------------------------------------------------------------+

      Comment


      • #4
        #3 is a neat trick that works well here. But "42 blue" and "100 red" wouldn't sort the way that is probably wanted.

        That is, once you've told rowsort it's sorting strings it certainly won't look inside to parse out the numbers. It will just sort in dictionary order.

        People often feed arbitrary numbers to rowsort if they use it at all.

        Comment


        • #5
          As I said, if you use an appropriate format, the values will correctly sort:
          Code:
          clear
          input var1 var2 var3 str6 (var1_desc var2_desc var3_desc)
          2 3 1 blue red yellow
          1 3 2 pink green magenta
          42 100 2 blue red pink
          end
          
          forvalues i=1/3 {
              gen tosort`i' = string(var`i',"%05.0f") + " " + var`i'_desc
          }
          rowsort tosort1-tosort3, generate(sorted1-sorted3)
          forvalues i=1/3 {
              split sorted`i'
          }
          destring sorted*1, replace
          list var1-var3_desc sorted11-sorted32
          Code:
          . list var1-var3_desc sorted11-sorted32
          
               +-----------------------------------------------------------------------------------------------------------------------+
               | var1   var2   var3   var1_d~c   var2_d~c   var3_d~c   sorted11   sorted12   sorted21   sorted22   sorted31   sorted32 |
               |-----------------------------------------------------------------------------------------------------------------------|
            1. |    2      3      1       blue        red     yellow          1     yellow          2       blue          3        red |
            2. |    1      3      2       pink      green     magent          1       pink          2     magent          3      green |
            3. |   42    100      2       blue        red       pink          2       pink         42       blue        100        red |
               +-----------------------------------------------------------------------------------------------------------------------+
          
          .
          I should have listed the combined strings to show the formatted combo strings (leftalign is from SSC):
          Code:
          . leftalign
          [output omitted]
          . list sorted1-sorted3
          
               +-------------------------------------------+
               | sorted1        sorted2        sorted3     |
               |-------------------------------------------|
            1. | 00001 yellow   00002 blue     00003 red   |
            2. | 00001 pink     00002 magent   00003 green |
            3. | 00002 pink     00042 blue     00100 red   |
               +-------------------------------------------+
          
          .

          Comment


          • #6
            Robert Picard Good point about the format. You got me there. As non-integers are also legal input, the format would typically need careful thought.

            Comment


            • #7
              Thanks both of you for your input! This helps a lot!

              Comment

              Working...
              X