  • How to get levelsof without sorting?

    Hi all,

    I am trying to get a local macro that stores the values contained in two string variables in Stata. Normally, this is an easy job using the -levelsof- command. However, I am now running something where it is important that the local macro does not automatically sort the values. Below, I will do my best to reproduce what my problem is.

    Suppose you have the following dataset:

    Code:
    x               y
    Do Later        dolater
    Thank/You       thankyou
    What?           what
    Your Welcome    yourwelcome
    deers           deers
    You can generate this data set by doing the following:

    Code:
    set obs 5
    gen x = "Do Later" in 1
    replace x = "Thank/You" in 2
    replace x = "What?" in 3
    replace x = "Your Welcome" in 4
    replace x = "deers" in 5
    
    gen y = "dolater" in 1
    replace y = "thankyou" in 2
    replace y = "what" in 3
    replace y = "yourwelcome" in 4
    replace y = "deers" in 5
    As you can see, y is simply a "clean" version of x: everything is lower case, and spaces, question marks, etc. are removed. When I use the -levelsof- command to pull the string values into local macros, I would like the values of x and y to be ordered in the same way. Unfortunately, -levelsof- automatically sorts the values alphabetically, and this results in a different ordering for each variable.

    For example,

    Code:
    . levelsof x
    `"Do Later"' `"Thank/You"' `"What?"' `"Your Welcome"' `"deers"'
    
    . levelsof y
    `"deers"' `"dolater"' `"thankyou"' `"what"' `"yourwelcome"'
    Notice that, because of the capitalization, "deers" gets sorted last in -levelsof x- but first in -levelsof y-, where everything is lower case. How do I fix this? In other words, I would like two local macros whose elements are in the same corresponding order:

    Code:
    `xvalues'
    `"Do Later"' `"Thank/You"' `"What?"' `"Your Welcome"' `"deers"'
    
    `yvalues'
    `"dolater"' `"thankyou"' `"what"' `"yourwelcome"' `"deers"'
    Is there a way to accomplish this?

    Thank you in advance for your help!

    Vincent



    • #3
      I haven't tested this thoroughly, but it works on your example data. I think the following will do it:

      Code:
      sort x
      local yvalues
      forvalues j = 1/`=_N' {
           local yvalues `yvalues' `"`=y[`j']'"'
      }
      local yvalues: list uniq yvalues
      If your data set is huge, the loop over observations might be unacceptably slow, but for a moderate-sized data set this shouldn't be too bad.

      P.S. Thanks for the nice setup code in code blocks, and the very clear illustration of what you did, what you got, and what you wanted. If only every poster did that it would be so much easier to respond.
      Last edited by Clyde Schechter; 23 Jan 2015, 13:32.



      • #4
        Using Mata this should be

        Code:
        m : st_local("x", invtokens(uniqrows(st_sdata(., "x"))'))
        m : st_local("y", invtokens(uniqrows(st_sdata(., "y"))'))

        Edit

        Nope. uniqrows() sorts the values. Make this

        Code:
        m : st_local("x", invtokens(st_sdata(., "x")'))
        loc x : list uniq x
        This might or might not be faster than looping over observations.


        Edit 2

        Not my best 10 minutes ... Sorry about that. My second approach is still not giving you what you want, as invtokens() does not preserve the binding of more than one word in one string. You can program this, but at this point, I would go with the loop suggested by Clyde.


        Best
        Daniel
        Last edited by daniel klein; 23 Jan 2015, 14:37.
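
The "you can program this" remark in #4 could be sketched roughly as follows. This is only an illustration, not code from the thread: it assumes the example data, and the helper names lq and rq are made up here. Each value is wrapped in compound double quotes inside Mata (built with char() so that no quote characters need escaping) before invtokens() joins them, so multiword values such as "Do Later" stay bound as one element. The values come out in the current order of the data, with no sorting.

```stata
clear
input str15(x y)
"Do Later"      "dolater"
"Thank/You"     "thankyou"
"What?"         "what"
"Your Welcome"  "yourwelcome"
"deers"         "deers"
end

mata:
// lq and rq are hypothetical helper names: the opening and closing
// compound double quotes, built from their ASCII codes
lq = char(96) + char(34)
rq = char(34) + char(39)
// wrap every value in compound quotes, then join in data order
st_local("xvalues", invtokens((lq :+ st_sdata(., "x") :+ rq)'))
st_local("yvalues", invtokens((lq :+ st_sdata(., "y") :+ rq)'))
end

* drop repeated values while keeping the order of first appearance
local xvalues : list uniq xvalues
local yvalues : list uniq yvalues

display `"`xvalues'"'
display `"`yvalues'"'
```

Like the loop in #3, this holds every value in one macro, so it is subject to Stata's macro length limit on large data sets.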



        • #5
          Hi Daniel and Clyde,

          Thank you for your responses. I think Clyde's loop works, but I noticed something which I thought was weird. For sake of reproducibility, I will copy and paste all code below so it is easier for any respondents.

          Code:
          clear
          set obs 5
          gen x = "Do Later" in 1
          replace x = "Thank/You" in 2
          replace x = "What?" in 3
          replace x = "Your Welcome" in 4
          replace x = "deers" in 5
          
          gen y = "dolater" in 1
          replace y = "thankyou" in 2
          replace y = "what" in 3
          replace y = "yourwelcome" in 4
          replace y = "deers" in 5
          
          sort x
          local yvalues
          forvalues j = 1/`=_N' {
               local yvalues `yvalues' `"`=y[`j']'"'
          }
          local yvalues: list uniq yvalues
          
          display(`"`yvalues'"')
          The result of this code is:

          Code:
          . display(`"`yvalues'"')
          dolater `"thankyou"' `"what"' `"yourwelcome"' `"deers"'
          Notice how dolater is the only entry that is not enclosed in quotes; all other entries are enclosed in `" "'. Why is this? I'm not sure whether it matters here, but I've run into this issue before and it has caused plenty of headaches.

          Thank you again for your help!

          Vincent

          PS Clyde's loop did work for me, and I thank you so much for responding so quickly!
          Last edited by Vincent La; 23 Jan 2015, 14:57.



          • #6
            Interesting question, and I don't know the answer. Would be curious to see if somebody else can explain that.

            Meanwhile, if you replace the body of the loop by:

            Code:
            forvalues j = 1/`=_N' {
                 local yvalues `yvalues' `=y[`j']'
            }
            you get (unsurprisingly):

            Code:
            . display(`"`yvalues'"')
            dolater thankyou what yourwelcome deers
            In your particular situation, as I understand it, the values of y are precisely designed to be devoid of embedded spaces and special characters, so a quotation-mark-free version of the list is probably suitable for anything you need to do with it. And is probably easier to work with than a list that has some entries in quotes and others not.

            Evidently, though, this could be very problematic if some values of y contained embedded spaces or embedded quotes.




            • #7
              You can fix this by changing

              Code:
              local yvalues `yvalues' `"`=y[`j']'"'
              to

              Code:
              local yvalues `"`yvalues' `"`=y[`j']'"'"'
              The problem is, the first time through the original loop you have

              Code:
              local yvalues `"dolater"'
              meaning that the second time through the loop

              Code:
              `yvalues'
              expands to

              Code:
              dolater
              (note that the double quotes are gone). The outermost double quotes only bind the string: they are not a "literal" part of the macro's contents, so they are not preserved. They could be preserved by enclosing `yvalues' in (compound) double quotes. As written, the full line on the second pass reads

              Code:
              local yvalues dolater `"thankyou"'
              which, the third time through the loop, causes

              Code:
              `yvalues'
              to expand to

              Code:
              dolater `"thankyou"'
              since the double quotes are now "literal" part of the macro's contents, not interpreted as binding characters anymore.


              Alternatively, code

              Code:
              local yvalues foobar
              before the loop starts, then strip the first word, "foobar", from yvalues after the loop has finished.


              Best
              Daniel
              Last edited by daniel klein; 23 Jan 2015, 15:58.
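
The placeholder alternative described in #7 might look like the sketch below. It is an illustration on the example data rather than code posted in the thread, and it assumes that no real value of y equals the placeholder word foobar. Because the macro already holds a word before the first pass, the first definition is no longer a single quoted string, so the quotes around dolater are stored literally. The placeholder is then stripped with -gettoken-.

```stata
clear
input str15(x y)
"Do Later"      "dolater"
"Thank/You"     "thankyou"
"What?"         "what"
"Your Welcome"  "yourwelcome"
"deers"         "deers"
end

sort x

* seed the macro with a placeholder word (assumed not to occur in y)
local yvalues foobar
forvalues j = 1/`=_N' {
    local yvalues `yvalues' `"`=y[`j']'"'
}
local yvalues : list uniq yvalues

* strip the first word, the placeholder, and keep the rest
gettoken first yvalues : yvalues

display `"`yvalues'"'
```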



              • #8
                Vincent,

                Congratulations on being one of the few who actually share example data that can be copied/pasted/used immediately.

                As a note, an alternative involves the -input- command:

                Code:
                input ///
                str15(x y)
                "Do Later"      "dolater"
                "Thank/You"     "thankyou"
                "What?"         "what"
                "Your Welcome"  "yourwelcome"
                "deers"         "deers"
                end
                You should:

                1. Read the FAQ carefully.

                2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

                3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

                4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.



                • #9
                  @Daniel

                  Thanks for the clear explanation of what is going on with the quotes in the macro. It makes perfect sense now.



                  • #10
                    An alternative solution involves using -valuesof- and -moremata-. You would need to install both from SSC. (Both are by Ben Jann, a well-known Stata user/programmer.)

                    Code:
                    ssc install moremata
                    ssc install valuesof
                    Then try

                    Code:
                    clear
                    set more off
                    
                    *----- example data -----
                    
                    input ///
                    str15(x y)
                    "Do Later"      "dolater"
                    "Thank/You"     "thankyou"
                    "What?"         "what"
                    "Your Welcome"  "yourwelcome"
                    "deers"         "deers"
                    end
                    
                    list
                    
                    *----- what you want -----
                    
                    valuesof x
                    local x = r(values)
                    local xvalues: list uniq x
                    
                    valuesof y
                    local y = r(values)
                    local yvalues: list uniq y
                    
                    display `"`xvalues'"'
                    display `"`yvalues'"'


                    • #11
                      Thank you all for the help. Is there something I should do to close the post? I tried looking at the FAQ, but couldn't find instructions to do so.



                      • #12
                        You can create the yvalues local from the xvalues local. That is, create only xvalues using -levelsof-, then loop through its elements and transform each one into the required format.
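
A minimal sketch of the idea in #12, on the example data. The cleaning rule used here (lower-case everything, then delete spaces, slashes and question marks) is an assumption inferred from the example values, and the local name cleaned is made up for illustration; substitute whatever rule actually produced y. Since each element of yvalues is derived from the element of xvalues in the same position, the two lists correspond even though -levelsof- sorts x.

```stata
clear
input str15(x y)
"Do Later"      "dolater"
"Thank/You"     "thankyou"
"What?"         "what"
"Your Welcome"  "yourwelcome"
"deers"         "deers"
end

* sorted, distinct values of x
levelsof x, local(xvalues)

* build yvalues element by element, in the same order as xvalues
local yvalues
foreach v of local xvalues {
    * assumed cleaning rule: lower case, no spaces, slashes, or question marks
    local cleaned = lower(`"`v'"')
    local cleaned = subinstr(`"`cleaned'"', " ", "", .)
    local cleaned = subinstr(`"`cleaned'"', "/", "", .)
    local cleaned = subinstr(`"`cleaned'"', "?", "", .)
    * outer compound quotes keep the quotes around every element (see #7)
    local yvalues `"`yvalues' `"`cleaned'"'"'
}

display `"`xvalues'"'
display `"`yvalues'"'
```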
