
  • How to change order with -encode-

    Hello Statalist -

    I have a rather simple request, and I believe that there should be a simple solution out there! I am running Stata/IC 15.1.

    I have a string variable that I would like to encode. It is simple to do the following:
    Code:
    encode string_var, generate(encoded_var)
    However, I would like "1" to correspond to the most frequent value, "2" to the second-most frequent, etc.

    An example, from the output of -tabulate string_var- [table not preserved]:

    If I use -encode-, then "DRC" will have a value of 2, "Drug Crt." will have a value of 3, etc.
    I would like "Int. Supv." to have a value of 2, "DRC" to have a value of 3, etc.


    Thank you very much!
    Spencer

  • #2
    Spencer:
    this seems a task for -sort-.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      I think you have two choices when you don't want the default encoding order of -encode-. One is to use -sencode- from SSC, and the other is to create your value label before using -encode- and then pass it through the label() option of -encode-.
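
A minimal sketch of the second route, using the auto data because #1 offers nothing that can be copied and pasted. The names rep_str, rep_order and rep_enc are hypothetical, and the frequency order is typed in by hand after reading it off a -tabulate, sort- table, so this is only practical when the variable has a handful of distinct values.

```stata
* Sketch only: rep_str, rep_order and rep_enc are made-up names
sysuse auto, clear

* Stand-in string variable; empty string where rep78 is missing
generate rep_str = string(rep78) if !missing(rep78)

* Define the value label first, most frequent value first
* (order read off by eye from: tabulate rep_str, sort)
label define rep_order 1 "3" 2 "4" 3 "5" 4 "2" 5 "1"

* encode reuses an existing label instead of assigning codes alphabetically
encode rep_str, generate(rep_enc) label(rep_order)

tabulate rep_enc
```

Any string value missing from the label would be appended by -encode- with the next free integer; the noextend option of -encode- may be worth a look if that should be refused instead.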



      • #4
        Dave is correct.
        -sort- won't do the trick in Spencer's case.
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Thank you, SSC sencode is exactly what I'm looking for!

          Best,
          Spencer



          • #6
            More choices here than two.

            Most trivially, if it were just a case of sorting out [pun intended] the table, then

            Code:
            tabulate string_var, sort
            will do it.

            That said, how to encode in this way? Here is a dopey example with the auto data, absent anything that can be copied and pasted from #1.


            Code:
            . sysuse auto, clear
            (1978 Automobile Data)
            
            . tab rep78
            
                 Repair |
            Record 1978 |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                      1 |          2        2.90        2.90
                      2 |          8       11.59       14.49
                      3 |         30       43.48       57.97
                      4 |         18       26.09       84.06
                      5 |         11       15.94      100.00
            ------------+-----------------------------------
                  Total |         69      100.00
            
            . bysort rep78 : gen negfreq = -_N
            
            . egen rep78_2 = group(negfreq rep78)
            (5 missing values generated)
            
            . labmask rep78_2 , values(rep78)
            
            . label var rep78_2 "`: var label rep78'"
            
            . tab rep78_2
            
                 Repair |
            Record 1978 |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                      3 |         30       43.48       43.48
                      4 |         18       26.09       69.57
                      5 |         11       15.94       85.51
                      2 |          8       11.59       97.10
                      1 |          2        2.90      100.00
            ------------+-----------------------------------
                  Total |         69      100.00
            Here -labmask-, which has some overlap in functionality with -sencode-, is downloadable from the Stata Journal files.



            • #7
              Here's another solution with only official Stata and no egen. So, it's somewhat Quaker-Shaker-Ikea-Bauhaus-minimalist in use of tools.

              It's not much longer!


              Code:
              sysuse auto, clear
              tab rep78
              
              bysort rep78 : gen negfreq = cond(missing(rep78), .,  -_N) 
              sort negfreq rep78 
              gen rep78_2 = sum(negfreq != negfreq[_n-1] | rep78 != rep78[_n-1]) if !missing(rep78)  
              
              su rep78_2, meanonly 
              forval j = 1/`r(max)' { 
                  su rep78 if rep78_2 == `j', meanonly 
                  label def rep78_2 `j' "`r(min)'", modify 
              } 
                  
              label val rep78_2 rep78_2 
              
              tab rep78_2
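
Since #1 asked about a string variable while the examples above use numeric rep78, here is a hedged sketch of the same idea applied to a string. It borrows the negfreq trick and -egen, group()- from #6 and the labelling loop from #7. The names rep_str and rep_enc are hypothetical, and rep_str is only a stand-in built from the auto data.

```stata
* Sketch only: rep_str and rep_enc are made-up names, not from the thread
sysuse auto, clear

* Stand-in string variable; empty string where rep78 is missing
generate rep_str = string(rep78) if !missing(rep78)

* Negative group size, so an ascending sort puts frequent values first
bysort rep_str : generate negfreq = -_N if rep_str != ""

* Codes 1, 2, ... by decreasing frequency; ties are broken alphabetically
egen rep_enc = group(negfreq rep_str)

* Use each group's string value as the label for its code
summarize rep_enc, meanonly
forvalues j = 1/`r(max)' {
    quietly levelsof rep_str if rep_enc == `j', local(s) clean
    label define rep_enc `j' `"`s'"', modify
}
label values rep_enc rep_enc

tabulate rep_enc
```

Observations with an empty string get a missing code, which matches what -encode- does by default.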

