
  • Encoding string variables based on the values of a numeric variable

    Dear All,

    I need to make a twoway graph and label the x-axis with names coming from a string variable. I found that I can do that by encoding the string variable and using the value labels, with this code:

    twoway (rcap upper lower c3, lcolor(black)) || (scatter coef c3, lcolor(black) mcolor(black)) legend(off) yscale(range(-0.3 .3)) ylabel(-.3 -.15 .15 .3) xscale(range(1 2)) xlabel(#12, valuelabel angle(45) labsize(small)) graphregion(color(white) ilcolor(black)) plotregion(lcolor(black))

    However, I would like the names not to be sorted alphabetically. Is there a way to encode a string variable based on the numeric values of another variable? Or to label the x-axis with a string variable?

    Thanks a lot for your help!

  • #2
    The user-written command -labmask- should help you; type -findit labmask-.

    hth,
    Jeph


    • #3
      Originally posted by Jeph Herrin
      Thanks, Jeph, but what I need is a variable encoded with the same names as the string variable, but not sorted alphabetically. I can get a non-alphabetical order with labmask, but then I don't have the associated string names...


      • #4
        Please show a reproducible example of (i) what you have, (ii) what you want, and (iii) why labmask (probably from labutil on SSC) is not suitable.


        • #5
          Originally posted by daniel klein
          Sure, Daniel.

          I have a dataset like this:
          variable  coefficient  se    upper  lower  var8
          age       0.03         0.05  0.01   0.06   1
          gender    0.02         0.04  0.01   0.05   1.1
          income    0.01         0.04  0.01   0.06   1.2
          and I'm using this code:

          encode variable, gen(c3)

          twoway (rcap upper lower c3, lcolor(black)) || (scatter coef c3, lcolor(black) mcolor(black)) legend(off) yscale(range(-0.3 .3)) ylabel(-.3 -.15 .15 .3) xscale(range(1 2)) xlabel(#12, valuelabel angle(45) labsize(small)) graphregion(color(white) ilcolor(black)) plotregion(lcolor(black))

          Now, c3 has the values of variable in alphabetical order, while I want to show the value labels, such as age, gender, etc., but not in alphabetical order. So I would like to encode variable based on the values of var8. Does that make sense?


          • #6
            Honestly, I find all of what you show here a bit confusing. First, your [graph] twoway command is missing a comma and, therefore, produces errors. Please do not retype or edit any commands; copy and paste what you have typed, exactly. Datasets are best shown with dataex (included in your Stata or available from SSC).

            That aside, the values (strings) of the variable named variable are in alphabetical order in the dataset, and the values in var8 are in that exact same order. So the sort order of your encoded variable is exactly the way you want it. I believe you want to label the values in var8. That is impossible; value labels can only be attached to integer values, and var8 holds non-integers such as 1.1.
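A quick way to check the point about integers (a sketch, not from the thread; the label name demo_lbl is hypothetical): label define accepts an integer such as 1 but refuses a non-integer such as 1.1, so var8 itself cannot carry the names as value labels.

```stata
* Sketch: value labels attach to integers only
* (demo_lbl is a hypothetical label name)
capture label drop demo_lbl

* an integer value can be labelled
label define demo_lbl 1 "age"

* a non-integer value such as 1.1 is rejected; capture keeps the error code
capture label define demo_lbl 1.1 "gender", add
local rc = _rc
display "label define for 1.1 returned error code `rc'"
display "label for 1: `: label demo_lbl 1'"
```

A nonzero return code for 1.1 is why the workarounds in this thread map the names onto integer codes 1, 2, 3, ... in var8 order instead.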

            Here is a quick ad-hoc fix that gives the result you are probably after.

            Code:
            twoway (rcap upper lower c3, lcolor(black)) || (scatter coef c3, lcolor(black) mcolor(black)) , legend(off) yscale(range(-0.3 .3)) ylabel(-.3 -.15 .15 .3) /*xscale(range(1 2))*/ xlabel(#3, valuelabel angle(45) labsize(small)) graphregion(color(white) ilcolor(black)) plotregion(lcolor(black))

            I would try to take a step back, though. Where does this dataset, especially the values in var8, come from? More fundamentally, I believe the graph that you are trying to create might be easier to produce using margins and marginsplot, or coefplot (SSC).
            Last edited by daniel klein; 01 Dec 2020, 15:14.


            • #7
              Originally posted by daniel klein
              No, Daniel, I just want to have, instead of the values of var8, the corresponding names from the string variable.


              • #8
                Maybe use -sencode- (installable from SSC)?

                Code:
                sencode variable, gen(c3) gsort(var8)


                • #9
                  Originally posted by Ali Atia
                  I've tried, but it still sorts in alphabetical order... I used exactly the same code you wrote.


                  • #10
                    Originally posted by Ali Atia
                    Sorry, Ali, I've tried again and it worked. Thanks a lot for pointing this out; it really helped!


                    • #11
                      I am glad that sencode worked for you. I have one further remark on how to improve your questions here on Statalist: Your example should illustrate your problem. Your original example is this:

                      Code:
                      clear
                      input str6 variable    coefficient    se    upper    lower    var8
                      "age"    0.03 0.05 0.01    0.06 1
                      "gender" 0.02 0.04 0.01    0.05 1.1
                      "income" 0.01 0.04 0.01    0.06 1.2
                      end
                      
                      encode variable , generate(c3)
                      Note that the values in variable variable are already in the correct order according to the values in variable var8. This is because the values in var8 are already sorted. Therefore, this example does not show the problem. Replacing encode with sencode does not change the result in this example at all.

                      Code:
                      *ssc install sencode
                      sencode variable, gen(c3_2) gsort(var8)
                      
                      list
                      list , nolabel

                      produces

                      Code:
                      . list
                      
                           +--------------------------------------------------------------------+
                           | variable   coeffi~t    se   upper   lower   var8       c3     c3_2 |
                           |--------------------------------------------------------------------|
                        1. |      age        .03   .05     .01     .06      1      age      age |
                        2. |   gender        .02   .04     .01     .05    1.1   gender   gender |
                        3. |   income        .01   .04     .01     .06    1.2   income   income |
                           +--------------------------------------------------------------------+
                      
                      . list , nolabel
                      
                           +--------------------------------------------------------------+
                           | variable   coeffi~t    se   upper   lower   var8   c3   c3_2 |
                           |--------------------------------------------------------------|
                        1. |      age        .03   .05     .01     .06      1    1      1 |
                        2. |   gender        .02   .04     .01     .05    1.1    2      2 |
                        3. |   income        .01   .04     .01     .06    1.2    3      3 |
                           +--------------------------------------------------------------+
                      where c3 and c3_2 are exactly the same.
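For the problem to show up at all, the order of var8 has to differ from the alphabetical order of the names. The sketch below uses hypothetical var8 values (rearranged, not the data from the thread) so that var8 orders the rows income, age, gender; encode still assigns its codes alphabetically.

```stata
* Hypothetical data: var8 orders the rows income, age, gender,
* which is not the alphabetical order of the names
clear
input str6 variable coefficient var8
"age"    0.03 1.1
"gender" 0.02 1.2
"income" 0.01 1
end

* encode assigns codes in alphabetical order of the strings
encode variable, generate(c3)

* show the codes next to var8, in var8 order
sort var8
list variable var8 c3, nolabel
```

Listed by var8, c3 runs 3, 1, 2. That is the situation in which sencode with gsort(var8), labmask, or elabel actually change the result.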


                      Here is how to get that same result (irrespective of the original sort order) with labmask:

                      Code:
                      *ssc install labutil
                      sort var8
                      generate `c(obs_t)' c3_3 = _n
                      labmask c3_3 , values(variable)
                      and here are two ways to get the same result with elabel (SSC):

                      Code:
                      *ssc install elabel
                      sort var8
                      elabel define c3_4_label = levels(variable) , uniq
                      encode variable , generate(c3_4) label(c3_4_label)
                      
                      sort var8
                      elabel define c3_5_label = encode(variable) , nosort
                      encode variable , generate(c3_5) label(c3_5_label)
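Without labutil or elabel installed, the idea behind the elabel approach above (define the value label in var8 order, then let encode use it through label()) can be sketched in official Stata with a loop. This is an illustration under assumptions: each name occurs only once in variable, the var8 values are hypothetical (chosen so that var8 orders the rows income, age, gender rather than alphabetically), and c3_6 and c3_6_label are hypothetical names.

```stata
* Hypothetical data: var8 orders the rows income, age, gender
clear
input str6 variable coefficient var8
"age"    0.03 1.1
"gender" 0.02 1.2
"income" 0.01 1
end

* put the rows in the desired order
sort var8

* build the value label 1 = first name, 2 = second name, ...
* (assumes each name occurs once in variable)
capture label drop c3_6_label
forvalues i = 1/`=_N' {
    label define c3_6_label `i' "`=variable[`i']'", add
}

* encode reuses the existing label, so the codes follow var8, not the alphabet
encode variable, generate(c3_6) label(c3_6_label)

list variable var8 c3_6
list variable var8 c3_6, nolabel
```

Because encode only adds mappings for strings the label does not already cover, every name here keeps the code it received from the loop.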


                      • #12
                        Originally posted by daniel klein
                        Thank you, Daniel! This really helps. I agree I chose a poor example, and I will certainly be more precise in my code next time I post a question. I really appreciate your help despite my initial confusion. Thanks for taking the time!


                        • #13
                          Thanks to Jeph Herrin and daniel klein for posting on labmask. Here is a footnote on that command.

                          Code:
                          search labmask


                          reveals, among other details,


                          SJ-8-2 gr0034 . . . . . . . . . . Speaking Stata: Between tables and graphs
                          (help labmask, seqvar if installed) . . . . . . . . . . . . N. J. Cox
                          Q2/08 SJ 8(2):269--289
                          outlines techniques for producing table-like graphs


                          labutil from http://fmwww.bc.edu/RePEc/bocode/l
                          'LABUTIL': modules for managing value and variable labels / labcopy copies
                          value labels, or swaps them around. labdel deletes / them. lablog defines
                          value labels for values which are base 10 / logarithms containing the
                          antilogged values. labcd defines value / labels in which decimal points

                          So while Daniel is entirely right (the files are part of the labutil package on SSC), I tend to cite the 2008 Stata Journal paper, which shows examples, as the source. The code is the same either way.

                          Another reason for citing the 2008 paper is that labutil is otherwise a ragbag of itty-bitty commands that, as far as I know, still do what they were intended to do when written some years ago, but are now largely superseded by Daniel's own command, elabel.
                          Last edited by Nick Cox; 02 Dec 2020, 03:23.
