Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to generate categorical variable usign quantile

    Hi, I need to create a Categorical variable which takes value based on the distribution of another one. My variable newproxy1_t range from 0 to 1. I would like to create a categorical variable equal 0 if newproxy1_t is 0, 1 for the value below the 0.6 quantile (except if newproxy1_t==0) and 2 if the values of newproxy1_t are above 0.6 quantile. I need help with the command.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float newproxy1_t
      .002398492
      .000760572
     .0009255342
     .0043989443
     .0009469199
      .013322884
     .0004572822
        .8344262
      .020992367
      .012115093
      .012616316
       .02273702
      .012616316
       .00041731
       .29005027
       .22095495
     .0001596679
     .0007194245
       .00065185
     .0004572822
        .5969549
    .00022686025
     .0008383903
      .009343715
     .0010445962
      .004448524
     .0001596679
     .0011673005
     .0011495574
      .001015916
        .3489583
        .5289608
     .0008383903
     .0032484184
     .0001596679
    .00011327594
     .0032484184
     .0011673005
     .0011740534
      .017035775
     .0004572822
     .0042737895
     .0011793916
        .6494555
      .005045526
       .07719298
      .000760572
      .005729085
       .03750375
      .009343715
      .024712645
     .0032484184
      .016735602
    .00011327594
     .0014688977
     .0014688977
     .0004572822
       .01017355
        .0190163
      .001015916
       .01124157
     .0042737895
       .00041731
     .0011673005
        .0996013
      .013322884
       .00777172
      .006516427
     .0011673005
    .00011327594
        .8555676
      .001973944
     .0024109297
       .00967118
      .013322884
       .00967118
        .1122975
     .0043989443
      .000961756
     .0011495574
       .27229196
      .007041927
      .007041927
     .0009313878
       .00065185
     .0009469199
      .005789594
      .008631085
        .4254324
     .0011673005
       .00065185
        .6715117
       .03981855
        .5496749
     .0043989443
      .019243853
      .007041927
      .004532125
      .005718954
        .5991238
    end


    Thanks for the help.
    Last edited by Enrico Azzini; 24 Jan 2022, 03:00.

  • #2
    Enrico:
    do you mean something along the following lines?
    Code:
      pctile pct = newproxy1_t , nq(10) genp(dec_newproxy1_t )
    
    
    . list pct if dec_newproxy1_t==60
    
         +----------+
         |      pct |
         |----------|
      6. | .0070419 |
         +----------+
    
    . g wanted=1 if newproxy1_t<=.0070419
    
    . replace wanted=2 if newproxy1_t>.0070419 & newproxy1_t!=.
    
    
    . list
    
         +-----------------------------------------+
         | newpro~t        pct   dec_ne~t   wanted |
         |-----------------------------------------|
      1. | .0023985   .0004573         10        1 |
      2. | .0007606   .0008384         20        1 |
      3. | .0009255   .0011496         30        1 |
      4. | .0043989   .0017214         40        1 |
      5. | .0009469   .0043989         50        1 |
         |-----------------------------------------|
      6. | .0133229   .0070419         60        2 |
      7. | .0004573   .0123657         70        1 |
      8. | .8344262   .0218647         80        2 |
      9. | .0209924   .3195043         90        2 |
     10. | .0121151          .          .        2 |
         |-----------------------------------------|
     11. | .0126163          .          .        2 |
     12. |  .022737          .          .        2 |
     13. | .0126163          .          .        2 |
     14. | .0004173          .          .        1 |
     15. | .2900503          .          .        2 |
         |-----------------------------------------|
     16. |  .220955          .          .        2 |
     17. | .0001597          .          .        1 |
     18. | .0007194          .          .        1 |
     19. | .0006519          .          .        1 |
     20. | .0004573          .          .        1 |
         |-----------------------------------------|
     21. | .5969549          .          .        2 |
     22. | .0002269          .          .        1 |
     23. | .0008384          .          .        1 |
     24. | .0093437          .          .        2 |
     25. | .0010446          .          .        1 |
         |-----------------------------------------|
     26. | .0044485          .          .        1 |
     27. | .0001597          .          .        1 |
     28. | .0011673          .          .        1 |
     29. | .0011496          .          .        1 |
     30. | .0010159          .          .        1 |
         |-----------------------------------------|
     31. | .3489583          .          .        2 |
     32. | .5289608          .          .        2 |
     33. | .0008384          .          .        1 |
     34. | .0032484          .          .        1 |
     35. | .0001597          .          .        1 |
         |-----------------------------------------|
     36. | .0001133          .          .        1 |
     37. | .0032484          .          .        1 |
     38. | .0011673          .          .        1 |
     39. | .0011741          .          .        1 |
     40. | .0170358          .          .        2 |
         |-----------------------------------------|
     41. | .0004573          .          .        1 |
     42. | .0042738          .          .        1 |
     43. | .0011794          .          .        1 |
     44. | .6494555          .          .        2 |
     45. | .0050455          .          .        1 |
         |-----------------------------------------|
     46. |  .077193          .          .        2 |
     47. | .0007606          .          .        1 |
     48. | .0057291          .          .        1 |
     49. | .0375037          .          .        2 |
     50. | .0093437          .          .        2 |
         |-----------------------------------------|
     51. | .0247126          .          .        2 |
     52. | .0032484          .          .        1 |
     53. | .0167356          .          .        2 |
     54. | .0001133          .          .        1 |
     55. | .0014689          .          .        1 |
         |-----------------------------------------|
     56. | .0014689          .          .        1 |
     57. | .0004573          .          .        1 |
     58. | .0101735          .          .        2 |
     59. | .0190163          .          .        2 |
     60. | .0010159          .          .        1 |
         |-----------------------------------------|
     61. | .0112416          .          .        2 |
     62. | .0042738          .          .        1 |
     63. | .0004173          .          .        1 |
     64. | .0011673          .          .        1 |
     65. | .0996013          .          .        2 |
         |-----------------------------------------|
     66. | .0133229          .          .        2 |
     67. | .0077717          .          .        2 |
     68. | .0065164          .          .        1 |
     69. | .0011673          .          .        1 |
     70. | .0001133          .          .        1 |
         |-----------------------------------------|
     71. | .8555676          .          .        2 |
     72. | .0019739          .          .        1 |
     73. | .0024109          .          .        1 |
     74. | .0096712          .          .        2 |
     75. | .0133229          .          .        2 |
         |-----------------------------------------|
     76. | .0096712          .          .        2 |
     77. | .1122975          .          .        2 |
     78. | .0043989          .          .        1 |
     79. | .0009618          .          .        1 |
     80. | .0011496          .          .        1 |
         |-----------------------------------------|
     81. |  .272292          .          .        2 |
     82. | .0070419          .          .        2 |
     83. | .0070419          .          .        2 |
     84. | .0009314          .          .        1 |
     85. | .0006519          .          .        1 |
         |-----------------------------------------|
     86. | .0009469          .          .        1 |
     87. | .0057896          .          .        1 |
     88. | .0086311          .          .        2 |
     89. | .4254324          .          .        2 |
     90. | .0011673          .          .        1 |
         |-----------------------------------------|
     91. | .0006519          .          .        1 |
     92. | .6715117          .          .        2 |
     93. | .0398186          .          .        2 |
     94. | .5496749          .          .        2 |
     95. | .0043989          .          .        1 |
         |-----------------------------------------|
     96. | .0192439          .          .        2 |
     97. | .0070419          .          .        2 |
     98. | .0045321          .          .        1 |
     99. |  .005719          .          .        1 |
    100. | .5991238          .          .        2 |
         +-----------------------------------------+
    
    .
    As an aside, please note that is a better habit to define two-level categiorical variables as 0/1 (following the way Stata reorders them internally) than 1/2.
    Kind regards,
    Carlo
    (Stata 18.0 SE)

    Comment


    • #3
      Code:
      ._pctile newproxy1_t, p(60)
       
      . gen wanted = cond(newproxy1_t == 0, 0, 1 + (newproxy1_t >= r(r1)))

      Comment


      • #4
        Thank you both for the help.
        Carlo, the categorical variable that I want to create have three level: 0, 1 and 2. Just to be sure, the variables pct and dec_newproxy1_t have only 9 values, is it becauese they display only once the values of newproxy1_t associated to a certain percentile and the corresponding percentile right?

        Comment


        • #5
          Enrico:
          1) you're right, I misread your previous post.
          Hence my proposed code (that is outperformed by Nick's one) becomes:
          Code:
          . g wanted=0 if newproxy1_t==0
          
          . replace wanted=1 if newproxy1_t<=.0070419 & newproxy1_t>0
          
          . replace wanted=2 if newproxy1_t>.0070419 & newproxy1_t!=.
          
          
          . list
          
               +-----------------------------------------+
               | newpro~t        pct   dec_ne~t   wanted |
               |-----------------------------------------|
            1. | .0023985   .0004573         10        1 |
            2. | .0007606   .0008384         20        1 |
            3. | .0009255   .0011496         30        1 |
            4. | .0043989   .0017214         40        1 |
            5. | .0009469   .0043989         50        1 |
               |-----------------------------------------|
            6. | .0133229   .0070419         60        2 |
            7. | .0004573   .0123657         70        1 |
            8. | .8344262   .0218647         80        2 |
            9. | .0209924   .3195043         90        2 |
           10. | .0121151          .          .        2 |
               |-----------------------------------------|
           11. | .0126163          .          .        2 |
           12. |  .022737          .          .        2 |
           13. | .0126163          .          .        2 |
           14. | .0004173          .          .        1 |
           15. | .2900503          .          .        2 |
               |-----------------------------------------|
           16. |  .220955          .          .        2 |
           17. | .0001597          .          .        1 |
           18. | .0007194          .          .        1 |
           19. | .0006519          .          .        1 |
           20. | .0004573          .          .        1 |
               |-----------------------------------------|
           21. | .5969549          .          .        2 |
           22. | .0002269          .          .        1 |
           23. | .0008384          .          .        1 |
           24. | .0093437          .          .        2 |
           25. | .0010446          .          .        1 |
               |-----------------------------------------|
           26. | .0044485          .          .        1 |
           27. | .0001597          .          .        1 |
           28. | .0011673          .          .        1 |
           29. | .0011496          .          .        1 |
           30. | .0010159          .          .        1 |
               |-----------------------------------------|
           31. | .3489583          .          .        2 |
           32. | .5289608          .          .        2 |
           33. | .0008384          .          .        1 |
           34. | .0032484          .          .        1 |
           35. | .0001597          .          .        1 |
               |-----------------------------------------|
           36. | .0001133          .          .        1 |
           37. | .0032484          .          .        1 |
           38. | .0011673          .          .        1 |
           39. | .0011741          .          .        1 |
           40. | .0170358          .          .        2 |
               |-----------------------------------------|
           41. | .0004573          .          .        1 |
           42. | .0042738          .          .        1 |
           43. | .0011794          .          .        1 |
           44. | .6494555          .          .        2 |
           45. | .0050455          .          .        1 |
               |-----------------------------------------|
           46. |  .077193          .          .        2 |
           47. | .0007606          .          .        1 |
           48. | .0057291          .          .        1 |
           49. | .0375037          .          .        2 |
           50. | .0093437          .          .        2 |
               |-----------------------------------------|
           51. | .0247126          .          .        2 |
           52. | .0032484          .          .        1 |
           53. | .0167356          .          .        2 |
           54. | .0001133          .          .        1 |
           55. | .0014689          .          .        1 |
               |-----------------------------------------|
           56. | .0014689          .          .        1 |
           57. | .0004573          .          .        1 |
           58. | .0101735          .          .        2 |
           59. | .0190163          .          .        2 |
           60. | .0010159          .          .        1 |
               |-----------------------------------------|
           61. | .0112416          .          .        2 |
           62. | .0042738          .          .        1 |
           63. | .0004173          .          .        1 |
           64. | .0011673          .          .        1 |
           65. | .0996013          .          .        2 |
               |-----------------------------------------|
           66. | .0133229          .          .        2 |
           67. | .0077717          .          .        2 |
           68. | .0065164          .          .        1 |
           69. | .0011673          .          .        1 |
           70. | .0001133          .          .        1 |
               |-----------------------------------------|
           71. | .8555676          .          .        2 |
           72. | .0019739          .          .        1 |
           73. | .0024109          .          .        1 |
           74. | .0096712          .          .        2 |
           75. | .0133229          .          .        2 |
               |-----------------------------------------|
           76. | .0096712          .          .        2 |
           77. | .1122975          .          .        2 |
           78. | .0043989          .          .        1 |
           79. | .0009618          .          .        1 |
           80. | .0011496          .          .        1 |
               |-----------------------------------------|
           81. |  .272292          .          .        2 |
           82. | .0070419          .          .        2 |
           83. | .0070419          .          .        2 |
           84. | .0009314          .          .        1 |
           85. | .0006519          .          .        1 |
               |-----------------------------------------|
           86. | .0009469          .          .        1 |
           87. | .0057896          .          .        1 |
           88. | .0086311          .          .        2 |
           89. | .4254324          .          .        2 |
           90. | .0011673          .          .        1 |
               |-----------------------------------------|
           91. | .0006519          .          .        1 |
           92. | .6715117          .          .        2 |
           93. | .0398186          .          .        2 |
           94. | .5496749          .          .        2 |
           95. | .0043989          .          .        1 |
               |-----------------------------------------|
           96. | .0192439          .          .        2 |
           97. | .0070419          .          .        2 |
           98. | .0045321          .          .        1 |
           99. |  .005719          .          .        1 |
          100. | .5991238          .          .        2 |
               +-----------------------------------------+
          
          .
          2) Correct.
          Kind regards,
          Carlo
          (Stata 18.0 SE)

          Comment


          • #6
            There are no missings in the data example but


            Code:
             
             gen wanted = cond(newproxy1_t == 0, 0, 1 + (newproxy1_t >= r(r1)) if newproxy1_t < .
            would in general do no harm and would avoid missings being mapped to 1.

            Comment


            • #7
              thank you very much!

              Comment


              • #8
                Typo in #6: the danger is that missings would be mapped to 2 if not excluded.

                Comment

                Working...
                X