Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to generate a new categorical variable from a string variable

    Hi

    The variable below is a string variable of procedures and I'm trying to generate a new categorical variable into:
    1 - Permanent implant (A041, A052, A053, A298, A299)
    2 - Temporary implant (A064, A074, A084,A085, A094, A104, A114, A335, A336)
    3 - No implant (All the other codes not listed above)

    Would appreciate if anyone could help with the code for this thanks

    Code:
      |
      procedure |
           code |      Freq.     Percent        Cum.
    ------------+-----------------------------------
           A041 |      2,266        1.27        1.27
           A042 |        581        0.33        1.59
           A043 |         48        0.03        1.62
           A048 |         37        0.02        1.64
           A049 |         47        0.03        1.67
           A051 |        917        0.51        2.18
           A052 |        203        0.11        2.30
           A053 |      2,875        1.61        3.91
           A058 |         86        0.05        3.95
           A059 |        113        0.06        4.02
           A061 |      1,494        0.84        4.85
           A062 |      4,078        2.28        7.14
           A063 |      1,738        0.97        8.11
           A064 |      1,092        0.61        8.72
           A065 |        107        0.06        8.78
           A068 |        145        0.08        8.87
           A069 |        426        0.24        9.10
           A071 |     14,548        8.15       17.25
           A072 |     15,109       10.26       27.51
           A073 |     11,234        6.29       33.80
           A074 |      4,264        2.39       36.19
           A075 |         64        0.37       36.56
           A078 |        692        0.39       36.94
           A079 |         58        1.54       38.49
           A081 |        497        0.28       38.77
           A082 |        109        0.06       38.83
           A083 |        388        0.22       39.05
           A084 |        130        0.07       39.12
           A085 |        381        0.21       39.33
           A086 |         12        0.01       39.34
           A088 |         60        0.03       39.37
           A089 |        111        0.06       39.43
           A091 |      1,710        0.96       40.39
           A092 |      1,884        1.06       41.45
           A093 |      2,178        1.22       42.67
           A094 |        514        0.29       42.96
           A095 |      1,238        0.69       43.65
           A096 |         80        0.04       43.69
           A098 |        116        0.06       43.76
           A099 |        418        0.23       43.99
           A101 |      1,659        0.93       44.92
           A102 |      4,343        2.43       47.36
           A103 |      5,539        3.10       50.46
           A104 |      1,279        0.72       51.17
           A105 |      3,180        1.78       52.96
           A106 |        275        0.15       53.11
           A108 |        313        0.18       53.29
           A109 |      1,218        0.68       53.97
           A111 |        658        0.37       54.34
           A112 |        933        0.52       54.86
           A113 |        691        0.39       55.25
           A114 |      2,827        1.58       56.83
           A116 |         24        0.01       56.84
           A118 |        289        0.16       57.00
           A119 |        667        0.37       57.38
           A291 |         88        0.05       57.43
           A292 |         55        0.03       57.46
           A293 |        143        0.08       57.54
           A294 |         94        0.05       57.59
           A298 |      2,590        1.45       59.04
           A299 |      1,246        0.70       59.74
           A331 |      7,476        4.19       63.93
           A332 |        409        0.23       64.16
           A333 |     1,037        8.98       73.14
           A334 |     7,891        6.10       79.24
           A335 |      9,347       10.84       90.08
           A336 |     8,632        5.96       96.03
           A337 |        391        0.22       96.25
           A338 |      1,248        0.70       96.95
           A339 |        739        0.41       97.37
           A341 |      3,328        1.86       99.23
           A411 |      1,039        0.58       99.81
           A414 |         18        0.01       99.82
           A471 |         85        0.05       99.87
           A478 |        140        0.08       99.95
           A479 |         69        0.04       99.99
           A661 |         14        0.01       99.99
           A662 |          2        0.00       99.99
           A668 |          8        0.00      100.00
           A669 |          1        0.00      100.00
    ------------+-----------------------------------
          Total |    145243      100.00

  • #2
    Something like this?
    Code:
    gen byte wanted = cond(inlist(procedure_code,"A041","A052","A053","A298","A299"), 1, ///
                    cond(inlist(procedure_code,"A064", "A074", "A084", "A085", "A094", "A104", "A114", "A335", "A336"), 2, ///
                    cond(!missing(procedure_code),3,.)))
    label define IMPLANT_TYPES 1 "Permanent implant" 2 "Temporary implant" 3 "No implant"
    label values wanted IMPLANT_TYPES
    By the way, for the second category, you had 9 codes, which is exactly the upper limit of the number of string comparisons allowed in the inlist() function. If you have more codes, you would need to have multiple inlist() functions separated by |
    Last edited by Hemanshu Kumar; 04 Nov 2022, 06:52.

    Comment


    • #3
      I doubt this problem is quite big enough, but for bigger problems with the same flavour it can be practical to record the mapping in a separate dataset and then merge.

      More at https://www.stata.com/support/faqs/d...s-for-subsets/

      Comment


      • #4
        Hi Hemanshu Kumar, I have exceeded the inlist limit and have tried creating multiple inlist as you have adviced but it shows 'cond not recognised' on stata


        Code:
         gen byte wanted = cond(inlist(procedure code,"A041","A052","A053","A298","A299"), 1, ///
                        cond(inlist(procedure code,"A064", "A074", "A084", "A085", "A094", "A095", "A104", " A105", "A114") | inlist(procedure_code,"A335", "A336"), 2,                ///       
                        cond(!missing(procedure_code),3,.)))
        Is there an error in the command above? Thanks

        Comment


        • #5
          Judging from what you typed, Stata may be confused because procedure_code appears twice as procedure code.

          My students and by extension I often find that underscores disappear on copy and paste from one application to another under Windows.

          Against that guess is that when I type a minimal test script


          Code:
          clear 
          
          input str4 procedure_code 
          "frog"
          "toad"
          "newt"
          end 
          
          gen whatever = /// 
          cond(inlist(procedure code,"A041"), 1, ///
          cond(inlist(procedure code,"A064") | inlist(procedure_code,"A335"), 2, ///       
          cond(!missing(procedure_code),3,.)))
          the error message is that procedurecode is not found, so Stata is trying to ignore the space.

          The advice in #3 still stands!

          Comment


          • #6
            Thank you Nick Cox

            I just tried your advice in #3 and it worked better, cheers

            Comment

            Working...
            X