
  • Categorising numerous variables with different ranges

    Hi all,

    I have a range of test results & need to categorise them as low, normal, high, according to their normal ranges. I am looking to do this in an efficient way, & I'm not sure I've achieved that.

    So far I have the following code, which lets me quickly change the test and its range in the locals and then reuse the lower "generate categorical variable" section unchanged for each test:

    Code:
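    *generate locals
    local test wbc
    local low 2.8
    local high 7.7
    
    *generate categorical variable
    *inrange() includes both endpoints, so a value equal to low is first set to 1
    *and then overwritten with 2; values below 0 and missing values stay missing
    gen `test'_cat=.
    replace `test'_cat=1 if inrange(`test',0,`low')
    replace `test'_cat=2 if inrange(`test',`low',`high')
    *testing < . excludes extended missing values (.a, .b, ...) as well as .
    replace `test'_cat=3 if `test'>`high' & `test'<.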
    I have 10+ tests I need to run this code for, each with a different normal range. I am looking for a way to make this more efficient than copying the block above for each test and changing the test name and range.
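    One detail worth checking with cut-offs such as 2.8 and 7.7: variables created by -input- or -generate- are stored as float by default (assuming -set type- has not been changed), and 2.8 has no exact binary representation, so a stored 2.8 compares as slightly less than the literal 2.8. A minimal check with a single made-up value:

    Code:
    ```stata
    clear
    input wbc
    2.8
    end

    * wbc is float, so its stored 2.8 is slightly below the double-precision 2.8
    generate byte ge_literal = wbc >= 2.8
    * rounding the cut-off to float precision makes the comparison behave as intended
    generate byte ge_float = wbc >= float(2.8)
    list
    ```

    With the literal cut-off, a reading of exactly 2.8 ends up classed as low rather than normal in the code above; wrapping cut-offs in float() is one way round this when the test variables are float.
    
    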

    I am using Stata 14.2 for Windows.

    I would really appreciate any help/guidance on this.

    Thanks,
    Bryony
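    Short of a loop, one way to avoid copying the block is to wrap it in a small program that takes the variable and its range as arguments. This is a sketch only: the program name catrange is hypothetical, not a built-in Stata command. It uses cond() in place of the three -replace- statements and keeps the same boundaries as the code above (a value equal to low or high counts as normal), but it puts negative values in category 1 rather than leaving them missing:

    Code:
    ```stata
    * catrange is a hypothetical helper, not an official Stata command
    capture program drop catrange
    program define catrange
        version 14
        syntax varname(numeric), Low(real) High(real)
        * 1 = below low, 2 = low to high inclusive, 3 = above high;
        * any missing value gives a missing category
        generate byte `varlist'_cat = cond(missing(`varlist'), .,     ///
            cond(`varlist' > `high', 3, cond(`varlist' >= `low', 2, 1)))
    end

    * made-up sample data
    clear
    input wbc xbc
    1 2
    3 3
    9 9
    . 4
    end

    catrange wbc, low(2.8) high(7.7)
    catrange xbc, low(2.5) high(8.1)
    list
    ```

    Each further test then costs one line, and the ranges sit next to the variable names where they are easy to check.
    
    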

  • #2
    1. Why do you want to throw away information?

    2. As a general recipe, consider variations on

    Code:
    gen coarse = cond(missing(fine), ., cond(fine >= `high', 3, cond(fine >= `low', 2, 1)))
    or

    Code:
    gen coarse = (fine >= `high') +  (fine >= `low') + (fine >= 0) if fine < .
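    The two recipes agree for non-negative values but differ below 0: the nested cond() gives 1 there, while the sum of indicators gives 0. Also, because both use >=, a value equal to the high cut-off falls in category 3, whereas in Bryony's code it falls in category 2. A quick check on made-up values, with illustrative cut-offs 2.5 and 8.5 chosen because they are exact in binary:

    Code:
    ```stata
    clear
    input fine
    -1
    2
    2.5
    5
    8.5
    9
    .
    end

    local low 2.5
    local high 8.5

    generate byte coarse1 = cond(missing(fine), ., cond(fine >= `high', 3, cond(fine >= `low', 2, 1)))
    generate byte coarse2 = (fine >= `high') + (fine >= `low') + (fine >= 0) if fine < .

    * the recipes agree on every non-missing, non-negative value
    assert coarse1 == coarse2 if fine >= 0 & fine < .
    list
    ```
    
    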



    • #3
      Nick Cox gives great advice on simplifying your commands. Another way to simplify them is the -recode- command. However, it will take longer to run than the straightforward Stata functions Nick pointed to, which might be an issue depending on the size of the dataset.
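      With -recode-, the rules are applied in order and the first rule that matches a value wins, so where two ranges share an endpoint the earlier rule takes it; values that match no rule are copied unchanged into the new variable. A small illustration with made-up values and illustrative cut-offs 2.5 and 8.5:

      Code:
      ```stata
      clear
      input fine
      -1
      2.5
      5
      8.5
      9
      end

      * 2.5 matches the first rule and 8.5 the second, because earlier rules win;
      * -1 matches no rule, so it is copied unchanged into coarse
      recode fine (0/2.5 = 1) (2.5/8.5 = 2) (8.5/max = 3), generate(coarse)
      list
      ```
      
      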

      As for automating several such operations, you can (1) create a list with all the criteria you want to recode by, and (2) iterate over this list to apply your recode operations automatically. I try to illustrate this in the following example (with made-up sample data):
      Code:
      clear
      input wbc xbc
      1 1
      1 2
      2 2
      3 2
      3 3
      3 4
      4 4
      5 4
      5 5
      5 6
      6 6
      7 6
      7 7
      7 8
      8 8
      9 8
      end
      
      * syntax convention: <varname>|<low>|<high> <varname>|<low>|<high> [...]
      local iterations wbc|2.8|7.7 xbc|2.5|8.1
      
      * pre-define value label (replace lets the do-file be rerun)
      label define scoring 1 "low" 2 "normal" 3 "high", replace
      
      foreach iteration of local iterations {
          * split syntax into macros test, low and high
          gettoken test rest : iteration , parse("|")
          gettoken pipe rest : rest , parse("|")
          gettoken low rest : rest , parse("|")
          gettoken pipe high : rest , parse("|")
          * generate categorical variable; the first matching rule wins, so a value
          * equal to low gets 1 and a value equal to high gets 2; values below 0
          * match no rule and are copied unchanged
          recode `test' (0/`low'=1) (`low'/`high'=2) (`high'/max=3) , generate(`test'_cat)
          * attach value label to newly created variable
          label values `test'_cat scoring
      }
      Last but not least: if the lower and upper boundaries of the categories can be extracted from the data themselves, the process can be automated much further.

      Regards
      Bela
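      Bela's last point can be sketched as follows, assuming the reference ranges are kept in a small table with one row per test. The variable names test, low and high in that table, and the macros test1, low1, high1, ... built from it, are illustrative assumptions. The ranges are copied into macros first because Stata 14 holds only one dataset in memory at a time; storing low and high as double keeps cut-offs such as 2.8 from being rounded to float precision:

      Code:
      ```stata
      * hypothetical reference-range table: one row per test
      clear
      input str8 test double low double high
      "wbc" 2.8 7.7
      "xbc" 2.5 8.1
      end

      * copy each row into macros test1, low1, high1, test2, ...
      local ntests = _N
      forvalues i = 1/`ntests' {
          local test`i' = test[`i']
          local low`i' = low[`i']
          local high`i' = high[`i']
      }

      * made-up sample data (local macros survive -clear-)
      clear
      input wbc xbc
      1 2
      3 3
      9 9
      end

      forvalues i = 1/`ntests' {
          generate byte `test`i''_cat = cond(missing(`test`i''), .,     ///
              cond(`test`i'' > `high`i'', 3, cond(`test`i'' >= `low`i'', 2, 1)))
      }
      list
      ```

      In practice the range table would more likely be read from a file, for example with -import delimited- or -use-, before the test data are loaded.
      
      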



      • #4
        It's worth flagging that getting code short and sweet is quite possibly less important than having code that is as nearly as possible self-documenting, for yourself later, for collaborators and for consumers of your research.

        For example, I am a fan of inrange(), but for anyone new to Stata it won't necessarily be obvious whether inrange(x, a, b) corresponds to [a, b], [a, b), (a, b] or (a, b). Conversely, nested cond() calls are not as hard to read as is sometimes implied. An expression such as

        Code:
         cond(missing(fine), ., cond(fine >= `high', 3, cond(fine >= `low', 2, 1)))
        can, with a little instruction and practice, be read off as

        Code:
        if missing           return system missing
        otherwise if >= high return 3
        otherwise if >= low  return 2
        otherwise            return 1
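        For the record, inrange(x, a, b) is 1 when a <= x <= b, so it corresponds to [a, b], and it returns 0 (not missing) when x is missing and a and b are not. Running it on a few made-up values is a quick way to confirm this:

        Code:
        ```stata
        clear
        input x
        2.4
        2.5
        5
        8.5
        8.6
        .
        end

        * both endpoints count as inside; a missing x gives 0, not missing
        generate byte in_ab = inrange(x, 2.5, 8.5)
        list
        ```

        The second point matters for code like Bryony's: missing values never receive a category from inrange(), which is why only the third -replace- needs an explicit missing-value test.
        
        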



        • #5
          Thank you so much; these comments have all been very helpful, and I've been able to refine my code significantly.
