Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a new variable by count of specific disease IC9 code across multiple diagnosis

    Hello,

    I am working on a dataset with almost 450,000 cases with several columns of diagnostic codes(DX), labeled DX1-DX30. Each row of Dx per case represents clinical diagnostic codes per patient. A case may have up to 30 obs/visits with different or similar diagnostic codes written in ICD-9 format (International Classification of Disease codes); these codes are string variables in the form "415.11". A patient could have 30 rows of different codes representing different or the same diagnoses.

    I want to be able to count the number of specific diagnoses (for instance number of code 415.11 or 415.13) per patient for all the Dx and create a new variable containing these numbers.

    I am not using the right commands as the results are not similar to what I am getting on SPSS (I am very familiar with the process, using SPSS). Please I need guidance in the right direction.

    Thanks.

    Olowu

  • #2
    Please present a data example using dataex for specific code suggestions (see FAQ Advice #12 for details). It may be easier to have the data in long layout with one DX variable and a categorical variable with the description. Then

    Code:
    help contract
    may help.
    Last edited by Andrew Musau; 24 Aug 2021, 06:14.

    Comment


    • #3
      from your description in #1 it appears that some of the Dx codes can be duplicated; it is, however, not clear whether you want a count of the number of codes or the number of distinct codes; if the former, that is easily done using -egen- with the "rownonmissing" function (since these are string variables, be sure to use the "strok" option; see
      Code:
      help egen
      if, however, you want the number of distinct (i.e., non-duplicated codes), then I recommend the following steps: (1) make a dataset that only includes these 30 variables and some id; (2) reshape from wide to long; (3) use the -duplicates- command to drop duplicates within an id; (4) drop observations with missing values on your Dx variable; (5) use -egen- with the "count" function to add a new variable with the count; (6) you will only need to keep one observation per id at this point (as you have you major data set, right?) so again use -duplicates- after dropping the Dx variable; (7) -merge- this data set back into your main data

      note that Andrew's suggestion of contract may be faster (or may not be)

      Comment


      • #4
        Hi Andrew,

        Apologies. I am new to this forum. A dataex sample of the data set is pasted below. I basically want to count the number of 41511 or 41513 or 41519 per row, and record the result in a new variable. Kindly assist with any command to achieve this

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str5(DX1 DX2 DX3 DX4 DX5)
        "45342" "41513" "51882" "4660"  "45340"
        "41519" "0389"  "99591" "5750"  "1830" 
        "41519" "49390" "29680" "33829" "7245" 
        "41519" "51881" "45341" "45342" "4536" 
        "41519" "51882" "340"   "33829" "79311"
        "41519" "51881" "45342" "5990"  "2851" 
        "4162"  "51884" "486"   "262"   "V667" 
        "28529" "486"   "41519" "70703" "70724"
        "2309"  "41519" "5601"  "5180"  "99749"
        "59381" "41519" "59080" "42820" "4254" 
        end
        Regards
        Banjo

        Comment


        • #5
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str5(DX1 DX2 DX3 DX4 DX5)
          "45342" "41513" "51882" "4660"  "45340"
          "41519" "0389"  "99591" "5750"  "1830"
          "41519" "49390" "29680" "33829" "7245"
          "41519" "51881" "45341" "45342" "4536"
          "41519" "51882" "340"   "33829" "79311"
          "41519" "51881" "45342" "5990"  "2851"
          "4162"  "51884" "486"   "262"   "V667"
          "28529" "486"   "41519" "70703" "70724"
          "2309"  "41519" "5601"  "5180"  "99749"
          "59381" "41519" "59080" "42820" "4254"
          end
          
          preserve
          gen obs_no=_n
          reshape long DX, i(obs_no) j(which)
          keep if inlist(DX, "41511", "41513", "41519")
          contract DX which
          l, sep(0)
          restore
          Res.:

          Code:
          . l, sep(0)
          
               +-----------------------+
               | which      DX   _freq |
               |-----------------------|
            1. |     2   41513       1 |
            2. |     1   41519       5 |
            3. |     2   41519       2 |
            4. |     3   41519       1 |
               +-----------------------+
          Here, "which" tells you which DX, e.g., which=2 is DX2, which=3 is DX3, and so on. _freq gives you the counts. If you want the counts of all 3 codes, then your contract command will be simply

          Code:
          contract which
          and record the result in a new variable.
          I do not see how this is possible within your wide dataset. You will have a variable for each DX variable. If you separate the codes, the number of variables will increase to (no. of DX variables \(\times\) no. of codes) assuming each code is present in each DX variable.

          Comment


          • #6
            Hi Andrew, is it possible to sum the obs_no per row and gen in a new variable?

            Comment


            • #7
              Like this?

              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input str5(DX1 DX2 DX3 DX4 DX5)
              "45342" "41513" "51882" "4660"  "45340"
              "41519" "0389"  "99591" "5750"  "1830"
              "41519" "49390" "29680" "33829" "7245"
              "41519" "51881" "45341" "45342" "4536"
              "41519" "51882" "340"   "33829" "79311"
              "41519" "51881" "45342" "5990"  "2851"
              "4162"  "51884" "486"   "262"   "V667"
              "28529" "486"   "41519" "70703" "70724"
              "2309"  "41519" "5601"  "5180"  "99749"
              "59381" "41519" "59080" "42820" "4254"
              end
              
              gen wanted= inlist(DX1, "41511", "41513", "41519")
              forval i=2/5{
                  replace wanted= wanted+ inlist(DX`i', "41511", "41513", "41519")
              }
              Res.:

              Code:
              . l, sep(0)
              
                   +------------------------------------------------+
                   |   DX1     DX2     DX3     DX4     DX5   wanted |
                   |------------------------------------------------|
                1. | 45342   41513   51882    4660   45340        1 |
                2. | 41519    0389   99591    5750    1830        1 |
                3. | 41519   49390   29680   33829    7245        1 |
                4. | 41519   51881   45341   45342    4536        1 |
                5. | 41519   51882     340   33829   79311        1 |
                6. | 41519   51881   45342    5990    2851        1 |
                7. |  4162   51884     486     262    V667        0 |
                8. | 28529     486   41519   70703   70724        1 |
                9. |  2309   41519    5601    5180   99749        1 |
               10. | 59381   41519   59080   42820    4254        1 |
                   +------------------------------------------------+
              
              .

              Comment


              • #8
                Perfect Andrew. Thanks alot. I guess I didn't express my thoughts as clearly as possible

                Comment

                Working...
                X