Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Counting unique values in a series of string variables.

    Given the example data below, I'm trying to count the number of unique drugs per patient ID. For example, patient 9 would have a unique drug count of 5, and patient 68 would have a count of 3.

    I've been attempting to adapt some code elsewhere (https://www.stata.com/support/faqs/d...tinct-strings/), such that the counts are provided for each ID rather than each row, but I'm struggling. Any advice would be hugely appreciated.

    Code:
    clear
    input float id str16(regimen_drugs_1 regimen_drugs_2) str11 regimen_drugs_3
     1 ""                 ""                 ""          
     2 ""                 ""                 ""          
     3 ""                 ""                 ""          
     4 ""                 ""                 ""          
     5 ""                 ""                 ""          
     6 ""                 ""                 ""          
     7 ""                 ""                 ""          
     8 ""                 ""                 ""          
     9 "CARBOPLATIN"      "ETOPOSIDE"        ""          
     9 "CYCLOPHOSPHAMIDE" "DOXORUBICIN"      "VINCRISTINE"
    10 ""                 ""                 ""          
    11 ""                 ""                 ""          
    12 ""                 ""                 ""          
    13 ""                 ""                 ""          
    14 ""                 ""                 ""          
    15 ""                 ""                 ""          
    16 "GEMCITABINE"      ""                 ""          
    17 "CYCLOPHOSPHAMIDE" "DOXORUBICIN"      "VINCRISTINE"
    17 "EPIRUBICIN"       "CYCLOPHOSPHAMIDE" ""          
    18 ""                 ""                 ""          
    19 ""                 ""                 ""          
    20 "CARBOPLATIN"      "ETOPOSIDE"        ""          
    21 "CARBOPLATIN"      "ETOPOSIDE"        ""          
    21 "CYCLOPHOSPHAMIDE" "DOXORUBICIN"      "VINCRISTINE"
    22 ""                 ""                 ""          
    23 ""                 ""                 ""          
    24 ""                 ""                 ""          
    25 ""                 ""                 ""          
    26 ""                 ""                 ""          
    27 ""                 ""                 ""          
    28 ""                 ""                 ""          
    29 ""                 ""                 ""          
    30 ""                 ""                 ""          
    31 ""                 ""                 ""          
    32 ""                 ""                 ""          
    33 ""                 ""                 ""          
    33 "CARBOPLATIN"      "ETOPOSIDE"        ""          
    33 "CARBOPLATIN"      "ETOPOSIDE"        ""          
    34 ""                 ""                 ""          
    35 ""                 ""                 ""          
    36 ""                 ""                 ""          
    37 "CARBOPLATIN"      "ETOPOSIDE"        ""          
    37 "CARBOPLATIN"      "ETOPOSIDE"        ""          
    38 ""                 ""                 ""          
    39 ""                 ""                 ""          
    40 ""                 ""                 ""          
    41 ""                 ""                 ""          
    42 "CARBOPLATIN"      "ETOPOSIDE"        ""          
    42 "CISPLATIN"        "ETOPOSIDE"        ""          
    43 "CARBOPLATIN"      "ETOPOSIDE"        ""          
    44 ""                 ""                 ""          
    45 "CARBOPLATIN"      "ETOPOSIDE"        ""          
    45 "CARBOPLATIN"      "ETOPOSIDE"        ""          
    46 ""                 ""                 ""          
    47 ""                 ""                 ""          
    48 ""                 ""                 ""          
    49 ""                 ""                 ""          
    50 ""                 ""                 ""          
    51 ""                 ""                 ""          
    52 ""                 ""                 ""          
    53 ""                 ""                 ""          
    54 ""                 ""                 ""          
    55 "CARBOPLATIN"      "ETOPOSIDE"        ""          
    56 ""                 ""                 ""          
    57 "CARBOPLATIN"      "ETOPOSIDE"        ""          
    57 "CARBOPLATIN"      "ETOPOSIDE"        ""          
    58 "CARBOPLATIN"      "ETOPOSIDE"        ""          
    59 ""                 ""                 ""          
    60 ""                 ""                 ""          
    61 ""                 ""                 ""          
    62 ""                 ""                 ""          
    63 ""                 ""                 ""          
    64 ""                 ""                 ""          
    65 ""                 ""                 ""          
    66 ""                 ""                 ""          
    67 ""                 ""                 ""          
    68 "PAZOPANIB"        ""                 ""          
    68 "CARBOPLATIN"      "ETOPOSIDE"        ""          
    68 "CARBOPLATIN"      ""                 ""          
    68 "CARBOPLATIN"      "ETOPOSIDE"        ""          
    69 ""                 ""                 ""          
    70 "CARBOPLATIN"      "ETOPOSIDE"        ""          
    71 ""                 ""                 ""          
    72 ""                 ""                 ""          
    73 ""                 ""                 ""          
    74 ""                 ""                 ""          
    75 ""                 ""                 ""          
    76 ""                 ""                 ""          
    77 ""                 ""                 ""          
    78 ""                 ""                 ""          
    79 ""                 ""                 ""          
    80 "CARBOPLATIN"      ""                 ""          
    81 ""                 ""                 ""          
    82 ""                 ""                 ""          
    83 "CISPLATIN"        "ETOPOSIDE"        ""          
    84 ""                 ""                 ""          
    85 ""                 ""                 ""          
    86 ""                 ""                 ""          
    87 ""                 ""                 ""          
    88 ""                 ""                 ""          
    end

    Last edited by Craig Knott; 23 Jul 2018, 06:59.

  • #2
    I have concatenated the variables into a single string variable, and then concatenated all of the rows into a single space delimited string per ID (see below).

    All I need to do now is find a way of creating a numeric variable that provides a count of distinct words in the string 'regimen_drugs_combined'. I have tried -rowsvals-, but this just codes each row with the value of 1. It's possible I'm using it wrong.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id str200 regimen_drugs_combined
     1 "   "                                                                  
     2 "   "                                                                  
     3 "   "                                                                  
     4 "   "                                                                  
     5 "   "                                                                  
     6 "   "                                                                  
     7 "   "                                                                  
     8 "   "                                                                  
     9 "CARBOPLATIN ETOPOSIDE CYCLOPHOSPHAMIDE DOXORUBICIN VINCRISTINE  "      
     9 "CYCLOPHOSPHAMIDE DOXORUBICIN VINCRISTINE   "                          
    10 "   "                                                                  
    11 "   "                                                                  
    12 "   "                                                                  
    13 "   "                                                                  
    14 "   "                                                                  
    15 "   "                                                                  
    16 "GEMCITABINE   "                                                        
    17 "CYCLOPHOSPHAMIDE DOXORUBICIN VINCRISTINE EPIRUBICIN CYCLOPHOSPHAMIDE  "
    17 "EPIRUBICIN CYCLOPHOSPHAMIDE   "                                        
    18 "   "                                                                  
    19 "   "                                                                  
    20 "CARBOPLATIN ETOPOSIDE   "                                              
    21 "CYCLOPHOSPHAMIDE DOXORUBICIN VINCRISTINE CARBOPLATIN ETOPOSIDE  "      
    21 "CARBOPLATIN ETOPOSIDE   "                                              
    22 "   "                                                                  
    23 "   "                                                                  
    24 "   "                                                                  
    25 "   "                                                                  
    26 "   "                                                                  
    27 "   "                                                                  
    28 "   "                                                                  
    29 "   "                                                                  
    30 "   "                                                                  
    31 "   "                                                                  
    32 "   "                                                                  
    33 "CARBOPLATIN ETOPOSIDE  CARBOPLATIN ETOPOSIDE "                        
    33 " CARBOPLATIN ETOPOSIDE  "                                              
    33 "CARBOPLATIN ETOPOSIDE   "                                              
    34 "   "                                                                  
    35 "   "                                                                  
    36 "   "                                                                  
    37 "CARBOPLATIN ETOPOSIDE CARBOPLATIN ETOPOSIDE  "                        
    37 "CARBOPLATIN ETOPOSIDE   "                                              
    38 "   "                                                                  
    39 "   "                                                                  
    40 "   "                                                                  
    41 "   "                                                                  
    42 "CARBOPLATIN ETOPOSIDE CISPLATIN ETOPOSIDE  "                          
    42 "CISPLATIN ETOPOSIDE   "                                                
    43 "CARBOPLATIN ETOPOSIDE   "                                              
    44 "   "                                                                  
    45 "CARBOPLATIN ETOPOSIDE CARBOPLATIN ETOPOSIDE  "                        
    45 "CARBOPLATIN ETOPOSIDE   "                                              
    46 "   "                                                                  
    47 "   "                                                                  
    48 "   "                                                                  
    49 "   "                                                                  
    50 "   "                                                                  
    51 "   "                                                                  
    52 "   "                                                                  
    53 "   "                                                                  
    54 "   "                                                                  
    55 "CARBOPLATIN ETOPOSIDE   "                                              
    56 "   "                                                                  
    57 "CARBOPLATIN ETOPOSIDE CARBOPLATIN ETOPOSIDE  "                        
    57 "CARBOPLATIN ETOPOSIDE   "                                              
    58 "CARBOPLATIN ETOPOSIDE   "                                              
    59 "   "                                                                  
    60 "   "                                                                  
    61 "   "                                                                  
    62 "   "                                                                  
    63 "   "                                                                  
    64 "   "                                                                  
    65 "   "                                                                  
    66 "   "                                                                  
    67 "   "                                                                  
    68 "CARBOPLATIN ETOPOSIDE CARBOPLATIN ETOPOSIDE CARBOPLATIN PAZOPANIB"    
    68 "CARBOPLATIN ETOPOSIDE CARBOPLATIN PAZOPANIB "                          
    68 "CARBOPLATIN PAZOPANIB  "                                              
    68 "PAZOPANIB   "                                                          
    69 "   "                                                                  
    70 "CARBOPLATIN ETOPOSIDE   "                                              
    71 "   "                                                                  
    72 "   "                                                                  
    73 "   "                                                                  
    74 "   "                                                                  
    75 "   "                                                                  
    76 "   "                                                                  
    77 "   "                                                                  
    78 "   "                                                                  
    79 "   "                                                                  
    80 "CARBOPLATIN   "                                                        
    81 "   "                                                                  
    82 "   "                                                                  
    83 "CISPLATIN ETOPOSIDE   "                                                
    84 "   "                                                                  
    85 "   "                                                                  
    86 "   "                                                                  
    87 "   "                                                                  
    88 "   "                                                                  
    end
    Last edited by Craig Knott; 23 Jul 2018, 08:21.

    Comment


    • #3
      Here's an option based on your first post (without concatenating the regimens):

      Code:
      sort id
      gen new_id=_n
      reshape long regimen_drugs_, i(new_id) j(reg)
      egen tag=tag(id regimen_drugs_)
      bysort id: egen num_drug=sum(tag)
      drop tag
      reshape wide
      drop new_id
      order id, first
      list, sepby(id)
      Stata/MP 14.1 (64-bit x86-64)
      Revision 19 May 2016
      Win 8.1

      Comment


      • #4
        This also works, again using the exmaple in post no1:

        Code:
        levelsof id, local(ids)
        gen drug_ct_all=.
        foreach idno in `ids'{
            levelsof regimen_drugs_1 if id==`idno', local(drug1)
            levelsof regimen_drugs_2 if id==`idno', local(drug2)
            levelsof regimen_drugs_3 if id==`idno', local(drug3)
            local alldrugs `drug1' `drug2' `drug3'
            local no_drugs : word count `alldrugs'
            replace drug_ct_all=`no_drugs' if id==`idno'
        }

        Comment


        • #5
          See also the -egen- function -rowsvals()- from -egenmore- (SSC). Discussed in the 2009 article "Rowwise" in the Stata Journal. Search also for mentions here.

          Comment


          • #6
            With regards to post #4, that one actually doesn't properly take into account duplicate words found in different variables. Instead you could use this:

            Code:
            levelsof id, local(ids)
            gen drug_ct_all=.
            foreach idno in `ids'{
                levelsof regimen_drugs_1 if id==`idno', local(drug1)
                levelsof regimen_drugs_2 if id==`idno', local(drug2)
                levelsof regimen_drugs_3 if id==`idno', local(drug3)
                local tempdrugs12: list drug1 | drug2
                local alldrugs : list tempdrugs12 | drug3
                local no_drugs : word count `alldrugs'
                replace drug_ct_all=`no_drugs' if id==`idno'
            }

            Comment


            • #7
              Thank you all for your input. Always interesting to see different approaches to the same problem. The issue is now resolved thanks to your help.

              Hopefully others will also find this thread of value in future.

              Comment

              Working...
              X