Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • deleting multiple occurrences of the same cases within a row

    Hello all,

    I have a dataset with variables called drug1-drug25. In them, the same drug type can be identified multiple times per row in the data. I need to keep the first occurrence of the drug type, but set all subsequent occurrences to missing. For example, in the data sample below, the first row has 3 "Codeine." I need to keep the first occurrence of "Codeine" and set the remaining two occurrences to "".

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str33 drug1 str34(drug2 drug3)
    "Codeine"         "Codeine"         "Codeine"        
    "6-MAM"           "6-MAM"           "6-MAM"          
    "Cocaine"         "Cocaine"         "Cocaine"        
    "Benzoylecgonine" "Benzoylecgonine" "Benzoylecgonine"
    "Codeine"         "Codeine"         "Codeine"        
    "Methamphetamine" "Methamphetamine" "Methamphetamine"
    "THC"             "THC"             "Oxycodone"      
    "Morphine"        "Morphine"        "Alprazolam"     
    "Cocaine"         "Cocaine"         "Oxycodone"      
    "Amphetamine"     "Morphine"        "Morphine"       
    "Methamphetamine" "Hydrocodone"     "Methamphetamine"
    "MDA"             "MDMA"            "Methamphetamine"
    "6-MAM"           "6-MAM"           "Morphine"       
    "Benzoylecgonine" "Benzoylecgonine" "Benzoylecgonine"
    "6-MAM"           "6-MAM"           "6-MAM"          
    end
    As always, any help with code would be greatly appreciated!

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str33 drug1 str34(drug2 drug3)
    "Codeine"         "Codeine"         "Codeine"        
    "6-MAM"           "6-MAM"           "6-MAM"          
    "Cocaine"         "Cocaine"         "Cocaine"        
    "Benzoylecgonine" "Benzoylecgonine" "Benzoylecgonine"
    "Codeine"         "Codeine"         "Codeine"        
    "Methamphetamine" "Methamphetamine" "Methamphetamine"
    "THC"             "THC"             "Oxycodone"      
    "Morphine"        "Morphine"        "Alprazolam"     
    "Cocaine"         "Cocaine"         "Oxycodone"      
    "Amphetamine"     "Morphine"        "Morphine"       
    "Methamphetamine" "Hydrocodone"     "Methamphetamine"
    "MDA"             "MDMA"            "Methamphetamine"
    "6-MAM"           "6-MAM"           "Morphine"       
    "Benzoylecgonine" "Benzoylecgonine" "Benzoylecgonine"
    "6-MAM"           "6-MAM"           "6-MAM"          
    end
    
    gen long obsno = _n
    reshape long drug, i(obsno) j(which)
    
    bysort obsno drug (which) : replace drug = "" if _n > 1
    drop if missing(drug)
    
    bysort obsno (which) : replace which = _n 
    
    reshape wide drug, i(obsno) j(which)
    
    list 
    
         +---------------------------------------------------------+
         | obsno             drug1         drug2             drug3 |
         |---------------------------------------------------------|
      1. |     1           Codeine                                 |
      2. |     2             6-MAM                                 |
      3. |     3           Cocaine                                 |
      4. |     4   Benzoylecgonine                                 |
      5. |     5           Codeine                                 |
         |---------------------------------------------------------|
      6. |     6   Methamphetamine                                 |
      7. |     7               THC     Oxycodone                   |
      8. |     8          Morphine    Alprazolam                   |
      9. |     9           Cocaine     Oxycodone                   |
     10. |    10       Amphetamine      Morphine                   |
         |---------------------------------------------------------|
     11. |    11   Methamphetamine   Hydrocodone                   |
     12. |    12               MDA          MDMA   Methamphetamine |
     13. |    13             6-MAM      Morphine                   |
     14. |    14   Benzoylecgonine                                 |
     15. |    15             6-MAM                                 |
         +---------------------------------------------------------+

    Comment


    • #3
      Thanks so much, Nick! As always, it worked perfectly and was exactly what i was looking to do. Greatly appreciate the work you all do on this forum as it saves me a lot of time at work!

      Comment


      • #4
        I believe that the same end can be accomplished by looping, which goes against the Stata-ish culture of "don't explicitly loop if you can make Stata do it implicitly [via -sort-]," but I offer that approach out of some fondness for the loop. In an example I cooked up with 25 drugs and 1,000 observations, explicit looping was several times faster than the -reshape- approach, a difference which likely doesn't matter for the current application, but which might for other similar problems. I'd also note that the way I did the looping leaves a data set in which the first occurrence of a drug remains in its original numerical drug variable, so that e.g. in observation 9, "Oxycodone" first appears as drug3 and remains so. This may well not matter to Nick Richardson, but might in other situations. I suspect that the -reshape- approach might be modified to incorporate this feature, but I didn't try.

        Code:
        local top = 3  // would be 25 in the real application.
        // A canonical nested loop to examine all the pairs.
        forval i = 1/`=`top' -1' {
           forval j = `=`i' + 1'/`top' {
              quiet replace drug`j' = "" if (drug`j' == drug`i')
           }
        }
        list
        
             +-------------------------------------------------+
             |           drug1         drug2             drug3 |
             |-------------------------------------------------|
          1. |         Codeine                                 |
          2. |           6-MAM                                 |
          3. |         Cocaine                                 |
          4. | Benzoylecgonine                                 |
          5. |         Codeine                                 |
             |-------------------------------------------------|
          6. | Methamphetamine                                 |
          7. |             THC                       Oxycodone |
          8. |        Morphine                      Alprazolam |
          9. |         Cocaine                       Oxycodone |
         10. |     Amphetamine      Morphine                   |
             |-------------------------------------------------|
         11. | Methamphetamine   Hydrocodone                   |
         12. |             MDA          MDMA   Methamphetamine |
         13. |           6-MAM                        Morphine |
         14. | Benzoylecgonine                                 |
         15. |           6-MAM                                 |
             +-------------------------------------------------+

        Comment


        • #5
          Mike makes an excellent point. Although a loop over observations is typically shunned, might it still be faster or simpler?

          Here's yet another approach:

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str33 drug1 str34(drug2 drug3)
          "Codeine"         "Codeine"         "Codeine"        
          "6-MAM"           "6-MAM"           "6-MAM"          
          "Cocaine"         "Cocaine"         "Cocaine"        
          "Benzoylecgonine" "Benzoylecgonine" "Benzoylecgonine"
          "Codeine"         "Codeine"         "Codeine"        
          "Methamphetamine" "Methamphetamine" "Methamphetamine"
          "THC"             "THC"             "Oxycodone"      
          "Morphine"        "Morphine"        "Alprazolam"     
          "Cocaine"         "Cocaine"         "Oxycodone"      
          "Amphetamine"     "Morphine"        "Morphine"       
          "Methamphetamine" "Hydrocodone"     "Methamphetamine"
          "MDA"             "MDMA"            "Methamphetamine"
          "6-MAM"           "6-MAM"           "Morphine"       
          "Benzoylecgonine" "Benzoylecgonine" "Benzoylecgonine"
          "6-MAM"           "6-MAM"           "6-MAM"          
          end
          
          quietly mata : 
          
          data = st_sdata(., "drug1 drug2 drug3") 
          names = ("new1", "new2", "new3")
          
          for(i = 1; i <= rows(data); i++) { 
              row = uniqrows(data[i,]')        
              
              data[i,] = row', J(1, 3 - length(row), "")
          } 
          
          st_addvar("str80", names) 
          st_sstore(., names, data) 
              
          
          end 
          
          list new* 
          
               +-----------------------------------------------------+
               |            new1              new2              new3 |
               |-----------------------------------------------------|
            1. |         Codeine                                     |
            2. |           6-MAM                                     |
            3. |         Cocaine                                     |
            4. | Benzoylecgonine                                     |
            5. |         Codeine                                     |
               |-----------------------------------------------------|
            6. | Methamphetamine                                     |
            7. |       Oxycodone               THC                   |
            8. |      Alprazolam          Morphine                   |
            9. |         Cocaine         Oxycodone                   |
           10. |     Amphetamine          Morphine                   |
               |-----------------------------------------------------|
           11. |     Hydrocodone   Methamphetamine                   |
           12. |             MDA              MDMA   Methamphetamine |
           13. |           6-MAM          Morphine                   |
           14. | Benzoylecgonine                                     |
           15. |           6-MAM                                     |
               +-----------------------------------------------------+

          Comment

          Working...
          X