  • Tagging duplicates (ok) BUT only keeping one line of information


    Hello, I’ve made some progress over the course of the day and would like some help. Dataset found below:


    I have tagged the rows that hold duplicate information for each id in terms of oks and the components of another score (pain and scar).

    Where duplicatescores > 0, I would like to keep only one row of information.


    Therefore, for ID 13 I would like to keep just one row, which would remain in the dataset alongside ID 12.


    If I use


    Code:
    // Problem: drop one of these duplicate lines
    drop if duplicatescores == 1 // drops all of the tagged rows
    keep if duplicatescores == 1 // drops ID 12


    This will drop or keep all the rows…


    Is there an alternative where I keep just one row of data for ID 13 and ID 12 stays in the dataset?







    Dataset:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float id str2 id2 float(pain scar surgdate oks gender preop_date recpreop_date postop_date recpostop_date operationdate) byte(_merge duplicatescores)
    12 "1A" 1 1 22555 40 1 22533 22289 22770 22771 22555 3 0
    13 "2B" 1 9 11962  1 2 11933 11933 11963 11782 11962 3 0
    13 "2B" 1 9 11962 48 2 11932 11951 11963 11782 11962 3 1
    13 "2B" 1 9 11962 48 2 11932     .     .     .     . . 1
    end
    format %td surgdate
    format %td preop_date
    format %td recpreop_date
    format %td postop_date
    format %td recpostop_date
    format %td operationdate
    label values _merge _merge
    label def _merge 3 "Matched (3)", modify

  • #2
    Code:
    h duplicates
    especially "duplicates drop"
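
    A minimal sketch of how that might look, assuming duplicates are meant to be defined by id, oks, pain and scar (that choice of variables is an assumption, not something confirmed in the thread). The force option is required whenever a variable list is given, and the first occurrence in the current sort order is the row that is kept:

```stata
* Sketch only: the key variables (id oks pain scar) are an assumption
duplicates report id oks pain scar       // inspect how many surplus copies exist
duplicates drop id oks pain scar, force  // keep the first row of each group
```

    Rows that are already unique on those variables are left untouched, so ID 12 would stay in the dataset.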



    • #3
      Hmm! That didn't work as I wanted...


      Current dataset - I would expect Stata to keep the UNIQUE values and KEEP the first row of duplicates. So I would have expected Stata to keep ID 12 and the red boxes from ID 13
      [Image: duplicatesdrop.jpg]



      When I use the code:

      Code:
      //For those with duplicates in the score components
      duplicates tag oks pain scar, gen(duplicatescores)
      duplicates drop duplicatescores, force
      [Image: Screenshot 2023-10-15 at 15.53.57.png]



      It drops the unique row of ID 13 where duplicatescores == 0 and keeps the first row of the duplicates (which is what I want).

      Is there a way to do both, i.e. keep the unique values and keep the first row of the duplicates?

      I tried this:

      Code:
      keep if duplicatescores == 0 & duplicates drop duplicatescores, force
      // This didn't work
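
      For reference, two things seem to be going on in the attempts above. duplicates drop duplicatescores, force treats the tag itself as the key, so it leaves one row per distinct value of duplicatescores (one 0 and one 1), which is why the unique ID 13 row disappears. And keep if cannot be combined with duplicates drop through &, because duplicates drop is a command rather than an expression. A possible sketch that keeps every unique row plus one row per duplicate group is below. Grouping by id as well as the scores, and sorting on postop_date so that the more complete row comes first, are both assumptions:

```stata
* Sketch only: one row per combination of id, oks, pain and scar.
* Missing values sort last in Stata, so within each group the row
* with a recorded postop_date (if any) is the one that is kept.
bysort id oks pain scar (postop_date): keep if _n == 1
```

      On the example dataset this should leave three rows: ID 12, the ID 13 row with oks = 1, and the ID 13 row with oks = 48 that has its dates filled in.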
