  • duplicates drop - dropping more than the duplicates

    Hello I am working on a remote dataset.

    As seen in the picture
    [Image: thumbnail_IMG_72572.jpg]

    Do you happen to know what I'm doing wrong?
    I tried -duplicates drop- on my personal computer with this dummy dataset

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float id str2 id2 float(pessary injection) byte test
    12 "1A" 1 1 0
    13 "2B" 1 9 1
    13 "2B" 1 9 .
    13 "2B" 1 4 .
    end

    Code:
    duplicates tag (pessary injection id id2), gen(test2)
    duplicates drop test, force
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float id str2 id2 float(pessary injection) byte(test test2)
    12 "1A" 1 1 0 0
    13 "2B" 1 9 1 1
    13 "2B" 1 9 . 1
    13 "2B" 1 4 . 0
    end

    As you can see, I would like to keep the bold ones, but instead it drops the last observation for 13 (a unique value) and, as requested, one of the duplicates.
    I tried the following so you can understand what I want, but of course & isn't allowed:

    duplicates drop test, force & keep if test2 == 0

    Do you know what I'm doing wrong?

    I just want to keep the first row for each duplicate, thereby converting them to unique values. I can do this manually, but I'm wondering, out of curiosity, what I'm doing wrong.
    Last edited by Denise Vella; 16 Oct 2023, 07:45.

  • #2
    List and tag the duplicates first before marking them for deletion. How will Stata know what counts as a duplicate? There needs to be an ID variable that identifies unique observations before you can determine whether there is a duplicate within the ID. Start with -duplicates list-, then -duplicates tag-.
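
A minimal sketch of that inspection step on the dummy data from #1, assuming that id, id2, pessary and injection together are what should define a duplicate. The variable name dup is hypothetical and chosen only for this illustration.

```stata
* Dummy data from #1
clear
input float id str2 id2 float(pessary injection) byte test
12 "1A" 1 1 0
13 "2B" 1 9 1
13 "2B" 1 9 .
13 "2B" 1 4 .
end

* Count how many observations have 0, 1, 2, ... surplus copies
duplicates report id id2 pessary injection

* Show the observations that are duplicated on these variables
duplicates list id id2 pessary injection

* dup (hypothetical name) holds the number of OTHER observations
* sharing the same values: 0 means the observation is unique
duplicates tag id id2 pessary injection, gen(dup)
list, sepby(id)
```

Nothing is dropped at this point; the commands only report and flag, so the tagged data can be checked before any deletion.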



    • #3
      I have tagged them, as you have seen in the code:
      duplicates tag (pessary injection id id2), gen(test2)



      • #4
        -duplicates drop- drops exactly what you ask for: duplicates in terms of scoresid. There are 281,097 observations with scoresid == 0; thus, 281,096 duplicates. Likewise, there are 18 observations with scoresid == 1; thus, 17 duplicates. Add those together to get 281,113 duplicates that get dropped.

        If you want something else, you need to tell Stata (or duplicates) what it is you want.
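
The same logic can be seen on the dummy data from #1. This is only a sketch using the variable test from that example, not the remote dataset in the picture: -duplicates drop- with a single variable keeps the first observation for each distinct value of that variable and drops the rest, whatever the other variables contain.

```stata
* Dummy data from #1
clear
input float id str2 id2 float(pessary injection) byte test
12 "1A" 1 1 0
13 "2B" 1 9 1
13 "2B" 1 9 .
13 "2B" 1 4 .
end

* test has three distinct values: 0, 1 and missing.
* Missing counts as a value here, so the two observations with
* test == . are duplicates of each other in terms of test.
duplicates report test

* Keeps the first observation per distinct value of test,
* so the fourth observation (injection == 4) is dropped
duplicates drop test, force
list, sepby(id)
```

The dropped observation is the one that is unique on id, id2, pessary and injection, because those variables were never named in the command.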



        • #5
          Code:
          duplicates drop id id2 pessary injection , force
          does what you ask for your example data. Whether that is what you really want, I cannot tell for sure.
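
For reference, a sketch of that command run on the dummy data from #1, followed by a -bysort- alternative. The alternative is an assumption about what may be wanted, namely control over which copy survives; ordering by test within each group is chosen here only for illustration.

```stata
* Dummy data from #1
clear
input float id str2 id2 float(pessary injection) byte test
12 "1A" 1 1 0
13 "2B" 1 9 1
13 "2B" 1 9 .
13 "2B" 1 4 .
end

* Option A: the command from this post. One observation is kept
* for each combination of id, id2, pessary and injection.
preserve
duplicates drop id id2 pessary injection, force
list, sepby(id)
restore

* Option B (assumption, for illustration): sort by test within each
* group so that the copy with the smallest nonmissing test comes
* first (missing sorts last), then keep only that first copy.
bysort id id2 pessary injection (test): keep if _n == 1
list, sepby(id)
```

Both options leave three observations here. Option B only matters when the copies differ on variables outside the list, as the two observations with injection == 9 do on test.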
