  • Need Help to Remove Duplicates

    Hi Everyone,

    Please, I need help removing duplicates in Stata. I have decided to post again because I didn't do a good job of explaining the problem in my original post. I have a dataset that contains over 200 observations. I want to remove the duplicates and retain a single observation for each ID. For an ID with both Yes and null values, I want to keep the Yes. For an ID with both No and null values, I want to keep the No. For an ID with only null values, I want to keep just one null.
    ID Number Value
    D711 null
    D711 null
    D711 null
    D711 Yes
    D714 No
    D714 null
    D714 null
    D715 Yes
    D715 null
    D722 null
    D722 null
    D729 No
    D729 null
    D722 Yes
    D723 null
    D723 null
    D723 null
    D728 null
    D728 null
    D728 null

  • #2
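    Is "Value" a string variable? If so, you just have to keep the first observation for each ID after sorting. Strings sort with uppercase letters ahead of lowercase ones, so "No" and "Yes" come before "null" within each ID.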

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str4(idnumber value)
    "D711" "null"
    "D711" "null"
    "D711" "null"
    "D711" "Yes"
    "D714" "No"  
    "D714" "null"
    "D714" "null"
    "D715" "Yes"
    "D715" "null"
    "D722" "null"
    "D722" "null"
    "D729" "No"  
    "D729" "null"
    "D722" "Yes"
    "D723" "null"
    "D723" "null"
    "D723" "null"
    "D728" "null"
    "D728" "null"
    "D728" "null"
    end
    
    bys idnumber (value): keep if _n==1
    Result:

    Code:
    . l, sep(0)
    
         +------------------+
         | idnumber   value |
         |------------------|
      1. |     D711     Yes |
      2. |     D714      No |
      3. |     D715     Yes |
      4. |     D722     Yes |
      5. |     D723    null |
      6. |     D728    null |
      7. |     D729      No |
         +------------------+
    For such a problem, it is important to present a data example using dataex as details on the variable type matter. See FAQ Advice #12.
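
    The one-line solution above relies on the string sort order placing "No" and "Yes" before "null". A minimal sketch that makes the preference explicit instead is shown below, assuming value is a string variable coded exactly as in the example. The helper variable isnull is a hypothetical name introduced only for illustration. If an ID had both a Yes and a No, which of the two is kept would be arbitrary.

    ```stata
    * Sketch only: isnull is a hypothetical helper variable, not part of the original reply.
    clear
    input str4(idnumber value)
    "D711" "null"
    "D711" "Yes"
    "D714" "No"
    "D714" "null"
    "D723" "null"
    "D723" "null"
    end

    * Flag null rows so that non-null rows (isnull == 0) sort first within each ID.
    generate byte isnull = (value == "null")
    * Keep the first row per ID: a Yes/No if one exists, otherwise a null.
    bysort idnumber (isnull): keep if _n == 1
    drop isnull
    list, sepby(idnumber)
    ```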



    • #3
      This question is fairly straightforward if we can assume that all values of the variable value are coded as Yes, No, or null, with no typographical variations. Since text data is often unreliable in this way, the first command below verifies this assumption. There is also the question of what to do if some idnumber has both a Yes and a No response among its observations. On the assumption that this type of contradictory response is unacceptable, the third command verifies that this does not occur (or aborts with an error message if it does). Finally, the last command retains the one desired observation.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str5 idnumber str4 value
      "D711 " "null"
      "D711 " "null"
      "D711 " "null"
      "D711 " "Yes"
      "D714 " "No"  
      "D714 " "null"
      "D714 " "null"
      "D715 " "Yes"
      "D715 " "null"
      "D722 " "null"
      "D722 " "null"
      "D729 " "No"  
      "D729 " "null"
      "D722 " "Yes"
      "D723 " "null"
      "D723 " "null"
      "D723 " "null"
      "D728 " "null"
      "D728 " "null"
      "D728 " "null"
      end
      
      
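      * Verify that value contains only the three expected strings.
      assert inlist(value, "null", "Yes", "No")
      * Flag the informative (non-null) responses.
      gen byte preference = inlist(value, "Yes", "No")
      * Abort if any idnumber has both Yes and No; sorting on value makes the first-vs-last check complete.
      by idnumber preference (value), sort: assert value[1] == value[_N]
      * Keep the last observation per idnumber: a Yes/No if one exists, otherwise a null.
      by idnumber (preference): keep if _n == _N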
      In your earlier post on this topic, I asked you to use the -dataex- command to show your example data. Because you did not do this, I cannot be sure that value is actually, as I have assumed for the code, a string variable. If it is not, then the code will produce only error messages and we will have both wasted our time. In the future, please help those who try to help you: use the -dataex- command, and no other means, for showing example data. It is the only way to assure that all of the necessary information about the data is provided in a way that can be used to develop and test code.

      Added: Crossed with #2.
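
      Since the code above fails if value is not a string, a quick guard could be run first. The following is only a sketch under stated assumptions: it assumes that a non-string value would be a numeric variable with a value label attached (decode fails otherwise), and value_str is a hypothetical name. It also trims stray blanks, such as the trailing space in the idnumber values of the example. strtrim() is available in Stata 14 and later; older versions would use trim().

      ```stata
      * Sketch only: assumes that a non-string value carries a value label.
      clear
      input str5 idnumber str4 value
      "D711 " "null"
      "D711 " "Yes"
      "D714 " "No"
      "D714 " "null"
      end

      * confirm returns a nonzero code in _rc when value is not a string variable.
      capture confirm string variable value
      if _rc {
          * value is numeric here; convert its labels to text.
          * value_str is a hypothetical name used only for this sketch.
          decode value, generate(value_str)
          drop value
          rename value_str value
      }

      * Strip stray leading and trailing blanks before comparing strings.
      replace idnumber = strtrim(idnumber)
      replace value = strtrim(value)
      ```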



      • #4
        Thank you, Clyde and Andrew. The code worked.
