
  • how to check if two observations are duplicates

    Dear all,
    I have a .dta file with a group variable, id. In addition to id, there are many other variables, say A-Z. There are duplicates, i.e., if you run

    by id, sort: gen n = _N

    n is occasionally 2. I want to check whether such observations are duplicates in the sense that, for each such pair, each variable takes on at most one nonmissing value, i.e., it is NOT the case that

    a) A[1] and A[2] are both nonmissing and A[1]!=A[2], OR
    b) B[1] and B[2] are both nonmissing and B[1]!=B[2], OR
    ...
    z) Z[1] and Z[2] are both nonmissing and Z[1]!=Z[2].

    If so, I want to collapse each such pair into one observation, recording for each variable the unique nonmissing value, or a missing value if both observations are missing.

    Is there a way of doing it efficiently? I'd appreciate your thoughts. Thank you!

    Best,
    John

  • #2
    Code:
    // VERIFY NO CONFLICT OF NON-MISSING VALUES FOR VARIABLES
    foreach v of varlist A-Z {
        display as text "Checking `v' for conflicts"
        by id (`v'), sort: assert `v' == `v'[1] if !missing(`v')
    }
    
    //  NOW COLLAPSE TO SINGLE VALUE PER id
    collapse (firstnm) A-Z, by(id)
    Last edited by Clyde Schechter; 24 Apr 2015, 20:21.
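
    As a minimal sketch of what the check and the collapse do, assume a made-up dataset with id and just two numeric variables, A and B (the values are invented for illustration and are not from the thread):

    ```stata
    clear
    input id A B
    1 5 .
    1 . 7
    2 3 4
    3 . .
    3 2 .
    end

    // Numeric missing sorts last, so within each id `v'[1] is a nonmissing
    // value whenever one exists; assert stops with an error on any conflict.
    foreach v of varlist A-B {
        by id (`v'), sort: assert `v' == `v'[1] if !missing(`v')
    }

    // Keep the first nonmissing value of each variable within id.
    collapse (firstnm) A-B, by(id)
    list, noobs
    ```

    With these data every id passes the check, and the collapsed result has one row per id: id 1 has A = 5 and B = 7, id 2 has A = 3 and B = 4, and id 3 has A = 2 with B still missing. Had the two rows for id 1 held A = 5 and A = 6, the assert would have stopped with an error before the collapse.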



    • #3
      Dear Clyde,
      Thank you so much for your help. Can I please ask you one more question about your code? I'm guessing it assumes that A-Z are numeric variables. Is there a way to modify it so that it runs what you have for numeric variables, and uses [_N] instead of [1] for string variables? I'd appreciate your answer! Thank you again for the lead!

      Best,
      John



      • #4
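        Ah, yes, for string variables the sort order is wrong: the missing value "" sorts first rather than last, so the comparison has to use [_N] instead of [1]: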

        Code:
        foreach v of varlist A-Z {
            capture confirm numeric var `v'
            if c(rc) == 0 {
                by id (`v'), sort: assert `v' == `v'[1] if !missing(`v')
            }
            else {  // STRING VARIABLE
                by id (`v'), sort: assert `v' == `v'[_N] if !missing(`v')
            }
        }
        
        collapse (firstnm) A-Z, by(id) // WORKS FOR BOTH STRING & NUMERIC
        Last edited by Clyde Schechter; 24 Apr 2015, 22:32.
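
        To see why the string branch needs [_N], here is a small sketch on made-up data (the variable S and its values are assumptions for illustration). The empty string sorts before any nonempty string, so within an id the first observation can be missing even when a nonmissing value exists:

        ```stata
        clear
        input id str3 S
        1 "abc"
        1 "abc"
        2 "x"
        2 "x"
        end
        replace S = "" in 2      // make one value in id 1 missing

        // Comparing against the last observation passes: after sorting,
        // S[_N] within id 1 is "abc" and S[1] is "".
        by id (S), sort: assert S == S[_N] if !missing(S)

        // Comparing against the first observation fails, because "abc" != "".
        capture by id (S), sort: assert S == S[1] if !missing(S)
        display "return code from the [1] version: " _rc   // nonzero: the assertion is false
        ```

        The [1] version therefore reports a conflict where there is none, which is why the code branches on the variable type.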



        • #5
          Thank you so much! I learned new commands on the way :P



          • #6
            You might also want to look at -compobs-; use -search compobs- to find and install it.
