Dear all,
I have a .dta file with a group variable, grp. In addition to grp, there are many other variables, say A-Z. There are duplicates: if you run
by grp, sort: gen n = _N
n is occasionally 2. I want to check whether such observations are duplicates in the sense that, within each such pair, every variable takes at most one nonmissing value, i.e., it is NOT the case that
a) A[1] and A[2] are both nonmissing and A[1]!=A[2], OR
b) B[1] and B[2] are both nonmissing and B[1]!=B[2], OR
...
z) Z[1] and Z[2] are both nonmissing and Z[1]!=Z[2].
If so, I want to collapse each such pair into a single observation, recording the unique nonmissing value of each variable, or a missing value if both observations are missing.
Is there a way of doing it efficiently? I'd appreciate your thoughts. Thank you!
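A minimal sketch of one possible approach (not the only one), assuming every variable in the range is numeric and stored in order so that a range like A-Z picks them all up. The toy data, the three-variable range A-C, and the flag name conflict are hypothetical, for illustration only. egen min() and max() ignore missing values, so within a group they differ only when a variable takes two distinct nonmissing values; collapse (firstnm) then keeps the first nonmissing value, or missing if there is none.

```stata
* Hypothetical toy data: grp 1 is a compatible pair, grp 2 conflicts on B,
* grp 3 is a singleton
clear
input grp A B C
1 1 . 5
1 . 2 5
2 3 4 .
2 3 7 .
3 9 . .
end

* Flag groups in which some variable takes two distinct nonmissing values.
* egen min()/max() ignore missing values, so they disagree only in that case;
* double storage avoids losing precision on non-integer values.
gen byte conflict = 0
foreach v of varlist A-C {
    tempvar lo hi
    bysort grp: egen double `lo' = min(`v')
    bysort grp: egen double `hi' = max(`v')
    replace conflict = 1 if `lo' != `hi'
    drop `lo' `hi'
}

* Inspect the conflicting groups before setting them aside
list grp A-C if conflict, sepby(grp)

* Collapse the remaining groups: firstnm keeps the first nonmissing value,
* or missing if every observation in the group is missing
drop if conflict
collapse (firstnm) A-C, by(grp)
list
```

This handles groups of any size, not just pairs. Note that collapse keeps only grp and the listed variables, and string variables would need a separate check.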
Best,
John