I have a dataset of patients, each with a number of samples collected on different days. One of the samples from each patient is labelled as the reference sample with which the others are to be compared.
However, as you can see from the example, some samples are not from the same location as the patient's reference sample, and only samples from the same location are relevant.
I want to discard the irrelevant samples from the dataset, but how can I do that? I imagine some bysort: egen ... procedure, perhaps using _n and _N, but I can't figure out how. Thanks for any help!
Example dataset
Code:
patient_id  sample_no.  location  reference_sample
#1          1           1         0
#1          2           2         0
#1          3           2         1
#1          4           1         0
#2          1           2         0
#2          2           2         1
#2          3           1         0
#3          1           1         0
#3          2           1         1
#3          3           1         0
#3          4           2         0
#4          1           2         1
#4          2           1         0
#4          3           2         0
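A minimal sketch of one common Stata idiom for this, under some assumptions: the variables are named patient_id, sample_no, location and reference_sample (a Stata variable name cannot contain the period shown in "sample_no."), patient_id is a string such as "#1", and each patient has exactly one observation with reference_sample == 1. The idea is to copy the reference sample's location to every observation of the same patient, then keep only the observations whose location matches it.

```stata
* Example data from the question.
* Assumption: patient_id is a string and "sample_no." is stored as sample_no.
clear
input str3 patient_id byte(sample_no location reference_sample)
"#1" 1 1 0
"#1" 2 2 0
"#1" 3 2 1
"#1" 4 1 0
"#2" 1 2 0
"#2" 2 2 1
"#2" 3 1 0
"#3" 1 1 0
"#3" 2 1 1
"#3" 3 1 0
"#3" 4 2 0
"#4" 1 2 1
"#4" 2 1 0
"#4" 3 2 0
end

* cond() returns location only for the reference sample and missing otherwise;
* max() ignores missing values, so every observation of a patient receives
* the location of that patient's reference sample.
* Assumption: exactly one reference sample per patient.
bysort patient_id: egen ref_location = max(cond(reference_sample == 1, location, .))

* Keep only samples taken at the same location as the reference sample
keep if location == ref_location
drop ref_location
```

On the example data this keeps 9 of the 14 observations, including all four reference samples. Closer to the _n/_N idea in the question, bysort patient_id (reference_sample): generate ref_location = location[_N] would also work, because sorting within patient puts the single reference observation (value 1) last; that version relies on reference_sample having no missing values, since Stata sorts missing after 1.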