
  • Dropping duplicates by date

    Hello!

    I have a data set of appointments at a dental clinic over a period of 2 months (1 May to 1 July). Some patients attend the clinic more than once during this period and thus occur several times in the data set. For each patient occurring more than once, I want to keep the first appointment and drop all other appointments of that patient. For example, if a patient attends the clinic on 1 May, 2 May and 3 May, I want to keep the first observation and drop the latter two.

    Using:

    duplicates drop patient_id, force

    will drop all duplicates, but I cannot tell whether it drops them in the manner I want.

    If I sort the observations by date and then execute the above command, will I achieve the desired result?

    best regards,

  • #2
    Jonathan:
    you may want to try something along the following lines:
    Code:
    bysort idcode (date): keep if _n==1
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Hi Jonathan

      The best way to do this would be to use collapse (min) datevar, by(patient_id)

      This will give you a data set with the minimum date for each patient.
      Once you have run this, you may need to format the variable as a date again,
      with format %td datevar


      thanks

      Anesh



      • #4
        Anesh's proposal, I think, misses a key point: it's entire observations that are to be kept. collapse would leave only patient_id and the earliest date, discarding all other variables.

        Carlo's proposal is, I think, on target.

        duplicates isn't the solution of choice here. You should prefer code that makes it explicit that you keep the first visit for each person.
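
        To make the keep-the-first-visit approach concrete, here is a minimal sketch on toy data. The variable names (patient_id, visit_str, visit_date, treatment), the dates and the year are all made up for illustration; substitute the names in your own data set.

        ```stata
        * Toy data with hypothetical variable names and values
        clear
        input patient_id str9 visit_str str10 treatment
        1 "03may2015" "filling"
        1 "01may2015" "checkup"
        1 "02may2015" "cleaning"
        2 "15jun2015" "checkup"
        end

        * Convert the string dates to Stata daily dates and display them as dates
        generate visit_date = daily(visit_str, "DMY")
        format visit_date %td

        * Within each patient, sort by date and keep only the earliest visit;
        * every other variable in that observation is retained
        bysort patient_id (visit_date): keep if _n == 1
        list
        ```

        Patient 1 should be left with the 1 May visit and patient 2 with the single June visit, each with its treatment value intact. Note that if a patient has two appointments on the same date, which of the two is kept is arbitrary unless a further variable (for example, an appointment time) is added inside the parentheses to break the tie.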
