Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to remove the first observation with a certain value and observations that occurred before that date

    Hello,

    I have a long dataset that includes many visits (oftentimes many visits within one person) and whether or not the visit was completed (vis_complete). I also created a variable for the nth visit (visitn). Before conducting my analysis, I need to remove the first completed visit as well as any visits that occurred before the first completed visit (shown by variable "date"). I am having trouble figuring out how to identify these so I can remove them. I have included a dataex example below. Here, you can see that the person with id=3 completed their first visit, so I would only want to remove their first visit (visitn=1). For person with id=6, whose first completed visit was their fourth visit (visitn=4), I would want to remove this visit as well as visits 1,2 and 3. Any help would be much appreciated!

    Thank you!

    Sarah



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id byte vis_complete float(visitn date)
     3 1  1 17959
     3 1  2 18330
     6 0  1 17910
     6 1 10 18540
     6 1 23 19607
     6 0  2 18001
     6 0 14 18911
     6 1 18 19226
     6 0 20 19324
     6 0 17 19207
     6 1 15 19002
     6 0 16 19128
     6 0  3 18057
     6 1  8 18379
     6 0 13 18820
     6 0  9 18470
     6 0 24 19698
     6 0 22 19579
     6 0 11 18659
     6 0  7 18274
     6 0  6 18183
     6 1 12 18701
     6 1 21 19436
     6 0  5 18162
     6 0 19 19228
     6 0 25 19702
     6 1  4 18078
     7 1  1 17976
     7 1  3 18270
     7 1  7 18641
     7 1  9 18935
     7 1  4 18389
     7 0  6 18638
     7 0 13 19436
     7 1  8 18816
     7 1  2 18099
     7 1 12 19317
     7 1 11 19198
     7 1 14 19443
     7 1 10 19061
     7 1  5 18515
     8 1  1 18428
     8 1  7 18893
     8 1 11 19305
     8 1  4 18725
     8 1 10 19226
     8 1  2 18514
     8 1  8 19016
     8 0  6 18885
     8 1  9 19107
     8 0  5 18815
     8 1  3 18634
     9 1  1 17917
    15 1  1 17924
    15 1  4 18773
    15 0  2 18106
    15 1  3 18339
    17 0  1 18176
    17 1  8 18640
    17 1  4 18421
    17 1 15 19338
    17 1  2 18211
    17 1 12 18911
    17 1 16 19401
    17 1 17 19492
    17 0  7 18605
    17 0 18 19571
    17 0  9 18731
    17 1  3 18323
    17 1  6 18514
    17 1 11 18813
    17 1 10 18745
    17 1 13 19093
    17 0 14 19282
    17 0  5 18507
    17 1 19 19583
    18 0  1 18388
    18 1  2 18507
    19 1  1 18773
    19 0  5 19144
    19 1  3 19023
    19 0  2 18893
    19 0  4 19142
    20 1  1 18094
    20 0  4 18302
    20 1  3 18204
    20 0  2 18190
    21 1  1 17966
    21 0  8 18974
    21 0  7 18589
    21 0  2 18088
    21 1  6 18435
    21 0  9 19009
    21 1 14 19607
    21 1  3 18127
    21 1  4 18246
    21 0  5 18428
    21 0 10 19240
    21 1 12 19481
    21 1 11 19383
    end
    format %td date
    label values vis_complete vis_complete
    label def vis_complete 0 "NS/cancel", modify
    label def vis_complete 1 "Completed", modify
    ------------------ copy up to and including the previous line ------------------

  • #2
    This example code seems to produce the result you want on your example data.
    Code:
    sort id visitn
    by id (visitn): generate ncomplete = sum(vis_complete)
    drop if ncomplete==0 | ncomplete==1 & vis_complete==1
    list id visitn date if id!=id[_n-1], separator(0)
    Code:
    . list id visitn date if id!=id[_n-1], separator(0)
    
         +-------------------------+
         | id   visitn        date |
         |-------------------------|
      1. |  3        2   09mar2010 |
      2. |  6        5   22sep2009 |
     23. |  7        2   21jul2009 |
     36. |  8        2   09sep2010 |
     46. | 15        2   28jul2009 |
     49. | 17        3   02mar2010 |
     66. | 19        2   23sep2011 |
     70. | 20        2   20oct2009 |
     73. | 21        2   10jul2009 |
         +-------------------------+

    Comment


    • #3
      Code:
      by id (visitn), sort: egen first_completed_visit = min(cond(vis_complete, visitn, .))
      drop if visitn <= first_completed_visit
      Added: Crossed with #2, which shows a different approach.

      Comment

      Working...
      X