
  • keeping the first observation and the first outcome occurrence


    I have a data set that I would like your help with.
    The data are from a general population who have been followed up in a hospital (100,680 observations).

    Step 1: Some individuals came only once, and I would like to drop them (I think this is the code):
    bysort ID: drop if _N==1

    Step 2: For each ID, I want to keep the first observation and the first observation at which the disease (DM) occurred.

    For example:
    ID exam day DM
    1 20200601 0
    1 20201201 0
    1 20210201 1
    2 20190101 0
    2 20190202 0
    2 20190601 1
    2 20191001 0
    2 20200202 1
    3 20180606 1
    I would like to delete the 2nd row for ID 1; rows 2, 4 and 5 for ID 2; and ID 3 entirely, leaving:
    ID exam day DM
    1 20200601 0
    1 20210201 1
    2 20190101 0
    2 20190601 1
    I look forward to your reply.

  • #2
    The lack of replies here may arise from your not using dataex, which, as explained at https://www.statalist.org/forums/help#stata, is especially important for date variables.

    Your date variable could be (a) a string variable, (b) a long integer or (c) a Stata daily date variable formatted as shown. We just can't tell from your example. Also, exam day is not a legal variable name, because it contains a space.
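
    As a rough sketch of how the three cases differ, the snippet below builds a daily date from each kind of input. The variable names exam_str and exam_num are hypothetical and stand in for whatever the real variable is called; describe shows which storage type and display format you actually have.

```stata
clear
* hypothetical toy data: the same dates held as a long integer and as a string
input long exam_num str8 exam_str
20200601 "20200601"
20210201 "20210201"
end

* storage type and display format reveal which case applies
describe exam_num exam_str

* (a) string holding yyyymmdd
gen date_a = daily(exam_str, "YMD")

* (b) long integer holding yyyymmdd: convert to string first
gen date_b = daily(string(exam_num, "%8.0f"), "YMD")

* (c) already a daily date: only the display format makes it look like yyyymmdd
gen date_c = date_b

format date_a date_b %td
format date_c %tdCCYYNNDD

list
```

    In cases (a) and (b) a new daily date variable is needed before sorting or computing intervals; in case (c) the variable can be used as it stands.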

    This may help. See also https://www.stata.com/support/faqs/d...t-occurrences/

    Code:
    clear 
    input ID long exam_day    DM
    1    20200601    0
    1    20201201    0
    1    20210201    1
    2    20190101    0
    2    20190202    0
    2    20190601    1
    2    20191001    0
    2    20200202    1
    3    20180606    1
    end 
    
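    * convert the yyyymmdd integer to a Stata daily date
    gen date = daily(string(exam_day, "%8.0f"), "YMD")
    format date %td

    * drop people seen only once
    bysort ID (date) : drop if _N == 1
    * keep visits from the first DM == 1 up to, but not including, the second
    bysort ID (date) : keep if sum(DM) == 1
    * of those, keep the earliest: the first visit with DM == 1
    bysort ID (date) : keep if _n == 1

    list, sepby(ID)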
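
    The code above leaves one row per ID, namely the first visit with DM == 1. If the aim is the output shown in the question, with both the first visit and the first DM == 1 visit kept, one possible variant is sketched below. The indicator names first_visit and first_DM are made up for illustration, and the sketch assumes DM is coded 0/1.

```stata
clear
input ID long exam_day DM
1 20200601 0
1 20201201 0
1 20210201 1
2 20190101 0
2 20190202 0
2 20190601 1
2 20191001 0
2 20200202 1
3 20180606 1
end

gen date = daily(string(exam_day, "%8.0f"), "YMD")
format date %td

* drop people seen only once
bysort ID (date) : drop if _N == 1

* flag each person's first visit
bysort ID (date) : gen byte first_visit = _n == 1

* flag the first visit with DM == 1: DM is 1 and the running sum has just reached 1
bysort ID (date) : gen byte first_DM = DM == 1 & sum(DM) == 1

keep if first_visit | first_DM

list ID exam_day DM, sepby(ID)
```

    Under this sketch, a person whose first visit already has DM == 1 contributes a single row, and a person who never has DM == 1 contributes only the first visit; whether either should be kept is a choice for the analysis.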
