  • How to keep >2 observations over 2 years

    I wonder if anyone would be so kind as to help me with the appropriate syntax.

    I have a data set containing diastolic blood pressure, systolic blood pressure, and the date each reading was recorded, all linked to a unique ID.

    I would like to clean this file up and keep only the unique IDs with ≥ 2 recordings within a consecutive 2-year period.

    (The actual file contains 72m observations spanning ~12 years, but for the purposes of Statalist I have created a dummy data set of 20 observations.)

    Thank you.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str200 id long(diastolic_bp systolic_bp date)
    "a1"  90 148 15896
    "a1"  90 160 15725
    "a1"  98 168 15719
    "a2"  90 150 14692
    "a2"  90 150 15190
    "a3"  90 140 15168
    "a3"  90 150 14250
    "a3"  90 150 14962
    "a4"  90 190 15809
    "a5" 100 150 15953
    "a5" 100 160 15896
    "a5"  90 140 14935
    "a5"  90 150 15460
    "a5"  90 150 15474
    "a5"  90 160 16055
    "a5"  94 140 15434
    "a6"  90 150 14657
    "a6"  92 158 15558
    "a6"  94 150 15319
    "a7" 100 150 15658
    end
    format %d date

  • #2
    Code:
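    * Flag rows whose next reading for the same id is within 2 years; abs() makes the test independent of argument order
    by id (date), sort: gen byte keeper = abs(datediff_frac(date[_n+1], date, "y")) <= 2
    * Flagged rows sort last within id, so keeper[_N] is 1 whenever the id has a qualifying pair
    by id (keeper), sort: keep if keeper[_N]
    sort id date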
    I have assumed that "within a 2 year period" includes exactly two years. Change <= to < if that is not your intention.

    Added: In your example data, every id with more than one observation has consecutive observations within two years of each other, so the only ids that get excluded are the singletons. But this code should also properly exclude any id that has multiple observations that are always separated by more than 2 years. Also, your title suggests you want > 2 observations within the two-year period, but the text of the post says >= 2. I've gone with >= 2 because it's a bit easier to code. If that's not what you meant, please post back and I'll provide code that keeps only ids with > 2 such observations.
    Last edited by Clyde Schechter; 21 Oct 2021, 15:32.
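
The claim that ids whose readings are always more than two years apart get dropped cannot be seen in the posted example, because it contains no such id. A minimal sketch of a check follows, assuming a Stata version that provides datediff_frac(). The ids b1, b2 and b3 and their dates are hypothetical and are not from the original data. The comparison is wrapped in abs() so that the result does not depend on which date is passed first.

```stata
* Hypothetical test data (not from the original post):
*   b1: two readings 1000 days apart (more than 2 years)
*   b2: two readings 500 days apart (within 2 years)
*   b3: a single reading
clear
input str2 id long date
"b1" 14000
"b1" 15000
"b2" 14000
"b2" 14500
"b3" 14000
end
format %td date

* Flag a row if the next reading for the same id is within 2 years.
* On the last row of each id, date[_n+1] is missing, so keeper is 0.
by id (date), sort: gen byte keeper = abs(datediff_frac(date[_n+1], date, "y")) <= 2

* Keep every row of an id that has at least one flagged row
by id (keeper), sort: keep if keeper[_N]
sort id date

* Only the two b2 rows should remain
list, sepby(id)
```

If b1 or b3 survives this check, the rule is not doing what the thread describes and should be revisited before it is run on the full 72m-observation file.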



    • #3
      Hi Clyde,

      Thank you. This is amazing and worked perfectly. And yes, I meant >= 2 observations within 2 years, so your code needs no changing!

      I was not aware of -datediff_frac-. It will come in handy when I move on to survival/time-to-event models in a few months.
