  • Identify rows where ID number and date of diagnosis are the same. Then, keep row only if diagnosis variable is positive

    Dear Stata Forum, I am using data from a national diabetes registry. First, I merged diabetic retinopathy data from 3 different datasets. Each dataset had an ID number, a diagnosis code (which I transformed into numeric codes: 1 = positive and 0 = negative) and a date of diagnosis. After merging the datasets, I noticed that for the same ID number and same date of diagnosis, I sometimes have two rows of data because different datasets provided a different (positive vs. negative) diagnosis. For each unique ID number, I need a single row of results to do my analysis. I currently have n=768 rows of data but I need to have only n=693 rows (the codebook says there are n=693 unique ID numbers). My goal is to write code to identify the rows where the ID number and date of diagnosis are exactly the same. Then, for these cases, I want to keep the row which has a positive diagnosis and drop the row which has a negative diagnosis. How should I write the code?

    retinopathy_dx_status is coded 0 = no diagnosis, 1 = non-proliferative diabetic retinopathy, 2 = proliferative diabetic retinopathy
    any_eye_dx_status is coded in a binary way so 0 = no diagnosis, 1 = any positive diagnosis

    a. I want to preferentially keep the row with a positive diagnosis, i.e. any_eye_dx_status = 1
    b. If any_eye_dx_status = 1 in both rows, I want to preferentially keep the row with retinopathy_dx_status = 2

    [CODE]
    * Example generated by -dataex-
    clear
    input double PNR_numeric float (retinopathy_dx_status any_eye_dx_status EYE_dx_date)
    0000000000 0 0 19479
    1111111111 0 0 19103
    2222222222 0 0 19094
    3333333333 0 0 19248
    4444444444 0 0 20317
    4444444444 1 1 20317
    5555555555 0 0 20290
    6666666666 0 0 21712
    7777777777 0 0 20968
    7777777777 1 1 20968
    8888888888 0 0 21584
    9999999999 2 1 21690
    9999999999 1 1 21690
    end
    [/CODE]
    Last edited by Kevin Marks; 13 Dec 2022, 06:43.

  • #2
    EDITED: With this edit:


    retinopathy_dx_status is coded 0 = no diagnosis, 1 = non-proliferative diabetic retinopathy, 2 = proliferative diabetic retinopathy
    any_eye_dx_status is coded in a binary way so 0 = no diagnosis, 1 = any positive diagnosis

    a. I want to preferentially keep the row with a positive diagnosis, i.e. any_eye_dx_status = 1
    b. If any_eye_dx_status = 1 in both rows, I want to preferentially keep the row with retinopathy_dx_status = 2
    I think

    Code:
    bys PNR_numeric EYE_dx_date (any_eye_dx_status retinopathy_dx_status): keep if _n==_N
    will give you what you want.
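
    To see why this works: within each ID-date group the rows are sorted in ascending order of any_eye_dx_status and then retinopathy_dx_status, so the last row (_n==_N) is the positive one, and the proliferative one when both rows are positive. Below is a minimal sketch of how one might inspect the duplicates first and check the result afterwards. It assumes the variable names from the -dataex- example; dup is a hypothetical helper variable, not something in the original data.

    ```stata
    * Count and flag rows that share the same ID and diagnosis date
    * (dup is a hypothetical helper variable)
    duplicates report PNR_numeric EYE_dx_date
    duplicates tag PNR_numeric EYE_dx_date, generate(dup)
    sort PNR_numeric EYE_dx_date
    list PNR_numeric EYE_dx_date any_eye_dx_status retinopathy_dx_status if dup > 0, sepby(PNR_numeric EYE_dx_date)

    * Sort ascending within each ID-date group and keep only the last row
    bysort PNR_numeric EYE_dx_date (any_eye_dx_status retinopathy_dx_status): keep if _n == _N

    * isid exits with an error if any ID-date pair still has more than one row
    isid PNR_numeric EYE_dx_date
    drop dup
    ```

    Two caveats. Stata sorts numeric missing values last, so if either status variable can be missing, a row with a missing status would be kept in preference to a positive one; check for missings first if that is possible in your data. Also, this leaves one row per ID and date, not necessarily one row per ID: if the same ID appears with different diagnosis dates, isid PNR_numeric would still fail and you would need a further rule for which date to keep.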

    Last edited by Andrew Musau; 13 Dec 2022, 07:29.


    • #3
      Thank you Andrew Musau!
