Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to replace an observations if one date variable is after another date variable

    Hi all,

    Can anyone please assist with the following problem?

    I am currently working on a hospital admissions and primary care dataset for patients with chronic rheumatic diseases (e.g.: rheumatoid arthritis (RA) , lupus (SLE), vasculitis (SV)..etc)
    Most of the patients have had multiple diagnoses over time (e.g. patient A was diagnosed with Lupus on 01 June 2008 and then was later diagnosed with vasculitis (05 August 2010) and finally was confirmed to have rheumatoid arthritis on 11 November 2010

    I am doing a case-control study, (panel dataset). I've coded these conditions as individual cases (1 - RA, 2: SLE and 3:SV). However, for people with multiple diagnoses over time, I am struggling to code them using their latest date of diagnosis.

    An example of this:

    date1 - date of rheumatoid arthritis diagnosis
    date2 - date of vasculitis (SV) diagnosis
    date 3 - date of Lupus (SLE) diagnosis


    All the dates were formatted using the date function

    format date1 date2 date3 %td

    The cases were generate in this format:

    gen RA_cases=1 if date1 !=.
    gen lupus_cases =1 if date2 !=.
    gen SV_cases =1 if date3 !=.


    Patients with multiple chronic rheumatic diagnoses were counted and tagged (or temporary removed)

    count if RA_cases & SV_cases ==1 ##(the count is n=283)

    gen cases_with_multiple_chrRD =.a if RA_cases & SV_cases ==1



    Now, I am struggling to code these patients based on their latest diagnosis - I've tried this

    replace RA_cases =.a if cases_with_multiple_chrRD ==.a
    replace RA_cases =1 if date1 > date2 & date2 !=.


    The above command doesn't seem to work because it includes missing values and I've also tried formatting the date variables back to numerical (see below).
    Can anyone please assist?
    date1 date2
    18576
    19684
    18168
    17714 20248
    15157
    14292
    19423
    7852 7897
    19103
    17744
    18079
    from
    date1 date2
    22-Nov-13
    28-Sep-09
    01-Jul-08 09-Jun-15
    01-Jul-01
    01-Aug-14
    01-Jul-06
    17-Feb-99
    06-Mar-13
    01-Jul-81 15-Aug-81
    20-Apr-12
    31-Jul-08

  • #2
    The most recent date is just the maximum of the date variables across observations.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(date1 date2)
    18576     .
    19684     .
    18168     .
    17714 20248
    15157     .
    14292     .
    19423     .
     7852  7897
    19103     .
    17744     .
    18079     .
    end
    
    egen wanted= rowmax(date*)
    format * %td
    Res.:

    Code:
     l, sep(11)
    
         +-----------------------------------+
         |     date1       date2      wanted |
         |-----------------------------------|
      1. | 10nov2010           .   10nov2010 |
      2. | 22nov2013           .   22nov2013 |
      3. | 28sep2009           .   28sep2009 |
      4. | 01jul2008   09jun2015   09jun2015 |
      5. | 01jul2001           .   01jul2001 |
      6. | 17feb1999           .   17feb1999 |
      7. | 06mar2013           .   06mar2013 |
      8. | 01jul1981   15aug1981   15aug1981 |
      9. | 20apr2012           .   20apr2012 |
     10. | 31jul2008           .   31jul2008 |
     11. | 01jul2009           .   01jul2009 |
         +-----------------------------------+

    Comment

    Working...
    X