  • egen maxdate gives dates 01/01/2500

    Hello everyone

    I am trying to get the difference between two dates to calculate follow-up per patient.
    I am using a long-format dataset with several observations per patient id and no missing values.

    I can't spot why for some patients I get 01/01/2500 as the maxdate date, which is wrong.


    edate has storage type float and display format %dM_d,_CY. Is this an issue?

    This is what I'm using:

    bysort patid : egen maxdate = max(edate)

    Does anyone know why this might be happening?

    Thank you,

    Louisa

  • #2
    You are reporting strange results, and rightly so, but not telling us anything about your data and -- above all -- about how dates were imported or converted.

    I note that

    Code:
    di mdy(1, 1, 2500)
    shows 197232, so where did that come from? Other way round, note that

    Code:
    format edate %dM_d_CY
    does nothing whatsoever to convert anything. It is appropriate if (and only if) the variable concerned is already holding Stata daily dates with origin 1 January 1960.
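
    To illustrate the point about formats, here is a minimal sketch on a tiny made-up dataset. The variable name d and its three values are purely illustrative, and the default daily format %td is used for simplicity. The display format changes how a number is shown, never the number that is stored.

    ```stata
    clear
    set obs 3

    * Hypothetical numeric values, read as days since 1 January 1960
    gen d = .
    replace d = 0 in 1
    replace d = mdy(12, 31, 2012) in 2
    replace d = mdy(1, 1, 2500) in 3

    * Without a date format, the values are listed as plain integers
    list d

    * Assigning a daily date display format changes only the display
    format d %td
    list d

    * The stored number behind the last observation is unchanged
    display d[3]
    ```

    So if a variable does not already hold daily dates, assigning a date format will not repair it; the conversion has to happen earlier, for example with date().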

    I would draw


    Code:
    quantile edate
    as a quick graphical check for a bunch of rogue dates.
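
    Where a graph is slow or impractical on a large dataset, a numeric check can give similar information. The following is only a sketch on made-up data: the four observations are invented, and the cutoff of 31 December 2012 is an assumed upper bound on plausible dates, to be replaced by whatever bound is known for the real data.

    ```stata
    clear
    input long patid str10 eventdate
    1 "05/03/2010"
    1 "31/12/2012"
    2 "17/06/2011"
    2 "01/01/2500"
    end

    gen edate = date(eventdate, "DMY")
    format edate %td

    * Range of the numeric dates: a maximum far in the future signals rogue values
    summarize edate
    display %td r(max)

    * Count and list observations beyond the assumed plausible upper bound.
    * !missing() is needed because Stata treats missing as larger than any number.
    local cutoff = mdy(12, 31, 2012)
    count if edate > `cutoff' & !missing(edate)
    list patid eventdate edate if edate > `cutoff' & !missing(edate)
    ```

    Listing the string next to the converted date shows whether the rogue value was already present in the raw data or was introduced by the conversion.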


    • #3
      Hi Nick,

      Thank you for your response.

      These are electronic health records.

      The date was given as "eventdate" as a long string.

      When I sorted "eventdate", I saw that the latest date across all participants was 31/12/2012.
      I therefore expected no dates later than that.

      I used

      Code:
      gen edate=date(eventdate, "DMY")
      format edate %dM_d,_CY
      bysort patid : egen mindate = min(edate)
      bysort patid : egen maxdate = max(edate)
      That's all I did. Mindate worked well.

      My computer is struggling to run quantile edate, I'm afraid.




      • #4
        Nothing obviously wrong there. But sorting on a string date isn't a strong diagnostic tool. Days of the month running from 1 to 31 will sort accordingly -- except that / sorts before any number -- and that won't shake out rogue years.

        Some things you could do:

        Code:
        gen year = substr(eventdate, -4, 4) 
        tab year 
        
        summarize edate, detail
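
        To see why sorting the string is a weak check, here is a small sketch on invented data (the four observations are hypothetical, not taken from the real records). It sorts the string variable, then extracts the year as a number, and finally reproduces the per-patient maximum from the original question.

        ```stata
        clear
        input long patid str10 eventdate
        1 "05/03/2010"
        1 "31/12/2012"
        2 "17/06/2011"
        2 "01/01/2500"
        end

        * String sorting orders by day of month first, so the observation
        * with year 2500 sorts first and "31/12/2012" sorts last
        sort eventdate
        list eventdate

        * Extracting the year as a number makes the rogue value obvious
        gen year = real(substr(eventdate, -4, 4))
        tabulate year

        * Per-patient maximum, as in the original question
        gen edate = date(eventdate, "DMY")
        format edate %td
        bysort patid: egen maxdate = max(edate)
        list patid eventdate maxdate, sepby(patid)
        ```

        In this toy example the patient with the placeholder-like date gets 01jan2500 as maxdate, while mindate would be unaffected, which matches the symptom described in the thread. Whether the real data contain such a value is something only the tabulation of year can confirm.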
