Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Data Cleaning: Calculate Date

    Hello,
    I have the following dataset, I need to check if drug date is done after the death date:
    Given that there are blanks in death date indicating that the patient is still alive, how can I manage to clean my data if drug date is after death date,
    when I tried to do simple drop if command ( drop if drugdate>deathdat) I lose large variable around 55000, knowing that I have only 6,772 death dates available

    dataex ID datedeath len yeardeath deat_year drugdate


    . dataex ID datedeath drugdate

    ----------------------- copy starting from the next line -----------------------
    [CODE]
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float ID str10(datedeath drugdate)
    1 "" "20/02/2006"
    2 "" "23/02/2006"
    3 "" "01/09/2006"
    4 "" "01/07/2005"
    5 "" "13/04/2005"
    6 "" "28/12/2006"
    7 "" "19/06/2006"
    8 "26/11/2005" "05/04/2005"
    9 "27/09/2005" "04/05/2005"
    10 "" "16/11/2005"
    11 "" "11/07/2006"
    12 "17/06/2005" "18/06/2005"
    13 "" "21/06/2005"
    14 "" "01/08/2006"
    15 "" "25/10/2005"
    16 "" "23/06/2006"
    17 "" "01/08/2005"
    18 "" "24/05/2006"
    19 "" "01/03/2006"
    20 "" "01/10/2005"
    21 "" "04/07/2006"
    22 "" "20/03/2006"
    23 "" "07/02/2005"
    24 "03/06/2005" "28/12/2006"
    25 "" "01/06/2006"
    26 "" "15/02/2006"
    27 "" "24/07/2006"
    28 "" "01/03/2006"
    29 "" "09/03/2005"
    30 "" "01/04/2006"
    31 "" "01/12/2006"
    32 "" "11/12/2006"
    33 "" "01/01/2006"
    34 "" "01/04/2006"
    35 "" "11/11/2005"
    36 "" "16/12/2005"
    37 "" "29/03/2005"
    38 "" "28/10/2005"
    39 "" "01/05/2006"
    40 "" "23/08/2005"
    41 "" "25/09/2006"
    42 "" "01/10/2005"
    43 "" "01/11/2005"
    44 "" "21/02/2005"
    45 "" "16/02/2006"
    46 "" "16/03/2005"
    47 "23/03/2006" "01/10/2006"
    48 "" "11/10/2005"
    49 "29/05/2005" "04/03/2005"
    50 "26/04/2005" "02/12/2005"
    51 "" "16/10/2006"
    52 "" "21/11/2005"
    53 "" "23/11/2006"
    55 "" "05/01/2006"
    56 "" "01/04/2005"
    57 "" "17/01/2005"
    58 "06/02/2005" "12/05/2006"
    59 "" "11/01/2006"
    60 "30/06/2006" "19/01/2005"
    61 "07/02/2005" "01/08/2006"
    62 "" "05/12/2005"
    63 "" "23/05/2005"
    64 "02/06/2006" "06/07/2005"
    65 "" "11/07/2006"
    66 "" "06/12/2006"
    67 "" "27/02/2006"
    68 "" "01/11/2006"
    69 "" "05/01/2005"
    70 "" "01/10/2005"
    71 "" "28/11/2005"
    72 "" "09/03/2005"
    73 "" "01/09/2006"
    74 "15/02/2006" "12/04/2006"
    75 "" "20/04/2005"
    76 "" "15/02/2005"
    77 "28/03/2005" "10/08/2005"
    78 "" "12/08/2005"
    79 "" "01/01/2006"

  • #2
    The problem is that you are using string variables that look like dates to human eyes, but are not Stata date variables. Once you convert them to Stata date variable, the approach you attempted will work:

    Code:
    //  CONVERT STRING VARIABLES TO ACTUAL DATE VARIABLES
    foreach v of varlist datedeath drugdate {
        gen _`v' = daily(`v', "DMY")
        assert missing(`v') == missing(_`v')
        drop `v'
        format _`v' %td
        rename _`v' `v'
    }
    
    assert !missing(drugdate)
    drop if drugdate > datedeath
    Working with date variables is "bread and butter" data management in Stata. You should invest time looking at -help datetime- and, even better, reading the sections of the PDF manual that is installed with your Stata that are linked there. It's a lot to read because dates and times are complicated, and there are many functions for creating and using these variables. You won't remember everything after a single read, and, to be honest, even expert Stata users often have to go back and refresh their memory on some details for aspects of dates and times that they only use occasionally. But the time spent will be amply repaid in the long run by expediting your ability to work with dates and times.
    Last edited by Clyde Schechter; 04 Jul 2021, 15:10.

    Comment


    • #3
      Your string dates must be converted to numeric dates to be useful. See

      Code:
      help datetime

      Code:
      . gen ddeath = daily(datedeath, "DMY")
      (65 missing values generated)
      
      . gen ddrug = daily(drugdate, "DMY")
      
      . format ddeath ddrug %td
      
      . list if ddrug > ddeath
      
           +------------------------------------------------------+
           | ID    datedeath     drugdate      ddeath       ddrug |
           |------------------------------------------------------|
       12. | 12   17/06/2005   18/06/2005   17jun2005   18jun2005 |
       24. | 24   03/06/2005   28/12/2006   03jun2005   28dec2006 |
       47. | 47   23/03/2006   01/10/2006   23mar2006   01oct2006 |
       50. | 50   26/04/2005   02/12/2005   26apr2005   02dec2005 |
       57. | 58   06/02/2005   12/05/2006   06feb2005   12may2006 |
           |------------------------------------------------------|
       60. | 61   07/02/2005   01/08/2006   07feb2005   01aug2006 |
       73. | 74   15/02/2006   12/04/2006   15feb2006   12apr2006 |
       76. | 77   28/03/2005   10/08/2005   28mar2005   10aug2005 |
           +------------------------------------------------------+

      Comment


      • #4
        Thank you so much for the useful information.

        Comment

        Working...
        X