  • editing dates

    Hi, I'm really struggling to change the values of the date-of-birth variable in my dataset.
    It is a string variable with values like:
    16-11-98
    17-08-97
    18-10-90
    02dec1996
    02apr1998
    06nov1993
    Can anyone guide me on how I can go about cleaning this?

  • #2
    Dates can be frustrating. It really isn't Stata's fault, as (1) dates come in so many different forms and flavours, and data on dates are often messy, having originally been typed in by people; (2) researchers may want something different from the data; and (3) what you need is documented somewhere, although many people don't even try to read the documentation.

    Code:
    help datetime
    is always the starting point, and you can and should skip and skim over points irrelevant to you now.

    The good news is that the dates in your example all satisfy the pattern DMY, so daily() can read them, given the information that two-digit years may go up to 2023. (If that isn't true, using an earlier limit than 2023 would help.) daily() is just the same function as date().

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str9 bdate
    "16-11-98" 
    "17-08-97" 
    "18-10-90" 
    "02dec1996"
    "02apr1998"
    "06nov1993"
    end
    
    gen betterdate = daily(bdate, "DMY", 2023) 
    format betterdate %td 
    
    list bdate if missing(betterdate)
    
    list, sep(0)
    
         +-----------------------+
         |     bdate   betterd~e |
         |-----------------------|
      1. |  16-11-98   16nov1998 |
      2. |  17-08-97   17aug1997 |
      3. |  18-10-90   18oct1990 |
      4. | 02dec1996   02dec1996 |
      5. | 02apr1998   02apr1998 |
      6. | 06nov1993   06nov1993 |
         +-----------------------+
    There may be bad news. In a large dataset you may get date forms that don't match the pattern given. That is why I had a line asking to list the values of bdate for which betterdate is missing. There aren't any in the data example.
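
If some values do fail, one possible approach is a second pass with a different pattern, applied only where the first pass returned missing. The sketch below is an illustration, not part of the original data: the value "11/16/98" is made up, and treating the leftovers as month-day-year is an assumption you would need to check against your own data, because a string like "03/04/98" is valid under both DMY and MDY and would already have been read as DMY.

```stata
* Hypothetical example: the third value is invented to show a failure
clear
input str9 bdate
"16-11-98"
"02dec1996"
"11/16/98"
end

* First pass: day-month-year, two-digit years resolved up to 2023
gen betterdate = daily(bdate, "DMY", 2023)
format betterdate %td

* Nonempty strings that did not convert (here: month 16 is impossible)
list bdate if missing(betterdate) & !missing(bdate)

* Second pass, ASSUMING the leftovers are month-day-year
replace betterdate = daily(bdate, "MDY", 2023) if missing(betterdate)

list, sep(0)
```

Listing the failures before replacing anything is the important step: it lets you see what forms are present before you commit to a second pattern.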

    Also, in a large dataset you may get some centenarians born in the early years of the 20th century. If "20-01-22" means born in 1922, watch out.
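
To see why the last argument matters, here is a minimal sketch with two made-up values. The variable names d2023 and d1999 are just for illustration. With 2023 as the limit, a two-digit year 22 is read as 2022, since that is the latest year ending in 22 that does not exceed the limit; with 1999 as the limit, it is read as 1922.

```stata
* Hypothetical values to show how the year limit resolves two-digit years
clear
input str8 bdate
"20-01-22"
"20-01-24"
end

gen d2023 = daily(bdate, "DMY", 2023)
gen d1999 = daily(bdate, "DMY", 1999)
format d2023 d1999 %td

* Limit 2023: year 22 becomes 2022, but year 24 becomes 1924
assert d2023[1] == mdy(1, 20, 2022)
assert d2023[2] == mdy(1, 20, 1924)

* Limit 1999: both fall in the 20th century
assert d1999[1] == mdy(1, 20, 1922)
assert d1999[2] == mdy(1, 20, 1924)

list, sep(0)
```

No single limit is right if the same dataset holds people born in both 1922 and 2022 with two-digit years; in that case some other variable (age, say, if you have it) would be needed to tell them apart.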



    • #3
      This has worked PERFECTLY! Thanks a ton Nick
      I'll keep in mind the tips you've shared, this was super helpful
