
  • Replacing observations with imputed date

    Hi, I have a dataset in the following format (not the real data):
    S/N  date last seen  year last seen  outcome
    1    19/05/2015      2015            Alive
    2    2015            2015            Alive
    4                    2018            Dead
    5    21/06/2018      2018            Dead
    6                    2018            Dead
    I am trying to replace an incomplete "date last seen" with the midpoint of the "year last seen" (for example, replacing the date in observation 2 with 01/06/2015).

    Are there any suggestions for an easy way to do this?

    Thanks!

  • #2
    Welcome to Statalist.

    Perhaps the following will point you in a useful direction. Note that July 1st is (approximately) the midpoint of a calendar year, not June 1.
    Code:
    generate midpoint = mdy(7,1,year_last_seen)
    format midpoint %td
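A minimal sketch of how the midpoint might be applied only where the full date is unavailable. It assumes the date is stored as a string variable and the year as a numeric variable; the names date_last_seen, seen, and imputed are hypothetical, and the toy data mirror the example in the question.

```stata
* Sketch only: variable names are assumptions, not taken from the real dataset
clear
input str10 date_last_seen year_last_seen
"19/05/2015" 2015
"2015"       2015
""           2018
"21/06/2018" 2018
""           2018
end

* date() returns missing when the string is not a full day/month/year date
generate seen = date(date_last_seen, "DMY")

* flag the incomplete dates, then fill only those with July 1 of the recorded year
generate byte imputed = missing(seen)
replace seen = mdy(7, 1, year_last_seen) if imputed
format seen %td
list
```

Keeping the imputed flag makes it possible to tell observed dates from filled-in ones later.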



    • #3
      Mid-year midpoints can be disastrous if you intend to use the dates to calculate durations. Even monthly midpoints (if you knew month and year) could have undesirable consequences. Both distort the distribution of what would otherwise be smoothly random durations. Better would be to randomly impute one of the 365 or 366 days of the year. (The reference below imputes dates in a known month.)
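A minimal sketch of drawing a random day within the known year, rather than using a fixed midpoint. The variable names (date_last_seen, year_last_seen, seen, first, ndays) and the seed are assumptions for illustration; this is not the code from the Stata tip cited below, which deals with a known month.

```stata
* Sketch only: variable names and the seed value are assumptions
clear
input str10 date_last_seen year_last_seen
"19/05/2015" 2015
"2015"       2015
""           2018
"21/06/2018" 2018
""           2018
end

* missing whenever the string is not a complete day/month/year date
generate seen = date(date_last_seen, "DMY")

set seed 12345
generate first = mdy(1, 1, year_last_seen)                  // January 1 of the year
generate ndays = mdy(12, 31, year_last_seen) - first + 1    // 365 or 366

* floor(ndays*runiform()) is an integer from 0 to ndays-1, so every day of the year is equally likely
replace seen = first + floor(ndays * runiform()) if missing(seen)
format seen first %td
list
```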

      If you have long enough follow-up (say five years) and many missing dates, as in your example, consider a grouped data analysis, with "year" as the time unit.

      Reference:
      Samuels, S.J. and Cox, N.J., 2012. Stata tip 105: Daily dates with missing days. Stata Journal, 12(1), pp.159-161.
      Last edited by Steve Samuels; 09 Aug 2018, 14:47.
      Steve Samuels
      Statistical Consulting

      Stata 14.2



      • #4
        Originally posted by William Lisowski
        Welcome to Statalist.

        Perhaps the following will point you in a useful direction. Note that July 1st is (approximately) the midpoint of a calendar year, not June 1.
        Code:
        generate midpoint = mdy(7,1,year_last_seen)
        format midpoint %td
        Thanks so much for your help William.



        • #5
          Originally posted by Steve Samuels
          Mid-year midpoints can be disastrous if you intend to use the dates to calculate durations. Even monthly midpoints (if you knew month and year) could have undesirable consequences. Both distort the distribution of what would otherwise be smoothly random durations. Better would be to randomly impute one of the 365 or 366 days of the year. (The reference below imputes dates in a known month.)

          If you have long enough follow-up (say five years) and many missing dates, as in your example, consider a grouped data analysis, with "year" as the time unit.

          Reference:
          Samuels, S.J. and Cox, N.J., 2012. Stata tip 105: Daily dates with missing days. Stata Journal, 12(1), pp.159-161.
          Thanks Steve. The link you referenced was really helpful.



          • #6
            You are very welcome, Tsi.

            Since I wrote, I have realized that imputation has a cost: standard errors must reflect the randomness of the imputation process. To get proper standard errors, you will need to create multiple imputation (MI) datasets by hand and then add Stata's mi estimate prefix to your estimation commands.
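One way this might look in practice is sketched below on simulated data. Everything here is an assumption for illustration: the variable names (id, entry, seen, dead, x, imp), the choice of 20 imputations, and the Poisson model with an exposure, which stands in for whatever estimation command is actually of interest.

```stata
* Sketch only: simulated data; all names and settings are assumptions
clear
set seed 2018
set obs 200
generate id = _n
generate x = rnormal()
generate entry = mdy(1, 1, 2012)
generate year_last_seen = 2013 + floor(5 * runiform())
generate first = mdy(1, 1, year_last_seen)
generate ndays = mdy(12, 31, year_last_seen) - first + 1
generate seen = first + floor(ndays * runiform())
replace seen = . if runiform() < 0.4        // day and month unknown for about 40%
generate byte dead = runiform() < 0.5

* stack the original data (imp = 0) and 20 copies (imp = 1..20)
expand 21
bysort id: generate imp = _n - 1

* in each copy, draw a fresh random day within the known year
replace seen = first + floor(ndays * runiform()) if missing(seen) & imp > 0

* declare the stacked data as mi data in flong style
mi import flong, m(imp) id(id) imputed(seen) clear

* follow-up time depends on the imputed date, so create it as a passive variable
mi passive: generate time = seen - entry

* illustrative model only; results are combined across the imputations
mi estimate: poisson dead x, exposure(time)
```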

            To avoid this, I recommend the grouped-data survival models, as they require no imputation.
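A minimal sketch of one common grouped-data approach, using simulated data with one record per person per year at risk and a complementary log-log model. The variable names (id, entry_year, year_last_seen, dead, x, t, event) are hypothetical, and the model is only one of several discrete-time specifications that could be used.

```stata
* Sketch only: simulated data; all names are assumptions
clear
set seed 2018
set obs 300
generate id = _n
generate x = rnormal()
generate entry_year = 2012
generate year_last_seen = 2013 + floor(5 * runiform())   // 2013 to 2017
generate byte dead = runiform() < 0.5

* one record per person per yearly interval at risk
generate nyears = year_last_seen - entry_year
expand nyears
bysort id: generate t = _n

* the event can occur only in a person's last interval
by id: generate byte event = (dead == 1) & (_n == _N)

* discrete-time proportional hazards model with a separate baseline for each year
cloglog event i.t x
```

Only the year last seen is used, so no day or month needs to be imputed.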
            Last edited by Steve Samuels; 10 Aug 2018, 12:07.
            Steve Samuels
            Statistical Consulting

            Stata 14.2
