
  • Issues with a date variable

    Hi all

    I am currently working with a dataset where the date of birth is a string variable (str6). The original date of birth has been replaced with a number giving the count of weeks from the first full week in 1960. Thus subject X, with date of birth coded as 2480, would actually have been born in mid 2007 (1960 + 2480 weeks ahead in time).

    To work with this variable easily, I would like to convert it to a numeric variable that displays as a date in the format mm/dd/yyyy.

    I started with:

    destring dateofbirth, gen (dob)
    format dob %tw

    Now the date of birth for subject X above looks like: 2007w37

    Since I lack the actual day of birth in the given week of birth, I would like to choose the middle of each week as the date of birth which could be the fourth day of the given week of birth for all subjects.

    I've used the 'substr' option previously to generate dates, months and years separately before combining them together to generate one single date variable.

    gen by=substr(dob,1,4)
    gen bm=substr(dob,5,2)
    gen bd=substr(dob,7,2)
    gen byr=real(by)
    gen bmo=real(bm)
    gen bda=real(bd)
    gen bdate=mdy(bmo,bda,byr)
    format bdate %d

    But I'm not aware of a way to tackle my issue above where I have only the year and week but not the actual day.

    Thanks

    /Amal

  • #2
    Stata weeks are a trap for the unwary. Read the discussion in http://www.stata-journal.com/sjpdf.h...iclenum=dm0052 which I think shows that they are not what you want here. There are precisely 52 weeks in each year in Stata, not 52 and some fraction depending on the length of the year.

    It seems to me that you can get a Stata daily date for the middle of each week just by

    Code:
    gen ddate = 7 * real(dateofbirth) - 3
    format ddate %td
    but there may be some small corrections to that (e.g. what is a "full" week precisely).
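
    As a rough check on the arithmetic (a sketch only, using the example value 2480 from #1 and the -3 offset above, which is itself an assumption about how week 1 is defined), the two readings of the same number can be compared with display:

    ```stata
    * 2480 read as a count of 7-day blocks since the start of 1960, shown as a daily date
    display %td (7 * 2480 - 3)

    * 2480 read as a Stata weekly date, where every year has exactly 52 weeks
    display %tw 2480
    ```

    The first line should show 10jul2007, in line with "mid 2007" in #1; the second shows 2007w37, which falls in September. The gap between the two is what accumulates when 52-week years are used in place of elapsed 7-day blocks.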

    substr() is a function, not an option.

    To get exact advice, show us how the first 7 days of January 1960 would be numbered in terms of weeks, or equivalently the exact definition of week 1 in the scheme used.

    P.S. Note that as from Stata 10 your previously used method

    Code:
    gen by=substr(dob,1,4)
    gen bm=substr(dob,5,2)
    gen bd=substr(dob,7,2)
    gen byr=real(by)
    gen bmo=real(bm)
    gen bda=real(bd)
    gen bdate=mdy(bmo,bda,byr)
    format bdate %d
    collapses to

    Code:
    gen bdate = date(dob, "YMD")
    format bdate %td
    although the older display format %d still works. In Stata 13 daily() is a documented alternative name for date().
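
    A minimal illustration of the one-line form, using a made-up literal string rather than anything from the dataset in this thread:

    ```stata
    * date() takes a string plus a mask giving the order of year, month and day
    display %td date("20070710", "YMD")
    ```

    This should display 10jul2007. Note that date() needs a string argument, so it applies to the original string variable rather than to a numeric copy produced by destring.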
    Last edited by Nick Cox; 04 Mar 2015, 04:25.



    • #3
      Thanks Nick! It worked well!

      Unfortunately we don't (as of now) have much information on how exactly the date variable was created except that the first week/seven days of January 1960 should be equivalent to one full week.

      I agree - having a supposed date variable recorded only in weeks isn't the best way to go about it!

      Thanks,

      /Amal.



      • #4
        There are two distinct issues here, or so I imagine.

        The first is that your dataset presumably includes some kind of fuzzing of the birthdates as part of anonymising the information. That's a good idea from every angle except that of the people who want to use the data for research....

        The second is handling weeks in dates. Stata's approach is one of several; none is ideal.

        The first seven days in January 1960 are 0 1 2 3 4 5 6 as Stata daily dates. If that's week 1 in the system you are using, I think the code should be adjusted by 1 day, so that week 1 maps to its middle day, 3, i.e. you need

        Code:
          gen ddate = 7 * real(dateofbirth) - 4
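
        To see what the adjusted offset does, here is a small check under the stated assumption that week 1 covers daily dates 0 to 6 (01jan1960 to 07jan1960); the value 2480 is the example from #1:

        ```stata
        * Middle day of week 1 under this assumption (daily date 3)
        display %td (7 * 1 - 4)

        * Middle day of week 2480 under the same assumption
        display %td (7 * 2480 - 4)
        ```

        These should display 04jan1960 and 09jul2007 respectively. As with the earlier version, the new variable needs format ddate %td to display as a date.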



        • #5
          Thanks Nick

          I did adjust the code accordingly.

          Just to cross check, the first day of the second week in January starts with 1?

          Thanks

          Amal



          • #6
            Sorry, but I don't know how to answer that at all.

            The main point of the thread is that you have to define week precisely to get anywhere, so what is the "second week"?

            For weeks as defined by Stata the second week of January always starts on 8 January, but there are 6 other definitions with different answers.

            Also, "starts with 1" refers to what variable?
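
            For Stata's own definition of weeks, the claim about 8 January can be checked directly. This is only a sketch for Stata weeks, with 2007 picked arbitrarily as the year; it says nothing about the scheme used in the dataset:

            ```stata
            * First day of Stata week 2 of 2007, converted from a weekly to a daily date
            display %td dofw(yw(2007, 2))

            * Stata week number of 31 December: week 52 absorbs the leftover day(s)
            display week(mdy(12, 31, 2007))
            ```

            The first line should show 08jan2007 and the second 52.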

