  • Right way to estimate date of birth

    Hi all

    I posted a while back on estimating dates of birth, but I would still like to confirm that I'm tackling the issue the right way.

    Initially I used the following code to set date of birth:

    Code:
    gen birthdate = 7 * real(dateofbirth) - 3
    format birthdate %td
    where dateofbirth is the variable specifying date of birth

    My assumption is that the first week of January 1960, i.e. week 0 in Stata time, covers days 0 to 6, and that every year in Stata has 52 weeks, with the 52nd week lasting more than 7 days (8 or 9 days). This would give 365.25/52, which is about 7.024 days per week.

    Should the above code therefore be replaced by the following?

    Code:
    gen birthdate = 7.024 * real(dateofbirth) - 3
    format birthdate %td

    The first code gives birth dates around a month and a half earlier than the second, which would of course affect the estimation of age and other related variables.

    Thanks!

    /Amal
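
    As a rough check on the size of that gap, here is a minimal sketch. The week count of 2000 is an arbitrary illustrative value, not one taken from the data; with it, the two multipliers differ by 0.024 * 2000 = 48 days.

    Code:
    * illustrative week count (an assumption, not from the dataset)
    scalar w = 2000
    * days since 01jan1960 implied by each multiplier
    di 7*w - 3
    di 7.024*w - 3
    * the gap between the two grows linearly with the week count
    di 0.024*w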

  • #2
    I don't think this is remotely intelligible without a cross-reference to the earlier thread http://www.statalist.org/forums/foru...-date-variable

    You abandoned that thread without ever answering the questions in my last post.

    To backtrack and open this up so that it possibly has more general interest or application:

    0. Almost everyone (we'll meet an exception immediately) agrees that a week lasts for 7 days, but thereafter we differ. Do you think the week starts on Sundays, on Mondays, ...? Or ends on some particular day? Do you think of weeks rather as particular days of the week on which something special happens, statistically in terms of some index being published or an interest rate being changed or it being a special day for your religion, and so forth?

    1. Stata supports weeks idiosyncratically with the rules

    * The origin (time is zero) of weekly dates is the first week in January 1960

    Code:
    . di %tw 0
     1960w1

    * Stata counts from that origin, but also within each year, with extra rules

    Week 1 of a year always starts on 1 January.

    Week 2 always starts on 8 January.

    and so on, with a final rule

    Week 52 always includes 8 or 9 days, depending on whether the year has 365 or 366 days.

    But as you would expect Stata supports not only its own flavour of weekly dates but conversion from those to other kinds of dates.

    2. It follows that if your data use Stata weekly dates, subject possibly to a shift of origin, then the best way to extract the equivalent daily dates is not to devise your own rough translation, but to use Stata's own conversion function(s).

    Here you go with specific examples. Stata can translate from daily dates to weeks, and back again. The translation is not, and cannot be, one-to-one and Stata's rule is that a Stata week maps to the day on which it starts, which over a long enough period can be any day of the week.

    Code:
    . di  wofd(mdy(4,10,2015))
    2874
    
    . di  %tw wofd(mdy(4,10,2015))
    2015w15
    
    . di %td dofw(wofd(mdy(4,10,2015)))
    09apr2015
    
    . di %td dofw(wofd(mdy(1,1,2015)))
    01jan2015
    
    . di %td dofw(wofd(mdy(1,4,2015)))
    01jan2015
    
    . di %td dofw(wofd(mdy(1,7,2015)))
    01jan2015

    3. I have never yet seen any body [sic] outside StataCorp use Stata's weekly dates for data production and reporting. This just means what it says and it's a statement of ignorance. I would be very interested to learn about exceptions.

    4. If your weekly dates are counted in some other way but always in units exactly 7 days long, then the conversion is trivial (multiply the number of weeks by 7 to get the number of days), subject naturally to getting the origin the same for both systems. The advice in the previous thread was based on the impression that your problem was of this kind, but it was never completely clear to me.

    I don't think we can advise otherwise without being shown concrete details for your data or the underlying production rules.
    Last edited by Nick Cox; 10 Apr 2015, 06:27.
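
    The rules under point 1 can be checked directly. This is a minimal sketch using yw(), dofw() and mdy(); the years 2015 and 2016 are chosen only as examples of a 365-day and a 366-day year.

    Code:
    . * a Stata weekly date is 52 * (year - 1960) + (week - 1)
    . di yw(2015,15)
    2874

    . di 52*(2015-1960)+(15-1)
    2874

    . * week 52 starts on day 358 of the year and runs to 31 December
    . di %td dofw(yw(2015,52))
    24dec2015

    . * length of week 52 in days: 8 in a 365-day year, 9 in a 366-day year
    . di mdy(12,31,2015)-dofw(yw(2015,52))+1
    8

    . di mdy(12,31,2016)-dofw(yw(2016,52))+1
    9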



    • #3
      Dear Nick

      Thanks for the reply. Much appreciated.

      I thought I had solved the issue the last time I posted about birth dates. The problem has cropped up again because there is some disagreement in our group on the right way to estimate birth dates. This is of course a crucial variable, as we use it to calculate age etc.

      The dataset we received seems to have birth dates set in weeks starting from the first of January 1960. This is all we know. The first method I described:

      Code:
      gen birthdate = 7 * real(dateofbirth) - 3
      format birthdate %td

      ...is what I've always used over the past few years working with Stata. But my colleagues prefer to replace the 7 with 7.024. We of course have to follow the same method to ensure that we don't produce different estimates and variables from the same data source.

      The actual raw data looks like this:

      Code:
      ID   Birthdate
      1    2480
      2    2057
      3    1842
      4    1957
      5    2036
      6    1866
      7    2015
      8    2110

      ...and so on.

      I'm currently trying some of the functions you mentioned above. Stata dates are not my strong point at all, but hopefully I'll sort it out soon.

      Thanks

      /Amal
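
      For comparison, here is a sketch that applies both readings to the eight values listed above: plain 7-day weeks counted from 1 January 1960, and Stata's own weekly dates via dofw(). It assumes the week count is held in a numeric variable, given the hypothetical name birthweek here (the real() call in the code above suggests the actual variable is a string), and it leaves out the 3-day shift.

      Code:
      clear
      input id birthweek
      1 2480
      2 2057
      3 1842
      4 1957
      5 2036
      6 1866
      7 2015
      8 2110
      end

      * reading A: every week is exactly 7 days, week 0 starts on 01jan1960
      gen bd_plain = 7 * birthweek
      * reading B: Stata weekly dates (52 weeks per year, long final week)
      gen bd_stata = dofw(birthweek)
      format bd_plain bd_stata %td
      * gap in days between the two readings
      gen gap = bd_stata - bd_plain
      list, noobs

      Under reading B each year since 1960 adds 1 or 2 days to the gap, because Stata folds the extra day(s) of the year into week 52, so for births in the 1990s and 2000s the two readings differ by several weeks.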



      • #4
        "The dataset we received seems to have birth dates set in weeks starting from the first of January 1960. This is all we know."
        1. Is there any way to contact the data supplier to find this out?
        2. Is this a major/popular/known dataset? Perhaps if you name it, like "DHS of India, 1998" others who worked with it would share their experiences?
        3. I don't see the point of subtracting 3 days. Your input data have a precision of one week, yet you are somehow tailoring them to days, which is odd.
        4. The difference between 7 and 7.024 is 0.3 of one percent. If you are analyzing pretty much anything related to human development, disease progression, etc., this precision is, imho, next to astronomical. If your results are sensitive to an assumption of this sort, perhaps they are not very robust results.
        5. Unless there is reason to believe that the data producer applied any transformations in Stata, the Stata-specific definitions of weeks with 8 and 9 days should be set aside. If the zero hour corresponds to Stata's Jan 1, 1960, then the first code in your message is what I would use. Of course, when you later ask Stata how many weeks have elapsed from Jan 1, 1960 to that date, it may report something else, but those would be Stata's weeks.
        6. "The dataset we received seems to have birth dates set in weeks starting from the first of January 1960"
          You read the information one way, but there is an alternative: your person is 1866 weeks old on the survey day, and the variable name is simply confusing because it stands for something else. Take a weekly calendar, open it at the survey date's page and flip back 1866 pages. That is the week in which the person was born. Whether you are counting forward from 1960 or back from the survey date makes a great deal of difference, much more than that 0.3%. Testing shouldn't be a problem as long as you have any attribute that relates to adults and is not applicable to children.
        7. You can settle for Sundays, Mondays, or randomize people between different days (probably one of the tests for sensitivity of results) once you've established the week. If your input data are precise only to within 6 days, you can't make them more precise.
        8. Finally, your thread's title mentions estimation, while in fact it is all about guessing what was meant by the person who created the dataset.
        Best, Sergiy Radyakin
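
        Points 6 and 7 could be explored along the following lines. This is a sketch only: the variable names birthweek (a numeric week count) and surveydate (a Stata daily date) are hypothetical, and the seed is arbitrary.

        Code:
        * point 7: same week, different assumed day within it
        gen bd_first = 7 * birthweek
        gen bd_last  = 7 * birthweek + 6
        set seed 12345
        gen bd_rand  = 7 * birthweek + floor(7 * runiform())

        * point 6: alternative reading, weeks counted back from the survey date
        gen bd_back  = surveydate - 7 * birthweek
        format bd_first bd_last bd_rand bd_back %td

        * ages in years under each reading; implausible values for one
        * reading point towards the other
        gen age_fwd  = (surveydate - bd_first) / 365.25
        gen age_back = (surveydate - bd_back) / 365.25
        summarize age_fwd age_back

        Rerunning the analysis with bd_first, bd_last and bd_rand in turn shows whether any result depends on the day chosen within the week.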
