Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • date string: extraction of correct dates

    Dear community,

    I have date strings exported from surveymonkey in the following format:

    Code:
    . list date1 in 1
    
         +---------------------+
         |               date1 |
         |---------------------|
      1. | 10/11/2015 13:04:11 |
    All I am trying to achieve is to transform this into a stata compatible format, i.e. %tc.
    Here's my strategy and the description of the problems I encounter:

    1. I am extracting the months, days, years, hours, minutes, and seconds via

    Code:
    gen        month    = substr(date1,1,2)
    gen        day        = substr(date1,4,2)
    gen        year    = substr(date1,7,4)
    gen        hour    = substr(date1,12,2)
    gen        minute    = substr(date1,15,2)
    gen        second    = substr(date1,18,2)
    So far, this works pretty well, as documented in this example (first observation):

    Code:
    .    list date1 month-second in 1
    
        +-------------------------------------------------------------------+
        date1   month   day   year   hour   minute   second
        -------------------------------------------------------------------
        1.  10/11/2015 13:04:11      10    11   2015     13        4       11
        +-------------------------------------------------------------------+
    2. I am trying to bring all of this together via the the complete date function, and the following happens (again example first observation):

    Code:
    . gen             datefull = mdyhms(month, day, year, hour, minute, second)
    
    . format  datefull        %tc
    
    . list    date1 datefull in 1
    
         +------------------------------------------+
         |               date1             datefull |
         |------------------------------------------|
      1. | 10/11/2015 13:04:11   11oct2015 13:05:08 |
         +------------------------------------------+
    As you can see, I am getting the correct date but incorrect minutes/seconds. More precisely, the resulted time is 57 seconds later than the original time. However, this is not a constant. For example, in the 1000th observation, the resulting time is 38 seconds before the original time. Therefore, I do not believe this is somehow related to the leap seconds, because it should be more or less constant across all observations (time span is less than 2 months).

    3. I am suspecting that the problem is related to the day being not extracted accurately for some reason. If I extract "MDY" directly via the date() function and then transform the result into milliseconds via cofd(), I am not getting the exact day:

    Code:
    . **      a) transform date1 into days since 01/01/1960:
    . gen             d1      = date(date1, "MDYhms")
    
    . **      b) transform days into milliseconds
    . gen             d2      = cofd(d1)
    
    . format  d2      %tc
    
    . list    d2      in 1, notrim    
    
                           d2  
      1.   11oct2015 00:00:53
    I am hoping to have provided enought details for you to understand my problem but I'd be happy to provide more information if necessary.

    Thanks in advance,
    Jakob

  • #2
    Use a -double- data type: see help on dates, where this is emphasised.

    Comment


    • #3
      Thanks a lot for the fast reply.
      I didn't expect it'd be so easy and I was sure I double checked all the help files.
      Anyway, I am surprised that not storing dates as -double- does mean this loss of precision.

      Comment


      • #4
        You shouldn't be surprised: the numbers concerned are milliseconds since the beginning of 1960.

        Comment


        • #5
          When it comes to troubleshooting concerning date and time functions in Stata, this point has been so frequently underscored by Nick in the Forum, that it became sort of mantra to me. And I hope it happens to each one of us.

          We may also find the express recommendation (when importing date-times) from the Stata Blog (http://blog.stata.com/2011/01/05/usi...ther-software/), as well as the axiom we read in the Stata manual:

          Because %tc values can be so large, whenever you use the function clock(), you must store the results in a double,
          Best,

          Marcos
          Last edited by Marcos Almeida; 27 Jan 2016, 08:58.
          Best regards,

          Marcos

          Comment


          • #6
            By the way, it isn't necessary to go to the trouble of parsing out separately the month, day, year, etc. from the original variable date1 (unless you need those variables separately for some other reason.) The whole thing could be done in one line:

            Code:
            gen double datefull = clock(date1, "MDYhms")

            Comment


            • #7
              Once you use date() to get a daily date, you've rounded down and thrown away the time of day information (hh:mm:ss). You know that.

              So why did you get a display of 00:53:00? That was a side-effect of using float not double.

              I use display on examples where I can work out the right answer to check my syntax. That uses double precision.

              Code:
              . di %tc clock("10/11/2015 13:04:11", "MDY hms")
              11oct2015 13:04:11
              
              . di %td date("10/11/2015 13:04:11", "MDY hms")
              11oct2015
              
              . di %tc cofd(date("10/11/2015 13:04:11", "MDY hms"))
              11oct2015 00:00:00
              Hence Stata will give you the right answer so long as you ask the right question.

              So, let us go through the difference in results between storage types:

              Code:
               
              . gen clock_float = cofd(daily("10/11/2015 00:53:00", "MDY hms"))
              
              . gen double clock_double = cofd(daily("10/11/2015 00:53:00", "MDY hms"))
              
              . format %tc clock*
              
              . l clock* in 1
              
                   +-----------------------------------------+
                   |        clock_float         clock_double |
                   |-----------------------------------------|
                1. | 11oct2015 00:00:53   11oct2015 00:00:00 |
                   +-----------------------------------------+
              
              . format %18.0f clock*
              
              . l clock* in 1
              
                   +-------------------------------+
                   |   clock_float    clock_double |
                   |-------------------------------|
                1. | 1760140853248   1760140800000 |
                   +-------------------------------+
              
              . format %21x clock*
              
              . l clock* in 1
              
                   +-----------------------------------------------+
                   |           clock_float            clock_double |
                   |-----------------------------------------------|
                1. | +1.99d0920000000X+028   +1.99d0913000000X+028 |
                   +-----------------------------------------------+
              The hexadecimal display shows that, believe it or not, 1760140853248 is a rounder number when in binary than is 1760140800000.

              Also, for date-times like these, a float can only change in multiples of 65536 = 2^16 ms. Here is the analysis. Add 1 bit and so what that means in decimal:

              Code:
               
              . scalar yourtime = 1.99d0920000000X+028
              
              .  di %18.0f yourtime
                   1760140853248
              
              . scalar yourtime_plus_some = 1.99d0930000000X+028
              
              . di %18.0f yourtime_plus_some
                   1760140918784
              
              . di %18.0f yourtime_plus_some - yourtime
                           65536
              
              . di 2^16
              65536


              Last edited by Nick Cox; 27 Jan 2016, 13:03.

              Comment

              Working...
              X