Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to Change work with timestamp to calculate the Duration of interview per a uniqueId

    Hi Family,
    I am currently working on a dataset in a roster format that contains timestamps, and I am seeking assistance with converting these timestamps to the required format.
    Despite the attempt in creating three new variables, I continue to encounter a "type mismatch" error while attempting to use the collapse command to sum the time differences by unique ID.

    This was the little that I know but It did not work out for me.
    • gen start_time_stata = clock(START_TIME_SECS, "YMDhms") /// to convert the start timestamp
    • gen end_time_stata = clock(STOP_TIME_SECS, "YMDhms") ////to convert the stop timestamp
    • gen time_diff = end_time_stata, start_time_stata, "seconds") ////to get the difference
    • collapse (sum) time_diff, by(MRESPONSE_ID) /////To obtained the actual Interview duration for each case(1 and 2)
    Alternatively is there an easiest way to go about such case and I am ready to adopt that.

    I would greatly appreciate your help in resolving this issue.

    I have attached the dataset for your reference.

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(MRESPONSE_ID START_TIME_SECS STOP_TIME_SECS)
    1 1681182712 1681182762
    1 1681183070 1681183080
    1 1681183410 1681183423
    1 1681183467 1681183472
    2 1681184993 1681185138
    2 1681185312 1681185464
    2 1681185708 1681185926
    end



    Thank you very much in advance.


    Sham
    Last edited by Shamsudini Amidu; 11 Apr 2023, 06:49.

  • #2
    The data example you provide has the values stored as a double, whereas clock() expects a string. The values you list don't look like numeric date-times, but they could be Stata time variables minus the formatting? If so, simply doing:

    Code:
    format START_TIME_SECS %tc
    will show them as time values which you can then subtract, collapse etc. without issue.

    Check out "Working with dates and times" in the manual for a more detailed discussion: https://www.stata.com/manuals/u25.pdf

    Comment


    • #3
      I agree with #2. I will just add that if you did have string timestamps to start with and needed to use the -clock()- function then you would have to store the resulting numeric variables as double. Your code stores them, by default, as float, which will mean loss of precision (specifically the minutes and seconds may be wrong.)

      Code:
      gen double start_time_stata = clock(START_TIME_SECS, "YMDhms") /// to convert the start timestamp
      gen double end_time_stata = clock(STOP_TIME_SECS, "YMDhms") ////to convert the stop timestamp
      Again, #2 is correct: you are starting out with what look like Stata clock variables, so this code is not necessary for your present purposes. I'm just reinforcing the use of -double- storage type when you actually do need to use the -clock()- function.

      Comment


      • #4
        Hi family,
        Thank you once again.
        When I use;
        format START_TIME_SECS %tc /// to convert the timestamp to minute it converted all the time as same format STOP_TIME_SECS %tc //// all the timestamp changes to same

        Please check that again for me or maybe I did not understand your guide.

        clear
        input double(MRESPONSE_ID START_TIME_SECS STOP_TIME_SECS)
        1 1681182712 1681182762
        1 1681183070 1681183080
        1 1681183410 1681183423
        1 1681183467 1681183472
        2 1681184993 1681185138
        2 1681185312 1681185464
        2 1681185708 1681185926
        end

        Thank you.

        Comment


        • #5
          First of all, let's check that you have what you want there. The durations are the differences, so let us look:


          Code:
          clear 
          input double(MRESPONSE_ID START_TIME_SECS STOP_TIME_SECS)
          1 1681182712 1681182762
          1 1681183070 1681183080
          1 1681183410 1681183423
          1 1681183467 1681183472
          2 1681184993 1681185138
          2 1681185312 1681185464
          2 1681185708 1681185926
          end
          
          gen duration = STOP - START 
          
          list 
          
          
               +---------------------------------------------+
               | MRESPO~D   START_T~S   STOP_TI~S   duration |
               |---------------------------------------------|
            1. |        1   1.681e+09   1.681e+09         50 |
            2. |        1   1.681e+09   1.681e+09         10 |
            3. |        1   1.681e+09   1.681e+09         13 |
            4. |        1   1.681e+09   1.681e+09          5 |
            5. |        2   1.681e+09   1.681e+09        145 |
               |---------------------------------------------|
            6. |        2   1.681e+09   1.681e+09        152 |
            7. |        2   1.681e+09   1.681e+09        218 |
               +---------------------------------------------+
          Don't focus on the format of the times. Focus on the durations. The units of date-times in Stata are milliseconds, so if you have there date-times in Stata's sense, then your durations are between 5 and 218 ms in this data example, that is, all are less than a second.

          Does this make sense in terms of your application? Because if it doesn't then the data must be in some other units.

          Comment


          • #6
            Well, if the time variables you have are actually Stata internal format clock variables, then they are denominated in milliseconds, and the differences between the consecutive values, or between stop times and stop times in the same observations, are, in fact, fractions of a second in most instances. You can use a slightly more elaborate display format to see the milliseconds:
            Code:
            . clear
            
            . input double(MRESPONSE_ID START_TIME_SECS STOP_TIME_SECS)
            
                 MRESPONS~D  START_TI~S  STOP_TIM~S
              1. 1 1681182712 1681182762
              2. 1 1681183070 1681183080
              3. 1 1681183410 1681183423
              4. 1 1681183467 1681183472
              5. 2 1681184993 1681185138
              6. 2 1681185312 1681185464
              7. 2 1681185708 1681185926
              8. end
            
            . format *TIME* %tcDDMonCCYY_HH:MM:SS.sss
            
            . list, noobs clean
            
                MRESPO~D          START_TIME_SECS           STOP_TIME_SECS  
                       1   20Jan1960 10:59:42.712   20Jan1960 10:59:42.762  
                       1   20Jan1960 10:59:43.070   20Jan1960 10:59:43.080  
                       1   20Jan1960 10:59:43.410   20Jan1960 10:59:43.423  
                       1   20Jan1960 10:59:43.467   20Jan1960 10:59:43.472  
                       2   20Jan1960 10:59:44.993   20Jan1960 10:59:45.138  
                       2   20Jan1960 10:59:45.312   20Jan1960 10:59:45.464  
                       2   20Jan1960 10:59:45.708   20Jan1960 10:59:45.926
            If these do not look like reasonable values of the start and end times, then these variables may not be Stata internal format clock variables. Perhaps they are some other kind of timestamp encoding. I suggest you review the documentation that accompanied this data set (or, if there was no documentation, contact the provider directly) to get an explanation of what these timestamp variables actually are. Once that is known, probably it will be possible to figure out how to convert them to Stata internal format clock variables.

            One possibility is that these are UTC timestamps. UTC timestamps are normally denominated in seconds. And the variable names suggest that these are as well. If that is the case, then
            Code:
            . foreach v of varlist *TIME* {
              2.     replace `v' = `v'*1000
              3. }
            (7 real changes made)
            (7 real changes made)
            
            . format *TIME* %tc
            
            . list, noobs clean
            
                MRESPO~D      START_TIME_SECS       STOP_TIME_SECS  
                       1   10apr2013 03:11:52   10apr2013 03:12:42  
                       1   10apr2013 03:17:50   10apr2013 03:18:00  
                       1   10apr2013 03:23:30   10apr2013 03:23:43  
                       1   10apr2013 03:24:27   10apr2013 03:24:32  
                       2   10apr2013 03:49:53   10apr2013 03:52:18  
                       2   10apr2013 03:55:12   10apr2013 03:57:44  
                       2   10apr2013 04:01:48   10apr2013 04:05:26
            would be the way to handle these. Do those look more appropriate?

            Comment


            • #7
              Hi Family,

              Yes, exactly what I wanted!
              .
              I am indeed very grateful to all of your warmly guidance and material support for that leaning experience.

              It is working so well.

              This is what I wanted;

              Let me share the do file to you for validation;
              ////////////////////////////////////////////////////////////////////////////

              foreach v of varlist *TIME* {
              replace `v' = `v'*1000
              }
              format *TIME* %tc

              gen timediff= (STOP_TIME_SECS- START_TIME_SECS)

              collapse (sum) timediff , by(MRESPONSE_ID)


              format timediff %tc

              ////////////////////////////////////////////////////////////////////////////

              And that gives me the overall duration for interview per observation.


              I do appreciate your time.!

              Comment


              • #8
                Hi Family,
                Sorry I am back again to seek for clarity.
                Is there a way that I can take off/ drop the date" 01 jan1960" attached to the time?
                example;

                01jan1960 00:01:18
                so that I will have only 00:01:18 standing alone without 01jan 1960 at the background?

                I used this format: below;
                format timediff %tcHH:MM:SS

                I got the desire results alright but it still shows "01jan1960 00:01:18" in the content.

                And surprisingly when I copied it to Excel, the content is showing 12:01:18 am at the formula bar but see 00:01:18 in the cell itself.
                I attached the example data set.

                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input double(MRESPONSE_ID timediff)
                1 78000
                2 515000
                3 28000
                4 93000
                end
                format %tc timediff
                format timediff %tcHH:MM:SS

                Thank you.
                Sham

                Comment


                • #9
                  Is there a way that I can take off/ drop the date" 01 jan1960" attached to the time?
                  No, it is not possible. Stata does not have time variables. Representations of time in Stata are always attached to dates. Since 1 Jan 1960 is the zero date, a "pure time" would use that as the date part.

                  You have correctly identified that you can suppress the display of the date part by using the %tcHH:MM:SS format. I do not know, and cannot imagine, any circumstance in which this would not be a satisfactory solution for you. You say that "but it still shows "01jan1960 00:01:18" in the content." But I don't know what you mean by content here. If you have formatted the variable %tcHH:MM:SS you will not see the 01jan1960 part in any display of the variable. So where are you seeing it and how did it arise?

                  And surprisingly when I copied it to Excel, the content is showing 12:01:18 am at the formula bar but see 00:01:18 in the cell itself.
                  In Excel, the same things can be displayed under different formats, just as it can in Stata. 12:01:18 AM is the same thing as 00:01:18 in 24 hour time. Excel, like Stata, makes both 12 hour AM/PM and 24 hour formatting available in its cells.* You can select whichever you prefer. However, I do not think you have control over how things are displayed in Excel's formula bar.

                  *And, by the way, internally, Excel stores date and time variables almost exactly the same way Stata does: as milliseconds from reference time. The only difference is that whereas Stata's reference time is midnight of 1 Jan 1960, Excel uses midnight of 31 Dec 1899 (in recent versions, a different reference date was used in earlier versions).

                  Comment


                  • #10
                    So where are you seeing it and how did it arise?

                    When you browse the data and click on any of the converted time it will be displaying on the top.

                    Alright well noted , my agender is to know the time it took the enumerator to complete a particular interview.
                    Example, if I see 00:01:18 it means the interviewer uses 1min 8 seconds to finish an interview.

                    Finally;

                    Honestly I generated the timestamp using the Cspro data collection application and exported the data into the Stata to help me with the data cleaning.
                    However in Cspro, the timestamp was calculated in reference to January 1, 1970("elapsed since January 1, 1970 in Coordinated Universal Time (UTC).")

                    On the other hand, in Stata, it appears that the conversion that I did uses time of "midnight of 1 Jan 1960" which differs from what the Csentry generated, . will that affect my analysis or conversion?
                    Please clarify this for me as well.

                    I am most grateful for your time.

                    Thank you.

                    Comment


                    • #11
                      When you browse the data and click on any of the converted time it will be displaying on the top.
                      Ah, yes, I see that now. It is odd that that part of the data editor (browser) does not use the same display format as is applied to the data in its cells. I had never noticed that before. I don't think there is anything that the user can do about it. StataCorp might want to consider changing that in a future release. (You should consider adding that to the Wishlist for Stata 18 thread.) It's a nuisance, but I think it is not too hard to just ignore it, at least for present purposes.

                      I generated the timestamp using the Cspro data collection application and exported the data into the Stata to help me with the data cleaning.
                      However in Cspro, the timestamp was calculated in reference to January 1, 1970("elapsed since January 1, 1970 in Coordinated Universal Time (UTC).")

                      On the other hand, in Stata, it appears that the conversion that I did uses time of "midnight of 1 Jan 1960" which differs from what the Csentry generated, . will that affect my analysis or conversion?
                      I don't think this will be a problem for you as you are not actually using the date portion of the variable. If you were using the date portion, then you would have to adjust the data to get the dates right.

                      Comment


                      • #12
                        Ohok, Clyde thank you very much for the clarity.

                        But Sir, How do I go about this? (You should consider adding that to the Wishlist for Stata 18 thread.)
                        I am most grateful.
                        sham.

                        Comment


                        • #13
                          How do I go about this? (You should consider adding that to the Wishlist for Stata 18 thread.)
                          When you open up the Statalist General Forum, the very first thread listed (as a "Sticky Topic") is the Wishlist for Stata 18. Just open that thread as you would any other, and go to the last page (it's very long). There will be an edit box where you can write what you would like to see in future versions of Stata, just as you would write a post in any thread.

                          Remember that it's a wishlist. As with other wishlists, most wishes do not come true. Also, given how long Stata 17 has been around, I suspect that the decisions about what to include in version 18 were already finalized some time ago--so really at this point wishes probably apply to version 19 or later.

                          Comment


                          • #14
                            Alright thank you very much

                            Comment

                            Working...
                            X