Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Need help finding the difference in time and date for two variables (format: yyyy-mm-ddThh:mm:ssZ)

    Hi, I would like to find the difference between 'survey.start.time' and 'submission_time' for each household, titled 'response_time_diff'.
    The time variable here is irrelevant, but explains why the same households did two surveys (time==0 is before, time==1 is after).
    The data is formatted as follows:
    time household_id survey.start.time _submission_time
    0 A2 2019-06-13T09:15:43.523Z 2019-06-13T13:27:23
    1 A2 2019-09-23T14:39:39.909Z 2019-09-25T09:07:03
    0 ABBH1 2019-06-18T14:22:00.420Z 2019-06-18T17:37:44
    1 ABBH1 2019-09-28T11:58:12.520Z 2019-09-28T12:42:07
    0 ABBH2 2019-06-18T15:15:48.094Z 2019-06-18T17:38:31
    1 ABBH2 2019-09-28T12:13:08.582Z 2019-09-28T12:42:08
    0 ABBH3 2019-08-22T17:02:49.862Z 2019-08-24T07:34:15
    1 ABBH3 2020-01-12T17:57:11.500Z 2020-01-14T04:33:16
    0 ABBH4 2019-09-12T07:49:40.027Z 2019-09-15T23:01:39
    1 ABBH4 2019-11-19T12:12:56.163Z 2019-11-26T12:13:59
    0 ABBH6 2019-10-12T16:43:25.417Z 2019-10-13T19:29:29
    1 ABBH6 2020-01-12T15:00:04.646Z 2020-01-14T04:40:07
    0 ABH10 2019-07-26T14:01:22.867Z 2019-07-26T16:39:14
    1 ABH10 2019-10-08T12:48:41.567Z 2019-10-09T16:04:17

    Thank you so much for your time and deeply appreciate your help!

  • #2
    The major obstacle here is that you appear to have your times stored as string variables. To do anything useful with them, they must first be converted to Stata internal format date-time variables. From there it is just a one-liner.

    Examining the data, it is not clear what unit is appropriate for the result. The elapsed times vary from less than one hour to over a week. I have written the code to return the number of hours elapsed, but it is easy enough to change this to whatever unit best suits your needs. See -help clockdiff_frac()-.

    Code:
    //  CREATE STATA INTERNAL FORMAT DATE-TIME VARIABLES FROM STRING VARIABLES
    foreach v of varlist surveystarttime _submission_time {
       foreach a in T Z {
           replace `v' = subinstr(`v', "`a'", " ", .)
       }
        replace `v' = trim(`v')
        gen double _`v' = clock(`v', "YMDhms"), after(`v')
        assert missing(_`v') == missing(`v')
        format _`v' %tc
        drop `v'
        rename _`v' `v'
    }
    
    gen double wanted = clockdiff_frac(surveystarttime,  _submission_time, "h")

    Comment


    • #3
      Dear Clyde,
      Thank you for the response, but once I've copied the code to STATA, seems like something has gone wrong:
      . foreach v of varlist surveystarttime _submission_time { 2. . foreach a in T Z { 3. . replace `v' = subinstr(`v', "`a'", " ", .) 4. . } 5. . replace `v' = trim(`v') 6. . gen double _`v' = clock(`v', "YMDhms"), after(`v') 7. . assert missing(_`v') == missing(`v') 8. . format _`v' %tc 9. . drop `v' 10. . rename _`v' `v' 11. . } (0 real changes made) (0 real changes made) (0 real changes made) (36 missing values generated) 36 contradictions in 556 observations assertion is false r(9); . . . . gen double wanted = clockdiff_frac(surveystarttime, _submission_time, "h") type mismatch r(109); Appreciate your help anyway and thanks a lot!

      Comment


      • #4
        These error messages indicate that there is something wrong with the data. Without seeing example data posted using the -dataex- command it is impossible to troubleshoot this. Please post back with that. In doing that, be sure that your example includes data that reproduces this problem. The code runs without error on the data shown in #1.

        If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

        Comment


        • #5
          Let me elaborate on my response in #4 and help you find the source of the problem. The output you show, cleaned up for readability:

          Code:
          . foreach v of varlist surveystarttime _submission_time {
          2. . foreach a in T Z {
              3. . replace `v' = subinstr(`v', "`a'", " ", .)
              4. . }
              5. . replace `v' = trim(`v')
              6. . gen double _`v' = clock(`v', "YMDhms"), after(`v')
              7. . assert missing(_`v') == missing(`v')
              8. . format _`v' %tc
              9. . drop `v'
              10. . rename _`v' `v'
          11. . }
          (0 real changes made)
          (0 real changes made)
          (0 real changes made)
          (36 missing values generated)
          36 contradictions in 556 observations assertion is false r(9);

          demonstrates two things:

          1. The data in your Stata data set are not formatted the way you show in #1.
          2. There are 36 observations in which the values of surveystarttime and not valid date/times.

          Regarding 1. The three messages that all say "(0 real changes made)" come from the -replace- statements on lines 3 (first two) and 5 (last one). The command on line 3 removes the "T" and "Z" characters from the time variables. The fact that 0 real changes were made means that there aren't actually any T and Z characters in those variables. So the example data you showed in #1 is an inaccurate portrayal of the data you actually have in Stata. And that also means that the code devoted to removing those non-existent characters is unnecessary. It is precisely because of situations like this that the Forum FAQ instructs people to use the -dataex- command to show example data in this Forum. -dataex- is the only way to do that that will reproduce the data completely faithfully and also provide all the necessary metadata to use the example in Stata. (The third "(0 real changes made)" was actually expected: that was just included as a precaution in case some values got packed with leading or trailing spaces.)

          Regarding 2. This comes from the command on line 7. That command verifies that the strings found in the time variable are all correctly converted to a Stata internal format datetime variable. The error message tells us that in 36 observations, the string did not represent a proper specification of a date and time. Since your data set has 556 observations, it may be hard for you to pick them out as examples to show here. The way you can find them easily is:

          Code:
          browse if missing(clock(surveystarttime, "YMDhms")) & !missing(surveystarttime)
          The browser window will pop up and show them to you. You should then try to figure out how these invalid representations got into your data set and fix whatever data management (or data source) led to them. I strongly suggest you do the same with _submission_time, since whatever error led to the bad values of surveystarttime may well have affected those values as well. Once you have fixed up these bad data points, try the code from #2 again, and I think you will get satisfactory results.

          As an aside, I notice that following that original error message discussed above, you then went on to try running the -gen double wanted = ... - command. You should never do that. Once Stata halts and shows you an error message, you should forget about proceeding with the rest of the code: first fix the problem that led to the error message. Even if the rest of the code appears to be running well when you try it, you are running it on the wrong data. Now, in most cases, as happened to you, you just get new error messages anyway. But sometimes the code will appear to work notwithstanding the earlier errors. That is actually your worst case scenario: you get what looks like good output, but it is just garbage. Never ignore an error message. Always fix errors before moving on.
          Last edited by Clyde Schechter; 21 Jul 2022, 12:31.

          Comment


          • #6
            Thanks Clyde, the code worked perfectly after cleaning up the data a bit!

            Comment

            Working...
            X