No announcement yet.
  • Filter
  • Time
  • Show
Clear All
new posts

  • subtracting dates

    Please help. In my data set the dates are in the format: 17dec2016 00:00:00. I am trying to subtract one date to the other but am getting it in milliseconds, I believe.

    I have generated day_since_con using : gen day_since_con = (episode_start_dt - concuss_onset) and get the results below. I just want days.

    I have tried to convert by using; quietly gen double con_on_dt = date(concuss_onset, "YMD")

    No luck. Can you advise on how to convert these to the proper format so I can get days when I subtract one date from the other?

    input double(concuss_onset episode_start_dt) float day_since_con
    1.7832096e+12 1.8076608e+12 2.44512e+10
    1.7201376e+12 1.8110304e+12 9.08928e+10
    1.8067968e+12 1.8077472e+12 9.504e+08
    1.797552e+12 1.8063648e+12 8.8128e+09
    1.8026496e+12 1.8129312e+12 1.02816e+10
    1.8086976e+12 1.81008e+12 1.3824e+09
    1.8004896e+12 1.80576e+12 5.2704e+09
    1.8036e+12 1.8062784e+12 2.6784e+09
    1.7749152e+12 1.799712e+12 2.47968e+10
    1.8042912e+12 1.8065376e+12 2.2464e+09
    1.8077472e+12 1.809216e+12 1.4688e+09
    1.8000576e+12 1.8041184e+12 4.0608e+09
    1.798416e+12 1.8035136e+12 5.0976e+09
    1.8004032e+12 1.8080064e+12 7.6032e+09
    1.7961696e+12 1.809216e+12 1.30464e+10
    1.7974656e+12 1.8075744e+12 1.01088e+10
    1.8037728e+12 1.8112896e+12 7.5168e+09
    1.8055872e+12 1.813104e+12 7.5168e+09
    1.8240768e+12 1.8279648e+12 3.888e+09
    1.8146592e+12 1.8162144e+12 1.5552e+09
    1.835136e+12 1.8379872e+12 2.8512e+09
    1.8132768e+12 1.8257184e+12 1.24416e+10
    1.8277056e+12 1.8421344e+12 1.44288e+10
    1.8245952e+12 1.8309888e+12 6.3936e+09
    1.8149184e+12 1.8367776e+12 2.18592e+10
    1.8559584e+12 1.8372096e+12 -1.87488e+10
    1.828224e+12 1.8328896e+12 4.6656e+09
    1.8331488e+12 1.839456e+12 6.3072e+09
    1.8254592e+12 1.8304704e+12 5.0112e+09
    1.8379872e+12 1.8404064e+12 2.4192e+09
    1.842048e+12 1.845504e+12 3.456e+09
    1.53144e+12 1.8268416e+12 2.954016e+11
    1.82088e+12 1.8225216e+12 1.6416e+09
    1.7959968e+12 1.823904e+12 2.79072e+10
    1.8236448e+12 1.8262368e+12 2.592e+09
    1.8155232e+12 1.8376416e+12 2.21184e+10
    1.8337536e+12 1.8395424e+12 5.7888e+09
    1.8304704e+12 1.8304704e+12 0
    1.8367776e+12 1.8396288e+12 2.8512e+09
    1.8333216e+12 1.8340128e+12 6.912e+08
    1.8347904e+12 1.8359136e+12 1.1232e+09
    1.8257184e+12 1.8360864e+12 1.0368e+10
    1.8604512e+12 1.8331488e+12 -2.73024e+10
    1.8197568e+12 1.8216576e+12 1.9008e+09
    1.8408384e+12 1.8427392e+12 1.9008e+09
    1.8213984e+12 1.824336e+12 2.9376e+09
    1.8326304e+12 1.8360864e+12 3.456e+09
    1.8007488e+12 1.8197568e+12 1.9008e+10
    1.8065376e+12 1.8136224e+12 7.0848e+09
    1.8198432e+12 1.8275328e+12 7.6896e+09
    1.8236448e+12 1.8361728e+12 1.2528e+10
    1.8320256e+12 1.833408e+12 1.3824e+09
    1.8248544e+12 1.8276192e+12 2.7648e+09
    1.7908992e+12 1.8250272e+12 3.4128e+10
    1.8207936e+12 1.8313344e+12 1.05408e+10
    1.8313344e+12 1.8373824e+12 6.048e+09
    1.8108576e+12 1.836432e+12 2.55744e+10
    1.8330624e+12 1.8418752e+12 8.8128e+09
    1.8203616e+12 1.8231264e+12 2.7648e+09
    1.8280512e+12 1.836e+12 7.9488e+09
    1.8417024e+12 1.84464e+12 2.9376e+09
    1.8294336e+12 1.8408384e+12 1.14048e+10
    1.8423072e+12 1.845072e+12 2.7648e+09
    1.8239904e+12 1.8245088e+12 5.184e+08
    1.832112e+12 1.8328896e+12 7.776e+08
    1.8447264e+12 1.845504e+12 7.776e+08
    1.8132768e+12 1.814832e+12 1.5552e+09
    1.8263232e+12 1.831248e+12 4.9248e+09
    1.8319392e+12 1.8376416e+12 5.7024e+09
    1.8160416e+12 1.8245088e+12 8.4672e+09
    1.7588448e+12 1.8372096e+12 7.83648e+10
    1.832976e+12 1.837728e+12 4.752e+09
    1.8241632e+12 1.8245952e+12 4.32e+08
    1.8238176e+12 1.845072e+12 2.12544e+10
    1.8177696e+12 1.8414432e+12 2.36736e+10
    1.84248e+12 1.8428256e+12 3.456e+08
    1.8393696e+12 1.843776e+12 4.4064e+09
    1.8167328e+12 1.8251136e+12 8.3808e+09
    1.8209664e+12 1.8252e+12 4.2336e+09
    1.7991936e+12 1.8336672e+12 3.44736e+10
    1.806624e+12 1.8220896e+12 1.54656e+10
    1.8373824e+12 1.8440352e+12 6.6528e+09
    1.830384e+12 1.841616e+12 1.1232e+10
    1.8169056e+12 1.8261504e+12 9.2448e+09
    1.824336e+12 1.8252e+12 8.64e+08
    1.839888e+12 1.8426528e+12 2.7648e+09
    1.8408384e+12 1.8448992e+12 4.0608e+09
    1.8144864e+12 1.83168e+12 1.71936e+10
    1.8185472e+12 1.8209664e+12 2.4192e+09
    1.803168e+12 1.8176832e+12 1.45152e+10
    1.8068832e+12 1.8228672e+12 1.5984e+10
    1.8229536e+12 1.8251136e+12 2.16e+09
    1.823904e+12 1.8335808e+12 9.6768e+09
    1.8250272e+12 1.8252e+12 1.728e+08
    1.8169056e+12 1.8185472e+12 1.6416e+09
    1.8289152e+12 1.8292608e+12 3.456e+08
    1.8379008e+12 1.8391104e+12 1.2096e+09
    1.8259776e+12 1.8330624e+12 7.0848e+09
    1.8318528e+12 1.8413568e+12 9.504e+09
    1.8244224e+12 1.8245088e+12 8.64e+07
    format %tc concuss_onset
    format %tc episode_start_dt

  • #2
    Here's code that should do what you asked.
    . generate con = dofc(concuss_onset)
    . generate epi = dofc(episode_start_dt)
    . format %td con epi
    . replace day_since_con = epi-con
    (99 real changes made)
    . list in 1/5, clean abbreviate(16)
                concuss_onset     episode_start_dt   day_since_con         con         epi  
      1.   04jul2016 00:00:00   13apr2017 00:00:00             283   04jul2016   13apr2017  
      2.   05jul2014 00:00:00   22may2017 00:00:00            1052   05jul2014   22may2017  
      3.   03apr2017 00:00:00   14apr2017 00:00:00              11   03apr2017   14apr2017  
      4.   17dec2016 00:00:00   29mar2017 00:00:00             102   17dec2016   29mar2017  
      5.   14feb2017 00:00:00   13jun2017 00:00:00             119   14feb2017   13jun2017
    With that said, Stata's "date and time" variables are complicated and there is a lot to learn. If you have not already read the very detailed Chapter 24 (Working with dates and times) of the Stata User's Guide PDF, do so now. If you have, it's time for a refresher. After that, the help datetime documentation will usually be enough to point the way. You can't remember everything; even the most experienced users end up referring to the help datetime documentation or back to the manual for details. But at least you will get a good understanding of the basics and the underlying principles. An investment of time that will be amply repaid.

    All Stata manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu.


    • #3
      Your "dates" do, indeed, have the display format 13dec2018 00:00:00. But, from Stata's perspective, they are not dates, they are times, and times are measured in milliseconds. That's why you're getting what you see. In the example data you show, all of the hours minutes and seconds shown are 0. If that is representative of your data then, somehow, when your data set was created, these "dates" were converted to times with 00:00:00 tacked on. Or perhaps in your actual data, you do have actual precise time of day recorded.

      Anyway, there are two approaches you can take to get what you want. If your actual data include precise time of day, these will produce different results, so you need to think about which you want.

      First approach: extract the dates from the variables you have and then subtract the dates.
      gen int concuss_onset_date = dofc(concuss_onset)
      gen int episode_start_date = dofc(episode_start_dt)
      format *_date %td
      gen days_since_con = episode_start_date - concuss_onset_date
      This will always give an integer result. So even if episode_start_dt is today at 00:00:01 AM and concuss_onset_date is last night at 11:59:59PM, you will get a result of 1 day, even though the total elapsed time is just 2 seconds.

      The other approach is to just do the calculation the way you started it and rescale it by the number of milliseconds in a day.
      gen days_since_con = (episode_start_dt - concuss_onset)/msofhours(24)
      In the situation I mentioned just above, this will give you an answer of about 0.00002315, which is the fraction of a day that 2 seconds represents.

      (If all of your times are really just dates with 00:00:00 tacked on, both methods will give the same results.)

      By the way, are you aware that you have some observations where concus_onset precedes episode_start_dt? Would those be data errors, or can that happen under your definitions of these variables?

      Added: Crossed with #2, where my first method is also given. William Lisowski's advice for learning about Stata date and time variables is excellent, and I heartily endorse what he recommends there.


      • #4
        After two thorough explanations of what to do, there is still scope for a footnote explaining why the approach in #1 did not work -- and how to realise why it was wrong. The problem is not absence of luck, but guessing wildly.

        Focus on

        quietly gen double con_on_dt = date(concuss_onset, "YMD")
        There are three related problems here. Srudy of the help for date() -- which you can fire up directly with

        help date()
        --- shows that

        1. date() is for producing daily dates. But the difference between dates is not itself a date, just as the difference between 14 December 2018 and 13 December 2018 is not a date with magnitude 1 but a duration or elapsed time with length 1 day.

        2. date() expects string input but you are feeding it a numeric variable.

        3. The second argument of date() -- in your case "YMD" -- describes the input to the function. It doesn't describe or prescribe what you want as output from the function. But a string such as "2018Dec14" does match that pattern.

        I often check date solutions with one-off worked examples pushed through display.

        Here we see how date() works. I push a string through it and get out a numeric date. Stata's origin for daily dates is 1 January 1960 but insisting on a daily date display format shows that Stata understood me correctly.

        . display date("2018Dec14", "YMD")
        . display %td  date("2018Dec14", "YMD")
        Usually we push string variables through date(), not individual dates one at a time.

        A secondary detail here is that daily() is the same function as date(), but in my view preferable as making explicit what the function does.


        • #5
          All, thanks so much for the help, excellent advice and education. I will spend more time to learn about dates. Thanks. Clyde, yes, I am aware of those 2 cases and will exclude them from the analysis. Much appreciated.