Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Round up Date and Time

    Hi

    My dataset contains the close prices of some financial assets. It is collected at 30mins interval. It looks like this
    Code:
    clear
    input str10(Date Time) float time double(BNB BTC XRP ETH SOL)
    "12/10/2021" "22:00" 1.9547927e+12 579.2999877929688  48297.2 .8332849740982056 4141.34 175.75999450683594
    "12/10/2021" "22:30" 1.9547946e+12 577.6799926757813 48397.95 .8348299860954285 4205.27  177.3300018310547
    "12/10/2021" "23:00" 1.9547964e+12 570.0800170898438 47984.15 .8238450288772583 4178.57 173.55999755859375
    "12/10/2021" "23:30" 1.9547982e+12  562.510009765625 47695.42 .8194249868392944 4032.24  170.9600067138672
    "12/11/2021" "0:00"     1.9548e+12   561.22998046875 47554.36 .8165000081062317 4069.81  171.2100067138672
    end
    format %tc time
    The column time was created from the columns Date and Time by the following codes
    Code:
    gen time = clock(Date + " " + Time,"MDYhm")
    It returns the exact time (at milliseconds) where the price of the nearest previous transaction is recorded. I know that it is approximately on 30 min interval, but I do not want my data look like this. So I decided to round up my data to the 30 mins interval using the following codes
    Code:
    replace time = round(time/(30*60*1000),1)*30*60*1000
    but Stata says that no real changes made. Theoretically, it will round up all the values of the time column to the nearest 30mins interval. I also used the display function as follow
    Code:
    di %tc  round(time[1]/(30*60*1000),1)*30*60*1000
    to check if my previous code works but this time, Stata returns the correct rounded-up time.
    I don't know what make the difference in the results between two lines of code, why the replace function did not work as I expected?

    Thank you for your supports.

  • #2
    First of all, date-times should always be held as doubles. See

    Code:
    help datetime 
    where this point is made again and again.

    If you do that, then you'll see that using the right storage type removes your problem, which is that you think that you need to round.


    Code:
    clear
    input str10(Date Time) float time
    "12/10/2021" "22:00" 1.9547927e+12
    "12/10/2021" "22:30" 1.9547946e+12
    "12/10/2021" "23:00" 1.9547964e+12
    "12/10/2021" "23:30" 1.9547982e+12  
    "12/11/2021" "0:00"     1.9548e+12  
    end
    format %tc time
    
    gen double TIME = clock(Date + Time, "MDY hm")
    format %tc TIME
    
    list
    
        +--------------------------------------------------------------+
         |       Date    Time                 time                 TIME |
         |--------------------------------------------------------------|
      1. | 12/10/2021   22:00   10dec2021 21:58:54   10dec2021 22:00:00 |
      2. | 12/10/2021   22:30   10dec2021 22:29:29   10dec2021 22:30:00 |
      3. | 12/10/2021   23:00   10dec2021 23:00:04   10dec2021 23:00:00 |
      4. | 12/10/2021   23:30   10dec2021 23:30:39   10dec2021 23:30:00 |
      5. | 12/11/2021    0:00   10dec2021 23:59:03   11dec2021 00:00:00 |
         +--------------------------------------------------------------+
    The reason your rounding made no change is that your date-time values are already as close as they can be -- within floats -- to multiples of half-hours, ideally held as multiples of 1.8 million milliseconds.

    It can be illuminating to look at date-time values in hexadecimal format and also as exact integers Your float values are nice round numbers (in binary) and are held (in binary) as multiples of 1,835,008. The problem is that you need a double to hold them as multiples of 1,800,000.

    Code:
    . format time TIME %21x
    
    . list time TIME
    
         +-----------------------------------------------+
         |                  time                    TIME |
         |-----------------------------------------------|
      1. | +1.c722ba0000000X+028   +1.c722baff00000X+028 |
      2. | +1.c722d60000000X+028   +1.c722d67640000X+028 |
      3. | +1.c722f20000000X+028   +1.c722f1ed80000X+028 |
      4. | +1.c7230e0000000X+028   +1.c7230d64c0000X+028 |
      5. | +1.c723280000000X+028   +1.c72328dc00000X+028 |
         +-----------------------------------------------+
    
    . format time TIME %21.0f
    
    . list time TIME
    
         +----------------------------------------------------+
         |       Date    Time            time            TIME |
         |----------------------------------------------------|
      1. | 12/10/2021   22:00   1954792734720   1954792800000 |
      2. | 12/10/2021   22:30   1954794569728   1954794600000 |
      3. | 12/10/2021   23:00   1954796404736   1954796400000 |
      4. | 12/10/2021   23:30   1954798239744   1954798200000 |
      5. | 12/11/2021    0:00   1954799943680   1954800000000 |
         +----------------------------------------------------+
    
    . di time[2] - time[1]
    1835008
    
    . di %21x time[2] - time[1]
    +1.c000000000000X+014
    
    . inbase 2 1835008
    111000000000000000000

    There may be a language detail here as well. The job of round() is not to round up at all (meaning that values never decrease) but to round to the nearest multiple of a specified constant (or of 1 if no second argument is given). (meaning that values may decrease or increase). Otherwise put, in mathematical English the word up is literal here.

    More detail: display and replace are commands, not functions. In Stata, function is not another word for command.
    Last edited by Nick Cox; 04 Jan 2023, 02:30.

    Comment


    • #3
      Yeah, it solved my problem and it is pretty cool.

      I have never thought the storage type of data is that important before today. I have never specified the storage type of data when generating a new variables because I think Stata will automatically choose the most suitable storage type with respect to the data I input (I saw Stata did it so many times when I merged two data sets).

      Anyway, thank you for your help.



      Comment


      • #4
        Stata will often do what you need over storage types. The problem is that when you type


        Code:
        gen time =
        there is no intelligence inside Stata to appreciate that you need a double given what follows. Stata is here checking for legal syntax, not for whether your idea is good.

        After all, it would be equally legal to type

        Code:
        gen nguyen = 
        and the same content could follow. But see https://www.statalist.org/forums/for...date-variables for a different approach.

        Comment

        Working...
        X