Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Setting up Date and Time for time series.

    Dear all,

    I have a dataset (in .xlsx) with market data on a daily basis for seven years. For each day, I have 24 observations, i.e. one per hour. The dataset consists of the following columns: "Date - Hour - Price - Volume". I would like to run rolling regression to create expected price estimates using previous observations only. Unfortunately, I can't set up "Date" as time series variable using "tsset" because there are more than one observation for each day.

    I was therefore thinking to create a datetime variable that can be used as timeseries variable, but I seem to have several problems with the compatibility between Stata and Excel. If I create the datetime variable directly in Excel then Stata suddenly changes some of the times that were e.g. "02:00:00" in Excel into "01:59:59" in Stata after importing the dataset. Apart from it not looking very nice, it also prevents me to apply "tsset" as it no longer seems to be an integer.
    A similar issue arises if I try to create the variable in Stata. For some reason, the "Hour" observation that was simply "00:00" in Excel is now already a datetime variable in Stata and reads "31dec1899 00:00". Combining "Hour" and "Date" does then no longer work and I also didn't manage to find out how to isolate the time from the "Hour" column while keeping it in the %tc format.

    Hence, I would be very grateful for some help on how to circumvent these issues and how to set up a datetime variable that can be used with tsset.

    I have attached an extract of the dataset I am using.

    Thank you for your help in advance!
    Attached Files

  • #2
    Stefan,

    You're on the right track, but you should import the Excel date as a string and then convert it to a Stata date time variable (using clock()) with a format of %tc. The key is to declare the new variable as double to maintain the required precision since datetime variables can get very large (they represent the number of millliseconds since 1/1/1960 at midnight).

    Code:
    gen double datetime=clock(exceldate,"MDYHM")    // Adjust to suit your situation
    Regards,
    Joe

    Comment


    • #3
      Dear Joe,

      thanks for your comment. Unfortunately, this still seems not to work (see the attached screenshot). It seems that already when importing the data, Stata modifies what should be "01:00:00" into "00:59:59". I have used the following code

      Code:
      gen Datetime2 = clock(Datetime, "DMYhms")
      format Datetime2 %tc
      but the outcome is even worse and the times are further "away" from the full hours.

      Alternatively, I tried to create the datetime variable in Stata but as soon as I convert the "Hour" variable, Stata adds the date 1/1/1960 in front of it...

      Thanks again

      Comment


      • #4
        Stefan,

        Sorry, I was in a hurry when I first answered your question and didn't look at other possibilities. There is a way to import all Excel data as strings, but that applies to all columns, which is probably not that helpful.

        The precision disconnect between Excel and Stata is somewhat complicated and has been documented here previously, but the solution in your case is straightforward because you have the date and time stored separately in your data set and these can be combined to get the correct datetime value. Try something like this:

        Code:
        gen double Datetime1=cofd(Date)+Hour
        format Datetime1 %tc
        The cofd() function converts the date to a datetime variable with an implied time of midnight, which means that you can just add the time to get the correct datetime value.

        Regards,
        Joe

        Comment


        • #5
          Joe,

          thanks for your help again. This still doesn't seem to work as Stata doesn't create the correct date in the Datetime1 variable. Actually, the problem seems to be with the Hour variable that is not stored as time only but already as a datetime variable with the date 31dec1899 for all observations. Hence, when I add the Date and Hour variables, the date in the new Datetime1 variable is not 12jul2007 but 12jul1947.

          Nevertheless, after endless trying around with the formating, I have managed to get the result I was looking for, i.e. that the Datetime variable reads "12jul2007 00:00:00" (as an example for the first line). Here's the code I used (there's probably a more elegant way to do it, but at least it works like that):

          Code:
          format Date %20.0g
          gen Hour1 = hh(Hour)
          gen Minutes1 = mm(Hour)
          gen Second1 =ss(Hour)
          gen Time = hms(Hour1, Minutes1, Second1)
          format Time %20.0g
          format Time %tc
          gen Time1 = dofc(Time)
          gen double Datetime = cofd(Date) + Time1
          format Datetime %20.0g
          format Datetime %tc
          gen double Datetime2 = Datetime + Time
          format Datetime2 %20.0g
          format Datetime2 %tc
          drop Datetime Time Time1 Hour1 Minutes1 Second1 Hour
          rename Datetime2 Datetime
          format Date %td

          Comment


          • #6
            Stefan,

            Sorry, I didn't notice the problem with the time variable being 12/31/1899. In case you want a more efficient solution, try:

            Code:
            gen double Datetime1=cofd(Date)+(Hour-cofd(mdy(12,31,1899)))
            This assumes that the date part of the Hour variable is always 12/31/1899. A more generic solution would be:

            Code:
            gen double Datetime1=cofd(Date)+(Hour-cofd(dofc(Hour)))
            dofc(Hour) finds the date part of the Hour variable. This is then converted to a datetime variable and subtracted from Hour.

            Regards,
            Joe

            Comment


            • #7
              Cheers! This works fine!

              Comment


              • #8
                Can somebody explain how the tsset will work once the DateTime variable is set?

                In addition to running a regression model, I want to use the corrgram function to check for autocorrelation in my time series (which has both time and date variables and I have managed to create the Datetime variable). I used the following codes

                format Date %20.0g
                gen Time1 = hh(Time)
                gen Minutes1 = mm(Time)
                gen Second1 =ss(Time)
                gen TIME = hms(Time1, Minutes1, Second1)
                format TIME %20.0g
                format TIME %tc
                gen Time2 = dofc(TIME)
                gen double Datetime = cofd(Date) + Time2
                format Datetime %20.0g
                format Datetime %tc
                gen double Datetime2 = Datetime + TIME
                format Datetime2 %20.0g
                format Datetime2 %tc
                drop Datetime TIME Time2 Time1 Minutes1 Second1 Time
                rename Datetime2 Datetime
                format Date %td
                tsset Datetime:

                Next, I ran the corrgram code as follows:

                corrgram Consumption, lags(13) with (13) being just a random number. However, I am getting the following output as shown in the attached picture. I am sure this is quite correct and there must be something I am doing wrong here. Can anyone please tell what it is?

                I hope you will not mind my question as I am still a beginner with STATA.


                P.S. The dataset has also been attached with this post.

                Comment


                • #9
                  It's hard to be sure without seeing any of your data, and I also have only occasionally used corrgram myself. That said, when you -tsset Datetime- without specifying any unit options, Stata checks the display format for Datetime, finds it is %tc, and therefore uses milliseconds as the unit of time. So you are asking corrgram to show you the correlations between Consumption and its lags at 1 through 13 milliseconds. I suspect that isn't what you want.

                  By the way, please don't post screen shots to show Stata output. Often they are completely unreadable. Even when, as in this case, they can be read, they block the rest of the screen when you open them, and they do not support copy/paste operations. The best way to show Stata output is to copy it directly from the Results window or from your output log, and paste it into a code block on this Forum. (See FAQ if you aren't sure how to set up a code block.)
                  Last edited by Clyde Schechter; 09 Dec 2015, 16:35.

                  Comment


                  • #10
                    I will ensure to put the output log rather than image next time.

                    Also, I think the data did not post properly but I have attached it again with this post.


                    Even if I specify tsset Datetime with format(%td), format(%tm) or any other period, the answer is the same (as shown in the image).
                    Attached Files

                    Comment


                    • #11
                      I think I didn't make my point clearly. In fact, by focusing on the display format, I probably misled you. Changing the display format won't help--it doesn't change any of the data that Stata works with. The point I should have made is that your data are denominated in milliseconds, because Datetime comes from the clock() function. I haven't seen any of your data, but I suspect that you do not have observations spaced 1 millisecond apart. Let's say your observations are a second apart. Then you need to specify the -delta(1000)- option in your -tsset- command, because 1 second = 1000 milliseconds. -tsset- uses the delta() option to decide when observations have gaps. If your observations are a minute apart, it would be -delta(60000)-, etc. If your observations are separated by days or larger units, then the simpler approach would be to generate a new variable that is measured in days (or quarters, or whatever) using functions like dofc() or mofd(dofc()), etc. Without knowing about your data, I can't be more specific than that.

                      Another by the way. Excel files are not a good idea to post either. Some of the more frequent responders on this Forum don't use Microsoft Office products. I do, but like many others, I will not download an Office file from a stranger, because they can contain malware. So, again, to show code and output, copy directly from the do-file or the Results window or log file. To show data, use the -dataex- program (-ssc install dataex- if you don't already have it) to generate a code block that others can run to replicate a sample of your data.

                      Comment


                      • #12
                        Thanks! I will keep that in mind next time.

                        I have attached the Do File with this post. Also, I have installed the dataex program but I am unsure how it is executed.
                        Attached Files
                        Last edited by Anas Irfan; 09 Dec 2015, 18:35.

                        Comment


                        • #13
                          These are the first 48 rows from the data:

                          Date Cons Temp Time Date_F Time_F
                          1/1/1999 6659 27.19767 1:00 1-Jan-99 1:00:00
                          1/1/1999 6625 27.09518 2:00 1-Jan-99 2:00:00
                          1/1/1999 6561 26.18967 3:00 1-Jan-99 3:00:00
                          1/1/1999 6560 25.95187 4:00 1-Jan-99 4:00:00
                          1/1/1999 6638 25.91269 5:00 1-Jan-99 5:00:00
                          1/1/1999 6792 25.66688 6:00 1-Jan-99 6:00:00
                          1/1/1999 7027 25.52356 7:00 1-Jan-99 7:00:00
                          1/1/1999 7214 25.42748 8:00 1-Jan-99 8:00:00
                          1/1/1999 7329 24.76137 9:00 1-Jan-99 9:00:00
                          1/1/1999 7104 31.28803 10:00 1-Jan-99 10:00:00
                          1/1/1999 6802 37.64661 11:00 1-Jan-99 11:00:00
                          1/1/1999 6514 40.51292 12:00 1-Jan-99 12:00:00
                          1/1/1999 6087 44.21983 13:00 1-Jan-99 13:00:00
                          1/1/1999 5748 46.42479 14:00 1-Jan-99 14:00:00
                          1/1/1999 5479 47.71783 15:00 1-Jan-99 15:00:00
                          1/1/1999 5428 46.79557 16:00 1-Jan-99 16:00:00
                          1/1/1999 5678 46.09664 17:00 1-Jan-99 17:00:00
                          1/1/1999 6339 43.94693 18:00 1-Jan-99 18:00:00
                          1/1/1999 6667 40.62109 19:00 1-Jan-99 19:00:00
                          1/1/1999 6684 39.38335 20:00 1-Jan-99 20:00:00
                          1/1/1999 6663 38.09031 21:00 1-Jan-99 21:00:00
                          1/1/1999 6614 36.58195 22:00 1-Jan-99 22:00:00
                          1/1/1999 6457 35.14559 23:00 1-Jan-99 23:00:00
                          1/1/1999 6231 32.35225 0:00 2-Jan-99 0:00:00
                          1/2/1999 6009 33.60674 1:00 2-Jan-99 1:00:00
                          1/2/1999 5952 33.36093 2:00 2-Jan-99 2:00:00
                          1/2/1999 5961 32.21762 3:00 2-Jan-99 3:00:00
                          1/2/1999 6018 31.36095 4:00 2-Jan-99 4:00:00
                          1/2/1999 6187 30.46345 5:00 2-Jan-99 5:00:00
                          1/2/1999 6411 30.36096 6:00 2-Jan-99 6:00:00
                          1/2/1999 6724 29.69485 7:00 2-Jan-99 7:00:00
                          1/2/1999 7045 29.88541 8:00 2-Jan-99 8:00:00
                          1/2/1999 7463 29.74209 9:00 2-Jan-99 9:00:00
                          1/2/1999 7810 30.7749 10:00 2-Jan-99 10:00:00
                          1/2/1999 7854 31.76688 11:00 2-Jan-99 11:00:00
                          1/2/1999 7760 32.85494 12:00 2-Jan-99 12:00:00
                          1/2/1999 7614 33.99024 13:00 2-Jan-99 13:00:00
                          1/2/1999 7387 35.06223 14:00 2-Jan-99 14:00:00
                          1/2/1999 7244 35.10947 15:00 2-Jan-99 15:00:00
                          1/2/1999 7185 35.30003 16:00 2-Jan-99 16:00:00
                          1/2/1999 7304 35.49059 17:00 2-Jan-99 17:00:00
                          1/2/1999 7592 35.87171 18:00 2-Jan-99 18:00:00
                          1/2/1999 7558 36.64837 19:00 2-Jan-99 19:00:00
                          1/2/1999 7316 39.27533 20:00 2-Jan-99 20:00:00
                          1/2/1999 7025 40.132 21:00 2-Jan-99 21:00:00
                          1/2/1999 6585 41.40256 22:00 2-Jan-99 22:00:00
                          1/2/1999 6092 42.49062 23:00 2-Jan-99 23:00:00
                          1/2/1999 5566 44.65868 0:00 3-Jan-99 0:00:00
                          The data goes the same way until December 1999.

                          As you can see, there are 24 hourly observations for each of the two dates 1/1/1999 & 1/2/1999.

                          I will try to see how dataex works and, if possible, post the results here.
                          Last edited by Anas Irfan; 09 Dec 2015, 19:19.

                          Comment


                          • #14
                            So when you have your Datetime variable, you need to do -tsset Datetime, delta(3600000)- The magic number 3600000 is the number of milliseconds in an hour (= 1000*60*60). After that, corrgram and all other ts-dependent commands/operators should work the way you expect them to.

                            Comment


                            • #15
                              Dear Anna,
                              For your data, a very simple way is to generate a new variable "newh" which starts at the beginning of your date (01-jan-1999 at 01:00) and then add, for each observation( _n), one extra hour (3600000 mil seconds in 1 hour). Then use the delta (1 hour) option. Before you do this make sure you have sorted your data by date and hour (sort Date Time)

                              sort Date Time
                              gen double newh = tc(01jan1999 01:00) + 3600000*(_n-1)
                              format newh %tc
                              tsset newh, delta(1 hour)

                              Comment

                              Working...
                              X