  • Issue with gaps in time series

    Hi!
    I am currently writing my master's thesis on volatility spillover between cryptocurrencies and the S&P 500, and I am testing my data for unit roots in Stata.

    The problem is that my data use trading dates, so when I -tsset- them there are gaps (weekends and other non-trading days). This results in a wrong representation in my ADF test.

    Is there any way to change the dates so that Stata will read them as consecutive dates without gaps?

    If any additional information is needed, please don't hesitate to ask!

    Thank you in advance!

  • #2
    -help datetime business calendars-. Near the top of that window you will find a blue link to the complete PDF manual entry; read that.
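A minimal sketch of the business-calendar route, run on synthetic data so that it stands on its own (the dates, the single holiday and the random-walk `value` are assumptions, not the poster's data). `bcal create` builds a calendar file from the dates present in the data, and its `generate()` option adds a business-date variable in which consecutive trading days differ by exactly 1. The calendar name `sp500` is hypothetical.

```stata
clear
set obs 400
* synthetic calendar days starting Monday 05jan2015 (assumption)
gen day = td(05jan2015) + _n - 1
format day %td
* keep weekdays and drop one exchange holiday (MLK Day, 19jan2015)
keep if inrange(dow(day), 1, 5) & day != td(19jan2015)
set seed 2015
* random-walk stand-in for a price series
gen value = 2000 + sum(rnormal())

* build calendar "sp500" (hypothetical name) from the dates in -day-;
* generate() creates the matching business-date variable
bcal create sp500, from(day) generate(bday) replace
tsset bday

* with no gaps, only the first 4 observations lack the lags used by lags(3)
count if !missing(value, L.value, L2.value, L3.value, L4.value)
dfuller value, lags(3) trend regress
```

The `replace` option only lets the block be re-run without an error about the existing `sp500.stbcal` file.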



    • #3
      "This results in a wrong representation in my ADF test."
      What does this mean?

      "Is there any way to change the dates so that Stata will read them as consecutive dates without gaps?"
      Yes, but I don't think it's necessary.

      Code:
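      clear
      
      * download the S&P 500 series from FRED (needs curl on the system path)
      local v = "SP500"
      !curl -L https://fred.stlouisfed.org/series/`v'/downloaddata/`v'.csv > "`v'.csv"
      insheet using "`v'.csv", comma clear
      erase "`v'.csv"
      
      * convert the string date to a Stata daily date
      gen day = date(date, "YMD")
      format day %td
      
      * declare daily time-series data; weekends and holidays show up as gaps
      tsset day, daily
      
      * ADF test with 3 lags and a linear trend, showing the regression table
      dfuller value, lags(3) trend regress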
      
      . tsset day, daily
              time variable:  day, 07apr2010 to 06apr2020, but with gaps
                      delta:  1 day
      
      . dfuller value, lags(3) trend regress
      
      Augmented Dickey-Fuller test for unit root         Number of obs   =       431
      
                                     ---------- Interpolated Dickey-Fuller ---------
                        Test         1% Critical       5% Critical      10% Critical
                     Statistic           Value             Value             Value
      ------------------------------------------------------------------------------
       Z(t)              1.244            -3.983            -3.423            -3.130
      ------------------------------------------------------------------------------
      MacKinnon approximate p-value for Z(t) = 1.0000
      
      ------------------------------------------------------------------------------
      D.value      |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
             value |
               L1. |   .0109401    .008794     1.24   0.214     -.006345    .0282252
               LD. |   -.273296   .0413585    -6.61   0.000    -.3545887   -.1920034
              L2D. |  -.1110031   .0492032    -2.26   0.025     -.207715   -.0142911
              L3D. |   .0828363   .0465588     1.78   0.076    -.0086779    .1743505
            _trend |  -.0067451   .0049875    -1.35   0.177    -.0165484    .0030582
             _cons |  -8.987857    8.97907    -1.00   0.317    -26.63677    8.661057
      ------------------------------------------------------------------------------
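One likely reading of the small Number of obs (431 over ten years of data) is that, with -tsset day, daily-, the lag operators look back in calendar days. `dfuller ..., lags(3)` regresses D.value on L.value and LD. through L3D., which together need value at t, t-1, ..., t-4, that is, five consecutive calendar days. With weekends missing, only a Friday preceded by a full Monday-to-Thursday week qualifies, which is consistent with roughly one usable observation per holiday-free week. The sketch below checks this on synthetic weekday-only data (an assumption: no holidays, random-walk `value`).

```stata
clear
set obs 730
* synthetic calendar days from Monday 05jan2015, weekdays only (assumption)
gen day = td(05jan2015) + _n - 1
format day %td
keep if inrange(dow(day), 1, 5)
set seed 42
gen value = 2000 + sum(rnormal())    // random-walk stand-in for a price
tsset day, daily

* lags(3) needs value at t, t-1, t-2, t-3 and t-4 (five calendar days)
gen byte usable = !missing(value, L.value, L2.value, L3.value, L4.value)
gen byte weekday = dow(day)           // 0 = Sunday, ..., 5 = Friday
tab weekday if usable
dfuller value, lags(3) trend
```

On these data the tabulation should list only Fridays (weekday = 5).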



      • #4
        Thank you for the response, Clyde Schechter. I will read that.



        • #5


          "What does this mean?"

          Yes, sorry. I also ran an ADF test in MATLAB to find the optimal number of lags. However, the optimal number of lags is larger than 3.
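The same arithmetic suggests why more lags are a problem under a daily calendar with weekend gaps: lags(k) needs value at t, t-1, ..., t-(k+1), i.e. k+2 consecutive calendar days, and a trading week has at most five. So with lags(4) or more, no observation has a complete set of lags and the test has no estimation sample at all. A minimal sketch on synthetic weekday-only data (an assumption) counts the usable observations for k = 1 to 6:

```stata
clear
set obs 730
* synthetic weekday-only series as an illustration (assumption)
gen day = td(05jan2015) + _n - 1
format day %td
keep if inrange(dow(day), 1, 5)
set seed 42
gen value = 2000 + sum(rnormal())
tsset day, daily

* lags(k) needs value at t, t-1, ..., t-(k+1)
forvalues k = 1/6 {
    local need "value"
    local last = `k' + 1
    forvalues j = 1/`last' {
        local need "`need', L`j'.value"
    }
    quietly count if !missing(`need')
    display "lags(`k'): " r(N) " usable observations"
}
```

On such data the count falls as k rises and is zero from k = 4 onward.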






          • #6
            A quick fix is to first make sure that your data are sorted by the date variable (if not, -sort- them first) and then create a variable, say time, with
            -gen time = _n-
            This creates a consecutively numbered variable that you can then use as the time variable for the Dickey-Fuller test.
            Meanwhile, you can read the Stata manual PDF on business calendars, which can be very useful if you want the actual dates and not just numbers.
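A minimal sketch of this approach, again on synthetic weekday-only data (an assumption). Once the observations are numbered consecutively, the lags step over weekends and holidays, so even longer lag lengths keep almost the whole sample; lags(6) below is only an example value. One caveat: if the file also contains rows for non-trading days whose value is missing, those rows would be numbered too and reintroduce missing lags, so the sketch drops them first.

```stata
clear
set obs 730
* synthetic weekday-only series (assumption)
gen day = td(05jan2015) + _n - 1
format day %td
keep if inrange(dow(day), 1, 5)
set seed 42
gen value = 2000 + sum(rnormal())

* rows for non-trading days with a missing value would get a number too
drop if missing(value)

sort day
gen time = _n                 // 1, 2, 3, ... with no gaps
tsset time

* only the first lags+1 observations now lack lags; lags(6) is just an example
dfuller value, lags(6) trend regress
```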



            • #7
              Thank you, Eric de Souza; the quick fix does give an efficient way to work around the gaps.
              Yes, I will indeed take a look at the Stata manual PDF for the actual dates as well!



              • #8
                There is one potential problem with the approach in #6. If your data actually contains all the consecutive trading days, then it will work fine. But suppose there is an error in your data set and there are some missing trading days. Then the method in #6 will not discover this: it just papers over any skipped calendar days and assigns consecutive numbers. By contrast, the business calendar assigns consecutive business dates to actual business days, and if you are missing any, the -tsset- command will recognize those gaps and alert you to the problem.

                In my experience most data sets have problems, and to prevent those problems from leading you to incorrect analyses, it is important to use commands that will choke on those problems as early as possible. My first law of data analysis is: "Never trust anybody else's data," and my second law is "Never trust your own data." My third law is "Fail early and often."

                Applying a business calendar to your date variable is safer than just creating a consecutively numbered variable: it will find and alert you to this kind of problem if you have it.
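A sketch of this kind of check, under stated assumptions. For it to catch missing trading days, the calendar has to come from a source independent of the data being checked: a calendar built from the data themselves (for example with `bcal create`) treats any missing trading day as a non-business day and so cannot flag it. Here a hypothetical calendar `tradecal` is written by hand for 2015 with only two January holidays listed (a real one would need the full exchange holiday schedule), and synthetic January data have one trading day, 14jan2015, deliberately removed. `bofd()` maps calendar dates to business dates, and `tsset` then reports the gap.

```stata
* write a hypothetical calendar file, tradecal.stbcal, by hand;
* the holiday list is deliberately incomplete (illustration only)
tempname fh
file open `fh' using "tradecal.stbcal", write replace text
file write `fh' "version 12.1" _n
file write `fh' `"purpose "Illustrative trading-day calendar""' _n
file write `fh' "dateformat dmy" _n
file write `fh' "range 01jan2015 31dec2015" _n
file write `fh' "centerdate 02jan2015" _n
file write `fh' "omit dayofweek (Sa Su)" _n
file write `fh' "omit date 01jan2015" _n
file write `fh' "omit date 19jan2015" _n
file close `fh'

* synthetic January 2015 trading days with one day lost by mistake
clear
set obs 31
gen day = td(01jan2015) + _n - 1
format day %td
keep if inrange(dow(day), 1, 5) & !inlist(day, td(01jan2015), td(19jan2015))
drop if day == td(14jan2015)
set seed 1
gen value = 2000 + sum(rnormal())

* map calendar dates to business dates; closed days would become missing
gen bday = bofd("tradecal", day)
format bday %tbtradecal
assert !missing(bday)        // fail early if the data contain a closed day
tsset bday                   // should report the series "but with a gap"
list day if _n > 1 & missing(L.value)
```

With real data, the same `bofd()` and `assert` lines would be run on the actual date variable against a calendar that lists every exchange holiday.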
