
  • High-frequency data with five-minute intervals

    Hi,

    I have high-frequency data, and I want to aggregate it into five-minute intervals so that I can run Fama-MacBeth regressions. Examples of the data are as follows:

    [Screenshot: img.PNG]


    and this

    [Screenshot: Capture.PNG]


    I want to create five-minute intervals in the data, i.e., a command that puts the average over the last five minutes in the next cell. More precisely, I want the average from 09:00:00 to 09:05:00, then from 09:05:00 to 09:10:00, and so on.

    I have tried egen but was unable to figure out the specific command, so I am seeking your guidance.

    Thank you very much for your time.
    Best regards,



  • #2
    Please read the FAQ Advice, specifically http://www.statalist.org/forums/help#stata, on how screenshots are of limited use compared with explicit data examples.

    I guess from what you show that you have a daily date variable and a time-of-day variable in units of milliseconds. I am guessing because you don't give this information explicitly. With this arrangement you can proceed, but at some point you will probably need to combine those variables.
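    A minimal sketch of combining the two variables, assuming (hypothetically) that date is a %td daily date and tid holds milliseconds since midnight. cofd() converts a daily date to %tc milliseconds, so adding the time of day gives a %tc date-time. The three observations are made up for illustration.

    ```stata
    * Hypothetical data: date is a daily (%td) date, tid is time of day in ms
    clear
    set obs 3
    gen date = mdy(10,15,2014)
    format date %td
    gen double tid = (9*60*60 + (_n-1)*150) * 1000   // 09:00:00, 09:02:30, 09:05:00
    * cofd() turns the daily date into ms since 01jan1960 00:00:00 (%tc scale)
    gen double datetime = cofd(date) + tid
    format datetime %tc
    list
    ```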

    You can round the time of day down to 5-minute intervals with

    Code:
    gen double tid2 = 3e5 * floor(tid/3e5)

    where 3e5 = 300000 is the number of milliseconds in 5 minutes. Then use collapse ... , by(date tid2).
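    A sketch of that rounding followed by collapse, on simulated data with one observation per second for 20 minutes. The variable price and the choice of statistics are hypothetical. Note that collapse replaces the data in memory with one row per date and 5-minute interval.

    ```stata
    * Simulated data; date, tid (ms since midnight) and price are hypothetical names
    clear
    set seed 12345
    set obs 1200                                  // 1200 seconds = 20 minutes
    gen date = mdy(10,15,2014)
    format date %td
    gen double tid = (9*60*60 + _n - 1) * 1000    // 09:00:00, 09:00:01, ...
    gen price = 100 + rnormal()
    * round time of day down to the start of its 5-minute interval
    gen double tid2 = 3e5 * floor(tid/3e5)
    format tid2 %tcHH:MM:SS
    * one row per date and interval: mean price and number of observations
    collapse (mean) price (count) n = price, by(date tid2)
    list
    ```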



    • #3
      I apologize for the inconvenience. I am using Stata 14.0. The problem is that I could not get millisecond data; my data are recorded to the nearest second.

      I was able to create tid2 using the code you sent, but when I used collapse, the rest of the data disappeared. I want to generate another variable that is five minutes ahead of the prior five-minute observation, i.e., if I have an observation at 09:00:00, the next one should be at 09:05:00. If I could generate a dummy variable, I could do a much more detailed analysis.

      I am extremely thankful for your valuable time and suggestion.
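      collapse does replace the dataset in memory by design. If the original observations should stay, one alternative (a swapped-in approach, not something proposed in this thread) is to compute each interval's mean with egen and by:, which attaches the mean to every observation. A sketch on simulated one-second data; the names interval, mean_r and first are hypothetical.

      ```stata
      * Simulated one-second data over 10 minutes; all names are hypothetical
      clear
      set seed 2016
      set obs 600
      gen double time = tc(15oct2014 09:00:00) + (_n-1)*1000
      format time %tc
      gen r = rnormal()
      local fivemin = 5*60*1000                    // ms in 5 minutes
      * start of the 5-minute interval containing each observation
      gen double interval = `fivemin' * floor(time/`fivemin')
      format interval %tc
      * interval mean attached to every observation (nothing is dropped)
      bysort interval (time): egen mean_r = mean(r)
      * indicator for the first observation in each interval
      by interval: gen byte first = _n == 1
      ```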



      • #4
        Sorry, same advice as in #2. Please read the FAQ Advice, especially #12.

        Your question has this form: "Code I am not showing you, applied to data I am not showing you, causes my data to disappear." I can't work out why that is happening from so little information.



        • #5
          I am really sorry for the inconvenience. I have generated a random dataset similar to the original using the following commands:
          Code:
          set obs 10000
          gen r = rnormal()
          gen double time=Cmdyhms(10,15,2014,12,10,15)+(_n*1000)
          format time %tc
          The third command increases the time by one second per observation.

          What I want to do is create a dummy variable that has the value 1 every five minutes in the dataset. For example, the time starts at "15Oct2014 12:10:15", so the dummy variable should have the value 1 when the time is "15Oct2014 12:15:15". I tried the following:

          Code:
          gen d = 1 if time == Cmdyhms(10,15,2014,12,10,15)+(_n*50000)

          However, the result is all missing values. I tried different variations of the formula, but none of them works.

          Thank you very much in advance, and I really apologize for the earlier inconvenience.
          Last edited by Muhammad Yahya; 18 Aug 2016, 08:26.



          • #6
            First, your code to generate the random data is not quite correct. The final command should be -format time %tC-. Since you used the Cmdyhms function to generate times, those times include leap seconds. If you look at your data you will see that Cmdyhms(10,15,2014,12,10,15) does not display as 15oct2014 12:10:15 with the %tc format, it comes out as 15oct2014 12:10:40. To get the correct display you need the %tC format.
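            The leap-second offset can be checked directly. A sketch: Cmdyhms() counts leap seconds and cmdyhms() does not, so on this date their difference is the number of leap seconds inserted since 1972 (25, consistent with 12:10:15 displaying as 12:10:40 under %tc). Cofc() converts a %tc value to the %tC scale.

            ```stata
            * same clock reading, two time scales
            display %tc Cmdyhms(10,15,2014,12,10,15)   // wrong display: leap seconds not expected
            display %tC Cmdyhms(10,15,2014,12,10,15)   // 15oct2014 12:10:15
            * difference between the scales, in seconds
            display (Cmdyhms(10,15,2014,12,10,15) - cmdyhms(10,15,2014,12,10,15))/1000
            * Cofc() maps the %tc value onto the %tC scale
            display %tC Cofc(cmdyhms(10,15,2014,12,10,15))
            ```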

            Your -gen d- command will indeed produce only missing values. You are testing whether time == Cmdy... + _n*50000. But you created time to be Cmdy + _n*1000. The only way they will be equal is if _n*50000 = _n*1000, which only happens if _n = 0. But _n, of course, is never zero. What I believe you want is to set d = 1 if the difference between time and your base value of 15oct2014 12:10:15 is a multiple of five minutes. And indicator variables are typically more easily used in Stata if they are set to zero, not missing, in the contrary setting. So try this:

            Code:
            set obs 10000
            gen r = rnormal()
            gen double time = Cmdyhms(10,15,2014,12,10,15) + _n*1000
            format time %tC
            
            local 5min = 5 * 60 * 1000 // ms IN 5 MINUTES
            
            gen byte d = mod(time-tC(15oct2014 12:10:15), `5min') == 0
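            To confirm what d flags, the sketch below rebuilds the same data and counts the ones. Because time differs from the base by exactly _n*1000 ms, d is 1 whenever _n is a multiple of 300, i.e. 33 times in 10,000 observations. With real data that skip seconds, an observation may not fall exactly on every boundary, in which case a floor()-based interval variable like tid2 in #2 may be the safer choice.

            ```stata
            * Rebuild the example data and inspect the indicator
            clear
            set obs 10000
            gen r = rnormal()
            gen double time = Cmdyhms(10,15,2014,12,10,15) + _n*1000
            format time %tC
            local fivemin = 5*60*1000                   // ms in 5 minutes
            gen byte d = mod(time - tC(15oct2014 12:10:15), `fivemin') == 0
            * time - base = _n*1000, a multiple of 300000 when _n is a multiple of 300
            count if d
            list time if d in 1/900
            ```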



            • #7
              Thank you very very much, sir. This solved the problem.
