Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Mask for timestamp

    timestamp price contracts
    20210104 10:08:34:950 29.16 8748
    20210104 10:08:38:067 29.16 8748
    20210104 10:08:38:067 29.17 14585
    20210104 10:08:38:067 29.17 5834
    20210104 10:08:46:263 29.17 2917
    20210104 10:09:02:783 29.15 58300
    20210104 10:09:03:813 29.15 8745
    20210104 10:09:16:277 29.17 2917
    20210104 10:09:16:277 29.17 5834
    20210104 10:09:16:277 29.17 2917
    20210104 10:09:16:607 29.18 8754
    20210104 10:09:19:010 29.18 5836


    Dears, i have a file with high frequency data, with the layout above with 8mm+ observations.
    Timestamp is a string field in the format YYYYMMDD HHMMSSmmm, while price and contracts are float.
    I would like to set a mask for timestamp and then collapse price and contracts by means, every change of minute.
    I appreciate some help with the code.
    Thanks
    Eli Hadad Junior



  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str21 timestamp float price long contracts
    "20210104 10:08:34:950" 29.16  8748
    "20210104 10:08:38:067" 29.16  8748
    "20210104 10:08:38:067" 29.17 14585
    "20210104 10:08:38:067" 29.17  5834
    "20210104 10:08:46:263" 29.17  2917
    "20210104 10:09:02:783" 29.15 58300
    "20210104 10:09:03:813" 29.15  8745
    "20210104 10:09:16:277" 29.17  2917
    "20210104 10:09:16:277" 29.17  5834
    "20210104 10:09:16:277" 29.17  2917
    "20210104 10:09:16:607" 29.18  8754
    "20210104 10:09:19:010" 29.18  5836
    end
    
    replace timestamp = reverse(subinstr(reverse(timestamp), ":", ".", 1))
    gen double _timestamp = clock(timestamp, "YMDhms"), after(timestamp)
    assert missing(_timestamp) == missing(timestamp)
    format _timestamp %tcCCYY_NN_DD_HH_MM_SS.sss
    drop timestamp
    rename _timestamp timestamp
    
    gen double mask = floor(timestamp/msofminutes(1)) * msofminutes(1)
    format mask %tcCCYY_NN_DD_HH_MM_SS.sss
    
    collapse (mean) price contracts, by(mask)
    In the future, when posting example data, please use the -dataex- command to do so, as I have here. It took me far longer to wrestle your tableau into Stata than it did for me to solve your actual problem. I would also bet that it took you at least as long to compose that tableau as it would have to use -dataex-, probably longer. So -dataex- is a win-win for everyone involved. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Clyde, sorry for my fault. The program worked perfectly, thank you very much !!

      Comment


      • #4
        Clyde, i need an extra help:

        The dataset is:

        input str21 timestamp double price long contracts
        "20210104 10:08:00:000" 29.14 8742
        "20210104 10:08:00:000" 29.14 2914
        "20210104 10:08:00:000" 29.14 5828
        "20210104 10:08:00:000" 29.14 2914
        "20210104 10:08:00:000" 29.14 131130
        "20210104 10:08:00:000" 29.14 17484
        "20210104 10:08:00:000" 29.14 5828
        "20210104 10:08:00:000" 29.14 84506
        "20210104 10:08:00:000" 29.14 11656
        "20210104 10:08:00:000" 29.14 5828
        "20210104 10:08:00:000" 29.14 17484
        "20210104 10:08:00:000" 29.14 250604
        "20210104 10:08:00:000" 29.14 8742
        "20210104 10:08:00:000" 29.14 5828
        "20210104 10:08:00:000" 29.14 2914
        "20210104 10:08:00:000" 29.14 8742
        "20210104 10:08:00:000" 29.14 2914
        "20210104 10:08:00:000" 29.14 52452
        "20210104 10:08:00:000" 29.14 11656
        "20210104 10:08:12:543" 29.17 11668
        "20210104 10:08:12:820" 29.15 5830
        "20210104 10:08:13:567" 29.16 2916
        "20210104 10:08:14:640" 29.19 5838
        "20210104 10:08:25:990" 29.15 5830
        "20210104 10:08:25:990" 29.15 5830
        "20210104 10:08:26:040" 29.14 11656
        "20210104 10:08:30:703" 29.14 5828
        "20210104 10:08:30:703" 29.15 11660
        "20210104 10:08:31:720" 29.16 2916
        "20210104 10:08:32:727" 29.16 5832
        "20210104 10:08:33:783" 29.16 2916
        "20210104 10:08:34:010" 29.19 8757
        "20210104 10:08:34:917" 29.17 11668
        "20210104 10:08:46:263" 29.17 2917
        "20210104 10:09:02:783" 29.15 58300
        "20210104 10:09:16:277" 29.17 2917

        I have data collected from a certain stock. The price and the number of contracts traded for each milisecond are available in the dataex above.
        I need to forecast the price of last transaction, before the minute change.
        The ideia is to forecast the last price, based in variation,
        of price and number of contracts, and then use weighted average with this 2 variations.
        The last price collected of 10:08:46:263" 29.17 can't take part of the calculation, because it will be compared with the forecast.
        I couldn't discover how to do the collapse with weighted average, without including the last observation.
        Any help will be appreciated.
        Warmly
        Eli Hadad Junior

        Comment


        • #5
          Forecasting requires some particular mathematical or statistical model of how the data evolve. I don't know finance (I'm an epidemiologist) and I have no idea what kind of model is appropriate for this kind of data. I'm supposing you are trying to set that out here when you say "The ideia is to forecast the last price, based in variation,
          of price and number of contracts, and then use weighted average with this 2 variations." But that is not specific enough to code a model. "Based in ..." isn't sufficient to come up with a concrete mathematical formula.

          There are plenty of people who work in finance on this Forum, or know it well. Hopefully one of them will pick up the thread from here. If nobody helps you out within, say, 24 hours, I suggest you start a new thread with just this question, and give it a title that makes it clear you are looking for help with forecasting so those people who can help you will look at it.
          Last edited by Clyde Schechter; 22 Aug 2022, 15:07.

          Comment

          Working...
          X