Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Doubt about collapse command

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str21 timestamp double price long contracts
    "20210104 10:08:00:000" 29.14   8742
    "20210104 10:08:00:000" 29.14   2914
    "20210104 10:08:00:000" 29.14   5828
    "20210104 10:08:00:000" 29.14   2914
    "20210104 10:08:00:000" 29.14 131130
    "20210104 10:08:00:000" 29.14  17484
    "20210104 10:08:00:000" 29.14   5828
    "20210104 10:08:00:000" 29.14   2914
    "20210104 10:08:00:000" 29.14  17484
    "20210104 10:08:00:000" 29.14   8742
    "20210104 10:08:00:000" 29.14  29140
    "20210104 10:08:00:000" 29.14   2914
    "20210104 10:08:00:000" 29.14   2914
    "20210104 10:08:00:000" 29.14   2914
    "20210104 10:08:00:000" 29.14  84506
    "20210104 10:08:00:000" 29.14  11656
    "20210104 10:08:00:000" 29.14   5828
    "20210104 10:08:00:000" 29.14  17484
    "20210104 10:08:00:000" 29.14   5828
    "20210104 10:08:00:000" 29.14  32054
    "20210104 10:08:00:000" 29.14  17484
    "20210104 10:08:00:000" 29.14   2914
    "20210104 10:08:00:000" 29.14  14570
    "20210104 10:08:00:000" 29.14   5828
    "20210104 10:08:00:000" 29.14   8742
    "20210104 10:08:00:000" 29.14  37882
    "20210104 10:08:00:000" 29.14   2914
    "20210104 10:08:00:000" 29.14 250604
    "20210104 10:08:00:000" 29.14   8742
    "20210104 10:08:00:000" 29.14   5828
    "20210104 10:08:00:000" 29.14   2914
    "20210104 10:08:00:000" 29.14   8742
    "20210104 10:08:00:000" 29.14   2914
    "20210104 10:08:00:000" 29.14  52452
    "20210104 10:08:00:000" 29.14  11656
    "20210104 10:08:12:543" 29.17  11668
    "20210104 10:08:12:820" 29.15   5830
    "20210104 10:08:13:567" 29.16   2916
    "20210104 10:08:14:640" 29.19   5838
    "20210104 10:08:25:990" 29.15   5830
    "20210104 10:08:25:990" 29.15   5830
    "20210104 10:08:26:040" 29.14  11656
    "20210104 10:08:30:703" 29.14   5828
    "20210104 10:08:30:703" 29.15  11660
    "20210104 10:08:31:720" 29.16   2916
    "20210104 10:08:32:727" 29.16   5832
    "20210104 10:08:33:783" 29.16   2916
    "20210104 10:08:34:010" 29.19   8757
    "20210104 10:08:34:917" 29.17  11668
    "20210104 10:08:46:263" 29.17   2917
    "20210104 10:09:02:783" 29.15  58300
    "20210104 10:09:16:277" 29.17   2917
    "20210104 10:09:16:277" 29.17   5834
    "20210104 10:09:38:020" 29.17  11668
    "20210104 10:09:38:100" 29.18   2918
    "20210104 10:09:38:127"  29.2   2920
    "20210104 10:09:39:557"  29.2   2920
    "20210104 10:09:45:647"  29.2   8760
    "20210104 10:09:45:990"  29.2  11680
    "20210104 10:09:46:000" 29.19  11676
    "20210104 10:09:50:243" 29.21   2921
    "20210104 10:09:50:243" 29.21  11684
    "20210104 10:09:54:283" 29.16   5832
    "20210104 10:09:55:303" 29.16   2916
    "20210104 10:09:55:553" 29.19  11676
    "20210104 10:09:55:700"  29.2   2920
    "20210104 10:09:59:110" 29.19  11676
    "20210104 10:10:00:203" 29.19   5838
    "20210104 10:10:00:777" 29.23   8769
    "20210104 10:10:01:750"  29.2  11680
    "20210104 10:10:01:753"  29.2  14600
    "20210104 10:10:02:820"  29.2   2920
    "20210104 10:10:03:913"  29.2   2920
    "20210104 10:10:05:573"  29.2   8760
    "20210104 10:10:05:573"  29.2   2920
    "20210104 10:10:05:587"  29.2  14600
    "20210104 10:10:05:733" 29.19   5838
    "20210104 10:10:05:733" 29.19   5838
    "20210104 10:10:08:847" 29.21   2921
    "20210104 10:10:16:420" 29.21  11684
    "20210104 10:10:34:287" 29.17   8751
    "20210104 10:10:34:810" 29.15  14575
    "20210104 10:10:36:347" 29.12   2912
    "20210104 10:10:36:347" 29.12  11648
    "20210104 10:10:46:113"  29.1  11640
    "20210104 10:10:46:567" 29.12  11648
    "20210104 10:10:51:887" 29.13   5826
    "20210104 10:10:57:010" 29.15  11660
    "20210104 10:10:57:270" 29.16   8748
    "20210104 10:11:00:577" 29.13   2913
    "20210104 10:11:00:967" 29.13   8739
    "20210104 10:11:01:997" 29.13   2913
    "20210104 10:11:10:727" 29.13   2913
    "20210104 10:11:14:587" 29.13   8739
    "20210104 10:11:14:587" 29.13   2913
    "20210104 10:11:15:613" 29.13   2913
    "20210104 10:11:15:857" 29.15  11660
    "20210104 10:11:19:307" 29.16   2916
    "20210104 10:11:19:307" 29.16   5832
    "20210104 10:11:19:307" 29.17   2917
    end
    Dears i have a doubt about -collapse- command.
    In the data sample above i need to collapse the data set, every change of minute, but the last observation of the minute can't be used. For instance "20210104 10:08:46:263" 29.17 2917; "20210104 10:09:59:110" 29.19 11676
    The collapse will be by -mean- for variables price and contracts.
    Any help would be appreciated.
    Thank you
    Eli Hadad Junior

  • #2
    OK. The first step is to create a Stata internal format time variable out of your string variable timestamp.

    Then we can create the minute variable from that. And then it's just a question of removing the last observation from each minute before -collapse-ing.

    I note, by the way, there can be multiple observations with the same value of timestamp, down to the milliseconds. So, when you say you want to exclude the "last observation" of the minute, I interpret that to mean the last one in the order they appear in the data set whenthere are ties for the last value of timestamp within the minute.

    Code:
    //  RECORD DATA ORDER
    gen long obs_no = _n
    
    //  CREATE A STATA INTERNAL FORMAT TIME VARIABLE
    replace timestamp = subinstr(timestamp, ":", ".", .)
    gen double time = clock(timestamp, "YMDhms")
    assert missing(time) == missing(timestamp)
    format time %tcCCYY_MM_DD_HH_MM_SS.sss
    
    //  CALCULATE THE MINUTE
    gen double minute = msofminutes(1)*floor(time/msofminutes(1))
    format minute %tcCCYY_MM_DD_HH_MM_SS.sss
    
    // OVERWRITE LAST VALUE IN EACH MINUTE WITH MISSING
    by minute (time obs_no), sort: replace price = . if _n == _N
    by minute (time obs_no): replace contracts = . if _n == _N
    
    // CALCULATE THE MEAN FOR EACH MINUTE
    collapse (mean) price contracts, by(minute)
    Added: I forgot to mention that in my -gen double time...- command, I am assuming that 20210104 refers to 4 JAN 2021, not 1 APR 2021. From the example data, I cannot tell whether the information is ordered YMD or YDM. If I have guessed wrong, you must change "YMDhms" there to "YDMhms".
    Last edited by Clyde Schechter; 23 Aug 2022, 10:44.

    Comment


    • #3
      Thank you Clive, an ellegant solution

      Comment


      • #4
        Thank you Clive, an ellegant solution

        Comment

        Working...
        X