Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Combining two datasets in miliseconds

    Dears, i have two datasets below, obtained in milliseconds basis. Each dataset is different .dta file. One is stock p3 prices, the other p4 prices.
    The prices occur in various time and i want to combine the two datasets, to achieve a unique data file .dta, with a unique timestamp and the respective prices p3 and p4, in their places.
    f i have prices p3 and p4 for the same time, they should be there, side by side.
    Thanks a lot.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str21 date float p3
    "20220103 10:03:57:003" 30.84
    "20220103 10:03:57:003" 30.84
    "20220103 10:03:57:003" 30.84
    "20220103 10:03:57:003" 30.84
    "20220103 10:03:57:003" 30.84
    "20220103 10:03:57:003" 30.84
    "20220103 10:03:57:003" 30.84
    "20220103 10:03:57:003" 30.84
    "20220103 10:03:57:003" 30.84
    "20220103 10:03:57:003" 30.84
    "20220103 10:03:57:003" 30.84
    "20220103 10:03:57:003" 30.84
    "20220103 10:03:57:003" 30.84
    "20220103 10:03:57:003" 30.84
    "20220103 10:03:57:003" 30.84
    "20220103 10:03:57:003" 30.84
    "20220103 10:03:57:003" 30.84
    "20220103 10:03:57:003" 30.84
    "20220103 10:03:57:017"  30.8
    "20220103 10:03:57:017" 30.76
    "20220103 10:03:57:173"  30.8
    "20220103 10:03:57:193" 30.81
    "20220103 10:03:57:213" 30.82
    "20220103 10:03:59:447" 30.79
    "20220103 10:03:59:447" 30.79
    "20220103 10:03:59:447" 30.79
    "20220103 10:03:59:740" 30.78
    "20220103 10:03:59:740" 30.78
    "20220103 10:04:00:097" 30.78
    "20220103 10:04:00:097" 30.79
    "20220103 10:04:00:103" 30.79
    "20220103 10:04:00:407" 30.78
    "20220103 10:04:00:407" 30.77
    "20220103 10:04:00:407" 30.77
    "20220103 10:04:00:407" 30.77
    "20220103 10:04:00:590" 30.79
    "20220103 10:04:00:590" 30.79
    "20220103 10:04:04:603" 30.79
    "20220103 10:04:04:603" 30.79
    "20220103 10:04:04:603" 30.79
    "20220103 10:04:07:367"  30.8
    "20220103 10:04:07:367"  30.8
    "20220103 10:04:07:370"  30.8
    "20220103 10:04:09:813" 30.79
    "20220103 10:04:09:813" 30.79
    "20220103 10:04:09:827" 30.77
    "20220103 10:04:09:827" 30.77
    "20220103 10:04:09:873" 30.79
    "20220103 10:04:10:380" 30.79
    "20220103 10:04:12:600" 30.78
    end

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str21 date float p4
    "20220103 10:03:55:927" 28.54
    "20220103 10:03:55:927" 28.54
    "20220103 10:03:55:927" 28.54
    "20220103 10:03:55:927" 28.54
    "20220103 10:03:55:927" 28.54
    "20220103 10:03:55:927" 28.54
    "20220103 10:03:55:927" 28.54
    "20220103 10:03:55:927" 28.54
    "20220103 10:03:55:927" 28.54
    "20220103 10:03:55:927" 28.54
    "20220103 10:03:55:927" 28.54
    "20220103 10:03:55:947" 28.53
    "20220103 10:03:55:947" 28.53
    "20220103 10:03:55:970" 28.54
    "20220103 10:03:56:087" 28.54
    "20220103 10:03:56:087" 28.54
    "20220103 10:03:56:087" 28.54
    "20220103 10:03:56:087" 28.54
    "20220103 10:03:56:087" 28.55
    "20220103 10:03:56:087" 28.55
    "20220103 10:03:56:087" 28.55
    "20220103 10:03:56:087" 28.55
    "20220103 10:03:56:127" 28.55
    "20220103 10:03:56:150" 28.55
    "20220103 10:03:56:153" 28.55
    "20220103 10:03:56:160" 28.54
    "20220103 10:03:56:187" 28.55
    "20220103 10:03:56:187" 28.55
    "20220103 10:03:56:187" 28.55
    "20220103 10:03:56:187" 28.55
    "20220103 10:03:56:187" 28.55
    "20220103 10:03:56:187" 28.55
    "20220103 10:03:56:187" 28.55
    "20220103 10:03:56:187" 28.55
    "20220103 10:03:56:190" 28.56
    "20220103 10:03:56:273" 28.55
    "20220103 10:03:56:460" 28.55
    "20220103 10:03:56:460" 28.55
    "20220103 10:03:56:460" 28.55
    "20220103 10:03:56:790" 28.55
    "20220103 10:03:56:937" 28.56
    "20220103 10:03:56:937" 28.56
    "20220103 10:03:56:937" 28.56
    "20220103 10:03:56:937" 28.57
    "20220103 10:03:56:937" 28.57
    "20220103 10:03:56:937" 28.58
    "20220103 10:03:56:937" 28.58
    "20220103 10:03:56:937" 28.58
    "20220103 10:03:56:937" 28.59
    "20220103 10:03:56:937"  28.6
    end


  • #2
    Currently there are no matching time stamps between your two example datasets. You also appear to have a lot of duplicated values in each. You need to tell us how you want to deal with those. Is your time stamp missing precision, or if not, should the duplicates be disregarded? Or is there another missing identifier like a company or stock symbol?

    Your two ways forward will be to use either -merge- or -joinby-. You may read more about them and pay attention to examples to test with your own data.

    Comment


    • #3
      Dear Leonardo Guizzetti, this is a sample of my database. in fact the database is 20mm+ observations. As the data is collected in milliseconds, you may have offers with the same price, for the same timestamp.
      here are no repeated information.
      i know that there are no matching time stamps and this is the problem.
      Let's suppose for a certain timestamp:
      You may have price for p1 and not for p2;
      You may have price for p2 and not for p1;
      You may have price for p1 and p2
      And this behavior happens for many timestamps.
      Then i would combine both p1 and p2 (different files) and have the two informations in only one file.
      I will have only one timestamp, combining the two timestamps.
      Regards
      Eli

      ​​​​​​​, .

      Comment


      • #4
        Originally posted by Eli Hadad View Post
        ... you may have offers with the same price, for the same timestamp.
        There are no repeated information.
        i know that there are no matching time stamps and this is the problem.
        Let's suppose for a certain timestamp:
        ...
        You may have price for p1 and p2
        ...
        I will have only one timestamp, combining the two timestamps.
        Unfortunately, this hasn't really clarified the underlying logic of what you want, and your description is contradictory.

        If there is nothing to match between the two datasets, and this is desired, you could -append- the two datasets. However, this contradicts your expectation to have observations with prices -p3- and -p4- in the same observation for the same timestamp. Example:

        Code:
        . mkf file1
        . cwf file1
        // code from first dataex omitted
        
        . mkf file2
        . cwf file2
        // code from second dataex omitted
        
        . cwf file1
        . frameappend file2
        . list
        
             +---------------------------------------+
             |                  date      p3      p4 |
             |---------------------------------------|
          1. | 20220103 10:03:57:003   30.84       . |
          2. | 20220103 10:03:57:003   30.84       . |
          3. | 20220103 10:03:57:003   30.84       . |
          4. | 20220103 10:03:57:003   30.84       . |
          5. | 20220103 10:03:57:003   30.84       . |
             |---------------------------------------|
          6. | 20220103 10:03:57:003   30.84       . |
          7. | 20220103 10:03:57:003   30.84       . |
          8. | 20220103 10:03:57:003   30.84       . |
          9. | 20220103 10:03:57:003   30.84       . |
         10. | 20220103 10:03:57:003   30.84       . |
             |---------------------------------------|
         11. | 20220103 10:03:57:003   30.84       . |
         12. | 20220103 10:03:57:003   30.84       . |
         13. | 20220103 10:03:57:003   30.84       . |
         14. | 20220103 10:03:57:003   30.84       . |
         15. | 20220103 10:03:57:003   30.84       . |
             |---------------------------------------|
         16. | 20220103 10:03:57:003   30.84       . |
         17. | 20220103 10:03:57:003   30.84       . |
         18. | 20220103 10:03:57:003   30.84       . |
         19. | 20220103 10:03:57:017    30.8       . |
         20. | 20220103 10:03:57:017   30.76       . |
             |---------------------------------------|
         21. | 20220103 10:03:57:173    30.8       . |
         22. | 20220103 10:03:57:193   30.81       . |
         23. | 20220103 10:03:57:213   30.82       . |
         24. | 20220103 10:03:59:447   30.79       . |
         25. | 20220103 10:03:59:447   30.79       . |
             |---------------------------------------|
         26. | 20220103 10:03:59:447   30.79       . |
         27. | 20220103 10:03:59:740   30.78       . |
         28. | 20220103 10:03:59:740   30.78       . |
         29. | 20220103 10:04:00:097   30.78       . |
         30. | 20220103 10:04:00:097   30.79       . |
             |---------------------------------------|
         31. | 20220103 10:04:00:103   30.79       . |
         32. | 20220103 10:04:00:407   30.78       . |
         33. | 20220103 10:04:00:407   30.77       . |
         34. | 20220103 10:04:00:407   30.77       . |
         35. | 20220103 10:04:00:407   30.77       . |
             |---------------------------------------|
         36. | 20220103 10:04:00:590   30.79       . |
         37. | 20220103 10:04:00:590   30.79       . |
         38. | 20220103 10:04:04:603   30.79       . |
         39. | 20220103 10:04:04:603   30.79       . |
         40. | 20220103 10:04:04:603   30.79       . |
             |---------------------------------------|
         41. | 20220103 10:04:07:367    30.8       . |
         42. | 20220103 10:04:07:367    30.8       . |
         43. | 20220103 10:04:07:370    30.8       . |
         44. | 20220103 10:04:09:813   30.79       . |
         45. | 20220103 10:04:09:813   30.79       . |
             |---------------------------------------|
         46. | 20220103 10:04:09:827   30.77       . |
         47. | 20220103 10:04:09:827   30.77       . |
         48. | 20220103 10:04:09:873   30.79       . |
         49. | 20220103 10:04:10:380   30.79       . |
         50. | 20220103 10:04:12:600   30.78       . |
             |---------------------------------------|
         51. | 20220103 10:03:55:927       .   28.54 |
         52. | 20220103 10:03:55:927       .   28.54 |
         53. | 20220103 10:03:55:927       .   28.54 |
         54. | 20220103 10:03:55:927       .   28.54 |
         55. | 20220103 10:03:55:927       .   28.54 |
             |---------------------------------------|
         56. | 20220103 10:03:55:927       .   28.54 |
         57. | 20220103 10:03:55:927       .   28.54 |
         58. | 20220103 10:03:55:927       .   28.54 |
         59. | 20220103 10:03:55:927       .   28.54 |
         60. | 20220103 10:03:55:927       .   28.54 |
             |---------------------------------------|
         61. | 20220103 10:03:55:927       .   28.54 |
         62. | 20220103 10:03:55:947       .   28.53 |
         63. | 20220103 10:03:55:947       .   28.53 |
         64. | 20220103 10:03:55:970       .   28.54 |
         65. | 20220103 10:03:56:087       .   28.54 |
             |---------------------------------------|
         66. | 20220103 10:03:56:087       .   28.54 |
         67. | 20220103 10:03:56:087       .   28.54 |
         68. | 20220103 10:03:56:087       .   28.54 |
         69. | 20220103 10:03:56:087       .   28.55 |
         70. | 20220103 10:03:56:087       .   28.55 |
             |---------------------------------------|
         71. | 20220103 10:03:56:087       .   28.55 |
         72. | 20220103 10:03:56:087       .   28.55 |
         73. | 20220103 10:03:56:127       .   28.55 |
         74. | 20220103 10:03:56:150       .   28.55 |
         75. | 20220103 10:03:56:153       .   28.55 |
             |---------------------------------------|
         76. | 20220103 10:03:56:160       .   28.54 |
         77. | 20220103 10:03:56:187       .   28.55 |
         78. | 20220103 10:03:56:187       .   28.55 |
         79. | 20220103 10:03:56:187       .   28.55 |
         80. | 20220103 10:03:56:187       .   28.55 |
             |---------------------------------------|
         81. | 20220103 10:03:56:187       .   28.55 |
         82. | 20220103 10:03:56:187       .   28.55 |
         83. | 20220103 10:03:56:187       .   28.55 |
         84. | 20220103 10:03:56:187       .   28.55 |
         85. | 20220103 10:03:56:190       .   28.56 |
             |---------------------------------------|
         86. | 20220103 10:03:56:273       .   28.55 |
         87. | 20220103 10:03:56:460       .   28.55 |
         88. | 20220103 10:03:56:460       .   28.55 |
         89. | 20220103 10:03:56:460       .   28.55 |
         90. | 20220103 10:03:56:790       .   28.55 |
             |---------------------------------------|
         91. | 20220103 10:03:56:937       .   28.56 |
         92. | 20220103 10:03:56:937       .   28.56 |
         93. | 20220103 10:03:56:937       .   28.56 |
         94. | 20220103 10:03:56:937       .   28.57 |
         95. | 20220103 10:03:56:937       .   28.57 |
             |---------------------------------------|
         96. | 20220103 10:03:56:937       .   28.58 |
         97. | 20220103 10:03:56:937       .   28.58 |
         98. | 20220103 10:03:56:937       .   28.58 |
         99. | 20220103 10:03:56:937       .   28.59 |
        100. | 20220103 10:03:56:937       .    28.6 |
             +---------------------------------------+
        If the data are to be -merge-d, you have to deal with multiple observations with the same timestamp in either dataset. With nothing to match on, there is no understanding of how observations are to be combined. This is the same problem even if you want to match on timestamp.

        Comment

        Working...
        X