Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge daily data with quarterly data

    Hey! A simple question!

    I have two panel data sets, one is quarterly, with 3 variables ID date and value. The date are the end of each quarter.
    One is daily, with 3 variables ID date and return. The date is every calendar day.

    I would like to merge the two data sets to daily data, that for each ID, there will be a different return every calendar day but same value in each quarter (around 90 observations).

    A LOT Thanks!

  • #2
    So, you have to have a quarterly date in both sets. In your quarterly data set, you have a daily date, though it corresponds to the last day of the quarter. In both data sets, do this:

    Code:
    gen qdate = qofd(date)
    format qdate %tq
    Then you can merge the two data sets on id and qdate.

    If you need more specific help, post back showing examples of both data sets using the -dataex- command. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Sorry I didn't write my question correct. It's a bit more complex.

      For data set 1, quarterly data, has 4 variables, permno (ID1) metro (ID2) yq(time series) and shares.

      For data set 2, daily data, has 4 variables, metro date yq and lsmi1.

      How could I merge, that for each permno of dataset 1,

      merge 1:m metro yq using dataset2

      ?

      Here are the examples:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double permno str14 metro float yq double shares
      10001 "Boston"        204  19986
      10001 "Chicago"       204  58062
      10001 "Houston"       204  16900
      10001 "Minneapolis"   204 119324
      10001 "New York"      204  24315
      10001 "Philadelphia"  204  12366
      10001 "Pittsburgh"    204 190860
      10001 "Sacramento"    204   6150
      10001 "San Francisco" 204    750
      10001 "St. Louis"     204 117181
      10001 "Boston"        206  16265
      10001 "Chicago"       206  82725
      10001 "Houston"       206  16900
      10001 "Los Angeles"   206  77531
      10001 "Minneapolis"   206 118404
      10001 "New York"      206  50124
      10001 "Pittsburgh"    206 197497
      10001 "Sacramento"    206   6150
      10001 "Boston"        207  27720
      10001 "Chicago"       207  64968
      10001 "Houston"       207  16900
      10001 "Minneapolis"   207 118229
      10001 "New York"      207  28012
      10001 "Philadelphia"  207  21907
      10001 "Pittsburgh"    207 197996
      10001 "Sacramento"    207   6150
      10001 "San Diego"     207     68
      10001 "San Francisco" 207   7500
      10001 "Boston"        208  28852
      10001 "Chicago"       208  79718
      10001 "Houston"       208  16900
      10001 "Minneapolis"   208 118229
      10001 "New York"      208  24833
      10001 "Philadelphia"  208  11716
      10001 "Pittsburgh"    208 155733
      10001 "Sacramento"    208   6150
      10001 "San Diego"     208     68
      10001 "Boston"        209  55003
      10001 "Charlotte"     209  10680
      10001 "Chicago"       209  79918
      10001 "Houston"       209  17800
      10001 "Minneapolis"   209 118049
      10001 "New York"      209  26007
      10001 "Philadelphia"  209  31459
      10001 "Sacramento"    209  16350
      10001 "Boston"        210  56361
      10001 "Charlotte"     210  10680
      10001 "Chicago"       210  77736
      10001 "Houston"       210  16800
      10001 "Milwaukee"     210  25458
      10001 "Minneapolis"   210   2000
      10001 "New York"      210  73165
      10001 "Philadelphia"  210  33474
      10001 "Sacramento"    210  16350
      10001 "Atlanta"       211   1500
      10001 "Boston"        211  51759
      10001 "Charlotte"     211  10930
      10001 "Chicago"       211  80976
      10001 "Houston"       211  17800
      10001 "Minneapolis"   211   2000
      10001 "New York"      211  23323
      10001 "Philadelphia"  211  31637
      10001 "Sacramento"    211  16350
      10001 "San Francisco" 211  30256
      10001 "Atlanta"       212   1500
      10001 "Boston"        212  51759
      10001 "Charlotte"     212  10780
      10001 "Chicago"       212  80665
      10001 "Houston"       212  17800
      10001 "Jacksonville"  212    100
      10001 "Minneapolis"   212   2000
      10001 "New York"      212  25049
      10001 "Philadelphia"  212  19804
      10001 "Sacramento"    212  16350
      10001 "San Francisco" 212  31256
      10001 "Atlanta"       213   1500
      10001 "Boston"        213  27730
      10001 "Charlotte"     213  67425
      10001 "Chicago"       213  84956
      10001 "Houston"       213  17800
      10001 "Jacksonville"  213    400
      10001 "Minneapolis"   213   2000
      10001 "New York"      213  35018
      10001 "Philadelphia"  213  25040
      10001 "Sacramento"    213  16350
      10001 "San Francisco" 213  35842
      10001 "Seattle"       213 105271
      10001 "Toronto"       213    653
      10001 "Atlanta"       214   1500
      10001 "Boston"        214  44701
      10001 "Charlotte"     214 135825
      10001 "Chicago"       214 140827
      10001 "Houston"       214  17800
      10001 "Los Angeles"   214  93581
      10001 "Minneapolis"   214   3000
      10001 "New York"      214 374733
      10001 "Philadelphia"  214  52311
      10001 "Sacramento"    214  16350
      10001 "San Francisco" 214  49943
      10001 "Seattle"       214 105124
      end
      format %tq yq


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str14 metro double date float(lsmi1 yq)
      "Atlanta" 18628   732.2151 204
      "Atlanta" 18629   23.18359 204
      "Atlanta" 18630  -875.7326 204
      "Atlanta" 18631          0 204
      "Atlanta" 18632 -71.758736 204
      "Atlanta" 18633  -94.94233 204
      "Atlanta" 18634          0 204
      "Atlanta" 18635   23.18359 204
      "Atlanta" 18636  -48.57514 204
      "Atlanta" 18637  -25.39155 204
      "Atlanta" 18638          0 204
      "Atlanta" 18639          0 204
      "Atlanta" 18640 -71.758736 204
      "Atlanta" 18641          0 204
      "Atlanta" 18642   23.18359 204
      "Atlanta" 18643  1679.7065 204
      "Atlanta" 18644  1679.7065 204
      "Atlanta" 18645   1584.764 204
      "Atlanta" 18646 -71.758736 204
      "Atlanta" 18647          0 204
      "Atlanta" 18648   23.18359 204
      "Atlanta" 18649  71.758736 204
      "Atlanta" 18650   23.18359 204
      "Atlanta" 18651   46.36718 204
      "Atlanta" 18652          0 204
      "Atlanta" 18653          0 204
      "Atlanta" 18654   48.57514 204
      "Atlanta" 18655          0 204
      "Atlanta" 18656 -71.758736 204
      "Atlanta" 18657          0 204
      "Atlanta" 18658          0 204
      "Atlanta" 18659          0 204
      "Atlanta" 18660   23.18359 204
      "Atlanta" 18661 -71.758736 204
      "Atlanta" 18662   23.18359 204
      "Atlanta" 18663 -71.758736 204
      "Atlanta" 18664 -120.33388 204
      "Atlanta" 18665 -120.33388 204
      "Atlanta" 18666   23.18359 204
      "Atlanta" 18667  71.758736 204
      "Atlanta" 18668          0 204
      "Atlanta" 18669          0 204
      "Atlanta" 18670  -23.18359 204
      "Atlanta" 18671   48.57514 204
      "Atlanta" 18672  71.758736 204
      "Atlanta" 18673 -71.758736 204
      "Atlanta" 18674          0 204
      "Atlanta" 18675  71.758736 204
      "Atlanta" 18676   23.18359 204
      "Atlanta" 18677          0 204
      "Atlanta" 18678   23.18359 204
      "Atlanta" 18679   23.18359 204
      "Atlanta" 18680   23.18359 204
      "Atlanta" 18681  71.758736 204
      "Atlanta" 18682   94.94233 204
      "Atlanta" 18683          0 204
      "Atlanta" 18684  -48.57514 204
      "Atlanta" 18685  -48.57514 204
      "Atlanta" 18686 -143.51747 204
      "Atlanta" 18687  71.758736 204
      "Atlanta" 18688   23.18359 204
      "Atlanta" 18689 -71.758736 204
      "Atlanta" 18690   23.18359 204
      "Atlanta" 18691  71.758736 204
      "Atlanta" 18692   48.57514 204
      "Atlanta" 18693  120.33388 204
      "Atlanta" 18694          0 204
      "Atlanta" 18695  71.758736 204
      "Atlanta" 18696  -23.18359 204
      "Atlanta" 18697          0 204
      "Atlanta" 18698   94.94233 204
      "Atlanta" 18699          0 204
      "Atlanta" 18700          0 204
      "Atlanta" 18701          0 204
      "Atlanta" 18702  -48.57514 204
      "Atlanta" 18703  71.758736 204
      "Atlanta" 18704  -23.18359 204
      "Atlanta" 18705  71.758736 204
      "Atlanta" 18706   94.94233 204
      "Atlanta" 18707   23.18359 204
      "Atlanta" 18708          0 204
      "Atlanta" 18709  71.758736 204
      "Atlanta" 18710  71.758736 204
      "Atlanta" 18711  -23.18359 204
      "Atlanta" 18712   23.18359 204
      "Atlanta" 18713  -48.57514 204
      "Atlanta" 18714 -143.51747 204
      "Atlanta" 18715          0 204
      "Atlanta" 18716   23.18359 204
      "Atlanta" 18717 -71.758736 204
      "Atlanta" 18718  -70.68202 205
      "Atlanta" 18719 -71.758736 205
      "Atlanta" 18720 -1.0767155 205
      "Atlanta" 18721   23.18359 205
      "Atlanta" 18722  -47.49843 205
      "Atlanta" 18723  142.44075 205
      "Atlanta" 18724   47.49843 205
      "Atlanta" 18725  24.314837 205
      "Atlanta" 18726    47.4439 205
      "Atlanta" 18727  166.70107 205
      end
      format %td date
      format %tq yq
      Last edited by Thomas Holst; 19 Dec 2019, 00:41.

      Comment


      • #4
        How could I merge, that for each permno of dataset 1,

        merge 1:m metro yq using dataset2

        ?
        Yes, exactly that. Have you tried it? Is there a problem?

        Comment


        • #5
          Yes...if there is only one permno it won't be a problem, but now there are many permno so it says:

          variables metro yq do not uniquely identify observations in the master data

          I guess I need to write a loop...could you help

          Thanks!

          Comment


          • #6
            No, this has nothing to do with loops. Your problem is that your data set is not consistent with your stated description of it and plans. Most likely the data is faulty, or perhaps it is just your misunderstanding of the data. So you need to either fix your data, if it is wrong, or clarify your understanding of the data and what you need to do with it so that those are consistent with the realities of the data.

            In your example data, metro and yq do uniquely identify observations in the master data.

            So it is hard for me to advise you how to proceed because the examples you have shown do not actually represent your situation. There are several possibilities:

            1. metro and yq should uniquely identify observations in the master data. Base on your statement in #1 "
            I have two panel data sets, one is quarterly, with 3 variables ID date and value. The date are the end of each quarter." I think this is the case. In a panel data set, you have one observation per time period per panel. So the fact that Stata is finding that this is not true, the conclusion would be that there is something wrong with the data in data set 1. So give some thought: shouldn't it be true that there is only one observation per metro per quarter in data set 1? If it should be true, then you need to find the violations and fix them. Finding them is easy:

            Code:
            duplicates tag metro quarter, gen(flag)
            browse if flag
            Fixing them depends on what you find when you do that.

            2. Perhaps I am reading too much into a casual description of your data as panel data. If it is perfectly appropriate for there to be multiple observations per metro per quarter, and given that there are lots of multiple observations per metro per quarter in the second data set because it has daily observations, then you need to figure out which of the multiple observations for a given metro and quarter in data set 1 should be paired with which observations of the same metro and quarter in data set 2. One possibility is that the answer is that every observation in data set 1 should be paired with every observation in data set 2 that has the same metro and quarter. If that's the case, see -help joinby-. Or it may be that there is some rule based on your data that identifies which of the observations in data set 1 should be paired up. In that case you would have to program accordingly. The details of the code obviously depend on the actual way you want to link up these data sets.

            If you need more specific advice, post back with example data that actually exhibits the problem at hand, and also give a very clear explanation of how to decide which observation(s) from data set 1 get paired with which observations in data set 2 (if we are in situation 2).

            Comment


            • #7
              It should be situation 2. Please see the new data samples:


              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input double permno str14 metro float yq double shares
              10026 "Atlanta"       205  437054
              10026 "Baltimore"     205   82340
              10026 "Boston"        205  793486
              10026 "Charlotte"     205  826436
              10026 "Chicago"       205  719912
              10026 "Columbus"      205   87867
              10026 "Dallas"        205    4364
              10026 "Denver"        205   62085
              10026 "Kansas City"   205  143887
              10026 "Los Angeles"   205    8125
              10026 "Memphis"       205   18310
              10026 "Milwaukee"     205  103532
              10026 "Minneapolis"   205   98717
              10026 "Montreal"      205   66250
              10026 "New York"      205 6063350
              10026 "Philadelphia"  205   83393
              10026 "Pittsburgh"    205  135085
              10026 "Sacramento"    205   29879
              10026 "San Francisco" 205   53588
              10026 "Seattle"       205   23989
              10026 "Toronto"       205   12151
              10026 "Washington"    205    4210
              10026 "Atlanta"       206  427537
              10026 "Baltimore"     206   81740
              10026 "Boston"        206  777268
              10026 "Charlotte"     206  834962
              10026 "Chicago"       206 1060282
              10026 "Columbus"      206   88316
              10026 "Dallas"        206    4153
              10026 "Denver"        206   61760
              10026 "Kansas City"   206  206301
              10026 "Los Angeles"   206  453313
              10026 "Memphis"       206   13800
              10026 "Milwaukee"     206  207499
              10026 "Minneapolis"   206   94591
              10026 "Montreal"      206   67450
              10026 "New York"      206 6123919
              10026 "Philadelphia"  206   87803
              10026 "Pittsburgh"    206  133676
              10026 "Sacramento"    206   29179
              10026 "San Francisco" 206   60480
              10026 "Seattle"       206   26156
              10026 "Toronto"       206   11906
              10051 "Baltimore"     205  872850
              10051 "Boston"        205 6840343
              10051 "Charlotte"     205 2049850
              10051 "Chicago"       205 3491193
              10051 "Cincinnati"    205    8655
              10051 "Cleveland"     205     243
              10051 "Columbus"      205   74818
              10051 "Dallas"        205   19000
              10051 "Denver"        205  864600
              10051 "Kansas City"   205   14620
              10051 "Los Angeles"   205 1895430
              10051 "Memphis"       205   32480
              10051 "Milwaukee"     205  481752
              10051 "Minneapolis"   205  260005
              10051 "Montreal"      205  154130
              10051 "New York"      205 4396011
              10051 "Philadelphia"  205  625500
              10051 "Pittsburgh"    205 1468586
              10051 "Sacramento"    205  125664
              10051 "San Diego"     205  719006
              10051 "San Francisco" 205  435634
              10051 "Seattle"       205  150547
              10051 "Tampa"         205  620098
              10051 "Toronto"       205   53915
              10051 "Baltimore"     206  944550
              10051 "Boston"        206 5742479
              10051 "Charlotte"     206 2102979
              10051 "Chicago"       206 2797312
              10051 "Cleveland"     206     187
              10051 "Columbus"      206  115799
              10051 "Denver"        206  960017
              10051 "Kansas City"   206   14620
              10051 "Los Angeles"   206 2847729
              10051 "Milwaukee"     206  474038
              10051 "Minneapolis"   206  216555
              10051 "Montreal"      206  157130
              10051 "New York"      206 3735409
              10051 "Philadelphia"  206  534390
              10051 "Pittsburgh"    206 2472563
              10051 "Sacramento"    206  128964
              10051 "San Diego"     206  941738
              10051 "San Francisco" 206  532984
              10051 "Seattle"       206  193349
              10051 "Tampa"         206  772048
              10051 "Toronto"       206   26406
              end
              format %tq yq


              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str14 metro double date float(lsmi1 yq)
              "New York"      18800   64.26257 205
              "New York"      18801  -64.26257 205
              "New York"      18802  -64.26257 205
              "New York"      18803   192.7877 205
              "New York"      18804          0 205
              "New York"      18805  -192.7877 205
              "New York"      18806          0 205
              "New York"      18807  -192.7877 205
              "New York"      18808  -192.7877 205
              "New York"      18809  -64.26257 206
              "New York"      18810  -64.26257 206
              "New York"      18811 -128.52515 206
              "New York"      18812  -64.26257 206
              "New York"      18813          0 206
              "New York"      18814  -192.7877 206
              "New York"      18815   64.26257 206
              "New York"      18816   192.7877 206
              "New York"      18817  -64.26257 206
              "New York"      18818 -128.52515 206
              "New York"      18819  -192.7877 206
              "New York"      18820          0 206
              "Philadelphia"  18800  -51.68946 205
              "Philadelphia"  18801  -51.68946 205
              "Philadelphia"  18802   51.68946 205
              "Philadelphia"  18803  -51.68946 205
              "Philadelphia"  18804          0 205
              "Philadelphia"  18805  -51.68946 205
              "Philadelphia"  18806          0 205
              "Philadelphia"  18807  -51.68946 205
              "Philadelphia"  18808  -51.68946 205
              "Philadelphia"  18809   51.68946 206
              "Philadelphia"  18810  -51.68946 206
              "Philadelphia"  18811 -103.37893 206
              "Philadelphia"  18812  -51.68946 206
              "Philadelphia"  18813 -103.37893 206
              "Philadelphia"  18814  -51.68946 206
              "Philadelphia"  18815   51.68946 206
              "Philadelphia"  18816          0 206
              "Philadelphia"  18817  -51.68946 206
              "Philadelphia"  18818          0 206
              "Philadelphia"  18819  -51.68946 206
              "Philadelphia"  18820          0 206
              "San Francisco" 18800   53.08647 205
              "San Francisco" 18801  -53.08647 205
              "San Francisco" 18802  -53.08647 205
              "San Francisco" 18803  -53.08647 205
              "San Francisco" 18804  -167.6415 205
              "San Francisco" 18805 -220.72797 205
              "San Francisco" 18806          0 205
              "San Francisco" 18807 -198.37576 205
              "San Francisco" 18808  114.55502 205
              "San Francisco" 18809  114.55502 206
              "San Francisco" 18810 -114.55502 206
              "San Francisco" 18811  -167.6415 206
              "San Francisco" 18812 -114.55502 206
              "San Francisco" 18813          0 206
              "San Francisco" 18814  114.55502 206
              "San Francisco" 18815 -114.55502 206
              "San Francisco" 18816  -53.08647 206
              "San Francisco" 18817  114.55502 206
              "San Francisco" 18818   61.46855 206
              "San Francisco" 18819    8.38208 206
              "San Francisco" 18820          0 206
              end
              format %td date
              format %tq yq
              For example:

              In data set 1:

              10026 "New York" 205 6063350

              and

              10051 "New York" 205 4396011

              have same metro and yq and both of them need to merge with

              "New York" 18800 64.26257 205
              "New York" 18801 -64.26257 205
              "New York" 18802 -64.26257 205
              "New York" 18803 192.7877 205
              "New York" 18804 0 205
              "New York" 18805 -192.7877 205
              "New York" 18806 0 205
              "New York" 18807 -192.7877 205
              "New York" 18808 -192.7877 205

              in data set 2.



              Thanks!
              Last edited by Thomas Holst; 19 Dec 2019, 17:56.

              Comment


              • #8
                Got it. So this is a -joinby-, not a -merge-.

                Code:
                use dataset1, clear
                joinby metro yq using dataset2, unmatched(master)
                I specified the -unmatched(master)- option on the (totally speculative on my part) assumption that if there are some metro-yq pairs in dataset 1 that find no match in dataset2 that you want to keep them, but you do not care to retain such unmatched observations from dataset2. If that's not right, change the -unmatched()- option to suit your needs. See -help joinby-.

                Comment


                • #9
                  It worked! Thanks!

                  Comment

                  Working...
                  X