Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge 2 datasets by range of dates

    Hi,

    I would like to merge 2 datasets by range of dates and with an identifier (gvkey). So far, I have used "rangejoin" (ssc install) but I do not see options to manage unmatched data in each dataset (similar to the variable _merge create when using "merge 1:1 var using ...").

    I would like to keep data in the using dataset "data3.dta" which includes the date (datadate) that I want to be within a certain range (linkdt) and (linkenddt). I do not wish to keep unmatched observations from the master dataset but I would like to browse and be able to see what has matched or not.

    Master dataset: lnk.dta:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 gvkey int(linkdt linkenddt)
    "001000"  3969  6755
    "001001"  8663  9708
    "001002"  4731  4904
    "001003"  8741 10820
    "001004"  4497     .
    "001005"  4779  8431
    "001007"  5022  6969
    "001007"  6970  9037
    "001008"  8637 10164
    "001009"  8053 13221
    "001010" -3532   760
    "001010"   761  8945
    "001011"  8480 13054
    "001012"  6605 10955
    "001013"  7014 18627
    "001015"  8064  9800
    "001016"  6543 10343
    "001017"  4731 13208
    "001018"  4930  6969
    "001018"  6970  7733
    "001019"  4731 10266
    "001020"  4731 10322
    "001021"  7592 14322
    "001022"  4930  6086
    "001022"  6087  8609
    "001023"  8195 10505
    "001024"  8602  9037
    "001025"  4779  9209
    "001026"  4731  6725
    "001027"  2222  4165
    "001028"  7972 11718
    "001029"  8566  9435
    "001030"  4779  5402
    "001031"    91  2221
    "001031"  2222  6299
    "001034"  8811 17895
    "001036"  8740 15126
    "001036"  8740 15126
    "001037"  7585 10464
    "001038"  8630 16436
    "001039"  3771  7273
    "001040" -3652   760
    "001040"   761  9435
    "001042"  8207  9310
    "001043" -3289   760
    "001043"   761  8152
    "001043"  8896 12192
    "001043" 12393 14636
    "001044"  2952  3833
    "001045" -3652   760
    "001045"   761 18996
    "001045" 19701     .
    "001046"  8897  9282
    "001047"  7894  8551
    "001049"  5873  8181
    "001050"  7637     .
    "001051"  1958  3287
    "001052"  4839  6969
    "001052"  6970  8490
    "001054"  7944 12691
    "001055"  9120 13755
    "001056"  6390 17409
    "001057"  1461  2586
    "001057"  2587  2919
    "001058"   913  9799
    "001059"  4731  7760
    "001061"  4839  6969
    "001061"  6970  7609
    "001062"  1855     .
    "001065"  8494 11522
    "001066"  8796  9834
    "001067"   913  2221
    "001067"  2222  9127
    "001069"  8691  9055
    "001070"  2922  2951
    "001070"  2952  7790
    "001072"  4749  4836
    "001072"  4837 10975
    "001072" 13010     .
    "001073"  7738 13389
    "001074"  3625  8400
    "001075"   761     .
    "001076"  8343 12053
    "001076" 11995 12053
    "001076" 12054 18596
    "001076" 12054 18596
    "001076" 18597 18606
    "001076" 18597     .
    "001077"  4930  6969
    "001077"  6970  7578
    "001078" -3652   760
    "001078"   761     .
    "001079"   366  3317
    "001079"  3318  6117
    "001080"  1004  1459
    "001080"  1460  5964
    "001081" 10058 17465
    "001081" 10058 17465
    "001082"  8215 18870
    "001083"  6573  7335
    end
    format %td linkdt
    format %td linkenddt
    Using dataset: data3.dta:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 gvkey double datadate
    "001000"  5843
    "001000"  6209
    "001000"  6574
    "001001"  8765
    "001001"  9131
    "001001"  9496
    "001003"  8765
    "001003"  9131
    "001003"  9527
    "001003"  9892
    "001003" 10257
    "001003" 10623
    "001003" 10988
    "001004"  5629
    "001004"  5995
    "001004"  6360
    "001004"  6725
    "001004"  7090
    "001004"  7456
    "001004"  7821
    "001004"  8186
    "001004"  8551
    "001004"  8917
    "001004"  9282
    "001004"  9647
    "001004" 10012
    "001004" 10378
    "001004" 10743
    "001004" 11108
    "001004" 11473
    "001004" 11839
    "001004" 12204
    "001004" 12569
    "001004" 12934
    "001004" 13300
    "001004" 13665
    "001004" 14030
    "001004" 14395
    "001004" 14761
    "001004" 15126
    "001004" 15491
    "001004" 15856
    "001004" 16222
    "001004" 16587
    "001004" 16952
    "001004" 17317
    "001004" 17683
    "001004" 18048
    "001004" 18413
    "001004" 18778
    "001004" 19144
    "001004" 19509
    "001004" 19874
    "001004" 20239
    "001004" 20605
    "001004" 20970
    "001004" 21335
    "001004" 21700
    "001004" 22066
    "001005"  5782
    "001005"  6148
    "001005"  6513
    "001005"  6878
    "001005"  7243
    "001005"  7609
    "001005"  7974
    "001006"  7851
    "001006"  8216
    "001007"  7212
    "001007"  7578
    "001007"  7943
    "001007"  8308
    "001007"  8673
    "001007"  9039
    "001008"  8917
    "001008"  9282
    "001008"  9647
    "001009"  8339
    "001009"  8704
    "001009"  9070
    "001009"  9435
    "001009"  9800
    "001009" 10165
    "001009" 10531
    "001009" 10896
    "001009" 11261
    "001009" 11626
    "001009" 11992
    "001009" 12357
    "001009" 12722
    "001010"  5843
    "001010"  6209
    "001010"  6574
    "001010"  6939
    "001010"  7304
    "001010"  7670
    "001010"  8035
    "001010"  8400
    "001010"  8765
    "001010"  9131
    end
    format %td datadate
    Is there something more adapted than rangejoin to do this? Thanks.




  • #2
    I am still stuck with this issue. This type of merge by range of dates seem to be very prevalent in SAS. There must be an equivalent in Stata. I was wondering if I could replace "rangejoin" with a combination of "joinby", "append" and "merge".

    I can provide further information if necessary. Please advise. Thanks.

    Comment


    • #3
      joinby has options to identify what is in the master, using or both datasets. As far as identifying a match, create a variable with your condition.

      Comment


      • #4
        I have tried jointby but I'm dealing with 2 large datasets and it takes a very long time to process and create a gigantic datasets. Is there a way to avoid jointby?

        Comment


        • #5
          You could reshape wide the second dataset and make it a reshape and merge problem. For both of these, there are community contributed commands that make the process efficient for large datasets. Below, I use fastreshape and fmerge, both from SSC. fmerge is part of ftools by Sergio Correia.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str6 gvkey double datadate
          "001000"  5843
          "001000"  6209
          "001000"  6574
          "001001"  8765
          "001001"  9131
          "001001"  9496
          "001003"  8765
          "001003"  9131
          "001003"  9527
          "001003"  9892
          "001003" 10257
          "001003" 10623
          "001003" 10988
          "001004"  5629
          "001004"  5995
          "001004"  6360
          "001004"  6725
          "001004"  7090
          "001004"  7456
          "001004"  7821
          "001004"  8186
          "001004"  8551
          "001004"  8917
          "001004"  9282
          "001004"  9647
          "001004" 10012
          "001004" 10378
          "001004" 10743
          "001004" 11108
          "001004" 11473
          "001004" 11839
          "001004" 12204
          "001004" 12569
          "001004" 12934
          "001004" 13300
          "001004" 13665
          "001004" 14030
          "001004" 14395
          "001004" 14761
          "001004" 15126
          "001004" 15491
          "001004" 15856
          "001004" 16222
          "001004" 16587
          "001004" 16952
          "001004" 17317
          "001004" 17683
          "001004" 18048
          "001004" 18413
          "001004" 18778
          "001004" 19144
          "001004" 19509
          "001004" 19874
          "001004" 20239
          "001004" 20605
          "001004" 20970
          "001004" 21335
          "001004" 21700
          "001004" 22066
          "001005"  5782
          "001005"  6148
          "001005"  6513
          "001005"  6878
          "001005"  7243
          "001005"  7609
          "001005"  7974
          "001006"  7851
          "001006"  8216
          "001007"  7212
          "001007"  7578
          "001007"  7943
          "001007"  8308
          "001007"  8673
          "001007"  9039
          "001008"  8917
          "001008"  9282
          "001008"  9647
          "001009"  8339
          "001009"  8704
          "001009"  9070
          "001009"  9435
          "001009"  9800
          "001009" 10165
          "001009" 10531
          "001009" 10896
          "001009" 11261
          "001009" 11626
          "001009" 11992
          "001009" 12357
          "001009" 12722
          "001010"  5843
          "001010"  6209
          "001010"  6574
          "001010"  6939
          "001010"  7304
          "001010"  7670
          "001010"  8035
          "001010"  8400
          "001010"  8765
          "001010"  9131
          end
          format %td datadate
          
          bys gvkey (datadate): gen which=_n
          *ssc install fastreshape
          fastreshape wide datadate, i(gvkey) j(which)
          tempfile ddate
          save `ddate'
          
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str6 gvkey int(linkdt linkenddt)
          "001000"  3969  6755
          "001001"  8663  9708
          "001002"  4731  4904
          "001003"  8741 10820
          "001004"  4497     .
          "001005"  4779  8431
          "001007"  5022  6969
          "001007"  6970  9037
          "001008"  8637 10164
          "001009"  8053 13221
          "001010" -3532   760
          "001010"   761  8945
          "001011"  8480 13054
          "001012"  6605 10955
          "001013"  7014 18627
          "001015"  8064  9800
          "001016"  6543 10343
          "001017"  4731 13208
          "001018"  4930  6969
          "001018"  6970  7733
          "001019"  4731 10266
          "001020"  4731 10322
          "001021"  7592 14322
          "001022"  4930  6086
          "001022"  6087  8609
          "001023"  8195 10505
          "001024"  8602  9037
          "001025"  4779  9209
          "001026"  4731  6725
          "001027"  2222  4165
          "001028"  7972 11718
          "001029"  8566  9435
          "001030"  4779  5402
          "001031"    91  2221
          "001031"  2222  6299
          "001034"  8811 17895
          "001036"  8740 15126
          "001036"  8740 15126
          "001037"  7585 10464
          "001038"  8630 16436
          "001039"  3771  7273
          "001040" -3652   760
          "001040"   761  9435
          "001042"  8207  9310
          "001043" -3289   760
          "001043"   761  8152
          "001043"  8896 12192
          "001043" 12393 14636
          "001044"  2952  3833
          "001045" -3652   760
          "001045"   761 18996
          "001045" 19701     .
          "001046"  8897  9282
          "001047"  7894  8551
          "001049"  5873  8181
          "001050"  7637     .
          "001051"  1958  3287
          "001052"  4839  6969
          "001052"  6970  8490
          "001054"  7944 12691
          "001055"  9120 13755
          "001056"  6390 17409
          "001057"  1461  2586
          "001057"  2587  2919
          "001058"   913  9799
          "001059"  4731  7760
          "001061"  4839  6969
          "001061"  6970  7609
          "001062"  1855     .
          "001065"  8494 11522
          "001066"  8796  9834
          "001067"   913  2221
          "001067"  2222  9127
          "001069"  8691  9055
          "001070"  2922  2951
          "001070"  2952  7790
          "001072"  4749  4836
          "001072"  4837 10975
          "001072" 13010     .
          "001073"  7738 13389
          "001074"  3625  8400
          "001075"   761     .
          "001076"  8343 12053
          "001076" 11995 12053
          "001076" 12054 18596
          "001076" 12054 18596
          "001076" 18597 18606
          "001076" 18597     .
          "001077"  4930  6969
          "001077"  6970  7578
          "001078" -3652   760
          "001078"   761     .
          "001079"   366  3317
          "001079"  3318  6117
          "001080"  1004  1459
          "001080"  1460  5964
          "001081" 10058 17465
          "001081" 10058 17465
          "001082"  8215 18870
          "001083"  6573  7335
          end
          format %td linkdt
          format %td linkenddt
          *ssc install ftools
          fmerge m:1 gvkey using `ddate'
          gen id=_n
          fastreshape long datadate, i(id) j(which)
          drop if missing(datadate)
          gen match=inrange(datadate, linkdt, linkenddt) & !missing(linkdt) & !missing(linkenddt)
          Res.:

          Code:
          . l, sepby(gvkey)
          
               +-----------------------------------------------------------------------------------+
               |  id   which    gvkey      linkdt   linkenddt    datadate           _merge   match |
               |-----------------------------------------------------------------------------------|
            1. |   1       1   001000   13nov1970   30jun1978   31dec1975      matched (3)       1 |
            2. |   1       2   001000   13nov1970   30jun1978   31dec1976      matched (3)       1 |
            3. |   1       3   001000   13nov1970   30jun1978   31dec1977      matched (3)       1 |
               |-----------------------------------------------------------------------------------|
            4. |   2       1   001001   20sep1983   31jul1986   31dec1983      matched (3)       1 |
            5. |   2       2   001001   20sep1983   31jul1986   31dec1984      matched (3)       1 |
            6. |   2       3   001001   20sep1983   31jul1986   31dec1985      matched (3)       1 |
               |-----------------------------------------------------------------------------------|
            7. |   4       1   001003   07dec1983   16aug1989   31dec1983      matched (3)       1 |
            8. |   4       2   001003   07dec1983   16aug1989   31dec1984      matched (3)       1 |
            9. |   4       3   001003   07dec1983   16aug1989   31jan1986      matched (3)       1 |
           10. |   4       4   001003   07dec1983   16aug1989   31jan1987      matched (3)       1 |
           11. |   4       5   001003   07dec1983   16aug1989   31jan1988      matched (3)       1 |
           12. |   4       6   001003   07dec1983   16aug1989   31jan1989      matched (3)       1 |
           13. |   4       7   001003   07dec1983   16aug1989   31jan1990      matched (3)       0 |
               |-----------------------------------------------------------------------------------|
           14. |   5       1   001004   24apr1972           .   31may1975      matched (3)       0 |
           15. |   5       2   001004   24apr1972           .   31may1976      matched (3)       0 |
           16. |   5       3   001004   24apr1972           .   31may1977      matched (3)       0 |
           17. |   5       4   001004   24apr1972           .   31may1978      matched (3)       0 |
           18. |   5       5   001004   24apr1972           .   31may1979      matched (3)       0 |
           19. |   5       6   001004   24apr1972           .   31may1980      matched (3)       0 |
           20. |   5       7   001004   24apr1972           .   31may1981      matched (3)       0 |
           21. |   5       8   001004   24apr1972           .   31may1982      matched (3)       0 |
           22. |   5       9   001004   24apr1972           .   31may1983      matched (3)       0 |
           23. |   5      10   001004   24apr1972           .   31may1984      matched (3)       0 |
           24. |   5      11   001004   24apr1972           .   31may1985      matched (3)       0 |
           25. |   5      12   001004   24apr1972           .   31may1986      matched (3)       0 |
           26. |   5      13   001004   24apr1972           .   31may1987      matched (3)       0 |
           27. |   5      14   001004   24apr1972           .   31may1988      matched (3)       0 |
           28. |   5      15   001004   24apr1972           .   31may1989      matched (3)       0 |
           29. |   5      16   001004   24apr1972           .   31may1990      matched (3)       0 |
           30. |   5      17   001004   24apr1972           .   31may1991      matched (3)       0 |
           31. |   5      18   001004   24apr1972           .   31may1992      matched (3)       0 |
           32. |   5      19   001004   24apr1972           .   31may1993      matched (3)       0 |
           33. |   5      20   001004   24apr1972           .   31may1994      matched (3)       0 |
           34. |   5      21   001004   24apr1972           .   31may1995      matched (3)       0 |
           35. |   5      22   001004   24apr1972           .   31may1996      matched (3)       0 |
           36. |   5      23   001004   24apr1972           .   31may1997      matched (3)       0 |
           37. |   5      24   001004   24apr1972           .   31may1998      matched (3)       0 |
           38. |   5      25   001004   24apr1972           .   31may1999      matched (3)       0 |
           39. |   5      26   001004   24apr1972           .   31may2000      matched (3)       0 |
           40. |   5      27   001004   24apr1972           .   31may2001      matched (3)       0 |
           41. |   5      28   001004   24apr1972           .   31may2002      matched (3)       0 |
           42. |   5      29   001004   24apr1972           .   31may2003      matched (3)       0 |
           43. |   5      30   001004   24apr1972           .   31may2004      matched (3)       0 |
           44. |   5      31   001004   24apr1972           .   31may2005      matched (3)       0 |
           45. |   5      32   001004   24apr1972           .   31may2006      matched (3)       0 |
           46. |   5      33   001004   24apr1972           .   31may2007      matched (3)       0 |
           47. |   5      34   001004   24apr1972           .   31may2008      matched (3)       0 |
           48. |   5      35   001004   24apr1972           .   31may2009      matched (3)       0 |
           49. |   5      36   001004   24apr1972           .   31may2010      matched (3)       0 |
           50. |   5      37   001004   24apr1972           .   31may2011      matched (3)       0 |
           51. |   5      38   001004   24apr1972           .   31may2012      matched (3)       0 |
           52. |   5      39   001004   24apr1972           .   31may2013      matched (3)       0 |
           53. |   5      40   001004   24apr1972           .   31may2014      matched (3)       0 |
           54. |   5      41   001004   24apr1972           .   31may2015      matched (3)       0 |
           55. |   5      42   001004   24apr1972           .   31may2016      matched (3)       0 |
           56. |   5      43   001004   24apr1972           .   31may2017      matched (3)       0 |
           57. |   5      44   001004   24apr1972           .   31may2018      matched (3)       0 |
           58. |   5      45   001004   24apr1972           .   31may2019      matched (3)       0 |
           59. |   5      46   001004   24apr1972           .   31may2020      matched (3)       0 |
               |-----------------------------------------------------------------------------------|
           60. |   6       1   001005   31jan1973   31jan1983   31oct1975      matched (3)       1 |
           61. |   6       2   001005   31jan1973   31jan1983   31oct1976      matched (3)       1 |
           62. |   6       3   001005   31jan1973   31jan1983   31oct1977      matched (3)       1 |
           63. |   6       4   001005   31jan1973   31jan1983   31oct1978      matched (3)       1 |
           64. |   6       5   001005   31jan1973   31jan1983   31oct1979      matched (3)       1 |
           65. |   6       6   001005   31jan1973   31jan1983   31oct1980      matched (3)       1 |
           66. |   6       7   001005   31jan1973   31jan1983   31oct1981      matched (3)       1 |
               |-----------------------------------------------------------------------------------|
           67. |   7       1   001007   01oct1973   30jan1979   30sep1979      matched (3)       0 |
           68. |   7       2   001007   01oct1973   30jan1979   30sep1980      matched (3)       0 |
           69. |   7       3   001007   01oct1973   30jan1979   30sep1981      matched (3)       0 |
           70. |   7       4   001007   01oct1973   30jan1979   30sep1982      matched (3)       0 |
           71. |   7       5   001007   01oct1973   30jan1979   30sep1983      matched (3)       0 |
           72. |   7       6   001007   01oct1973   30jan1979   30sep1984      matched (3)       0 |
           73. |   8       1   001007   31jan1979   28sep1984   30sep1979      matched (3)       1 |
           74. |   8       2   001007   31jan1979   28sep1984   30sep1980      matched (3)       1 |
           75. |   8       3   001007   31jan1979   28sep1984   30sep1981      matched (3)       1 |
           76. |   8       4   001007   31jan1979   28sep1984   30sep1982      matched (3)       1 |
           77. |   8       5   001007   31jan1979   28sep1984   30sep1983      matched (3)       1 |
           78. |   8       6   001007   31jan1979   28sep1984   30sep1984      matched (3)       0 |
               |-----------------------------------------------------------------------------------|
           79. |   9       1   001008   25aug1983   30oct1987   31may1984      matched (3)       1 |
           80. |   9       2   001008   25aug1983   30oct1987   31may1985      matched (3)       1 |
           81. |   9       3   001008   25aug1983   30oct1987   31may1986      matched (3)       1 |
               |-----------------------------------------------------------------------------------|
           82. |  10       1   001009   18jan1982   13mar1996   31oct1982      matched (3)       1 |
           83. |  10       2   001009   18jan1982   13mar1996   31oct1983      matched (3)       1 |
           84. |  10       3   001009   18jan1982   13mar1996   31oct1984      matched (3)       1 |
           85. |  10       4   001009   18jan1982   13mar1996   31oct1985      matched (3)       1 |
           86. |  10       5   001009   18jan1982   13mar1996   31oct1986      matched (3)       1 |
           87. |  10       6   001009   18jan1982   13mar1996   31oct1987      matched (3)       1 |
           88. |  10       7   001009   18jan1982   13mar1996   31oct1988      matched (3)       1 |
           89. |  10       8   001009   18jan1982   13mar1996   31oct1989      matched (3)       1 |
           90. |  10       9   001009   18jan1982   13mar1996   31oct1990      matched (3)       1 |
           91. |  10      10   001009   18jan1982   13mar1996   31oct1991      matched (3)       1 |
           92. |  10      11   001009   18jan1982   13mar1996   31oct1992      matched (3)       1 |
           93. |  10      12   001009   18jan1982   13mar1996   31oct1993      matched (3)       1 |
           94. |  10      13   001009   18jan1982   13mar1996   31oct1994      matched (3)       1 |
               |-----------------------------------------------------------------------------------|
           95. |  11       1   001010   01may1950   30jan1962   31dec1975      matched (3)       0 |
           96. |  11       2   001010   01may1950   30jan1962   31dec1976      matched (3)       0 |
           97. |  11       3   001010   01may1950   30jan1962   31dec1977      matched (3)       0 |
           98. |  11       4   001010   01may1950   30jan1962   31dec1978      matched (3)       0 |
           99. |  11       5   001010   01may1950   30jan1962   31dec1979      matched (3)       0 |
          100. |  11       6   001010   01may1950   30jan1962   31dec1980      matched (3)       0 |
          101. |  11       7   001010   01may1950   30jan1962   31dec1981      matched (3)       0 |
          102. |  11       8   001010   01may1950   30jan1962   31dec1982      matched (3)       0 |
          103. |  11       9   001010   01may1950   30jan1962   31dec1983      matched (3)       0 |
          104. |  11      10   001010   01may1950   30jan1962   31dec1984      matched (3)       0 |
          105. |  12       1   001010   31jan1962   28jun1984   31dec1975      matched (3)       1 |
          106. |  12       2   001010   31jan1962   28jun1984   31dec1976      matched (3)       1 |
          107. |  12       3   001010   31jan1962   28jun1984   31dec1977      matched (3)       1 |
          108. |  12       4   001010   31jan1962   28jun1984   31dec1978      matched (3)       1 |
          109. |  12       5   001010   31jan1962   28jun1984   31dec1979      matched (3)       1 |
          110. |  12       6   001010   31jan1962   28jun1984   31dec1980      matched (3)       1 |
          111. |  12       7   001010   31jan1962   28jun1984   31dec1981      matched (3)       1 |
          112. |  12       8   001010   31jan1962   28jun1984   31dec1982      matched (3)       1 |
          113. |  12       9   001010   31jan1962   28jun1984   31dec1983      matched (3)       1 |
          114. |  12      10   001010   31jan1962   28jun1984   31dec1984      matched (3)       0 |
               |-----------------------------------------------------------------------------------|
          115. | 101       1   001006           .           .   30jun1981   using only (2)       0 |
          116. | 101       2   001006           .           .   30jun1982   using only (2)       0 |
               +-----------------------------------------------------------------------------------+
          
          .

          Comment

          Working...
          X