Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging two panel data sets with different variables and samples that do not perfectly match

    Hello everyone,

    I have two panel data sets that I want to merge. Both have observations which are uniquely identified by the variables year and id. However, both datasets contain different variables are do not contain perfectly equal samples.

    I have posted examples of my datasets below. What type of merge (e.g. 1:1, 1:m, etc.) would be most suitable? And why?

    Thank you.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id float year str8 cusip str4 ticker float(lag_IO_USD lag_IO lag_IO_PROP lag_HHI)
    2091 2012 "40048D10" "GPFO"      86.5  7.501876e-09  7.917063e-10         1
     465 2006 "15234Q20" "CAIF"   94261.2  2.559048e-08  .00027465922  .9660543
     465 2007 "15234Q20" "CAIF"  196631.2   3.63018e-08  .00007682755  .8612706
    2690 2010 "45069P10" "ITVP"    1436.5  4.371304e-08 .000012754213         1
    4700 2012 "78658410" "SAFR"       604  4.889976e-08  5.528215e-09         1
    4335 2013 "69367X10" "PTAI"   2137.92 6.7193675e-08  5.287189e-06         1
    1988 2010 "37441W10" "GFSZ" 2209.2002  7.446808e-08  .00002032706         1
    1988 2009 "37441W10" "GFSZ"   1636.14  7.883523e-08  .00001938505         1
    1612 2002 "V3267N10" "EXM"        3.5  8.333333e-08 1.7598925e-09         1
    2097 2009 "40048520" "GPOV"      1054  8.598452e-08  .00003277862         1
    1988 2011 "37441W10" "GFSZ"    2564.9  9.213324e-08 2.3353177e-08         1
    4644 2012 "46629410" "RSHY"    887.55 1.0330138e-07  8.119878e-09  .5153576
    2163 2012 "39138C10" "GWO/"      2001 1.0526315e-07   1.83145e-08         1
     441 2012 "08861Q10" "BZQI"     556.8 1.1049724e-07  5.096209e-09         1
    1817 2012 "30256310" "FPIC"     42.01      1.25e-07  7.163122e-10         1
    3212 2007 "98157D30" "MCWE"       .32 1.3445378e-07  6.493758e-09         1
    5376 2003 "88244310" "TGN"      104.5     1.375e-07 1.8053064e-06         1
    1140 2010 "23282C40" "CYRB"  16584.31 1.3981042e-07  5.198925e-06         1
    1140 2011 "23282C40" "CYRB"     15180 1.4184397e-07  1.382125e-07         1
     228 2011 "00753G10" "AVIF"      1415 1.6835017e-07 1.2883445e-08         1
    1955 2010 "36159210" "GEAG"     691.3 1.6847827e-07  6.360716e-06         1
    3172 2012 "59071710" "MBLT"       352  1.724138e-07  3.221741e-09         1
    4200 2011 "74053610" "PMOI"       603  1.724138e-07   5.49026e-09         1
    4725 2013 "78404D10" "SBFF"    545.75  1.957672e-07 3.5996813e-06         1
    1566 2001 "29358R10" "ESGR"     15.06      2.00e-07  7.113488e-10         1
    5958 2005 "96945510" "WCGR"         0  2.016129e-07             0         1
    5958 2007 "96945510" "WCGR"         0  2.016129e-07             0         1
    4725 2010 "78404D10" "SBFF"    770.64  2.378049e-07  7.090731e-06         1
    6038 2013 "98088R50" "WLWH"   1537.92      2.40e-07  .00001014388         1
    4549 2000 "76564130" "RICK"      2.94      2.50e-07 2.0426907e-09         1
    5463 2003 "88027710" "TNAB"      8000 2.5706942e-07  .00009996756         1
    5265 2013 "79584220" "SZGP"    812.76  2.883549e-07   4.68739e-06         1
    2097 2011 "40048520" "GPOV"    5075.8   3.56212e-07  4.621469e-08         1
    3279 2010 "56490510" "MFI/"    489.72  3.826087e-07 1.2393329e-07         1
    3279 2009 "56490510" "MFI/"    394.68 4.1121496e-07 1.1641372e-07         1
    3279 2008 "56490510" "MFI/"     662.2 4.1121496e-07 1.1650585e-07         1
    5463 2002 "88027710" "TNAB"     14638  4.181409e-07  .00016215292         1
    3212 2009 "98157D30" "MCWE"      1.02  4.285714e-07 1.0427176e-08  .8892734
    4802 2011 "81748L10" "SECC"    1896.4  4.462475e-07 1.7266547e-08         1
     794 2011 "18047P10" "CLZN"      1995  4.524887e-07 1.8164293e-08         1
    3307 2012 "59146510" "MHTL"    1479.6  4.651163e-07 1.3542296e-08         1
    5754 2007 "91730220" "USNZ"   2036.34  4.778761e-07 .000012562863         1
    4320 2013 "69367J10" "PSGT"    4654.5  4.882155e-07  8.371868e-06  .7330797
    5463 1995 "88027710" "TNAB"      6162  5.131579e-07   .0000411108         1
    5463 2001 "88027710" "TNAB"     19536  5.146349e-07  5.688995e-07  .6953125
    3999 2011 "69606A10" "PALA"    1926.6  5.205479e-07 .000016224065         1
    5264 2010 "86469110" "SZEV"   2996.63 5.2857143e-07  7.686928e-06  .3530508
    4360 2012 "74343W10" "PUOD"     790.5  5.555556e-07  7.235189e-09         1
    4543 2013 "74955W30" "RHDC"         0  5.797102e-07             0         1
    4661 2012 "76012510" "RTOK"     997.5  5.801105e-07  9.129792e-09         1
    4053 2002 "72365510" "PDC"         40      6.25e-07 1.8756312e-09         1
    1879 2000 "31807140" "FTT"        465  6.329114e-07  4.649331e-06         1
    6031 2011 "97143310" "WLMI" 18454.799    6.5625e-07 1.6802925e-07         1
    1879 2001 "31807140" "FTT"        423  6.578948e-07 3.6879785e-06         1
    1879 2002 "31807140" "FTT"      626.5  6.666667e-07  6.793011e-06         1
     267 2009 "21987A20" "BCA"    3044720  6.709121e-07  7.625424e-06 .29342243
    1066 2011 "21077120" "CTTA"     10934      7.00e-07  9.955306e-08         1
    4443 2011 "75279Q10" "RANJ"    6326.4  7.058824e-07  5.760129e-08         1
    6138 2007 "70590460" "XAN"      16.88  7.272727e-07 1.0670233e-10         1
    2767 2011 "47649310" "JRON"    7141.5   7.32484e-07 6.5022704e-08         1
    3999 2013 "69606A10" "PALA"    698.88  7.619047e-07  4.609703e-06         1
    2767 2012 "47649310" "JRON"      7920  7.643312e-07  7.248918e-08         1
    5204 2012 "86959C10" "SVNL"   12758.5  7.761438e-07  1.167744e-07         1
    4146 2013 "45577X10" "PIFM"   4160.69  7.784091e-07 .000010289606         1
    4146 2012 "45577X10" "PIFM"    3479.8  7.784091e-07 .000011057588         1
    1982 2011 "40231210" "GFKS"    1611.3  7.894737e-07 1.4670738e-08         1
    5963 2013 "98157D10" "WCOE"        24  8.078088e-07  1.050122e-07    .53125
    2720 2011 "46611010" "JBSA"    8930.1   8.16812e-07  8.130774e-08         1
    2014 2012 "37251T10" "GIGN"     11931   8.40164e-07 1.0920056e-07         1
    1988 2012 "37441W10" "GFSZ"      5064  8.510638e-07  4.634914e-08         1
    6031 2012 "97143310" "WLMI"   22318.4    9.0625e-07 2.0427305e-07         1
     228 2012 "00753G10" "AVIF"   12025.2  9.192735e-07 1.0925412e-07  .3421061
      93 2010 "02660R10" "AHMI"         2   9.25926e-07  5.061394e-10         1
     267 2010 "21987A20" "BCA"    8695308  9.269607e-07  7.117299e-06  .2421576
    5292 2012 "89236010" "TBLE"  9811.141  9.280245e-07  8.939402e-08  .8215861
    2731 2003 "47214610" "JDWP"      2680  9.302325e-07  .00001312637         1
    2058 2012 "36143A10" "GMHL"      1974  9.343937e-07 1.8067377e-08         1
    3212 2013 "98157D30" "MCWE"      2.24  9.411765e-07  1.633016e-09   .502551
    5933 2013 "93932210" "WAMU"     33.38  9.788856e-07  2.325995e-08  .5315127
    1982 2012 "40231210" "GFKS"   2554.63      1.00e-06 2.3381695e-08         1
    1803 2011 "34959F10" "FOJC"   28589.7 1.0603331e-06  2.603066e-07         1
    1803 2012 "34959F10" "FOJC"  20049.22 1.0695633e-06 1.8350397e-07         1
    5292 2011 "89236010" "TBLE"     11235 1.0719755e-06 1.0229363e-07         1
    4024 2012 "69367Y10" "PBMR"    5307.5 1.0784314e-06  4.857782e-08         1
     267 2008 "21987A20" "BCA"    8839121 1.0790714e-06  9.483426e-06 .55533427
    5265 2012 "79584220" "SZGP"   3060.24 1.1497227e-06  2.800938e-08         1
    3212 2008 "98157D30" "MCWE"      2.74 1.1512604e-06 3.8968686e-09  .3346475
    2772 2010 "48122U20" "JSFC"    150616 1.1647668e-06 .000010421089         1
    1161 2013 "24736110" "DALR"       4.6 1.1675127e-06 1.6185604e-08  .8130813
     742 2013 "12558110" "CITG"  18315.36 1.1703704e-06  .00006367048         1
     579 2010 "17273710" "CCTY"         2  1.190476e-06 3.2118794e-09         1
    1933 2009 "38991410" "GC/"        292 1.2195122e-06  2.422477e-06         1
    2421 2012 "44915J10" "HYPM"      3688 1.2779552e-06  3.375506e-08         1
    3286 2010 "59410T10" "MGDD"   2891.44 1.2789116e-06  .00002660441         1
    3212 2011 "98157D30" "MCWE"      3.06 1.2857142e-06 2.2003188e-09  .3217993
    1511 2011 "29248L10" "ENGG"    6206.2 1.2997904e-06  5.650688e-08         1
    3889 2012 "67935P10" "OLMI"      5248 1.3114754e-06  4.803323e-08         1
    1839 2012 "31543710" "FRRV"   11741.4 1.3487738e-06  1.074652e-07         1
    3947 2011 "68633110" "ORKL"     13356  1.372549e-06 1.2160515e-07         1
    2223 2012 "42281P20" "HDEL"   11185.3 1.4179104e-06  1.023754e-07         1
    end
    format %ty year
    label values id id
    label def id 93 "AHMI", modify
    label def id 228 "AVIF", modify
    label def id 267 "BCA", modify
    label def id 441 "BZQI", modify
    label def id 465 "CAIF", modify
    label def id 579 "CCTY", modify
    label def id 742 "CITG", modify
    label def id 794 "CLZN", modify
    label def id 1066 "CTTA", modify
    label def id 1140 "CYRB", modify
    label def id 1161 "DALR", modify
    label def id 1511 "ENGG", modify
    label def id 1566 "ESGR", modify
    label def id 1612 "EXM", modify
    label def id 1803 "FOJC", modify
    label def id 1817 "FPIC", modify
    label def id 1839 "FRRV", modify
    label def id 1879 "FTT", modify
    label def id 1933 "GC/", modify
    label def id 1955 "GEAG", modify
    label def id 1982 "GFKS", modify
    label def id 1988 "GFSZ", modify
    label def id 2014 "GIGN", modify
    label def id 2058 "GMHL", modify
    label def id 2091 "GPFO", modify
    label def id 2097 "GPOV", modify
    label def id 2163 "GWO/", modify
    label def id 2223 "HDEL", modify
    label def id 2421 "HYPM", modify
    label def id 2690 "ITVP", modify
    label def id 2720 "JBSA", modify
    label def id 2731 "JDWP", modify
    label def id 2767 "JRON", modify
    label def id 2772 "JSFC", modify
    label def id 3172 "MBLT", modify
    label def id 3212 "MCWE", modify
    label def id 3279 "MFI/", modify
    label def id 3286 "MGDD", modify
    label def id 3307 "MHTL", modify
    label def id 3889 "OLMI", modify
    label def id 3947 "ORKL", modify
    label def id 3999 "PALA", modify
    label def id 4024 "PBMR", modify
    label def id 4053 "PDC", modify
    label def id 4146 "PIFM", modify
    label def id 4200 "PMOI", modify
    label def id 4320 "PSGT", modify
    label def id 4335 "PTAI", modify
    label def id 4360 "PUOD", modify
    label def id 4443 "RANJ", modify
    label def id 4543 "RHDC", modify
    label def id 4549 "RICK", modify
    label def id 4644 "RSHY", modify
    label def id 4661 "RTOK", modify
    label def id 4700 "SAFR", modify
    label def id 4725 "SBFF", modify
    label def id 4802 "SECC", modify
    label def id 5204 "SVNL", modify
    label def id 5264 "SZEV", modify
    label def id 5265 "SZGP", modify
    label def id 5292 "TBLE", modify
    label def id 5376 "TGN", modify
    label def id 5463 "TNAB", modify
    label def id 5754 "USNZ", modify
    label def id 5933 "WAMU", modify
    label def id 5958 "WCGR", modify
    label def id 5963 "WCOE", modify
    label def id 6031 "WLMI", modify
    label def id 6038 "WLWH", modify
    label def id 6138 "XAN", modify

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id int year byte(CSP_t_s CSP_t_c CSP_e_s CSP_e_c CSP_s_s CSP_s_c CSP_g_s CSP_g_c)
     1 2000  1  2 0 0  1  1 0 1
     1 2001  2  3 0 0  2  2 0 1
     1 2002 11  8 0 0 10  7 1 1
     1 2003  8  9 0 0  7  8 1 1
     1 2004  9  9 0 0  8  8 1 1
     1 2005 13  7 3 0  9  7 1 0
     1 2006 14  5 4 0  9  5 1 0
     1 2007 16  6 4 0 11  6 1 0
     1 2008 13  6 3 0  9  6 1 0
     1 2009 13  6 3 0  9  6 1 0
     1 2010 11  5 4 0  6  5 1 0
     1 2011 10  3 3 0  6  3 1 0
     1 2012  4  3 0 0  4  2 0 1
     1 2013  9  2 2 0  7  2 0 0
     2 2013  6  0 1 0  5  0 0 0
     3 1991  2  2 1 2  1  0 0 0
     3 1992  3  2 1 2  1  0 1 0
     3 1993  3  3 1 3  1  0 1 0
     3 1994  5  4 1 3  3  1 1 0
     3 1995  5  5 1 3  3  1 1 1
     3 1996  6  3 1 3  4  0 1 0
     3 1997  6  5 1 3  4  1 1 1
     3 1998  4  4 2 2  2  1 0 1
     3 1999  4  3 1 2  3  1 0 0
     3 2000  4  5 1 3  3  1 0 1
     3 2001  6  5 1 2  4  2 1 1
     3 2002  7  7 1 4  5  2 1 1
     3 2003  6  7 1 4  4  2 1 1
     3 2004  5  7 1 4  3  2 1 1
     3 2005  9 10 2 4  6  5 1 1
     3 2006  9 10 2 4  6  5 1 1
     3 2007 10 11 3 4  5  6 2 1
     3 2008  9 11 3 4  5  6 1 1
     3 2009  9 11 3 4  5  6 1 1
     3 2010 17 13 5 3 10 10 2 0
     3 2011 17 13 5 3 10 10 2 0
     3 2012 16  7 3 2 11  4 2 1
     3 2013 15  5 2 1 12  3 1 1
     4 2004  0  1 0 0  0  1 0 0
     4 2005  0  2 0 0  0  2 0 0
     4 2006  1  3 0 0  0  2 1 1
     4 2007  1  2 0 0  0  1 1 1
     4 2008  1  2 0 0  0  1 1 1
     4 2009  1  2 0 0  0  1 1 1
     4 2010  0  0 0 0  0  0 0 0
     4 2011  0  1 0 0  0  0 0 1
     4 2012  0  0 0 0  0  0 0 0
     5 2004  0  2 0 0  0  2 0 0
     5 2005  1  2 0 0  0  2 1 0
     6 2013  0  0 0 0  0  0 0 0
     7 2003  0  0 0 0  0  0 0 0
     7 2004  2  1 0 0  2  1 0 0
     7 2005  2  1 0 0  2  1 0 0
     7 2006  2  1 0 0  2  1 0 0
     7 2007  2  1 0 0  2  1 0 0
     7 2008  2  1 0 0  2  1 0 0
     7 2009  2  1 0 0  2  1 0 0
     7 2010  0  1 0 0  0  1 0 0
     8 2003  0  0 0 0  0  0 0 0
     9 2013  0  0 0 0  0  0 0 0
    10 1991  0  2 0 0  0  2 0 0
    10 1992  1  2 0 0  1  2 0 0
    10 1993  1  2 0 0  1  2 0 0
    10 1994  1  2 0 0  1  1 0 1
    10 1995  0  2 0 0  0  0 0 2
    10 1996  2  1 0 0  1  0 1 1
    10 2013  3  0 1 0  2  0 0 0
    11 2013  0  0 0 0  0  0 0 0
    12 2009  1  1 0 0  0  1 1 0
    12 2012  0  0 0 0  0  0 0 0
    12 2013  1  1 0 0  1  1 0 0
    13 2010  0  0 0 0  0  0 0 0
    13 2011  0  1 0 0  0  0 0 1
    14 2003  1  0 0 0  0  0 1 0
    14 2004  1  0 0 0  0  0 1 0
    14 2005  0  0 0 0  0  0 0 0
    14 2006  0  0 0 0  0  0 0 0
    14 2007  0  0 0 0  0  0 0 0
    14 2008  0  0 0 0  0  0 0 0
    14 2009  0  0 0 0  0  0 0 0
    14 2010  0  0 0 0  0  0 0 0
    14 2011  0  1 0 0  0  0 0 1
    14 2012  0  0 0 0  0  0 0 0
    14 2013  0  0 0 0  0  0 0 0
    15 2002  1  0 0 0  0  0 1 0
    15 2003  0  0 0 0  0  0 0 0
    15 2004  0  1 0 0  0  0 0 1
    15 2005  0  0 0 0  0  0 0 0
    15 2006  0  1 0 0  0  0 0 1
    15 2007  0  3 0 0  0  2 0 1
    15 2008  0  3 0 0  0  2 0 1
    15 2009  0  3 0 0  0  2 0 1
    15 2010  1  2 0 0  1  1 0 1
    15 2011  1  2 0 0  1  1 0 1
    15 2012  0  0 0 0  0  0 0 0
    15 2013  0  0 0 0  0  0 0 0
    16 1991  5  2 0 0  5  2 0 0
    16 1992  6  2 1 0  5  2 0 0
    16 1993  6  1 1 0  5  1 0 0
    16 1994  6  3 0 0  6  3 0 0
    end
    format %ty year
    label values id id1
    label def id1 1 "A", modify
    label def id1 2 "A17U", modify
    label def id1 3 "AA", modify
    label def id1 4 "AACC", modify
    label def id1 5 "AACE", modify
    label def id1 6 "AAD", modify
    label def id1 7 "AAI", modify
    label def id1 8 "AAII", modify
    label def id1 9 "AAK", modify
    label def id1 10 "AAL", modify
    label def id1 11 "AALI", modify
    label def id1 12 "AAN", modify
    label def id1 13 "AAN/XX", modify
    label def id1 14 "AAON", modify
    label def id1 15 "AAP", modify
    label def id1 16 "AAPL", modify

  • #2
    Well, you would merge data sets such as you describe using -merge 1:1-. That is because id and year uniquely identify the observations in both data sets.

    But, there is a problem here. At least in the examples you show, not only do the two data set not have the same populations, they have no overlap at all. There isn't a single observation in either data set that has a corresponding observation (same id and year) in the other. Moreover the variables they include also do not overlap (except for id and year) so I don't see how it makes any sense to combine these data sets.

    If your real data does include observations that can match with each other, then go ahead with -merge 1:1-. After completing the -merge-, Stata will create a new variable, called _merge. When _merge == 1, the observation will be from the first data set, with no match in the second; when _merge == 2, the reverse. And when _merge == 3, the observation will represent the pairing of matched observations in each data set.

    Comment

    Working...
    X