Hi all,
I have two very large DB. DB1 has 30181537 observations and looks like the following:
DB2, consists of 2434786 observations and looks like this:
In DB1, I have more observations since more docdb_family_id can cite another docdb_family_id (see cited_docdb_family_id). DB2 instead consists of unique docdb_family_id. Now, what I would like to do is simply to merge m:1 DB1 with DB2, however this operation is quite expensive computationally. I have also tried to reclink the two but the computational scalability seems huge. I wonder if stata offers some parallel tool to cope with my situation or if I am missing some other solution to the problem.
Thank you
I have two very large DB. DB1 has 30181537 observations and looks like the following:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input long(docdb_family_id cited_docdb_family_id) int distance long id_mas 14655790 184 9999 1 22435239 273 9999 2 23413025 277 9999 3 25032836 376 9999 4 25236037 796 9999 5 25389490 967 9999 6 6148394 1224 9999 7 8213391 1335 9999 8 8213486 1335 9999 9 9378178 1354 9999 10 6122783 1903 9999 11 25136035 2207 9999 12 38622262 2570 4 13 47664232 2629 3 14 9435050 2701 9999 15 24755188 3048 9999 16 23564926 3091 9999 17 10955460 3143 9999 18 25479825 3218 9999 19 6146064 3280 9999 20 4209965 3325 9999 21 23820210 3363 9999 22 22659293 4045 9999 23 7921670 4107 9999 24 6145713 4412 9999 25 6198118 4476 9999 26 17647544 6071 3 27 25347708 7261 9999 28 53479553 7307 3 29 23326895 9929 9999 30 4237610 10702 9999 31 3518621 11028 9999 32 6099169 11330 9999 33 9477906 11394 9999 34 40863376 11682 4 35 7645211 11797 9 36 69187708 12355 1 37 49167122 13043 4 38 32175886 13050 9999 39 32314360 13050 9999 40 3518621 13811 9999 41 21782807 14102 9999 42 8195431 15222 9999 43 38777234 15335 9 44 37030119 15335 9999 45 26277664 15706 9999 46 23045239 15782 9999 47 39571342 15864 5 48 35735976 16016 9999 49 6158000 16612 9999 50 11097108 16780 10 51 34835567 16780 9999 52 8193485 17091 9999 53 3883243 17278 9999 54 38103827 17362 3 55 25962891 17374 9999 56 7787500 17374 9999 57 6066778 17795 9999 58 25371852 17838 9999 59 45998613 17917 7 60 29225942 17991 9999 61 45420805 18025 5 62 8026829 18069 9999 63 32046109 18221 9999 64 24266686 18445 9999 65 11191737 18555 9999 66 11191728 18555 9999 67 27440849 19159 9999 68 25692576 19226 9999 69 4252040 19226 9999 70 4215870 19226 9999 71 19854428 19321 9999 72 9144723 19346 9999 73 6045892 19523 9999 74 26329156 19527 9999 75 9277003 20067 9999 76 9245880 20613 9999 77 26144032 20914 9999 78 4376343 21021 9999 79 11312063 21131 9999 80 9932116 21877 9999 81 32684183 21921 5 82 67810824 22078 9999 83 39434350 22219 9999 84 11049942 22698 9999 85 38331742 22827 9999 86 11162045 22875 9999 87 23135237 22956 9999 88 9205154 22982 9999 89 25380722 23001 2 90 43991091 23142 9999 91 43991094 23142 9999 92 3507359 23367 9999 93 9465611 23400 9999 94 37547534 23616 9999 95 14311934 24196 9999 96 9229318 24197 9999 97 6493957 24411 0 98 9309425 24648 9999 99 8005863 24872 9999 100 end
Code:
* Example generated by -dataex-. For more info, type help dataex clear input long cited_docdb_family_id str71 ctry_docdb float id_us 569328 "['AT' 'DE' 'WO' 'US' 'EP']" 1 574660 "['JP' 'AU' 'AT' 'DE' 'WO' 'US' 'EP']" 2 1187498 "['DE' 'RU' 'US' 'WO']" 3 1226468 "['DE' 'US' 'ES' 'EP' 'WO']" 4 1236571 "['WO' 'DE' 'CA' 'US' 'EP']" 5 1239098 "['WO' 'CA' 'DE' 'EP' 'US']" 6 1239277 "['WO' 'DE' 'US' 'EP']" 7 1239483 "['DE' 'EP' 'US' 'WO']" 8 1239622 "['DE' 'EP' 'WO' 'FI' 'US']" 9 1239624 "['EP' 'US' 'DE' 'WO']" 10 1239749 "['EP' 'JP' 'US' 'DE' 'WO']" 11 1334477 "['DE']" 12 1340405 "['US' 'EP' 'WO']" 13 1340418 "['EP' 'WO' 'US' 'GB']" 14 1340462 "['WO' 'US' 'EP']" 15 1340471 "['US' 'WO' 'EP']" 16 1340485 "['EP' 'JP' 'AU' 'US' 'CA' 'WO']" 17 1340488 "['WO' 'CA' 'EP' 'US']" 18 1340508 "['US' 'EP' 'WO' 'DE']" 19 1340519 "['EP' 'US' 'WO']" 20 1340541 "['US' 'AU' 'EP' 'WO']" 21 1340647 "['EP' 'DE' 'WO' 'CA' 'US']" 22 1340659 "['WO' 'NZ' 'FI' 'AU' 'NO' 'EP' 'BR']" 23 1340673 "['EP' 'WO' 'FI' 'US']" 24 1340826 "['WO' 'EP']" 25 1341006 "['EP' 'US' 'WO']" 26 1341170 "['US' 'EP' 'WO']" 27 1341175 "['EP' 'US' 'WO']" 28 1341188 "['JP' 'EP' 'US' 'WO']" 29 1578920 "['NO' 'US' 'CH' 'FI' 'FR' 'SE' 'DK' 'GB' 'IL' 'AT']" 30 1763337 "['WO' 'US']" 31 2185207 "['US']" 32 2267654 "['US']" 33 2410050 "['US']" 34 2456718 "['US']" 35 2463680 "['US' 'DE' 'FI' 'EP' 'WO' 'CN']" 36 2588926 "['US']" 37 2590654 "['US']" 38 2856998 "['US']" 39 3411930 "['US' 'WO' 'EP' 'CA']" 40 3459794 "['US']" 41 3459918 "['US']" 42 3460123 "['US']" 43 3460237 "['US']" 44 3460351 "['US']" 45 3460409 "['US']" 46 3460621 "['AT' 'EP' 'DE' 'CA' 'AU' 'WO' 'HK' 'US' 'AP' 'CN']" 47 3460636 "['AP']" 48 3460679 "['AP']" 49 3460785 "['AR' 'US']" 50 3460796 "['CA' 'AR' 'AU' 'BR' 'MX' 'US' 'EP']" 51 3460811 "['AR' 'MX' 'PE' 'US' 'BR' 'CO']" 52 3460812 "['AT' 'AR' 'US' 'ES' 'EP' 'DE']" 53 3460829 "['CN' 'BR' 'MX' 'US' 'AR' 'IT' 'CO']" 54 3460972 "['BR' 'US']" 55 3460979 "['US' 'JP' 'BR' 'MX']" 56 3460980 "['US']" 57 3460981 "['US']" 58 3461002 "['AU' 'DE' 'AT' 'US' 'ZA' 'ES' 'EP' 'NZ']" 59 3461010 "['BR' 'IL' 'US' 'EP']" 60 3461018 "['DE' 'ES' 'EP' 'AU' 'ZA']" 61 3461040 "['US' 'BR']" 62 3461044 "['US' 'BR']" 63 3461068 "['US']" 64 3461084 "['DE' 'JP' 'US' 'FR']" 65 3461088 "['US']" 66 3461099 "['US']" 67 3461127 "['US' 'BR']" 68 3461128 "['IT' 'BR' 'ES' 'US']" 69 3461146 "['US']" 70 3461172 "['UY' 'BR' 'EP' 'US']" 71 3461173 "['US' 'BR']" 72 3461192 "['AR' 'US']" 73 3461205 "['DE' 'AT' 'AU' 'JP' 'US' 'EP' 'CA' 'BR']" 74 3461208 "['US' 'AU' 'CN' 'BR' 'CA' 'EP']" 75 3461508 "['US']" 76 3461552 "['US']" 77 3461559 "['BR' 'US' 'DE']" 78 3461677 "['US']" 79 3461734 "['US']" 80 3462350 "['US' 'BR']" 81 3463151 "['US']" 82 3463640 "['AU' 'DE' 'FR' 'ES' 'IT' 'AR' 'US']" 83 3463716 "['BR' 'US' 'AR']" 84 3464348 "['GB' 'ES' 'DE' 'AU' 'FR' 'BR' 'AR' 'CA' 'US' 'IT']" 85 3464855 "['AR' 'BR' 'CA' 'US' 'ES' 'IT']" 86 3466402 "['AR' 'US']" 87 3466535 "['AR' 'CA' 'US']" 88 3467766 "['AR' 'US']" 89 3467767 "['AR' 'US']" 90 3467833 "['AR' 'MX' 'AU' 'US' 'ZA' 'ES']" 91 3467977 "['AR' 'DE' 'JP' 'US' 'DK' 'GB' 'FR']" 92 3467987 "['DE' 'US' 'GB' 'FR' 'AR' 'IT' 'CA']" 93 3468001 "['AR' 'US']" 94 3468888 "['BR' 'AR' 'US']" 95 3469170 "['BR' 'US' 'AR']" 96 3469171 "['AR' 'US']" 97 3469272 "['ES' 'FR' 'AR' 'US']" 98 3470490 "['US' 'MX' 'NZ' 'BR' 'ZA' 'AU' 'AR' 'CA']" 99 3471419 "['FR' 'JP' 'DE' 'IT' 'US' 'GB' 'AR']" 100 end
Thank you
Comment