Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging Data with mismatch in Variable Names

    I currently have two datasets, This is the first one:-

    input str6 State str3 Statepost byte Year double Expenditure long Income double(Population CPI)
    "Alaba " "AL " 80 5159.8 29199 3901264.583 82.4
    "Alaba " "AL " 81 5639.3 32198 3918531 90.9
    "Alaba " "AL " 82 5986.8 34100 3925266 96.5
    "Alaba " "AL " 83 6605.3 36588 3934102 99.6
    "Alaba " "AL " 84 6940.1 39869 3951820 103.9
    "Alaba " "AL " 85 7929.263 42913 3972523 107.6
    "Alaba " "AL " 86 8614.51 45939 3991569 109.6
    "Alaba " "AL " 87 8472.785 48753 4015264 113.6
    "Alaba " "AL " 88 9149 52720 4023844 118.3
    "Alaba " "AL " 89 9933.606 56112 4030222 124
    "Alaba " "AL " 90 10868.354 60776 4048508 130.7
    "Alaba " "AL " 91 12031.041 63458 4091025 136.2
    "Alaba " "AL " 92 12865.789 68254 4139269 140.3
    "Alaba " "AL " 93 13753.988 71620 4193114 144.5
    "Alaba " "AL " 94 14781.534 75621 4232965 148.2
    "Alaba " "AL " 95 15735.583 81578 4262731 152.4
    "Alaba " "AL " 96 16585.241 87221 4290403 156.9
    "Alaba " "AL " 97 17229.332 91283 4320281 160.5
    "Alaba " "AL " 98 18583.232 96257 4351037 163
    "Alaba " "AL " 99 20119.093 100400 4369862 166.6
    "Alask " "AK " 80 2502.7 5136 406772.30288 82.4
    "Alask " "AK " 81 2895.7 5667 418491 90.9
    "Alask " "AK " 82 3486.6 7400 449606 96.5
    "Alask " "AK " 83 4149.2 8238 488417 99.6
    "Alask " "AK " 84 4364.7 8737 513702 103.9
    "Alask " "AK " 85 4956.118 9476 532495 107.6
    "Alask " "AK " 86 5039.416 9495 544268 109.6
    "Alask " "AK " 87 5226.657 9576 539309 113.6
    "Alask " "AK " 88 5002 10006 541983 118.3
    "Alask " "AK " 89 5187.073 11407 547159 124
    "Alask " "AK " 90 5376.046 11956 553120 130.7
    "Alask " "AK " 91 5572.551 12015 569273 136.2
    "Alask " "AK " 92 5807.24 13074 587073 140.3
    "Alask " "AK " 93 6021.477 13785 596993 144.5
    "Alask " "AK " 94 6160.897 14184 600624 148.2
    "Alask " "AK " 95 6253.194 14488 601345 152.4
    "Alask " "AK " 96 6291.342 15762 604918 156.9
    "Alask " "AK " 97 6597.742 16465 608846 160.5
    "Alask " "AK " 98 6762.597 17167 615205 163
    "Alask " "AK " 99 7131.493 17700 619500 166.6
    "Arizo " "AZ " 80 4204.7 23951 2745462.5268 82.4
    "Arizo " "AZ " 81 4751.7 27256 2810107 90.9
    "Arizo " "AZ " 82 5146.3 29100 2889861 96.5
    "Arizo " "AZ " 83 5586.3 31575 2968925 99.6
    "Arizo " "AZ " 84 6350.1 36151 3067135 103.9
    "Arizo " "AZ " 85 7346.194 40775 3183538 107.6
    "Arizo " "AZ " 86 8428.842 44719 3308262 109.6
    "Arizo " "AZ " 87 9490.366 48466 3437103 113.6
    "Arizo " "AZ " 88 10398 52233 3535183 118.3
    "Arizo " "AZ " 89 11422.003 56196 3622185 124
    "Arizo " "AZ " 90 13040.036 58946 3679056 130.7
    "Arizo " "AZ " 91 12831.049 62166 3762394 136.2
    "Arizo " "AZ " 92 13448.316 67001 3867333 140.3
    "Arizo " "AZ " 93 14195.414 71317 3993390 144.5
    "Arizo " "AZ " 94 15033.333 78050 4147561 148.2
    "Arizo " "AZ " 95 15853.3 86420 4306908 152.4
    "Arizo " "AZ " 96 16949.913 95787 4432308 156.9
    "Arizo " "AZ " 97 17769.15 103704 4552207 160.5
    "Arizo " "AZ " 98 18642.285 112635 4667277 163
    "Arizo " "AZ " 99 20302.006 120300 4778332 166.6
    "Arkan " "AR " 80 2742.3 16651 2288462.7012 82.4
    "Arkan " "AR " 81 2980 18467 2293201 90.9
    "Arkan " "AR " 82 3080.9 19400 2294257 96.5
    "Arkan " "AR " 83 3193.4 20875 2305761 99.6
    "Arkan " "AR " 84 3471.2 23033 2319768 103.9
    "Arkan " "AR " 85 3998.494 24707 2327046 107.6
    "Arkan " "AR " 86 4370.985 26268 2331984 109.6
    "Arkan " "AR " 87 4553.018 27481 2342355 113.6
    "Arkan " "AR " 88 4667 29263 2342656 118.3
    "Arkan " "AR " 89 4924.335 31035 2346358 124
    "Arkan " "AR " 90 5351.802 33389 2354343 130.7
    "Arkan " "AR " 91 5787.486 34698 2370666 136.2
    "Arkan " "AR " 92 6598.587 37312 2394098 140.3
    "Arkan " "AR " 93 7208.867 38776 2423743 144.5
    "Arkan " "AR " 94 7449.233 41248 2450605 148.2
    "Arkan " "AR " 95 8019.786 44958 2480121 152.4
    "Arkan " "AR " 96 8678.978 48700 2504858 156.9
    "Arkan " "AR " 97 9412.686 51059 2524007 160.5
    "Arkan " "AR " 98 10100.995 53962 2538202 163
    "Arkan " "AR " 99 10661.692 56700 2551373 166.6
    "Calif " "CA " 80 43412.7 259551 23851641.408 82.4
    "Calif " "CA " 81 49700.1 288481 24285933 90.9
    "Calif " "CA " 82 53087.9 310700 24820009 96.5
    "Calif " "CA " 83 55827.4 333741 25360026 99.6
    "Calif " "CA " 84 60390.1 371202 25844393 103.9
    "Calif " "CA " 85 69785.581 423566 26441109 107.6
    "Calif " "CA " 86 76082.192 456099 27102237 109.6
    "Calif " "CA " 87 84581.465 492989 27777158 113.6
    "Calif " "CA " 88 91729 530968 28464249 118.3
    "Calif " "CA " 89 97576.123 579189 29218164 124
    "Calif " "CA " 90 112945.261 619381 29950111 130.7
    "Calif " "CA " 91 120865.223 633326 30414114 136.2
    "Calif " "CA " 92 130091.682 667318 30875920 140.3
    "Calif " "CA " 93 134571.297 683002 31147208 144.5
    "Calif " "CA " 94 138527.68 702568 31317179 148.2
    "Calif " "CA " 95 145424.747 760431 31493525 152.4
    "Calif " "CA " 96 151432.535 812404 31780829 156.9
    "Calif " "CA " 97 158811.032 862114 32217708 160.5
    "Calif " "CA " 98 167984.922 924253 32682794 163
    "Calif " "CA " 99 180874.803 989600 33145121 166.6

    This is the second one:-

    input byte Year str5 State double(dwnom1 dwnom2)
    80 "Alaba" -.089 .915
    80 "Alask" -.020999999999999998 -.387
    80 "Arizo" .257 .13700000000000004
    80 "Arkan" -.22649999999999998 .6055
    80 "Calif" -.07050000000000001 -.4205
    80 "Color" .05350000000000002 .0365
    80 "Conne" -.2905 -.632
    80 "Delaw" -.104 -.1525
    80 "Flori" -.0995 .515
    80 "Georg" -.0145 .9345
    80 "Hawai" -.369 .10300000000000001
    80 "Idaho" .10200000000000001 .2525
    80 "Illin" -.19999999999999998 -.52
    80 "India" -.07649999999999998 -.3345
    80 "Iowa" -.12149999999999997 -.10700000000000001
    80 "Kansa" .1925 -.53
    80 "Kentu" -.20650000000000002 .7070000000000001
    80 "Louis" -.0245 .8135
    80 "Maine" -.261 -.18766666666666665
    80 "Maryl" -.391 -.5255
    80 "Massa" -.548 -.405
    80 "Michi" -.4985 -.1245
    80 "Minne" .048 -.96
    80 "Missi" .1285 .365
    80 "Misso" -.1535 -.247
    80 "Monta" -.25 .4335
    80 "Nebra" .0045000000000000005 1.0565
    80 "Nevad" .1815 .271
    80 "NewHa" .08149999999999999 .1505
    80 "NewJe" -.45999999999999996 -.4175
    80 "NewMe" .2575 -.44500000000000006
    80 "NewYo" -.385 -.6435
    80 "NoCar" .31799999999999995 .7665
    80 "NoDak" -.07050000000000001 .3235
    80 "Ohio" -.416 -.1065
    80 "Oklah" .11299999999999999 .269
    80 "Orego" -.0465 -.977
    80 "Penns" -.0010000000000000009 -.7755
    80 "Rhode" -.271 -.6649999999999999
    80 "SoCar" .15050000000000002 .57
    80 "SoDak" -.287 .0645
    80 "Tenne" -.012499999999999997 .013499999999999956
    80 "Texas" .15549999999999997 .01949999999999999
    80 "Utah" .496 -.154
    80 "Vermo" -.255 -.492
    80 "Virgi" .30400000000000005 .5
    80 "Washi" -.29700000000000004 .1865
    80 "WeVir" -.20600000000000002 .753
    80 "Wisco" -.3695 .29200000000000004
    80 "Wyomi" .37 -.5155
    81 "Alaba" .2455 .567
    81 "Alask" .1905 -.5835
    81 "Arizo" .246 .11500000000000002
    81 "Arkan" -.247 .6154999999999999
    81 "Calif" -.07399999999999998 -.4275
    81 "Color" .05200000000000002 .0315
    81 "Conne" -.3045 -.635
    81 "Delaw" -.09650000000000002 -.159
    81 "Flori" .02099999999999999 .29400000000000004
    81 "Georg" .15849999999999997 .4945
    81 "Hawai" -.3735 .106
    81 "Idaho" .5980000000000001 .1925
    81 "Illin" -.104 -.3385
    81 "India" .379 -.7364999999999999
    81 "Iowa" .275 .1555
    81 "Kansa" .2015 -.533
    81 "Kentu" -.20350000000000001 .7284999999999999
    81 "Louis" -.031 .8195
    81 "Maine" -.16699999999999998 -.321
    81 "Maryl" -.3905 -.5295
    81 "Massa" -.5475 -.40249999999999997
    81 "Michi" -.492 -.091
    81 "Minne" .06749999999999999 -.9359999999999999
    81 "Missi" .1325 .366
    81 "Misso" -.1445 -.25
    81 "Monta" -.263 .484
    81 "Nebra" .0024999999999999953 1.088
    81 "Nevad" .17600000000000002 .262
    81 "NewHa" .416 -.314
    81 "NewJe" -.224 -.6036666666666667
    81 "NewMe" .2615 -.44400000000000006
    81 "NewYo" -.149 -.353
    81 "NoCar" .6719999999999999 .6495
    81 "NoDak" -.1445 .23650000000000002
    81 "Ohio" -.427 -.1115
    81 "Oklah" .244 .8525
    81 "Orego" -.036000000000000004 -.981
    81 "Penns" -.08349999999999999 -.8965
    81 "Rhode" -.265 -.666
    81 "SoCar" .14300000000000002 .5665
    81 "SoDak" .18 .11850000000000001
    81 "Tenne" -.018500000000000003 .0010000000000000009
    81 "Texas" .14300000000000002 -.0040000000000000036
    81 "Utah" .4905 -.152
    81 "Vermo" -.2545 -.496
    81 "Virgi" .3125 .505
    81 "Washi" -.06600000000000002 -.5065000000000001
    81 "WeVir" -.20500000000000002 .7855
    81 "Wisco" .008000000000000007 .2245
    81 "Wyomi" .393

    I then merged them on the basis of Variables "State" and "Year" and got an output that looks like this:-

    input byte Year str6 State double(dwnom1 dwnom2) str3 Statepost double Expenditure long Income double(Population CPI) byte _merge
    80 "Alaba" -.089 .915 "" . . . . 1
    81 "Alaba" .2455 .567 "" . . . . 1
    82 "Alaba" .2455 .567 "" . . . . 1
    83 "Alaba" .243 .5685 "" . . . . 1
    84 "Alaba" .243 .5685 "" . . . . 1
    85 "Alaba" .241 .5695 "" . . . . 1
    86 "Alaba" .241 .5695 "" . . . . 1
    87 "Alaba" .0155 .9904999999999999 "" . . . . 1
    88 "Alaba" .0155 .9904999999999999 "" . . . . 1
    89 "Alaba" .013000000000000001 .992 "" . . . . 1
    90 "Alaba" .013000000000000001 .992 "" . . . . 1
    91 "Alaba" .0105 .9929999999999999 "" . . . . 1
    92 "Alaba" .0105 .9929999999999999 "" . . . . 1
    93 "Alaba" .008 .9944999999999999 "" . . . . 1
    94 "Alaba" .008 .9944999999999999 "" . . . . 1
    95 "Alaba" .206 .7885 "" . . . . 1
    96 "Alaba" .206 .7885 "" . . . . 1
    97 "Alaba" .54 .38949999999999996 "" . . . . 1
    98 "Alaba" .54 .38949999999999996 "" . . . . 1
    99 "Alaba" .54 .38949999999999996 "" . . . . 1
    80 "Alask" -.020999999999999998 -.387 "" . . . . 1
    81 "Alask" .1905 -.5835 "" . . . . 1
    82 "Alask" .1905 -.5835 "" . . . . 1
    83 "Alask" .21000000000000002 -.5555 "" . . . . 1
    84 "Alask" .21000000000000002 -.5555 "" . . . . 1
    85 "Alask" .22899999999999998 -.5275 "" . . . . 1
    86 "Alask" .22899999999999998 -.5275 "" . . . . 1
    87 "Alask" .2485 -.49949999999999994 "" . . . . 1
    88 "Alask" .2485 -.49949999999999994 "" . . . . 1
    89 "Alask" .267 -.4715 "" . . . . 1
    90 "Alask" .267 -.4715 "" . . . . 1
    91 "Alask" .2865 -.4435 "" . . . . 1
    92 "Alask" .2865 -.4435 "" . . . . 1
    93 "Alask" .3055 -.4155 "" . . . . 1
    94 "Alask" .3055 -.4155 "" . . . . 1
    95 "Alask" .3245 -.3875 "" . . . . 1
    96 "Alask" .3245 -.3875 "" . . . . 1
    97 "Alask" .3435 -.35950000000000004 "" . . . . 1
    98 "Alask" .3435 -.35950000000000004 "" . . . . 1
    99 "Alask" .363 -.3315 "" . . . . 1
    80 "Arizo" .257 .13700000000000004 "" . . . . 1
    81 "Arizo" .246 .11500000000000002 "" . . . . 1
    82 "Arizo" .246 .11500000000000002 "" . . . . 1
    83 "Arizo" .235 .09350000000000003 "" . . . . 1
    84 "Arizo" .235 .09350000000000003 "" . . . . 1
    85 "Arizo" .22399999999999998 .07200000000000001 "" . . . . 1
    86 "Arizo" .22399999999999998 .07200000000000001 "" . . . . 1
    87 "Arizo" .09699999999999999 .21400000000000002 "" . . . . 1
    88 "Arizo" .09699999999999999 .21400000000000002 "" . . . . 1
    89 "Arizo" .0945 .14150000000000001 "" . . . . 1
    90 "Arizo" .0945 .14150000000000001 "" . . . . 1
    91 "Arizo" .0925 .0685 "" . . . . 1
    92 "Arizo" .0925 .0685 "" . . . . 1
    93 "Arizo" .09000000000000001 -.003999999999999948 "" . . . . 1
    94 "Arizo" .09000000000000001 -.003999999999999948 "" . . . . 1
    95 "Arizo" .5395000000000001 -.759 "" . . . . 1
    96 "Arizo" .5395000000000001 -.759 "" . . . . 1
    97 "Arizo" .546 -.8315 "" . . . . 1
    98 "Arizo" .546 -.8315 "" . . . . 1
    99 "Arizo" .5525 -.9039999999999999 "" . . . . 1
    80 "Arkan" -.22649999999999998 .6055 "" . . . . 1
    81 "Arkan" -.247 .6154999999999999 "" . . . . 1
    82 "Arkan" -.247 .6154999999999999 "" . . . . 1
    83 "Arkan" -.2675 .626 "" . . . . 1
    84 "Arkan" -.2675 .626 "" . . . . 1
    85 "Arkan" -.28800000000000003 .636 "" . . . . 1
    86 "Arkan" -.28800000000000003 .636 "" . . . . 1
    87 "Arkan" -.309 .6465000000000001 "" . . . . 1
    88 "Arkan" -.309 .6465000000000001 "" . . . . 1
    89 "Arkan" -.32899999999999996 .6565000000000001 "" . . . . 1
    90 "Arkan" -.32899999999999996 .6565000000000001 "" . . . . 1
    91 "Arkan" -.34950000000000003 .667 "" . . . . 1
    92 "Arkan" -.34950000000000003 .667 "" . . . . 1
    93 "Arkan" -.37 .677 "" . . . . 1
    94 "Arkan" -.37 .677 "" . . . . 1
    95 "Arkan" -.3905 .688 "" . . . . 1
    96 "Arkan" -.3905 .688 "" . . . . 1
    97 "Arkan" .0905 .29 "" . . . . 1
    98 "Arkan" .0905 .29 "" . . . . 1
    99 "Arkan" .17099999999999999 .053000000000000005 "" . . . . 1
    80 "Calif" -.07050000000000001 -.4205 "" . . . . 1
    81 "Calif" -.07399999999999998 -.4275 "" . . . . 1
    82 "Calif" -.07399999999999998 -.4275 "" . . . . 1
    83 "Calif" -.07749999999999999 -.325 "" . . . . 1
    84 "Calif" -.07749999999999999 -.325 "" . . . . 1
    85 "Calif" -.08049999999999999 -.33199999999999996 "" . . . . 1
    86 "Calif" -.08049999999999999 -.33199999999999996 "" . . . . 1
    87 "Calif" -.08349999999999999 -.33899999999999997 "" . . . . 1
    88 "Calif" -.08349999999999999 -.33899999999999997 "" . . . . 1
    89 "Calif" -.087 -.346 "" . . . . 1
    90 "Calif" -.087 -.346 "" . . . . 1
    91 "Calif" -.1 -.271 "" . . . . 1
    92 "Calif" -.1 -.271 "" . . . . 1
    93 "Calif" -.44099999999999995 .026999999999999996 "" . . . . 1
    94 "Calif" -.44099999999999995 .026999999999999996 "" . . . . 1
    95 "Calif" -.44099999999999995 .026999999999999996 "" . . . . 1
    96 "Calif" -.44099999999999995 .026999999999999996 "" . . . . 1
    97 "Calif" -.44099999999999995 .026999999999999996 "" . . . . 1
    98 "Calif" -.44099999999999995 .026999999999999996 "" . . . . 1
    99 "Calif" -.44099999999999995 .026999999999999996 "" . . . . 1

    As you can see, it has incorrectly combined the data as there are no units showing up for the CPI Income and Expenditure columns, these just get listed towards the bottom in their original form. Was wondering what is causing this change.

  • #2
    We see from your data that each of the observations in the first dataset has a blank character as the last character in the State variable. Before merging, you will need to
    Code:
    replace State = trim(State)
    in that dataset, because to Stata "Calif " is not the same as "Calif"

    Comment


    • #3
      Thank you William, that solved my issue!

      Comment


      • #4
        As a note to anyone stumbling upon this thread, if the mismatch in your case is less easily solved, you could try -matchit- (on ssc), which does fuzzy string matching.

        Comment

        Working...
        X