Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge if variable name is similar

    Dear Stata users,


    I am trying to merge two datasets which are firm-level panel data (master) and NAICS to BEA code crosswalk data.
    NAICS code in my master data is quite heterogeneous. What is, it varies from 2-digits to 6-digits.
    However, NAICS to BEA code crosswalk data looks like the following:
    naics beacode
    111 1100
    112 1100
    113 1130
    114 1130
    115 1130
    211 2110
    212 2120
    213 2130
    22 2200
    23 2300
    311 3110
    312 3110
    313 3130
    314 3130
    315 3150
    316 3150
    321 3210
    322 3220
    323 3230
    324 3240
    325 3250
    326 3260
    327 3270
    331 3310
    332 3320
    333 3330
    334 3340
    335 3350
    336 3360
    337 3370
    339 3390
    42 4200
    44 4400
    45 4400
    481 4810
    482 4820
    483 4830
    484 4840
    485 4850
    486 4860
    487 4870
    488 4870
    492 4870
    493 4930
    511 5110
    512 5120
    513 5130
    514 5140
    515 5130
    516 5140
    517 5130
    518 5140
    519 5140
    521 5210
    522 5210
    523 5230
    524 5240
    525 5250
    531 5310
    532 5320
    533 5320
    5411 5411
    5412 5412
    5413 5412
    5414 5412
    5415 5415
    5416 5412
    5417 5412
    5418 5412
    5419 5412
    55 5500
    561 5610
    562 5620
    61 6100
    621 6210
    622 6220
    623 6220
    624 6240
    711 7110
    712 7110
    713 7130
    721 7210
    722 7220
    81 8100


    What I want to do is, for example, firm A's NAICS code is 817890, however since the crosswalk data does not include this code, I would like to make them as 81 (closest code). Therefore, BEA code for this firm A is 8100.
    Or, if I take another example, firm B's NAICS code is 722909. In this case, the closest code in the crosswalk data would be 722, therefore matched BEA code would be 7220.
    I hope this example is clear.
    In sum, I would like to match my master data with closest NAICS code in the crosswalk data in order to match with BEA code.
    Would there be any code to implement this?


    Thank you very much in advance,
    AC

  • #2
    Assume the data example in #1 is saved as "crosswalk.dta". The code below finds proper NAICS for firms A and B.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str1 firmid float naics_o
    "A" 817890
    "B" 722909
    end
    
    gen naics = naics_o
    expand ustrlen(strofreal(naics))-1
    bys firmid: replace naics = int(naics/10^(_n-1))
    
    merge m:1 naics using crosswalk, keep(1 3)
    bys firmid (_merge naics): keep if _n == _N
    drop _merge
    Results would be

    Code:
         +------------------------------------+
         | firmid   naics_o   naics   beacode |
         |------------------------------------|
      1. |      A    817890      81      8100 |
      2. |      B    722909     722      7220 |
         +------------------------------------+

    Comment


    • #3
      Fei Wang Thank you very much ! I works well ◡̈ !

      Comment


      • #4
        Here is an alternative method that creates a directory.

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input int beacode
        1100
        1100
        1130
        1130
        1130
        2110
        2120
        2130
        2200
        2300
        3110
        3110
        3130
        3130
        3150
        3150
        3210
        3220
        3230
        3240
        3250
        3260
        3270
        3310
        3320
        3330
        3340
        3350
        3360
        3370
        3390
        4200
        4400
        4400
        4810
        4820
        4830
        4840
        4850
        4860
        4870
        4870
        4870
        4930
        5110
        5120
        5130
        5140
        5130
        5140
        5130
        5140
        5140
        5210
        5210
        5230
        5240
        5250
        5310
        5320
        5320
        5411
        5412
        5412
        5412
        5415
        5412
        5412
        5412
        5412
        5500
        5610
        5620
        6100
        6210
        6220
        6220
        6240
        7110
        7110
        7130
        7210
        7220
        8100
        end
        
        contract beacode
        replace _freq= beacode
        tsset beacode
        tsfill
        replace _freq= _freq[_n-1] if missing(_freq)
        rename (beacode _freq) (newid beacode)
        tempfile directory
        save `directory'
        
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input int naics
         111
         112
         113
         114
         115
         211
         212
         213
          22
          23
         311
         312
         313
         314
         315
         316
         321
         322
         323
         324
         325
         326
         327
         331
         332
         333
         334
         335
         336
         337
         339
          42
          44
          45
         481
         482
         483
         484
         485
         486
         487
         488
         492
         493
         511
         512
         513
         514
         515
         516
         517
         518
         519
         521
         522
         523
         524
         525
         531
         532
         533
        5411
        5412
        5413
        5414
        5415
        5416
        5417
        5418
        5419
          55
         561
         562
          61
         621
         622
         623
         624
         711
         712
         713
         721
         722
          81
        end
        gen newid= real(substr(string(naics)+"0000", 1, 4))
        merge 1:1 newid using `directory', keep(master match) nogen
        Res.:

        Code:
        . l, sep(0)
        
             +-------------------------+
             | naics   newid   beacode |
             |-------------------------|
          1. |   111    1110      1100 |
          2. |   112    1120      1100 |
          3. |   113    1130      1130 |
          4. |   114    1140      1130 |
          5. |   115    1150      1130 |
          6. |   211    2110      2110 |
          7. |   212    2120      2120 |
          8. |   213    2130      2130 |
          9. |    22    2200      2200 |
         10. |    23    2300      2300 |
         11. |   311    3110      3110 |
         12. |   312    3120      3110 |
         13. |   313    3130      3130 |
         14. |   314    3140      3130 |
         15. |   315    3150      3150 |
         16. |   316    3160      3150 |
         17. |   321    3210      3210 |
         18. |   322    3220      3220 |
         19. |   323    3230      3230 |
         20. |   324    3240      3240 |
         21. |   325    3250      3250 |
         22. |   326    3260      3260 |
         23. |   327    3270      3270 |
         24. |   331    3310      3310 |
         25. |   332    3320      3320 |
         26. |   333    3330      3330 |
         27. |   334    3340      3340 |
         28. |   335    3350      3350 |
         29. |   336    3360      3360 |
         30. |   337    3370      3370 |
         31. |   339    3390      3390 |
         32. |    42    4200      4200 |
         33. |    44    4400      4400 |
         34. |    45    4500      4400 |
         35. |   481    4810      4810 |
         36. |   482    4820      4820 |
         37. |   483    4830      4830 |
         38. |   484    4840      4840 |
         39. |   485    4850      4850 |
         40. |   486    4860      4860 |
         41. |   487    4870      4870 |
         42. |   488    4880      4870 |
         43. |   492    4920      4870 |
         44. |   493    4930      4930 |
         45. |   511    5110      5110 |
         46. |   512    5120      5120 |
         47. |   513    5130      5130 |
         48. |   514    5140      5140 |
         49. |   515    5150      5140 |
         50. |   516    5160      5140 |
         51. |   517    5170      5140 |
         52. |   518    5180      5140 |
         53. |   519    5190      5140 |
         54. |   521    5210      5210 |
         55. |   522    5220      5210 |
         56. |   523    5230      5230 |
         57. |   524    5240      5240 |
         58. |   525    5250      5250 |
         59. |   531    5310      5310 |
         60. |   532    5320      5320 |
         61. |   533    5330      5320 |
         62. |  5411    5411      5411 |
         63. |  5412    5412      5412 |
         64. |  5413    5413      5412 |
         65. |  5414    5414      5412 |
         66. |  5415    5415      5415 |
         67. |  5416    5416      5415 |
         68. |  5417    5417      5415 |
         69. |  5418    5418      5415 |
         70. |  5419    5419      5415 |
         71. |    55    5500      5500 |
         72. |   561    5610      5610 |
         73. |   562    5620      5620 |
         74. |    61    6100      6100 |
         75. |   621    6210      6210 |
         76. |   622    6220      6220 |
         77. |   623    6230      6220 |
         78. |   624    6240      6240 |
         79. |   711    7110      7110 |
         80. |   712    7120      7110 |
         81. |   713    7130      7130 |
         82. |   721    7210      7210 |
         83. |   722    7220      7220 |
         84. |    81    8100      8100 |
             +-------------------------+

        Comment


        • #5
          Thank you very much for the alternative method !

          Comment

          Working...
          X