
  • Merging two panel data sets when country names differ slightly

    I am using two different data sets: one from the World Bank and another from the IMF. The country names differ slightly between the two databases for some countries.
    I am interested in finding a faster way to merge them without having to change the country names in the Excel file.


    Differences:

    WB                  IMF
    Congo, Dem. Rep.    Congo, Democratic Republic of
    Egypt, Arab Rep.    Egypt
    Korea, Rep.         Korea, Republic of
    Venezuela, RB       Venezuela, Republica Bolivariana de
    P.S. There are some more differences in the dataset.




  • #2
    Arbnor:
    a possible approach is to use -split-:
    Code:
    . set obs 1
    number of observations (_N) was 0, now 1
    
    . generate str var1 = "Congo, Dem. Rep" in 1
    
    . rename var1 WB
    
    . help split
    
    . split WB,parse(,)
    variables created as string: 
    WB1  WB2
    
    . list
    
         +-------------------------------------+
         |              WB     WB1         WB2 |
         |-------------------------------------|
      1. | Congo, Dem. Rep   Congo    Dem. Rep |
         +-------------------------------------+
    
    .
    Then you can -drop- -WB- and -WB2- and:
    Code:
    rename WB1 WB
    And then -merge-.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      First of all, the variable name is countryname; WB stands for World Bank and IMF for International Monetary Fund. Your technique might work for some countries, but not always.
      The data set contains countries such as Congo, Democratic Republic of and Congo, Republic of, or Guinea and Guinea-Bissau. So if I use your commands, I will get duplicated data for countries like Congo and Guinea. That is the main problem in this task.
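
      To make the collision concrete, here is a minimal sketch in Python (not Stata) that mimics keeping only the text before the first comma, as -split- with parse(,) followed by keeping the first piece would. The name list is a hypothetical sample built from the examples in this thread.

```python
from collections import defaultdict

# Hypothetical sample of IMF-style country names taken from this thread.
imf_names = [
    "Congo, Democratic Republic of",
    "Congo, Republic of",
    "Guinea",
    "Guinea-Bissau",
    "Egypt",
]

def first_part(name):
    """Keep only the text before the first comma."""
    return name.split(",")[0].strip()

# Group the original names by their shortened form.
groups = defaultdict(list)
for name in imf_names:
    groups[first_part(name)].append(name)

# Any shortened name that maps back to more than one country is a collision.
collisions = {key: names for key, names in groups.items() if len(names) > 1}
print(collisions)
# {'Congo': ['Congo, Democratic Republic of', 'Congo, Republic of']}
```

      In this sample only the two Congos collapse into one key. Guinea and Guinea-Bissau stay distinct under a comma split, but they would collide under any rule that keeps only the first word or a common prefix.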



      • #4
        Arbnor:
        thanks for clarifying the meaning of obscure acronyms such as WB and IMF (admittedly, I had surmised something similar to your explanation after some days spent fiddling with economic data).
        That said, you can -rename- ambiguous country names whenever they crop up.
        Kind regards,
        Carlo
        (Stata 19.0)
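
        The renaming idea can also be written as an explicit crosswalk. The sketch below is in Python rather than Stata and uses only the four name pairs listed in the original post; `IMF_TO_WB` and `harmonize` are hypothetical names, and the sample lists are illustrative. Reporting whatever remains unmatched shows which names still need an entry.

```python
# Crosswalk built from the four differences listed in the original post.
IMF_TO_WB = {
    "Congo, Democratic Republic of": "Congo, Dem. Rep.",
    "Egypt": "Egypt, Arab Rep.",
    "Korea, Republic of": "Korea, Rep.",
    "Venezuela, Republica Bolivariana de": "Venezuela, RB",
}

def harmonize(imf_name):
    """Return the WB spelling if one is known, else the name unchanged."""
    return IMF_TO_WB.get(imf_name, imf_name)

# Hypothetical samples of names from each source.
wb_names = {"Congo, Dem. Rep.", "Egypt, Arab Rep.", "Korea, Rep.", "Venezuela, RB", "Guinea"}
imf_names = [
    "Congo, Democratic Republic of",
    "Egypt",
    "Korea, Republic of",
    "Venezuela, Republica Bolivariana de",
    "Guinea",
    "Congo, Republic of",
]

harmonized = [harmonize(name) for name in imf_names]

# Names with no WB counterpart after harmonizing need a new crosswalk entry.
unmatched = [name for name in harmonized if name not in wb_names]
print(unmatched)
# ['Congo, Republic of']
```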



        • #5
          Thanks a lot Mr. Lazzaro.



          • #6
            You may want to look at fuzzy matching techniques for matching data sets whose keys do not match exactly, which is exactly your problem. Since I don't like the Stata way of doing this, I'd recommend R or Python. I always do it in Python and it works very well. There are many great examples at https://stackoverflow.com

            Here are also other threads that you may want to look at if you would like to do it in Stata: https://www.statalist.org/forums/for...ring-variables.
            https://www.statalist.org/forums/for...-e-fuzzy-match

            Good luck,
            Mehmet
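
            As a rough illustration of fuzzy matching in Python, the sketch below uses only the standard library's difflib. It is one possible approach, not necessarily the one used in the linked examples. The name lists are hypothetical samples from this thread, and the `REVIEW_BELOW` threshold is an arbitrary assumption. It proposes the most similar IMF name for each WB name and flags low-scoring proposals for manual checking.

```python
import difflib

# Hypothetical name lists based on the examples in this thread.
wb_names = ["Congo, Dem. Rep.", "Egypt, Arab Rep.", "Korea, Rep.", "Venezuela, RB", "Guinea"]
imf_names = [
    "Congo, Democratic Republic of",
    "Congo, Republic of",
    "Egypt",
    "Korea, Republic of",
    "Venezuela, Republica Bolivariana de",
    "Guinea",
    "Guinea-Bissau",
]

REVIEW_BELOW = 0.9  # assumption: arbitrary cutoff for "check this by hand"

def best_match(name, candidates):
    """Return (candidate, similarity) for the most similar candidate."""
    scored = [
        (difflib.SequenceMatcher(None, name.lower(), cand.lower()).ratio(), cand)
        for cand in candidates
    ]
    score, match = max(scored)
    return match, score

# Propose one IMF name per WB name; nothing is merged automatically.
proposals = {name: best_match(name, imf_names) for name in wb_names}
needs_review = [name for name, (_, score) in proposals.items() if score < REVIEW_BELOW]

for name, (match, score) in proposals.items():
    flag = "REVIEW" if score < REVIEW_BELOW else "ok"
    print(f"{name!r} -> {match!r} ({score:.2f}) {flag}")
```

            Similar names such as the two Congos can score close to each other, so it is safer to treat the output as a list of suggestions to confirm than as a finished crosswalk.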



            • #7
              I work with cross-country macroeconomic data, and the preferred technique is to identify countries by International Organization for Standardization (ISO) codes instead of country names. Both the WB and IMF data sets include these ISO codes. If the two organizations use different coding schemes, install the kountry command (by Rafal Raciborski, available from SSC), which lets you convert from one to the other.

              Code:
              ssc install kountry
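
              To show why a code-based key sidesteps the naming problem, here is a minimal sketch in Python (not Stata, and not using kountry) that joins two small record lists on ISO alpha-3 code and year. The field names and all numeric values are hypothetical placeholders, not real data.

```python
# Hypothetical WB-style records; values are placeholders.
wb_rows = [
    {"iso3": "COD", "countryname": "Congo, Dem. Rep.", "year": 2020, "wb_value": 1.0},
    {"iso3": "COG", "countryname": "Congo, Rep.", "year": 2020, "wb_value": 2.0},
    {"iso3": "EGY", "countryname": "Egypt, Arab Rep.", "year": 2020, "wb_value": 3.0},
]

# Hypothetical IMF-style records with different spellings of the same countries.
imf_rows = [
    {"iso3": "COD", "countryname": "Congo, Democratic Republic of", "year": 2020, "imf_value": 10.0},
    {"iso3": "COG", "countryname": "Congo, Republic of", "year": 2020, "imf_value": 20.0},
    {"iso3": "KOR", "countryname": "Korea, Republic of", "year": 2020, "imf_value": 30.0},
]

# Index the IMF records by (ISO code, year), the panel identifier.
imf_by_key = {(row["iso3"], row["year"]): row for row in imf_rows}

# Keep only records present in both sources (an inner join).
merged = []
for row in wb_rows:
    match = imf_by_key.get((row["iso3"], row["year"]))
    if match is not None:
        merged.append({**row, "imf_value": match["imf_value"]})

for row in merged:
    print(row["iso3"], row["year"], row["wb_value"], row["imf_value"])
```

              The two Congos merge correctly here even though their names differ between sources, because the key never touches the name column.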
              For your immediate problem, the following do-file, based on this paper, makes the adjustments needed to convert IMF country names to WB country names.

              Attached Files
