Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • unsuccessful merge

    Hello everyone!

    I am trying to merge two data files from different sources using the command merge 1:1, based on a string variable Country. However, the merge is never successful, even though the names of countries in both data sets are identical.
    I suspect this is because each of the data files has Country saved as a different type of variable (str42 and str30). When I try to merge, Stata tries to adjust the data (note: variable Country was str30, now str42 to accommodate using data's values) which, I believe, causes the merge to fail.

    Is there any way to deal with this issue?

    Any help is very appreciated.

    Thanks in advance,
    Barbora


  • #2
    at a guess, you have spaces in one of the files - they may be between words (if a country name has more than one word) or at the beginning or at the end; see
    Code:
    help string functions
    and search on "trim"

    Comment


    • #3
      See also

      Code:
      help compress
      after applying trim().

      Comment


      • #4
        As Rich and Nick mentioned, it's probably extra spaces (probably at the end of country names) that are causing your problem.

        To give you an example, imagine I have a list of countries, and I am trying to add in data on the land area to my master list:
        (I just grabbed this data from here)

        Code:
        dataex id country pop  //  Data shared using Stata command dataex. To install: ssc install dataex
        clear
        input byte id str13 country double pop
         1 "China"         1420.062
         2 "India"         1368.738
         3 "United States"  329.093
         4 "Indonesia"      269.536
         5 "Brazil"         212.393
         6 "Pakistan"       204.596
         7 "Nigeria"        200.962
         8 "Bangladesh"     168.066
         9 "Russia"         143.896
        10 "Mexico"         132.328
        end
        
        format pop %10.2fc
        . list, noobs
        
          +-------------------------------+
          | id         country        pop |
          |-------------------------------|
          |  1           China   1,420.06 |
          |  2           India   1,368.74 |
          |  3   United States     329.09 |
          |  4       Indonesia     269.54 |
          |  5          Brazil     212.39 |
          |-------------------------------|
          |  6        Pakistan     204.60 |
          |  7         Nigeria     200.96 |
          |  8      Bangladesh     168.07 |
          |  9          Russia     143.90 |
          | 10          Mexico     132.33 |
          +-------------------------------+
        
        *** This file will be the master file, I've called it "merge1 - country & pop.dta"
        I've taken those same 10 countries and added some trailing spaces to the end of obs 1-5.
        Code:
        dataex id country land_area  //  Data shared using Stata command dataex. To install: ssc install dataex
        clear
        input byte id str15 country long land_area
         1 "China     "       9388211
         2 "India     "       2973190
         3 "United States  "  9147420
         4 "Indonesia  "      1811570
         5 "Brazil  "         8358140
         6 "Pakistan"          770880
         7 "Nigeria"           910770
         8 "Bangladesh"        130170
         9 "Russia"          16376870
        10 "Mexico"           1943950
        end
        
        format land_area %12.0gc
        label var land_area "Land area in square kilometers"
        
        . list, noobs
        
          +-----------------------------------+
          | id           country    land_area |
          |-----------------------------------|
          |  1        China         9,388,211 |
          |  2        India         2,973,190 |
          |  3   United States      9,147,420 |
          |  4       Indonesia      1,811,570 |
          |  5          Brazil      8,358,140 |
          |-----------------------------------|
          |  6          Pakistan      770,880 |
          |  7           Nigeria      910,770 |
          |  8        Bangladesh      130,170 |
          |  9            Russia   16,376,870 |
          | 10            Mexico    1,943,950 |
          +-----------------------------------+
        
        *** This will be the using file, I've called it "merge2 - country & land area.dta"
        Code:
        * When I try to merge the two files:
        use "merge1 - country & pop.dta", clear
        
        * Merging using the old syntax (someone can update)
        merge country using "merge2 - country & land area.dta", unique sort nonotes nokeep  _merge(merge_land_area)
        sort id
        
        list, noobs
        
          +-------------------------------------------------------+
          | id         country        pop    land_area   merge_~a |
          |-------------------------------------------------------|
          |  1           China   1,420.06            .          1 |
          |  2           India   1,368.74            .          1 |
          |  3   United States     329.09            .          1 |
          |  4       Indonesia     269.54            .          1 |
          |  5          Brazil     212.39            .          1 |
          |-------------------------------------------------------|
          |  6        Pakistan     204.60      770,880          3 |
          |  7         Nigeria     200.96      910,770          3 |
          |  8      Bangladesh     168.07      130,170          3 |
          |  9          Russia     143.90   16,376,870          3 |
          | 10          Mexico     132.33    1,943,950          3 |
          +-------------------------------------------------------+


        Comment


        • #5
          Thank you for the help, it worked!

          Comment

          Working...
          X