  • Matching Pairs and Indicating Common Characters

    Dear All,

    I have run into a problem that I am finding really hard to solve, and I am writing here to ask for some advice.
    There are two datasets in my analysis. The first one contains data on interstate conflicts. It looks like this (for example):
    Conflict ID   Involved Countries   Side A or B
    1             USA                  A
    1             Canada               B
    1             Australia            A
    1             China                B
    1             Russia               B
    2             India                A
    2             Thailand             B
    2             Bangladesh           B
    (A: Conflict starter. B: The other side of the conflict)
    My aim is to pair every Side A country with every Side B country within each Conflict ID. Is there a quick way to form these pairs?

    The other dataset contains certain information about each country.
    Country     Characteristics
    USA         HAHAHA
    USA         LALALA
    USA         WAWAWA
    Canada      LALALA
    Canada      NONONO
    China       HAHAHA
    China       YAYAYA
    Australia   HAHAHA
    Russia      YAYAYA
    My aim for this dataset is to identify whether two countries share the same characteristics.

    Finally, I would like to combine these two datasets into a new dataset that lists each pair of countries together with
    a dummy indicating whether they share the same characteristic. The ideal data structure looks like this:
    Conflict ID   Pair ID   Country   Side   Character Dummy
    1             1         USA       A      1
    1             1         Canada    B      1
    1             2         USA       A      1
    1             2         China     B      1
    1             3         USA       A      0
    1             3         Russia    B      0
    I understand this is quite a long and somewhat confusing question, and I would really appreciate any help!
    If anything is unclear, I will be more than happy to explain!

    Thank you in advance for your kind help! I look forward to hearing from you!

    Best regards
    Long







  • #2
    I'm completely confused about what you want to do here. Let's look at Conflict ID #1 in your ideal data structure. Why did you form pairs for USA, but not for Australia? What pairs would you form for Conflict ID #2 in the first table? And do you set character dummy = 1 if the pair of countries have any of the characteristics in common?



    • #3
      Dear Prof. Schechter,

      Sorry for the confusion!

      1. I listed conflicts 1 and 2 just to show what the conflict data look like.
      In the tables that follow, I use conflict 1 as the main example,
      so the conflict 2 data are not mentioned again.

      2. In the last table, I used only USA as an example to form pairs,
      but I should have listed the pairs for Australia as well. (USA and Australia are allies in this example.)

      3. Yes. I set the dummy = 1 if there is ANY characteristic in common.


      Let me simplify my example here:
      1. Conflict Dataset:
      Conflict ID   Country   Side A or B
      1             USA       A
      1             China     B
      1             Canada    B
      2             India     A
      2             China     B
      (USA [Side A] starts the conflict with China and Canada; China and Canada are allies in this conflict)

      To PAIR this dataset:
      Conflict ID   Pair ID   Country   Side
      1             1         USA       A
      1             1         China     B
      1             2         USA       A
      1             2         Canada    B
      2             3         India     A
      2             3         China     B
      (Each pair consists of one Side A country and one Side B country)
      (So, for each conflict ID, the total number of pairs = # of Side A countries * # of Side B countries)
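
      The pair-count rule can be checked with a minimal Stata sketch. It assumes the simplified conflict data above are typed in with input; the variable names n_A, n_B and n_pairs are illustrative and appear nowhere else in this thread.

      ```stata
      clear
      input byte conflict str10 country str1 side
      1 "USA"    "A"
      1 "China"  "B"
      1 "Canada" "B"
      2 "India"  "A"
      2 "China"  "B"
      end

      // count the countries on each side within each conflict
      bysort conflict: egen n_A = total(side == "A")
      bysort conflict: egen n_B = total(side == "B")

      // expected number of pairs: (# of Side A) * (# of Side B)
      gen n_pairs = n_A * n_B

      // conflict 1 gives n_pairs = 2 and conflict 2 gives n_pairs = 1,
      // matching pairs 1-2 and pair 3 in the paired table
      list, noobs clean
      ```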

      2. Characteristic Dataset (taking common ethnic group as an example)
      Country   Ethnic Group
      USA       English
      Canada    English
      Canada    French
      China     Chinese
      India     Indian
      (In this example, which of course is not realistic, USA has only the English ethnic group
      while Canada has only the English and French ethnic groups)

      Therefore, for the pairs in the 2nd table, USA and China do not share an ethnic group (dummy = 0),
      but Canada and USA do share one (dummy = 1).
      So the dummy is defined at the pair level: it takes the same value for both countries in a pair.
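
      One possible way to find the country pairs that share an ethnic group is sketched below. It assumes the simplified characteristic data above and joins the dataset to a copy of itself on the ethnic group with joinby. The variable name ethnic and the tempfile name groups are illustrative choices, not names from the real data.

      ```stata
      clear
      input str10 country str10 ethnic
      "USA"    "English"
      "Canada" "English"
      "Canada" "French"
      "China"  "Chinese"
      "India"  "Indian"
      end

      // keep a copy of the data to join against
      tempfile groups
      save `groups'

      // rename so the joined copy supplies the second country of each pair
      rename country country2

      // form every combination of countries that list the same ethnic group
      joinby ethnic using `groups'

      // a country trivially matches itself; drop those rows
      drop if country == country2

      // only USA and Canada remain, once in each order
      list country country2 ethnic, noobs clean
      ```

      Because the tempfile name is a local macro, the lines need to be run together (for example from a do-file) rather than one at a time.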

      Final table I am looking for (the 2nd table combined with the information in the 3rd table):
      [the first four columns are exactly the same as in the 2nd table, with one dummy added based on the 3rd table]
      Conflict ID   Pair ID   Country   Side   Dummy
      1             1         USA       A      0
      1             1         China     B      0
      1             2         USA       A      1
      1             2         Canada    B      1
      2             3         India     A      0
      2             3         China     B      0
      I hope this clarifies my question. If anything is unclear, I am very happy to provide more information!

      I look forward to hearing from you!

      Best regards
      Long





      • #4
        Thanks for the clarifications. With the understanding gained from #3, I decided to work with the example data in #1 as it is richer. I think this works (at least it does for the sample data):

        Code:
        //    GENERATE DATA SETS TO MATCH EXAMPLE IN POST #1
        clear
        input byte conflict str10 country str1 side
        1 "USA"        "A"
        1 "Canada"     "B"
        1 "Australia"  "A"
        1 "China"      "B"
        1 "Russia"     "B"
        2 "India"      "A"
        2 "Thailand"   "B"
        2 "Bangladesh" "B"
        end
        save conflicts, replace
        
        clear
        input str9 country str6 characteristic
        "USA"       "HAHAHA"
        "USA"       "LALALA"
        "USA"       "WAWAWA"
        "Canada"    "LALALA"
        "Canada"    "NONONO"
        "China"     "HAHAHA"
        "China"     "YAYAYA"
        "Australia" "HAHAHA"
        "Russia"    "YAYAYA"
        end
        save characteristics, replace
        
        //    CREATE A CONFLICT PAIR DATA SET
        //    IN WIDE LAYOUT FOR NOW (WILL RESHAPE LONG LATER)
        tempfile B
        use conflicts, clear
        preserve
        keep if side == "B"
        drop side
        rename country country_B
        save `B'
        restore
        keep if side == "A"
        rename country country_A
        drop side
        joinby conflict using `B'
        gen pair = _n
        tempfile conflict_pairs
        save `conflict_pairs'
        list, noobs clean
        
        //    NOW CREATE A FILE OF PAIRS OF COUNTRIES
        //    WHICH MATCH ON ANY CHARACTERISTIC
        use characteristics, clear
        rename country country2
        rename characteristic characteristic_2
        cross using characteristics
        keep if characteristic == characteristic_2
        drop characteristic_2
        drop if country == country2
        //    AND DUPLICATE THE OBSERVATIONS REVERSING
        //    WHICH COUNTRY IS WHICH
        tempfile characteristic_matches
        save `characteristic_matches'
        gen junk = country
        replace country = country2
        drop country2
        rename junk country2
        append using `characteristic_matches'
        rename country country_A
        rename country2 country_B
        drop characteristic
        duplicates drop
        list, noobs clean
        
        //    NOW MERGE THIS WITH THE CONFLICT PAIRS
        merge 1:m country_A country_B using `conflict_pairs', keep(match using)
        gen byte character_dummy = (_merge == 3)
        drop _merge
        //    AND GO TO LONG LAYOUT
        reshape long country_, i(conflict pair) j(side) string
        rename country_ country
        
        list, noobs clean
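
        A result in this long layout can be sanity-checked with a few assert statements. The sketch below is only illustrative: rather than reusing the data left in memory by the code above, it assumes the final table from #3 is typed in by hand, and it reuses the variable names conflict, pair, side and character_dummy.

        ```stata
        clear
        input byte conflict byte pair str10 country str1 side byte character_dummy
        1 1 "USA"    "A" 0
        1 1 "China"  "B" 0
        1 2 "USA"    "A" 1
        1 2 "Canada" "B" 1
        2 3 "India"  "A" 0
        2 3 "China"  "B" 0
        end

        // every pair should have exactly two rows
        bysort conflict pair (side): assert _N == 2

        // one row from Side A and one from Side B
        bysort conflict pair (side): assert side[1] == "A" & side[2] == "B"

        // the dummy must take the same value for both members of a pair
        bysort conflict pair (side): assert character_dummy[1] == character_dummy[2]
        ```

        Each assert stops with an error if its condition fails for any observation, so a silent run means all three checks passed.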



        • #5
          Dear Prof. Schechter, Thank you so much for the great help! - Best regards, Long
