Matching Pairs and Indicating Common Characters

Long Hong

Join Date: Oct 2015

Posts: 68
#1

06 Nov 2015, 14:47

Dear All,

I have run into a problem that I find really hard to solve, and I am writing to ask for some advice.
There are two datasets in my analysis. The first one contains interstate conflict data. The data look like this (for example):
Conflict ID   Involved Countries   Side A or B
1             USA                  A
1             Canada               B
1             Australia            A
1             China                B
1             Russia               B
2             India                A
2             Thailand             B
2             Bangladesh           B

(A: the side that started the conflict. B: the other side of the conflict.)
My aim is to pair every Side A country with every Side B country within each Conflict ID. Is there a quick way to form these pairs?

The other dataset contains certain characteristics of each country.
Country     Characteristics
USA         HAHAHA
USA         LALALA
USA         WAWAWA
Canada      LALALA
Canada      NONONO
China       HAHAHA
China       YAYAYA
Australia   HAHAHA
Russia      YAYAYA

My aim for this dataset is to identify whether two countries share a characteristic.

Finally, I would like to combine these two datasets into a new dataset of country pairs, with
a dummy indicating whether the two countries in a pair share a characteristic. The ideal data structure looks like:
Conflict ID   Pair ID   Country   Side   Character Dummy
1             1         USA       A      1
1             1         Canada    B      1
1             2         USA       A      1
1             2         China     B      1
1             3         USA       A      0
1             3         Russia    B      0

I understand this is quite a long and somewhat confusing question, and I really appreciate any help!
If anything is unclear, I will be more than happy to explain!

Thank you in advance for your kind help! I look forward to hearing from you!

Best regards
Long
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#2

06 Nov 2015, 14:57

I'm completely confused about what you want to do here. Let's look at Conflict ID #1 in your ideal data structure. Why did you form pairs for USA, but not for Australia? What pairs would you form for Conflict ID #2 in the first table? And do you set character dummy = 1 if the pair of countries have any of the characteristics in common?
Long Hong

Join Date: Oct 2015

Posts: 68
#3

07 Nov 2015, 01:29

Dear Prof. Schechter,

Sorry for the confusion!

1. I listed conflicts 1 and 2 just to show what the conflict data look like.
For the tables that followed, I used conflict 1 as the main example,
so the Conflict 2 data were not mentioned again.

2. For the last table, I only used USA as an example to form pairs,
but I should have listed the pairs for Australia as well. (USA and Australia are allies in this example.)

3. Yes. I set the dummy = 1 if there is ANY characteristic in common.

Let me simplify my example here:
1. Conflict Dataset:
Conflict ID   Country   Side A or B
1             USA       A
1             China     B
1             Canada    B
2             India     A
2             China     B

(USA [Side A] starts the conflict against China and Canada; China and Canada are allies in this conflict.)

To PAIR this dataset:

Conflict ID   Pair ID   Country   Side
1             1         USA       A
1             1         China     B
1             2         USA       A
1             2         Canada    B
2             3         India     A
2             3         China     B

(Each pair contains one Side A country and one Side B country.)
(So, for each conflict ID, the total number of pairs = # of Side A countries * # of Side B countries.)
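
The pair-count rule above can be checked with a small sketch on the simplified conflict data. The variable names (n_A, n_B, n_pairs, first) are made up for illustration and are not part of the original question:

```stata
* Sketch only: expected number of pairs per conflict = (# Side A) * (# Side B)
clear
input byte conflict str6 country str1 side
1 "USA"    "A"
1 "China"  "B"
1 "Canada" "B"
2 "India"  "A"
2 "China"  "B"
end

* Count the countries on each side within each conflict
bysort conflict: egen n_A = total(side == "A")
bysort conflict: egen n_B = total(side == "B")
gen n_pairs = n_A * n_B

* Show one row per conflict
egen first = tag(conflict)
list conflict n_A n_B n_pairs if first, noobs clean
```

With these data, conflict 1 should give 1 * 2 = 2 pairs and conflict 2 should give 1 * 1 = 1 pair, which agrees with the three Pair IDs in the table above.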

2. Characteristic Dataset (Let's take common ethnic group for example)
Country   Ethnic Group
USA       English
Canada    English
Canada    French
China     Chinese
India     Indian

(In this example, which is of course not real, USA has only the English ethnic group,
while Canada has only the English and French ethnic groups.)

Therefore, among the pairs in the 2nd table, USA and China do not share an ethnic group (dummy = 0),
but Canada and USA do (dummy = 1).
The dummy is therefore defined at the pair level: it takes the same value for both countries in a pair.
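
The "any group in common" rule can be sketched by joining the ethnic-group data to itself on the group and counting shared groups for each ordered pair of countries. This is only one possible way to do it; the names groups, country2 and n_shared are made up for illustration:

```stata
* Sketch only: which pairs of distinct countries share at least one ethnic group?
clear
input str6 country str7 ethnic
"USA"    "English"
"Canada" "English"
"Canada" "French"
"China"  "Chinese"
"India"  "Indian"
end
tempfile groups
save `groups'

* Join the data to itself on ethnic group; each row is then a pair of
* countries having that group in common
rename country country2
joinby ethnic using `groups'
drop if country == country2

* One row per ordered pair, with the number of shared groups
contract country country2, freq(n_shared)
list, noobs clean
```

Any pair listed here would get dummy = 1. A pair that does not appear, such as USA and China, shares no group and would get dummy = 0.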

Final table I am looking for (combining the 2nd table with the information in the 3rd table):
[the first four columns are exactly the same as in the 2nd table, but I add one dummy derived from the 3rd table]

Conflict ID   Pair ID   Country   Side   Dummy
1             1         USA       A      0
1             1         China     B      0
1             2         USA       A      1
1             2         Canada    B      1
2             3         India     A      0
2             3         China     B      0

I hope this clarifies my question. If anything is unclear, I am very happy to provide more information!

I look forward to hearing from you!

Best regards
Long

Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#4

07 Nov 2015, 09:44

Thanks for the clarifications. With the understanding gained from #3, I decided to work with the example data in #1 as it is richer. I think this works (at least it does for the sample data):

Code:

//    GENERATE DATA SETS TO MATCH EXAMPLE IN POST #1
clear
input byte conflict str10 country str1 side
1 "USA"        "A"
1 "Canada"     "B"
1 "Australia"  "A"
1 "China"      "B"
1 "Russia"     "B"
2 "India"      "A"
2 "Thailand"   "B"
2 "Bangladesh" "B"
end
save conflicts, replace

clear
input str9 country str6 characteristic
"USA"       "HAHAHA"
"USA"       "LALALA"
"USA"       "WAWAWA"
"Canada"    "LALALA"
"Canada"    "NONONO"
"China"     "HAHAHA"
"China"     "YAYAYA"
"Australia" "HAHAHA"
"Russia"    "YAYAYA"
end
save characteristics, replace

//    CREATE A CONFLICT PAIR DATA SET
//    IN WIDE LAYOUT FOR NOW (WILL RESHAPE LONG LATER)
tempfile B
use conflicts, clear
preserve
keep if side == "B"
drop side
rename country country_B
save `B'
restore
keep if side == "A"
rename country country_A
drop side
joinby conflict using `B'
gen pair = _n
tempfile conflict_pairs
save `conflict_pairs'
list, noobs clean

//    NOW CREATE A FILE OF PAIRS OF COUNTRIES
//    WHICH MATCH ON ANY CHARACTERISTIC
use characteristics, clear
rename country country2
rename characteristic characteristic_2
cross using characteristics
keep if characteristic == characteristic_2
drop characteristic_2
drop if country == country2
//    AND DUPLICATE THE OBSERVATIONS REVERSING
//    WHICH COUNTRY IS WHICH
tempfile characteristic_matches
save `characteristic_matches'
gen junk = country
replace country = country2
drop country2
rename junk country2
append using `characteristic_matches'
rename country country_A
rename country2 country_B
drop characteristic
duplicates drop
list, noobs clean

//    NOW MERGE THIS WITH THE CONFLICT PAIRS
merge 1:m country_A country_B using `conflict_pairs', keep(match using)
gen byte character_dummy = (_merge == 3)
drop _merge
//    AND GO TO LONG LAYOUT
reshape long country_, i(conflict pair) j(side) string
rename country_ country

list, noobs clean
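
As a cross-check, the same pairing-and-merge idea can be sketched on the simplified data from #3. This is not the code above: it is a variant that keeps the conflict pairs as the master data set in the merge, and the tempfile names (groups, matches, sideB) are made up for illustration.

```stata
* Sketch only: simplified ethnic-group data from #3
clear
input str6 country str7 ethnic
"USA"    "English"
"Canada" "English"
"Canada" "French"
"China"  "Chinese"
"India"  "Indian"
end
tempfile groups matches sideB
save `groups'

* Ordered pairs of distinct countries sharing at least one group
rename country country_A
joinby ethnic using `groups'
rename country country_B
drop if country_A == country_B
keep country_A country_B
duplicates drop
save `matches'

* Simplified conflict data from #3
clear
input byte conflict str6 country str1 side
1 "USA"    "A"
1 "China"  "B"
1 "Canada" "B"
2 "India"  "A"
2 "China"  "B"
end

* Form every A-B pair within each conflict (wide layout)
preserve
keep if side == "B"
drop side
rename country country_B
save `sideB'
restore
keep if side == "A"
drop side
rename country country_A
joinby conflict using `sideB'
sort conflict country_A country_B
gen pair = _n

* Pairs with no match in the shared-group file get dummy = 0
merge m:1 country_A country_B using `matches', keep(master match)
gen byte dummy = (_merge == 3)
drop _merge

* Long layout: two rows per pair
reshape long country_, i(conflict pair) j(side) string
rename country_ country
list conflict pair country side dummy, noobs clean
```

Because the pairs are sorted by country name before numbering, the Pair IDs here need not match the numbering in #3. The dummy should still be 1 only for the USA-Canada pair.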


Long Hong

Join Date: Oct 2015

Posts: 68
#5

07 Nov 2015, 14:38

Dear Prof. Schechter, Thank you so much for the great help! - Best regards, Long