Dyad-Year to Directed-Dyad Year

Monica White

Join Date: Jan 2017

Posts: 98
#1

Dyad-Year to Directed-Dyad Year

16 Jan 2017, 13:50

Hello,

I apologize if this has been asked before. I tried to thoroughly look through the forum.

I have trade data from COW that is in dyad-year (non-directed dyad), however, due to the nature of my DV, these data have to be converted to directed-dyad year.

So,

importer1 importer2 flow1 flow2

A B .01 .02
.
.
B C .03 .04
.
.

has to be converted to
importer1 importer2 flow1 flow2

A B .01 .02
.
.
B A .02 .01
.
.
B C .03 .04
.
.
C B .04 .03

I have no idea how to begin to go about this.

The closest links I found in this site are:
http://www.statalist.org/forums/foru...ic?view=thread

and

http://www.statalist.org/forums/foru...ing-many-dyads

However it seems the unit of analyses are from country-year to directed-dyad year.

Many thanks in advanced,
MW
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#2

16 Jan 2017, 14:41

You describe the problem in terms of dyad-year pairs, but you don't seem to have a year variable. I'll assume that's just an omission in the way you showed your example data and that there really is a variable called year. This should do it.

Code:

isid importer1 importer2 year gen temp_importer = importer1 gen temp_flow = flow1 expand 2 by importer1 importer2 year, sort: replace importer1 = importer2 if _n == 2 by importer1 importer2 year: replace importer2 = temp_importer if _n == 2 by importer1 importer2 year: replace flow1 = flow2 if _n == 2 by importer1 importer2 year: replace flow2 = temp_flow if _n == 2 drop temp_importer temp_flow

Note: Not tested. Beware of typos.

The reason it's not tested is that the way you posted your data makes it difficult to import into Stata to try it. In the future, please use the -dataex- command to post example data on this forum. You can install the -dataex- command by running -ssc install dataex-. The simple instructions for use are in -help dataex-. By using -dataex- you enable those who want to help you to quickly and easily create a completely faithful replica of your example data in Stata with a simple copy/paste operation.
Comment
Monica White

Join Date: Jan 2017

Posts: 98
#3

16 Jan 2017, 15:43

Originally posted by Clyde Schechter View Post

You describe the problem in terms of dyad-year pairs, but you don't seem to have a year variable. I'll assume that's just an omission in the way you showed your example data and that there really is a variable called year. This should do it.

Code:

isid importer1 importer2 year gen temp_importer = importer1 gen temp_flow = flow1 expand 2 by importer1 importer2 year, sort: replace importer1 = importer2 if _n == 2 by importer1 importer2 year: replace importer2 = temp_importer if _n == 2 by importer1 importer2 year: replace flow1 = flow2 if _n == 2 by importer1 importer2 year: replace flow2 = temp_flow if _n == 2 drop temp_importer temp_flow

Note: Not tested. Beware of typos.

The reason it's not tested is that the way you posted your data makes it difficult to import into Stata to try it. In the future, please use the -dataex- command to post example data on this forum. You can install the -dataex- command by running -ssc install dataex-. The simple instructions for use are in -help dataex-. By using -dataex- you enable those who want to help you to quickly and easily create a completely faithful replica of your example data in Stata with a simple copy/paste operation.

Clyde, my apologies. I'm terribly new here. But thank you for the dataex reference. I installed it and this is were the first few lines from the COW bilateral dataset:

Code:

input double ccode1 int(ccode2 year) str32(importer1 importer2) double(flow1 flow2) byte(source1 source2) double(bel_lux_alt_flow1 bel_lux_alt_flow2 china_alt_flow1 china_alt_flow2 version) 2 20 1960 "United States of America" "Canada" 3452 4147.1 2 2 -9 -9 -9 -9 2.01 20 20 1960 "United States of America" "Canada" 3452 4147.1 2 2 -9 -9 -9 -9 2.01 2 20 1961 "United States of America" "Canada" 3555 4263.3 2 2 -9 -9 -9 -9 2.01

I also apologize, forgot to mention the year variable. And obviously, the ccode1 and ccod2 variables are better, as they are not strings. Unfortunately, I tried to re-write your code using ccode1 and ccode2, and got the error message of:

Code:

variables ccode1 ccode2 year do not uniquely identify the observations

I'm not sure what's going on here. Hopefully I can figure it out and run it again.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#4

16 Jan 2017, 16:09

Well, it means that for some combination(s) of ccode1 ccode2 and year, there is more than one observation. Based on the way you describe your data set, that most likely means that there are errors in your data that you will need to fix. After all, if there is more than one observation for each dyad-year your problem is indeterminate. Alternatively, your expectation that there should be just one observation for each such combination may be incorrect: perhaps the data are further subclassified by types of products (agricultural, electronics, etc.) and there is a variable somewhere that identifies this. (What is the version variable?)

If you have obtained this data from an external source (I don't know what COW might be other than a shouting bovine ), and have not made modifications to it, I can only say that it is common for publicly released data sets to have errors. If you have curated this data set yourself, there may be errors in the code you used to create it. Anyway, the first step at this point is to find out why there are some multiple records for the same year-dyad. And that begins with identifying them. So I would start with:

Code:

duplicates tag ccode1 ccode2 year, gen(flag) browse if flag

That will show you all the offending multiples. Then you can decide which are correct and which are errors, and ponder what to do about it. Sometimes when there are multiple records like this they are all exact duplicates of each other. In that case, -duplicates drop- solves the problem. If they differ in some ways, however, then you have to decide whether one is correct and the others are errors that should be dropped, or perhaps the multiple records need to be combined in some way (added up, averaged, whatever). In the latter situation it is likely that -collapse- will be your friend.

I should point out that even in the short data example you posted there is a hint of a problem. In the first observation USA is associated with ccode2, but in the second observation USA is associated with ccode20, whereas ccode20 is elsewhere associated with Canada. If there are inconsistencies in the linkage between ccode and country, this could cause this problem and many others.

Finally, as an aside, I notice a lot of -9's in your data. It is common in some alternative software to use numbers like -9 to stand for missing value. That practice does not work in Stata and the -9's will be treated as the number -9 in any calculations you perform with those variables. If those are supposed to represent missing values, you need to fix that. Fortunately, it is very easy:

Code:

mvdecode bel_lux_alt_flow* china_alt_flow*, mv(-9)
Comment
Monica White

Join Date: Jan 2017

Posts: 98
#5

16 Jan 2017, 18:10

Originally posted by Clyde Schechter View Post

Well, it means that for some combination(s) of ccode1 ccode2 and year, there is more than one observation. Based on the way you describe your data set, that most likely means that there are errors in your data that you will need to fix. After all, if there is more than one observation for each dyad-year your problem is indeterminate. Alternatively, your expectation that there should be just one observation for each such combination may be incorrect: perhaps the data are further subclassified by types of products (agricultural, electronics, etc.) and there is a variable somewhere that identifies this. (What is the version variable?)

If you have obtained this data from an external source (I don't know what COW might be other than a shouting bovine ), and have not made modifications to it, I can only say that it is common for publicly released data sets to have errors. If you have curated this data set yourself, there may be errors in the code you used to create it. Anyway, the first step at this point is to find out why there are some multiple records for the same year-dyad. And that begins with identifying them. So I would start with:

Code:

duplicates tag ccode1 ccode2 year, gen(flag) browse if flag

That will show you all the offending multiples. Then you can decide which are correct and which are errors, and ponder what to do about it. Sometimes when there are multiple records like this they are all exact duplicates of each other. In that case, -duplicates drop- solves the problem. If they differ in some ways, however, then you have to decide whether one is correct and the others are errors that should be dropped, or perhaps the multiple records need to be combined in some way (added up, averaged, whatever). In the latter situation it is likely that -collapse- will be your friend.

I should point out that even in the short data example you posted there is a hint of a problem. In the first observation USA is associated with ccode2, but in the second observation USA is associated with ccode20, whereas ccode20 is elsewhere associated with Canada. If there are inconsistencies in the linkage between ccode and country, this could cause this problem and many others.

Finally, as an aside, I notice a lot of -9's in your data. It is common in some alternative software to use numbers like -9 to stand for missing value. That practice does not work in Stata and the -9's will be treated as the number -9 in any calculations you perform with those variables. If those are supposed to represent missing values, you need to fix that. Fortunately, it is very easy:

Code:

mvdecode bel_lux_alt_flow* china_alt_flow*, mv(-9)

Clyde,

Thanks again. You have been most helpful here and on the forum in general.

For some reason, the variables are being doubled when I use your code. The only thing I did was that I changed the importer to ccode. Here is the original (prior to running the code):

Code:

input double ccode1 int(ccode2 year) str32(importer1 importer2) double(flow1 flow2) byte(source1 source2) double(bel_lux_alt_flow1 bel_lux_alt_flow2 china_alt_flow1 china_alt_flow2 version) 2 20 1960 "United States of America" "Canada" 3452 4147.1 2 2 -9 -9 -9 -9 2.01 2 20 1961 "United States of America" "Canada" 3555 4263.3 2 2 -9 -9 -9 -9 2.01 2 20 1962 "United States of America" "Canada" 3979.1 4427 2 2 -9 -9 -9 -9 2.01 2 20 1963 "United States of America" "Canada" 4169.3 4535.9 2 2 -9 -9 -9 -9 2.01

This is the code I used:

Code:

isid ccode1 ccode2 year gen temp_importer = ccode1 gen temp_flow = flow1 expand 2 by ccode1 ccode2 year, sort: replace ccode1 = ccode2 if _n == 2 by ccode1 ccode2 year: replace ccode2 = temp_importer if _n == 2 by ccode1 ccode2 year: replace flow1 = flow2 if _n == 2 by ccode1 ccode2 year: replace flow2 = temp_flow if _n == 2 drop temp_importer temp_flow

Then, I get the error:

Code:

run "/var/folders/_4/yrdp6ksd4yq3kz2zd75qtph00000gn/T//SD91151.000000" not sorted r(5);

I guess instead of making:

Code:

20 2 1960 "Canada" "United States" 4147.1 3452

It's duplicating the values instead?

And I'm terribly sorry, COW is where I got the data(from and external source). It came in an Excel sheet (.csv). I cleaned it (took out years that I didn't need), and I imported as an excel sheet (but not as CSV).

Thank you for pointing out the missing values (-9). I hadn't done anything to them yet (again, thanks for the code!). I was just trying to see if I can get the dyad-year to directed dyad-year.

Last edited by Monica White; 16 Jan 2017, 18:14.
Comment
Monica White

Join Date: Jan 2017

Posts: 98
#6

16 Jan 2017, 18:28

Update: I also used the importer variables, and they duplicated as well.
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30147

16 Jan 2017, 18:29

Sorry, my mistake. I overlooked the fact that when I -replace ccode1 = ccode2...-, I break the sort order, and so the subsequent command doesn't work anymore. The code below takes a slightly different approach that doesn't founder on that problem:

Code:

clear
input double ccode1 int(ccode2 year) str32(importer1 importer2) double(flow1 flow2) byte(source1 source2) double(bel_lux_alt_flow1 bel_lux_alt_flow2 china_alt_flow1 china_alt_flow2 version)
 2 20 1960 "United States of America" "Canada"     3452  4147.1 2 2 -9 -9 -9 -9 2.01
 2 20 1961 "United States of America" "Canada"     3555  4263.3 2 2 -9 -9 -9 -9 2.01
 2 20 1962 "United States of America" "Canada"   3979.1    4427 2 2 -9 -9 -9 -9 2.01
 2 20 1963 "United States of America" "Canada"   4169.3  4535.9 2 2 -9 -9 -9 -9 2.01
 end
 
 isid ccode1 ccode2 year

gen temp_importer = ccode1

gen temp_flow = flow1

expand 2
by ccode1 ccode2 year, sort: gen flag = (_n == 2)

replace flow1 = flow2 if flag
replace flow2 = temp_flow if flag
replace ccode1 = ccode2 if flag
replace ccode2 = temp_importer if flag
drop temp_importer temp_flow flag

And it produces correct results on your example.

Comment

Monica White

Join Date: Jan 2017
Posts: 98

16 Jan 2017, 20:06

Originally posted by Clyde Schechter View Post

Code:

clear
input double ccode1 int(ccode2 year) str32(importer1 importer2) double(flow1 flow2) byte(source1 source2) double(bel_lux_alt_flow1 bel_lux_alt_flow2 china_alt_flow1 china_alt_flow2 version)
2 20 1960 "United States of America" "Canada" 3452 4147.1 2 2 -9 -9 -9 -9 2.01
2 20 1961 "United States of America" "Canada" 3555 4263.3 2 2 -9 -9 -9 -9 2.01
2 20 1962 "United States of America" "Canada" 3979.1 4427 2 2 -9 -9 -9 -9 2.01
2 20 1963 "United States of America" "Canada" 4169.3 4535.9 2 2 -9 -9 -9 -9 2.01
end

isid ccode1 ccode2 year

gen temp_importer = ccode1

gen temp_flow = flow1

expand 2
by ccode1 ccode2 year, sort: gen flag = (_n == 2)

replace flow1 = flow2 if flag
replace flow2 = temp_flow if flag
replace ccode1 = ccode2 if flag
replace ccode2 = temp_importer if flag
drop temp_importer temp_flow flag

And it produces correct results on your example.

Clyde,
Thank you so much! Also, I appreciate your patience! I've learned so much.

Comment

Richard Stoll

Join Date: Mar 2014

Posts: 9
#9

16 Jan 2017, 21:36

Monica, if you are going to work with COW (Correlates of War Project) data you might find it useful to use EUGene. You can download it at http://www.eugenesoftware.org/.
Comment
Monica White

Join Date: Jan 2017

Posts: 98
#10

19 Jan 2017, 06:37

Originally posted by Richard Stoll View Post

Monica, if you are going to work with COW (Correlates of War Project) data you might find it useful to use EUGene. You can download it at http://www.eugenesoftware.org/.

Hi Richard,

EUGene was my first choice, but where I am currently, I only have Apple computers.

However, I can probably start learning EUGene next month. Do you know if there are any help forums for it online? Tried looking for them, but couldn't find anything. I'm working with both Correlates of War data, but other data as well.
Comment

Announcement