Dear all,
I am computing some diversity indices, and for that I need to match two datasets. The first one (dataset1.dta) contains the nationalities of the people living in different counties. An example of it is shown in the dataset1.dta table below.
The variable County is the county in which each person lives. Within each county, all pairwise combinations of the origin groups are listed: Origin_1 is the nationality of one group and Origin_2 that of the group it is paired with, which is what I need to compute the Herfindahl-Hirschman index. Share_1 and Share_2 are the population shares of those two origins in the county. d is a string variable that identifies the pair of origins (for example AB).
Now I would like to match it to a distance dataset (dataset2.dta), an example of which is shown in the dataset2.dta table below.
Origin_1 and Origin_2 are again the countries of origin in which people were born. d is the same as above, and distance is a distance measure between two persons from different countries.
Now I need to get a dataset like the desired-result table below, so that I can compute, for each county (bysort county): Herfindahl-Hirschman index = 1 - total(share_1*share_2*distance).
I tried to match dataset1 and dataset2 using the following commands:
use "dataset1.dta", clear
sort d
merge n:n d using "dataset2.dta"
drop _merge
But with these commands I do not get the dataset I want. Instead, I get something like the last table below.
With this dataset I cannot compute the Herfindahl-Hirschman index, because I do not get the merged rows for every county. Can someone help me?
Example of dataset1.dta:
County | Origin_1 | Origin_2 | Share_1 | Share_2 | d |
1 | A | B | 0.2 | 0.6 | AB |
1 | A | C | 0.2 | 0.1 | AC |
1 | A | D | 0.2 | 0.1 | AD |
1 | B | C | 0.6 | 0.1 | BC |
1 | B | D | 0.6 | 0.1 | BD |
1 | C | D | 0.1 | 0.1 | CD |
2 | A | B | 0.4 | 0.3 | AB |
2 | A | C | 0.4 | 0.2 | AC |
2 | A | D | 0.4 | 0.1 | AD |
2 | B | C | 0.3 | 0.2 | BC |
2 | B | D | 0.3 | 0.1 | BD |
2 | C | D | 0.2 | 0.1 | CD |
Example of dataset2.dta:
Origin_1 | Origin_2 | d | distance |
A | B | AB | 0.3 |
A | C | AC | 0.2 |
A | D | AD | 0.7 |
B | C | BC | 0.9 |
B | D | BD | 0.1 |
C | D | CD | 0.2 |
Desired result after the merge:
County | Origin_1 | Origin_2 | Share_1 | Share_2 | d | distance |
1 | A | B | 0.2 | ... | AB | 0.3 |
1 | A | C | 0.2 | ... | AC | 0.2 |
1 | A | D | ... | ... | AD | 0.7 |
1 | B | C | ... | ... | BC | 0.9 |
1 | B | D | ... | ... | BD | 0.1 |
1 | C | D | ... | ... | CD | 0.2 |
2 | A | B | ... | ... | AB | 0.3 |
2 | A | C | ... | ... | AC | 0.2 |
2 | A | D | ... | ... | AD | 0.7 |
2 | B | C | ... | ... | BC | 0.9 |
2 | B | D | ... | ... | BD | 0.1 |
2 | C | D | ... | ... | CD | 0.2 |
What I actually get after the merge:
County | Origin_1 | Origin_2 | Share_1 | Share_2 | d | distance |
1 | A | B | ... | ... | AB | 0.3 |
2 | A | C | ... | ... | AC | 0.2 |
2 | B | C | ... | ... | BC | 0.9 |
1 | D | A | ... | ... | AD | 0.7 |
2 | B | D | ... | ... | BD | 0.1 |
1 | D | C | ... | ... | CD | 0.2 |
Thanks in advance!
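One possible way forward, offered as a sketch under stated assumptions rather than a solution checked against the real files: in the examples above, each value of d appears once in dataset2 but once per county in dataset1, which is the pattern Stata's many-to-one merge (merge m:1) is meant for. The block below rebuilds the example rows with input so that it runs on its own. The lower-case variable names (county, share_1, share_2), the temporary file name distances, and the helper variables term, sum_term and hhi are assumptions made for illustration, not names taken from the real datasets.

```stata
* Sketch only: variable names and the tempfile are assumptions;
* the data are the example rows shown in the post.

* Distance data: one row per pair of origins
clear
input str1 origin_1 str1 origin_2 str2 d double distance
"A" "B" "AB" 0.3
"A" "C" "AC" 0.2
"A" "D" "AD" 0.7
"B" "C" "BC" 0.9
"B" "D" "BD" 0.1
"C" "D" "CD" 0.2
end
* Stop with an error unless d uniquely identifies the distance rows
isid d
tempfile distances
save `distances'

* County data: one row per county and pair of origins
clear
input byte county str1 origin_1 str1 origin_2 double share_1 double share_2 str2 d
1 "A" "B" 0.2 0.6 "AB"
1 "A" "C" 0.2 0.1 "AC"
1 "A" "D" 0.2 0.1 "AD"
1 "B" "C" 0.6 0.1 "BC"
1 "B" "D" 0.6 0.1 "BD"
1 "C" "D" 0.1 0.1 "CD"
2 "A" "B" 0.4 0.3 "AB"
2 "A" "C" 0.4 0.2 "AC"
2 "A" "D" 0.4 0.1 "AD"
2 "B" "C" 0.3 0.2 "BC"
2 "B" "D" 0.3 0.1 "BD"
2 "C" "D" 0.2 0.1 "CD"
end

* Many-to-one merge: many county rows per d in memory, one distance row per d
* in the using file; unmatched distance rows are dropped, county rows are kept
merge m:1 d using `distances', keepusing(distance) keep(match master) nogenerate

* Distance-weighted index per county, following the formula in the post
generate double term = share_1 * share_2 * distance
bysort county: egen double sum_term = total(term)
generate double hhi = 1 - sum_term
```

With the real files, the second input block would be replaced by use "dataset1.dta", clear and the temporary file by "dataset2.dta". For the example rows the index works out to 0.884 for county 1 and 0.859 for county 2. If isid d fails on the distance data, d is not unique there, and the duplicates would need to be resolved before an m:1 merge can work.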