Hi,
I am trying to merge two datasets which look as follows:
Master Data:
Year: are not in order and can have different gaps in between e.g. 1991, 1992, 1996, 1998, 2004
Using Data:
State is the biggest unit followed by Constituency and then District. In few exceptions, district can be the same size or sometimes bigger than constituency.
Result wanted: District wise winner
Things I have tried/considered:
merge 1:m state constituency : This will not work since state-constituency is not uniquely identified due to them being repeated over the years
merge m:1 state constituency: Similar to the above reason, state-constituency is not uniquely identified due to them being repeated over the district variable
I thought an m:m merge would be the solution and hence tried
merge m:m state constituency
While the command does not give an error, it does not matches the constituency variable all the districts. It chooses few districts only and assigns it to the candidate in a logic that I do not understand. However, it does not give me the desired unique district-winner result.
An example of the result is
I think the problem is that am trying to merge higher order units (state and constituency) to match lower order units (district-winner) but I am unable to find a way around it.
Any help would be really appreciated.
Thank you
Purnima
P.S. I am new to the forum so in case I have missed any detail, please let me know.
I am trying to merge two datasets which look as follows:
Master Data:
Year | State | Constituency | Candidate | Winner |
1991 | Arunachal Pradesh | Arunachal East | Person A | 1 |
1991 | Arunachal Pradesh | Arunachal East | Person B | 0 |
1991 | Arunachal Pradesh | Arunachal East | Person C | 0 |
1991 | Arunachal Pradesh | Arunachal East | Person D | 0 |
1991 | Arunachal Pradesh | Arunachal West | Person E | 1 |
1991 | Arunachal Pradesh | Arunachal West | Person F | 0 |
1991 | Arunachal Pradesh | Arunachal West | Person G | 0 |
Using Data:
State | Constituency_no. | Constituency | District |
Arunachal Pradesh | 1 | Arunachal East | East Siang |
Arunachal Pradesh | 1 | Arunachal East | Lohit |
Arunachal Pradesh | 1 | Arunachal East | Tirap |
Arunachal Pradesh | 2 | Arunachal West | East Kameng |
Arunachal Pradesh | 2 | Arunachal West | Lower Subansiri |
Arunachal Pradesh | 2 | Arunachal West | Tawang |
Arunachal Pradesh | 2 | Arunachal West | Upper Subansiri |
Arunachal Pradesh | 2 | Arunachal West | West Kameng |
Arunachal Pradesh | 2 | Arunachal West | West Siang |
Result wanted: District wise winner
Year | State | Constituency | District | Candidate | Winner |
1991 | Arunachal Pradesh | Arunachal East | East Siang | Person A | 1 |
1991 | Arunachal Pradesh | Arunachal East | East Siang | Person B | 0 |
1991 | Arunachal Pradesh | Arunachal East | East Siang | Person C | 0 |
1991 | Arunachal Pradesh | Arunachal East | East Siang | Person D | 0 |
1991 | Arunachal Pradesh | Arunachal East | Lohit | Person A | 1 |
1991 | Arunachal Pradesh | Arunachal East | Lohit | Person B | 0 |
1991 | Arunachal Pradesh | Arunachal East | Lohit | Person C | 0 |
1991 | Arunachal Pradesh | Arunachal East | Lohit | Person D | 0 |
1991 | Arunachal Pradesh | Arunachal East | Tirap | Person A | 1 |
1991 | Arunachal Pradesh | Arunachal East | Tirap | Person B | 0 |
1991 | Arunachal Pradesh | Arunachal East | Tirap | Person C | 0 |
1991 | Arunachal Pradesh | Arunachal East | Tirap | Person D | 0 |
1991 | Arunachal Pradesh | Arunachal West | East Kameng | Person E | 1 |
1991 | Arunachal Pradesh | Arunachal West | East Kameng | Person F | 0 |
1991 | Arunachal Pradesh | Arunachal West | East Kameng | Person G | 0 |
1991 | Arunachal Pradesh | Arunachal West | Lower Subansiri | Person E | 1 |
1991 | Arunachal Pradesh | Arunachal West | Lower Subansiri | Person F | 0 |
1991 | Arunachal Pradesh | Arunachal West | Lower Subansiri | Person G | 0 |
and so on.... |
merge 1:m state constituency : This will not work since state-constituency is not uniquely identified due to them being repeated over the years
merge m:1 state constituency: Similar to the above reason, state-constituency is not uniquely identified due to them being repeated over the district variable
I thought an m:m merge would be the solution and hence tried
merge m:m state constituency
While the command does not give an error, it does not matches the constituency variable all the districts. It chooses few districts only and assigns it to the candidate in a logic that I do not understand. However, it does not give me the desired unique district-winner result.
An example of the result is
Year | State | Constituency | District | Candidate | Winner |
1991 | Arunachal Pradesh | Arunachal East | East Siang | Person A | 1 |
1991 | Arunachal Pradesh | Arunachal East | Lohit | Person B | 0 |
1991 | Arunachal Pradesh | Arunachal East | East Siang | Person C | 0 |
1991 | Arunachal Pradesh | Arunachal East | East Siang | Person E | 0 |
I think the problem is that am trying to merge higher order units (state and constituency) to match lower order units (district-winner) but I am unable to find a way around it.
Any help would be really appreciated.
Thank you
Purnima
P.S. I am new to the forum so in case I have missed any detail, please let me know.
Comment