Merging problem of congressional district data with counties

Tariq Abdullah

Join Date: Apr 2021
Posts: 366

Merging problem of congressional district data with counties

25 Jan 2023, 20:32

This is my one set of data where I have statefip , district ( which stands for congressional district) and county. I want to merge this data with the following dataset. Problem is in my master data ( which is posted first here) congressional district from the same district belongs to multiple counties. I don't know how I can resolve the issue to merge with the using data ( which is posted second in this post below )

Can anyone help ??

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(cd_statefip district) float county
6  9 6001
6 10 6001
6 11 6001
6 13 6001
6 15 6001
6  3 6003
6  3 6003
6  3 6003
6  4 6003
6 19 6003
6  4 6005
6  4 6005
6 11 6009
6 19 6009
6 19 6009
6  1 6011
6  1 6011
6  2 6011
6  2 6011
48  5 48001
48  6 48001
48 11 48003
48 19 48003
48  1 48005
48  1 48005
48  5 48005
48 27 48007
48 15 48055
48 25 48055
48 25 48055
48 28 48055
end

The data I'm trying to merge with is like the following.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(state_fips district) float democrat_pres byte incumbent_pres_match float(democrat_voteperc competitive)
6  1 1 0  42.61742 2
6  1 1 0  42.61742 2
6  2 1 1   71.2433 8
6  2 1 1   71.2433 8
6  3 1 1  54.23049 4
6  3 1 1  54.23049 4
6  4 1 0  38.89085 1
6  4 1 0  38.89085 1
6  5 1 1  74.47112 8
6  5 1 1  74.47112 8
6  6 1 1  75.05244 9
6  6 1 1  75.05244 9
6  7 1 1  51.68154 4
6  7 1 1  51.68154 4
6  8 1 0         . .
6  8 1 0         . .
6  9 1 1   55.5541 5
6  9 1 1   55.5541 5
6 10 1 0  47.29181 3
6 10 1 0  47.29181 3
6 11 1 1  69.73173 8
6 11 1 1  69.73173 8
6 12 1 1  85.08385 .
6 12 1 1  85.08385 .
6 13 1 1  86.78157 .
6 13 1 1  86.78157 .
6 14 1 1  78.91653 9
48  1 0 0  56.44611  5
48  1 0 0  56.44611  5
48  2 0 0  60.84797  6
48  2 0 0  60.84797  6
48  2 0 0  60.84797  6
48  3 0 1 24.331583  .
48  3 0 1 24.331583  .
48  3 0 1 24.331583  .
48  4 0 0  57.82096  5
48  4 0 0  57.82096  5
48  4 0 0  57.82096  5
48  5 0 1  40.26217  2
48  5 0 1  40.26217  2
48  5 0 1  40.26217  2
48  5 0 1  40.26217  2
48 17 0 0  51.36383  4
48 17 0 0  51.36383  4
48 17 0 0  51.36383  4
48 18 0 0  76.91312  9
48 18 0 0  76.91312  9
48 18 0 0  76.91312  9
end
label values competitive competitivb
label def competitivb 0 "30-", modify
label def competitivb 1 "35-", modify
label def competitivb 2 "40-", modify
label def competitivb 3 "45-", modify
label def competitivb 4 "50-", modify
label def competitivb 5 "55-", modify
label def competitivb 6 "60-", modify
label def competitivb 7 "65-", modify
label def competitivb 8 "70-", modify
label def competitivb 9 "75-", modify
label def competitivb 10 "80-", modify

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30084
#2

26 Jan 2023, 11:06

You have two problems here. One is major, the other is minor. Interestingly, the fact that in data set 1 the same district belongs to multiple counties may not be a problem at all.

The major problem here is that you have multiple observations for the same state_fips district combination in both data sets. That makes them unmergeable. Now, in the second dataset, the multiple observations for any combination of state_fips and district are exact duplicates on all variables. So you can get around this problem by simply running -duplicates drop- on the second data set, and then you can do a 1:m merge with the first data set.* You won't lose any information since the observations you will be dropping are just exact copies of the observations that will be retained. However, before doing that, review your data management that created that second data set. Usually when a data set contains exact duplicate observations, that arises because of a mistake in creation of the data set. And where one mistake has been found, others may lurk. So before proceeding, you should double-check the validity of that second data set. But once you are sure that the data it contains are correct, you will be good to go.

The minor problem is that you have different variable names for the state-level FIPS code variable in the two data sets. So before you can do the -merge-, you need to rename one or both of them so they are the same.

Added: *This will work in your example data. However, if in your complete data set there are multiple observations of the same state FIPS code and district that have conflicting values on other variables, then -duplicates drop- will give you an error message and refuse to proceed. In that case you can be rather certain that either you have some incorrect data that needs to be fixed, or if the data are correct because, say, the observations refer to different time periods or smaller geographical units within the districts, then the two data sets you have are simply not mergeable unless you also have the same time period or subunit variables in both of them, in which case the solution is to add that variable to the -merge- key.

Last edited by Clyde Schechter; 26 Jan 2023, 11:10.
1 like
Comment

Announcement

Merging problem of congressional district data with counties

Comment