I have a deceivingly complicated (to me, at least!) two-part question.
We have a large dataset where zones are smaller geographical areas than counties. Multiple zones can be within a single county. But, it is not clear whether or not zones can stretch across counties. In the first data example below, I've created one zone for which this is true: zone 16 is in both county 14 and county 38.
Does anyone have an efficient solution to creating some sort of flag variable that will tell us when this happens?
Data for the first issue:
Now, let's say that in our real data we find out that it is NOT true that zones can spread across multiple counties. We have zone data for two waves; but only county data for wave 1.
Thus, if it is true that zones can only belong to a single county, we could then theoretically backfill wave 2 county based on wave 1 county -- zone information. However, because a) individuals can move between waves 1 and 2, and b) multiple zones can be inside one county, this isn't quite as simple (I think) as just doing a simple equivalency. You'd have to account for, e.g., the fact that zones 13 and 19 are both within county 35, and can't just rely on an individual's zone/county staying the same between wave 1 and wave 2.
Data for the second issue, with 2 waves and where no zones spread across counties (so zone 16 now only belongs in county 14, you'll also note that a few individuals reside in new zones):
We have a large dataset where zones are smaller geographical areas than counties. Multiple zones can be within a single county. But, it is not clear whether or not zones can stretch across counties. In the first data example below, I've created one zone for which this is true: zone 16 is in both county 14 and county 38.
Does anyone have an efficient solution to creating some sort of flag variable that will tell us when this happens?
Data for the first issue:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input byte(zone county wave) int id 13 35 1 100 13 35 1 101 13 35 1 102 19 35 1 103 19 35 1 104 19 35 1 105 2 22 1 106 2 22 1 107 2 22 1 108 2 22 1 109 2 22 1 110 6 90 1 111 6 90 1 112 16 14 1 113 16 14 1 114 16 38 1 115 16 38 1 116 25 38 1 117 25 38 1 118 25 38 1 119 end
Now, let's say that in our real data we find out that it is NOT true that zones can spread across multiple counties. We have zone data for two waves; but only county data for wave 1.
Thus, if it is true that zones can only belong to a single county, we could then theoretically backfill wave 2 county based on wave 1 county -- zone information. However, because a) individuals can move between waves 1 and 2, and b) multiple zones can be inside one county, this isn't quite as simple (I think) as just doing a simple equivalency. You'd have to account for, e.g., the fact that zones 13 and 19 are both within county 35, and can't just rely on an individual's zone/county staying the same between wave 1 and wave 2.
Data for the second issue, with 2 waves and where no zones spread across counties (so zone 16 now only belongs in county 14, you'll also note that a few individuals reside in new zones):
Code:
* Example generated by -dataex-. For more info, type help dataex clear input byte(zone county wave) int id 13 35 1 100 13 35 1 101 13 35 1 102 19 35 1 103 19 35 1 104 19 35 1 105 2 22 1 106 2 22 1 107 2 22 1 108 2 22 1 109 2 22 1 110 6 90 1 111 6 90 1 112 16 14 1 113 16 14 1 114 16 14 1 115 16 14 1 116 25 38 1 117 25 38 1 118 25 38 1 119 13 . 2 100 13 . 2 101 13 . 2 102 19 . 2 103 19 . 2 104 19 . 2 105 2 . 2 106 2 . 2 107 6 . 2 108 25 . 2 109 19 . 2 110 6 . 2 111 6 . 2 112 16 . 2 113 16 . 2 114 16 . 2 115 25 . 2 116 25 . 2 117 25 . 2 118 25 . 2 119 end

Comment