Hello,
In my dataset I have a number of small geographical areas (beginning with A in the sample data below) that fit within a larger geographical areas (designated by code 16C or 01K or 01H in the sample data below). There are around 2000 of these smaller geographical areas and around 100 of these larger geographical areas that they fit in to.
The smaller geographies can merge with other smaller geographies and cease to exist , so in the below sample, area A81015 ceased to exist at the end of one year, this designated by a dummy variable, taking a value of 1 where this has occurred ( variable: mis_1805) . This geographical area had a population of 2919 people. You can find which geographical area it merged to, by checking the last column which shows you the change in populations for these small geographical areas from one year to the next. You can see that area A81042 grew by 2956 people, and importantly, it belongs so the same wider geographical area as the one that ceased to exist (16C).
I am looking for a way to automate this process , so that if, for a given larger geographical area (16C, OR 01K OR 01H ) a smaller geographical area ceases to exist (identified by a dummy variable taking the value 1) and another area grows by similar amount, then a variable is created giving both observations, the code of the geographical area that one has merged in to. So in the below, an additional column would be created with code A81042 for both observations ( A81015 & A81042 ).
There is likely to be changes in population for the smaller geographical areas each year, so the amount by which one area grows does not need to be the exactly the same as the population for the area that ceased to exist. I have chosen a threshold of 90%, so in this example 2919 (old area)/2956 (other area) = 98%
The intention is not for this to provide a definitive answer, but automate the process somewhat prior to a manual inspection of the data.
I don't know if this is possible to do within Stata - and perhaps SQL may be a better place to do this.
CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 gp_code str5 LastCCG float(mis_1805 mis_1805n mis_1805_diff)
"A81001" "16C" . . 5
"A81002" "16C" . . -44
"A81004" "16C" . . 129
"A81005" "16C" . . -7
"A81006" "16C" . . 17
"A81007" "16C" . . 10
"A81009" "16C" . . 1
"A81011" "16C" . . -9
"A81012" "16C" . . 12
"A81013" "16C" . . 18
"A81014" "16C" . . 5
"A81015" "16C" 1 2919 .
"A81016" "16C" . . -8
"A81017" "16C" . . -100
"A81018" "16C" . . 1
"A81019" "16C" . . -39
"A81020" "16C" . . 22
"A81021" "16C" . . 33
"A81022" "16C" . . 6
"A81023" "16C" . . -10
"A81025" "16C" . . -2
"A81026" "16C" . . 400
"A81027" "16C" . . 33
"A81029" "16C" . . 113
"A81030" "16C" . . 58
"A81031" "16C" . . 9
"A81032" "16C" . . 41
"A81034" "16C" . . 26
"A81035" "16C" . . 66
"A81036" "16C" . . -16
"A81037" "16C" . . 35
"A81038" "16C" . . 53
"A81039" "16C" . . 40
"A81040" "16C" . . -3
"A81041" "16C" . . 16
"A81042" "16C" . . 2956
"A81043" "16C" . . 18
"A81044" "16C" . . -19
"A81045" "16C" . . -15
"A81046" "16C" . . -14
"A81047" "16C" 1 259 .
"A81048" "16C" . . 24
"A81049" "16C" . . 390
"A81051" "16C" . . -59
"A81052" "16C" . . 38
"A81053" "16C" . . 3
"A81054" "16C" . . 4
"A81056" "16C" . . -12
"A81057" "16C" . . 35
"A81058" "16C" . . 47
"A81060" "16C" . . -10
"A81063" "16C" . . 0
"A81064" "16C" . . 86
"A81065" "16C" . . 0
"A81066" "16C" . . 7
"A81067" "16C" . . 8
"A81070" "16C" . . -21
"A81602" "16C" . . -7
"A81608" "16C" . . 84
"A81610" "16C" . . -3
"A81611" "16C" . . 28
"A81612" "16C" . . 21
"A81618" "16C" . . -6
"A81621" "16C" . . 9
"A81622" "16C" . . 13
"A81629" "16C" . . 14
"A81630" "16C" . . 13
"A81631" "16C" . . -8
"A81632" "16C" . . -8
"A81633" "16C" . . 8
"A81634" "16C" . . -28
"A82003" "01K" . . -15
"A82004" "01H" . . 4
"A82005" "01K" . . -23
"A82006" "01H" . . 1
"A82007" "01K" . . 11
"A82008" "01K" . . -1
"A82009" "01K" . . 21
"A82010" "01K" . . 0
"A82012" "01H" . . 14
"A82013" "01H" . . 5
"A82014" "01H" . . -4
"A82015" "01H" . . 1
"A82016" "01H" . . 41
"A82018" "01H" . . 8
"A82019" "01H" . . -6
"A82020" "01H" . . 29
"A82021" "01H" . . 20
"A82022" "01H" . . 4
"A82023" "01H" . . 24
"A82024" "01H" . . -2
"A82025" "01K" . . 18
"A82026" "01K" . . 0
"A82027" "01K" . . -20
"A82028" "01H" . . 6
"A82029" "01H" . . -5
"A82030" "01K" . . 2
"A82031" "01H" . . 1
"A82032" "01H" . . 14
"A82033" "01K" . . -10
end
[/CODE]
In my dataset I have a number of small geographical areas (beginning with A in the sample data below) that fit within a larger geographical areas (designated by code 16C or 01K or 01H in the sample data below). There are around 2000 of these smaller geographical areas and around 100 of these larger geographical areas that they fit in to.
The smaller geographies can merge with other smaller geographies and cease to exist , so in the below sample, area A81015 ceased to exist at the end of one year, this designated by a dummy variable, taking a value of 1 where this has occurred ( variable: mis_1805) . This geographical area had a population of 2919 people. You can find which geographical area it merged to, by checking the last column which shows you the change in populations for these small geographical areas from one year to the next. You can see that area A81042 grew by 2956 people, and importantly, it belongs so the same wider geographical area as the one that ceased to exist (16C).
I am looking for a way to automate this process , so that if, for a given larger geographical area (16C, OR 01K OR 01H ) a smaller geographical area ceases to exist (identified by a dummy variable taking the value 1) and another area grows by similar amount, then a variable is created giving both observations, the code of the geographical area that one has merged in to. So in the below, an additional column would be created with code A81042 for both observations ( A81015 & A81042 ).
There is likely to be changes in population for the smaller geographical areas each year, so the amount by which one area grows does not need to be the exactly the same as the population for the area that ceased to exist. I have chosen a threshold of 90%, so in this example 2919 (old area)/2956 (other area) = 98%
The intention is not for this to provide a definitive answer, but automate the process somewhat prior to a manual inspection of the data.
I don't know if this is possible to do within Stata - and perhaps SQL may be a better place to do this.
CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 gp_code str5 LastCCG float(mis_1805 mis_1805n mis_1805_diff)
"A81001" "16C" . . 5
"A81002" "16C" . . -44
"A81004" "16C" . . 129
"A81005" "16C" . . -7
"A81006" "16C" . . 17
"A81007" "16C" . . 10
"A81009" "16C" . . 1
"A81011" "16C" . . -9
"A81012" "16C" . . 12
"A81013" "16C" . . 18
"A81014" "16C" . . 5
"A81015" "16C" 1 2919 .
"A81016" "16C" . . -8
"A81017" "16C" . . -100
"A81018" "16C" . . 1
"A81019" "16C" . . -39
"A81020" "16C" . . 22
"A81021" "16C" . . 33
"A81022" "16C" . . 6
"A81023" "16C" . . -10
"A81025" "16C" . . -2
"A81026" "16C" . . 400
"A81027" "16C" . . 33
"A81029" "16C" . . 113
"A81030" "16C" . . 58
"A81031" "16C" . . 9
"A81032" "16C" . . 41
"A81034" "16C" . . 26
"A81035" "16C" . . 66
"A81036" "16C" . . -16
"A81037" "16C" . . 35
"A81038" "16C" . . 53
"A81039" "16C" . . 40
"A81040" "16C" . . -3
"A81041" "16C" . . 16
"A81042" "16C" . . 2956
"A81043" "16C" . . 18
"A81044" "16C" . . -19
"A81045" "16C" . . -15
"A81046" "16C" . . -14
"A81047" "16C" 1 259 .
"A81048" "16C" . . 24
"A81049" "16C" . . 390
"A81051" "16C" . . -59
"A81052" "16C" . . 38
"A81053" "16C" . . 3
"A81054" "16C" . . 4
"A81056" "16C" . . -12
"A81057" "16C" . . 35
"A81058" "16C" . . 47
"A81060" "16C" . . -10
"A81063" "16C" . . 0
"A81064" "16C" . . 86
"A81065" "16C" . . 0
"A81066" "16C" . . 7
"A81067" "16C" . . 8
"A81070" "16C" . . -21
"A81602" "16C" . . -7
"A81608" "16C" . . 84
"A81610" "16C" . . -3
"A81611" "16C" . . 28
"A81612" "16C" . . 21
"A81618" "16C" . . -6
"A81621" "16C" . . 9
"A81622" "16C" . . 13
"A81629" "16C" . . 14
"A81630" "16C" . . 13
"A81631" "16C" . . -8
"A81632" "16C" . . -8
"A81633" "16C" . . 8
"A81634" "16C" . . -28
"A82003" "01K" . . -15
"A82004" "01H" . . 4
"A82005" "01K" . . -23
"A82006" "01H" . . 1
"A82007" "01K" . . 11
"A82008" "01K" . . -1
"A82009" "01K" . . 21
"A82010" "01K" . . 0
"A82012" "01H" . . 14
"A82013" "01H" . . 5
"A82014" "01H" . . -4
"A82015" "01H" . . 1
"A82016" "01H" . . 41
"A82018" "01H" . . 8
"A82019" "01H" . . -6
"A82020" "01H" . . 29
"A82021" "01H" . . 20
"A82022" "01H" . . 4
"A82023" "01H" . . 24
"A82024" "01H" . . -2
"A82025" "01K" . . 18
"A82026" "01K" . . 0
"A82027" "01K" . . -20
"A82028" "01H" . . 6
"A82029" "01H" . . -5
"A82030" "01K" . . 2
"A82031" "01H" . . 1
"A82032" "01H" . . 14
"A82033" "01K" . . -10
end
[/CODE]
Comment