Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging Local Authority-level refugee data with CSP-level crime data in Stata

    Hello,

    I have what I believe is a relatively straightforward merge problem, and I’d be grateful for your advice.

    I have two datasets:
    1. Community Safety Partnership police recorded hate crime data (CSP data) – covering England and Wales
    2. Ukrainian refugees (UK) visa arrivals by local authority (UKS)only England and Wales
    I want to merge these two datasets. Since refugees arrived in April 2022, I am working with UK fiscal years. The Ukrainian refugee arrivals data covers FY 2022/23 and 2023/24, while the CSP crime data is available from 2017/18 to 2023/24. I plan to use this for a Difference-in-Differences analysis.

    Here are the main adjustments I have already made:
    • In the CSP data, I dropped Isles of Scilly because it appears in only one FY.
    • I dropped Manchester Airport because it does not have a CSP code.
    • In the UKS data, I aggregated certain local authorities in the first FY to reflect the 2023 LA mergers:
      Cumberland = Allerdale + Carlisle + Copeland; Westmorland and Furness = Barrow-in-Furness + Eden + South Lakeland; Somerset = Mendip + Sedgemoor + Somerset West and Taunton + South Somerset; North Yorkshire = Craven + Hambleton + Harrogate + Richmondshire + Ryedale + Scarborough + Selby.
    Now, I want to merge the two datasets using the CSP lookup files (attached). However, when aggregating the UKS data from Local Authority to CSP level using the lookup file, multiple CSPs share the same name (but they are for different local authorities, so dropping is not an option) in the CSP dataset, and no unique CSP code is available. This makes direct merging difficult.

    Any guidance on how to best handle this aggregation and merge would be very helpful.

    Kind regards,

    UKS data:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str35 local_authority str9 ons_code double arrivals str7 FinancialYear
    "Adur"                                "E07000223"   77 "2022/23"
    "Amber Valley"                        "E07000032"  181 "2022/23"
    "Arun"                                "E07000224"  246 "2022/23"
    "Ashfield"                            "E07000170"   77 "2022/23"
    "Ashford"                             "E07000105"  216 "2022/23"
    "Babergh"                             "E07000200"  193 "2022/23"
    "Barking and Dagenham"                "E09000002"  321 "2022/23"
    "Barnet"                              "E09000003"  999 "2022/23"
    "Barnsley"                            "E08000016"   84 "2022/23"
    "Basildon"                            "E07000066"  152 "2022/23"
    "Basingstoke and Deane"               "E07000084"  352 "2022/23"
    "Bassetlaw"                           "E07000171"  162 "2022/23"
    "Bath and North East Somerset"        "E06000022"  368 "2022/23"
    "Bedford"                             "E06000055"  421 "2022/23"
    "Bexley"                              "E09000004"  320 "2022/23"
    "Birmingham"                          "E08000025"  805 "2022/23"
    "Blaby"                               "E07000129"  102 "2022/23"
    "Blackburn with Darwen"               "E06000008"   69 "2022/23"
    "Blackpool"                           "E06000009"   96 "2022/23"
    "Blaenau Gwent"                       "W06000019"   31 "2022/23"
    "Bolsover"                            "E07000033"   64 "2022/23"
    "Bolton"                              "E08000001"  151 "2022/23"
    "Boston"                              "E07000136"   89 "2022/23"
    "Bournemouth, Christchurch and Poole" "E06000058"  566 "2022/23"
    "Bracknell Forest"                    "E06000036"  181 "2022/23"
    "Bradford"                            "E08000032"  405 "2022/23"
    "Braintree"                           "E07000067"  229 "2022/23"
    "Breckland"                           "E07000143"  213 "2022/23"
    "Brent"                               "E09000005"  423 "2022/23"
    "Brentwood"                           "E07000068"  131 "2022/23"
    "Bridgend"                            "W06000013"  189 "2022/23"
    "Brighton and Hove"                   "E06000043"  523 "2022/23"
    "Bristol, City of"                    "E06000023"  711 "2022/23"
    "Broadland"                           "E07000144"  175 "2022/23"
    "Bromley"                             "E09000006"  631 "2022/23"
    "Bromsgrove"                          "E07000234"  131 "2022/23"
    "Broxbourne"                          "E07000095"   61 "2022/23"
    "Broxtowe"                            "E07000172"  104 "2022/23"
    "Buckinghamshire"                     "E06000060" 1573 "2022/23"
    "Burnley"                             "E07000117"   62 "2022/23"
    "Bury"                                "E08000002"  168 "2022/23"
    "Caerphilly"                          "W06000018"   88 "2022/23"
    "Calderdale"                          "E08000033"  188 "2022/23"
    "Cambridge"                           "E07000008"  499 "2022/23"
    "Camden"                              "E09000007"  756 "2022/23"
    "Cannock Chase"                       "E07000192"   66 "2022/23"
    "Canterbury"                          "E07000106"  352 "2022/23"
    "Cardiff"                             "W06000015"  330 "2022/23"
    "Carmarthenshire"                     "W06000010"  259 "2022/23"
    "Castle Point"                        "E07000069"   49 "2022/23"
    "Central Bedfordshire"                "E06000056"  415 "2022/23"
    "Ceredigion"                          "W06000008"  105 "2022/23"
    "Charnwood"                           "E07000130"  166 "2022/23"
    "Chelmsford"                          "E07000070"  258 "2022/23"
    "Cheltenham"                          "E07000078"  297 "2022/23"
    "Cherwell"                            "E07000177"  327 "2022/23"
    "Cheshire East"                       "E06000049"  683 "2022/23"
    "Cheshire West and Chester"           "E06000050"  641 "2022/23"
    "Chesterfield"                        "E07000034"   70 "2022/23"
    "Chichester"                          "E07000225"  559 "2022/23"
    "Chorley"                             "E07000118"  119 "2022/23"
    "City of London"                      "E09000001"    . "2022/23"
    "Colchester"                          "E07000071"  296 "2022/23"
    "Conwy"                               "W06000003"  132 "2022/23"
    "Cornwall"                            "E06000052"  947 "2022/23"
    "Cotswold"                            "E07000079"  292 "2022/23"
    "County Durham"                       "E06000047"  534 "2022/23"
    "Coventry"                            "E08000026"  296 "2022/23"
    "Crawley"                             "E07000226"  169 "2022/23"
    "Croydon"                             "E09000008"  580 "2022/23"
    "Cumberland"                          "E06000063"  338 "2022/23"
    "Dacorum"                             "E07000096"  301 "2022/23"
    "Darlington"                          "E06000005"  122 "2022/23"
    "Dartford"                            "E07000107"  123 "2022/23"
    "Denbighshire"                        "W06000004"  110 "2022/23"
    "Derby"                               "E06000015"  173 "2022/23"
    "Derbyshire Dales"                    "E07000035"  364 "2022/23"
    "Doncaster"                           "E08000017"  221 "2022/23"
    "Dorset"                              "E06000059"  833 "2022/23"
    "Dover"                               "E07000108"  259 "2022/23"
    "Dudley"                              "E08000027"   92 "2022/23"
    "Ealing"                              "E09000009"  760 "2022/23"
    "East Cambridgeshire"                 "E07000009"  237 "2022/23"
    "East Devon"                          "E07000040"  332 "2022/23"
    "East Hampshire"                      "E07000085"  385 "2022/23"
    "East Hertfordshire"                  "E07000242"  340 "2022/23"
    "East Lindsey"                        "E07000137"  243 "2022/23"
    "East Riding of Yorkshire"            "E06000011"  437 "2022/23"
    "East Staffordshire"                  "E07000193"  185 "2022/23"
    "East Suffolk"                        "E07000244"  396 "2022/23"
    "Eastbourne"                          "E07000061"  181 "2022/23"
    "Eastleigh"                           "E07000086"  230 "2022/23"
    "Elmbridge"                           "E07000207"  485 "2022/23"
    "Enfield"                             "E09000010"  324 "2022/23"
    "Epping Forest"                       "E07000072"  247 "2022/23"
    "Epsom and Ewell"                     "E07000208"  135 "2022/23"
    "Erewash"                             "E07000036"   88 "2022/23"
    "Exeter"                              "E07000041"  226 "2022/23"
    "Fareham"                             "E07000087"  121 "2022/23"
    "Fenland"                             "E07000010"  117 "2022/23"
    end

    CSP data includes 5 types of hate crimes (as defined by the UK Home Office), aggregated to fiscal year level.:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str19 PoliceForce str35 CSPName str7 FinancialYear double OffenceCount
    "Avon and Somerset"  "Bristol, City of"             "2017/18" 1002
    "Avon and Somerset"  "South Gloucestershire"        "2017/18"  162
    "Avon and Somerset"  "Bath and North East Somerset" "2017/18"  154
    "Avon and Somerset"  "North Somerset"               "2017/18"  118
    "Avon and Somerset"  "Somerset"                     "2017/18"  293
    "Bedfordshire"       "Bedford"                      "2017/18"  147
    "Bedfordshire"       "Luton"                        "2017/18"  372
    "Bedfordshire"       "Central Bedfordshire"         "2017/18"  111
    "Cambridgeshire"     "Cambridge"                    "2017/18"  168
    "Cambridgeshire"     "South Cambridgeshire"         "2017/18"   72
    "Cambridgeshire"     "East Cambridgeshire"          "2017/18"   18
    "Cambridgeshire"     "Peterborough"                 "2017/18"  364
    "Cambridgeshire"     "Huntingdonshire"              "2017/18"   88
    "Cambridgeshire"     "Fenland"                      "2017/18"   41
    "Cheshire"           "Warrington"                   "2017/18"  171
    "Cheshire"           "Cheshire East"                "2017/18"  203
    "Cheshire"           "Cheshire West and Chester"    "2017/18"  243
    "Cheshire"           "Halton"                       "2017/18"  106
    "Cleveland"          "Hartlepool"                   "2017/18"  113
    "Cleveland"          "Middlesbrough"                "2017/18"  280
    "Cleveland"          "Stockton-on-Tees"             "2017/18"  183
    "Cleveland"          "Redcar and Cleveland"         "2017/18"   70
    "Cumbria"            "South Lakeland"               "2017/18"   34
    "Cumbria"            "Eden"                         "2017/18"   11
    "Cumbria"            "Barrow-in-Furness"            "2017/18"   33
    "Cumbria"            "Copeland"                     "2017/18"   15
    "Cumbria"            "Carlisle"                     "2017/18"   78
    "Cumbria"            "Allerdale"                    "2017/18"   25
    "Derbyshire"         "North East Derbyshire"        "2017/18"    7
    "Derbyshire"         "Chesterfield"                 "2017/18"   32
    "Derbyshire"         "Amber Valley"                 "2017/18"   26
    "Derbyshire"         "South Derbyshire"             "2017/18"   28
    "Derbyshire"         "Bolsover"                     "2017/18"   22
    "Derbyshire"         "Derbyshire Dales"             "2017/18"   14
    "Derbyshire"         "High Peak"                    "2017/18"   23
    "Derbyshire"         "Derby"                        "2017/18"  292
    "Derbyshire"         "Erewash"                      "2017/18"   36
    "Devon and Cornwall" "Cornwall"                     "2017/18"  132
    "Devon and Cornwall" "South Devon and Dartmoor"     "2017/18"   59
    "Devon and Cornwall" "Torbay"                       "2017/18"  119
    "Devon and Cornwall" "Plymouth"                     "2017/18"  246
    "Devon and Cornwall" "East and Mid Devon"           "2017/18"   33
    "Devon and Cornwall" "North Devon"                  "2017/18"   54
    "Devon and Cornwall" "Exeter"                       "2017/18"  108
    "Dorset"             "Bournemouth"                  "2017/18"  161
    "Dorset"             "Poole"                        "2017/18"   53
    "Dorset"             "Dorset"                       "2017/18"   76
    "Durham"             "County Durham"                "2017/18"  279
    "Durham"             "Darlington"                   "2017/18"  115
    "Dyfed-Powys"        "Pembrokeshire"                "2017/18"   24
    "Dyfed-Powys"        "Ceredigion"                   "2017/18"   15
    "Dyfed-Powys"        "Powys"                        "2017/18"   27
    "Dyfed-Powys"        "Carmarthenshire"              "2017/18"   70
    "Essex"              "Braintree"                    "2017/18"   45
    "Essex"              "Castle Point"                 "2017/18"   24
    "Essex"              "Basildon"                     "2017/18"  134
    "Essex"              "Colchester"                   "2017/18"  125
    "Essex"              "Southend-on-Sea"              "2017/18"  184
    "Essex"              "Chelmsford"                   "2017/18"  101
    "Essex"              "Harlow"                       "2017/18"   91
    "Essex"              "Thurrock"                     "2017/18"  190
    "Essex"              "Brentwood"                    "2017/18"   37
    "Essex"              "Uttlesford"                   "2017/18"   38
    "Essex"              "Epping Forest"                "2017/18"   96
    "Essex"              "Maldon"                       "2017/18"   14
    "Essex"              "Rochford"                     "2017/18"   20
    "Essex"              "Tendring"                     "2017/18"   86
    "Gloucestershire"    "Forest of Dean"               "2017/18"   14
    "Gloucestershire"    "Stroud"                       "2017/18"   14
    "Gloucestershire"    "Gloucester"                   "2017/18"  114
    "Gloucestershire"    "Cheltenham"                   "2017/18"   82
    "Gloucestershire"    "Cotswold"                     "2017/18"    7
    "Gloucestershire"    "Tewkesbury"                   "2017/18"   11
    "Greater Manchester" "Bury"                         "2017/18"  315
    "Greater Manchester" "Oldham"                       "2017/18"  479
    "Greater Manchester" "Bolton"                       "2017/18"  481
    "Greater Manchester" "Tameside"                     "2017/18"  272
    "Greater Manchester" "Trafford"                     "2017/18"  239
    "Greater Manchester" "Stockport"                    "2017/18"  239
    "Greater Manchester" "Manchester"                   "2017/18" 1989
    "Greater Manchester" "Rochdale"                     "2017/18"  472
    "Greater Manchester" "Wigan"                        "2017/18"  214
    "Greater Manchester" "Salford"                      "2017/18"  498
    "Gwent"              "Blaenau Gwent"                "2017/18"   39
    "Gwent"              "Torfaen"                      "2017/18"   25
    "Gwent"              "Newport"                      "2017/18"  175
    "Gwent"              "Monmouthshire"                "2017/18"   17
    "Gwent"              "Caerphilly"                   "2017/18"  104
    "Hampshire"          "Winchester"                   "2017/18"   57
    "Hampshire"          "North Hampshire"              "2017/18"  262
    "Hampshire"          "Gosport"                      "2017/18"   48
    "Hampshire"          "Eastleigh"                    "2017/18"   56
    "Hampshire"          "East Hampshire"               "2017/18"   45
    "Hampshire"          "Southampton"                  "2017/18"  458
    "Hampshire"          "Isle of Wight"                "2017/18"   82
    "Hampshire"          "Havant"                       "2017/18"   68
    "Hampshire"          "Fareham"                      "2017/18"   53
    "Hampshire"          "New Forest"                   "2017/18"   56
    "Hampshire"          "Portsmouth"                   "2017/18"  380
    "Hampshire"          "Test Valley"                  "2017/18"   56
    end
    Attached Files

  • #2
    I have what I believe is a relatively straightforward merge problem...
    I think not.

    Caveat: I know next to nothing about UK geography, and it may be that some of the names of the local authorities or CSPNames are synonyms, and perhaps somebody familiar with the nomenclature could come up with a -merge- based solution relying on that domain knowledge.

    But on their faces, these data sets cannot be -merge-d at all. Examining the lad_to_csp.dta, it is quickly apparent that there is no 1:many relationship between local authorities and CSPNames in either direction. That is, there are local authorities that are associated with multiple CSPNames and there are CSPNames that are associated with multiple local authorities. So -merge- is out of the question. (Again, perhaps knowledge of the actual geography might eliminate this problem, but I cannot fill that niche.)

    The data sets can be combined using -joinby-, but this is not a suitable solution to O.P.'s problem because then the offense counts and arrivals in the districts and CSPNames that have multiple associations will be multiply counted!

    When I have faced situations like this they have typically involved administrative US districts such as zip (postal) codes and counties, which may exhibit many:many associations. My approach has been to use census data to aggregate up to the larger areas (counties), and for the zip codes that overlap multiple counties, allocate the events (i.e. variables playing the role that arrivals and offences play in O.P.'s problem) in proportion to the overlapping counties' populations. Of course, this allocation in proportion to population is a fairly strong assumption. On its face it is unlikely to be true, but perhaps not terribly far off. In any case, it would only be usable if the areas that participate in many:many relationships are a small subset of the full data, so the error in this proportionality assumption inflicts limited damage overall. Also, evidently, this solution is only possible when population data at the local authority or CSPName level is available.

    Another possible approach is if there is some geographic unit that is many:1 associated to both the local authority and the CSPName. For example, in the US, all zip codes are contained in a single state*, as are all counties. So both the zip-code level data and the county-level data could be aggregated up to the state level and then the state-level data sets merged 1:1. The drawback to this is that this kind of aggregated data may be too coarse-grained for the purposes at hand, and the resulting N may be too small. Moreover, the relationships in aggregated data of this nature may differ from those at the finer-grained level (analogous to ecologic analysis vs individual analysis, the so-called ecologic "fallacy"). I have no idea whether this approach is viable for O.P.'s situation.

    *Added: This is not strictly true. There is are 13 zip codes that overlap more than one state. In the situations I have worked on, none of these multi-state zip-codes have had any data to contribute to my analysis, so I have not had to explicitly exclude them. But excluding 13 out of over 40,000 zipcodes from an analysis does not seem overwhelmingly problematic.
    Last edited by Clyde Schechter; 13 Aug 2025, 09:01.

    Comment


    • #3
      Another possible approach is if there is some geographic unit that is many:1 associated to both the local authority and the CSPName. For example, in the US, all zip codes are contained in a single state*, as are all counties. So both the zip-code level data and the county-level data could be aggregated up to the state level and then the state-level data sets merged 1:1.
      There is another approach, which I have used before (though in the context of Indian districts, not English administrative units) and might be an adequate compromise: construct minimal supersets of CSPNames such that any component local_authority maps only into that superset and not outside; aggregate both datasets to those super-CSPName regions, and then do the merge. For instance, I note that the local_authority "Bournemouth, Christchurch and Poole" (E06000058) maps to three CSPNames -- Poole (E22000067), Dorset (E22000367), and Bournemouth (E22000063). So we aggregate these three into one region (this corresponds to 4 rows in the dataset you attached). In most cases, no aggregation will be needed, since the many:1 mappings are relatively few.

      The basic principle here is the same as the one in Clyde's quote. What we lose in a lack of natural interpretation of units (in his case, "states" as natural agglomerates of "ZIP codes"), we gain in far less coarseness of the unit of analysis and a far larger N.

      Comment

      Working...
      X