
  • matching single county with unique congressional district

    According to US House election data, a single county in a given state can belong to multiple congressional districts. However, based on geographical area, I can match each county to a single district in that state.

    In the following data, nhgisnam is the county name, district is the congressional district, cnty_area is the total geographical area of the county, and cnty_part_area is the portion of that area lying in the particular congressional district. For example, in the first line, cnty_part_area for Autauga County is 1564828723. That means 1564828723 of Autauga County's total area (cnty_area) of 1565529773 lies in district 2. Autauga County belongs to districts 2, 6, and 7 of state 1, but the largest portion of its area belongs to district 2, which I can determine from the cnty_part_area variable.

    Can anyone kindly guide me on how to code this so that, for each county in a particular state, I keep only the observation that assigns the county to a single district, chosen by the highest value of cnty_part_area for that county?

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str36 nhgisnam str14 statenam byte(cd_statefip district) float county double(cnty_area cnty_part_area)
    "Autauga"       "Alabama" 1 2 1001 1565529773  1564828723
    "Autauga"       "Alabama" 1 6 1001 1565529773 272866.6476
    "Autauga"       "Alabama" 1 7 1001 1565529773 428181.5623
    "Baldwin"       "Alabama" 1 1 1003 4232265763  4228366412
    "Baldwin"       "Alabama" 1 7 1003 4232265763 149732.2219
    "Barbour"       "Alabama" 1 2 1005 2342716428  2341574170
    "Barbour"       "Alabama" 1 3 1005 2342716428 601774.0707
    "Bibb"          "Alabama" 1 6 1007 1621774445  1621293818
    "Bibb"          "Alabama" 1 7 1007 1621774445 480626.7196
    "Alameda"      "California" 6  9 6001  1928023778 347196605.8
    "Alameda"      "California" 6 10 6001  1928023778 350508415.5
    "Alameda"      "California" 6 11 6001  1928023778 654388620.1
    "Alameda"      "California" 6 13 6001  1928023778 571141686.4
    "Alameda"      "California" 6 15 6001  1928023778  77415.0506
    "Alpine"       "California" 6  3 6003  1924880954  1923718768
    "Alpine"       "California" 6  4 6003  1924880954 194564.3245
    "Alpine"       "California" 6 19 6003  1924880954 163767.2729
    "Alpine"       "California" 6 25 6003  1924880954 326582.7162
    "Amador"       "California" 6  3 6005  1566159667  1565521028
    "Amador"       "California" 6  4 6005  1566159667 637209.1726
    "Amador"       "California" 6 11 6005  1566159667 1425.507371
    "Glenn"        "California" 6  1 6021  3437311730 146543.9293
    "Glenn"        "California" 6  2 6021  3437311730  3437165187
    "Humboldt"     "California" 6  1 6023  9286259251  9282075925
    "Humboldt"     "California" 6  2 6023  9286259251 589746.8231
    "Los Angeles"  "California" 6 22 6037 10591484333  1402787367
    "Los Angeles"  "California" 6 24 6037 10591484333 350751.8175
    "Los Angeles"  "California" 6 25 6037 10591484333  4108619373
    "Los Angeles"  "California" 6 26 6037 10591484333  1272584266
    "Los Angeles"  "California" 6 27 6037 10591484333 392578311.2
    "Los Angeles"  "California" 6 28 6037 10591484333 201732469.2
    "Los Angeles"  "California" 6 29 6037 10591484333 262922748.4
    "Los Angeles"  "California" 6 30 6037 10591484333 742182418.1
    end

  • #2
    Is the following code going to work to get what I'm looking for?

    Code:
    * Group the data by county and state
    groupby cd_statefip district county
    
    
    * Find the maximum cnty_part_area for each county and state
    egen max_cnty_part_area = max(cnty_part_area)
    
    * Keep only the observations where cnty_part_area is equal to max_cnty_part_area
    keep if cnty_part_area == max_cnty_part_area
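
    A note on the snippet above: as far as I know, groupby is not a Stata command, and egen's max() without a by prefix takes the maximum over the whole dataset rather than within each county. A minimal sketch of the by-county version, assuming the variable names from the -dataex- example in #1:

    ```stata
    * Sketch only: variable names are taken from the -dataex- example in #1.
    * Maximum part-area within each state-county group (not across the whole dataset).
    * -double- is specified because egen would otherwise store the result as float,
    * and the equality test against the double cnty_part_area could then fail.
    bysort cd_statefip county: egen double max_cnty_part_area = max(cnty_part_area)

    * Keep the district holding the largest part of each county
    keep if cnty_part_area == max_cnty_part_area

    * Check that each state-county pair now appears once; errors out if ties remain
    isid cd_statefip county
    ```

    If two districts hold exactly the same area of a county, both observations survive the keep and isid will report the problem, so ties would need a separate rule.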



    • #3
      In theory this should do it.
      Code:
      bysort cd_statefip district (cnty_part_area): generate tokeep = _n==_N
      format %16.0fc cnty_part_area
      list cd_statefip district statenam nhgisnam cnty_part_area tokeep, sepby(cd_statefip district) noobs
      But looking at the output tells us you have data problems, because as a former resident of Los Angeles I can tell you that Alpine County is nowhere near Los Angeles County, and they would not have a congressional district split between them.
      Code:
        +-------------------------------------------------------------------------+
        | cd_sta~p   district     statenam      nhgisnam   cnty_part_a~a   tokeep |
        |-------------------------------------------------------------------------|
        |        1          1      Alabama       Baldwin   4,228,366,412        1 |
        |-------------------------------------------------------------------------|
        |        1          2      Alabama       Autauga   1,564,828,723        0 |
        |        1          2      Alabama       Barbour   2,341,574,170        1 |
        |-------------------------------------------------------------------------|
        |        1          3      Alabama       Barbour         601,774        1 |
        |-------------------------------------------------------------------------|
        |        1          6      Alabama       Autauga         272,867        0 |
        |        1          6      Alabama          Bibb   1,621,293,818        1 |
        |-------------------------------------------------------------------------|
        |        1          7      Alabama       Baldwin         149,732        0 |
        |        1          7      Alabama       Autauga         428,182        0 |
        |        1          7      Alabama          Bibb         480,627        1 |
        |-------------------------------------------------------------------------|
        |        6          1   California         Glenn         146,544        0 |
        |        6          1   California      Humboldt   9,282,075,925        1 |
        |-------------------------------------------------------------------------|
        |        6          2   California      Humboldt         589,747        0 |
        |        6          2   California         Glenn   3,437,165,187        1 |
        |-------------------------------------------------------------------------|
        |        6          3   California        Amador   1,565,521,028        0 |
        |        6          3   California        Alpine   1,923,718,768        1 |
        |-------------------------------------------------------------------------|
        |        6          4   California        Alpine         194,564        0 |
        |        6          4   California        Amador         637,209        1 |
        |-------------------------------------------------------------------------|
        |        6          9   California       Alameda     347,196,606        1 |
        |-------------------------------------------------------------------------|
        |        6         10   California       Alameda     350,508,416        1 |
        |-------------------------------------------------------------------------|
        |        6         11   California        Amador           1,426        0 |
        |        6         11   California       Alameda     654,388,620        1 |
        |-------------------------------------------------------------------------|
        |        6         13   California       Alameda     571,141,686        1 |
        |-------------------------------------------------------------------------|
        |        6         15   California       Alameda          77,415        1 |
        |-------------------------------------------------------------------------|
        |        6         19   California        Alpine         163,767        1 |
        |-------------------------------------------------------------------------|
        |        6         22   California   Los Angeles   1,402,787,367        1 |
        |-------------------------------------------------------------------------|
        |        6         24   California   Los Angeles         350,752        1 |
        |-------------------------------------------------------------------------|
        |        6         25   California        Alpine         326,583        0 |
        |        6         25   California   Los Angeles   4,108,619,373        1 |
        |-------------------------------------------------------------------------|
        |        6         26   California   Los Angeles   1,272,584,266        1 |
        |-------------------------------------------------------------------------|
        |        6         27   California   Los Angeles     392,578,311        1 |
        |-------------------------------------------------------------------------|
        |        6         28   California   Los Angeles     201,732,469        1 |
        |-------------------------------------------------------------------------|
        |        6         29   California   Los Angeles     262,922,748        1 |
        |-------------------------------------------------------------------------|
        |        6         30   California   Los Angeles     742,182,418        1 |
        +-------------------------------------------------------------------------+



      • #4
        Mr. Lisowski,

        This worked perfectly.

        When I initially started working on this conversion, I felt I would not get very far with it because of data constraints and unavailability. But with the help of you and the other honorable mentors on Statalist, it now feels just a few more steps away from completing what seemed an insurmountable task! I am truly obliged to this wonderful community!



        • #5
          Unfortunately, the problem is still there. The data look like the following. A single county still belongs to multiple districts within the same state. Given that, for county 1001, cnty_part_area is highest for district 2 within state 1, the other two districts (district 7 and district 6, in the first and third lines) should no longer be in my data after running the above code. But they are still there. I may have done something wrong, but I'm not sure what.

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input double cnty_area byte(statefip district) float county double(cd_area cnty_part_area) float tokeep
          1565529773 1 7 1001 22740093504 428181.5623 0
          1565529773 1 2 1001 27471139257  1564828723 0
          1565529773 1 6 1001 12040322876 272866.6476 0
          4232265763 1 1 1003 16566047005  4228366412 1
          4232265763 1 7 1003 22740093504 149732.2219 0
          2342716428 1 2 1005 27471139257  2341574170 0
          2342716428 1 3 1005 20681592749 601774.0707 0
          1621774445 1 6 1007 12040322876  1621293818 0
          1621774445 1 7 1007 22740093504 480626.7196 0
          1685071136 1 6 1009 12040322876 154701.6124 0
          1685071136 1 4 1009 22077169282  1684916430 0
          1621500980 1 2 1011 27471139257  1621063663 0
          1621500980 1 3 1011 20681592749 437313.0598 0
          2014835522 1 1 1013 16566047005 48220.37588 0
          2014835522 1 2 1013 27471139257  2014715107 0
          2014835522 1 7 1013 22740093504 72190.20342 0
          1585923967 1 4 1015 22077169282 152356.9656 0
          1585923967 1 3 1015 20681592749  1585666312 0
          1928023778 6 11 6001  5993538058 654388620.1 0
           1928023778 6 13 6001 571246593.4 571141686.4 1
           1928023778 6 10 6001  2676574696 350508415.5 0
           1928023778 6 15 6001 744490529.1  77415.0506 0
           1928023778 6  9 6001 347345064.7 347196605.8 1
           1924880954 6  4 6003 44439160065 194564.3245 0
           1924880954 6  3 6003  8860803999  1923718768 0
           1924880954 6 25 6003 56000534108 326582.7162 0
           1924880954 6 19 6003 17561371799 163767.2729 0
           1566159667 6  3 6005  8860803999  1565521028 0
           1566159667 6  4 6005 44439160065 637209.1726 0
           1566159667 6 11 6005  5993538058 1425.507371 0
           4343701551 6  2 6007 56920751720  3351472693 0
           4343701551 6  4 6007 44439160065 992228851.4 0
           2685417635 6 11 6009  5993538058 3339.832601 0
           2685417635 6  3 6009  8860803999  2684755304 1
           2685417635 6 19 6009 17561371799  658991.995 0
          end



          • #6
            Removed, more to come.



            • #7
              I misunderstood.

              From your original topic posted a few days ago "house election district to county conversion" I understood your objective to be to match each district to a single county. That was probably a misunderstanding, and in any event I should have read the title of this topic more carefully.

              So my answer was based on thinking you want to match each district to the county in which the largest portion of the district lies. That is what makes sense for counties like Los Angeles, which span multiple districts. To do the reverse and assign Los Angeles to a single district does not make sense to me.

              I see now that in less populated areas, the reverse is appropriate. In Wyoming, with a single district, it makes sense to assign each county to the one district. And it would not make sense to assign the district to a single county.

              But I see no way of handling both the county of Los Angeles and the state of Wyoming consistently with a single rule. I return to my advice from your original topic:

              "I would not tackle this unless it were possible to obtain Census data by congressional district for each election year, using the district boundaries in effect for that year's elections."

              and bow out of this discussion, but first reminding you that whatever your source of data was, anything that puts part of Alpine County in the same congressional district as part of Los Angeles County is simply wrong.
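
              For the direction stated in the topic title (one district per county), a minimal sketch would group by state and county rather than by state and district as in #3. This assumes the variable names from the -dataex- example in #5 and that a largest-area rule is acceptable. It does not address the data-quality concern raised above, and it still forces counties such as Los Angeles into a single district.

              ```stata
              * Sketch only: variable names follow the -dataex- example in #5.
              * Within each state-county pair, sort districts by part-area, largest last;
              * district is added as a tie-breaker so the sort order is reproducible.
              bysort statefip county (cnty_part_area district): keep if _n == _N

              * Each state-county pair should now identify exactly one observation
              isid statefip county
              ```

              If two districts tie on area within a county, the tie-breaker keeps the higher district number, which is an arbitrary choice.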



              • #8
                Though it didn't work out, a huge thanks for so patiently guiding me through the whole process. Now I have to delete the observations for counties by hand.

                Still, I learnt a lot about US congressional districts from your knowledge and the other mentors' help. I very much appreciate this kind gesture!
