
  • Code for connecting neighborhoods

    Good afternoon,

    I have a dataset that includes four variables: tract1, tract2, change1 and change2.

    Variable definitions:
    tract1 is the base census tract.
    tract2 is a census tract that tract1 touches (one row per adjacent pair, so each tract1 appears once for every neighbor).
    change1 indicates whether the binary change has occurred in tract1.
    change2 indicates whether the binary change has occurred in tract2.


    The goal is to identify geographic chains (clusters) of adjacent census tracts that have undergone the change.

    To illustrate, here is a small chart in which each cell represents a census tract (1 = changed, 0 = not changed):
      A B C D E F
    1 0 0 0 0 0 1
    2 0 0 0 0 0 0
    3 0 0 0 0 1 1
    4 1 0 0 1 1 0
    5 1 0 0 1 1 0


    In that chart, F1 forms a cluster of size 1 (a single changed tract), A4 and A5 form a cluster of size 2, and D4, D5, E3, E4, E5 and F3 form a cluster of size 6.


    Below is a subset of the dataset.

    For example, in the dataset tract1 G0100210060700 touches G0100010020900, G0100010021000, G0100210060101 and G0100210060200.

    Tract G0100210060700 did not undergo the change, and of its four neighboring tracts only G0100210060101 did.
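A minimal Stata sketch of reading that example straight off the edge list. The four rows are the ones for tract1 == "G0100210060700", copied from the -dataex- listing below; with the full data loaded, only the last four commands are needed. The local macro name n_changed is hypothetical.

```stata
* Neighbors of one tract, and how many of them changed.
* Rows copied from the -dataex- listing (tract1 == "G0100210060700").
clear
input str14(tract1 tract2) float(change1 change2)
"G0100210060700" "G0100010020900" 0 0
"G0100210060700" "G0100010021000" 0 0
"G0100210060700" "G0100210060101" 0 1
"G0100210060700" "G0100210060200" 0 0
end

* each row is one adjacent pair, so the neighbors are simply the tract2 values
list tract2 change2 if tract1 == "G0100210060700"
* number of neighbors that underwent the change
count if tract1 == "G0100210060700" & change2 == 1
local n_changed = r(N)
display "changed neighbors: `n_changed'"
```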




    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str14(tract1 tract2) float(change1 change2)
    "G0100210060700" "G0100010020900" 0 0
    "G0100210060600" "G0100010021000" 0 0
    "G0100210060700" "G0100010021000" 0 0
    "G0100210060500" "G0100010021000" 0 0
    "G0100070010003" "G0100070010001" 0 0
    "G0100070010004" "G0100070010001" 1 0
    "G0100070010002" "G0100070010001" 0 0
    "G0100730014203" "G0100070010002" 1 0
    "G0100070010003" "G0100070010002" 0 0
    "G0101170030334" "G0100070010002" 1 0
    "G0100070010001" "G0100070010002" 0 0
    "G0100070010001" "G0100070010003" 0 0
    "G0100210060500" "G0100070010003" 0 0
    "G0101170030405" "G0100070010003" 1 0
    "G0100070010004" "G0100070010003" 1 0
    "G0101170030334" "G0100070010003" 1 0
    "G0101170030408" "G0100070010003" 1 0
    "G0100210060402" "G0100070010003" 1 0
    "G0100070010002" "G0100070010003" 0 0
    "G0100070010003" "G0100070010004" 0 1
    "G0100070010001" "G0100070010004" 0 1
    "G0100090050102" "G0100090050101" 1 1
    "G0100090050700" "G0100090050101" 0 1
    "G0101150040402" "G0100090050101" 0 1
    "G0101150040501" "G0100090050101" 0 1
    "G0100090050700" "G0100090050102" 0 1
    "G0101150040402" "G0100090050102" 0 1
    "G0100090050101" "G0100090050102" 1 1
    "G0100090050300" "G0100090050102" 1 1
    "G0100090050200" "G0100090050102" 1 1
    "G0100090050500" "G0100090050200" 0 1
    "G0100090050300" "G0100090050200" 1 1
    "G0100090050102" "G0100090050200" 1 1
    "G0100090050400" "G0100090050200" 0 1
    "G0100090050700" "G0100090050200" 0 1
    "G0100090050500" "G0100090050300" 0 1
    "G0100090050200" "G0100090050300" 1 1
    "G0100090050400" "G0100090050300" 0 1
    "G0100090050102" "G0100090050300" 1 1
    "G0100090050200" "G0100090050400" 1 0
    "G0100090050500" "G0100090050400" 0 0
    "G0100090050300" "G0100090050400" 1 0
    "G0100090050400" "G0100090050500" 0 0
    "G0100090050700" "G0100090050500" 0 0
    "G0100090050602" "G0100090050500" 0 0
    "G0100090050300" "G0100090050500" 1 0
    "G0100090050200" "G0100090050500" 1 0
    "G0100730011400" "G0100090050601" 0 0
    "G0100090050602" "G0100090050601" 0 0
    "G0101270021900" "G0100090050601" 0 0
    "G0100090050500" "G0100090050602" 0 0
    "G0100730011400" "G0100090050602" 0 0
    "G0100730011302" "G0100090050602" 0 0
    "G0100090050601" "G0100090050602" 0 0
    "G0100730011301" "G0100090050602" 1 0
    "G0100090050700" "G0100090050602" 0 0
    "G0100730011206" "G0100090050700" 0 0
    "G0100090050500" "G0100090050700" 0 0
    "G0100090050602" "G0100090050700" 0 0
    "G0100730011110" "G0100090050700" 0 0
    "G0100090050101" "G0100090050700" 1 0
    "G0100090050200" "G0100090050700" 1 0
    "G0100090050102" "G0100090050700" 1 0
    "G0101150040501" "G0100090050700" 0 0
    "G0100730011301" "G0100090050700" 1 0
    "G0101150040300" "G0100150002600" 0 0
    "G0101150040401" "G0100150002600" 0 0
    "G0100210060102" "G0100210060101" 1 1
    "G0100210060600" "G0100210060101" 0 1
    "G0100210060200" "G0100210060101" 0 1
    "G0100210060700" "G0100210060101" 0 1
    "G0100210060101" "G0100210060102" 1 1
    "G0100210060200" "G0100210060102" 0 1
    "G0100210060600" "G0100210060102" 0 1
    "G0100210060401" "G0100210060102" 0 1
    "G0100210060300" "G0100210060102" 0 1
    "G0100210060401" "G0100210060200" 0 0
    "G0100210060700" "G0100210060200" 0 0
    "G0100210060101" "G0100210060200" 1 0
    "G0100210060300" "G0100210060200" 0 0
    "G0100210060102" "G0100210060200" 1 0
    "G0100210060102" "G0100210060300" 1 0
    "G0100210060401" "G0100210060300" 0 0
    "G0101170030704" "G0100210060300" 0 0
    "G0100210060402" "G0100210060300" 1 0
    "G0101170030502" "G0100210060300" 0 0
    "G0100210060200" "G0100210060300" 0 0
    "G0100210060102" "G0100210060401" 1 0
    "G0100210060200" "G0100210060401" 0 0
    "G0100210060402" "G0100210060401" 1 0
    "G0100210060600" "G0100210060401" 0 0
    "G0100210060500" "G0100210060401" 0 0
    "G0100210060300" "G0100210060401" 0 0
    "G0100070010003" "G0100210060402" 0 1
    "G0100210060300" "G0100210060402" 0 1
    "G0101170030502" "G0100210060402" 0 1
    "G0101170030501" "G0100210060402" 0 1
    "G0100210060500" "G0100210060402" 0 1
    "G0100210060401" "G0100210060402" 0 1
    "G0101170030405" "G0100210060402" 1 1
    end

    Best,
    Damon


  • #2
    Hi Damon, I'm not quite sure what you are asking...



    • #3
      There was no question asked here. I don't get the problem.



      • #4
        I am looking at a dataset of how census tracts change over time. While I know whether a single census tract transitions (the binary variable), I am trying to identify how large a "neighborhood" transitions. For example, 7 neighboring census tracts undergoing a transition is fundamentally different from a single isolated census tract undergoing one.

        From the original file, I know that G0100090050102 underwent a transition, as did at least 3 of its neighboring census tracts:
        "G0100090050102" "G0100090050101" 1 1
        "G0100090050102" "G0100090050200" 1 1
        "G0100090050102" "G0100090050300" 1 1
        I can write code that counts, for each census tract, the direct neighbors that transitioned. Something like the following would likely suffice:

        Code:
        bysort tract1: egen transition = count(change2) if change2 == 1
        replace transition = 0 if change1 == 0
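A minimal variant of the direct-neighbor count, assuming one adjacent pair per row as in the -dataex- listing. With egen's count() plus "if change2 == 1", rows where change2 == 0 are left missing; summing the 0/1 indicator with egen's total() instead fills in the same count on every row of the tract. The rows are copied from the listing above and the variable name n_nbr is hypothetical. Like the code it replaces, it counts only direct neighbors and does not include the tract itself.

```stata
* Direct changed neighbors per tract, filled in on every row.
* Rows copied from the -dataex- listing; n_nbr is a hypothetical name.
clear
input str14(tract1 tract2) float(change1 change2)
"G0100090050102" "G0100090050101" 1 1
"G0100090050102" "G0100090050200" 1 1
"G0100090050102" "G0100090050300" 1 1
"G0100090050200" "G0100090050102" 1 1
"G0100090050200" "G0100090050300" 1 1
"G0100090050200" "G0100090050400" 1 0
"G0100210060700" "G0100210060101" 0 1
"G0100210060700" "G0100210060200" 0 0
end

* total() adds up the 0/1 change2 values over all rows of each tract1
bysort tract1: egen n_nbr = total(change2)
* a tract that did not change is not itself part of a cluster
replace n_nbr = 0 if change1 == 0
list, sepby(tract1)
```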


        But the key question is how to add the neighbors of those neighboring tracts. Starting from G0100090050102, there is a cluster of 4 (including itself). How can I identify whether any of the three other census tracts touches another tract that underwent a change, excluding the tracts already counted?


        Let me know if that clarifies.

        Best,
        Damon





        • #5
          Hi Damon,

          So if I understand you correctly, you have an edge list that represents adjacent census tracts. Some census tracts have undergone a transition of some sort, and you are looking for an algorithm that finds the connected components of the graph formed by the tracts that have undergone a transition. You have code that counts the neighbors of a node that have undergone a transition, but you can't get the count of neighbors of neighbors, and so on. Is that correct?



          • #6
            Hi Damon,

            First, I believe there is a problem in the code you provided above (correct me if I am wrong): the ego node itself is not counted as part of the total number of transitions.

            Code:
            bysort tract1: egen transition = count(change2) if change2 == 1
            replace transition = 0 if change1 == 0
            * add on each ego node here.
            replace transition = transition + change1

            I see you haven't had a chance to respond to my previous comment yet. However, if I have correctly characterized your problem, then I know of a few solutions. Unfortunately, I'm afraid none of them is easy to implement. I have, of course, been wrong before, and if any other poster has an elegant solution, please do post it.

            The usual (not Stata-specific) way to do this is a depth-first search on your graph. I've been trying to think up a clever vectorized way to do this with your edge list, but suffice it to say this would likely require the careful use of for loops and some recursive command calls. You could also potentially implement your own disjoint-set (union-find) data structure, but I don't feel that this is straightforward in Stata's ado language.
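One way to avoid both recursion and a hand-built disjoint-set structure is iterative minimum-label propagation, which is a different technique from the depth-first search mentioned above. Give each changed tract a numeric label, then repeatedly replace each tract's label with the smallest label among itself and its changed neighbors until nothing changes. Every tract in a connected cluster then carries the same label, and the cluster size is the number of tracts sharing it. The sketch below assumes that only pairs in which both tracts changed link a cluster. It uses a few rows copied from the -dataex- listing in the first post, and the names left, nodes, edges, comp, nbcomp and cluster_size are hypothetical. It has not been tested on a full file; each pass is one merge plus one collapse, and the number of passes grows with the length of the longest chain of changed tracts.

```stata
* Cluster changed tracts by iterative minimum-label propagation.
* Rows copied from the -dataex- listing; use the full data in practice.
clear
input str14(tract1 tract2) float(change1 change2)
"G0100090050102" "G0100090050101" 1 1
"G0100090050101" "G0100090050102" 1 1
"G0100090050300" "G0100090050102" 1 1
"G0100090050200" "G0100090050102" 1 1
"G0100090050300" "G0100090050200" 1 1
"G0100090050200" "G0100090050400" 1 0
"G0100210060102" "G0100210060101" 1 1
"G0100210060700" "G0100210060101" 0 1
"G0100730011301" "G0100090050602" 1 0
end
tempfile left nodes edges

* 1. Node list: every changed tract, from either column, with a start label
preserve
keep tract1 change1
rename (tract1 change1) (tract change)
save `left'
restore
preserve
keep tract2 change2
rename (tract2 change2) (tract change)
append using `left'
keep if change == 1
duplicates drop tract, force
keep tract
generate long comp = _n          // each tract starts in its own cluster
save `nodes'
restore

* 2. Edges whose two ends both changed, stored in both directions
keep if change1 == 1 & change2 == 1
keep tract1 tract2
preserve
rename tract1 swap
rename tract2 tract1
rename swap tract2
save `edges'
restore
append using `edges'
duplicates drop
save `edges', replace

* 3. Pass labels along edges until no tract's label gets smaller
local changed = 1
while `changed' {
    use `edges', clear
    rename tract2 tract
    merge m:1 tract using `nodes', keep(match) nogenerate
    rename (tract comp) (tract2 nbcomp)
    collapse (min) nbcomp, by(tract1)    // smallest label among neighbors
    rename tract1 tract
    merge 1:1 tract using `nodes', nogenerate
    count if nbcomp < comp               // missing nbcomp never counts
    local changed = r(N) > 0
    replace comp = min(comp, nbcomp)     // min() skips missing values
    keep tract comp
    save `nodes', replace
}

* 4. Cluster size = number of changed tracts sharing a label
bysort comp: generate cluster_size = _N
list, sepby(comp)
```

Isolated changed tracts keep their own label and end up with cluster_size 1. To attach the result to the original edge list, the final node file could be saved and merged m:1 onto the data after renaming tract to tract1.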

            There is perhaps an elegant linear-algebra-style solution using the Laplacian matrix (the number of connected components equals the multiplicity of its zero eigenvalue), but that might be more suitable for Mata. I think one could also implement a variation of Dijkstra's algorithm on the adjacency matrix, and that is probably how I would approach this in Mata. I wonder if William Lisowski has anything to add.

            This is a tough problem, and I imagine this is not what you were hoping to hear. I'm sorry I couldn't be of more help.
            Last edited by Daniel Schaefer; 01 Jun 2022, 13:00. Reason: Fixed code tags



            • #7
              Originally posted by Daniel Schaefer
              Hi Damon,

              So if I understand you correctly, you have an edge list that represents adjacent census tracts. Some census tracts have undergone a transition of some sort, and you are looking for an algorithm that finds the connected components of the graph formed by the tracts that have undergone a transition. You have code that counts the neighbors of a node that have undergone a transition, but you can't get the count of neighbors of neighbors, and so on. Is that correct?
              Yes, that is correct, and thanks for your help regardless.
