  • Count how many times an individual lies in the same cluster as another one

    Hi all, I have a dataset of cities that are assigned to clusters in each year. In other words, I applied a modularity-based community detection algorithm to separate datasets of cities, one for each year.
    The final database looks like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int v1 str21 city byte cluster float year
    0 "city1" 0 2000
    1 "city2" 2 2000
    2 "city3" 1 2000
    3 "city4" 0  2000
    4 "city5" 2  2000
    0 "city1" 2  2001
    1 "city2" 1 2001
    2 "city3" 0 2001
    3 "city4" 0 2001
    4 "city5" 0 2001
    0 "city1" 1 2002
    1 "city2" 2 2002
    2 "city3" 0  2002
    3 "city4" 0 2002
    4 "city5" 1 2002
    end
    Now what I would like to do is count how many times a city ends up in the same cluster as another city across years.
    So in the mock example above I should end up with a 5 x 5 matrix whose rows and columns are cities, where each entry is the number of years in which cities i and j are in the same cluster (regardless of which cluster). For example, city3 and city4 share a cluster in 2001 and 2002 but not in 2000, so their entry would be 2.

    Thank you

  • #2
    Unless you plan on doing some actual matrix algebra, it is unlikely that creating a matrix with this information will prove useful for subsequent analysis. So the following code instead creates a data set of city pairs and a count of the number of years in which they appear in the same cluster.

    Code:
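    * Save a copy of the data with every variable except year suffixed with 2
    preserve
    rename (v1 city cluster) =2
    tempfile copy
    save `copy'
    
    * Back in the original data, suffix the same variables with 1
    restore
    rename (v1 city cluster) =1
    * Form all pairs of cities within each year
    joinby year using `copy'
    
    * Keep each unordered pair once and drop self-pairs
    keep if city2 < city1
    
    * Count the years in which the two cities share a cluster
    gen byte same_cluster = (cluster1 == cluster2)
    collapse (sum) same_cluster, by(city1 city2)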
    If you really do need a matrix, you can reshape this result to wide and then use the -mkmat- command to get that.
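
A minimal sketch of that reshape and -mkmat- step follows. So that it runs on its own, it starts from pair counts worked out by hand from the example data in #1; in practice you would start from the result of the code above. The names mirror and M are my own choices, and the sketch assumes the city names are short and contain only characters that are legal in Stata variable names, because -reshape- builds the new variable names from them.

```stata
* Stand-in for the pair-level result: counts derived by hand from the example in #1
clear
input str21 city1 str21 city2 byte same_cluster
"city1" "city2" 0
"city1" "city3" 0
"city1" "city4" 1
"city1" "city5" 1
"city2" "city3" 0
"city2" "city4" 0
"city2" "city5" 1
"city3" "city4" 2
"city3" "city5" 1
"city4" "city5" 1
end

* Add the mirror image of every pair so the matrix comes out symmetric
preserve
rename (city1 city2) (city2 city1)
tempfile mirror
save `mirror'
restore
append using `mirror'

* One row per city1 and one column per city2; the diagonal stays missing
reshape wide same_cluster, i(city1) j(city2) string
mkmat same_cluster*, matrix(M) rownames(city1)
matrix list M
```

The diagonal is left missing because a city is never paired with itself; if you need a value there, you would have to decide what it should mean (for instance, the number of years in which the city appears).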


    • #3
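      Actually, here's another way that's a bit more efficient, and requires much less memory, because joining on both year and cluster creates only the pairs of cities that actually share a cluster rather than every pair of cities in every year. Both of these factors will matter if the real data set is appreciably large.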

      Code:
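      * Save a copy with v1 and city suffixed with 2; year and cluster keep their names
      preserve
      rename (v1 city) =2
      tempfile copy
      save `copy'
      
      * Back in the original data, suffix v1 and city with 1
      restore
      rename (v1 city) =1
      * Pair up only cities that share a cluster in the same year
      joinby year cluster using `copy'
      
      * Drop self-pairs, then count the years in which each ordered pair co-occurs
      keep if city2 != city1
      contract city1 city2, freq(same_cluster)
      * Add the pairs that never share a cluster, with a count of zero
      fillin city1 city2
      replace same_cluster = 0 if _fillin
      * Keep each unordered pair once
      keep if city1 < city2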
      Last edited by Clyde Schechter; 12 Mar 2023, 13:49.


      • #4
        Clyde Schechter thanks a lot. It works!
