Count how many times an individual lies in the same cluster as another one

Federico Nutarelli

#1


12 Mar 2023, 04:52

Hi all, I have a database of cities divided into clusters for each year. In other words, I applied a modularity-based community detection algorithm separately to the databases of cities for each year.
The final database looks like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int v1 str21 city byte cluster float year
0 "city1" 0 2000
1 "city2" 2 2000
2 "city3" 1 2000
3 "city4" 0 2000
4 "city5" 2 2000
0 "city1" 2 2001
1 "city2" 1 2001
2 "city3" 0 2001
3 "city4" 0 2001
4 "city5" 0 2001
0 "city1" 1 2002
1 "city2" 2 2002
2 "city3" 0 2002
3 "city4" 0 2002
4 "city5" 1 2002
end

Now what I would like to do is count, across all years, how many times each city ends up in the same cluster as each other city.
So, in the mock example above, I should end up with a 5 × 5 matrix whose rows and columns are cities and whose entry (i, j) is the number of years in which cities i and j are in the same cluster (regardless of which cluster).
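For concreteness, this is the matrix the mock data should produce, worked out by hand year by year. Here k_{it} is just notation for the cluster of city i in year t, and the diagonal is left blank because a city trivially shares its own cluster every year. In 2000 the pairs (city1, city4) and (city2, city5) share a cluster; in 2001 (city3, city4), (city3, city5) and (city4, city5); in 2002 (city1, city5) and (city3, city4).

```latex
C_{ij} = \sum_{t \in \{2000, 2001, 2002\}} \mathbf{1}\left[k_{it} = k_{jt}\right],
\qquad
C = \begin{pmatrix}
\cdot & 0 & 0 & 1 & 1 \\
0 & \cdot & 0 & 0 & 1 \\
0 & 0 & \cdot & 2 & 1 \\
1 & 0 & 2 & \cdot & 1 \\
1 & 1 & 1 & 1 & \cdot
\end{pmatrix}
```

Rows and columns are ordered city1 to city5; only (city3, city4) shares a cluster in two of the three years.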

Thank you
Tags: counting, data, panel data, Suggestion
Clyde Schechter

#2

12 Mar 2023, 12:16

Unless you plan on doing some actual matrix algebra, it is unlikely that creating a matrix with this information will prove useful for subsequent analysis. So the following code instead creates a data set of city pairs and a count of the number of years in which they appear in the same cluster.

Code:

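* copy of the data with the variables suffixed 2
preserve
rename (v1 city cluster) =2
tempfile copy
save `copy'
restore
* suffix the originals 1 and pair every city with every city in the same year
rename (v1 city cluster) =1
joinby year using `copy'
* keep each unordered pair once (this also drops self-pairs)
keep if city2 < city1
gen byte same_cluster = (cluster1 == cluster2)
* number of years in which each pair shares a cluster
collapse (sum) same_cluster, by(city1 city2)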

If you really do need a matrix, you can reshape this result to wide and then use the -mkmat- command to get that.
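A minimal sketch of that route, run end to end on the mock data from #1. It assumes the city names are valid as variable-name suffixes (reshape with the -string- option builds variable names from them, so names with spaces would fail). The mirror-image step, the variable name rowcity and the matrix name M are illustrative choices, not part of the answer above.

```stata
* Mock data from #1
clear
input int v1 str21 city byte cluster float year
0 "city1" 0 2000
1 "city2" 2 2000
2 "city3" 1 2000
3 "city4" 0 2000
4 "city5" 2 2000
0 "city1" 2 2001
1 "city2" 1 2001
2 "city3" 0 2001
3 "city4" 0 2001
4 "city5" 0 2001
0 "city1" 1 2002
1 "city2" 2 2002
2 "city3" 0 2002
3 "city4" 0 2002
4 "city5" 1 2002
end

* Pair counts, as in the code above (one row per unordered pair)
preserve
rename (v1 city cluster) =2
tempfile copy
save `copy'
restore
rename (v1 city cluster) =1
joinby year using `copy'
keep if city2 < city1
gen byte same_cluster = (cluster1 == cluster2)
collapse (sum) same_cluster, by(city1 city2)

* Illustrative: add the mirror-image pairs so the matrix is symmetric
preserve
rename (city1 city2) (city2 city1)
tempfile mirror
save `mirror'
restore
append using `mirror'

* Add the diagonal (a city with itself), set to 0
fillin city1 city2
replace same_cluster = 0 if _fillin
drop _fillin

* One row per city, one column per partner city
rename city1 rowcity
reshape wide same_cluster, i(rowcity) j(city2) string
rename same_cluster* *

* Copy into a Stata matrix with city names as row names
mkmat city*, matrix(M) rownames(rowcity)
matrix list M
```

The mirror step is only needed because the pair data keep each pair once; without it the reshaped result has missing values in one triangle.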
Clyde Schechter

#3

12 Mar 2023, 13:43

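Actually, here's another way that's a bit more efficient and requires much less memory, because joining on both year and cluster pairs up only cities that share a cluster, instead of every pair of cities in every year. Both of these factors will matter if the real data set is appreciably large.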

Code:

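* copy of the data with v1 and city suffixed 2
preserve
rename (v1 city) =2
tempfile copy
save `copy'
restore
rename (v1 city) =1
* pair only cities in the same cluster in the same year
joinby year cluster using `copy'
keep if city2 != city1
* number of years each ordered pair shares a cluster
contract city1 city2, freq(same_cluster)
* add pairs that never share a cluster, with a count of 0
fillin city1 city2
replace same_cluster = 0 if _fillin
* keep each unordered pair once
keep if city1 < city2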

Last edited by Clyde Schechter; 12 Mar 2023, 13:49.
Federico Nutarelli

#4

04 Aug 2023, 04:09

Clyde Schechter thanks a lot. It works!