Hi all,
I have a question about how to do cluster analysis in Stata. I want to put consumers whose estates are within 5 kilometers of each other into one cluster and then calculate some variables of interest within and outside the cluster.
Previously I had a dataset with one consumer per location, and I computed the distances with the loop quoted further down. Now there are multiple consumers in each building, and I don't know how to revise that code.
My data looks like this:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte consumer float(longitude latitude) str1 estate_name
 1 114.154 22.2488 "A"
 2 114.154 22.2488 "A"
 3 114.154 22.2488 "A"
 4 114.154 22.2488 "A"
 5 113.975 22.4032 "B"
 6 113.975 22.4032 "B"
 7 113.975 22.4032 "B"
 8 114.244 22.4287 "C"
 9 114.244 22.4287 "C"
10 114.229 22.2811 "D"
11 114.229 22.2811 "D"
12 114.229 22.2811 "D"
13 114.104 22.3783 "E"
14 114.217 22.3251 "F"
15 114.217 22.3251 "F"
16 114.151 22.2445 "G"
17 114.147 22.334 "H"
18 114.147 22.334 "H"
19 114.147 22.334 "H"
20 114.147 22.334 "H"
21 114.262 22.3064 "I"
22 114.262 22.3064 "I"
23 114.061 22.3674 "J"
end
I have run into two difficulties. The first is how to write a loop that calculates the distances among estates automatically, since my dataset contains many estates. I know that the distance between two locations can be computed with -geodist- (from SSC):
Code:
geodist lat1 lon1 lat2 lon2
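One way to get every estate-to-estate distance without an explicit loop is to reduce the data to one row per estate and pair each estate with every other estate using -cross-. This is a minimal sketch, assuming -geodist- (SSC) is installed and using the ten distinct estates from the -dataex- example; the names estate2, lat2, lon2, dist_km and within5 are illustrative, not part of the original data.

```stata
* Requires -geodist- from SSC:  ssc install geodist
* The ten distinct estates from the -dataex- example (one row per estate)
clear
input str1 estate_name float(latitude longitude)
"A" 22.2488 114.154
"B" 22.4032 113.975
"C" 22.4287 114.244
"D" 22.2811 114.229
"E" 22.3783 114.104
"F" 22.3251 114.217
"G" 22.2445 114.151
"H" 22.334  114.147
"I" 22.3064 114.262
"J" 22.3674 114.061
end

* save one copy, then rename the copy in memory so that the two
* estates in each pair end up in differently named variables
tempfile estates
save `estates'
rename (estate_name latitude longitude) (estate2 lat2 lon2)

* -cross- forms every combination of the rows in memory with the rows on disk
cross using `estates'

* keep each unordered pair once (this also drops self-pairs)
drop if estate_name >= estate2

* geodist returns kilometers by default
geodist latitude longitude lat2 lon2, generate(dist_km)
gen byte within5 = dist_km <= 5

sort estate_name estate2
list estate_name estate2 dist_km if within5, noobs
```

With n estates this produces n(n-1)/2 pairs, so it works on estates rather than consumers; the consumer-level data never needs to be crossed.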
The loop I used before was:
Code:
forval i = 1/`=_N' {
    local olat = latitude[`i']
    local olong = longitude[`i'] // note misspelling of longitude in your example
    geodist latitude longitude `olat' `olong', gen(dist`i')
}
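Because every consumer in an estate shares the same coordinates, one way to revise that loop is to run it over estates instead of consumers and then merge the distances back onto the consumers. This is a sketch under that assumption (coordinates are constant within an estate, which -isid- checks); the tempfile names and the distK variable labels are illustrative.

```stata
* Requires -geodist- from SSC:  ssc install geodist
clear
input byte consumer float(longitude latitude) str1 estate_name
 1 114.154 22.2488 "A"
 2 114.154 22.2488 "A"
 3 114.154 22.2488 "A"
 4 114.154 22.2488 "A"
 5 113.975 22.4032 "B"
 6 113.975 22.4032 "B"
 7 113.975 22.4032 "B"
 8 114.244 22.4287 "C"
 9 114.244 22.4287 "C"
10 114.229 22.2811 "D"
11 114.229 22.2811 "D"
12 114.229 22.2811 "D"
13 114.104 22.3783 "E"
14 114.217 22.3251 "F"
15 114.217 22.3251 "F"
16 114.151 22.2445 "G"
17 114.147 22.334 "H"
18 114.147 22.334 "H"
19 114.147 22.334 "H"
20 114.147 22.334 "H"
21 114.262 22.3064 "I"
22 114.262 22.3064 "I"
23 114.061 22.3674 "J"
end

* set the consumer-level data aside
tempfile consumers
save `consumers'

* one observation per estate
keep estate_name latitude longitude
duplicates drop
isid estate_name   // stops with an error if an estate has two coordinate pairs
sort estate_name

* same loop as before, now over estates: distK = km to the K-th estate
forvalues i = 1/`=_N' {
    local olat  = latitude[`i']
    local olong = longitude[`i']
    geodist latitude longitude `olat' `olong', generate(dist`i')
    label variable dist`i' "km to estate `=estate_name[`i']'"
}

* attach the estate-level distances to every consumer in that estate
tempfile estdist
save `estdist'
use `consumers', clear
merge m:1 estate_name using `estdist', keepusing(dist*) assert(match) nogenerate
sort consumer
```

The number of distance variables now equals the number of estates, not the number of consumers, and the variable labels record which estate each distK refers to.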
Second, I don't know how to cluster the consumers based on these distances, because the distance calculation seems to produce high-dimensional data (one distance variable per location).
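One option is to treat the estate-to-estate distances as a dissimilarity matrix and pass it to Stata's -clustermat-, rather than running -cluster- on latitude and longitude directly (which would measure Euclidean distance in degrees, not kilometers). The sketch below assumes single-linkage clustering cut at 5 km is an acceptable definition of "within 5 kilometers": two estates share a group when they are connected by a chain of links of no more than about 5 km each. The matrix name D, the analysis name est5km and the variables cluster5, n_in_cluster and n_outside are illustrative.

```stata
* Requires -geodist- from SSC:  ssc install geodist
clear
input byte consumer float(longitude latitude) str1 estate_name
 1 114.154 22.2488 "A"
 2 114.154 22.2488 "A"
 3 114.154 22.2488 "A"
 4 114.154 22.2488 "A"
 5 113.975 22.4032 "B"
 6 113.975 22.4032 "B"
 7 113.975 22.4032 "B"
 8 114.244 22.4287 "C"
 9 114.244 22.4287 "C"
10 114.229 22.2811 "D"
11 114.229 22.2811 "D"
12 114.229 22.2811 "D"
13 114.104 22.3783 "E"
14 114.217 22.3251 "F"
15 114.217 22.3251 "F"
16 114.151 22.2445 "G"
17 114.147 22.334 "H"
18 114.147 22.334 "H"
19 114.147 22.334 "H"
20 114.147 22.334 "H"
21 114.262 22.3064 "I"
22 114.262 22.3064 "I"
23 114.061 22.3674 "J"
end

tempfile consumers
save `consumers'

* estate-level coordinates, one row per estate
keep estate_name latitude longitude
duplicates drop
isid estate_name
sort estate_name
local n = _N

* n x n matrix D of geodesic distances (km) between estates
forvalues i = 1/`n' {
    local olat  = latitude[`i']
    local olong = longitude[`i']
    geodist latitude longitude `olat' `olong', generate(d`i')
}
mkmat d1-d`n', matrix(D)
drop d1-d`n'

* make D exactly symmetric with a zero diagonal
matrix D = 0.5 * (D + D')
forvalues i = 1/`n' {
    matrix D[`i', `i'] = 0
}

* single-linkage clustering on D; -add- attaches the cluster variables
* to the estate-level data in memory (rows of D in the same order)
clustermat singlelinkage D, name(est5km) add
* cut the dendrogram at 5 km to obtain group numbers
cluster generate cluster5 = cut(5)

* bring the group number back to the consumers
keep estate_name cluster5
tempfile groups
save `groups'
use `consumers', clear
merge m:1 estate_name using `groups', assert(match) nogenerate

* example within/outside quantity: consumers in own cluster vs. elsewhere
bysort cluster5: gen n_in_cluster = _N
gen n_outside = _N - n_in_cluster
sort consumer
```

Single linkage can chain estates that are more than 5 km apart through intermediate estates. If every pair in a cluster should be within 5 km, -clustermat completelinkage- cut at 5 gives groups whose largest pairwise distance is at most 5 km. Either way the matrix is estates by estates, so its size depends on the number of estates rather than the number of consumers.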
I'm not sure I have made myself clear; please let me know if you have any ideas on how to achieve this in Stata.
Thank you so much!