Identify all groups that have a common element

Jim Conklin

Join Date: Nov 2021

Posts: 3
#1

Identify all groups that have a common element

04 Nov 2021, 19:40

I am working with real estate appraisal data. Each appraisal (appr_id) has multiple comps (comp_id) associated with it. Comps can be used across multiple appraisals. My data contains 27 million observations, and looks like:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input float unique_id str1(appr_id comp_id) 1 "1" "h" 2 "1" "c" 3 "1" "s" 4 "2" "a" 5 "2" "h" 6 "2" "m" 7 "3" "a" 8 "3" "y" 9 "3" "x" 10 "4" "p" 11 "4" "q" 12 "5" "z" 13 "5" "p" 14 "6" "w" 15 "6" "t" end

Appraisal 1 has a comp in common with appraisal 2 (h), and appraisal 3 has a comp in common with appraisal 2 (a), so appraisals 1-3 should have the same value in the new variable. Appraisal 4 and appraisal 5 both use p as a comp, so they should have the same value in the new variable. Appraisal 6 does not share a comp with any of the other appraisals, so it should have its own value in the new variable. The final dataset should look like:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input float unique_id str1(appr_id comp_id) float newvar 1 "1" "h" 1 2 "1" "c" 1 3 "1" "s" 1 4 "2" "a" 1 5 "2" "h" 1 6 "2" "m" 1 7 "3" "a" 1 8 "3" "y" 1 9 "3" "x" 1 10 "4" "p" 2 11 "4" "q" 2 12 "5" "z" 2 13 "5" "p" 2 14 "6" "w" 3 15 "6" "t" 3 end

I'm guessing there's an easy solution, but I've hit the wall. Any help would be greatly appreciated. Thanks!
Tags: None
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2402
#2

04 Nov 2021, 20:07

It's not clear how your logic relates the example data to the expected solution. At the very least, the set of rules of incomplete. For example, why would appraisals 1, 2 and 3 all get the same value of 1, when they share two different comps ? What happens if any two appraisals share more than one comp?
Comment
Jim Conklin

Join Date: Nov 2021

Posts: 3
#3

04 Nov 2021, 21:36

Sorry that was my explanation was incomplete. Within an appraisal, all of the comps are essentially identical properties. Roughly speaking,

h = s = c (all in appraisal 1).

a = h = m (all in appraisal 2)

h is the same comp, but just used in different appraisals. Because h shows up in both appraisals, those 5 properties are basically the same ( h = s = c = a = m.).

The three comps in appraisal 3 are basically all the same property (a=y=x), and because a is also in appraisal 2, then

h = s = c = a = m = y = x.

All of those properties (comp_id) get the same value of newvar, 1.

From appraisal 4 (p = q) and 5 (z = p), we get:

p = q = z

All three of those comps get the same value of newvar, 2.

From appraisal 6, w = t. w and t do not appear in any of the other appraisals, so get the same value of newvar, 3.

I hope this makes more sense. Sorry for the confusion..
Comment

Fei Wang

Join Date: Oct 2021
Posts: 726

05 Nov 2021, 06:19

Not quite sure if it works for more complicated cases.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float unique_id str1(appr_id comp_id)
 1 "1" "h"
 2 "1" "c"
 3 "1" "s"
 4 "2" "a"
 5 "2" "h"
 6 "2" "m"
 7 "3" "a"
 8 "3" "y"
 9 "3" "x"
10 "4" "p"
11 "4" "q"
12 "5" "z"
13 "5" "p"
14 "6" "w"
15 "6" "t"
end

sort comp_id
gen n = sum(comp_id != comp_id[_n-1])
bys n: replace n = . if _N == 1
local a = 1
local c = 1
while `a' > 0 | `c' > 0 {
    bys appr_id (n): replace n = n[_n-1] if n[_n-1] != .
    bys comp_id (n): egen c = sd(n)
    quietly sum c
    local c = r(mean)
    
    bys comp_id (n): replace n = n[_n-1] if n[_n-1] != .
    bys appr_id (n): egen a = sd(n)
    quietly sum a
    local a = r(mean)
    
    drop a c
}

sort n appr_id
egen g = group(n)
replace g = g[_n-1] + (appr_id != appr_id[_n-1] & g == .) if g == .
drop n
sort unique_id

Code:

. list

     +----------------------------------+
     | unique~d   appr_id   comp_id   g |
     |----------------------------------|
  1. |        1         1         h   1 |
  2. |        2         1         c   1 |
  3. |        3         1         s   1 |
  4. |        4         2         a   1 |
  5. |        5         2         h   1 |
     |----------------------------------|
  6. |        6         2         m   1 |
  7. |        7         3         a   1 |
  8. |        8         3         y   1 |
  9. |        9         3         x   1 |
 10. |       10         4         p   2 |
     |----------------------------------|
 11. |       11         4         q   2 |
 12. |       12         5         z   2 |
 13. |       13         5         p   2 |
 14. |       14         6         w   3 |
 15. |       15         6         t   3 |
     +----------------------------------+

Comment

Jim Conklin

Join Date: Nov 2021

Posts: 3
#5

05 Nov 2021, 08:55

This is great. Thank you!
Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

05 Nov 2021, 10:30

You can use group_id (from SSC) to group appr_id identifiers that share one or more comp_id identifiers:

Code:

* create a numeric identifier for appraisals
egen long newid = group(appr_id)

* group appraisals if they share comps
group_id newid, match(comp_id)

list, sepby(newid)

Code:

. list, sepby(newid)

     +--------------------------------------+
     | unique~d   appr_id   comp_id   newid |
     |--------------------------------------|
  1. |        1         1         h       1 |
  2. |        2         1         c       1 |
  3. |        3         1         s       1 |
  4. |        4         2         a       1 |
  5. |        5         2         h       1 |
  6. |        6         2         m       1 |
  7. |        7         3         a       1 |
  8. |        8         3         y       1 |
  9. |        9         3         x       1 |
     |--------------------------------------|
 10. |       10         4         p       4 |
 11. |       11         4         q       4 |
 12. |       12         5         z       4 |
 13. |       13         5         p       4 |
     |--------------------------------------|
 14. |       14         6         w       6 |
 15. |       15         6         t       6 |
     +--------------------------------------+

.

Announcement

Identify all groups that have a common element

Comment

Comment

Comment

Comment

Comment