identify peers

River Huang

Join Date: Mar 2016
Posts: 1908

03 Nov 2018, 01:08

Dear All, I found this question here (http://bbs.pinggu.org/forum.php?mod=...=1#pid54489763). The data is,

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int stkcd str10 reptdt str1 name str12 university str2 investor1 str12 graduate1
2 "2008-06-25" "A" ""           "BB" "MIT" 
2 "2008-06-25" "B" "Harvard"    "AA" "UCLA"
2 "2008-06-25" "C" ""           "AA" "UCLA"
2 "2008-06-25" "D" ""           "AA" "UCLA"
2 "2008-06-25" "E" ""           "AA" "UCLA"
2 "2008-06-25" "F" "MIT"        "AA" "UCLA"
2 "2008-06-25" "G" ""           "AA" "UCLA"
3 "2008-07-27" "A" ""           "BB" "MIT" 
3 "2008-07-27" "B" "Boston"     "AA" "UCLA"
3 "2008-07-27" "C" ""           "AA" "UCLA"
3 "2008-07-27" "D" ""           "AA" "UCLA"
3 "2008-07-27" "E" ""           "AA" "UCLA"
3 "2008-07-27" "F" "Vanderbilt" "AA" "UCLA"
3 "2008-07-27" "G" ""           "AA" "UCLA"
end

For each `stkcd', if any element in `university' also appears in `graduate1', then return 1 for this `stkcd', 0 otherwise. Thus, for stkcd=2, the returned value is 1, and for stkcd=3, the returned value is 0. Any suggestion is highly appreciated.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)


Andrew Musau

Join Date: Oct 2014
Posts: 10216

03 Nov 2018, 06:41

I changed your data example slightly so that it includes more than two groups.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int stkcd str10 reptdt str1 name str12 university str2 investor1 str12 graduate1
2 "2008-06-25" "A" ""           "BB" "MIT"
2 "2008-06-25" "B" "Harvard"    "AA" "UCLA"
2 "2008-06-25" "C" ""           "AA" "UCLA"
2 "2008-06-25" "D" ""           "AA" "UCLA"
2 "2008-06-25" "E" ""           "AA" "UCLA"
2 "2008-06-25" "F" "MIT"        "AA" "UCLA"
2 "2008-06-25" "G" "UCLA"           "AA" "UCLA"
3 "2008-07-27" "A" ""           "BB" "MIT"
3 "2008-07-27" "B" "Boston"     "AA" "UCLA"
3 "2008-07-27" "C" ""           "AA" "UCLA"
3 "2008-07-27" "D" ""           "AA" "UCLA"
3 "2008-07-27" "E" ""           "AA" "UCLA"
3 "2008-07-27" "F" "Vanderbilt" "AA" "UCLA"
3 "2008-07-27" "G" ""           "AA" "UCLA"
4 "2008-07-27" "A" ""           "BB" "MIT"
4 "2008-07-27" "B" "Harvard"    "AA" "UCLA"
4 "2008-07-27" "C" ""           "AA" "UCLA"
4 "2008-07-27" "D" ""           "AA" "UCLA"
4 "2008-07-27" "E" ""           "AA" "UCLA"
4 "2008-07-27" "F" "MIT"       "AA" "UCLA"
4 "2008-07-27" "G" ""           "AA" "UCLA"
4 ""           ""  ""           ""   ""    
end

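* Collect the distinct stkcd values
levelsof stkcd, local(id)
foreach i in `id' {
    * Distinct non-missing values of each variable within this stkcd
    levelsof university if stkcd==`i', local(uid`i')
    levelsof graduate1 if stkcd==`i', local(gid`i')
    * Intersection: remove from uid the elements that are not in gid
    local diff`i': list uid`i' - gid`i'
    local present`i': list uid`i' - diff`i'
    * Comma-separate the shared elements (only their count is used below)
    local p`i': subinstr local present`i' `" "' `", "', all
    local n`i': word count `p`i''
}

* Build a comma-separated list of the stkcd values with at least one match
foreach i in `id' {
    if `n`i'' > 0 {
        local wanted "`wanted' `i'"
        local wanted: subinstr local wanted " " ",", all
    }
}
* `wanted' starts with a comma (e.g. ",2,4"), so 0.1 serves as a dummy
* first value that no integer stkcd can equal
gen varwanted = inlist(stkcd, 0.1`wanted')
list, clean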

The resulting output:

Code:

. list, clean

       stkcd       reptdt   name   university   invest~1   gradua~1   varwan~d  
  1.       2   2008-06-25      A                      BB        MIT          1  
  2.       2   2008-06-25      B      Harvard         AA       UCLA          1  
  3.       2   2008-06-25      C                      AA       UCLA          1  
  4.       2   2008-06-25      D                      AA       UCLA          1  
  5.       2   2008-06-25      E                      AA       UCLA          1  
  6.       2   2008-06-25      F          MIT         AA       UCLA          1  
  7.       2   2008-06-25      G         UCLA         AA       UCLA          1  
  8.       3   2008-07-27      A                      BB        MIT          0  
  9.       3   2008-07-27      B       Boston         AA       UCLA          0  
 10.       3   2008-07-27      C                      AA       UCLA          0  
 11.       3   2008-07-27      D                      AA       UCLA          0  
 12.       3   2008-07-27      E                      AA       UCLA          0  
 13.       3   2008-07-27      F   Vanderbilt         AA       UCLA          0  
 14.       3   2008-07-27      G                      AA       UCLA          0  
 15.       4   2008-07-27      A                      BB        MIT          1  
 16.       4   2008-07-27      B      Harvard         AA       UCLA          1  
 17.       4   2008-07-27      C                      AA       UCLA          1  
 18.       4   2008-07-27      D                      AA       UCLA          1  
 19.       4   2008-07-27      E                      AA       UCLA          1  
 20.       4   2008-07-27      F          MIT         AA       UCLA          1  
 21.       4   2008-07-27      G                      AA       UCLA          1  
 22.       4                                                                 1
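
A possible simplification of the loop above, offered as a sketch rather than a replacement: the macro list operator & returns the intersection of two lists directly, and setting the indicator inside the loop avoids building an argument list for inlist(). The local names u, g, common and n are only illustrative, and the data are the original example from the first post.

```stata
* Original example data from the first post
clear
input int stkcd str10 reptdt str1 name str12 university str2 investor1 str12 graduate1
2 "2008-06-25" "A" ""           "BB" "MIT"
2 "2008-06-25" "B" "Harvard"    "AA" "UCLA"
2 "2008-06-25" "C" ""           "AA" "UCLA"
2 "2008-06-25" "D" ""           "AA" "UCLA"
2 "2008-06-25" "E" ""           "AA" "UCLA"
2 "2008-06-25" "F" "MIT"        "AA" "UCLA"
2 "2008-06-25" "G" ""           "AA" "UCLA"
3 "2008-07-27" "A" ""           "BB" "MIT"
3 "2008-07-27" "B" "Boston"     "AA" "UCLA"
3 "2008-07-27" "C" ""           "AA" "UCLA"
3 "2008-07-27" "D" ""           "AA" "UCLA"
3 "2008-07-27" "E" ""           "AA" "UCLA"
3 "2008-07-27" "F" "Vanderbilt" "AA" "UCLA"
3 "2008-07-27" "G" ""           "AA" "UCLA"
end

gen byte varwanted = 0
quietly levelsof stkcd, local(ids)
foreach i of local ids {
    * Distinct non-missing values within this stkcd
    quietly levelsof university if stkcd == `i', local(u)
    quietly levelsof graduate1  if stkcd == `i', local(g)
    * Elements that appear in both lists
    local common : list u & g
    local n : list sizeof common
    if `n' > 0 {
        replace varwanted = 1 if stkcd == `i'
    }
}
list stkcd university graduate1 varwanted, sepby(stkcd)
```

On this example stkcd 2 shares MIT between the two variables and stkcd 3 shares nothing, which agrees with the result River asked for.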

Last edited by Andrew Musau; 03 Nov 2018, 07:31.


River Huang

Join Date: Mar 2016

Posts: 1908
#3

03 Nov 2018, 17:53

Dear Andrew, Thanks a lot for this helpful suggestion.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Romalpa Akzo

Join Date: Oct 2017

Posts: 369
#4

04 Nov 2018, 03:23

A similar issue has been discussed in this thread. Again, I find that for this kind of problem -inlist()-, while an interesting choice, is somewhat complicated and is limited in the number of values it accepts. -expand- offers an easier solution without that limitation.

Code:

expand 2, gen(ex)
replace graduate1 = university if ex
bys stkcd graduate1 (ex): gen tag = (ex[1] != ex[_N]) if !missing(graduate1)
by stkcd: egen wanted = max(tag)
drop if ex
drop ex tag
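
To see why the -expand- trick works, here is the same sequence run on the original example from the first post, with a comment on each step. The two -assert- lines at the end are an addition that checks the result stated there (1 for stkcd 2, 0 for stkcd 3); they are not part of the suggested code.

```stata
* Original example data from the first post
clear
input int stkcd str10 reptdt str1 name str12 university str2 investor1 str12 graduate1
2 "2008-06-25" "A" ""           "BB" "MIT"
2 "2008-06-25" "B" "Harvard"    "AA" "UCLA"
2 "2008-06-25" "C" ""           "AA" "UCLA"
2 "2008-06-25" "D" ""           "AA" "UCLA"
2 "2008-06-25" "E" ""           "AA" "UCLA"
2 "2008-06-25" "F" "MIT"        "AA" "UCLA"
2 "2008-06-25" "G" ""           "AA" "UCLA"
3 "2008-07-27" "A" ""           "BB" "MIT"
3 "2008-07-27" "B" "Boston"     "AA" "UCLA"
3 "2008-07-27" "C" ""           "AA" "UCLA"
3 "2008-07-27" "D" ""           "AA" "UCLA"
3 "2008-07-27" "E" ""           "AA" "UCLA"
3 "2008-07-27" "F" "Vanderbilt" "AA" "UCLA"
3 "2008-07-27" "G" ""           "AA" "UCLA"
end

* Duplicate every observation; ex == 0 for originals, ex == 1 for copies
expand 2, gen(ex)

* In the copies, graduate1 now holds the university values, so both
* variables are stacked in a single column
replace graduate1 = university if ex

* Within stkcd and value, sort by ex. A group that starts with an original
* (ex == 0) and ends with a copy (ex == 1) is a value found in both variables
bys stkcd graduate1 (ex): gen tag = (ex[1] != ex[_N]) if !missing(graduate1)

* Spread the flag to every observation of the stkcd
by stkcd: egen wanted = max(tag)

* Remove the copies and the helper variables
drop if ex
drop ex tag

* Added check: expected result from the first post
assert wanted == 1 if stkcd == 2
assert wanted == 0 if stkcd == 3
```

Because the comparison is done by sorting rather than by listing values as arguments, the number of distinct universities does not matter here.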
River Huang

Join Date: Mar 2016

Posts: 1908
#5

04 Nov 2018, 17:02

Dear Romalpa, Thank you so much for this concise solution.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)

Vinh Ng

Join Date: Oct 2017
Posts: 37

21 Nov 2018, 23:15

Dear all,

I have a very similar question.

Based on the suggested code by Romalpa Akzo, I have got the following output:

Code:

clear
input int stkcd str10 reptdt str1 name str12 university str2 investor1 str12 graduate1 int wanted
2 "2008-06-25" "A" ""           "BB" "MIT" 1
2 "2008-06-25" "B" "Harvard"    "AA" "UCLA" 1
2 "2008-06-25" "C" ""           "AA" "UCLA" 1
2 "2008-06-25" "D" ""           "AA" "UCLA" 1
2 "2008-06-25" "E" ""           "AA" "UCLA" 1
2 "2008-06-25" "F" "MIT"        "AA" "UCLA" 1
2 "2008-06-25" "G" ""           "AA" "UCLA" 1 
3 "2008-07-27" "A" ""           "BB" "MIT" 0 
3 "2008-07-27" "B" "Boston"     "AA" "UCLA" 0 
3 "2008-07-27" "C" ""           "AA" "UCLA" 0 
3 "2008-07-27" "D" ""           "AA" "UCLA" 0
3 "2008-07-27" "E" ""           "AA" "UCLA" 0
3 "2008-07-27" "F" "Vanderbilt" "AA" "UCLA" 0
3 "2008-07-27" "G" ""           "AA" "UCLA" 0
end

In this example, rather than returning 1 for every observation in a stkcd whenever any element of graduate1 also appears in university, is there a way to return 1 only for the particular observations whose graduate1 value appears in university?

River Huang: I'm sorry for taking over your thread.

Thank you very much for your help.

Vinh


Andrew Musau

Join Date: Oct 2014
Posts: 10216

22 Nov 2018, 01:39

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int stkcd str10 reptdt str1 name str12 university str2 investor1 str12 graduate1
2 "2008-06-25" "A" ""           "BB" "MIT"
2 "2008-06-25" "B" "Harvard"    "DD" "Stanford"
2 "2008-06-25" "C" ""           "AA" "Boston"
2 "2008-06-25" "D" ""           "AA" "UCLA"
2 "2008-06-25" "E" ""           "CC" "UCSD"
2 "2008-06-25" "F" "MIT"        "EE" "Yale"
2 "2008-06-25" "G" "UCLA"       "AA" "UCLA"
3 "2008-07-27" "A" ""           "BB" "MIT"
3 "2008-07-27" "B" "Boston"     "AA" "UCLA"
3 "2008-07-27" "C" ""           "AA" "UCLA"
3 "2008-07-27" "D" ""           "AA" "UCLA"
3 "2008-07-27" "E" ""           "AA" "UCLA"
3 "2008-07-27" "F" "Vanderbilt" "AA" "UCLA"
3 "2008-07-27" "G" ""           "AA" "UCLA"
4 "2008-07-27" "A" ""           "BB" "MIT"
4 "2008-07-27" "B" "Harvard"    "AA" "UCLA"
4 "2008-07-27" "C" ""           "AA" "UCLA"
4 "2008-07-27" "D" ""           "AA" "UCLA"
4 "2008-07-27" "E" ""           "AA" "UCLA"
4 "2008-07-27" "F" "MIT"        "AA" "UCLA"
4 "2008-07-27" "G" ""           "AA" "UCLA"
end

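* Duplicate every observation; ex==1 marks the copies
expand 2, gen(ex)
* In the copies, overwrite graduate1 with university
replace graduate1 = university if ex
* Flag the copies that carry a non-missing university value
bys stkcd: gen tag= university==graduate1 & !missing(graduate1) & ex
* An original row gets 1 if a flagged copy shares its stkcd and graduate1
bys stkcd graduate1: egen wanted= max(tag)
* Remove the copies and the helper variables
drop if ex
drop ex tag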

Result:

Code:

. l, sepby(stkcd)

     +-----------------------------------------------------------------------+
     | stkcd       reptdt   name   university   invest~1   gradua~1   wanted |
     |-----------------------------------------------------------------------|
  1. |     2   2008-06-25      C                      AA     Boston        0 |
  2. |     2   2008-06-25      A                      BB        MIT        1 |
  3. |     2   2008-06-25      B      Harvard         DD   Stanford        0 |
  4. |     2   2008-06-25      G         UCLA         AA       UCLA        1 |
  5. |     2   2008-06-25      D                      AA       UCLA        1 |
  6. |     2   2008-06-25      E                      CC       UCSD        0 |
  7. |     2   2008-06-25      F          MIT         EE       Yale        0 |
     |-----------------------------------------------------------------------|
  8. |     3   2008-07-27      A                      BB        MIT        0 |
  9. |     3   2008-07-27      B       Boston         AA       UCLA        0 |
 10. |     3   2008-07-27      F   Vanderbilt         AA       UCLA        0 |
 11. |     3   2008-07-27      D                      AA       UCLA        0 |
 12. |     3   2008-07-27      G                      AA       UCLA        0 |
 13. |     3   2008-07-27      E                      AA       UCLA        0 |
 14. |     3   2008-07-27      C                      AA       UCLA        0 |
     |-----------------------------------------------------------------------|
 15. |     4   2008-07-27      A                      BB        MIT        1 |
 16. |     4   2008-07-27      G                      AA       UCLA        0 |
 17. |     4   2008-07-27      B      Harvard         AA       UCLA        0 |
 18. |     4   2008-07-27      C                      AA       UCLA        0 |
 19. |     4   2008-07-27      D                      AA       UCLA        0 |
 20. |     4   2008-07-27      E                      AA       UCLA        0 |
 21. |     4   2008-07-27      F          MIT         AA       UCLA        0 |
     +-----------------------------------------------------------------------+

.


Romalpa Akzo

Join Date: Oct 2017

Posts: 369
#8

22 Nov 2018, 02:21

Andrew's code does what Vinh needs. One small simplification: the line generating tag is not necessary. Instead, the two lines generating tag and wanted can be replaced by

Code:

bys stkcd graduate1: egen wanted = max(ex) if !missing(graduate1)
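
As a sketch of how that single line fits into the whole sequence, here it is run on Andrew's example data from the previous post. The -assert- lines are an addition for checking the flags against Andrew's listing and are not part of the suggestion.

```stata
* Andrew's example data from the previous post
clear
input int stkcd str10 reptdt str1 name str12 university str2 investor1 str12 graduate1
2 "2008-06-25" "A" ""           "BB" "MIT"
2 "2008-06-25" "B" "Harvard"    "DD" "Stanford"
2 "2008-06-25" "C" ""           "AA" "Boston"
2 "2008-06-25" "D" ""           "AA" "UCLA"
2 "2008-06-25" "E" ""           "CC" "UCSD"
2 "2008-06-25" "F" "MIT"        "EE" "Yale"
2 "2008-06-25" "G" "UCLA"       "AA" "UCLA"
3 "2008-07-27" "A" ""           "BB" "MIT"
3 "2008-07-27" "B" "Boston"     "AA" "UCLA"
3 "2008-07-27" "C" ""           "AA" "UCLA"
3 "2008-07-27" "D" ""           "AA" "UCLA"
3 "2008-07-27" "E" ""           "AA" "UCLA"
3 "2008-07-27" "F" "Vanderbilt" "AA" "UCLA"
3 "2008-07-27" "G" ""           "AA" "UCLA"
4 "2008-07-27" "A" ""           "BB" "MIT"
4 "2008-07-27" "B" "Harvard"    "AA" "UCLA"
4 "2008-07-27" "C" ""           "AA" "UCLA"
4 "2008-07-27" "D" ""           "AA" "UCLA"
4 "2008-07-27" "E" ""           "AA" "UCLA"
4 "2008-07-27" "F" "MIT"        "AA" "UCLA"
4 "2008-07-27" "G" ""           "AA" "UCLA"
end

* Stack university under graduate1 in the duplicated rows (ex == 1)
expand 2, gen(ex)
replace graduate1 = university if ex

* A (stkcd, graduate1) group containing at least one copy has max(ex) == 1,
* meaning the value also occurs in university for that stkcd
bys stkcd graduate1: egen wanted = max(ex) if !missing(graduate1)

* Keep only the original rows
drop if ex
drop ex

* Added check against Andrew's listing
assert wanted == (graduate1 == "MIT" | graduate1 == "UCLA") if stkcd == 2
assert wanted == 0 if stkcd == 3
assert wanted == (graduate1 == "MIT") if stkcd == 4
```

For stkcd 2 this flags the MIT and UCLA rows of graduate1, and for stkcd 4 only the MIT row, the same pattern as in Andrew's output.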

It may also be relevant that the information Vinh needs was already available in the code suggested in #4: it is held in the variable tag of that code.

Code:

expand 2, gen(ex)
replace graduate1 = university if ex
bys stkcd graduate1 (ex): gen tag = (ex[1] != ex[_N]) if !missing(graduate1)
* or use the -egen wanted = max(ex)- line shown above instead
drop if ex
drop ex
River Huang

Join Date: Mar 2016

Posts: 1908
#9

22 Nov 2018, 03:20

Dear Vinh, That's fine with me. I hope your problem has been solved.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Vinh Ng

Join Date: Oct 2017

Posts: 37
#10

22 Nov 2018, 04:40

Dear Andrew and Akzo,

Thank you very much for your help. That's exactly what I'd like to have.

Cheers,

Vinh