  • Peers of peers dataset

    Dear all,
    Using a dataset of individuals nested in schools, I am trying to create a dataset of peers of peers that meet the following condition: the primary school (ks2) peers of the secondary school (ks4) peers who attended a different primary school than the individual of interest.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(id ks2perf) str1(ks4_school_id ks2_school_id)
    1 14 "a" "b"
    2 11 "a" "b"
    3  9 "a" "c"
    4 17 "a" "c"
    5 22 "a" "c"
    6  1 "d" "c"
    7 18 "d" "b"
    end
    In this toy example, case 6 is a relevant peer of peers: 6 attended primary school with 3, 4 and 5, who in turn are the relevant secondary school peers of individuals 1 and 2.
    How do I create an extra column that indicates whether the peers-of-peers condition is met? I think this requires creating additional datasets with different variable names and then merging them, but my attempts do not seem to work.
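As a brute-force cross-check on the toy data, here is a minimal sketch that pairs every individual i with every individual k via -cross- and flags the pairs meeting the condition as stated: k went to a different secondary school than i, and someone in i's secondary school attended k's primary school, which must differ from i's own. It creates N x N rows, so it is only practical for small data. The variable names ending in _k, the flag peer_of_peer and the tempfile names are illustrative assumptions, not part of the thread.

```stata
* Toy data from the question
clear
input float(id ks2perf) str1(ks4_school_id ks2_school_id)
1 14 "a" "b"
2 11 "a" "b"
3  9 "a" "c"
4 17 "a" "c"
5 22 "a" "c"
6  1 "d" "c"
7 18 "d" "b"
end
tempfile cohort cells others
save `cohort'

* Every (secondary school, primary school) cell with at least one student;
* a match on this file means a bridging ks4 peer j exists
keep ks4_school_id ks2_school_id
duplicates drop
rename ks2_school_id ks2_school_id_k
save `cells'

* Renamed copy describing the candidate peer of peer k
use `cohort', clear
rename (id ks2perf ks4_school_id ks2_school_id) (id_k ks2perf_k ks4_school_id_k ks2_school_id_k)
save `others'

* All N x N pairs (i, k)
use `cohort', clear
cross using `others'

* Does i's secondary school contain someone from k's primary school?
merge m:1 ks4_school_id ks2_school_id_k using `cells', keep(master match)

* Flag: bridging cell exists, k's primary differs from i's, k's secondary differs from i's
gen byte peer_of_peer = _merge == 3 & ks2_school_id_k != ks2_school_id & ks4_school_id_k != ks4_school_id
drop _merge
```

Run it as a single do-file, since the tempfiles are deleted when the do-file ends. On the toy data, the rows with id 1 flag only id_k 6, matching the example above.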
    Any help is much appreciated, Nic
    Last edited by Nicola Pensiero; 28 Oct 2023, 10:45.

  • #2
    No, this cannot be done with -merge-s. You need to pair up many with many, and -merge- cannot properly do that. (There is a -merge m:m-, but it does not do what is needed here; in fact it does not do anything you will ever want to do. Forget I mentioned it.) You need -joinby-, which forms all pairwise combinations of observations within groups. You also have to do some dancing with variable names to make this work. I believe the following gives what you want:

    Code:
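    * Copy of the data renamed to describe the ks4 peer (id2), including the peer's own primary school
    * (if ks2_school_id kept its name, -joinby- would retain the master's value and lose the peer's)
    preserve
    rename (id ks2_school_id) (id2 ks2_school_id2)
    keep id2 ks4_school_id ks2_school_id2
    tempfile schools
    save `schools'
    
    * Pair each id with every ks4 peer who attended a different primary school
    restore
    joinby ks4_school_id using `schools'
    drop if id == id2 | ks2_school_id2 == ks2_school_id
    preserve
    
    * Second copy renamed to describe the peer of peer (id3)
    use `schools', clear
    rename (id2 ks4_school_id) (id3 ks4_school_orig)
    save `schools', replace
    
    * Pair each peer with its primary schoolmates who went to a different ks4 school than id
    restore
    joinby ks2_school_id2 using `schools'
    drop if inlist(id3, id, id2) | ks4_school_id == ks4_school_orig
    
    keep id id3
    rename id3 peer_of_peer_id
    duplicates drop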
    If you are interested, you can stop this code before the -keep id id3- command and look at the "path" through the peer "network" that establishes each of these peer-of-peer relationships. Many of these peers-of-peers achieve that status through multiple paths, hence the need for the -duplicates drop-. If you wish to bring back the original information about which schools each id attended and their value of ks2perf, you can now do that with -merge m:1 id- using the original data (m:1 rather than 1:1, because each id now appears once for every one of its peers of peers).
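Following that last suggestion, a minimal sketch on the toy data from #1: it rebuilds the peer-of-peer pairs (with the second join on the peer's primary school, stored here as ks2_school_id2), attaches each peer of peer's own ks2perf with -merge m:1-, averages it per id, and only then merges the original data back -1:1-, since at that point there is one row per id. The names ks2_school_id2, ks2perf_pp, n_pp and the tempfile names are illustrative assumptions.

```stata
* Toy data from #1
clear
input float(id ks2perf) str1(ks4_school_id ks2_school_id)
1 14 "a" "b"
2 11 "a" "b"
3  9 "a" "c"
4 17 "a" "c"
5 22 "a" "c"
6  1 "d" "c"
7 18 "d" "b"
end
tempfile original schools pp
save `original'

* Peer-of-peer pairs; the second join is on the peer's (id2's) primary school
rename (id ks2_school_id) (id2 ks2_school_id2)
keep id2 ks4_school_id ks2_school_id2
save `schools'
use `original', clear
joinby ks4_school_id using `schools'
drop if ks2_school_id2 == ks2_school_id
preserve
use `schools', clear
rename (id2 ks4_school_id) (id3 ks4_school_orig)
save `schools', replace
restore
joinby ks2_school_id2 using `schools'
drop if ks4_school_orig == ks4_school_id
keep id id3
rename id3 peer_of_peer_id
duplicates drop

* Attach each peer of peer's own ks2perf; many rows share a peer_of_peer_id, hence m:1
preserve
use `original', clear
keep id ks2perf
rename (id ks2perf) (peer_of_peer_id ks2perf_pp)
save `pp'
restore
merge m:1 peer_of_peer_id using `pp', keep(master match) nogenerate

* One row per id: mean ks2perf of the distinct peers of peers, and how many there are
collapse (mean) ks2perf_pp (count) n_pp = ks2perf_pp, by(id)

* Now one row per id, so the original information merges back 1:1
merge 1:1 id using `original', nogenerate
```

Individuals with no peers of peers would come back from the final merge with missing ks2perf_pp, which keeps them in the data rather than silently dropping them.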



    • #3
      Thanks, indeed I was running in circles with -merge-. I did not know about -joinby-; super useful. Thanks for providing the code, Clyde. Best, Nic



      • #4
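        With relatively large datasets of thousands of individuals, the peer datasets produced with -joinby- quickly become huge. I have used -collapse- to generate (hopefully) equivalent datasets at the level of school cells instead. Because the intermediate files hold cell means rather than individuals, the result has one row per individual, peer primary school and peer-of-peer secondary school.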
        Code:
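        use "xxx\cohort.dta", clear
        
        * Peer cells: mean ks2perf by secondary school x primary school
        rename ks2perf ks2perf_peers
        rename ks2_school_id ks2_school_id_peers
        collapse (mean) ks2perf_peers, by(ks4_school_id ks2_school_id_peers)
        save "xxx\peers.dta", replace
        
        * Peer-of-peer cells, keyed on the same name as the peers' primary school
        use "xxx\cohort.dta", clear
        rename ks4_school_id ks4_school_id_peersofpeers
        rename ks2_school_id ks2_school_id_peers
        rename ks2perf ks2perf_peersofpeers
        collapse (mean) ks2perf_peersofpeers, by(ks4_school_id_peersofpeers ks2_school_id_peers)
        
        save "xxx\peersofpeers.dta", replace
        
        
        use "xxx\cohort.dta", clear
        joinby ks4_school_id using "xxx\peers.dta"
        
        * keep only peers from a different primary school than the individual
        drop if ks2_school_id_peers==ks2_school_id
        
        * join on the peers' primary school, not the individual's own
        joinby ks2_school_id_peers using "xxx\peersofpeers.dta"
        
        * flag and keep peers of peers from a different secondary school
        g peersofpeers=1 if ks4_school_id!=ks4_school_id_peersofpeers
        
        keep if peersofpeers==1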
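Whether the collapsed version gives the same answer as the individual-level one depends on how the cell means are combined afterwards. A minimal sketch on the toy data, assuming the second join is on the peers' primary school: if each peer-of-peer cell also carries its head count, the count-weighted mean of the cell means equals the mean ks2perf over the distinct peers of peers, because each peer of peer falls in exactly one (secondary school, primary school) cell and each cell is reached at most once per individual. The _pp suffix, the counts n_peers and n_pp, and the tempfile names are illustrative assumptions.

```stata
* Toy data from #1
clear
input float(id ks2perf) str1(ks4_school_id ks2_school_id)
1 14 "a" "b"
2 11 "a" "b"
3  9 "a" "c"
4 17 "a" "c"
5 22 "a" "c"
6  1 "d" "c"
7 18 "d" "b"
end
tempfile cohort peers pp
save `cohort'

* Peer cells: mean and head count by secondary school x primary school
rename (ks2_school_id ks2perf) (ks2_school_id_peers ks2perf_peers)
collapse (mean) ks2perf_peers (count) n_peers = ks2perf_peers, by(ks4_school_id ks2_school_id_peers)
save `peers'

* Peer-of-peer cells, keyed on the same primary-school name as the peer cells
use `cohort', clear
rename (ks4_school_id ks2_school_id ks2perf) (ks4_school_id_pp ks2_school_id_peers ks2perf_pp)
collapse (mean) ks2perf_pp (count) n_pp = ks2perf_pp, by(ks4_school_id_pp ks2_school_id_peers)
save `pp'

* Individual -> peer cells (other primary) -> peer-of-peer cells (other secondary)
use `cohort', clear
joinby ks4_school_id using `peers'
drop if ks2_school_id_peers == ks2_school_id
joinby ks2_school_id_peers using `pp'
keep if ks4_school_id_pp != ks4_school_id

* Weighting each cell mean by its head count recovers the person-level mean
collapse (mean) ks2perf_pp [fweight = n_pp], by(id)
```

On the toy data this gives 1 for individual 1 (case 6 alone) and 16 for individual 7 (the mean of 3, 4 and 5), the same values as averaging over the individual peers of peers.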

