How to link data on friends with ego's data

Lara Rosa

Join Date: Dec 2016

Posts: 6
#1

23 Feb 2017, 06:00

Dear all,

I am working with a wonderful dataset, which includes data on both ego and his/her top 5 friends. You are supposed to link the data using the ID numbers given in the 'bestfriend1', 'bestfriend2', etc. variables. (Ego nominates 5 friends in class; most of them have also filled in the questionnaire.)
I tried to use merge to include the friends' data in the main dataset, but it didn't work: "not able to uniquely identify cases".
There is probably an easy solution to this problem. Unfortunately I am struggling to find it.
Could you maybe help me?

Thanks a lot,

Lara
Tags: merge, network data
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

23 Feb 2017, 06:26

Hello Lara,

Please present a display of the data, as recommended in the FAQ.

This is the best way to improve chances of getting the solution for your query.

You may use the CODE delimiters or install the SSC dataex. Thanks,

Best regards,

Marcos
Lara Rosa

Join Date: Dec 2016

Posts: 6
#3

23 Feb 2017, 06:32

Dear Marcos,

I am sorry, I hope this helps. (There is more information available, of course, but hopefully this gives an idea of the data.)

youth_id  educ  age  friend1  friend2
566000    4     14   566012   566021
566601    3     15   566000   576222

Best,
Lara
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#4

23 Feb 2017, 08:49

That is a good start, Lara, but I don't understand what you actually want. Also, I still can't tell the "wonderful" dataset apart from the other one.

If you want to use merge, please present a short display (or scheme) of the datasets to be merged.

That said, you may need to -reshape long- first. In any case, you do have an ID number, and it can (potentially) be used to "link" them (is this "merge" or "append"?).

Best regards,

Marcos
Lara Rosa

Join Date: Dec 2016

Posts: 6
#5

23 Feb 2017, 12:08

I am sorry for the confusion. I have already merged the data on friends into the main dataset (so I now have the variables friend1, friend2, etc. in the main dataset). What I would like to do now is "move" the information I have on each friend (which I have, because I have their ID number) to ego. So, in the end, I would like to have something like this:

youth_id  educ  age  friend1  friend2  educ_friend1  age_friend1  educ_friend2  age_friend2
566000    4     14   566012   566021   3             15
566012    3     15   566000   576222

Because person 566000 said that person 566012 was his friend, the data of this friend (566012) is used to generate the variables educ_friend1 and age_friend1.
I only use the friends nominated by ego (so at this point I do not care about whether the nomination is reciprocated).
Hopefully this makes it clearer? Thanks in advance!

Best,
Lara
Clyde Schechter

Join Date: Apr 2014

Posts: 30089
#6

23 Feb 2017, 13:19

So this is basically a -reshape- operation. There are two complications. The first is that the variable names are not particularly friendly to that command. The other is that you need to carry into the new observations information about who they were linked to in the original data. The following code, I believe, will work:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(youth_id educ age friend1 friend2 educ_friend1 age_friend1 educ_friend2 age_friend2)
566000 4 14 566012 566021 3 15 2 16
123456 2 17 121011 143719 1 18 3 17
end

// SIMPLIFY VARIABLE NAMES FOR RESHAPING
rename youth_id id0
rename friend* id*
rename *_friend# *#
rename (educ age) =0

// RECORD THE ASSOCIATION OF THE THREE IDS
egen triad = concat(id0 id1 id2), punct(" ")

// RESHAPE LONG
gen long obs_no = _n
reshape long id educ age, i(obs_no) j(_j)

// NOW CREATE VARIABLES LINKING EACH PERSON TO THE OTHER TWO IN THE TRIAD
replace triad = trim(itrim(subinstr(triad, string(id), "", .)))
split triad, gen(friend) destring
drop triad _j

So the first obstacle is overcome with some -rename- commands. The second is dealt with by creating a variable, triad, that includes all three IDs from the original observation. Then, after the -reshape-, we just delete the self-ID from triad, clean it up, and split it into two parts.
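
To illustrate that last step in isolation, here is a minimal sketch on a single made-up observation. The one-row setup is only for illustration; the ID values are borrowed from the example data. It shows how removing the self-ID from triad and collapsing the leftover blanks leaves the other two IDs, ready for -split-:

```stata
clear
set obs 1
gen triad = "566000 566012 566021"   // all three IDs of the original observation
gen long id = 566012                 // the person this (reshaped) observation describes

// remove the self-ID, then collapse repeated blanks and trim the ends
replace triad = trim(itrim(subinstr(triad, string(id), "", .)))

// split what is left into numeric variables friend1 and friend2
split triad, gen(friend) destring
list, noobs
```

Note that -subinstr()- removes every occurrence of the ID as a substring, so this sketch assumes that all IDs have the same number of digits, as they appear to have in the example data.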

Added: In the future, please use the -dataex- command to show example data, as I have done here. The kind of HTML table you used is extremely difficult to import into Stata. It took me longer to create a data set to test my code on than to write and test the code. You can install the -dataex- command by running -ssc install dataex-. Then run -help dataex- to read the simple instructions for using it. Please help those who want to help you, by always using -dataex- to show example data.

Last edited by Clyde Schechter; 23 Feb 2017, 13:22.

Eric Haavind-Berman

Join Date: Aug 2015
Posts: 29

23 Feb 2017, 13:47

I had a somewhat different interpretation of the problem. Maybe I have it wrong, but it seems like the data for each observation include an ID, education, age, and the IDs of friends. Something like this:

Code:

clear
input float(youth_id educ age friend1 friend2 educ_friend1 age_friend1 educ_friend2 age_friend2)
566000 4 14 566012 576222 . . . .
566012 3 15 566000 576222 . . . .
576222 3 15 566000 576002 . . . .
end

and Lara is interested in populating educ_friend1, age_friend1, educ_friend2, etc.

If this is the case, the code below should work, although it seems like there should be a more elegant or efficient solution (which I would definitely be interested in seeing if anyone knows one).

Code:

// get all values of youth_id
qui levelsof youth_id, local(youths)

// foreach friend number (change 2 to however many you have in the data)
forvalues f_num = 1/2 {
    
    // for each youth we keep their data on educ and age and then merge with
    // whatever friend number we are on from above
    foreach y_id in `youths' {
        
        // save the state of the data
        preserve
        
        // keep only the youth with the specific id and rename the information
        // for the data with the new names they will be matched to
        keep if youth_id == `y_id'
        keep youth_id educ age
        rename (youth_id educ age) (friend`f_num' educ_friend`f_num' age_friend`f_num')
        
        // save a temporary file
        tempfile temp_`f_num'_`y_id'
        qui save `temp_`f_num'_`y_id''
        
        // get saved state of data back
        restore
        
        // merge with the temporary file
        qui merge m:1 friend`f_num' using `temp_`f_num'_`y_id'', update
        drop _merge
        
    }
}
// because we update with merge, if someone is not a friend of anyone, an
// observation will be created with no youth_id, so we can discard those
drop if youth_id == .


Clyde Schechter

Join Date: Apr 2014
Posts: 30089

23 Feb 2017, 14:18

So, if the problem is as Eric thinks, and not as I understood it, I think the solution could be a little simpler:

Code:

clear
input float(youth_id educ age friend1 friend2 educ_friend1 age_friend1 educ_friend2 age_friend2)
566000 4 14 566012 576222 . . . .
566012 3 15 566000 576222 . . . .
576222 3 15 566000 576002 . . . .
end

//    GET RID OF VARIABLES WITH ALL MISSING VALUES
drop *_friend1 *_friend2

//    CREATE A FILE OF IDS, EDUC, AND AGE FOR EVERYONE
preserve
keep youth_id educ age
rename youth_id link
tempfile everyone
isid link, sort
save `everyone'

restore
//    RENAME EDUC & AGE TO AVOID NAME CONFLICTS
rename educ educ0
rename age age0

//    NOW MERGE IN THE FRIENDS' DATA, ONE AT A TIME
forvalues i = 1/2 {
    gen link = friend`i'
    merge m:1 link using `everyone', keep(match master) nogenerate
    rename educ friend`i'_educ
    rename age friend`i'_age
    drop link
}
rename *0 *

Last edited by Clyde Schechter; 23 Feb 2017, 14:21.


Robert Picard

Join Date: Mar 2014
Posts: 1536

23 Feb 2017, 14:50

Here's another implementation of Clyde's approach. I think it's better to use clonevar when creating a copy of the friend's identifier to avoid variable type problems.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(youth_id educ age friend1 friend2)
566000 4 14 566012 576222
566012 3 15 566000 576222
576222 5 16 566000 576002
end

preserve
keep youth_id educ age
rename (youth_id educ age) (youth_id0 educ0 age0)
save "match2use.dta", replace
restore

foreach v of varlist friend1 friend2 {
    clonevar youth_id0 = `v'
    merge m:1 youth_id0 using "match2use.dta", keep(master match) nogen
    rename (educ0 age0) (educ_`v' age_`v')
    drop youth_id0
}
isid youth_id, sort
list
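
The variable type problem that clonevar avoids can be seen in a minimal sketch. It assumes that Stata's default storage type for new variables is float (that is, -set type- has not been changed) and uses a made-up 9-digit identifier, which is longer than the IDs in this thread:

```stata
clear
set obs 1
gen long friend1 = 123456789      // hypothetical 9-digit identifier
gen link_gen = friend1            // stored as float by default, so the value is rounded
clonevar link_clone = friend1     // copies the storage type (long) along with the value
format link_gen link_clone %12.0f
list, noobs
```

With the 6-digit IDs shown in this thread, a float holds the values exactly, so both ways of copying would give the same matches; the difference only starts to matter for longer identifiers.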

Note that rangestat (from SSC) can also be used to look up values based on identifiers, provided they are numeric. It goes something like this:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(youth_id educ age friend1 friend2)
566000 4 14 566012 576222
566012 3 15 566000 576222
576222 5 16 566000 576002
end

isid youth_id, sort
rangestat (min) educ_f1=educ age_f1=age, interval(youth_id friend1 friend1)
rangestat (min) educ_f2=educ age_f2=age, interval(youth_id friend2 friend2)
list
