  • How to link data on friends with ego's data

    Dear all,

    I am working with a wonderful dataset that includes data both on ego and on his or her top five friends. The data are meant to be linked through the ID numbers stored in the variables 'bestfriend1', 'bestfriend2', etc. (Ego nominates five friends in class, and most of them have also filled in the questionnaire.)
    I tried to use -merge- to add the friends' data to the main dataset, but it didn't work: "not able to uniquely identify cases".
    There is probably an easy solution to this problem, but unfortunately I am struggling to find it.
    Could you help me?

    Thanks a lot,

    Lara
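
    The error message suggests that the merge key was not unique on one side of the merge, and -isid- is a quick way to check that. The sketch below uses invented data in which two egos nominate the same respondent: youth_id identifies observations uniquely, but friend1 does not, so a 1:1 merge on the friend id cannot work. The friend id is unique only in a file with one observation per respondent, which is the "1" side that -merge m:1- expects.

    ```stata
    * Hypothetical data: egos 566000 and 566021 both nominate respondent 566012
    clear
    input long(youth_id friend1) byte(educ age)
    566000 566012 4 14
    566021 566012 2 16
    566012 566000 3 15
    end

    * Each respondent appears once, so youth_id is a unique key
    isid youth_id

    * Nominated ids repeat across egos, so friend1 is not unique;
    * -isid- exits with an error, which -capture- records in _rc
    capture isid friend1
    display "isid friend1 returned r(" _rc ")"
    ```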

  • #2
    Hello Lara,

    Please present a display of the data, as recommended in the FAQ.

    This is the best way to improve your chances of getting a solution to your query.

    You may use CODE delimiters or install -dataex- from SSC. Thanks,
    Best regards,

    Marcos



    • #3
      Dear Marcos,

      I am sorry; I hope this helps. (There is more information available, of course, but hopefully this gives an idea of the data.)
      youth_id  educ  age  friend1  friend_2
      566000       4   14   566012    566021
      566601       3   15   566000    576222
      Best,
      Lara
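
      For reference, the two rows above can be typed as a Stata -input- block, the format that -dataex- writes. The storage types (long for the ids, byte for educ and age) are an assumption; -dataex- chooses types from the data itself, so running something like -dataex youth_id educ age friend1 friend_2- on the real dataset may show different ones.

      ```stata
      * Lara's two displayed rows as an -input- block; storage types are assumed
      clear
      input long(youth_id) byte(educ age) long(friend1 friend_2)
      566000 4 14 566012 566021
      566601 3 15 566000 576222
      end
      list
      ```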



      • #4
        That is a good start, Lara, but I still don't understand what you want to do. I also don't know how to tell the "wonderful" data set apart from the other one.

        If you want to use -merge-, you could present a short display (or scheme) of the datasets to be merged.

        That said, you may need to -reshape long- first. In any case, you do have an ID number, and it can potentially be used to "link" the datasets (do you mean -merge- or -append-?).
        Best regards,

        Marcos
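
        A minimal sketch of the -reshape long- route, assuming one observation per respondent with friend ids in friend1 and friend2 (the data are invented; with five nominations, add friend3 to friend5): reshape to one observation per ego-friend pair, attach each friend's educ and age with -merge m:1- against a lookup file keyed on the friend id, then reshape back to wide. The names lookup, rank, educ_friend and age_friend are illustrative choices, not names from Lara's data.

        ```stata
        * Hypothetical data: one observation per respondent, friend ids in wide form
        clear
        input long(youth_id friend1 friend2) byte(educ age)
        566000 566012 576222 4 14
        566012 566000 576222 3 15
        576222 566000 576002 3 15
        end

        * Lookup file: one observation per respondent, keyed on the friend id
        preserve
        keep youth_id educ age
        rename (youth_id educ age) (friend educ_friend age_friend)
        tempfile lookup
        save `lookup'
        restore

        * One observation per ego-friend pair
        reshape long friend, i(youth_id) j(rank)

        * Attach each friend's attributes; ids not found among respondents stay missing
        merge m:1 friend using `lookup', keep(master match) nogenerate

        * Back to one observation per ego: friend1, educ_friend1, age_friend1, ...
        reshape wide friend educ_friend age_friend, i(youth_id) j(rank)
        list
        ```

        Friend ids that do not belong to any respondent (576002 here) are kept, with missing educ_friend and age_friend values.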



        • #5
          I am sorry for the confusion. I have already merged the data on friends into the main dataset (so I now have the variables friend1, friend2, etc. in the main dataset). What I would like now is to "move" the information I have on each friend (which I can do, because I have their ID number) to ego. In the end I would like to have something like this:
          youth_id  educ  age  friend1  friend2  educ_friend1  age_friend1  educ_friend2  age_friend2
          566000       4   14   566012   566021             3           15
          566012       3   15   566000   576222
          Because person 566000 named person 566012 as a friend, the data of this friend (566012) are used to generate the variables educ_friend1 and age_friend1.
          I only use the friends nominated by ego (so at this point I do not care whether the nomination is reciprocated).
          I hope this makes it clearer. Thanks in advance!

          Best,
          Lara



          • #6
            So this is basically a -reshape- operation, with two complications. The first is that the variable names are not particularly friendly to that command. The second is that you need to carry into the new observations information about whom they were linked to in the original data. I believe the following code will work:

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float(youth_id educ age friend1 friend2 educ_friend1 age_friend1 educ_friend2 age_friend2)
            566000 4 14 566012 566021 3 15 2 16
            123456 2 17 121011 143719 1 18 3 17
            end
            
            //    SIMPLIFY VARIABLE NAMES FOR RESHAPING
            rename youth_id id0
            rename friend* id*
            rename *_friend# *#
            rename (educ age) =0
            
            //    RECORD THE ASSOCIATION OF THE THREE IDS
            egen triad = concat(id0 id1 id2), punct(" ")
            
            //    RESHAPE LONG
            gen long obs_no = _n
            reshape long id educ age, i(obs_no) j(_j)
            
            //    NOW CREATE VARIABLES LINKING EACH PERSON TO THE OTHER TWO IN THE TRIAD
            replace triad = trim(itrim(subinstr(triad, string(id), "", .)))
            split triad, gen(friend) destring
            drop triad _j
            So the first obstacle is overcome with some -rename- commands. The second is dealt with by creating a variable, triad, that contains all three ids from each original observation. Then, after the -reshape-, we just delete the self-id from triad, clean it up, and split it into two parts.
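
            The self-removal step can be checked on a single string. The sketch below uses made-up ids. It also shows one caveat of -subinstr()-: it replaces substrings, so if ids could differ in length, deleting "12" would also damage "512". -subinword()- replaces only whole blank-separated words. With the fixed-length, six-digit ids in this thread, the two functions give the same result.

            ```stata
            * Hypothetical triad string; ego's own id is 566012
            local triad "566000 566012 566021"

            * Deleting ego's id and squeezing blanks leaves the other two ids
            local others = trim(itrim(subinstr("`triad'", "566012", "", .)))
            display "`others'"

            * With ids of unequal length, subinstr() also hits substrings of other ids
            local bad = trim(itrim(subinstr("12 512 7", "12", "", .)))

            * subinword() only replaces whole blank-separated words
            local good = trim(itrim(subinword("12 512 7", "12", "", .)))
            display "subinstr: `bad'   subinword: `good'"
            ```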

            Added: In the future, please use the -dataex- command to show example data, as I have done here. The kind of HTML table you used is extremely difficult to import into Stata. It took me longer to create a data set to test my code on than to write and test the code. You can install the -dataex- command by running -ssc install dataex-. Then run -help dataex- to read the simple instructions for using it. Please help those who want to help you, by always using -dataex- to show example data.
            Last edited by Clyde Schechter; 23 Feb 2017, 13:22.



            • #7
              I had a somewhat different interpretation of the problem. Maybe I have it wrong, but it seems that the data for each observation include an id, education, age, and the ids of friends, something like this:
              Code:
              clear
              input float(youth_id educ age friend1 friend2 educ_friend1 age_friend1 educ_friend2 age_friend2)
              566000 4 14 566012 576222 . . . .
              566012 3 15 566000 576222 . . . .
              576222 3 15 566000 576002 . . . .
              end
              and Lara wants to populate educ_friend1, age_friend1, educ_friend2, etc.

              If this is the case, the code below should work, although it seems there should be a more elegant or efficient solution (which I would definitely be interested in seeing, if anyone knows one).

              Code:
              // get all values of youth_id
              qui levelsof youth_id, local(youths)
              
              // foreach friend number (change 2 to however many you have in the data)
              forvalues f_num = 1/2 {
                  
                  // for each youth we keep their data on educ and age and then merge with
                  // whatever friend number we are on from above
                  foreach y_id in `youths' {
                      
                      // save the state of the data
                      preserve
                      
                      // keep only the youth with this id, and rename youth_id, educ and age
                      // to the names they must have to match friend`f_num' in the main data
                      keep if youth_id == `y_id'
                      keep youth_id educ age
                      rename (youth_id educ age) (friend`f_num' educ_friend`f_num' age_friend`f_num')
                      
                      // save a temporary file
                      tempfile temp_`f_num'_`y_id'
                      qui save `temp_`f_num'_`y_id''
                      
                      // get saved state of data back
                      restore
                      
                      // merge with the temporary file
                      qui merge m:1 friend`f_num' using `temp_`f_num'_`y_id'', update
                      drop _merge
                      
                  }
              }
              // because we update with merge, if someone is not a friend of anyone, an
              // observation will be created with no youth_id, so we can discard those
              drop if youth_id == .



              • #8
                So, if the problem is as Eric thinks, and not as I understood it, I think the solution could be a little simpler:

                Code:
                clear
                input float(youth_id educ age friend1 friend2 educ_friend1 age_friend1 educ_friend2 age_friend2)
                566000 4 14 566012 576222 . . . .
                566012 3 15 566000 576222 . . . .
                576222 3 15 566000 576002 . . . .
                end
                
                //    GET RID OF VARIABLES WITH ALL MISSING VALUES
                drop *_friend1 *_friend2
                
                //    CREATE A FILE OF IDS, EDUC, AND AGE FOR EVERYONE
                preserve
                keep youth_id educ age
                rename youth_id link
                tempfile everyone
                isid link, sort
                save `everyone'
                
                restore
                //    RENAME EDUC & AGE TO AVOID NAME CONFLICTS
                rename educ educ0
                rename age age0
                
                //    NOW MERGE IN THE FRIENDS' DATA, ONE AT A TIME
                forvalues i = 1/2 {
                    gen link = friend`i'
                    merge m:1 link using `everyone', keep(match master) nogenerate
                    rename educ friend`i'_educ
                    rename age friend`i'_age
                    drop link
                }
                rename *0 *
                Last edited by Clyde Schechter; 23 Feb 2017, 14:21.
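
                The same loop extends to five nominations and to any list of friend attributes. This sketch assumes the friend ids are stored in bestfriend1 to bestfriend5 (the names from Lara's first post) and that the attributes to copy are listed in a local macro called attributes; the data, the macro name, and the temporary _f suffix are all hypothetical. It uses -clonevar- rather than -gen- for the link variable, so the link keeps the id's storage type.

                ```stata
                * Hypothetical data: ids, values and the bestfriend# names are illustrative
                clear
                input long(youth_id bestfriend1 bestfriend2 bestfriend3 bestfriend4 bestfriend5) byte(educ age)
                566000 566012 576222      .      .      . 4 14
                566012 566000 576222 566021      .      . 3 15
                576222 566000 566012      .      .      . 3 15
                566021 566012      .      .      .      . 2 16
                end

                * Friend variables to copy (hypothetical list)
                local attributes educ age

                * Lookup file: one observation per respondent, keyed on link, with the
                * attributes renamed (suffix _f) so they do not collide with ego's own
                preserve
                keep youth_id `attributes'
                rename youth_id link
                foreach a of local attributes {
                    rename `a' `a'_f
                }
                tempfile everyone
                save `everyone'
                restore

                forvalues i = 1/5 {
                    * clonevar gives link the same storage type as the id
                    clonevar link = bestfriend`i'
                    merge m:1 link using `everyone', keep(master match) nogenerate
                    foreach a of local attributes {
                        rename `a'_f `a'_friend`i'
                    }
                    drop link
                }
                list youth_id educ_friend* age_friend*
                ```

                Missing nominations simply produce missing friend attributes, because a missing link matches nothing in the lookup file.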



                • #9
                  Here's another implementation of Clyde's approach. I think it's better to use clonevar when creating a copy of the friend's identifier to avoid variable type problems.

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input float(youth_id educ age friend1 friend2)
                  566000 4 14 566012 576222
                  566012 3 15 566000 576222
                  576222 5 16 566000 576002
                  end
                  
                  preserve
                  keep youth_id educ age
                  rename (youth_id educ age) (youth_id0 educ0 age0)
                  save "match2use.dta", replace
                  restore
                  
                  foreach v of varlist friend1 friend2 {
                      clonevar youth_id0 = `v'
                      merge m:1 youth_id0 using "match2use.dta", keep(master match) nogen
                      rename (educ0 age0) (educ_`v' age_`v')
                      drop youth_id0
                  }
                  isid youth_id, sort
                  list
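
                  The type problem mentioned above arises when ids are longer than a float can store exactly: a float holds every integer exactly only up to 16,777,216, so -generate- with the default float type rounds a nine-digit id, while -clonevar- copies the original storage type. The sketch assumes the default -set type float- and uses an invented id; the six-digit ids in this thread fit in a float, so the problem would only show up with longer ids.

                  ```stata
                  * Invented nine-digit id; assumes the default -set type float-
                  clear
                  set obs 1
                  generate long friend_id = 123456789

                  * -generate- without a type creates a float, which cannot hold this id exactly
                  generate link_gen = friend_id

                  * -clonevar- copies the storage type (long), so the id is preserved
                  clonevar link_clone = friend_id

                  format friend_id link_gen link_clone %12.0f
                  list
                  ```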
                  Note that -rangestat- (from SSC) can also be used to look up values based on identifiers, provided they are numeric. It goes something like this:

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input float(youth_id educ age friend1 friend2)
                  566000 4 14 566012 576222
                  566012 3 15 566000 576222
                  576222 5 16 566000 576002
                  end
                  
                  isid youth_id, sort
                  rangestat (min) educ_f1=educ age_f1=age, interval(youth_id friend1 friend1)
                  rangestat (min) educ_f2=educ age_f2=age, interval(youth_id friend2 friend2)
                  list
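
                  In Stata 16 and later, frames offer another lookup route that needs no temporary file: copy the respondents' data into a second frame, link each friend id to it with -frlink m:1-, and copy the values across with -frget-. This is a sketch assuming Stata 16 or newer; the frame name people and the link variables lnk1 and lnk2 are hypothetical.

                  ```stata
                  * Requires Stata 16 or newer; data are invented
                  clear
                  input long(youth_id friend1 friend2) byte(educ age)
                  566000 566012 576222 4 14
                  566012 566000 576222 3 15
                  576222 566000 576002 5 16
                  end

                  * Copy the lookup variables into a second frame (dropped first if it exists)
                  capture frame drop people
                  frame put youth_id educ age, into(people)

                  forvalues i = 1/2 {
                      * Link the i-th friend id to youth_id in frame people (unique there)
                      frlink m:1 friend`i', frame(people youth_id) generate(lnk`i')
                      * Copy the friend's values; unmatched ids give missing values
                      frget educ_friend`i' = educ, from(lnk`i')
                      frget age_friend`i' = age, from(lnk`i')
                      drop lnk`i'
                  }
                  list
                  ```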

