  • Identifying social interactions through multiple activities

    Hi everyone,

    I have a dataset of individuals at different universities, each enrolled in one of five degrees: Biology, Economics, Physics, Psychology and Sociology. Each individual can participate in roughly 30 university-level activities, ranging from sports to entrepreneurship, volunteering and the arts. The main idea is that, within the same university, an individual who studies Biology can be on the same basketball team, for example, as an individual who studies Sociology. I would like to identify each individual's indirect connections outside their bachelor's degree, that is, the activity mates of my degree mates who are not my own activity mates. To do this, I need to find the activities in which I don't participate but my degree mates do, and then pull out the information about the participants in those activities who study other bachelor's degrees. I provide an example to illustrate this case:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id int university_code str10 degree str16 activity double activity_code float female
     1 9 "biology"    "volleyball"       26 1
     1 9 "biology"    "baseball"         17 1
     1 9 "biology"    "basketball"       18 1
     2 9 "biology"    "chorus"           15 0
     3 9 "biology"    "volleyball"       26 1
     3 9 "biology"    "baseball"         17 1
     4 9 "biology"    "soccer"           22 0
     4 9 "biology"    "football"         20 0
     4 9 "biology"    "basketball"       18 0
     4 9 "biology"    "track"            25 0
     5 9 "biology"    "math"             11 1
     5 9 "biology"    "baseball"         17 1
     6 9 "economics"  "baseball"         17 1
     6 9 "economics"  "basketball"       18 1
     6 9 "economics"  "track"            25 1
     6 9 "economics"  "volleyball"       26 1
     7 9 "economics"  "football"         20 0
     7 9 "economics"  "basketball"       18 0
     7 9 "economics"  "soccer"           22 0
     8 9 "economics"  "drama"             8 1
     8 9 "economics"  "entrepreneurship" 14 1
     8 9 "economics"  "basketball"       18 1
     8 9 "economics"  "track"            25 1
     9 9 "economics"  "volleyball"       26 1
    10 9 "economics"  "math"             11 1
    10 9 "economics"  "band"             13 1
    10 9 "economics"  "volunteering"     30 1
    11 9 "physics"    "baseball"         17 0
    11 9 "physics"    "basketball"       18 0
    12 9 "physics"    "baseball"         17 0
    13 9 "physics"    "basketball"       18 1
    13 9 "physics"    "volleyball"       26 1
    13 9 "physics"    "baseball"         17 1
    14 9 "physics"    "volleyball"       26 1
    14 9 "physics"    "spanish"           4 1
    14 9 "physics"    "chess"            31 1
    15 9 "physics"    "computer"          6 1
    15 9 "physics"    "volleyball"       26 1
    15 9 "physics"    "basketball"       18 1
    15 9 "physics"    "soccer"           22 1
    15 9 "physics"    "baseball"         17 1
    16 9 "physics"    "entrepreneurship" 14 1
    17 9 "physics"    "chorus"           15 1
    17 9 "physics"    "chess"            31 1
    17 9 "physics"    "french"            1 1
    18 9 "physics"    "baseball"         17 0
    18 9 "physics"    "football"         20 0
    18 9 "physics"    "chess"            31 0
    18 9 "physics"    "basketball"       18 0
    19 9 "physics"    "volleyball"       26 1
    19 9 "physics"    "basketball"       18 1
    19 9 "physics"    "digitallearning"  29 1
    19 9 "physics"    "chess"            31 1
    19 9 "physics"    "baseball"         17 1
    20 9 "physics"    "baseball"         17 0
    21 9 "psychology" "chess"            31 1
    21 9 "psychology" "entrepreneurship" 14 1
    22 9 "psychology" "swimming"         23 1
    22 9 "psychology" "volleyball"       26 1
    22 9 "psychology" "entrepreneurship" 14 1
    22 9 "psychology" "baseball"         17 1
    23 9 "psychology" "football"         20 0
    23 9 "psychology" "wrestling"        27 0
    23 9 "psychology" "baseball"         17 0
    24 9 "psychology" "baseball"         17 1
    24 9 "psychology" "drama"             8 1
    24 9 "psychology" "band"             13 1
    24 9 "psychology" "volleyball"       26 1
    25 9 "psychology" "track"            25 0
    25 9 "psychology" "football"         20 0
    25 9 "psychology" "wrestling"        27 0
    25 9 "psychology" "basketball"       18 0
    26 9 "sociology"  "volleyball"       26 1
    26 9 "sociology"  "entrepreneurship" 14 1
    27 9 "sociology"  "track"            25 1
    27 9 "sociology"  "volleyball"       26 1
    27 9 "sociology"  "math"             11 1
    28 9 "sociology"  "baseball"         17 1
    28 9 "sociology"  "volleyball"       26 1
    29 9 "sociology"  "math"             11 1
    30 9 "sociology"  "football"         20 0
    30 9 "sociology"  "baseball"         17 0
    30 9 "sociology"  "basketball"       18 0
    30 9 "sociology"  "soccer"           22 0
    31 9 "sociology"  "entrepreneurship" 14 1
    31 9 "sociology"  "baseball"         17 1
    31 9 "sociology"  "volleyball"       26 1
    32 9 "sociology"  "track"            25 0
    32 9 "sociology"  "volunteering"     30 0
    32 9 "sociology"  "baseball"         17 0
    32 9 "sociology"  "fieldhockey"      19 0
    32 9 "sociology"  "football"         20 0
    32 9 "sociology"  "newspaper"        28 0
    32 9 "sociology"  "basketball"       18 0
    32 9 "sociology"  "tennis"           24 0
    32 9 "sociology"  "swimming"         23 0
    32 9 "sociology"  "entrepreneurship" 14 0
    32 9 "sociology"  "debate"            7 0
    32 9 "sociology"  "french"            1 0
    32 9 "sociology"  "icehockey"        21 0
    end

    I have the individuals' ids, the university code (9 in this example), a string variable with the name of the bachelor's degree, a string variable with the name of each activity the individual takes part in, the code assigned to each activity, and a gender indicator, female, equal to 1 for women and 0 otherwise.
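
    Each row should be one individual-activity pair, with degree and female constant within id. Here is a minimal sketch of how that could be checked, assuming the example above is in memory and that id identifies individuals across the whole dataset (these checks are an illustration, not something I have run on the full data):

    Code:
    * Assumption: id and activity together uniquely identify rows
    isid id activity
    * degree and female should not vary within an individual
    by id (degree), sort: assert degree[1] == degree[_N]
    by id (female), sort: assert female[1] == female[_N]

    Each command stops with an error if its condition fails. The last two re-sort the data.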

    What I would like to compute, but have not yet figured out how, is the average gender (the share of women) of individuals in the same university and outside my bachelor's degree with whom others in my degree are in direct contact through activities. For example, individual 1, who studies Biology at university 9, participates in volleyball, baseball and basketball. The question I am trying to answer is: what is the average gender of the individuals from other bachelor's degrees who are in contact with at least one of individual 1's degree mates, but not with individual 1? First, within Biology, I look at the activities that the mates of individual 1 take part in and individual 1 does not: chorus, soccer, football, track and math. Outside Biology, the average of female in chorus, soccer, football, track and math is, respectively, 1, 0.33, 0, 0.6 and 1. I would like to obtain this information for each individual.
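
    As a check on these figures, here is a minimal sketch that computes the averages for individual 1 only, assuming the example above is in memory with one row per individual-activity pair. The five activity names are typed in by hand, so this is not a general solution:

    Code:
    preserve
    * Participants from other degrees in the five activities identified by hand
    keep if degree != "biology" & ///
        inlist(activity, "chorus", "soccer", "football", "track", "math")
    * With a 0/1 indicator, the mean is the share of women
    collapse (mean) avg_female = female, by(activity)
    list, noobs
    restore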

    Any feedback and/or suggestions will be highly appreciated. Thanks a lot in advance for your time.

    Best regards,
    Daniel

  • #2
    This is fairly complicated. There may be a simpler way to do this than the code that I show below, but it has eluded me so far.
    Code:
    * Copy of the data with id and activity renamed, used to pair each person with degree mates
    tempfile original
    save `original'
    keep id university_code degree activity
    rename (id activity) =_mate
    tempfile copy
    save `copy'
    
    * Number of participants by university, degree, activity and gender
    use `original', clear
    contract university_code degree activity female
    rename degree degree_mate
    tempfile counts
    save `counts'
    
    * Pair each person's activities with those of everyone in the same degree
    use `original', clear
    keep id university_code degree activity
    joinby university_code degree using `copy'
    keep id university_code degree activity activity_mate
    duplicates drop
    rename activity activity_self
    gen `c(obs_t)' obs_no = _n
    reshape long activity, i(obs_no) j(who) string
    drop obs_no
    duplicates drop
    * Keep only activities that degree mates take part in but the person does not
    by id activity (who), sort: keep if _N == 1 & who == "_mate"
    drop who
    
    * Share of women among participants from other degrees in each such activity
    joinby university_code activity using `counts'
    by id university_code activity, sort: egen female_total ///
        = total(cond(female & degree != degree_mate, _freq, .))
    by id university_code activity: egen male_total ///
        = total(cond(!female & degree != degree_mate, _freq, .))
    gen avg_female = (female_total)/(female_total + male_total)
    keep id university_code activity avg_female
    duplicates drop
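
    To compare the result with the hand-worked example for individual 1 in #1, something like the following could be run immediately after the code above. It is only a sketch and assumes the result dataset is still in memory:

    Code:
    * avg_female is a proportion, or missing when nobody from another degree takes part
    assert inrange(avg_female, 0, 1) | missing(avg_female)
    sort id activity
    list id activity avg_female if id == 1, noobs

    For id 1 this should list chorus, football, math, soccer and track, to be compared with the figures 1, 0, 1, 0.33 and 0.6 given for those activities in #1.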

    • #3
      Thank you very much, Clyde, for your quick and valuable response! It has helped me a lot.

      Thanks again and have a nice day.

      Best,
      Daniel
