  • Counting dyads

    Hi everyone,
    I have a problem I hope to find help with. I have a dataset of 3 vs 3 soccer matches, where each match pits a team of 3 players against another team of 3 players, all drawn from a common pool of players. In each match, one team is designated the HOME team and the other the AWAY team. I would like to calculate, for each player in each match:

    A) How many times the player has previously played with the other players on his team as team-mates
    B) How many times the player has previously played against the other players on his team as opponents
    C) How many times the player has previously played with the players on the opposing team as team-mates
    D) How many times the player has previously played against the players on the opposing team as opponents

    The sample dataset is reproduced here, where match is the ID for each match, player is the ID for each player, and home indicates whether the player's team in that match is the home team (1) or the away team (0).

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double match long player float home
    1  2 0
    1  1 0
    1  3 0
    1  4 1
    1  5 1
    1  6 1
    2 11 0
    2  1 0
    2  5 0
    2  3 1
    2  2 1
    2 12 1
    3  5 0
    3  1 0
    3  6 0
    3  7 1
    3  9 1
    3 10 1
    4  1 0
    4  2 0
    4  8 0
    4  4 1
    4  5 1
    4  7 1
    5  9 0
    5 12 0
    5 11 0
    5  3 1
    5  1 1
    5  4 1
    end
    Any help with calculating this would be most welcome. Thank you.


    Kenneth Zeng
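One way to state A to D precisely, assuming that "before" means matches with a smaller match number and that each count is summed over the relevant players (the post leaves both points implicit). Here T_m(p) is the set of player p's two team-mates in match m and O_m(p) the set of p's three opponents; both are empty if p did not play in match m. These symbols are introduced for illustration only.

```latex
\begin{aligned}
A_{m,p} &= \sum_{q \in T_m(p)} \bigl|\{\, m' < m : q \in T_{m'}(p) \,\}\bigr| \\
B_{m,p} &= \sum_{q \in T_m(p)} \bigl|\{\, m' < m : q \in O_{m'}(p) \,\}\bigr| \\
C_{m,p} &= \sum_{q \in O_m(p)} \bigl|\{\, m' < m : q \in T_{m'}(p) \,\}\bigr| \\
D_{m,p} &= \sum_{q \in O_m(p)} \bigl|\{\, m' < m : q \in O_{m'}(p) \,\}\bigr|
\end{aligned}
```

For example, in match 2 player 1's team-mates are 11 and 5 and his opponents are 3, 2 and 12. In match 1, player 1 played with 2 and 3 as team-mates and against 5, so A = 0, B = 1, C = 2 and D = 0.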

  • #2
    A first step here is to make a data set of dyads, per below, which should help you. However, you refer to "before" but don't include a time variable, so I stopped here. Perhaps the match numbers are in time order?
    Code:
    // Example data presumed to be loaded
    compress  // everything is a byte
    // Questions about dyads require a data set of dyads.
    preserve
    rename (player home) =2
    tempfile temp
    save `temp'
    restore
    rename (player home) =1
    // make pairs
    joinby match using `temp'
    // Clean up and label
    label define homelbl 1 "home" 0 "away"
    label values home1 home2 homelbl
    drop if player1 == player2  // no self pairs
    // If both players are home or both are away, they must be on the same team, right?
    gen byte sameteam = (home1 == home2)
    label define teamlbl 1 "teammate" 0 "opponent"
    label values sameteam teamlbl
    // This ordering helps me think when browsing
    order match player1 player2 sameteam home1 home2
    sort player1 player2 match
    browse
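If the match numbers are taken as time order, the dyad data above already hold what A to D need: within each player1/player2 pair sorted by match, count the earlier rows with sameteam equal to 1 and to 0, then add those running counts up by match, player1 and sameteam. The sketch below mirrors that logic in Python on the example data, purely as an illustration of the steps; it assumes A to D are summed over the relevant players, and the function names make_dyads and counts_from_dyads are hypothetical.

```python
from collections import defaultdict
from itertools import groupby

# Example data from post #1 as (match, player, home); home == 1 marks the home side.
ROWS = [
    (1, 2, 0), (1, 1, 0), (1, 3, 0), (1, 4, 1), (1, 5, 1), (1, 6, 1),
    (2, 11, 0), (2, 1, 0), (2, 5, 0), (2, 3, 1), (2, 2, 1), (2, 12, 1),
    (3, 5, 0), (3, 1, 0), (3, 6, 0), (3, 7, 1), (3, 9, 1), (3, 10, 1),
    (4, 1, 0), (4, 2, 0), (4, 8, 0), (4, 4, 1), (4, 5, 1), (4, 7, 1),
    (5, 9, 0), (5, 12, 0), (5, 11, 0), (5, 3, 1), (5, 1, 1), (5, 4, 1),
]


def make_dyads(rows):
    """Ordered pairs of distinct players in the same match plus a same-team flag.

    Mirrors joinby on match followed by dropping self pairs: 6 * 5 = 30 rows per match.
    """
    lineups = defaultdict(list)
    for match, player, home in rows:
        lineups[match].append((player, home))
    dyads = []
    for match, lineup in lineups.items():
        for p1, h1 in lineup:
            for p2, h2 in lineup:
                if p1 != p2:  # no self pairs
                    dyads.append((match, p1, p2, h1 == h2))
    return dyads


def counts_from_dyads(dyads):
    """Return {(match, player1): (A, B, C, D)} from (match, player1, player2, sameteam) rows."""
    # Same ordering as sort player1 player2 match; match order is taken as time order.
    dyads = sorted(dyads, key=lambda d: (d[1], d[2], d[0]))
    totals = defaultdict(lambda: [0, 0, 0, 0])
    for _, pair_rows in groupby(dyads, key=lambda d: (d[1], d[2])):
        mates_before = opps_before = 0  # running counts for this player1-player2 pair
        for match, p1, _, sameteam in pair_rows:
            slot = totals[(match, p1)]
            if sameteam:
                slot[0] += mates_before  # contributes to A
                slot[1] += opps_before   # contributes to B
                mates_before += 1
            else:
                slot[2] += mates_before  # contributes to C
                slot[3] += opps_before   # contributes to D
                opps_before += 1
    return {key: tuple(vals) for key, vals in totals.items()}


dyads = make_dyads(ROWS)
counts = counts_from_dyads(dyads)
print(len(dyads), counts[(2, 1)])  # 150 (0, 1, 2, 0)
```

The running counts are incremented only after the current row is credited, so a match never counts itself as "before".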







    • #3
      Hi Mike, thanks very much for helping. Yes, the match numbers are in time order.

      I had previously tried making a pure set of dyads. The issue, though, is that the full dataset has more than 150k unique players across more than a million matches. I am still trying to find the most computationally efficient approach.
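A minimal sketch of a single pass over the matches in order, written in Python only to illustrate the algorithm (it is not a Stata or Mata solution). It assumes match IDs are in time order, as confirmed above, that each match has exactly two sides coded 0/1 in home, and that A to D are summed over the relevant team-mates or opponents. It keeps one pair of counters per pair of players who have actually met, so storage grows by at most 15 pairs per match rather than with the square of the player pool. The names prior_counts and pair_key are hypothetical.

```python
from collections import defaultdict

# Example data from post #1 as (match, player, home); home == 1 marks the home side.
ROWS = [
    (1, 2, 0), (1, 1, 0), (1, 3, 0), (1, 4, 1), (1, 5, 1), (1, 6, 1),
    (2, 11, 0), (2, 1, 0), (2, 5, 0), (2, 3, 1), (2, 2, 1), (2, 12, 1),
    (3, 5, 0), (3, 1, 0), (3, 6, 0), (3, 7, 1), (3, 9, 1), (3, 10, 1),
    (4, 1, 0), (4, 2, 0), (4, 8, 0), (4, 4, 1), (4, 5, 1), (4, 7, 1),
    (5, 9, 0), (5, 12, 0), (5, 11, 0), (5, 3, 1), (5, 1, 1), (5, 4, 1),
]


def pair_key(p, q):
    """Order-free key so (p, q) and (q, p) share one set of counters."""
    return (p, q) if p < q else (q, p)


def prior_counts(rows):
    """Return {(match, player): (A, B, C, D)} counting only earlier matches."""
    # Split each match into its away (index 0) and home (index 1) sides.
    sides = defaultdict(lambda: ([], []))
    for match, player, home in rows:
        sides[match][home].append(player)

    # pair -> [times as teammates, times as opponents] over matches already processed
    history = defaultdict(lambda: [0, 0])
    out = {}
    for match in sorted(sides):  # assumption: match IDs are in time order
        teams = sides[match]
        # Read phase: look up each pair before this match is recorded.
        for side in (0, 1):
            own, other = teams[side], teams[1 - side]
            for p in own:
                a = b = c = d = 0
                for q in own:
                    if q != p:
                        mates, opps = history.get(pair_key(p, q), (0, 0))
                        a += mates
                        b += opps
                for q in other:
                    mates, opps = history.get(pair_key(p, q), (0, 0))
                    c += mates
                    d += opps
                out[(match, p)] = (a, b, c, d)
        # Update phase: 3 + 3 teammate pairs and 3 * 3 opponent pairs.
        for team in teams:
            for i, p in enumerate(team):
                for q in team[i + 1:]:
                    history[pair_key(p, q)][0] += 1
        for p in teams[0]:
            for q in teams[1]:
                history[pair_key(p, q)][1] += 1
    return out


counts = prior_counts(ROWS)
print(counts[(4, 1)])  # (1, 1, 2, 3)
```

Each match costs a fixed amount of work (30 lookups, 15 updates), so the total is linear in the number of matches. The same read-then-update pattern could in principle be ported to Mata with an associative array, but that port is not shown here.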

      Any other suggestions would be much appreciated. Thanks in advance.


      Kenneth



      • #4
        The computational difficulty is a different question from the one you originally posted. Clearly, with something like 2e10 pairs, no solution that involves storing pairs in a dataset is likely to work very nicely, which to me would rule out a solution in Stata. I'd suggest that you post your problem in the Mata forum, including your example data but explaining the size and therefore the computational complexity.
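For scale, a back-of-the-envelope count using the figures from #3 (150k players and about a million matches, taken at face value as round numbers). The 2e10 figure matches the number of ordered pairs that could in principle be formed from the whole pool, whereas a dyad dataset built with joinby only contains rows for pairs that actually share a match. This is arithmetic only, not a benchmark; the variable names are illustrative.

```python
# Figures from post #3, taken at face value: pool size and number of matches.
players = 150_000
matches = 1_000_000

# Every ordered pair that could be formed from the pool (about 2e10).
possible_ordered_pairs = players * (players - 1)

# Rows a joinby-style dyad dataset holds: each 6-player match gives 6 * 5 ordered dyads.
dyad_rows = matches * 6 * 5

# Distinct unordered pairs per match: 3 + 3 teammate pairs plus 3 * 3 opponent pairs.
pairs_per_match = 3 + 3 + 3 * 3

print(f"{possible_ordered_pairs:.2e}")  # 2.25e+10
print(dyad_rows)                        # 30000000
print(matches * pairs_per_match)        # 15000000
```

Whether a dataset of that many dyad rows is workable in practice depends on available memory and on which variables are stored alongside each row.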



        • #5
          Kenneth Zeng -

          I agree with Mike that the real question you have is the size of your problem, not the mechanics of attacking it.

          If you had previously tried a pure set of dyads without success, it was inconsiderate of you not to include those details. Why invite people to spend their time duplicating work you've already done?

          If you choose to post this in the Mata Forum, you should clearly explain your problem, the attempts you have made to solve it, and why those attempts did not work, and you should link back to this topic.



          • #6
            Apologies to all; I was inconsiderate in this matter. I had hoped there might be a faster computational method that I had missed. Apologies once again for this debacle.

