Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Counting Occurrences in Observations

    Hi everyone,
    I have a problem here which I hope to find help for. I have a dataset of 3 vs 3 soccer matches, where each match pits teams of 3 players against another team of 3 players, who are all drawn from a pool of players. In each match, each team is assigned to either the HOME team or the AWAY team. I like to calculate for each player in each match,

    A) How many times the player has played with the other players in his team as team-mates before
    B) How many times the player has played with the other players in his team as opponents before
    C) How many times the player has played against the other players in his opponent team as team-mates before
    D) How many times the player has played against the other players in his opponent team as opponents before

    The sample dataset is reproduced here, where match is the ID for each match, player is the ID for each player, and home is if the team the player is in for the match is the home team or not.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double match long player float home
    1  2 0
    1  1 0
    1  3 0
    1  4 1
    1  5 1
    1  6 1
    2 11 0
    2  1 0
    2  5 0
    2  3 1
    2  2 1
    2 12 1
    3  5 0
    3  1 0
    3  6 0
    3  7 1
    3  9 1
    3 10 1
    4  1 0
    4  2 0
    4  8 0
    4  4 1
    4  5 1
    4  7 1
    5  9 0
    5 12 0
    5 11 0
    5  3 1
    5  1 1
    5  4 1
    end

    Previously I attempted to brute force this by calculating the number of total player-player dyads. But because I have more than 150k players and over 1 million matches, this turned out to be computationally impossible. I am looking for a more feasible method. Any help to calculate this would be most welcome and appreciated. Thank you.


    Kenneth Zeng

  • #2
    I should also mention that the match ID are ascending in time i.e. a match with match ID = 2 occurred after one with match ID = 1

    Thanks to all for help.

    Comment


    • #3
      You've posted in the sub-forum specializing in questions about Mata. I recommend reposting in the main "General" forum unless you can rephrase your question to be more specifically about Mata

      Comment


      • #4
        Stephen Jenkins, that Kenneth Zengposted here is "my fault." <grin> He originally posted on the General forum, but I suggested he try the Mata side because I learned that his problem is so large that the Stata solution I suggested to him (-joinby- to create a set of dyads) would yield a data set with something on the order of 2e10 observations. My thinking--perhaps wrong, of course--is that the only approach would be a Mata solution that processes sequentially the pairs formed from each player crossed with each other player, entailing processing/storing sequentially only 150e3 comparisons at a time. We had a long thread involving a different kind of N X N comparison in Mata that I started several months ago, and some suggestions for intelligently using Mata (by others than me!) made orders of magnitude differences in run time by avoiding loops. I hoped that someone might have similar ideas here.

        Comment


        • #5
          Sorry, Mike! I should have noticed, I suppose. (I was wrong-footed by the lack of Mata-type referencing in the post)

          Comment


          • #6
            A possible solution to question A.
            Regrettably, I mix index and id.
            But idea should be clear.

            Code:
            cls
            mata mata clear
            mata: // https://www.statalist.org/forums/forum/general-stata-discussion/mata/1310012-howto-to-get-all-pairs-from-a-vector
                transmorphic matrix do_pairs(colvector v) {
                    if ((R = rows(v)) > 1) {
                        out = J(0,2,missingof(v))
                        for(r=1;r<R;r++) out = out \ (J(R-r, 1, v[r]), v[r+1::R])
                        return(out)
                    }
                }
            end
            mata: // Generating testdata
                rseed(123)
                nmatches = 1e6
                nplayers = 15
                matches = J(nmatches, 6, .)
                for(r=1;r<=nmatches;r++) matches[|r,1 \ r,6|] = jumble(1::nplayers)[|1,1 \ 6,1|]'
                player = colshape(matches,1)
                match = (1::nmatches) # J(6,1,1)
                home = J(nmatches,1,1) # ((0,1)' # J(3,1,1))
                data = match, home, player
            end
            mata: // A) How many times the player has played with the other players in his team as team-mates before
                slct = home :== 0
                hm_data = select(data, slct)
                
                out = J(nplayers,nplayers,0)
                for(r=1;r<=rows(hm_data);r=r+3) {
                    hm_pairs = do_pairs(hm_data[|r,3 \ r+2,3|])
                    for(p=1;p<=rows(hm_pairs);p++) {
                        out[hm_pairs[p,1], hm_pairs[p,2]] = out[hm_pairs[p,1], hm_pairs[p,2]] + 1
                        out[hm_pairs[p,2], hm_pairs[p,1]] = out[hm_pairs[p,2], hm_pairs[p,1]] + 1
                    }
                }
                out
            end
            The answer:
            Code:
            :     out
            [symmetric]
                        1       2       3       4       5       6       7       8       9      10      11      12      13      14      15
                 +-------------------------------------------------------------------------------------------------------------------------+
               1 |      0                                                                                                                  |
               2 |  28330       0                                                                                                          |
               3 |  28243   28381       0                                                                                                  |
               4 |  28304   28414   28455       0                                                                                          |
               5 |  28318   28536   28578   28850       0                                                                                  |
               6 |  28420   28560   28703   28429   28521       0                                                                          |
               7 |  28417   28389   28783   28424   28567   28617       0                                                                  |
               8 |  28480   28588   28856   28561   28753   28340   28671       0                                                          |
               9 |  28470   28720   28350   28598   28587   28609   28591   28710       0                                                  |
              10 |  28507   28821   28438   28471   28529   28537   28921   28652   28695       0                                          |
              11 |  28548   28401   28814   28440   28851   28667   28732   28751   28831   28824       0                                  |
              12 |  28636   28424   28636   28563   28540   28496   28784   28553   28749   28750   28623       0                          |
              13 |  28373   28509   28482   28513   28499   28608   28290   28808   28494   28618   28622   28558       0                  |
              14 |  28559   28451   28726   28601   28733   28638   28607   28556   28505   28608   28677   28510   28600       0          |
              15 |  28549   28300   28609   28733   28704   28507   28467   28499   28761   28515   28541   28582   28660   28121       0  |
                 +-------------------------------------------------------------------------------------------------------------------------+
            Kind regards

            nhb

            Comment

            Working...
            X