
  • Generating the "opposing" variable in a long dataset

    Hi all,

    I am using Stata 17/SE on Mac and I am having trouble generating a variable using another observation within a group.

    For context, I am working with tennis data. I have two rank variables: a singles ranking (rank_single) and a doubles ranking (rank_dbls).
    The singles rank is the player's own rank, while the doubles rank is the average of the team's doubles rankings: egen var = mean(var), by(i).

    i is the match number and j is the player number (players 1-2 are team 1, players 3-4 are team 2). The Ranking_* variables hold the original ranking data (MS = men's singles, MD = men's doubles, etc.).
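
    The example data in this post suggest the mean was taken within each team of a match rather than over the whole match: in match 401, team 1 has rank_dbls = 364.5, the mean of 405 and 324. A minimal sketch of that calculation, assuming grouping by i and team (an inference from the data, not something the post states), using four rows copied from match 401:

    ```stata
    clear
    input float i byte(j team) int Ranking_WD
    401 1 1 405
    401 2 1 324
    401 3 2 .
    401 4 2 620
    end

    * mean() ignores missing values, so team 2 gets 620 from its one ranked player
    egen rank_dbls = mean(Ranking_WD), by(i team)
    list, sepby(i)
    ```

    With by(i) alone, all four players in a doubles match would share a single value.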

    My question is: is there a way to generate the opposing player's/team's ranking in this long dataset? (I already have this for the tournament seed variable, where t_ refers to the player and o_ refers to the opponent.)

    Code:
    input float i byte(j team p_pos t_tourn_seed o_tourn_seed) int(Ranking_MS Ranking_MD Ranking_WS Ranking_WD) float(rank_single rank_dbls)
    367 1 1 1 4 . . .    .   .    .     .
    367 3 2 1 . 4 . . 1326   . 1326     .
    368 1 1 1 . 5 . . 1028 638 1028   638
    368 3 2 1 5 . . .  536 626  536   626
    369 1 1 1 . 3 . .    .   .    .     .
    369 3 2 1 3 . . .  484 587  484   587
    370 1 1 1 . 5 . .    .   .    .     .
    370 3 2 1 5 . . .  536 626  536   626
    371 1 1 1 . . . .    .   .    .     .
    371 3 2 1 . . . .    .   .    .     .
    372 1 1 1 4 . . .    .   .    .     .
    372 3 2 1 . 4 . .    .   .    .     .
    373 1 1 1 . . . .  692   .  692     .
    373 3 2 1 . . . .    .   .    .     .
    374 1 1 1 . 7 . . 1326   . 1326     .
    374 3 2 1 7 . . .  612 620  612   620
    375 1 1 1 4 2 . .    .   .    .     .
    375 3 2 1 2 4 . .  326 324  326   324
    376 1 1 1 . . . . 1326   . 1326     .
    376 3 2 1 . . . .  986   .  986     .
    377 1 1 1 . 3 . .  999   .  999     .
    377 3 2 1 3 . . .  484 587  484   587
    378 1 1 1 6 . . .  631 454  631   454
    378 3 2 1 . 6 . .  938   .  938     .
    379 1 1 1 1 4 . .  187 405  187   405
    379 3 2 1 4 1 . .    .   .    .     .
    380 1 1 1 . . . . 1168   . 1168     .
    380 3 2 1 . . . .  825 852  825   852
    381 1 1 1 1 5 . .  187 405  187   405
    381 3 2 1 5 1 . .  536 626  536   626
    382 1 1 1 . 3 . .    .   .    .     .
    382 3 2 1 3 . . .  484 587  484   587
    383 1 1 1 6 . . .  631 454  631   454
    383 3 2 1 . 6 . .  915 891  915   891
    384 1 1 1 1 . . .  187 405  187   405
    384 3 2 1 . 1 . .    .   .    .     .
    385 1 1 1 . . . .  692   .  692     .
    385 3 2 1 . . . .  999   .  999     .
    386 1 1 1 . . . .    .   .    .     .
    386 3 2 1 . . . . 1307   . 1307     .
    387 1 1 1 1 . . .  187 405  187   405
    387 3 2 1 . 1 . .  925   .  925     .
    388 1 1 1 . 2 . .    .   .    .     .
    388 3 2 1 2 . . .  326 324  326   324
    389 1 1 1 . 2 . . 1168   . 1168     .
    389 3 2 1 2 . . .  326 324  326   324
    390 1 1 1 . 2 . .  999   .  999     .
    390 3 2 1 2 . . .  326 324  326   324
    391 1 1 1 6 2 . .  631 454  631   454
    391 3 2 1 2 6 . .  326 324  326   324
    392 1 1 1 . . . . 1028 638 1028   638
    392 3 2 1 . . . .  752   .  752     .
    393 1 1 1 . 7 . . 1153   . 1153     .
    393 3 2 1 7 . . .  612 620  612   620
    394 1 1 1 . . . .    .   .    .     .
    394 3 2 1 . . . . 1119 744 1119   744
    395 1 1 1 . . . .  999   .  999     .
    395 3 2 1 . . . .    .   .    .     .
    396 1 1 1 4 . . .    .   .    .     .
    396 3 2 1 . 4 . .    .   .    .     .
    397 1 1 1 . . . . 1294   . 1294     .
    397 3 2 1 . . . .  938   .  938     .
    398 1 1 1 . . . .    .   .    .   907
    398 2 1 2 . . . . 1383 907 1383   907
    398 3 2 1 . . . . 1101 927 1101   927
    398 4 2 2 . . . .    .   .    .   927
    399 1 1 1 . . . .    .   .    .     .
    399 2 1 2 . . . .    .   .    .     .
    399 3 2 1 . . . .    .   .    .     .
    399 4 2 2 . . . .    .   .    .     .
    400 1 1 1 . 2 . .  915 891  915   891
    400 2 1 2 . 2 . . 1168   . 1168   891
    400 3 2 1 2 . . .    .   .    .     .
    400 4 2 2 2 . . .    .   .    .     .
    401 1 1 1 1 3 . .  187 405  187 364.5
    401 2 1 2 1 3 . .  326 324  326 364.5
    401 3 2 1 3 1 . .    .   .    .   620
    401 4 2 2 3 1 . .  612 620  612   620
    402 1 1 1 3 . . .    .   .    .   620
    402 2 1 2 3 . . .  612 620  612   620
    402 3 2 1 . 3 . .    .   .    .     .
    402 4 2 2 . 3 . .    .   .    .     .
    403 1 1 1 1 . . .  187 405  187 364.5
    403 2 1 2 1 . . .  326 324  326 364.5
    403 3 2 1 . 1 . .  536 626  536   626
    403 4 2 2 . 1 . .  752   .  752   626
    404 1 1 1 4 2 . . 1028 638 1028   638
    404 2 1 2 4 2 . .    .   .    .   638
    404 3 2 1 2 4 . .    .   .    .     .
    404 4 2 2 2 4 . .    .   .    .     .
    405 1 1 1 1 . . .  187 405  187 364.5
    405 2 1 2 1 . . .  326 324  326 364.5
    405 3 2 1 . 1 . .  999   .  999     .
    405 4 2 2 . 1 . .  986   .  986     .
    406 1 1 1 . . . .    .   .    .   454
    406 2 1 2 . . . .  631 454  631   454
    406 3 2 1 . . . .  536 626  536   626
    406 4 2 2 . . . .  752   .  752   626
    407 1 1 1 . 4 . . 1119 744 1119 665.5
    407 2 1 2 . 4 . .  484 587  484 665.5
    end
    Any help is appreciated!

  • #2
    The only variable in your example data that makes any mention of the opposing player is o_tourn_seed. But there are many, many observations having the same o_tourn_seed values. So presumably to identify which of these is the opposing player for a given observation, you need to make use of some of the other variables as well. Probably if I knew something about tournaments it would be obvious how to do this. But I don't. So, if you were going to do this by hand, for any given observation in your data set, how would you go about identifying which other observation corresponds to the opposing player?



    • #3
      Hi Clyde,

      Yeah, that is correct (I have many more variables that use o_ and t_; these are just the subset I felt was relevant to the question). Tournament seed refers to the "ranking" within a tournament (it's not an official ranking). It is also a variable that I generated before converting to a long dataset and merging in the ranking data (e.g., Ranking_*).

      If I was going to do it by hand, it would be something along the lines of:
      - Sorting by i (match numbers)
      - If team = 1, use the rank variable of team 2 within that i
      - If team = 2, use the rank variable of team 1 within that i

      I'm just stuck on which commands I can use to implement this, noting that I can't really use egen because I don't know of a specific function for it.
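
      One way to sketch the by-hand steps above with egen after all: its max() function accepts an expression, so cond() can pick out one team's value and spread it over every row of the match. This assumes rank_dbls is constant within each team of a match; the helper names r1 and r2 are hypothetical, and the four rows are copied from match 401.

      ```stata
      clear
      input float i byte(j team) float rank_dbls
      401 1 1 364.5
      401 2 1 364.5
      401 3 2 620
      401 4 2 620
      end

      * copy each team's rank onto every row of the match (max ignores missings)
      egen r1 = max(cond(team == 1, rank_dbls, .)), by(i)
      egen r2 = max(cond(team == 2, rank_dbls, .)), by(i)

      * team 1 takes team 2's rank and vice versa
      gen o_rank_dbls = cond(team == 1, r2, r1)
      drop r1 r2
      list, sepby(i)
      ```

      If a team's rank is missing in every row, the opponent's o_rank_dbls is missing as well.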

      Thanks,
      Nikita



      • #4
        Hi all,

        I think I have worked it out for myself:
        Code:
        bysort i (team) : gen diff = rank_single - rank_single[_n-1]
        bysort i (team) : replace diff = rank_single - rank_single[_n+1] if missing(diff)
        gen o_rank_single = rank_single - diff
        Then I will just need to adapt the code for the differences between singles and doubles matches.
        Last edited by Nikita Ferguson; 28 Nov 2022, 23:37.
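
        One caveat with the difference approach, sketched on two singles matches from the example data (368 and 369): when a player's own rank is missing, diff is missing too, so the opponent's rank is lost even if it is known. The comparison variable o_direct is a hypothetical name; it reads the other row directly and is valid only for matches with exactly two rows.

        ```stata
        clear
        input float i byte(j team) float rank_single
        368 1 1 1028
        368 3 2 536
        369 1 1 .
        369 3 2 484
        end

        * the difference approach from this post
        bysort i (team) : gen diff = rank_single - rank_single[_n-1]
        bysort i (team) : replace diff = rank_single - rank_single[_n+1] if missing(diff)
        gen o_rank_single = rank_single - diff

        * direct lookup: row 1 reads row 2 and row 2 reads row 1
        bysort i (team) : gen o_direct = rank_single[3 - _n] if _N == 2
        list, sepby(i)
        ```

        In match 369, player 1 has a missing o_rank_single while o_direct holds 484.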



        • #5
          I think the following code handles both singles and doubles correctly:
          Code:
          by i (j), sort: gen o_rank_dbls = rank_dbls[_N] if _n == 1
          by i (j): replace o_rank_dbls = rank_dbls[1] if _n == _N
          by i team (o_rank_dbls), sort: replace o_rank_dbls = o_rank_dbls[1]
          
          by i (j), sort: gen o_rank_single = rank_single[_N] if _n == 1 & _N == 2
          by i (j): replace o_rank_single = rank_single[1] if _n == _N & _N == 2
          My only reservation is the handling of o_rank_single in the doubles matches. I cannot tell if player one is opposed by player 3 or by player 4. Or maybe it doesn't even make sense to think of one player as opposed to one other player in a doubles match. I'm assuming the latter, and therefore I don't even calculate an o_rank_single in doubles matches.

          Note also that this code assumes (but does not verify) that in a doubles match, rank_dbls is the same for both players on a team. That is true of the example data, and it is also consistent with your description of how rank_dbls was calculated. But if, for some reason, it is not true in your full data set, this code will give incorrect results.
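
          A minimal sketch of how that assumption could be checked before running the code, using rows copied from matches 401 and 402. The second check is an additional assumption not stated above: that every match contains both teams (in the posted excerpt, match 407 shows only team 1).

          ```stata
          clear
          input float i byte(j team) float rank_dbls
          401 1 1 364.5
          401 2 1 364.5
          401 3 2 620
          401 4 2 620
          402 1 1 620
          402 2 1 620
          402 3 2 .
          402 4 2 .
          end

          * rank_dbls must be constant within team; missings sort last, so a
          * team with one missing and one nonmissing value fails the check
          bysort i team (rank_dbls) : assert rank_dbls[1] == rank_dbls[_N]

          * each match must contain both team 1 and team 2
          bysort i (team) : assert team[1] == 1 & team[_N] == 2
          ```

          Either assert stops with an error if the condition fails for any observation, which flags the data for inspection before the o_ variables are created.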



          • #6
            Hi Clyde,

            Thanks for this. I did some tweaking yesterday after posting and, surprisingly, ended up with pretty much the same code as you've posted. Thanks again.

