Generating the "opposing" variable in a long dataset

Nikita Ferguson

Join Date: Sep 2022
Posts: 8

28 Nov 2022, 18:35

Hi all,

I am using Stata 17/SE on Mac and I am having trouble generating a variable using another observation within a group.

For context, I am working with tennis data. I have two rank variables: a singles ranking (rank_single) and a doubles ranking (rank_dbls).
The singles rank is the player's own rank, while the doubles rank is the average of the team members' doubles rankings, generated along the lines of: egen var = mean(var), by(i).

i is the match number; j is the player number (1-2 are team 1 and 3-4 are team 2); the Ranking_* variables hold the original ranking data (MS = men's singles, MD = men's doubles, etc.).

My question is: is there a way to generate the opposing player's/team's ranking in this long dataset? (I already have this for the tournament seed variable, where t_ refers to the player and o_ refers to the opponent.)

Code:

input float i byte(j team p_pos t_tourn_seed o_tourn_seed) int(Ranking_MS Ranking_MD Ranking_WS Ranking_WD) float(rank_single rank_dbls)
367 1 1 1 4 . . .    .   .    .     .
367 3 2 1 . 4 . . 1326   . 1326     .
368 1 1 1 . 5 . . 1028 638 1028   638
368 3 2 1 5 . . .  536 626  536   626
369 1 1 1 . 3 . .    .   .    .     .
369 3 2 1 3 . . .  484 587  484   587
370 1 1 1 . 5 . .    .   .    .     .
370 3 2 1 5 . . .  536 626  536   626
371 1 1 1 . . . .    .   .    .     .
371 3 2 1 . . . .    .   .    .     .
372 1 1 1 4 . . .    .   .    .     .
372 3 2 1 . 4 . .    .   .    .     .
373 1 1 1 . . . .  692   .  692     .
373 3 2 1 . . . .    .   .    .     .
374 1 1 1 . 7 . . 1326   . 1326     .
374 3 2 1 7 . . .  612 620  612   620
375 1 1 1 4 2 . .    .   .    .     .
375 3 2 1 2 4 . .  326 324  326   324
376 1 1 1 . . . . 1326   . 1326     .
376 3 2 1 . . . .  986   .  986     .
377 1 1 1 . 3 . .  999   .  999     .
377 3 2 1 3 . . .  484 587  484   587
378 1 1 1 6 . . .  631 454  631   454
378 3 2 1 . 6 . .  938   .  938     .
379 1 1 1 1 4 . .  187 405  187   405
379 3 2 1 4 1 . .    .   .    .     .
380 1 1 1 . . . . 1168   . 1168     .
380 3 2 1 . . . .  825 852  825   852
381 1 1 1 1 5 . .  187 405  187   405
381 3 2 1 5 1 . .  536 626  536   626
382 1 1 1 . 3 . .    .   .    .     .
382 3 2 1 3 . . .  484 587  484   587
383 1 1 1 6 . . .  631 454  631   454
383 3 2 1 . 6 . .  915 891  915   891
384 1 1 1 1 . . .  187 405  187   405
384 3 2 1 . 1 . .    .   .    .     .
385 1 1 1 . . . .  692   .  692     .
385 3 2 1 . . . .  999   .  999     .
386 1 1 1 . . . .    .   .    .     .
386 3 2 1 . . . . 1307   . 1307     .
387 1 1 1 1 . . .  187 405  187   405
387 3 2 1 . 1 . .  925   .  925     .
388 1 1 1 . 2 . .    .   .    .     .
388 3 2 1 2 . . .  326 324  326   324
389 1 1 1 . 2 . . 1168   . 1168     .
389 3 2 1 2 . . .  326 324  326   324
390 1 1 1 . 2 . .  999   .  999     .
390 3 2 1 2 . . .  326 324  326   324
391 1 1 1 6 2 . .  631 454  631   454
391 3 2 1 2 6 . .  326 324  326   324
392 1 1 1 . . . . 1028 638 1028   638
392 3 2 1 . . . .  752   .  752     .
393 1 1 1 . 7 . . 1153   . 1153     .
393 3 2 1 7 . . .  612 620  612   620
394 1 1 1 . . . .    .   .    .     .
394 3 2 1 . . . . 1119 744 1119   744
395 1 1 1 . . . .  999   .  999     .
395 3 2 1 . . . .    .   .    .     .
396 1 1 1 4 . . .    .   .    .     .
396 3 2 1 . 4 . .    .   .    .     .
397 1 1 1 . . . . 1294   . 1294     .
397 3 2 1 . . . .  938   .  938     .
398 1 1 1 . . . .    .   .    .   907
398 2 1 2 . . . . 1383 907 1383   907
398 3 2 1 . . . . 1101 927 1101   927
398 4 2 2 . . . .    .   .    .   927
399 1 1 1 . . . .    .   .    .     .
399 2 1 2 . . . .    .   .    .     .
399 3 2 1 . . . .    .   .    .     .
399 4 2 2 . . . .    .   .    .     .
400 1 1 1 . 2 . .  915 891  915   891
400 2 1 2 . 2 . . 1168   . 1168   891
400 3 2 1 2 . . .    .   .    .     .
400 4 2 2 2 . . .    .   .    .     .
401 1 1 1 1 3 . .  187 405  187 364.5
401 2 1 2 1 3 . .  326 324  326 364.5
401 3 2 1 3 1 . .    .   .    .   620
401 4 2 2 3 1 . .  612 620  612   620
402 1 1 1 3 . . .    .   .    .   620
402 2 1 2 3 . . .  612 620  612   620
402 3 2 1 . 3 . .    .   .    .     .
402 4 2 2 . 3 . .    .   .    .     .
403 1 1 1 1 . . .  187 405  187 364.5
403 2 1 2 1 . . .  326 324  326 364.5
403 3 2 1 . 1 . .  536 626  536   626
403 4 2 2 . 1 . .  752   .  752   626
404 1 1 1 4 2 . . 1028 638 1028   638
404 2 1 2 4 2 . .    .   .    .   638
404 3 2 1 2 4 . .    .   .    .     .
404 4 2 2 2 4 . .    .   .    .     .
405 1 1 1 1 . . .  187 405  187 364.5
405 2 1 2 1 . . .  326 324  326 364.5
405 3 2 1 . 1 . .  999   .  999     .
405 4 2 2 . 1 . .  986   .  986     .
406 1 1 1 . . . .    .   .    .   454
406 2 1 2 . . . .  631 454  631   454
406 3 2 1 . . . .  536 626  536   626
406 4 2 2 . . . .  752   .  752   626
407 1 1 1 . 4 . . 1119 744 1119 665.5
407 2 1 2 . 4 . .  484 587  484 665.5
end

Any help is appreciated!


Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#2

28 Nov 2022, 18:59

The only variable in your example data that makes any mention of the opposing player is o_tourn_seed. But there are many, many observations having the same o_tourn_seed values. So presumably to identify which of these is the opposing player for a given observation, you need to make use of some of the other variables as well. Probably if I knew something about tournaments it would be obvious how to do this. But I don't. So, if you were going to do this by hand, for any given observation in your data set, how would you go about identifying which other observation corresponds to the opposing player?
Nikita Ferguson

Join Date: Sep 2022

Posts: 8
#3

28 Nov 2022, 19:05

Hi Clyde,

Yeah, that is correct (I have many more variables that use o_ and t_, these are just a subset of what I felt was relevant to the question). Tournament seed is referring to the "ranking" within a tournament (it's not an official ranking). It is also a variable that I had generated before converting to a long dataset to then merge the ranking data observations (e.g., Ranking_*).

If I was going to do it by hand, it would be something along the lines of:
- Sorting by i (match numbers)
- If team = 1, use the rank variable of team 2 within that i
- If team = 2, use the rank variable of team 1 within that i

I'm just stuck on which commands would implement this, noting that I can't really use egen because I don't know of a specific function for it.

Thanks,
Nikita
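The by-hand steps above can be sketched with egen after all, by first spreading each team's value across the whole match and then picking the other team's copy. This is only an illustration, not the approach settled on later in the thread: t1_rank, t2_rank and o_rank_dbls_alt are hypothetical names, the few rows are copied from the rank_dbls column of the example data, and the sketch assumes rank_dbls is constant within a team (max() would otherwise pick the largest value).

```stata
* Sketch only: t1_rank, t2_rank and o_rank_dbls_alt are hypothetical names.
clear
input float i byte(j team) float rank_dbls
368 1 1 638
368 3 2 626
401 1 1 364.5
401 2 1 364.5
401 3 2 620
401 4 2 620
end

* Spread each team's value over every observation of the match.
* max() ignores missing values, so a team with no ranking stays missing.
egen t1_rank = max(cond(team == 1, rank_dbls, .)), by(i)
egen t2_rank = max(cond(team == 2, rank_dbls, .)), by(i)

* Each observation takes the other team's value.
gen o_rank_dbls_alt = cond(team == 1, t2_rank, t1_rank)
drop t1_rank t2_rank

list, sepby(i)
```

The same pattern should carry over to rank_single in singles matches, where each team has exactly one player.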
Nikita Ferguson

Join Date: Sep 2022

Posts: 8
#4

28 Nov 2022, 23:23

Hi all,

I think I have worked it out for myself:

Code:

bysort i (team) : gen diff = rank_single - rank_single[_n-1]
bysort i (team) : replace diff = rank_single - rank_single[_n+1] if missing(diff)
gen o_rank_single = rank_single - diff

Then I will just need to adjust the code for the differences between singles and doubles matches.

Last edited by Nikita Ferguson; 28 Nov 2022, 23:37.
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#5

29 Nov 2022, 06:35

I think the following code handles both singles and doubles correctly:

Code:

by i (j), sort: gen o_rank_dbls = rank_dbls[_N] if _n == 1
by i (j): replace o_rank_dbls = rank_dbls[1] if _n == _N
by i team (o_rank_dbls), sort: replace o_rank_dbls = o_rank_dbls[1]
by i (j), sort: gen o_rank_single = rank_single[_N] if _n == 1 & _N == 2
by i (j): replace o_rank_single = rank_single[1] if _n == _N & _N == 2

My only reservation is the handling of o_rank_single in the doubles matches. I cannot tell if player one is opposed by player 3 or by player 4. Or maybe it doesn't even make sense to think of one player as opposed to one other player in a doubles match. I'm assuming the latter, and therefore I don't even calculate an o_rank_single in doubles matches.

Note also that this code assumes (but does not verify) that in a doubles match, rank_dbls is the same for both players on a team. That is true of the example data and it is also consistent with your description of how rank_dbls was calculated. But if, for some reason, it is not true in your full data set, this code will give incorrect results.
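One way the within-team assumption could be checked before relying on the code is sketched below. It is an illustration only: the rows are copied from matches 401 and 403 of the example data, and the check treats a team where one player has a missing rank_dbls and the other does not as inconsistent.

```stata
* Sketch only: verify that rank_dbls agrees for all players on a team.
clear
input float i byte(j team) float rank_dbls
401 1 1 364.5
401 2 1 364.5
401 3 2 620
401 4 2 620
403 1 1 364.5
403 2 1 364.5
403 3 2 626
403 4 2 626
end

* Within each match and team, sort by rank_dbls (missing values sort last).
* If the smallest and largest values agree, every value in the team agrees.
by i team (rank_dbls), sort: assert rank_dbls[1] == rank_dbls[_N]
```

If the assertion fails, Stata stops with an error and reports how many contradictions it found. Note that this leaves the data sorted by i, team and rank_dbls.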
Nikita Ferguson

Join Date: Sep 2022

Posts: 8
#6

29 Nov 2022, 14:06

Hi Clyde,

Thanks for this - I did some tweaking yesterday after I had posted this and surprisingly have pretty much the same code as you've posted - thanks again.
