Hi everyone,
Sorry, this is going to be a long post.

I am working in Stata 14 with data from the US TV show Survivor. I have about 30,000 observations covering all seasons. Participants must eliminate each other: at each round of each season, every participant faces the others in a council and can vote against one of them (generally only one, very rarely more than one). I would like to build a variable that measures the proximity between each pair of participants. Ideally, this measure would be the percentage of votes cast against the same person (the more often two participants vote against the same person, the more likely it is that they are friends).
For that, I need two variables:
- “couples_pair_per_round”: the cumulative number of rounds in which two participants can both vote at the same council (I have built this one successfully)
- “number_similar_vote_per_round”: the cumulative number of rounds in which two participants vote against the same person (this is where I am stuck)
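To make the two counts and the resulting percentage concrete, here is a minimal sketch on toy data. It assumes that a 0/1 indicator of "voted against the same person in this round" already exists for each pair and round; the names same_vote, rounds_together, rounds_same_vote and share_same_vote are hypothetical, and I am assuming the counts restart within each season.

```stata
* Toy data: players 1 and 2 sit together in rounds 1-3 of season 1.
* same_vote is a hypothetical indicator: 1 if both voted against the same person.
clear
input FE_season Round indiv indiv_voted same_vote
1 1 1 2 1
1 2 1 2 0
1 3 1 2 1
1 1 2 1 1
1 2 2 1 0
1 3 2 1 1
end

* Running number of rounds in which the pair appears together
* (assumes one row per pair and round).
bysort FE_season indiv indiv_voted (Round): generate rounds_together = _n

* Running number of rounds in which the pair voted against the same person.
by FE_season indiv indiv_voted: generate rounds_same_vote = sum(same_vote)

* Share of common rounds with an identical vote (the proximity measure).
generate share_same_vote = rounds_same_vote / rounds_together

list, sepby(indiv)
```

In this toy example the pair has 3 rounds together and 2 identical votes by round 3, so the share at that point is 2/3.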
Here are the variables in my data:
« FE_season »: The number of the season (1, 2, 3, etc.)
« Round »: The number of the voting round (1, 2, 3, etc.)
« indiv »: The identifiers of all the players who are allowed to vote in the voting round, for example 100, 101, 102, etc.
« indiv_voted »: The identifiers of all the players against whom a voter can vote in that voting round, for example 100, 101, 102, etc.
« Choice »: This variable defines the panel of players against whom a voter can vote in the voting round.
« Vote »: A binary indicator (0 or 1) showing which player the individual voted against in the voting round. A value of 1 means that the individual voted against the player in indiv_voted; a value of 0 means that they did not vote against that player.
« couples_vote »: A variable that concatenates indiv and indiv_voted, such as 1-10, 1-2, 1-35, etc.
Here is my question: how can I compute number_similar_vote_per_round?
This is particularly difficult because the information needed is spread over several variables and over different rows.
Ideally, I would like to have:
- A command that identifies, for each row, the voter (indiv) and the player who can be voted against (indiv_voted).
- A command that checks whether these two players (when both also appear in indiv) voted against the same person in the same round of the same season.
- If yes, it reports 1 in the variable « number_similar_vote_per_round », otherwise 0.
For example, suppose a row has indiv==1 and indiv_voted==328. We then need to look at the rows where indiv==1 and the rows where indiv==328 (same season, same round) and check whether both have Vote==1 for the same indiv_voted. If yes, the new variable equals 1 on that row; if not, it equals 0.
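To show the check I have in mind, here is a rough sketch on toy data rather than a tested solution for the real file. It assumes one row per season, round, voter and candidate, and that each voter casts exactly one vote per round (the rare rounds with more than one vote would break merge m:1 and would probably need joinby instead). The toy data and the names target, target_i, target_j, voter and same_vote are hypothetical.

```stata
* Toy data: season 1, round 1, voters 1, 2 and 3.
* Voters 1 and 2 vote against 3; voter 3 votes against 1.
clear
input FE_season Round indiv indiv_voted Vote
1 1 1 2 0
1 1 1 3 1
1 1 2 1 0
1 1 2 3 1
1 1 3 1 1
1 1 3 2 0
end

* Step 1: one row per voter and round, holding the player voted against.
preserve
keep if Vote == 1
keep FE_season Round indiv indiv_voted
rename indiv_voted target
tempfile votes
save `votes'
restore

* Step 2: attach the target chosen by the voter (indiv).
merge m:1 FE_season Round indiv using `votes', keep(master match) nogenerate
rename target target_i

* Step 3: attach the target chosen by the other player (indiv_voted).
rename indiv voter
rename indiv_voted indiv
merge m:1 FE_season Round indiv using `votes', keep(master match) nogenerate
rename target target_j
rename indiv indiv_voted
rename voter indiv

* Step 4: 1 if both players voted against the same person, 0 otherwise.
* The missing() guard matters because in Stata . == . evaluates to true.
generate byte same_vote = (target_i == target_j) & !missing(target_i, target_j)

sort FE_season Round indiv indiv_voted
list, sepby(indiv)
```

In this toy example the new variable is 1 only on the two rows pairing players 1 and 2 (both voted against player 3) and 0 on every other row.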
I know this might be complex. Thank you so much to everyone who takes the time to read and answer.
Best,
Antoine