  • Loop(s) on different rows and columns

    Hi everyone,

    Sorry, this is going to be a long post.

    I am working in Stata 14, using data from the US TV show Survivor. I have about 30,000 observations covering all seasons. Basically, participants must eliminate each other: in each round of each season, every participant faces the others in a council and can vote against one of them (generally only one, very rarely more than one). I would like to build a variable that measures the proximity between each pair of participants. Ideally, this measure would be the percentage of votes cast against the same person (the more often two participants vote against the same person, the more likely it is that they are friends).

    For that, I need two variables:
    • “couples_pair_per_round”: the cumulative number of rounds in which two participants can vote together (done successfully)
    • “number_similar_vote_per_round”: the cumulative number of rounds in which two participants vote against the same person (this is where I struggle)
    I was planning on dividing number_similar_vote_per_round by couples_pair_per_round to obtain the variable “proximity”.
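
    A minimal sketch of that ratio, assuming proximity is meant as the share of joint voting rounds in which the pair voted against the same person. The three data rows are invented purely for illustration; only the two variable names come from the post.

    ```stata
    * Toy data (hypothetical values): rounds together, rounds with the same target
    clear
    input byte(couples_pair_per_round number_similar_vote_per_round)
    4 1
    4 3
    0 0
    end

    * Share of joint rounds with a common target; Stata returns missing
    * when the denominator is 0, so pairs that never met get no proximity
    generate double proximity = number_similar_vote_per_round / couples_pair_per_round
    list
    ```

    With these toy rows, proximity is .25, .75, and missing.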

    Here are my different variables:
    « FE_season »: This corresponds to the number of the Season (1, 2, 3, etc.)
    « Round »: This refers to the voting round number, for example, 1, 2, 3, etc.
    « indiv »: These are the identifiers of all the players who are allowed to vote in the voting round. For example, 100, 101, 102, etc.
    « indiv_voted »: These are the identifiers of all the players against whom a voter can vote in the particular voting round. For example, 100, 101, 102, etc.
    « Choice »: This variable defines the panel of players against whom a voter can vote in the voting round.
    « Vote »: This is a binary indicator (0 or 1) that indicates against which player the individual voted in the voting round. If the value is 1, it indicates that the individual voted against the player corresponding to indiv_voted, otherwise, they did not vote against that player.
    « couples_vote »: A variable that concatenates indiv and indiv_voted, e.g. 1-10, 1-2, 1-35, etc.


    Here is my question: how can I compute number_similar_vote_per_round?

    This is particularly difficult because many different variables must be taken into account, across different rows and columns.
    Ideally, I would like to have:
    • A command that identifies, for each row, the voter (indiv) and the person who can be voted against (indiv_voted).
    • A command that checks whether these two people (when both appear in indiv) voted against the same person in the same round of the same season.
    • If yes, it reports 1 in the variable « number_similar_vote_per_round », otherwise 0.

    For example, suppose a row has indiv==1 and indiv_voted==328. We then need to check whether the rows with indiv==1 and the rows with indiv==328 have Vote==1 for the same indiv_voted. If so, the new variable equals 1 on that row; if not, it equals 0.

    I know it might be complex. Thank you so much to everyone who takes the time to read and answer.

    Best,
    Antoine
    Last edited by Antoine Malezieux; 05 Apr 2023, 02:55.

  • #2
    Also, this is what the data looks like:

    Round Choice Vote Totem FE_season indiv indiv_voted couples_vote couples_pair_per_round
    1 186 0 0 1 47 33 47-33 1
    1 184 0 0 1 41 33 41-33 1
    1 187 0 0 1 48 33 48-33 1
    1 182 0 0 1 35 33 35-33 1
    1 183 0 0 1 36 33 36-33 1
    1 185 0 0 1 42 33 42-33 1
    1 181 0 0 1 34 33 34-33 1
    1 180 0 0 1 33 34 33-34 1
    1 187 0 0 1 48 34 48-34 1
    1 183 0 0 1 36 34 36-34 1
    1 182 0 0 1 35 34 35-34 1
    1 185 0 0 1 42 34 42-34 1
    1 186 0 0 1 47 34 47-34 1
    1 184 0 0 1 41 34 41-34 1
    1 187 0 0 1 48 35 48-35 1
    1 180 0 0 1 33 35 33-35 1
    1 186 0 0 1 47 35 47-35 1
    1 181 0 0 1 34 35 34-35 1
    1 185 0 0 1 42 35 42-35 1
    1 183 0 0 1 36 35 36-35 1
    1 184 0 0 1 41 35 41-35 1
    1 187 0 0 1 48 36 48-36 1


    • #3
      Bumping this, just in case someone has an idea. Thank you.


      • #4
        I think your question has gone unanswered because of a combination of unclarity and a poorly chosen and poorly presented data example.

        Your example data does not include any actual positive votes: vote is always 0. So that doesn't give anybody anything to work with in trying to develop and test code. Moreover, your descriptions of several of the variables seem very similar and it is hard to keep straight what the differences between them are. For example choice and indiv_voted, by your description, are the same thing. Yet the values they take on don't even overlap, let alone agree. Finally, although it is set out in a way that makes it possible, with a little extra effort, to import it into Stata to work with, given that -dataex- output makes it effortless to import and also provides metadata that is sometimes (though probably not in this instance) crucial, some people who might otherwise respond are deterred.

        I suggest you post back once more on this thread. This time, use the -dataex- command to show your example data. Also, choose your example to contain some instances where vote = 1, and, in fact, make sure you include some instances where two people did vote against the same person at the same time, and also some instances of people who don't vote for the same person ever. Finally, work out on paper what the results should look like, and then show that. Showing the desired results is often the key to helping people understand when the description in words is long and complicated.

        If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
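
        As a hedged illustration of what a -dataex- call looks like, the sketch below uses Stata's shipped auto dataset as a stand-in so that it runs anywhere. The commented line shows how the same call might look for this thread's data, assuming the lowercase variable names and a 20-row slice, both of which are illustrative choices.

        ```stata
        * Stand-in data so the example runs as-is
        sysuse auto, clear

        * Print the first 5 observations of three variables as a pasteable block
        dataex make price mpg in 1/5

        * With the Survivor data in memory, the call might look like this
        * (variable names and row range are assumptions):
        * dataex round choice vote fe_season indiv indiv_voted in 1/20
        ```

        The output can be copied from the Results window straight into a forum post, and readers can paste it into a do-file to recreate the example.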


        • #5
          Hi Clyde,
          Many thanks for your answer. You are perfectly right.

          Round Choice Vote FE_season indiv indiv_voted couples_vote number_similar_vote_per_round
          1 186 0 1 33 50 33-50 0
          1 186 1 1 33 47 33-47 0
          1 187 1 1 50 33 50-33 0
          1 187 0 1 50 47 50-47 1
          1 188 1 1 47 33 47-33 0
          1 188 0 1 47 50 47-50 1

          I opted to write out the data sample by hand.
          Participants #50 and #47 voted against the same participant (#33), so the column "number_similar_vote_per_round" is equal to 1 in rows "50-47" and "47-50".


          I would like to compute the column "number_similar_vote_per_round" on a much bigger sample (more values of indiv/indiv_voted, Round, Choice, FE_season, etc.).


          I hope this is clearer! Do not hesitate to tell me if something is missing.


          • #6
            This is still not adequate. You need your data example to look like this. Exactly in this format- nothing else. For a visual tutorial on how to use -dataex- so that we may help you better, please watch this. Verbal descriptions are okay only when supplemented by the real data that you yourself see on your terminal, not in tables or screenshots. We need to see the data such that we can just take the code block and put it into our do file and see what you're seeing.

            The more complex the question, as a very general rule, the more exact the data example and code must be.


            • #7
              I believe this works. There may be a simpler way to do it that I missed.

              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input byte round int choice byte(vote fe_season indiv indiv_voted) str6 couples_vote byte number_similar_vote_per_round
              1 186 0 1 33 50 "33-50 " 0
              1 186 1 1 33 47 "33-47 " 0
              1 187 1 1 50 33 "50-33 " 0
              1 187 0 1 50 47 "50-47 " 1
              1 188 1 1 47 33 "47-33 " 0
              1 188 0 1 47 50 "47-50 " 1
              end
              
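              * For each voter in a round, record whom they voted against
              by round fe_season indiv, sort: egen target = max(cond(vote, indiv_voted, .))
              
              * Save one row per voter with that target, to pair up voters who share it
              preserve
              keep round fe_season indiv target
              rename indiv indiv2
              duplicates drop
              tempfile votes
              save `votes'
              
              restore
              gen `c(obs_t)' obs_no = _n
              * Attach to each row every voter (indiv2) with the same target in that round
              joinby round fe_season target using `votes'
              * wanted = 1 when the person in indiv_voted shared this voter's target
              by round fe_season indiv indiv_voted, sort: egen wanted ///
                  = total(indiv2 == indiv_voted)
              * Return to one row per original observation
              by obs_no (indiv indiv_voted), sort: keep if _n == 1
              drop target obs_no indiv2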
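
              The code above yields a per-round indicator, whereas post #1 asks for a cumulative count. A hedged sketch of one way to accumulate it follows, assuming one row per season, round, voter and candidate. The toy rows and the name cum_similar are hypothetical; only wanted and the identifier variables come from the thread.

              ```stata
              * Toy data (invented): one ordered pair per row and round, with the
              * per-round indicator -wanted- already computed
              clear
              input byte(fe_season round indiv indiv_voted wanted)
              1 1 50 47 1
              1 2 50 47 0
              1 3 50 47 1
              1 1 47 50 1
              1 2 47 50 0
              1 3 47 50 1
              end

              * Running total of shared votes within each ordered pair, in round order
              bysort fe_season indiv indiv_voted (round): generate cum_similar = sum(wanted)
              list, sepby(indiv indiv_voted)
              ```

              In this toy example cum_similar runs 1, 1, 2 over rounds 1 to 3 for each ordered pair.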


              • #8
                @Clyde,
                Thank you very much for your code, which works perfectly. You are the Stata Master.

                @Jared,
                Really sorry, I thought the table was enough. I misinterpreted Clyde's original post. From now on, I will make sure to use -dataex-.
