  • Generating new variables based on information in other observations in dataset

    I'm working with some NCAA football player data. I've run into a data cleaning/organization problem that is difficult (for me) and a bit tricky to explain.

    Here's what the data looks like:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str25 player_ncaa_pfr int year str22 school_pfr str5 pos_guess_ncaa float pos_guess_rank_ncaa int pass_yards_ncaa double(rush_yards_ncaa rec_yards_ncaa)
    "Ahmaad Galloway"  2001 "Alabama" "RB"    2    0  881  20
    "Antonio Carter"   2001 "Alabama" "WR/TE" 1    0    0 428
    "Derrick Hamilton" 2001 "Clemson" "WR/TE" 3    0   21 590
    "Freddie Milons"   2001 "Alabama" "WR/TE" 2    0   10 626
    "J.J. McKelvey"    2001 "Clemson" "WR/TE" 2    0    0 392
    "Roscoe Crosby"    2001 "Clemson" "WR/TE" 1    0    0 396
    "Santonio Beard"   2001 "Alabama" "RB"    1    0  633   8
    "Travis Zachery"   2001 "Clemson" "RB"    1    0  576 414
    "Tyler Watts"      2001 "Alabama" "QB"    1 1325  564   0
    "Woodrow Dantzler" 2001 "Clemson" "QB"    1 2360 1004   0
    end
    What I want to do is create new variables holding each player's teammates' statistics. For example, in 2001, Alabama had two running backs (in this example dataset), Galloway and Beard. For Galloway, I want to create a new variable with Beard's statistics, and vice versa. I was thinking of using the position rank variable (pos_guess_rank_ncaa) to help with this. In the example data, Beard is the first-ranked RB on Alabama in 2001 (pos_guess_rank_ncaa == 1), while Galloway is the second-ranked (a higher number indicates a lower rank). My idea is to create two new variables here: one holding rush_yards_ncaa for the first-ranked RB on a team, and one holding rush_yards_ncaa for the second-ranked RB on a team.

    I'm just not sure how to actually implement this in Stata. Is there a way to do this? I don't really know where to start on coding it.


  • #2
    I'm not sure I understand what you want, but I think it's the following. If not, please post back showing what the results for the example data should look like and a detailed explanation of how you arrived at them.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str25 player_ncaa_pfr int year str22 school_pfr str5 pos_guess_ncaa float pos_guess_rank_ncaa int pass_yards_ncaa double(rush_yards_ncaa rec_yards_ncaa)
    "Ahmaad Galloway"  2001 "Alabama" "RB"    2    0  881  20
    "Antonio Carter"   2001 "Alabama" "WR/TE" 1    0    0 428
    "Derrick Hamilton" 2001 "Clemson" "WR/TE" 3    0   21 590
    "Freddie Milons"   2001 "Alabama" "WR/TE" 2    0   10 626
    "J.J. McKelvey"    2001 "Clemson" "WR/TE" 2    0    0 392
    "Roscoe Crosby"    2001 "Clemson" "WR/TE" 1    0    0 396
    "Santonio Beard"   2001 "Alabama" "RB"    1    0  633   8
    "Travis Zachery"   2001 "Clemson" "RB"    1    0  576 414
    "Tyler Watts"      2001 "Alabama" "QB"    1 1325  564   0
    "Woodrow Dantzler" 2001 "Clemson" "QB"    1 2360 1004   0
    end
    
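    * Within each school/year/position group, copy the rush yards of the
    * player ranked `i' onto every observation in that group
    forvalues i = 1/2 {
        by school_pfr year pos_guess_ncaa, sort:  ///
            egen rush_yards_rank_`i' = ///
            max(cond(pos_guess_rank_ncaa == `i', rush_yards_ncaa, .))
    }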
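
    If what is ultimately needed is a single "teammate" variable, one possible follow-up is sketched below. It is only a sketch under assumptions: it uses a reduced copy of the example data (running backs only), it assumes at most two ranked players per school/year/position group, and the variable name teammate_rush_yards is made up for illustration. It repeats the loop pattern from this post and then picks the other rank's value for each player.

```stata
* Sketch only: reduced copy of the example data (running backs only)
clear
input str25 player_ncaa_pfr int year str22 school_pfr str5 pos_guess_ncaa float pos_guess_rank_ncaa double rush_yards_ncaa
"Ahmaad Galloway" 2001 "Alabama" "RB" 2 881
"Santonio Beard"  2001 "Alabama" "RB" 1 633
"Travis Zachery"  2001 "Clemson" "RB" 1 576
end

* Spread the rush yards of rank 1 and rank 2 across each group
forvalues i = 1/2 {
    by school_pfr year pos_guess_ncaa, sort: ///
        egen rush_yards_rank_`i' = ///
        max(cond(pos_guess_rank_ncaa == `i', rush_yards_ncaa, .))
}

* Hypothetical extra step (not part of the original question):
* for a rank-1 player take the rank-2 yards, and vice versa.
* Assumes only ranks 1 and 2 matter; other ranks are left missing.
gen double teammate_rush_yards = ///
    cond(pos_guess_rank_ncaa == 1, rush_yards_rank_2, rush_yards_rank_1) ///
    if inlist(pos_guess_rank_ncaa, 1, 2)

list player_ncaa_pfr pos_guess_rank_ncaa rush_yards_rank_1 ///
    rush_yards_rank_2 teammate_rush_yards, noobs
```

    A group with only one ranked player (Clemson's RBs in this reduced data) has no rank-2 value, so rush_yards_rank_2 and teammate_rush_yards are missing there. Because of the /// line continuations, this needs to be run from a do-file rather than typed into the Command window.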


    • #3
      Thanks for your response, Clyde! This worked for me.
