I'm working with some NCAA football player data and have run into a data cleaning/organization problem that is difficult (for me) and a bit tricky to explain.
What I want to do is create new variables holding each player's teammates' statistics. For example, in 2001 Alabama had two running backs in this example dataset, Galloway and Beard. For Galloway, I want a new variable containing Beard's statistics, and vice versa. I was thinking of using the position rank variable (pos_guess_rank_ncaa) to help with this. In the data, Beard is the first-ranked RB on Alabama in 2001 (pos_guess_rank_ncaa = 1) and Galloway is the second-ranked (pos_guess_rank_ncaa = 2); a higher number means a lower actual ranking. My idea is to create two new variables: one holding rush_yards_ncaa for the first-ranked RB on a team and one holding rush_yards_ncaa for the second-ranked RB on a team.
I'm just not sure how to actually implement this in Stata. Is there a way to do this? I don't really know where to start on coding it.
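A minimal sketch of the rank-based idea, assuming the variable names from the -dataex- listing in this post and at most two ranked RBs per team. The new variable names (rb1_rush_yards, rb2_rush_yards, other_rb_rush_yards) are hypothetical. To run on its own, it inputs only the three RB rows from the example data. The trick is egen's max() applied to a cond() expression: cond() keeps the value only for the player with the wanted rank, and max() spreads it to every row of that school-year.

```stata
* Hypothetical sketch: rank-based RB variables copied to every row of a
* school-year. Data are the three RB rows from the example dataset.
clear
input str25 player_ncaa_pfr int year str22 school_pfr str5 pos_guess_ncaa float pos_guess_rank_ncaa double rush_yards_ncaa
"Ahmaad Galloway" 2001 "Alabama" "RB" 2 881
"Santonio Beard"  2001 "Alabama" "RB" 1 633
"Travis Zachery"  2001 "Clemson" "RB" 1 576
end

* For r = 1, 2: cond() returns the yards of the RB ranked r and missing for
* everyone else; max() over the school-year ignores missings, so each row
* of the team gets that RB's yards (missing if no RB has rank r).
forvalues r = 1/2 {
    egen rb`r'_rush_yards = max(cond(pos_guess_ncaa == "RB" & pos_guess_rank_ncaa == `r', rush_yards_ncaa, .)), by(school_pfr year)
}

* The other RB's yards, assuming at most two ranked RBs per team:
* rank-1 RBs get the rank-2 value, all other RBs get the rank-1 value.
generate other_rb_rush_yards = cond(pos_guess_rank_ncaa == 1, rb2_rush_yards, rb1_rush_yards) if pos_guess_ncaa == "RB"

list player_ncaa_pfr school_pfr pos_guess_rank_ncaa rush_yards_ncaa rb1_rush_yards rb2_rush_yards other_rb_rush_yards, noobs
```

On the full data, skip the clear/input lines; other statistics (pass_yards_ncaa, rec_yards_ncaa) and positions follow the same pattern. If two players ever share a rank within a team-year, max() silently keeps the larger value, so it may be worth checking for duplicate ranks first.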
Here's what the data looks like:
```stata
* Example generated by -dataex-. To install: ssc install dataex
clear
input str25 player_ncaa_pfr int year str22 school_pfr str5 pos_guess_ncaa float pos_guess_rank_ncaa int pass_yards_ncaa double(rush_yards_ncaa rec_yards_ncaa)
"Ahmaad Galloway"  2001 "Alabama" "RB"    2    0  881  20
"Antonio Carter"   2001 "Alabama" "WR/TE" 1    0    0 428
"Derrick Hamilton" 2001 "Clemson" "WR/TE" 3    0   21 590
"Freddie Milons"   2001 "Alabama" "WR/TE" 2    0   10 626
"J.J. McKelvey"    2001 "Clemson" "WR/TE" 2    0    0 392
"Roscoe Crosby"    2001 "Clemson" "WR/TE" 1    0    0 396
"Santonio Beard"   2001 "Alabama" "RB"    1    0  633   8
"Travis Zachery"   2001 "Clemson" "RB"    1    0  576 414
"Tyler Watts"      2001 "Alabama" "QB"    1 1325  564   0
"Woodrow Dantzler" 2001 "Clemson" "QB"    1 2360 1004   0
end
```
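An alternative that does not rely on the rank variable is a "leave-one-out" total: sum a statistic over everyone at the same position on the same team and year, then subtract the player's own value. With exactly two players at a position this is simply the other player's value; with three or more it is the sum over the other teammates, not their individual values. The names n_same_pos and tm_* are hypothetical. So that it runs on its own, the sketch inputs only Alabama's 2001 rows from the example; on the full data, skip the clear/input lines and run the rest right after the -dataex- code above.

```stata
* Hypothetical sketch: teammates' totals at the same position, same team-year.
clear
input str25 player_ncaa_pfr int year str22 school_pfr str5 pos_guess_ncaa double(rush_yards_ncaa rec_yards_ncaa)
"Ahmaad Galloway" 2001 "Alabama" "RB"    881  20
"Antonio Carter"  2001 "Alabama" "WR/TE"   0 428
"Freddie Milons"  2001 "Alabama" "WR/TE"  10 626
"Santonio Beard"  2001 "Alabama" "RB"    633   8
"Tyler Watts"     2001 "Alabama" "QB"    564   0
end

* Number of players sharing each school-year-position cell
bysort school_pfr year pos_guess_ncaa: generate n_same_pos = _N

foreach v of varlist rush_yards_ncaa rec_yards_ncaa {
    * Cell total (egen's total() treats missing values as 0)
    egen tot_`v' = total(`v'), by(school_pfr year pos_guess_ncaa)
    * Subtract the player's own value to leave the teammates' total
    generate tm_`v' = tot_`v' - `v'
    * A player alone at his position has no teammate to report
    replace tm_`v' = . if n_same_pos == 1
    drop tot_`v'
}

sort school_pfr year pos_guess_ncaa player_ncaa_pfr
list player_ncaa_pfr pos_guess_ncaa rush_yards_ncaa tm_rush_yards_ncaa rec_yards_ncaa tm_rec_yards_ncaa, noobs
```

If a player's own value is missing, his tm_* value is missing too, because the subtraction involves a missing value.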