Hi, I am working on a class project where my testable hypothesis is that "a player's position on the soccer field influences his overall rating." The data set I found was in SQL, so I had to convert it to Excel first and then to Stata.

As you can see in the attached screenshot, there are variables like Player_fifa_api_id and year1, and the categorical variables Attack_rate and Defense rate, coded as Medium, High, Low, etc. From the categorical variables I created player positions, such that a player with a high attack rate and a low defense rate would play at the Forward position.

The data set shows the same player playing matches in different years, so his attributes vary from match to match. I created means of the variables and dropped the original variables, so that when I declare the data as a panel there are no varying observations. I have the same issue here: if you look at the two highlighted rows (17 & 18), you will notice that a player with the same player id played two matches in 2012, but in one of the matches he had attack rate Low and defense rate Medium, so his observations vary within the same year.

This data set has 65000 observations, so I want to know how I could fix this problem so that I have only one observation per player for each year, with no variation within the year. I want to declare this data set as a panel with xtset player_fifa_api year1.

I have already done a lot of data manipulation on this data set, and I am not sure whether I can test my hypothesis with it. Also, if anybody could suggest a good soccer data set on which my hypothesis could easily be tested (e.g. with an OLS regression), that would be helpful.
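To make the goal concrete, here is a minimal sketch of what I think the reduction to one row per player-year could look like. It is only an illustration under assumptions: the toy rows are made up, the variable spellings attack_rate, defense_rate and overall_rating are my guesses at clean names, and taking the most frequent category within a year (with ties broken by maxmode) is just one possible rule, not necessarily the right one.

```stata
* Toy data for illustration only: all values are made up, and the names
* attack_rate, defense_rate, overall_rating are assumed spellings.
clear
input long player_fifa_api_id int year1 str6 attack_rate str6 defense_rate float overall_rating
2 2012 "low"    "medium" 70
2 2012 "medium" "medium" 72
2 2012 "medium" "medium" 71
2 2013 "high"   "low"    74
5 2012 "high"   "low"    80
end

* Most frequent category within each player-year.
* maxmode picks the highest value when several categories tie (an arbitrary rule).
egen attack_mode  = mode(attack_rate),  by(player_fifa_api_id year1) maxmode
egen defense_mode = mode(defense_rate), by(player_fifa_api_id year1) maxmode

* Mean rating within each player-year.
egen rating_mean = mean(overall_rating), by(player_fifa_api_id year1)

* Keep a single row per player-year and drop the match-level variables.
bysort player_fifa_api_id year1: keep if _n == 1
drop attack_rate defense_rate overall_rating

* Derive the position after aggregating, so it cannot vary within a year.
generate str10 position = "Forward" if attack_mode == "high" & defense_mode == "low"

* isid stops with an error if player-year does not uniquely identify rows.
isid player_fifa_api_id year1
xtset player_fifa_api_id year1
```

The point of deriving the position only after aggregating is that it is then built from one attack value and one defense value per player-year, so xtset should no longer complain about repeated time values within a panel.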
Thanks!