Hi,
Like many others, I'm completely new to Stata. I have a model that I think is quite simple, yet I'm struggling to estimate it in Stata.
Essentially, I have a large dataset of yearly observations on players in a particular sport (by yearly I mean that each player's data are a yearly summary, not game by game). My data cover roughly 15 years.
The story is about how financial incentives change player behaviour. The treatment group was given the incentives and the control group was not, so every player belongs to one of the two groups. The basic structure is a DiD model with multiple pre- and post-treatment periods (5 and 8, respectively), and I'm not sure how to set this up.

A problem is that, because my data cover a long period, there is a lot of player attrition (it is very rare for a player to play 15 years continuously) and many missing values. For example, I might have player A for 2003 and 2007 but nothing in between or afterwards, or I might have player B only for 2006. So coverage varies a lot across players, and I'm unsure how to organise my data. Which players should I use (e.g., only those with at least two observations both before and after treatment)? What does Stata do with all the missing values? Should I just 'pool' all my data, treating each year as a new sample of players, and do the analysis that way? This is the code I ran:
reg y time time1 time2 time3 time5 time6 time7 time8 time9 time10 time11 time12 time13 treated did
Here the time dummies control for time trends, 'treated' indicates the treated group of players, and 'did' is my interaction term. I suspect this is wrong because the outcome variable does not have a t subscript. Is that right? The second regression I ran was:
xtreg y time time1 time2 time3 time5 time6 time7 time8 time9 time10 time11 time12 time13 treated did, fe
This did not work: the coefficient on did was reported as omitted. I can roughly follow the model on paper (I've looked at Pischke's notes on DiD with multiple time periods), but I find it hard to translate into Stata. I'm also unsure whether I should include player fixed effects, group fixed effects, or both.
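For reference, here is my attempt at a corrected version of both regressions on simulated data, so I can check my understanding. Everything in it is made up for illustration: the variable names, the 200 players, the 2006 treatment year (giving 5 pre and 8 post seasons) and the true effect of 2. As I understand it, in long format each row is one player-season, so y is already y_it, and because each player is in only one group, 'treated' is constant within player and is absorbed by player fixed effects.

```stata
* Simulated, hypothetical player-season panel with gaps (placeholders only)
clear
set seed 12345
set obs 200                                  // 200 made-up players
generate player  = _n
generate treated = (player <= 100)           // made-up treatment group
expand 13
bysort player: generate year = 2000 + _n     // seasons 2001-2013
drop if runiform() < 0.5                     // random attrition and gaps
generate post = (year >= 2006)               // 5 pre, 8 post seasons
generate did  = treated * post
generate y = 1 + 0.5*treated + 0.1*(year - 2000) + 2*did + rnormal()

* (1) Pooled DiD:  y_it = a + g*treated_i + year_t + d*did_it + e_it
*     i.year supplies the year dummies, which absorb post, so post is left out
regress y treated did i.year, vce(cluster player)

* (2) Two-way fixed effects:  y_it = a_i + year_t + d*did_it + e_it
*     player effects a_i absorb treated, so it is left out
xtset player year
xtreg y did i.year, fe vce(cluster player)
```

Stata's estimation commands drop any observation with a missing value in a model variable (casewise deletion), and seasons I don't have are simply absent rows, so e(N) after each command shows how many player-seasons were actually used.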
My data are structured like this:
Player variable 1 variable 2 variable 3 year
A
A
B
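Since my player identifier is a string (A, B, ...), I think it needs a numeric version before xtset. Below is a sketch of that, plus a way to count each player's pre- and post-treatment seasons to see how much attrition there is. The string ids, the 2006 treatment year and the two-season threshold are placeholders of my own, and the data are simulated.

```stata
* Simulated, hypothetical panel with string player ids "P1" ... "P200"
clear
set seed 12345
set obs 200
generate str8 player = "P" + string(_n)
expand 13
bysort player: generate year = 2000 + _n     // seasons 2001-2013
drop if runiform() < 0.5                     // random attrition and gaps
generate post = (year >= 2006)               // placeholder treatment year

* xtset needs a numeric panel id, so encode the string id
encode player, generate(player_id)
xtset player_id year                         // reports an unbalanced panel

* Count each player's pre- and post-treatment seasons
bysort player_id: egen n_pre  = total(post == 0)
bysort player_id: egen n_post = total(post == 1)

* Flag players seen at least twice both before and after treatment
generate byte keep2 = (n_pre >= 2 & n_post >= 2)
tabulate keep2
```

Restricting the sample with keep if keep2 would be one option; whether that is better than keeping the full unbalanced panel is part of my question.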
Your help is much appreciated!