Hi
I am working on a study of peer effects that closely follows Alexandre Mas and Enrico Moretti's (2006) paper "Peers at Work": http://eml.berkeley.edu/~moretti/text20.pdf
In their paper, their model is:
worker i's productivity, y_itcs = unobservable permanent productivity of worker i + unobservable permanent productivity of worker i's peers + number of workers on shift + dummies for time-date-store combinations
They take first differences across working shifts to form their baseline model, which of course still leaves an unobservable term in the baseline regression: the change in the permanent productivity of worker i's peers between shift t-1 and shift t.
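For concreteness, this is how I would write the model and its first difference in symbols. The notation is my own shorthand and may not match the paper's: \theta_i is worker i's permanent productivity, \bar{\theta}_{-itcs} is a summary (I assume an average) of the permanent productivity of worker i's peers, N_{tcs} is the number of workers on shift, \delta_{tcs} and \tilde{\delta}_{tcs} are the time-date-store dummies in the levels and differenced equations, and e_{itcs} is the error term.

```latex
% Levels (assumed notation, not necessarily the paper's)
y_{itcs} = \theta_i + \beta\,\bar{\theta}_{-itcs} + \gamma\,N_{tcs} + \delta_{tcs} + e_{itcs}

% First difference across shifts: \theta_i drops out,
% but the change in peers' permanent productivity remains and is unobserved
\Delta y_{itcs} = \beta\,\Delta\bar{\theta}_{-itcs} + \gamma\,\Delta N_{tcs} + \tilde{\delta}_{tcs} + \Delta e_{itcs}
```

The point of the second line is that \Delta\bar{\theta}_{-itcs} cannot be built without estimates of every worker's \theta first.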
So, to estimate this first-differences equation, they first estimate the fixed effects of all workers in their panel, i.e. the unobservable permanent productivity of each of the N workers.
They do this by estimating: worker i's productivity, y_itcs = unobservable permanent productivity of worker i + vector of dummies for all possible combinations of workers who worked with worker i + number of workers on shift + dummies for time-date-store combinations.
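In symbols, my reading of this first-stage regression is roughly the following. Again the notation is my own assumption rather than the paper's: \theta_i is the worker fixed effect to be recovered, g indexes the possible combinations of co-workers, \mathbf{1}[\cdot] is an indicator that worker i is working alongside combination g, N_{tcs} is the number of workers on shift, and \delta_{tcs} are the time-date-store dummies.

```latex
% First-stage regression used to recover the worker fixed effects (assumed notation)
y_{itcs} = \theta_i
         + \sum_{g} \pi_g\,\mathbf{1}[\text{co-workers of } i \text{ at } (t,c,s) = g]
         + \gamma\,N_{tcs} + \delta_{tcs} + e_{itcs}
```

The estimated \theta_i are the individual fixed effects I am trying to retrieve.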
This is line 343 in their do-file, which I have attached:
areg lnprod HHHT1-HHHT`max_check' REG* DOW_HOUR* DDT* if prod_unit >0.02 & prod_unit <1.5, absorb(shift_group);
Here HHHT1-HHHT`max_check' are the worker dummies, `max_check' is the total number of workers in the store, and shift_group represents the vector of dummies for all possible combinations of workers who worked with worker i.
I am trying to adopt the authors' empirical strategy, but I have not been able to generate or retrieve the individual fixed effects, because Stata drops almost all of my worker dummies for multicollinearity. I wondered whether this was a case of the dummy variable trap, but dropping one of the worker dummies did not resolve the multicollinearity. Furthermore, this did not seem to be a problem for the authors, who included dummies for all `max_check' workers in the store.
My dataset is on football players: my "time variable" is the gameweek (gw), and the pdummy variables are my player dummies, corresponding to the authors' HHHT dummies. I have attached my do-file and a sample of my dataset here too.
Any help would be greatly appreciated, as I'm relatively new to Stata. Thank you.