
  • Generating fixed effects estimates with panel data

    Hi

    I am working on a study of peer effects, closely related to the "Peers at Work" paper by Alexandre Mas and Enrico Moretti (2006): http://eml.berkeley.edu/~moretti/text20.pdf

    In their paper, the model is:
    worker i's productivity, y_itcs = (unobservable permanent productivity of worker i) + (unobservable permanent productivity of worker i's peers) + (number of workers on the shift) + (dummies for time-date-store combinations)
    They first-difference across working shifts to form their baseline model, and of course that still leaves an unobservable change in the permanent productivity of worker i's peers between shift t and shift t-1 in the baseline regression.
    So to estimate this first-differenced equation, they first estimate all the worker fixed effects in their panel, i.e. the unobservable permanent productivity of each worker i = 1, ..., N.
    They do this by regressing worker i's productivity, y_itcs, on the worker dummies, a vector of dummies for all possible combinations of workers who worked with worker i, the number of workers on the shift, and the time-date-store dummies.
    This is line 343 in their do-file, which I have attached:
    areg lnprod HHHT1-HHHT`max_check' REG* DOW_HOUR* DDT* if prod_unit >0.02 & prod_unit <1.5, absorb(shift_group);
    where HHHT1-HHHT`max_check' are the worker dummies, `max_check' is the total number of workers in the store, and shift_group is the vector of dummies for all possible combinations of workers who worked with worker i.
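In compact notation (my own transcription of the verbal description above, not the authors' exact equation), the model and its first difference are:

```latex
% Worker i's productivity on shift t in time-date-store cell (t,c,s):
%   \alpha_i            worker i's permanent productivity (the fixed effect of interest)
%   \bar{\theta}_{-i,t} average permanent productivity of i's shift-t peers
%   n_t                 number of workers on the shift
%   \delta_{tcs}        time-date-store dummies
y_{itcs} = \alpha_i + \bar{\theta}_{-i,t} + \gamma\, n_{t} + \delta_{tcs} + \varepsilon_{itcs}

% First-differencing across consecutive shifts removes \alpha_i but, as noted,
% still leaves the unobservable change in peers' permanent productivity:
\Delta y_{itcs} = \Delta\bar{\theta}_{-i,t} + \gamma\, \Delta n_{t} + \Delta\delta_{tcs} + \Delta\varepsilon_{itcs}
```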



    I am trying to adopt the authors' empirical strategy, but I have not been able to generate/retrieve the individual fixed effects, as Stata drops almost all of my worker dummies for multicollinearity. I wondered whether this was a case of the dummy variable trap, but even dropping one of the worker dummies did not solve the multicollinearity issue. Furthermore, that didn't seem to be a problem for the authors anyway, as they had dummies for all `max_check' workers in the store.
    My dataset is on football players; my "time variable" is the gameweek (gw), and pdummy are my player dummies, corresponding to HHHT. I have attached my do-file and a sample of my dataset here too.

    Any help will be greatly appreciated as I'm relatively new to Stata. Thank you
    Last edited by Sam Huang; 31 Jan 2015, 05:34.

  • #2
    Sam:
    it's rarely the case (for me, at least) that list contributors have the time (or willingness) to skim through all these attachments, setting aside the risk of downloading nasty active content possibly embedded in electronic spreadsheets. The best way to post your code is via CODE delimiters (# icon), which you can find among the advanced editor (A icon) options.
    That said, before delving into more recommendations, I would double-check whether your worker dummies are dropped because of a previous -xtset gameweek(gw) pdummy-.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Carlo,

      Thanks for the advice. That was my first post, so I didn't realise that; my apologies.

      I'm not sure what you meant by -xtset- dropping the dummies? And my xtset is -xtset id gw-, not pdummy; pdummy is only generated later by -tab id, gen(pdummy)-. The dummies are only dropped when running the -areg-.

      Code:
      set more off
      clear
      set matsize 5000
      set maxvar 20000
      import excel "statalist.xlsx", sheet("Sheet1") firstrow case(lower)
      destring mins shots sot keypasses pa touches tackles interceptions clearances blockedshots fouls rating opprank oppwage ifhome, replace
      drop if position=="GK"
      encode name, gen(id)
      replace rating=. if rating==99
      replace mins=0 if mins<0
      
      * one cell per club-gameweek combination
      egen clubgw=group(club gw)
      sort clubgw
      
      * dummies for each player `i'
      tab id, gen(pdummy)
      summ id
      local maxnum=r(max)
      
      * teamcomp`i' indicates teammates who have played with player `i' in any single gw
      forvalues i=1/`maxnum' {
          bysort clubgw: egen mpdummy`i'=mean(pdummy`i')
          gen byte teamcomp`i'=(mpdummy`i'>0 & mpdummy`i'<.)
          replace teamcomp`i'=0 if pdummy`i'==1
      }
      
      * compdummies is now the vector of dummies for all possible combinations of
      * players who played with player `i'. E.g. if player A played with B & C in
      * GW1, and with B & D in GW2, the combinations B&C and B&D get separate dummies.
      egen compdummies=group(teamcomp*)
      drop if compdummies==.
      
      areg rating pdummy1-pdummy`maxnum' opprank oppwage ifhome, absorb(compdummies)
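To get a feel for what the loop above constructs, here is the same teammate-combination logic sketched in Python on made-up data (the player names and club-gameweek cells are hypothetical); each distinct teammate set corresponds to one compdummies group:

```python
from collections import Counter

# Made-up club-gameweek cells: which players appear together in each cell.
# This mirrors the teamcomp*/compdummies construction above: each observation
# is tagged with the set of its teammates (everyone in the cell except the
# player himself).
cells = {
    "c1": {"A", "B", "C"},
    "c2": {"A", "B", "D"},
    "c3": {"A", "B", "C"},
}

combos = Counter()  # distinct teammate set -> number of observations in it
for cell, players in cells.items():
    for p in players:
        teammates = frozenset(players - {p})
        combos[teammates] += 1

# Each distinct teammate set becomes one compdummies group. Note how small
# the groups are even in this toy example.
for combo, n in sorted(combos.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(combo), n)
# ['A', 'B'] 3
# ['A', 'C'] 2
# ['A', 'D'] 1
# ['B', 'C'] 2
# ['B', 'D'] 1
```

Because the teammate set excludes the player himself, players in the same cell land in different groups, so each group tends to contain observations from very few players.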



      • #4
        Sam:
        I assumed that you had used a panel data analysis Stata command (please see -xt- and -xtreg- if you're not already familiar with them), as the subject of your post seemed to suggest.
        As far as I can tell, -xtset id gw- should work fine as the panel data declaration.
        You say that the dummies are only dropped with -areg-.
        Do you mean that they are kept under -xtreg, fe-?
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Carlo
          Yes, I did. And yes, -xtset id gw- works perfectly in declaring my dataset as a panel; that is not a problem.

          The problem is that a lot of the dummies pdummy1-pdummyN are dropped by -areg-, and I need them to be included since I want the coefficients on pdummy*.
          No, many of the pdummy* are also dropped when running -xtreg rating pdummy1-pdummy168 opprank oppwage ifhome, i(compdummies) fe-.

          Note: I'm not interested in the compdummies fixed effects (the fixed effects of the different combinations of workers); I'm interested in the pdummy* coefficients, which represent the individual fixed effects. As such, I need my pdummy* to stay in the regression.



          • #6
            Many players are only in your sample for a few games, so not only are their FEs not identifiable, but sometimes they are perfectly collinear with your other RHS variables, and that's why they are dropped.

            If you have a FE for every combination of players then my guess is that there is no way you can estimate that; there are just too many parameters. In that case, you need to make somewhat stronger assumptions to reduce your number of FEs.
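To see mechanically why such dummies get dropped: -areg, absorb()- demeans every regressor within the absorbed groups, and any regressor that is constant within every group is mapped to zero, i.e. it becomes perfectly collinear. A toy sketch of that within-transformation in Python (made-up numbers, not Stata's actual implementation):

```python
import numpy as np

group  = np.array([0, 0, 1, 1, 2, 2])          # absorbed FE groups
pdummy = np.array([1., 1., 0., 0., 0., 0.])    # player dummy, constant within each group
x      = np.array([0.3, 1.1, 0.7, 0.2, 0.9, 0.4])  # a regressor that varies within groups

def within_demean(v, g):
    """Subtract the group mean from each observation (the FE 'within' transform)."""
    out = v.astype(float).copy()
    for gi in np.unique(g):
        out[g == gi] -= v[g == gi].mean()
    return out

pd_w = within_demean(pdummy, group)
x_w  = within_demean(x, group)

print(np.allclose(pd_w, 0))  # True: the dummy is wiped out by the demeaning -> dropped
print(np.allclose(x_w, 0))   # False: this regressor survives
```

So if a player's observations never split across absorbed groups, his dummy cannot survive the absorption, no matter how many observations he has in total.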



            • #7
              Hi Sergio

              I do not need to estimate the FE for every combination of players, so that's not a worry.

              I guess you're right that I do not have enough observations for some of the players in the sample, and so their FEs are not identifiable. Is there a general rule for how many observations are necessary?

              And do you mean that having few observations is also the cause of the collinearity? Or is there some other reason?



              • #8
                Hi Sam,

                1) About the minimum number of obs: not that I'm aware of.
                2) The number of obs per se is not the issue, but rather the number of obs for certain subgroups.

                I know of researchers having used -reghdfe- (ssc describe reghdfe) to obtain estimates of the fixed effects (which will be set to zero instead of just dropped when collinear), but as many people have pointed out, you need to be very careful about when the FEs are identified (for instance, see [1]), and the fact that it can be done doesn't mean it *should* be done.

                My best guess at a solution would be to add some structure to the fixed effects (i.e. to think of them as random, with some distribution), much like what Arellano & Bonhomme do for the one-fixed-effect case [2]. Sorry for not being more helpful, but there is no straightforward solution to your problem, as far as I am aware.

                Best,
                Sergio


                [1] Abowd, J. M., R. H. Creecy, and F. Kramarz. 2002. Computing person and firm effects using linked longitudinal employer-employee data. Census Bureau Technical Paper TP-2002-06.

                [2] Arellano, M. and S. Bonhomme. 2012. Identifying distributional characteristics in random coefficients panel data models. Review of Economic Studies, 79(3), 987-1020.
                https://www.dropbox.com/s/pw8sfas7js...Final.pdf?dl=0
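For what it's worth, the identification condition in [1] can be checked mechanically: person effects are only separately identified (up to one normalization per set) within a connected component of the bipartite player/group graph. A minimal sketch in Python with union-find (the (player, combination-group) observations below are made up; you would feed in your own pairs):

```python
# Made-up (player, combination-group) observations.
obs = [(1, "g1"), (2, "g1"), (2, "g2"), (3, "g2"), (4, "g3")]

# Union-find over the bipartite player/group graph.
parent = {}

def find(a):
    parent.setdefault(a, a)
    while parent[a] != a:
        parent[a] = parent[parent[a]]  # path halving
        a = parent[a]
    return a

def union(a, b):
    parent[find(a)] = find(b)

for player, grp in obs:
    union(("p", player), ("g", grp))

# Players 1-3 are linked through shared groups g1/g2; player 4 only ever
# appears in g3, so his effect cannot be compared with theirs.
components = {}
for player, _ in obs:
    components.setdefault(find(("p", player)), set()).add(player)
print(sorted(map(sorted, components.values())))  # [[1, 2, 3], [4]]
```

Effects for players in different components can only be estimated relative to separate base players, which is another way collinear dummies show up.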

