  • Three-dimensional panel dataset

    Hi,

    I have a 3D panel dataset on football matches. The data are at the individual (i), day (d) and match (m) level. There are usually 4 matches on each day (not always, because 2 days have fewer than 4 matches), and around 380-390 individuals for each match.

    I am currently trying to run a regression that estimates the likelihood of betting on the game that the individual has watched.

    I have tried grouping my variables in the following ways:

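    * Attempt 1: panel = match x individual, time = day
    egen newID=group(match ID)
    xtset newID day

    * Attempt 2: panel = match x day, time = individual
    egen matchday=group(match day)
    xtset matchday ID

    * Attempt 3: panel = individual x day, time = match
    capture drop newID
    egen newID=group(ID day)
    xtset newID match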


    However, every xtset command gives the same error: "repeated time values within panel".

    I am struggling to understand why I cannot run the xtset command. My dependent variable is the number of bets (a count variable that varies by individual and match). The day variable accounts for the fact that there are multiple games on each day.

    Does anyone know where I might be going wrong?

  • #2
    With your set-up, Stata supports only xtset identifier, that is, declaring the panel identifier without a time variable.

    Otherwise, as you have found, at most one observation for each (identifier, time) pair is allowed.
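    The rule above can be checked directly: count how often each (identifier, time) combination occurs and list any that occur more than once. In Stata itself, isid ID match or duplicates report ID match performs this check. The sketch below shows the same logic in Python; the function name repeated_keys and the toy rows are hypothetical, with matches assumed to be numbered uniquely across the dataset.

```python
from collections import Counter

def repeated_keys(rows, keys):
    """Return the key combinations that occur in more than one row.

    For xtset panelvar timevar to succeed, this must be empty
    when keys = [panelvar, timevar].
    """
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)

# Hypothetical toy data: 2 individuals, 1 day, 2 matches on that day.
rows = [
    {"ID": 1, "day": 1, "match": 1},
    {"ID": 1, "day": 1, "match": 2},
    {"ID": 2, "day": 1, "match": 1},
    {"ID": 2, "day": 1, "match": 2},
]

# (ID, day) repeats because both matches fall on day 1.
print(repeated_keys(rows, ["ID", "day"]))
# (ID, match) is unique in this toy data.
print(repeated_keys(rows, ["ID", "match"]))
```

    Any non-empty result means that pair of variables cannot serve as (panel, time) for xtset.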

    Not being able to xtset otherwise is less of a loss than you might imagine, because all you lose is the scope for fitting models that don't match your data structure anyway.

    So a question in return: what do you want to do with your dataset, meaning what commands do you expect to run after xtset?



    • #3
      I want to run a panel regression model (probably Poisson or something similar, to deal with the count data). The model will look something like this:

      number of bets(i,m) = α + β1·X1(m) + β2·X2(i,m) + β3·X1(m)·X2(i,m) + β4·Z(i) + δ(i) + ε(i,m)

      where α is a constant and δ(i) captures the individual fixed effects.
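      If a Poisson model is used, the right-hand side of the equation above would normally enter through an exponential conditional mean rather than additively with an error term. A sketch in the same notation, assuming a standard Poisson fixed-effects specification:

```latex
E\left[\text{bets}_{im} \mid X_{1m}, X_{2im}, Z_i, \delta_i\right]
  = \exp\!\left(\alpha + \beta_1 X_{1m} + \beta_2 X_{2im}
    + \beta_3 X_{1m} X_{2im} + \beta_4 Z_i + \delta_i\right)
```

      Because Z(i) and α do not vary within an individual, they cannot be separated from δ(i) once individual fixed effects are included; β4 is not identified in that case, and only the within-individual terms (β1, β2, β3) are estimated.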

      However, some of the variables in the dataset are at the match-day level. Is it best to drop these (as they are not important for the model), and then run:

      xtset ID match

      Or would I be introducing bias by not also controlling for day-level effects?



      • #4
        Thanks for the further information. It's best that I leave your questions open for panel enthusiasts.



        • #5
          Aside from multilevel methods, which would likely work here, this sounds like a good problem for tensor approaches, i.e. 3-D array data structures, which Stata doesn't support but Python does. Either way, the reason you can't xtset your data is that you first need to know your unit of analysis: which two variables, across time and space, UNIQUELY identify your observations?

          Once you can answer that, you can do as you desire, but it'll likely need to be with a technique that accounts for the nesting structure you describe.
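          One part of the nesting structure worth checking is whether each match falls on exactly one day. If matches are numbered uniquely across the dataset (an assumption; the thread does not say), then match is nested in day, any variable that varies only by day is also constant within each match, and individual × match is a candidate unit of analysis. If instead matches are numbered 1-4 within each day, day and match together identify a match. The function is_nested and the toy rows below are hypothetical illustrations of that check.

```python
from collections import defaultdict

def is_nested(rows, inner, outer):
    """True if every value of `inner` occurs with exactly one value of `outer`."""
    outers = defaultdict(set)
    for row in rows:
        outers[row[inner]].add(row[outer])
    return all(len(values) == 1 for values in outers.values())

# Hypothetical toy data: matches numbered 1-3 across two days.
rows = [
    {"ID": 1, "day": 1, "match": 1},
    {"ID": 1, "day": 1, "match": 2},
    {"ID": 1, "day": 2, "match": 3},
    {"ID": 2, "day": 1, "match": 1},
    {"ID": 2, "day": 2, "match": 3},
]

print(is_nested(rows, "match", "day"))  # each match falls on one day
print(is_nested(rows, "day", "match"))  # day 1 contains two matches
```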

