  • Beginner's question on dataset

    I have data on weekly online competitions. The data run from 2001 all the way up to 2019, but I am only using the data from 2004-2007. Each competition has around 180 entrants, but not necessarily the same entrants each week. A fair number of individuals appear in many competitions, while some participate only once. Individuals essentially choose each week whether or not to participate. For each entrant in each competition, the data record the final score and placement, among other variables that describe the individual's performance.
    So altogether, I have data on roughly 180 entrants in each of 3 years' worth of weekly competitions, with some individuals competing a lot and some competing only once.
    It's not a repeated cross-section, because I'm not randomly sampling from a population each time: individuals choose whether to participate in a given week's competition, and some individuals participate in many competitions.
    It's not a balanced panel, as not all individuals participate in each competition.
    It's not an unbalanced panel either, because the data for individuals who chose NOT to participate aren't "missing": we know why they are absent, namely that the individual decided not to participate. It's not as if the individual did participate and we've lost the data on how they did in the competition.

    So what type of data do I have? This dataset has been analysed before, and the paper that used it described it as an unbalanced panel, but I'm not convinced.

  • #2
    Well, in my world (epidemiology) we would call this a sample of convenience that is partly cross-sectional and partly longitudinal. That has the advantage of being fully descriptive, if long-winded.

    It doesn't fit neatly into either the unbalanced panel or the serial cross-section rubric, which you already know, or you wouldn't have asked the question. There are at least two perspectives for answering it:

    1. How can I give a terse description of the dataset, one that fits into an introductory sentence and conjures the right intuitions in people reading or hearing me? For this purpose, I would figure out how many of the people are repeat participants and how many appear only once. If it is mostly repeaters, with a distinct minority who show up only once, I would call it an unbalanced panel. If it is mostly one-time participants, with a distinct minority of repeaters, I would refer to it as a serial cross-section sampled with replacement. If neither group is a clear majority, then I don't think you can properly assign it a simple label, and you might have to fall back on the epidemiologist's long-winded description.

    2. How will I analyze this data? Unless the number of repeat participants is small enough that you would be comfortable actually omitting them altogether (or omitting all but one of their observations) from the analysis, you will have no choice but to use the same analytic methods that would be applied to an unbalanced panel because you must account for dependence among observations pertaining to the same person.
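    One way to get the counts that point 1 calls for is to tally, for each entrant, the number of distinct competitions entered, then compare the share of people who appear more than once against those who appear only once. This is a minimal Python sketch, assuming the data can be read as (entrant ID, competition ID) pairs. The function names are hypothetical, and the two-thirds cutoff for a "clear majority" is an illustrative assumption; the advice above gives no number.

```python
from collections import Counter
from typing import Iterable, Tuple


def participation_summary(records: Iterable[Tuple[str, int]]) -> dict:
    """Count one-time vs repeat participants.

    records: (entrant_id, competition_id) pairs, one per entry.
    """
    seen = set()
    comps_per_entrant = Counter()
    for entrant, comp in records:
        # Ignore accidental duplicate rows for the same entrant and competition.
        if (entrant, comp) not in seen:
            seen.add((entrant, comp))
            comps_per_entrant[entrant] += 1
    n_people = len(comps_per_entrant)
    n_once = sum(1 for c in comps_per_entrant.values() if c == 1)
    n_repeat = n_people - n_once
    return {
        "people": n_people,
        "once": n_once,
        "repeat": n_repeat,
        "share_repeat": n_repeat / n_people if n_people else 0.0,
    }


def suggest_label(share_repeat: float, majority: float = 2 / 3) -> str:
    # ASSUMPTION: 'majority' is a hypothetical threshold for a "clear majority".
    if share_repeat >= majority:
        return "unbalanced panel"
    if share_repeat <= 1 - majority:
        return "serial cross-section sampled with replacement"
    return "no simple label (describe it in full)"
```

    Counting people rather than rows matches the question as posed ("how many of the people are repeat participants"); counting rows instead would weight frequent competitors more heavily and could give a different picture.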
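    For point 2, one common way to account for dependence among observations on the same person is to cluster the standard errors on the entrant; random-effects or mixed models are other options. The sketch below is a hand-rolled OLS with cluster-robust (CR1) standard errors in NumPy. It assumes a numeric design matrix that already includes a constant column, and `ols_cluster_se` is a hypothetical name. In practice, a statistics package's built-in cluster option would normally be used instead.

```python
import numpy as np


def ols_cluster_se(y, X, groups):
    """OLS coefficients with standard errors clustered on `groups`.

    y: (n,) outcome; X: (n, k) design matrix including a constant;
    groups: (n,) cluster labels, e.g. entrant IDs.
    Uses the CR1 small-sample correction G/(G-1) * (n-1)/(n-k).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    n, k = X.shape

    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # "Meat" of the sandwich: sum over clusters of each cluster's score outer product,
    # which allows residuals of the same entrant to be arbitrarily correlated.
    meat = np.zeros((k, k))
    labels = np.unique(groups)
    for g in labels:
        idx = groups == g
        score = X[idx].T @ resid[idx]
        meat += np.outer(score, score)

    G = len(labels)
    correction = (G / (G - 1)) * ((n - 1) / (n - k))
    V = correction * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))
```

    The point estimates are ordinary OLS; only the standard errors change. If every entrant appeared exactly once, each cluster would hold a single observation and the result would reduce to heteroskedasticity-robust (HC1) standard errors, which is consistent with the idea that the repeaters are what create the dependence problem.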
