  • Grouping observations for panel data

    I've been trying to estimate a model using individual student data for the years 2010, 2012, and 2013. The problem is that this is not panel data at the individual level: data is only available for the 10th grade in each year, so each year contains different students (unless they don't pass 10th grade). So I'm trying to define a "panel" not at the individual level but at the school level, because each school is repeated across the three years.

    If I try to set up the model using -xtset student_id year-, then student_id doesn't come up more than once (it's not really panel data). I know one way to do it would be to simply collapse the data to the school level and then -xtset school_id year-, but I would rather not lose so much information. So is it possible to define "grouped" panel data somehow? I mean grouping all observations for a given school without collapsing them. I hope I made myself clear.

  • #2
    You don't have panel data, and you shouldn't try to treat it as such. You have observations from multiple persons nested within schools, and time information is also available, although individual persons appear at only one time period. You can still use the -xt- suite of commands for these, but you need to -xtset school_id-. If time is relevant to outcomes, it can be included as a predictor variable in your model, using some suitable specification.
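
    A minimal sketch of what this could look like. The variable names score (outcome) and ses (a student-level covariate) are hypothetical and not from the original post; entering year as a set of indicators is only one possible specification for time.

    ```stata
    * Declare schools as the grouping unit; no time variable is set
    xtset school_id

    * Random-intercept model for schools, with year dummies as predictors
    * (score and ses are assumed variable names)
    xtreg score ses i.year, re

    * Alternative: school fixed effects with school-clustered standard errors
    xtreg score ses i.year, fe vce(cluster school_id)
    ```

    Every student-level observation is kept; nothing is collapsed to the school level.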



    • #3
      Thank you for your quick and clear response. I'll treat it as such then.

      Admittedly, it is not panel data at the student level. But can't it be considered panel data at the school level? The underlying assumption would be that the 10th graders of the same school in different years are similar. I'm asking this as a matter of general knowledge.



      • #4
        It is not panel data. Panel data implies that a single cohort of respondents is recruited and their outcomes (responses) are followed over time. In panel data, time is a within-person (or firm, country, etc.) variable. There is a simple nesting hierarchy: observations at different times are nested within the panel members. (When those panel members are themselves nested within higher-level groupings such as schools, this should also be accounted for in the analysis, and multilevel models such as -mixed- and the -me- suite of commands will generally be preferred to just -xt- commands.)
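
        As a sketch of the multilevel case in the parenthetical, suppose a true panel in which each student is observed in several years and students are nested in schools. The variable names score and ses are hypothetical, and random intercepts at both levels are just one possible specification.

        ```stata
        * Three-level model: observations within students within schools
        * (score and ses are assumed variable names)
        mixed score ses i.year || school_id: || student_id:
        ```

        This does not fit the data in the original question, where almost every student appears only once.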

        What you have is a series of cross-sections: at various points in time, schools are sampled and the outcomes (responses) of the people who happen to be there then are gathered. Time is a between-person variable. There is no simple nesting here: there are schools, and they are sampled using different people at different times. There is, however, a clustering of observations within schools that -xt- commands can recognize and model. Parameterizing the observations within a school actually requires two dimensions: student and time. In true panel data, the within-panel-member observations can be parameterized on a single dimension (often time).

        The -xt- commands are able to deal with both of these designs. For true panel data, we ordinarily -xtset panelvar timevar- (though the timevar may not be needed, depending on the specific analysis), and for a series of cross-sections we can only -xtset panelvar-.
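
        One way to check which of the two structures a dataset has, sketched with the identifiers from the question (student_id, school_id, year). The helper variables cell_tag, school_tag, and n_years are names made up for this illustration.

        ```stata
        * How often does each student_id occur? In repeated cross-sections
        * nearly all students appear once (repeaters would show up as surplus)
        duplicates report student_id

        * Count the distinct years in which each school is observed
        egen byte cell_tag = tag(school_id year)
        bysort school_id: egen n_years = total(cell_tag)

        * Tabulate years observed, counting each school once
        egen byte school_tag = tag(school_id)
        tabulate n_years if school_tag
        ```

        If students mostly appear once while schools appear in several years, time varies between students rather than within them, which is the repeated cross-section case.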



        • #5
          Again, thank you, it is much clearer now.
