  • Unbalanced panel data

    Hello, Stata users!

    I have several issues with panel data prepared for my research. The data come from a sociological survey covering 2012-2023, and some observations are missing.

    For example, I have observations for a particular person (i) in 2012-2017 but nothing for 2018-2023; the rows simply don't exist. Therefore, I can't fill them with zeros, median values, etc.

    My questions are: how can I justify using this unbalanced dataset? Or are there tests or methods that could help me decide whether I need to create a balanced dataset from the unbalanced one?

    I found some information about observations missing at random and not at random, but it was not enough.

    Thank you in advance!

  • #2
    Martin:
    1) I would stop at 2017;
    2) if your data come from a survey you have a repeated cross-sectional study, not a panel dataset.
    Last edited by Carlo Lazzaro; 16 Mar 2025, 09:42.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Originally posted by Carlo Lazzaro
      Martin:
      1) I would stop at 2017;
      2) if your data come from a survey you have a repeated cross-sectional study, not a panel dataset.
      Carlo, thanks for your answer!

      1) In the end, I prepared both unbalanced and balanced datasets, compared the coefficients from the two models, and found no significant difference between them (but I reached this conclusion only by my own "expert assessment"). Is there a formal test to justify using the unbalanced dataset as well as the balanced one?

      2) Yes, the data come from longitudinal surveys, but I am not sure that they cannot be interpreted as a panel dataset. As far as I know, data are classified as repeated cross-sections if the set of individuals differs in each observation period. In my case, the same set of citizens is observed over time, but some of them "drop out" of the data for unknown reasons (moving, death, etc.), which is what I described as the problem of unbalanced data.



      • #4
        Martin: It's a good sign that the estimates are "similar" on the balanced and unbalanced panels. You could construct a Hausman test but that requires some work -- especially to make it robust to serial correlation and heteroskedasticity.

        In my work on selection in panel data models -- and this is described in Section 19.9 of my MIT Press book -- you can use the unbalanced panel to test for attrition bias. Define a complete cases indicator, s(i,t), equal to one if all data are observed for unit i in time t. Then, in the fixed effects estimation, include s(i,t+1) and do a cluster-robust t statistic.
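
        A minimal sketch of the test described above, under stated assumptions: the variable names id, year, y, x1 and x2 are hypothetical, the survey is assumed to have one wave per year, and the year dummies are an added choice rather than part of the description. The lead indicator s_lead plays the role of s(i,t+1).

        ```stata
        * Hypothetical names: id (person), year (wave), y (outcome), x1 x2 (regressors)
        xtset id year

        * s = 1 when all model variables are observed for this person-year
        generate byte s = !missing(y, x1, x2)

        * s_lead = s(i,t+1); it is 0 when next year's row is absent or incomplete
        generate byte s_lead = (F.s == 1)

        * The lead is not defined in the final wave, so that wave is excluded
        summarize year, meanonly
        local last = r(max)

        * FE with the lead selection indicator and cluster-robust standard errors
        xtreg y x1 x2 s_lead i.year if year < `last', fe vce(cluster id)

        * Cluster-robust test of the coefficient on s_lead
        test s_lead
        ```

        A large cluster-robust t statistic on s_lead would be read as evidence of attrition bias; a small one is consistent with FE on the unbalanced panel being acceptable.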

        As I discuss in my book, another advantage of fixed effects is that it has more resiliency to unbalanced panels than methods such as random effects. That's because selection can be systematically correlated with the unobserved effect if you use FE. But not if you use RE. So, FE on the unbalanced panel, supplemented with the estimates for the balanced panel, and the simple test statistic above is what I recommend.
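
        The side-by-side comparison of FE estimates on the unbalanced and balanced panels could be set up roughly as follows. This is only a sketch: id, year, y, x1 and x2 are hypothetical names, and a person is treated as "balanced" when complete data exist in every wave that appears in the dataset.

        ```stata
        * Hypothetical names: id (person), year (wave), y (outcome), x1 x2 (regressors)
        xtset id year

        * Number of complete person-years for each person
        bysort id (year): egen n_complete = total(!missing(y, x1, x2))

        * Number of distinct waves in the data
        quietly tabulate year
        local T = r(r)

        * balanced = 1 for persons with complete data in every wave
        generate byte balanced = (n_complete == `T')

        * FE on the full unbalanced panel
        xtreg y x1 x2 i.year, fe vce(cluster id)
        estimates store fe_unbal

        * FE on the balanced subpanel only
        xtreg y x1 x2 i.year if balanced, fe vce(cluster id)
        estimates store fe_bal

        * Coefficients and standard errors side by side
        estimates table fe_unbal fe_bal, b(%9.4f) se(%9.4f) stats(N N_g)
        ```

        The table is an informal comparison only; it does not replace a formal test.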



        • #5
          Martin:
          as Jeff already provided you with an enlightening reply about your question #1, I will focus on your second question.
          Attrition is common in panel datasets, and dealing with an unbalanced panel dataset is not an issue, especially, as Jeff pointed out, if the coefficients from the constructed balanced dataset and the original unbalanced dataset are similar.
          Kind regards,
          Carlo
          (Stata 19.0)
