  • How to balance an unbalanced panel on the year variable?

    Hello everyone!

    I am fairly new to Stata and am stuck on what are perhaps very basic problems. I am working with panel data for the first time. The dataset has observations on schools: educational attainment of children by gender, total enrolment, school localities, school types, etc.

    In one of the exercises, I am required to balance my dataset on the year variable such that it has the same set of schools in every year across all the available years. Honestly speaking, I do not know what this means. After browsing through the internet for quite some time, I was only able to get this far:

    xtset year

    I then tried to generate a new variable from the school_name so as to balance the year variable such that it has the same set of schools every year. I did this: egen s_id = group(school_id)

    But I do not know how to go on further from here. I also don't know if whatever I am doing is correct.

    Could someone please help me? A very small portion of my dataset (with year and school name) looks like this:

    emiscode   old_emis   year   school_name
    32120046   32120046   2004   GES BASTI AZEEM
    32120046   32120046   2005   GES BASTI AZEEM
    32120046   32120046   2006   GES BASTI AZEEM
    32120046   .          2007   GES BASTI AZEEM
    31220059   31220059   2004   GES BASTI DOCTOR MUNEER
    31220059   31220059   2005   GES BASTI DOCTOR MUNEER
    31220059   31220059   2006   GES BASTI DOCTOR MUNEER
    31220059   .          2007   GES BASTI DOCTOR MUNEER
    32110081   32110081   2004   GES BASTI FAUJA
    32110081   32110081   2005   GES BASTI FAUJA
    32110081   32110081   2006   GES BASTI FAUJA
    32110081   .          2007   GES BASTI FAUJA
    32110078   32110078   2004   GES BASTI JAM
    32110078   32110078   2005   GES BASTI JAM
    32110078   32110078   2006   GES BASTI JAM
    32110078   .          2007   GES BASTI JAM
    Last edited by Romasa Ali; 30 Nov 2021, 14:07.

  • #2
    Well, the example data you show is already strongly balanced, so there is nothing to do. Assuming that there are, however, some schools in the full data set that do not appear in every year, you can balance the data set with the -fillin- command. (You do not need to -xtset- the data to do this.) Read -help fillin- for the details.
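
    As a rough illustration of the -fillin- approach, here is a minimal sketch on a made-up panel. The two school codes and the missing 2006 record are hypothetical, and it assumes that emiscode identifies the school, as in the example data above.

    ```stata
    * Sketch only: hypothetical panel in which school 2 has no 2006 record
    clear
    input long emiscode int year
    1 2004
    1 2005
    1 2006
    2 2004
    2 2005
    end

    * Add one observation for every missing emiscode-year combination
    fillin emiscode year

    * _fillin == 1 flags the observations that -fillin- created
    list emiscode year _fillin, sepby(emiscode)

    * Declaring the panel now reports it as strongly balanced
    xtset emiscode year
    ```

    Every variable other than emiscode, year and _fillin would be missing in the added observations.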



    • #3
      Romasa:
      welcome to this forum.
      Clyde gave excellent advice.
      However, if your question is about a class/home assignment, please see https://www.statalist.org/forums/help#adviceextras #4.
      That said, in general, balancing an unbalanced panel dataset is not a good idea, as you may end up with a dataset that has little to do with the original one.
      In addition, Stata can handle both balanced and unbalanced panel datasets without any problem.
      Kind regards,
      Carlo
      (Stata 19.0)
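
      To see how Stata treats an unbalanced panel as it stands, a sketch along these lines may help. The data are made up, and it assumes emiscode is the school identifier. Note that -xtset- takes the panel identifier first and the time variable second.

      ```stata
      * Sketch only: hypothetical unbalanced panel (school 2 has no 2006 record)
      clear
      input long emiscode int year
      1 2004
      1 2005
      1 2006
      2 2004
      2 2005
      end

      * Panel identifier first, then the time variable;
      * the output states whether the panel is balanced
      xtset emiscode year

      * Tabulate the pattern of years in which each school is observed
      xtdescribe
      ```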



      • #4
        I completely agree with Carlo Lazzaro. There is no need to balance a data set unless you are planning to export it to some other software that requires it. Within Stata, unbalanced data sets work just fine with everything where it is statistically legitimate to analyze unbalanced data. I will say, however, that the approach in #2, based on the -fillin- command, creates observations with missing values for every variable other than the school and year (and the _fillin marker that the command adds). So if you refrain from overwriting those missing values with other data, the presence of these unnecessary observations will not affect any analyses you do, since observations with missing values in the analysis variables are excluded from estimation anyway.
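
        A small sketch of that last point, on made-up data: the variable enrolment and its values are hypothetical, and -regress- stands in for whatever estimation command you actually use.

        ```stata
        * Sketch only: hypothetical panel with an outcome variable
        clear
        input long emiscode int year double enrolment
        1 2004 120
        1 2005 125
        1 2006 131
        2 2004 80
        2 2005 84
        end

        * Balance the panel; the added row has enrolment missing
        fillin emiscode year

        * The estimation sample excludes the filled-in row:
        * 5 of the 6 observations are used
        regress enrolment year
        display e(N)

        * To undo the balancing later, drop the flagged rows and the marker
        drop if _fillin
        drop _fillin
        ```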
