  • Balanced vs. unbalanced panel data for impact evaluation

    Hi everyone
    I'm trying to estimate an impact evaluation with panel data, T=3. The units are students followed over 3 academic periods, so I have 11500 observations for 3000 students.
    The panel is unbalanced because a student can appear in one, two, or three periods.
    What are the implications for the estimates if I use the unbalanced panel? Is the variance affected?
    Alternatively, can I estimate with a balanced panel, discarding all the students (in both the treatment and control groups) who don't appear in all 3 periods?
    I'm working with the population, not a sample.
    My control group is bigger than the treatment group.
    If I work with a balanced panel, I'd lose observations for both groups.
    Thanks.

  • #2
    It may depend on why they are missing (at random, or through attrition). I sometimes worry that unbalanced panels can cause issues if the comers and goers are odd in some way. That might depend on the range the variables can take.

    If missing at random, then I suspect you've got a proper sample and not the population. If you have the population, one might say statistical testing is unnecessary (though, you might think of this as a sample from the population of similar students).

    Do it both ways and see what the difference is. If the results differ much, try to figure out why.

    You can also test whether the missing units differ in Y or X from the non-missing units.
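A minimal sketch of the comparison suggested above, assuming long-format data with one row per student and period. The records, the covariate `x`, and the function name `split_by_balance` are hypothetical and made up for illustration; a real check would follow the group means with a formal test.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format records: (student_id, period, treated, x)
rows = [
    (1, 1, 0, 10.0), (1, 2, 1, 11.0), (1, 3, 1, 12.0),
    (2, 1, 1, 8.0), (2, 2, 1, 9.0),
    (3, 2, 0, 14.0), (3, 3, 0, 15.0),
    (4, 1, 0, 12.0), (4, 2, 0, 12.0), (4, 3, 1, 13.0),
]
N_PERIODS = 3


def split_by_balance(rows, n_periods):
    """Split rows into students seen in every period and all other students."""
    periods_seen = defaultdict(set)
    for sid, period, *_ in rows:
        periods_seen[sid].add(period)
    complete = {sid for sid, p in periods_seen.items() if len(p) == n_periods}
    balanced = [r for r in rows if r[0] in complete]
    incomplete = [r for r in rows if r[0] not in complete]
    return balanced, incomplete


balanced, incomplete = split_by_balance(rows, N_PERIODS)

# Compare a covariate across the two groups; a large gap hints that
# the comers and goers differ from the students who are always present.
mean_x_balanced = mean(r[3] for r in balanced)
mean_x_incomplete = mean(r[3] for r in incomplete)
print(len(balanced), len(incomplete))
print(mean_x_balanced, mean_x_incomplete)
```

The same split gives the two estimation samples for the "do it both ways" comparison: all of `rows` for the unbalanced panel, and `balanced` alone for the balanced one.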





    • #3
      Thank you, Mr. Ford. Data are missing because, for example, a student starts his first year in T=2 or T=3,
      or because the student dropped out (that's my outcome variable) in T=1. Would this be random or attrition?


      • #4
        So the outcome is dropping out in the first period?


        • #5
          The outcome variable is dropping out in any one of the 3 periods. Is that possible?
          Thanks again


          • #6
            How do you know if someone drops out in T1?


            • #7
              And how do you know if someone moves rather than drops out?


              • #8
                We measure dropout at the end of the period. The outcome is dropout from this university at the program level, and it includes dropout both during and between periods. That means we are not interested in whether the student continues at another university or in another program at the same university.
                Please let me know if this answers your questions.

                Additionally, the treatment is participation in a feeding program.

                Thank you very much.


                • #9
                  A dropout in T2 won't appear in T3, so if you balance the panel, won't you have no dropouts except in T3?


                  • #10
                    I'm probably wrong, but it seems you have a bunch of students who show up in T1, and they can drop out in T1, T2, or T3.

                    You might restrict the sample to those who arrive in T1. Those who show up later might be different (and may have dropped out or moved from elsewhere).
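As a rough illustration of restricting the sample to the T1 entry cohort, here is a small sketch. The enrollment records and the helper names `first_period` and `entry_cohort` are hypothetical.

```python
# Hypothetical enrollment records: (student_id, period)
records = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 2), (3, 3), (4, 3)]


def first_period(records):
    """Map each student to the first period in which they are observed."""
    first = {}
    for sid, period in records:
        first[sid] = min(period, first.get(sid, period))
    return first


def entry_cohort(records, entry_period=1):
    """Keep only the rows of students whose first observed period is entry_period."""
    first = first_period(records)
    return [(sid, p) for sid, p in records if first[sid] == entry_period]


cohort = entry_cohort(records)
late_entrants = sorted(sid for sid, f in first_period(records).items() if f > 1)
print(cohort)         # rows for students who arrive in T1
print(late_entrants)  # students who first appear after T1
```

The cohort is still unbalanced (student 2 is missing in T3), but every gap now comes after entry, which makes it easier to tell attrition apart from late arrival.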


                    • #11
                      A student could drop out in T3 without having dropped out in T1 or T2.
                      I understand your point: "it seems you have a bunch of students show up in T1, and they can drop out in T1 T2 or T3." That would be the case with a balanced panel.

                      Your answer also makes me wonder if it would be better to estimate the effects by period. Looking at the data, I do not find continuity in the treatment for each group (student): over the three periods considered, a student has been treated only 40% of the time.
                      Thank you very much for your time.


                      • #12
                        A truly balanced panel would require observations in all periods. A T2 dropout would not appear in T3, so you'd lose that. (Note: a survival model just popped into my head; not sure you've considered that).

                        I'm not sure the period matters much at all. FE on period should do, unless you think there's something dynamic going on (is participating for two years different from participating for one year?). If you're in and around Covid, time may be relevant.


                        • #13
                          In this case, a student could have T1=persist T2=dropout T3=persist, or T1=dropout T2=dropout T3=persist, or any other combination. That's because student enrollment regulations allow students to come back.

                          Likewise, a student's treatment could be T1=treated T2=treated T3=untreated, or T1=untreated T2=treated T3=untreated, or any other combination. So, I have a time-varying treatment variable.

                          Survival analysis is an excellent option, but I need to estimate a causal effect.
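To see how much the treatment actually varies within students, one could tabulate each student's treatment path, as in this sketch. The observations and the helper names `treatment_paths` and `switchers` are hypothetical.

```python
from collections import defaultdict

# Hypothetical observations: (student_id, period, treated)
obs = [
    (1, 1, 1), (1, 2, 1), (1, 3, 0),
    (2, 1, 0), (2, 2, 1), (2, 3, 0),
    (3, 1, 0), (3, 2, 0), (3, 3, 0),
    (4, 2, 1),
]


def treatment_paths(obs):
    """Return each student's treatment status ordered by period."""
    by_student = defaultdict(dict)
    for sid, period, treated in obs:
        by_student[sid][period] = treated
    return {sid: tuple(p[t] for t in sorted(p)) for sid, p in by_student.items()}


def switchers(paths):
    """Students whose treatment status changes at least once over time."""
    return sorted(sid for sid, path in paths.items() if len(set(path)) > 1)


paths = treatment_paths(obs)
share_treated = sum(t for _, _, t in obs) / len(obs)
print(paths)
print(switchers(paths))
print(share_treated)
```

With student fixed effects, only the switchers contribute within-student variation in treatment, so their count is worth knowing before estimating anything.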

                          Thank you.


                          • #14
                            Maria Camila: It appears this is the data set that you hope to use with an instrumental variables strategy. I wonder if you have checked whether fixed effects estimation (at the student level) will be sufficient. And I'm curious to know what you'd use as an IV in this setting.

                            Hopefully, your treatment changes over time, or is randomly assigned the first time a student appears. If it's the former, you should use fixed effects (with two time dummies). FE has some resiliency to attrition, as I discuss in Chapter 19 of my 2010 MIT Press book. If you use FE, any unit with T = 1 will drop out. You don't really have enough periods to obtain a good test of attrition bias. The best is to use the subpanel where you observe t = 1 and t = 2 and apply FE on the first two periods, including an attrition indicator for t = 3. You hope that indicator is insignificant.
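One possible reading of the attrition check described above, sketched under stated assumptions. With only two periods, FE is numerically the same as first differencing, so the sketch regresses the change in the outcome on a constant (the period effect), the change in treatment, and the change in a lead "observed next period" indicator. That indicator is 1 at t = 1 for everyone in the subpanel, so its change is simply whether the student is seen at t = 3, minus 1. The data, the variable names, and the small `ols` helper are all hypothetical. The numbers are noise-free and built so that attrition does not matter. The sketch returns point estimates only; judging significance would also need standard errors.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Hypothetical students observed at t=1 and t=2:
# (y1, y2, treated_t1, treated_t2, seen_t3)
students = [
    (3.0, 1.5, 0, 1, 1),
    (-1.0, -0.5, 1, 1, 0),
    (0.0, 2.5, 1, 0, 1),
    (0.0, 0.5, 0, 0, 0),
    (4.0, 2.5, 0, 1, 0),
    (5.0, 5.5, 0, 0, 1),
]

# First differences remove the student fixed effect.
dy = [y2 - y1 for y1, y2, *_ in students]
X = [[1.0, d2 - d1, seen3 - 1] for _, _, d1, d2, seen3 in students]

theta, beta, gamma = ols(X, dy)
print(theta)  # period-2 effect
print(beta)   # treatment effect
print(gamma)  # coefficient on the lead attrition indicator
```

In this constructed example the coefficient on the lead indicator is zero by design. In real data, an estimate far from zero relative to its standard error would suggest that attrition is related to the outcome.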

                            Still curious as to how you tested for "endogeneity" in this application.
