Balanced vs. unbalanced panel data for impact evaluation

MariaCamila Jimenez

Join Date: Oct 2023

Posts: 8
#1

31 Oct 2023, 07:42

Hi everyone,
I'm trying to estimate an impact evaluation with panel data (T = 3). The units are students followed over 3 academic periods, so I have 11500 observations for 3000 students.
The panel is unbalanced because a student can appear in one, two, or three periods.
What are the implications for the estimates if I use the unbalanced panel? Is the variance affected?
Alternatively, can I estimate with a balanced panel, discarding all students (in both the treatment and control groups) who don't appear in all 3 periods?
I'm working with the population, not a sample.
My control group is bigger than the treatment group.
If I work with a balanced panel, I'd lose observations for both groups.
Thanks.
George Ford

Join Date: Aug 2014

Posts: 3179
#2

31 Oct 2023, 10:02

It may depend on why they are missing (at random, or through attrition). I sometimes worry that unbalanced panels can cause problems if the students who come and go differ systematically from those who stay. How much that matters might depend on the range the variables can take.

If the data are missing at random, then I suspect you've got a proper sample and not the population. If you have the population, one might say statistical testing is unnecessary (though you might think of this as a sample from the population of similar students).

Do it both ways and see what the difference is; if the results differ a lot, try to figure out why.

You can also test whether the missing units differ from the non-missing units in Y or X.
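
As an illustrative sketch of that last comparison, the plain-Python snippet below splits students into those observed in every period and those who are not, then compares the mean of a covariate across the two groups. The record layout `(student_id, period, x)`, the helper names, and the toy values are all hypothetical; a formal test (for example a t-test) would replace the raw difference in practice.

```python
from statistics import mean

# Hypothetical long-format records: (student_id, period, x),
# where x stands in for any covariate or outcome of interest.
records = [
    (1, 1, 3.2), (1, 2, 3.4), (1, 3, 3.5),
    (2, 1, 2.1), (2, 2, 2.0),
    (3, 2, 3.9), (3, 3, 4.0),
    (4, 1, 2.8), (4, 2, 2.9), (4, 3, 3.0),
    (5, 1, 1.9),
]
N_PERIODS = 3

def split_by_completeness(rows, n_periods):
    """Return (ids observed in all periods, ids missing at least one period)."""
    periods_seen = {}
    for sid, period, _ in rows:
        periods_seen.setdefault(sid, set()).add(period)
    complete = {sid for sid, seen in periods_seen.items() if len(seen) == n_periods}
    incomplete = set(periods_seen) - complete
    return complete, incomplete

def mean_x(rows, ids):
    """Mean of x over all observations belonging to the given students."""
    return mean(x for sid, _, x in rows if sid in ids)

complete_ids, incomplete_ids = split_by_completeness(records, N_PERIODS)
gap = mean_x(records, complete_ids) - mean_x(records, incomplete_ids)
print(sorted(complete_ids), sorted(incomplete_ids), round(gap, 3))
```

A large gap between the two groups would suggest that the students who come and go are not comparable to those who stay.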
MariaCamila Jimenez

Join Date: Oct 2023

Posts: 8
#3

31 Oct 2023, 12:09

Thank you, Mr. Ford. Data are missing because, for example, a student starts his first year in T = 2 or T = 3.
Or, for example, because the student dropped out (that's my outcome variable) in T = 1. Would this be random or attrition?
George Ford

Join Date: Aug 2014

Posts: 3179
#4

31 Oct 2023, 12:24

So the outcome is dropping out in the first period?
MariaCamila Jimenez

Join Date: Oct 2023

Posts: 8
#5

31 Oct 2023, 12:28

The outcome variable is dropping out in any one of the 3 periods. Is that possible?
Thanks again
George Ford

Join Date: Aug 2014

Posts: 3179
#6

31 Oct 2023, 12:52

How do you know if someone drops out in T1?
George Ford

Join Date: Aug 2014

Posts: 3179
#7

31 Oct 2023, 12:52

And how do you know if someone moves rather than drops out?
MariaCamila Jimenez

Join Date: Oct 2023

Posts: 8
#8

31 Oct 2023, 13:29

We measure dropout at the end of each period. The outcome variable captures dropout from the program at this university, both during and between periods. This means we are not concerned with whether the student continues at another university or in another program at the same university.
Please let me know if this answers your questions.

Additionally, the treatment is a feeding program.

Thank you very much.
George Ford

Join Date: Aug 2014

Posts: 3179
#9

31 Oct 2023, 13:35

A dropout in T2 won't appear in T3, so if you balance the panel, won't you be left with no dropouts except in T3?
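
A toy illustration of this concern, assuming (for the sketch only) that a student who drops out does not reappear and that there is one row per student per period; the record layout and values are hypothetical:

```python
# Hypothetical records: (student_id, period, dropped_out_at_end_of_period).
rows = [
    (1, 1, 0), (1, 2, 0), (1, 3, 0),
    (2, 1, 0), (2, 2, 1),            # drops out in T2, absent in T3
    (3, 1, 1),                       # drops out in T1, absent afterwards
    (4, 1, 0), (4, 2, 0), (4, 3, 1), # drops out in T3
]
N_PERIODS = 3

# Count how many periods each student is observed in.
counts = {}
for sid, _, _ in rows:
    counts[sid] = counts.get(sid, 0) + 1

# Balancing keeps only students observed in every period.
balanced = [r for r in rows if counts[r[0]] == N_PERIODS]

def dropout_periods(data):
    """Sorted list of periods in which at least one dropout is recorded."""
    return sorted({period for _, period, dropped in data if dropped})

print(dropout_periods(rows))      # [1, 2, 3]
print(dropout_periods(balanced))  # [3]
```

In the balanced subset the only dropouts left are those recorded in the last period, which is the selection problem raised above.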
George Ford

Join Date: Aug 2014

Posts: 3179
#10

31 Oct 2023, 13:37

I'm probably wrong, but it seems you have a bunch of students who show up in T1, and they can drop out in T1, T2, or T3.

You might restrict the sample to those who arrive in T1. Those who show up later might be different (and may have dropped out or moved from elsewhere).
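
A minimal sketch of that restriction, assuming long-format records of the hypothetical form `(student_id, period)`: find each student's first observed period and keep only students whose first period is T1.

```python
# Hypothetical records: (student_id, period).
rows = [(1, 1), (1, 2), (1, 3), (2, 1), (3, 2), (3, 3), (4, 3)]

def first_period(data):
    """Map each student to the earliest period in which they are observed."""
    first = {}
    for sid, period in data:
        first[sid] = min(period, first.get(sid, period))
    return first

first = first_period(rows)

# Keep every observation of students who entered in T1; later entrants are excluded.
t1_cohort = [(sid, period) for sid, period in rows if first[sid] == 1]
print(t1_cohort)  # [(1, 1), (1, 2), (1, 3), (2, 1)]
```

The resulting cohort can still be unbalanced (student 2 is seen only once here), but everyone in it enters at the same point.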
MariaCamila Jimenez

Join Date: Oct 2023

Posts: 8
#11

31 Oct 2023, 14:00

A student could drop out in T3 because he didn't drop out in T1 or T2.
I understand your point: "it seems you have a bunch of students show up in T1, and they can drop out in T1 T2, or T3". That would be the case with a balanced panel.

Your answer also makes me wonder whether it would be better to estimate the effects period by period. Looking at the data, I do not find continuity in treatment for each unit (student): across the three periods considered, a student has been treated only 40% of the time.
Thank you very much for your time.
George Ford

Join Date: Aug 2014

Posts: 3179
#12

31 Oct 2023, 14:49

A truly balanced panel would require observations in all periods. A T2 dropout would not appear in T3, so you'd lose that student. (Note: a survival model just popped into my head; not sure whether you've considered that.)

I'm not sure the period matters much at all. Period fixed effects should do, unless you think there's something dynamic going on (is two years of participation different from one year?). If your data fall in or around Covid, time may be relevant.
MariaCamila Jimenez

Join Date: Oct 2023

Posts: 8
#13

31 Oct 2023, 15:20

In this case, a student could have T1 = persist, T2 = dropout, T3 = persist, or T1 = dropout, T2 = dropout, T3 = persist, or any other combination. That's because student enrollment regulations allow students to come back.

Likewise, treatment could follow T1 = treated, T2 = treated, T3 = untreated, or T1 = untreated, T2 = treated, T3 = untreated, or any other combination. So my treatment variable is time-varying.

Survival analysis is an excellent option, but I need to estimate a causal effect.

Thank you.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2192
#14

31 Oct 2023, 16:26

Maria Camila: It appears this is the data set that you hope to use with an instrumental variables strategy. I wonder if you have checked whether fixed effects estimation (at the student level) will be sufficient. And I'm curious to know what you'd use as an IV in this setting.

Hopefully, your treatment changes over time, or is randomly assigned the first time a student appears. If it's the former, you should use fixed effects (with two time dummies). FE has some resiliency to attrition, as I discuss in Chapter 19 of my 2010 MIT Press book. If you use FE, any unit observed in only one period (T = 1) will drop out of the estimation. You don't really have enough periods to obtain a good test of attrition bias. The best is to use the subpanel where you observe t = 1 and t = 2 and apply FE on the first two periods, including an attrition indicator for t = 3. You hope that indicator is insignificant.

Still curious as to how you tested for "endogeneity" in this application.
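
One possible reading of the attrition check described above, sketched in plain Python. With only t = 1 and t = 2, student fixed effects plus a period dummy give the same slope estimates as a first-difference regression with an intercept, so the sketch regresses the change in the outcome on the change in treatment and an indicator for being absent at t = 3. The data layout, the function names `ols` and `attrition_test`, and the toy numbers are all hypothetical, and the standard errors are the classical OLS ones rather than cluster-robust ones.

```python
def ols(X, y):
    """Return (coefficients, classical standard errors) for y = X b + e."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    # Invert X'X by Gauss-Jordan elimination with partial pivoting.
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(xtx)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[r] - sum(X[r][j] * beta[j] for j in range(k)) for r in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    se = [(sigma2 * inv[i][i]) ** 0.5 for i in range(k)]
    return beta, se

def attrition_test(panel):
    """panel: {student: {period: (y, treated)}}. Uses students seen at t=1 and t=2.

    Regressors: intercept (period effect), change in treatment,
    and an indicator equal to 1 if the student is absent at t=3.
    """
    X, dy = [], []
    for obs in panel.values():
        if 1 in obs and 2 in obs:
            (y1, d1), (y2, d2) = obs[1], obs[2]
            leaves = 0.0 if 3 in obs else 1.0
            X.append([1.0, float(d2 - d1), leaves])
            dy.append(float(y2 - y1))
    return ols(X, dy)

# Hypothetical toy panel; students 7 and 8 are excluded from the subpanel.
panel = {
    1: {1: (10.0, 0), 2: (12.0, 1), 3: (13.0, 1)},
    2: {1: (9.0, 0), 2: (9.5, 0), 3: (10.0, 0)},
    3: {1: (11.0, 1), 2: (10.0, 0)},
    4: {1: (8.0, 0), 2: (10.5, 1)},
    5: {1: (10.0, 1), 2: (10.4, 1), 3: (10.9, 1)},
    6: {1: (7.0, 0), 2: (7.6, 0)},
    7: {2: (9.0, 1), 3: (9.0, 1)},
    8: {1: (12.0, 0)},
    9: {1: (9.0, 1), 2: (7.4, 0), 3: (7.5, 0)},
}
beta, se = attrition_test(panel)
print("period effect, treatment, attrition indicator:", [round(b, 3) for b in beta])
print("standard errors:", [round(s, 3) for s in se])
```

The third coefficient is the one of interest: a value that is small relative to its standard error is what one hopes to see.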
