Hi everyone,
I am working with an unbalanced individual-level panel from a survey and would appreciate some guidance on whether any of the staggered-adoption DiD estimators is appropriate in my setting. The panel spans up to 8 survey waves (which in the actual data correspond to calendar years), but it is far from balanced: individuals may enter the survey after the first wave, and many respondents miss one or more intermediate waves. Treatment is an individual-level shock that occurs at different times for different individuals and is absorbing once it occurs. A simplified toy example is included further below.
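To make the two sources of imbalance concrete, here is a minimal sketch of how late entry and skipped waves could be flagged. It assumes the toy data (with variables id and time) are in memory; late_entry and gap are hypothetical helper names of my own, not variables in the actual data.

```stata
* Sketch only: assumes the toy data (id, time) are already in memory.
xtset id time
* Show the participation patterns across waves.
xtdescribe

* late_entry and gap are hypothetical helper variables.
* late_entry = 1 if the respondent's first observed wave is after wave 1.
bysort id (time): generate byte late_entry = (time[1] > 1)
* gap = 1 if at least one wave was skipped since the previous appearance.
bysort id (time): generate byte gap = (time - time[_n-1] > 1) if _n > 1
```

In the toy data this would flag id 4 as a late entrant (first observed in wave 3) and the wave-8 row of id 5 as following a gap (waves 6 and 7 are missing).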
Since this is clearly a staggered-adoption setting, I was considering estimators such as Callaway and Sant'Anna (e.g. csdid) or an event-study approach along the lines of Clarke and Tapia-Schythe (2020, eventdd). In fact, the gvar variable in the toy example is constructed to serve as the group variable in csdid. However, my understanding is that many of these implementations either require or strongly prefer a balanced panel structure.
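One caveat on the group variable: my understanding is that csdid expects it to be constant within individual (the first treated period, and 0 for never-treated units), whereas in the toy example gvar is 0 in the waves before the shock. A minimal sketch of a time-invariant version is below; first_shock and gvar_const are hypothetical names, and the sketch assumes that shock equals 1 only in the wave in which the shock occurs, as in the toy data.

```stata
* Sketch only: first_shock and gvar_const are hypothetical names.
* First wave in which the shock is observed (missing if never observed).
bysort id: egen first_shock = min(cond(shock == 1, time, .))

* Constant within id: first treated wave, or 0 for never-treated respondents.
generate gvar_const = cond(missing(first_shock), 0, first_shock)

* Intended use (csdid is user-written and must be installed separately):
*   csdid y, ivar(id) time(time) gvar(gvar_const)
```

In the toy data this would give gvar_const equal to 4, 6, and 7 for ids 2, 3, and 4 in every wave, and 0 for ids 1 and 5.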
One idea I considered was constructing an alternative time variable based on the number of survey appearances for each individual. For example, I could keep only respondents observed, say, 5 times and define a within-individual time index running from 1 to 5, regardless of the underlying survey year. This would create something closer to a balanced panel. However, my concern is that doing so would discard the calendar-time dimension, which seems important for capturing time or cohort effects.
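For concreteness, the re-indexing idea could be sketched as follows. The names n_obs and appear_idx are hypothetical, and the threshold of 5 is only the example value from the paragraph above.

```stata
* Sketch only: n_obs and appear_idx are hypothetical names.
* Number of waves in which each respondent is observed.
bysort id (time): generate n_obs = _N

* Keep respondents observed exactly 5 times (example threshold).
keep if n_obs == 5

* Within-individual appearance index, ignoring the calendar wave.
bysort id (time): generate appear_idx = _n
```

Note that nobody in the toy data is observed exactly 5 times, so the threshold would have to be changed to try this there. With a threshold of 6, ids 4 and 5 would be kept, and appear_idx equal to 1 would correspond to wave 3 for id 4 but to wave 1 for id 5, which is exactly the loss of calendar time I am worried about.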
Is there a standard way to handle this kind of unbalanced survey panel within the Callaway-Sant'Anna framework (or related estimators), or would the missing waves and staggered entry create identification problems that require a different approach? Any advice on how to proceed would be greatly appreciated!
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id time y shock gvar)
1 1  9 0 0
1 2  9 0 0
1 3 11 0 0
1 4 12 0 0
1 5 11 0 0
1 6 12 0 0
1 7 12 0 0
2 1  9 0 0
2 2  9 0 0
2 3  8 0 0
2 4  9 1 4
2 5  8 0 4
2 6  7 0 4
2 7  8 0 4
2 8  9 0 4
3 1 12 0 0
3 2 12 0 0
3 3 11 0 0
3 4 13 0 0
3 5 15 0 0
3 6 14 1 6
3 7 16 0 6
3 8 16 0 6
4 3 10 0 0
4 4 10 0 0
4 5 12 0 0
4 6 12 0 0
4 7 13 1 7
4 8 14 0 7
5 1 11 0 0
5 2 13 0 0
5 3 12 0 0
5 4 12 0 0
5 5 11 0 0
5 8  9 0 0
end
