  • Dynamic DiD Troubleshooting

    Hi everyone,

    I'm working with panel data from a German household survey to estimate the causal effects of the German parental leave reform that took effect on July 1st, 2015. The reform applied to all parents whose first child was born on or after that date. My goal is to estimate the reform's effects separately for mothers and fathers on outcomes such as weekly hours worked, time spent on childcare, and housework. I’m torn between two empirical strategies and would appreciate feedback.

    1) Model 1: RD-DiD Design (Restricted to First Births in 2015): My first approach is a regression discontinuity difference-in-differences (RD-DiD) design, where I restrict the sample to parents whose first child was born in 2015, so that all observations lie close to the July 1st cutoff. I define treatment status (treat) by whether the child was born before the reform date or on/after it: if the child was born on or after July 1st, 2015, treat = 1; otherwise treat = 0.

    The post-treatment period (post) is defined as survey years 2016 and onwards. Since the data I have is a yearly panel in long format, each row in my dataset represents an individual-year observation. My regression command (run separately for mothers and fathers) is:

    xtreg hoursworked i.treat##i.post i.syear i.imonth, fe vce(cluster pid)

    Here, syear is the survey year and imonth is the month of interview. pid is the personal identifier for each individual, and I use clustered standard errors at that level. The data is unbalanced and not all parents are observed every year after birth. Also, not everyone reports valid outcomes for hours worked, so there is a substantial amount of missingness in the dependent variables.
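
    For concreteness, the treat and post indicators described above could be built roughly as follows. This is only a sketch: kbirthyear is a placeholder name for the child's birth year (an assumption, since the actual variable name may differ), while kbirthmon (child's birth month), syear and pid are the variables named in this post.

    ```stata
    * Sketch only: kbirthyear is a hypothetical name for the child's birth year
    keep if kbirthyear == 2015

    * treat = 1 for births on or after July 1st, 2015 (birth month July-December)
    gen byte treat = (kbirthmon >= 7) if !missing(kbirthmon)

    * post = 1 for survey years 2016 and onwards
    gen byte post = (syear >= 2016) if !missing(syear)

    * declare the person-year panel before running xtreg, fe
    xtset pid syear
    ```

    As far as I understand, because treat does not vary within a person, its main effect is absorbed by the individual fixed effects, and the main effect of post is collinear with the survey-year dummies, so Stata will omit both and only the treat × post interaction is identified.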
    A few issues have made me uncertain about this approach. First, the sample is already small because I restrict it to 2015 births, which limits power. Second, while my goal is to measure outcomes one to two years after birth, the GSOEP’s yearly structure and the variation in interview months mean that some “post-treatment” outcomes may be observed as early as two months after birth (e.g., for children born in December 2015 whose parents were interviewed in early 2016). This could bias estimates of hours worked if, say, the parent is still on leave or not yet back at work. I considered shifting the post period to 2017 and later, but then the control group’s children (born before July 2015) would already be around two years old, potentially confounding the effect of the reform with child age.

    2) Model 2: Dynamic DiD with Normalized Time Since Birth (Births from 2014–2016): To address these issues, my second approach expands the sample to first births from January 2014 through December 2016. Treatment is again based on the July 1st, 2015 cutoff: parents whose child was born before that date are the control group, and those whose child was born on or after are treated.

    Rather than using a simple post variable, I normalize time as years since birth. I construct a variable t10 (event time shifted by 10, so that it takes no negative values and can be used as a factor variable) to track time relative to birth:
    • t10 = 10: year of birth
    • t10 = 11: one year after birth
    • t10 = 12: two years after birth
    • etc.
    So, for example, for someone whose child was born in 2014, survey year 2015 becomes t10 = 11 and 2016 becomes t10 = 12. For a 2015 birth, t10 = 10 in 2015, t10 = 11 in 2016, and so on. I include values from t10 = 7 to t10 = 13 to cover a window of three years before and after birth. This way, outcomes are aligned by the child's age rather than by calendar year, which should help avoid the problems with early interviews or different child ages across groups. The regression I run is:
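
    To make the construction concrete before the command itself, here is a minimal sketch of how the sample, treat and t10 could be generated. It assumes a child's birth-year variable that I call kbirthyear here (a placeholder name, not necessarily the one in the data); kbirthmon and syear are as described in this post.

    ```stata
    * Sketch only: kbirthyear is a hypothetical name for the child's birth year
    * keep first births from January 2014 through December 2016
    keep if inrange(ym(kbirthyear, kbirthmon), ym(2014, 1), ym(2016, 12))

    * treat = 1 for births on or after July 2015
    gen byte treat = (ym(kbirthyear, kbirthmon) >= ym(2015, 7)) ///
        if !missing(kbirthyear, kbirthmon)

    * years since birth, shifted by 10 so the factor variable is non-negative
    gen t10 = syear - kbirthyear + 10

    * three years before to three years after birth
    keep if inrange(t10, 7, 13)

    * sanity check against the example: a 2014 birth observed in 2016 has t10 = 12
    assert t10 == 12 if kbirthyear == 2014 & syear == 2016
    ```
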

    xtreg hoursworked i.treat##ib9.t10 i.syear i.kbirthmon i.imonth, fe vce(cluster pid)

    Here again, syear is the survey year, kbirthmon is the birth month of the child, and imonth is the interview month. I cluster standard errors at the individual level using pid. The interaction between treat and t10 (with t10 = 9 as the base category, i.e. one year before birth for everyone) allows me to estimate how the treatment effect evolves over time since birth.
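
    For reference, this is roughly how the interaction coefficients could be read off after estimation. It is a sketch under the assumption that the xtreg command above has just been run and that all listed t10 levels are present in the estimation sample; the joint test on the pre-birth terms is one possible check I am considering, not something I have settled on.

    ```stata
    * Sketch only: run directly after the xtreg command above
    * treated-vs-control gap relative to t10 = 9, one and two years after birth
    lincom 1.treat#11.t10
    lincom 1.treat#12.t10

    * joint test that the pre-birth interaction terms (t10 = 7 and 8) are zero
    test 1.treat#7.t10 1.treat#8.t10
    ```
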

    This strategy solves the timing issue and gives me a larger sample, but I’m struggling to find existing literature that uses a similar approach, namely a dynamic DiD in which time is normalized relative to a key life event (birth). I’m also unsure whether this specification, particularly the treat × t10 interaction structure, is the best way to model this, and whether my implementation in Stata is correct.

    Given the trade-off between the clean cutoff in the RD-DiD design and the improved sample size and timing alignment in the dynamic DiD, which model would you find more credible? And do you know of any literature using an approach similar to Model 2 (normalized time since event in DiD)? Also, I am not sure I chose the most appropriate Stata commands for my approaches. I would love to hear any thoughts on how to justify that model or improve the implementation.

    Thanks in advance!

    Last edited by mart mai; 04 May 2025, 13:55.