
  • xthdidregress with time-variant covariates with TWFE (Wooldridge) & AIPW (Callaway and Sant'Anna) - Stata 18 or 19

    Thanks in advance. I recently posted a question (that post has the large dataex listing for the models in the table below) about confusing results that differ vastly between the aipw and twfe options of xthdidregress. In the summary below, the aipw option behaves similarly to classic FE and DiD models, while the twfe option, which should be the most appropriate model given simultaneous treatment, gives very different results. I did not get a clear answer on the first attempt (I was unclear), but after further reading I think I have found an explanation and would like confirmation and/or advice.

    My data:
    • 90 cities in a panel from 2006 to 2011
    • 15 cities get an unanticipated treatment simultaneously in 2009
      • Because treatment timing is the same for all treated units, I should be using Wooldridge's method to test for heterogeneity in treatment effects among units
    • The treatment is a sudden influx of migrants. I want to test for economic increase post-treatment above expected increases resulting from logged population.
      • Crucially, log_pop is time-variant AND I expect log_pop to increase faster for post-treatment treated observations (this is what I'm controlling for)
    In all the examples I could find for xthdidregress twfe, the controls were time-invariant, and I could not find anything on the topic in the Wooldridge (2021) paper. In my original post, I expressed concern about time-varying controls, especially when I expect a positive effect on treated cases, post-treatment. Since then, I've seen that FernandoRios advises avoiding time-varying covariates in #15 of this post and the section on covariates on his site.

    Questions:
    1. Is it correct that the xthdidregress twfe specification is not appropriate for time-varying controls?
    2. And/or is it specifically a problem when the control is expected to change for treated after treatment?
    3. Is there some solution that lets me control for time-varying factors while also checking for heterogeneity in treatment effects for time-invariant factors of interest (like pre-treatment average population, for example)?
    One idea: if the model (#4 in the table) is not appropriate, I could create a time-invariant unit mean of log_pop (or a unit mean of log_pop over the pre-treatment period) and check for heterogeneity in treatment effects among units. Doing this with the twfe option produced coefficients very similar to Models 1-3 (log_pop changes the results only marginally). But with the current command, I cannot model heterogeneity in unit average population AND control for time-variant population changes. Can I improve on this?
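
    A minimal sketch of the pre-treatment-mean idea, assuming the panel has already been declared with xtset id year. The variable name log_pop_pre is hypothetical; the other names are the ones used in the models in this thread:

    ```stata
    * Unit-level mean of log_pop over the pre-treatment years (2006-2008) only.
    * egen assigns the same value to every year of a given city.
    bysort id: egen double log_pop_pre = mean(cond(year < 2009, log_pop, .))

    * ETWFE with the time-invariant covariate in place of the time-varying log_pop
    xthdidregress twfe (log_gdp log_pop_pre) (treatment_event), group(id) hettype(time)
    ```

    Because log_pop_pre uses only observations from before 2009, it is by construction unaffected by any post-treatment change in population.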

    See key columns 3 and 4 in the output summarized from my original post's dataex:
    Code:
        
      
                        (1)        (2)          (3)        (4)
                        TWFE       Classic_DiD  AIPW       TWFE_Wooldridge
    treatment_event     0.069***   0.069***
                        (0.017)    (0.017)
    log_pop             0.028      0.028
                        (0.074)    (0.096)
    year=2006           0.000      0.000
                        (.)        (.)
    year=2007           -0.024*    -0.024**     0.003
                        (0.011)    (0.009)      (0.018)
    year=2008           0.004      0.004        0.010
                        (0.011)    (0.011)      (0.015)
    year=2009           -0.011     -0.011       0.023      0.010
                        (0.012)    (0.016)      (0.015)    (0.027)
    year=2010           -0.017     -0.017       0.072***   0.037
                        (0.012)    (0.016)      (0.022)    (0.042)
    year=2011           0.007      0.007        0.093**    0.046
                        (0.013)    (0.017)      (0.030)    (0.048)
    Observations        540        540          540        540

  • #2
    David: As per the FAQ, you are asked to show the set of Stata commands you used and what the output is -- not using outreg2. There are several red flags here. You don’t even have estimates for ETWFE and AIPW. You should have one for each treated year. My guess is you didn’t correctly specify the treatment variable, but I need to see more commands and output.



    • #3
      Repeated post ...



      • #4
        Hi Jeff Wooldridge, sorry for the confusion on data input and output. I thought it would be sufficient to link to that unanswered post with my new, more specific questions. I had not yet found the linked material that discusses time-varying controls when I made the original post, so I believe I was barking up the wrong tree to an extent. I would be grateful if you could take another look. There is no mention in the Stata documentation that I could find that discusses time-variant controls in the ETWFE, but this is discussed by Fernando Rios in the linked material in #1.

        The outreg2 table has the AIPW and ETWFE coefficients for years as year=2009 ... year=2011. Without re-labeling, the table was quite long. I worried that a lack of parsimony was the reason the original post lost traction. I will put the full model output for the AIPW and ETWFE models below. Anyone who wants to run the code input and models (it is quite large) can see the attached .do file or the post with the data input and code.

        The takeaway is that only the ETWFE model generates very different results when I use the time-variant control log_pop. It is unclear whether one should generally avoid time-variant controls with ETWFE or whether the problem is specific to my data, where log_pop is expected to change more for treated cases in the treatment years. I can also note that when I generate a time-invariant mean of log_pop by id, all the models generate similar results again. This is in the .do file, and I can post it here if it is of interest.
        AIPW code and output:
        Code:
        xthdidregress aipw (log_gdp log_pop) (treatment_event) , group(id) 
        
        . xthdidregress aipw (log_gdp log_pop) (treatment_event) , group(id) 
        note: variable _did_cohort, containing cohort indicators formed by treatment variable
        treatment_event and group variable id, was added to the dataset.
        
        Computing ATET for each cohort and time:
        Cohort 2009 (5): ..... done
        
        Treatment and time information
        
        Time variable: year
        Time interval: 2006 to 2011
        Control:       _did_cohort = 0
        Treatment:     _did_cohort > 0
        
        _did_cohort
        
        Number of cohorts            2
        
        Number of obs     
        Never treated          450
        2009           90
        
        
        Heterogeneous-treatment-effects regression              Number of obs    = 540
        Number of panels =  90
        Estimator:       Augmented IPW
        Panel variable:  id
        Treatment level: id
        Control group:   Never treated
        
        (Std. err. adjusted for 90 clusters in id)
        
        Robust
        Cohort              ATET   std. err.      z    P>z     [95% conf. interval]
        
        year 
        2007     .0025336   .0175682     0.14   0.885    -.0318995    .0369667
        2008     .0102605    .015251     0.67   0.501    -.0196308    .0401519
        2009     .0230131   .0151939     1.51   0.130    -.0067663    .0527926
        2010      .072187   .0215461     3.35   0.001     .0299574    .1144165
        2011     .0932267   .0303568     3.07   0.002     .0337285    .1527249
        
        Note: ATET computed using covariates.
        Note: Base time for pretreatment ATETs is the previous period.
        ETWFE (twfe_wooldridge above):
        Code:
        xthdidregress twfe (log_gdp log_pop) (treatment_event) , group(id) hettype(time)
        
        . xthdidregress twfe (log_gdp log_pop) (treatment_event) , group(id) hettype(time)
        note: variable _did_cohort, containing cohort indicators formed by treatment variable
        treatment_event and group variable id, was added to the dataset.
        
        Treatment and time information
        
        Time variable: year
        Time interval: 2006 to 2011
        Control:       _did_cohort = 0
        Treatment:     _did_cohort > 0
        
        _did_cohort
        
        Number of cohorts            2
        
        Number of obs     
        Never treated          450
        2009           90
        
        
        Heterogeneous-treatment-effects regression              Number of obs    = 540
        Number of panels =  90
        Estimator:       Two-way fixed effects
        Panel variable:  id
        Treatment level: id
        Control group:   Never treated
        Heterogeneity:   Time
        
        (Std. err. adjusted for 90 clusters in id)
        
        Robust
        Time        ATET   std. err.      t    P>t     [95% conf. interval]
        
        year 
        2009      .010136   .0273606     0.37   0.712    -.0442289    .0645009
        2010     .0365073   .0419065     0.87   0.386      -.04676    .1197746
        2011     .0464675   .0477346     0.97   0.333    -.0483802    .1413152
        
        Note: ATET computed using covariates.



        • #5
          I've discovered things about relatively efficient estimators, and a student and I are currently working on GLS-type solutions. But in your particular case, I suspect you have lots of positive serial correlation. xthdidregress aipw uses differences, and this can be more efficient than twfe. You can confirm this by also using xthdidregress ra, which is the regression-based version based on long differences. I suspect you'll still see smaller standard errors than twfe.

          The time-varying population variable could be playing a role, but I doubt that's the explanation.

          xthdidregress twfe tends to be more efficient with small amounts of serial correlation.
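
          The comparison suggested above could be run roughly as follows. This is only a sketch that reuses the variable names and options from #4; it has not been run on the poster's data:

          ```stata
          * Regression adjustment, the regression-based estimator using long differences
          xthdidregress ra (log_gdp log_pop) (treatment_event), group(id)

          * ETWFE, as in #4, to compare the standard errors on the 2009-2011 ATETs
          xthdidregress twfe (log_gdp log_pop) (treatment_event), group(id) hettype(time)
          ```

          If the ra standard errors are close to those from aipw and clearly smaller than those from twfe, that would be consistent with the serial-correlation explanation.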
