xthdidregress with time-variant covariates with TWFE (Wooldridge) & AIPW (Callaway and Sant'Anna) - Stata 18 or 19

David Ray McCoy

Join Date: Dec 2016
Posts: 24

06 May 2025, 13:31

Thanks in advance. I recently posted a question (that post has the large dataex for the models in the table below) about confusing results that differed vastly between the xthdidregress aipw and twfe estimators. In the summary below, the aipw estimator behaves similarly to classic FE and DiD models, while the twfe estimator gives vastly different results, even though it should be the most appropriate model because treatment is simultaneous. I did not get a clear answer on the first attempt (I was unclear), but after further reading I think I have found an explanation and would like some confirmation and/or advice.

My data:

90 cities in a panel from 2006 to 2011
15 cities get an unanticipated treatment simultaneously in 2009
- Because all treated units are treated at the same time, I should be using Wooldridge's method to test for heterogeneity in treatment effects among units
The treatment is a sudden influx of migrants. I want to test for economic increase post-treatment above expected increases resulting from logged population.
- Crucially, log_pop is time-variant AND I expect log_pop to increase faster for post-treatment treated observations (this is what I'm controlling for)

In all the examples I could find for xthdidregress twfe, the controls were time-invariant, and I could not find anything on the topic in the Wooldridge (2021) paper. In my original post, I expressed concern about time-varying controls, especially when I expect a positive effect on treated cases, post-treatment. Since then, I've seen that FernandoRios advises avoiding time-varying covariates in #15 of this post and the section on covariates on his site.

Questions:

Is it correct that the xthdidregress twfe specification is not appropriate for time-varying controls?
And/or is it specifically a problem when the control is expected to change for treated units after treatment?
Is there some solution that lets me control for time-varying factors while also checking for heterogeneity in treatment effects for time-invariant factors of interest (like pre-treatment average population, for example)?

One idea: if the model (#4 in the table) is not appropriate, I could create a time-invariant unit mean of log_pop (or a unit mean of log_pop over the pre-treatment period) and check for heterogeneity in treatment effects among units. Doing this with the twfe option produced coefficients very similar to Models 1-3 (log_pop changes results only marginally). But with the current command, I cannot model heterogeneity in unit average population AND control for time-variant population changes. Can I improve on this?
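The pre-treatment-mean idea above could be sketched roughly as follows. This is only an illustration under assumptions: pre_log_pop is a hypothetical variable name, the data are assumed to be xtset already (xtset id year), and 2009 is taken as the first treated year, as described in this post. It does not solve the problem of also controlling for time-varying population changes.

```stata
* Sketch only: pre_log_pop is a hypothetical name, not a variable in the dataex.
* Assumes: xtset id year has been run; treatment starts in 2009.

* Unit-level mean of log_pop over the pre-treatment years (2006-2008).
* cond() returns missing for 2009 onward, and egen's mean() ignores missings,
* so the same pre-treatment mean is filled in for every year of each city.
bysort id: egen pre_log_pop = mean(cond(year < 2009, log_pop, .))

* ETWFE with the time-invariant pre-treatment mean as the covariate,
* using the same syntax as the twfe call shown later in this thread.
xthdidregress twfe (log_gdp pre_log_pop) (treatment_event), group(id) hettype(time)
```

Because the covariate is fixed before treatment, it cannot itself respond to the migrant influx, which is the concern with the time-varying log_pop.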

See the key columns 3 and 4 in the output summarized from my original post's dataex:

Code:

                   (1)          (2)            (3)          (4)
                   TWFE         Classic_DiD    AIPW         TWFE_Woolridge
---------------------------------------------------------------------------
treatment_event    0.069***     0.069***
                   (0.017)      (0.017)

log_pop            0.028        0.028
                   (0.074)      (0.096)

year=2006          0.000        0.000
                   (.)          (.)

year=2007          -0.024*      -0.024**       0.003
                   (0.011)      (0.009)        (0.018)

year=2008          0.004        0.004          0.010
                   (0.011)      (0.011)        (0.015)

year=2009          -0.011       -0.011         0.023        0.010
                   (0.012)      (0.016)        (0.015)      (0.027)

year=2010          -0.017       -0.017         0.072***     0.037
                   (0.012)      (0.016)        (0.022)      (0.042)

year=2011          0.007        0.007          0.093**      0.046
                   (0.013)      (0.017)        (0.030)      (0.048)
---------------------------------------------------------------------------
Observations       540          540            540          540

Tags: event study, fixed effects, panel data, regression, xthdidregress

Jeff Wooldridge

Join Date: Apr 2014

Posts: 2204
#2

07 May 2025, 22:39

David: As per the FAQ, you are asked to show the set of Stata commands you used and the output they produced -- not using outreg2. There are several red flags here. You don’t even have estimates for ETWFE and AIPW. You should have one for each treated year. My guess is you didn’t correctly specify the treatment variable, but I need to see more commands and output.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2204
#3

08 May 2025, 21:18

Repeated post ...

David Ray McCoy

Join Date: Dec 2016
Posts: 24

13 May 2025, 13:01

Hi Jeff Wooldridge, sorry for the confusion on data input and output. I thought it would be sufficient to link to that unanswered post with my new, more specific questions. I had not yet found the linked material that discusses time-varying controls when I made the original post, so I believe I was barking up the wrong tree to an extent. I would be grateful if you could take another look. There is no mention in the Stata documentation that I could find that discusses time-variant controls in the ETWFE, but this is discussed by Fernando Rios in the linked material in #1.

The outreg2 table reports the AIPW and ETWFE coefficients for years in the rows labeled year=2009 ... year=2011. Without re-labeling, the table was quite long. I worried that a lack of parsimony was the reason the original post lost traction. I will put the full model output for the AIPW and ETWFE models below. Anyone who wants to run the code input and models (it is quite large) can see the attached .do file or the post with the data input and code.

The takeaway is that only the ETWFE model generates highly differing results when I use the time-variant control log_pop. It is unclear whether one should generally avoid time-variant controls in ETWFE or whether the problem is specific to my data, where log_pop is expected to vary more for treated cases in treatment years. I can also note here that when I generate a time-invariant mean of log_pop by id, all the models generate similar results again. This is in the .do file, and I can post it here if it is of interest.
AIPW code and output:

Code:

xthdidregress aipw (log_gdp log_pop) (treatment_event) , group(id)

. xthdidregress aipw (log_gdp log_pop) (treatment_event) , group(id)
note: variable _did_cohort, containing cohort indicators formed by treatment variable
      treatment_event and group variable id, was added to the dataset.

Computing ATET for each cohort and time:
Cohort 2009 (5): ..... done

Treatment and time information

Time variable: year
Time interval: 2006 to 2011
Control:       _did_cohort = 0
Treatment:     _did_cohort > 0

                  | _did_cohort
------------------+------------
Number of cohorts |           2
------------------+------------
Number of obs     |
    Never treated |         450
             2009 |          90

Heterogeneous-treatment-effects regression    Number of obs    = 540
                                              Number of panels =  90
Estimator:       Augmented IPW
Panel variable:  id
Treatment level: id
Control group:   Never treated

                              (Std. err. adjusted for 90 clusters in id)

         |              Robust
  Cohort |      ATET   std. err.      z    P>|z|     [95% conf. interval]
---------+----------------------------------------------------------------
year     |
    2007 |  .0025336   .0175682     0.14   0.885    -.0318995    .0369667
    2008 |  .0102605    .015251     0.67   0.501    -.0196308    .0401519
    2009 |  .0230131   .0151939     1.51   0.130    -.0067663    .0527926
    2010 |   .072187   .0215461     3.35   0.001     .0299574    .1144165
    2011 |  .0932267   .0303568     3.07   0.002     .0337285    .1527249

Note: ATET computed using covariates.
Note: Base time for pretreatment ATETs is the previous period.

ETWFE code and output (column 4, TWFE_Woolridge, in the table in #1):

Code:

xthdidregress twfe (log_gdp log_pop) (treatment_event) , group(id) hettype(time)

. xthdidregress twfe (log_gdp log_pop) (treatment_event) , group(id) hettype(time)
note: variable _did_cohort, containing cohort indicators formed by treatment variable
      treatment_event and group variable id, was added to the dataset.

Treatment and time information

Time variable: year
Time interval: 2006 to 2011
Control:       _did_cohort = 0
Treatment:     _did_cohort > 0

                  | _did_cohort
------------------+------------
Number of cohorts |           2
------------------+------------
Number of obs     |
    Never treated |         450
             2009 |          90

Heterogeneous-treatment-effects regression    Number of obs    = 540
                                              Number of panels =  90
Estimator:       Two-way fixed effects
Panel variable:  id
Treatment level: id
Control group:   Never treated
Heterogeneity:   Time

                              (Std. err. adjusted for 90 clusters in id)

         |              Robust
    Time |      ATET   std. err.      t    P>|t|     [95% conf. interval]
---------+----------------------------------------------------------------
year     |
    2009 |   .010136   .0273606     0.37   0.712    -.0442289    .0645009
    2010 |  .0365073   .0419065     0.87   0.386      -.04676    .1197746
    2011 |  .0464675   .0477346     0.97   0.333    -.0483802    .1413152

Note: ATET computed using covariates.

Attached Files

het twfe example.do (23.5 KB, 1 view)


Jeff Wooldridge

Join Date: Apr 2014

Posts: 2204
#5

13 May 2025, 21:05

I've discovered things about relatively efficient estimators, and a student and I are currently working on GLS-type solutions. But in your particular case, I suspect you have lots of positive serial correlation. xthdidregress aipw uses differences, and this can be more efficient than twfe. You can confirm this by also using xthdidregress ra, which is the regression-based version based on long differences. I suspect you'll still see smaller standard errors than twfe.

The time-varying population variable could be playing a role, but I doubt that's the explanation.

xthdidregress twfe tends to be more efficient with small amounts of serial correlation.
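For reference, the check suggested above could look roughly like this. It is a minimal sketch that assumes the same variables and xtset panel as the commands in #4 and simply swaps the estimator to ra; the point is to compare its standard errors with those from the twfe output.

```stata
* Sketch only: same specification as the aipw call in #4, with the
* regression-adjustment (ra) estimator in place of aipw.
* Assumes the panel is already declared with: xtset id year
xthdidregress ra (log_gdp log_pop) (treatment_event), group(id)

* For comparison, the twfe call from #4. If the serial-correlation
* explanation holds, the ra standard errors for 2009-2011 should be
* smaller than the ones reported here.
xthdidregress twfe (log_gdp log_pop) (treatment_event), group(id) hettype(time)
```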