Inclusion of covariates in non-linear DiD estimation

Lukas Fervers

Join Date: Jun 2025

Posts: 5
#1

22 Jun 2025, 14:06

I have two questions concerning the inclusion of control variables in non-linear difference-in-differences models. For example: I want to estimate the impact of a (non-experimental) training programme for adult workers on the likelihood of being employed. The two groups differ in their level of education. As skill demands in the labour market might change asymmetrically across skill levels over time, these differences might cause diverging trends even in the absence of the treatment, thereby violating the common trends assumption. I therefore want to allow for diverging trends w.r.t. level of education.

Jeff Wooldridge has recently suggested new approaches to non-linear DiD (https://doi.org/10.1093/ectj/utad016). It is based on an imputation approach, though the same estimates can also be obtained via pooled estimation, and it works with and without covariates. When including covariates (in the pooled estimation), this entails including a full set of interactions: between the time-varying treatment variable (treat_dyn) and the demeaned covariates (educ_dm), between the time-invariant treatment group dummy (treat) and the covariates, and between the post-treatment dummy (after) and the covariates. I can replicate the equivalence between the pooled estimation and the imputation approach for both the models with and without covariates.
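Written out, the pooled logit with the full set of interactions and the two ATT estimates being compared look roughly as follows. This is only a sketch of what the paragraph above describes: the symbols (D, P, W, the Greek coefficients, the two index names) are notation assumed here and do not come from the paper or the post.

```latex
% Assumed notation: D_i = treat, P_t = after, W_{it} = D_i P_t = treat_dyn,
% \dot{x}_i = educ_dm (demeaned education), \Lambda = logistic cdf,
% N_1 = number of treated units.
\begin{align*}
\Pr(y_{it}=1 \mid D_i, P_t, x_i)
  &= \Lambda\bigl(\alpha + \beta D_i + \gamma P_t + \kappa\,\dot{x}_i
     + \xi\, D_i \dot{x}_i + \pi\, P_t \dot{x}_i
     + \tau\, W_{it} + \rho\, W_{it}\dot{x}_i\bigr) \\[4pt]
% untreated index for a treated unit in the post period, from the pooled fit
\hat\eta^{0}_{i}
  &= \hat\alpha + \hat\beta + \hat\gamma
     + (\hat\kappa + \hat\xi + \hat\pi)\,\dot{x}_i \\[4pt]
% what margins, dydx(treat_dyn) averages over the treated subpopulation
\widehat{\mathrm{ATT}}_{\text{pooled}}
  &= \frac{1}{N_1}\sum_{i:\,D_i=1}
     \Bigl[\Lambda\bigl(\hat\eta^{0}_{i} + \hat\tau + \hat\rho\,\dot{x}_i\bigr)
           - \Lambda\bigl(\hat\eta^{0}_{i}\bigr)\Bigr] \\[4pt]
% imputation: \tilde\eta^{0}_{i} is the same index, estimated on W_{it}=0 only
\widehat{\mathrm{ATT}}_{\text{imp}}
  &= \frac{1}{N_1}\sum_{i:\,D_i=1}
     \Bigl[y_{i,\text{post}} - \Lambda\bigl(\tilde\eta^{0}_{i}\bigr)\Bigr]
\end{align*}
```

The "time interaction only" specification asked about in this post corresponds to dropping the terms with the coefficients xi and rho from the first line.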

Now my questions: in the dataset I am actually using, I have a rather small treatment group (about 60 persons). I would therefore prefer a more parsimonious model that only includes an interaction between education and time. If I include the full set of interactions, the coefficients do not change but the standard errors get very large. However, I cannot replicate the equivalence between the imputation approach and the pooled estimation when I only include the time x education interaction (though the differences are minor with only one covariate). Therefore, my questions are:
1) Does it make sense at all in the non-linear context to only include the education x time interaction, and
2) how could one replicate the pooled estimation with the imputation approach in that case?

The following example illustrates my point. I use the LaLonde dataset that is shipped with the ebalance package (I cannot use my own data and results for data protection reasons). I first replicate the equivalence between pooled estimation and imputation without and with covariates (full set of interactions). I then do it with only the time interaction, where the equivalence breaks down. I would be very happy if someone could help!

Code:

ssc install ebalance, replace
use cps1re74.dta, clear

*Prepare data: reshape, gen after dummy, de-meaned covariates and binary outcome
gen id = _n
reshape long re, i(id) j(year)
gen after = year == 78
gen treat_dyn = treat*after
gen employed = re > 0 & re !=.
sum educ
gen educ_dm = educ - `r(mean)'
xtset id year

*a) without covariates
logit employed i.treat_dyn treat after, vce(cluster id)
margins, dydx(treat_dyn) at(after == 1) subpop(if treat == 1) noestimcheck vce(uncond)
scalar logit_pooled_nocovs = r(table)[1,2]

logit employed treat after if treat_dyn==0, vce(cluster id)
predict employed_hat if treat_dyn == 1
gen treat_ind = employed - employed_hat
sum treat_ind
scalar logit_imputation_no_covs = r(mean)

scalar list logit_pooled_nocovs logit_imputation_no_covs //result: identical

*b) with covariates - full set of interactions
logit employed i.treat_dyn i.treat_dyn#c.educ_dm educ_dm treat c.treat#c.educ_dm after c.after#c.educ_dm, vce(cluster id)
margins, dydx(treat_dyn) at(after == 1) subpop(if treat == 1) noestimcheck vce(uncond)
scalar logit_pooled_covs = r(table)[1,2]

logit employed treat after educ_dm c.treat#c.educ_dm c.after#c.educ_dm if treat_dyn==0, vce(cluster id)
predict employed_hat1 if treat_dyn == 1
gen treat_ind1 = employed - employed_hat1
sum treat_ind1
scalar logit_imputation_covs = r(mean)

scalar list logit_pooled_covs logit_imputation_covs //result: identical

*c) only interacted with time
logit employed i.treat_dyn educ_dm treat after c.after#(c.educ_dm), vce(cluster id)
margins, dydx(treat_dyn) at(after == 1) subpop(if treat == 1) noestimcheck vce(uncond)
scalar logit_pooled_covs_time_int = r(table)[1,2]

logit employed treat after educ_dm c.after#c.educ_dm if treat_dyn==0, robust
predict employed_hat2 if treat_dyn==1, pr
gen treat_ind2 = employed - employed_hat2 if treat_dyn == 1
sum treat_ind2
scalar logit_imputation_covs_time_int = r(mean)

scalar list logit_pooled_covs_time_int logit_imputation_covs_time_int //result: not identical

Tags: Conditional common trends, difference-in-differences, non-linear DiD

George Ford

Join Date: Aug 2014
Posts: 3187

23 Jun 2025, 11:46

I added some interactions to the last specification (c).

Code:

*c) only interacted with time

logit  employed i.treat_dyn educ_dm treat after c.after#c.educ_dm c.treat#c.educ_dm i.treat_dyn#c.educ_dm, vce(cluster id)
margins, dydx(treat_dyn) at(after == 1) subpop(if treat == 1) noestimcheck vce(uncond) 
scalar     logit_pooled_covs_time_int = r(table)[1,2]

logit  employed treat after educ_dm c.after#c.educ_dm c.treat#c.educ_dm  if treat_dyn==0, robust
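* drop leftovers from an earlier run, one variable per call so a missing one does not block the other
capture drop employed_hat2
capture drop treat_ind2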
predict   employed_hat2 if treat_dyn==1, pr
gen   treat_ind2 = employed - employed_hat2    if treat_dyn == 1
sum   treat_ind2
scalar logit_imputation_covs_time_int = r(mean)

scalar     list logit_pooled_covs_time_int logit_imputation_covs_time_int //result: not identical


Lukas Fervers

Join Date: Jun 2025

Posts: 5
#3

23 Jun 2025, 23:28

Thank you very much for your help, George!

But if I see it correctly, your suggestion is now identical to my version b), the full set of interactions.

What I am aiming at is to only include an interaction between education and time. I know that this is possible in the linear case; both the jwdid and xthdidregress commands have such an option. However, I am not sure whether it is equally valid in the non-linear case (at least, it does not seem to be identical to the imputation approach, which is why I am a bit unsure).
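One way to look at where the pooled and imputation results part ways in specification c) is to inspect the logit residuals in the treated post-treatment cell. In a logit, the first-order condition for the treat_dyn coefficient sets the sum of residuals in that cell to zero, but without a treat_dyn x educ_dm term nothing forces the residuals there to be uncorrelated with educ_dm. The lines below are only a diagnostic sketch; they assume the variables from the example in #1 are in memory, and the names p_pooled, res_pooled and res_x_educ are made up for illustration.

```stata
* Restricted pooled logit from c): education interacted with time only
logit employed i.treat_dyn educ_dm treat after c.after#c.educ_dm, vce(cluster id)

* Fitted probabilities and residuals from the pooled fit (hypothetical names)
predict double p_pooled, pr
generate double res_pooled = employed - p_pooled
generate double res_x_educ = res_pooled * educ_dm

* In the treated post cell the mean of res_pooled is zero up to numerical
* precision, because treat_dyn is a regressor. The mean of res_x_educ is not
* constrained in c); it would be if i.treat_dyn#c.educ_dm were included.
summarize res_pooled res_x_educ if treat_dyn == 1
```

If the second mean is clearly different from zero, the treated post-treatment observations are helping to determine the educ_dm coefficients in the pooled fit, which is something the imputation step (estimated on treat_dyn==0 only) never lets them do.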
George Ford

Join Date: Aug 2014

Posts: 3187
#4

24 Jun 2025, 07:38

Why do you want to modify the procedure by using fewer interactions? They are there for good reason.
Lukas Fervers

Join Date: Jun 2025

Posts: 5
#5

24 Jun 2025, 10:47

I do see the merit of the full set of interactions, as they allow the covariates to affect both groups differently and the treatment effect to vary with covariates. However, in the dataset I am actually using (which I cannot post here due to data security issues), the treatment group is very small (about 60 persons). With all interactions included, the confidence intervals therefore become so wide that the estimates are useless. I would therefore prefer a more parsimonious specification.
George Ford

Join Date: Aug 2014

Posts: 3187
#6

26 Jun 2025, 07:45

With Wooldridge's recent work on flexible DiD regression models, it is best to follow his standard procedure exactly unless you have a good reason not to. All the interactions and the centering serve a purpose. If you leave terms out, you get a biased coefficient.
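For the pooled/imputation equivalence discussed in this thread, one way to see what the interactions do is through the logit first-order conditions. The following is a sketch under assumed notation (D = treat, P = after, W = treat_dyn, x-dot = educ_dm, p-hat = fitted probability); it is not taken from the paper, and it speaks to the equivalence of the two estimators, not to bias as such.

```latex
% Logit (canonical link) first-order conditions: one equation per regressor z_{it}
\begin{align*}
\sum_{i,t} \bigl(y_{it}-\hat p_{it}\bigr)\, z_{it} &= 0 .
\intertext{Full interactions: $W_{it}$ and $W_{it}\dot x_i$ are regressors that are
non-zero only in the treated post cell, so}
\sum_{W_{it}=1} \bigl(y_{it}-\hat p_{it}\bigr) &= 0 ,
\qquad
\sum_{W_{it}=1} \bigl(y_{it}-\hat p_{it}\bigr)\,\dot x_i = 0 .
\intertext{Within that cell every other regressor
($1,\ D_i,\ P_t,\ \dot x_i,\ D_i\dot x_i,\ P_t\dot x_i$) equals $1$ or $\dot x_i$,
so the cell drops out of the remaining conditions, which reduce to}
\sum_{W_{it}=0} \bigl(y_{it}-\hat p_{it}\bigr)\, z_{it} &= 0 .
\intertext{These are the first-order conditions of the logit fitted on $W_{it}=0$ only,
i.e.\ the first step of the imputation approach. Time interaction only: $W_{it}\dot x_i$
is not a regressor, so the condition for the coefficient on $\dot x_i$ stays}
\sum_{W_{it}=0} \bigl(y_{it}-\hat p_{it}\bigr)\,\dot x_i
 \;+\; \sum_{W_{it}=1} \bigl(y_{it}-\hat p_{it}\bigr)\,\dot x_i &= 0 ,
\end{align*}
% and the second sum is no longer forced to zero.
```

In the restricted case the pooled coefficients on the covariate terms therefore generally differ from those of the untreated-only fit, which is consistent with the small discrepancy reported for specification c) in #1.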
Lukas Fervers

Join Date: Jun 2025

Posts: 5
#7

26 Jun 2025, 08:26

Thanks for taking the time to answer again.

As I said, the problem is that confidence intervals get extremely large in the specification with all interactions (though the coefficient does not change). The xthdidregress and also the jwdid ado (which allow for non-linear DiD) offer the option to just interact covariates with the time variable. It is therefore not unreasonable, I guess. Moreover, I think it mostly covers what I want to control for (unequal effects of education in the post-treatment period). I just want to know whether it makes sense in the linear context and how this relates to the imputation approach.
Lukas Fervers

Join Date: Jun 2025

Posts: 5
#8

26 Jun 2025, 09:34

P.S.: In the last sentence, I actually meant "whether it makes sense in the non-linear context and how this relates to the imputation approach."
