Generalized Difference-in-Differences with a moderator variable - repeated time values within panel

Chris Helling

Join Date: Dec 2018
Posts: 1


20 Dec 2018, 04:04

Hello Statalists!
I'm currently estimating a generalized DiD model and I'm unsure whether my approach is the right one. I would be very grateful for your help. Many thanks in advance!

Basically, I have panel data on several firms over several quarters. Each firm introduced a creative training in some quarter (the treatment), and I want to check how this affects the quarterly quality of the ideas that the trained employees submit to a suggestion system these firms introduced in the past. Further, I want to check whether the training was more effective for employees from innovation-related departments of the firm (creative department etc.). I also have some control variables, including the industry of the firm, the total number of idea submissions, and the age (in quarters) of the suggestion system. My data look like this (dummy data):

firm	quarter	inno_department	training	average_quality	industry	submissions	age
abc	2015Q1	0	0	0.7	manufacturing	115	4
abc	2015Q1	1	0	0.8	manufacturing	115	4
abc	2015Q2	0	0	0.9	manufacturing	120	5
abc	2015Q2	1	0	0.7	manufacturing	120	5
abc	2015Q3	0	1	0.1	manufacturing	114	6
abc	2015Q3	1	1	0.3	manufacturing	114	6
abc	2015Q4	0	1	0.3	manufacturing	130	7
abc	2016Q1	0	1	0.4	manufacturing	125	8
abc	2016Q1	1	1	0.4	manufacturing	125	8
abc	2016Q2	1	1	0.5	manufacturing	115	9
def	2013Q1	0	0	0.3	IT	80	1
def	2013Q2	0	0	0.4	IT	75	2
def	2013Q2	1	0	0.3	IT	75	2
def	2013Q3	0	0	0.3	IT	70	3
def	2013Q3	1	0	0.2	IT	70	3
def	2013Q4	0	0	0.2	IT	75	4
def	2013Q4	1	0	0.5	IT	75	4
def	2014Q1	0	1	0.4	IT	80	5
def	2014Q1	1	1	0.3	IT	80	5
def	2014Q2	0	1	0.3	IT	90	6
def	2014Q3	0	1	0.2	IT	80	7
def	2014Q3	1	1	0.3	IT	80	7
ghi	2016Q1	0	0	0.4	manufacturing	40	3
ghi	2016Q1	1	0	0.5	manufacturing	40	3
ghi	2016Q2	0	0	0.5	manufacturing	30	4
ghi	2016Q4	0	0	0.4	manufacturing	70	6

Without the inno_department variable I would "just" take the quarterly average over all employees and calculate it like this:

quality_{it} = α_i + δ_t + β training_{i,t-1} + ε_{it}

Code:

xtset firm quarter
xtreg average_quality L.training, fe vce cluster(firm)

And with control variables:

quality_{it} = α_i + δ_t + β training_{i,t-1} + γ X_{it} + ε_{it}

Code:

xtset firm quarter
xtreg average_quality L.training i.industry submissions age, fe vce cluster(firm)

With the inno_department variable I have repeated time values within my panel, and I'm not quite sure how to approach this. Basically, I would like to estimate a moderation effect like this:

quality_{it} = α_i + δ_t + β training_{i,t-1} + π (training_{i,t-1} × innodep_{it}) + ρ innodep_{it} + ε_{it}

So my questions are: Is this even possible? Would I have to restructure my data for this? How should I set up xtset and xtreg? Thank you!

Tags: difference-in-differences, fixed effects, panel data

Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#2

20 Dec 2018, 09:28

Concerning the repeated time values within panel, the problem here is that you do not have ordinary panel data. You have a 3-level data set: repeated observations over time within department (or perhaps these are groups of departments with averaged results) within firm. -xtset firm quarter- attempts to skip over the department level within the data, but Stata won't let you do that. Usually the simple solution is to omit the time variable from the -xtset-, but you can't do that here because you need to use the lag operator, and that requires a time variable. I presume that you want the lag operator to "respect" the distinction between innovative and non-innovative departments. So what you need to do is:

Code:

egen dept = group(firm inno_department)
xtset dept quarter
xtreg average_quality i.inno_department##L.training submissions age, fe vce(cluster dept)

Notice the syntax for the vce() option, and also notice that the correct level for clustering is dept, not firm.
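
As a minimal sketch of how the panel setup behaves, the block below enters the ten rows for firm abc from the example data, builds the dept identifier, and lists the lagged treatment. It assumes the quarter column is stored as a string such as "2015Q1"; the names qtr_str and lag_training are hypothetical and appear nowhere in the original post.

```stata
clear
* Rows for firm "abc" copied from the example data (qtr_str is a hypothetical name)
input str3 firm str6 qtr_str byte inno_department byte training float average_quality
"abc" "2015Q1" 0 0 0.7
"abc" "2015Q1" 1 0 0.8
"abc" "2015Q2" 0 0 0.9
"abc" "2015Q2" 1 0 0.7
"abc" "2015Q3" 0 1 0.1
"abc" "2015Q3" 1 1 0.3
"abc" "2015Q4" 0 1 0.3
"abc" "2016Q1" 0 1 0.4
"abc" "2016Q1" 1 1 0.4
"abc" "2016Q2" 1 1 0.5
end

* Assumption: quarter arrives as a string; xtset needs a numeric quarterly date
gen quarter = quarterly(qtr_str, "YQ")
format quarter %tq

* firm-quarter pairs repeat, which is why -xtset firm quarter- is rejected
duplicates report firm quarter

* One panel per firm-by-department combination
egen dept = group(firm inno_department)
isid dept quarter
xtset dept quarter

* The lag is taken within dept; it is missing where the previous quarter is absent
gen lag_training = L.training
list firm qtr_str inno_department training lag_training, sepby(dept)
```

In this sketch the innovation-related department of abc has no 2015Q4 row, so its lagged training is missing in 2016Q1 and that observation would drop out of the estimation sample.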

That said, I have left industry out of the command because your attempt to introduce industry as a covariate is almost certainly going to fail. Unless firms participate in more than one industry (and that never seems to be the case in these data sets), the industry variable is going to be collinear with the firm fixed effects, and so the industry variables will be omitted. That's linear algebra and there is no way around that. Whatever industry-level effects there are, they are properly adjusted for in the model by the fixed effects already in the model, but there is no way, in a fixed effects model, to actually separately estimate those industry-level effects.
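
The collinearity argument can be sketched with the within (fixed-effects) transformation. This assumes, for simplicity, a single industry indicator D_i that is constant over time within each panel and one time-varying regressor x_{it}; the symbols D_i, θ, and x_{it} are illustrative and not taken from the posts above.

```latex
\begin{aligned}
y_{it} &= \alpha_i + \theta D_i + \beta x_{it} + \varepsilon_{it} \\
\bar{y}_i &= \alpha_i + \theta D_i + \beta \bar{x}_i + \bar{\varepsilon}_i \\
y_{it} - \bar{y}_i &= \theta (D_i - D_i) + \beta (x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i) \\
&= \beta (x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)
\end{aligned}
```

Because D_i is wiped out together with α_i when panel means are subtracted, θ does not appear in the estimating equation and cannot be recovered from it.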