  • Generalized Difference-in-Differences with a moderator variable - repeated time values within panel

    Hello Statalists!
    I'm currently estimating a generalized DiD model and I'm unsure whether my approach is the right one. I would be very grateful for your help; many thanks in advance!


    Basically, I have panel data on several firms over several quarters. The firms introduced a creative training in some quarter (the treatment), and I want to check how this affects the quarterly quality of the ideas that the trained employees submit to a suggestion system these firms introduced in the past. Further, I want to check whether the training was more effective for employees from innovation-related departments of the firm (creative department, etc.). I also have some control variables, including the industry of the firm, the total number of idea submissions, and the age (in quarters) of the suggestion system. My data look like this (dummy data):
    firm quarter inno_department training average_quality industry submissions age
    abc 2015Q1 0 0 0.7 manufacturing 115 4
    abc 2015Q1 1 0 0.8 manufacturing 115 4
    abc 2015Q2 0 0 0.9 manufacturing 120 5
    abc 2015Q2 1 0 0.7 manufacturing 120 5
    abc 2015Q3 0 1 0.1 manufacturing 114 6
    abc 2015Q3 1 1 0.3 manufacturing 114 6
    abc 2015Q4 0 1 0.3 manufacturing 130 7
    abc 2016Q1 0 1 0.4 manufacturing 125 8
    abc 2016Q1 1 1 0.4 manufacturing 125 8
    abc 2016Q2 1 1 0.5 manufacturing 115 9
    def 2013Q1 0 0 0.3 IT 80 1
    def 2013Q2 0 0 0.4 IT 75 2
    def 2013Q2 1 0 0.3 IT 75 2
    def 2013Q3 0 0 0.3 IT 70 3
    def 2013Q3 1 0 0.2 IT 70 3
    def 2013Q4 0 0 0.2 IT 75 4
    def 2013Q4 1 0 0.5 IT 75 4
    def 2014Q1 0 1 0.4 IT 80 5
    def 2014Q1 1 1 0.3 IT 80 5
    def 2014Q2 0 1 0.3 IT 90 6
    def 2014Q3 0 1 0.2 IT 80 7
    def 2014Q3 1 1 0.3 IT 80 7
    ghi 2016Q1 0 0 0.4 manufacturing 40 3
    ghi 2016Q1 1 0 0.5 manufacturing 40 3
    ghi 2016Q2 0 0 0.5 manufacturing 30 4
    ghi 2016Q4 0 0 0.4 manufacturing 70 6

    Without the inno_department variable I would "just" take the quarterly average over all employees and calculate it like this:

    $\text{quality}_{it} = \alpha_i + \delta_t + \beta\,\text{training}_{i,t-1} + \varepsilon_{it}$
    Code:
    xtset firm quarter
    xtreg average_quality L.training, fe vce cluster(firm)
    And with control variables:

    $\text{quality}_{it} = \alpha_i + \delta_t + \beta\,\text{training}_{i,t-1} + \gamma X_{it} + \varepsilon_{it}$
    Code:
    xtset firm quarter
    xtreg average_quality L.training i.industry submissions age, fe vce cluster(firm)

    With the inno_department variable I have repeated time values within my panel, and I'm not quite sure how to approach this. Basically, I would like to estimate a moderation effect like this:

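    $\text{quality}_{it} = \alpha_i + \delta_t + \beta\,\text{training}_{i,t-1} + \pi\,(\text{training}_{i,t-1} \times \text{innodep}_{it}) + \rho\,\text{innodep}_{it} + \varepsilon_{it}$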


    So my question is: Is this even possible? Would I have to restructure my data for this? How should I set up xtset and xtreg? Thank you!

  • #2
    Concerning the repeated time values within panel, the problem here is that you do not have ordinary panel data. You have a 3-level data set: repeated observations over time within department (or perhaps these are groups of departments with averaged results) within firm. -xtset firm quarter- attempts to skip over the department level within the data, but Stata won't let you do that. Usually the simple solution is to omit the time variable from the -xtset-, but you can't do that because you need to use the lag operator, and that requires a time variable. I presume that you want the lag operator to "respect" the distinction between innovative and non-innovative department. So what you need to do is:
    Code:
    egen dept = group(firm inno_department)
    xtset dept quarter
    xtreg average_quality i.inno_department##L.training submissions age, fe vce(cluster dept)
    Notice the syntax for the vce() option, and also notice that the correct level for clustering is dept, not firm.
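
Two practical details may matter before this runs, so here is a minimal sketch under stated assumptions. It assumes that quarter is stored as a string such as "2015Q1" (if it is already a numeric %tq variable, the conversion is unnecessary) and that the quarter effects written as δ_t in the question are meant to be estimated, which the command above does not do. The name qdate is a hypothetical helper variable, not something from the original data. The block is meant to be run on the original data in place of the commands above, not after them.

```stata
* Assumption: quarter is a string like "2015Q1"; convert it to a Stata quarterly date
gen qdate = quarterly(quarter, "YQ")
format qdate %tq

* One panel unit per firm-by-department-type combination, as suggested above
egen dept = group(firm inno_department)
xtset dept qdate

* Same specification as above, plus quarter indicators for the delta_t term
xtreg average_quality i.inno_department##L.training submissions age i.qdate, ///
    fe vce(cluster dept)
```

Because dept is built from firm and inno_department, inno_department does not vary within a panel, so its main effect should be reported as omitted while the interaction with lagged training is still estimated. If age advances by exactly one each quarter, as it does in the sample rows, it is a linear combination of the panel and quarter effects, and Stata will omit one of the collinear terms once i.qdate is included.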

    That said, I have left industry out of the command because your attempt to introduce industry as a covariate is almost certainly going to fail. Unless firms participate in more than one industry (and that never seems to be the case in these data sets), the industry variable is going to be collinear with the firm fixed effects, and so the industry variables will be omitted. That's linear algebra and there is no way around that. Whatever industry-level effects there are, they are properly adjusted for in the model by the fixed effects already in the model, but there is no way, in a fixed effects model, to actually separately estimate those industry-level effects.
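
Whether industry really is constant within firm can be checked directly in the data before deciding to drop it. The following is a small sketch that assumes the variable names from the question; ind_tag and n_industries are hypothetical helper variables introduced only for this check.

```stata
* Tag one observation per distinct firm-industry pair
egen byte ind_tag = tag(firm industry)

* Count the distinct industries observed for each firm
egen n_industries = total(ind_tag), by(firm)

* A maximum of 1 means industry never varies within firm,
* so it is collinear with the firm (and dept) fixed effects
summarize n_industries

* List any firms that appear in more than one industry
list firm industry if ind_tag & n_industries > 1
```

If the list comes back empty, the fixed effects already absorb everything industry could explain, which is the situation described above.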
