Hello everybody
I am conducting a staggered difference-in-differences (DiD) analysis of how employees are affected when they are outsourced from public to private employment. My dataset is in long format with seven time periods (years). The data are on a confidential server, so I cannot share them, but I have made an example dataset with 14 individuals. In the example dataset, the treatment group is outsourced in period 100.
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(id time treatment) double cem_weights float(cem_weights_all outsourcing_year company_id sector job_code salary) int cem_strata double cem_matched float(matched_all active_treatment)
1 97 1 . 1 0 1 0 10 20000 . . 1 0
1 98 1 . 1 0 1 0 10 20000 . . 1 0
1 99 1 . 1 0 1 0 10 20000 . . 1 0
1 100 1 1 1 1 2 1 20 15000 2 1 1 1
1 101 1 . 1 0 2 1 20 15000 . . 1 1
1 102 1 . 1 0 2 1 20 15000 . . 1 1
1 103 1 . 1 0 2 1 20 15000 . . 1 1
2 97 1 . 1 0 1 0 10 20000 . . 1 0
2 98 1 . 1 0 1 0 10 20000 . . 1 0
2 99 1 . 1 0 1 0 10 20000 . . 1 0
2 100 1 1 1 1 2 1 10 17000 1 1 1 1
2 101 1 . 1 0 2 1 10 17000 . . 1 1
2 102 1 . 1 0 2 1 10 17000 . . 1 1
2 103 1 . 1 0 2 1 10 17000 . . 1 1
3 97 1 . 1 0 1 0 10 21000 . . 1 0
3 98 1 . 1 0 1 0 10 21000 . . 1 0
3 99 1 . 1 0 1 0 10 21000 . . 1 0
3 100 1 1 1 1 2 1 20 24000 2 1 1 1
3 101 1 . 1 0 2 1 20 24000 . . 1 1
3 102 1 . 1 0 2 1 20 24000 . . 1 1
3 103 1 . 1 0 2 1 20 24000 . . 1 1
4 97 1 . 1 0 1 0 10 19000 . . 1 0
4 98 1 . 1 0 1 0 10 19000 . . 1 0
4 99 1 . 1 0 1 0 10 19000 . . 1 0
4 100 1 1 1 1 2 1 20 15000 2 1 1 1
4 101 1 . 1 0 2 1 20 15000 . . 1 1
4 102 1 . 1 0 2 1 20 15000 . . 1 1
4 103 1 . 1 0 2 1 20 15000 . . 1 1
5 97 1 . 1 0 1 0 10 20000 . . 1 0
5 98 1 . 1 0 1 0 10 20000 . . 1 0
5 99 1 . 1 0 1 0 10 20000 . . 1 0
5 100 1 1 1 1 2 1 20 21000 2 1 1 1
5 101 1 . 1 0 2 1 20 22000 . . 1 1
5 102 1 . 1 0 2 1 20 22000 . . 1 1
5 103 1 . 1 0 2 1 20 22000 . . 1 1
6 97 0 . 2.3333333 . 1 0 20 24000 . . 1 0
6 98 0 . 2.3333333 . 1 0 20 24000 . . 1 0
6 99 0 . 2.3333333 . 1 0 20 24000 . . 1 0
6 100 0 2.3333333333333335 2.3333333 . 1 0 20 24000 2 1 1 0
6 101 0 . 2.3333333 . 1 0 20 24000 . . 1 0
6 102 0 . 2.3333333 . 1 0 20 24000 . . 1 0
6 103 0 . 2.3333333 . 1 0 20 24000 . . 1 0
7 97 0 . .3888889 . 3 0 10 20000 . . 1 0
7 98 0 . .3888889 . 3 0 10 20000 . . 1 0
7 99 0 . .3888889 . 3 0 10 20000 . . 1 0
7 100 0 .3888888888888889 .3888889 . 3 0 10 20000 1 1 1 0
7 101 0 . .3888889 . 3 0 10 20000 . . 1 0
7 102 0 . .3888889 . 3 0 10 20000 . . 1 0
7 103 0 . .3888889 . 3 0 10 20000 . . 1 0
8 97 0 . 2.3333333 . 9 0 20 20000 . . 1 0
8 98 0 . 2.3333333 . 9 0 20 20000 . . 1 0
8 99 0 . 2.3333333 . 9 0 20 20000 . . 1 0
8 100 0 2.3333333333333335 2.3333333 . 9 0 20 20000 2 1 1 0
8 101 0 . 2.3333333 . 9 0 20 20000 . . 1 0
8 102 0 . 2.3333333 . 9 0 20 20000 . . 1 0
8 103 0 . 2.3333333 . 9 0 20 20000 . . 1 0
9 97 0 . .3888889 . 1 0 10 21000 . . 1 0
9 98 0 . .3888889 . 1 0 10 21000 . . 1 0
9 99 0 . .3888889 . 1 0 10 21000 . . 1 0
9 100 0 .3888888888888889 .3888889 . 1 0 10 21000 1 1 1 0
9 101 0 . .3888889 . 1 0 10 21000 . . 1 0
9 102 0 . .3888889 . 1 0 10 21000 . . 1 0
9 103 0 . .3888889 . 1 0 10 21000 . . 1 0
10 97 0 . .3888889 . 4 0 10 21000 . . 1 0
10 98 0 . .3888889 . 4 0 10 21000 . . 1 0
10 99 0 . .3888889 . 4 0 10 21000 . . 1 0
10 100 0 .3888888888888889 .3888889 . 4 0 10 20000 1 1 1 0
10 101 0 . .3888889 . 4 0 10 20000 . . 1 0
10 102 0 . .3888889 . 4 0 10 20000 . . 1 0
10 103 0 . .3888889 . 4 0 10 20000 . . 1 0
12 97 1 . 1 0 11 0 60 20000 . . 1 0
12 98 1 . 1 0 11 0 60 20000 . . 1 0
12 99 1 . 1 0 11 0 60 20000 . . 1 0
12 100 1 1 1 1 22 1 80 20000 4 1 1 1
12 101 1 . 1 0 22 1 80 20000 . . 1 1
12 102 1 . 1 0 22 1 80 19000 . . 1 1
12 103 1 . 1 0 22 1 80 19000 . . 1 1
13 97 0 . .5833333 . 11 0 80 23000 . . 1 0
13 98 0 . .5833333 . 11 0 80 23000 . . 1 0
13 99 0 . .5833333 . 11 0 80 23000 . . 1 0
13 100 0 .5833333333333334 .5833333 . 11 0 80 23000 4 1 1 0
13 101 0 . .5833333 . 11 0 80 23000 . . 1 0
13 102 0 . .5833333 . 11 0 80 23000 . . 1 0
13 103 0 . .5833333 . 11 0 80 22000 . . 1 0
14 97 0 . .5833333 . 33 0 80 20000 . . 1 0
14 98 0 . .5833333 . 33 0 80 20000 . . 1 0
14 99 0 . .5833333 . 33 0 80 22000 . . 1 0
14 100 0 .5833333333333334 .5833333 . 33 0 80 20000 4 1 1 0
14 101 0 . .5833333 . 33 0 80 20000 . . 1 0
14 102 0 . .5833333 . 33 0 80 20000 . . 1 0
14 103 0 . .5833333 . 33 0 80 20000 . . 1 0
end
label values sector sektor
label def sektor 0 "public", modify
label def sektor 1 "private", modify
I have used Coarsened Exact Matching (CEM) to identify a control group that is similar in terms of job type in year 99. I want to compare, e.g., the salary of treated individuals (i.e., outsourced employees) with that of the control individuals.
From my research, I have found that to conduct a staggered difference-in-differences analysis, I need to run the following:
Code:
xtreg salary i.treatment i.active_treatment i.time [aw = cem_weights_all], fe cluster(id)
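treatment is 1 for everyone in the group that receives the treatment (in all of their observations, including those before treatment starts) and 0 in all observations for the untreated group.
active_treatment is 1 in the treatment group after treatment begins, 0 in the treatment group before treatment begins, and 0 in all observations in the control group (taken from a reply by Clyde Schechter (#2) in this post: https://www.statalist.org/forums/for...y-is-staggered).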
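The two definitions can be checked against the outsourcing_year dummy in the example data. This is only an illustrative sketch: first_out, treat_chk and active_chk are hypothetical helper names, and it assumes that outsourcing_year is 1 in the year a person is outsourced (as in the example). Because it keys on each person's own outsourcing year, the same logic would also cover staggered dates. Since treatment does not vary within person, xtreg, fe should omit it anyway, so leaving it out should not change the active_treatment coefficient.

```stata
* Assumes the example data above are in memory; run from a do-file.
* Hypothetical helper: year of (first) outsourcing, missing for never-outsourced controls
bysort id (time): egen first_out = min(cond(outsourcing_year == 1, time, .))

* Ever-treated indicator: constant within person
generate byte treat_chk = !missing(first_out)

* Treated AND already outsourced; works with person-specific outsourcing years
generate byte active_chk = !missing(first_out) & time >= first_out

* Check against the variables used in the regression
assert treat_chk == treatment
assert active_chk == active_treatment

* Person fixed effects absorb the time-invariant treatment dummy
xtset id time
xtreg salary i.active_treatment i.time [aw = cem_weights_all], fe vce(cluster id)
```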
I have further read that the coefficient on active_treatment is then my generalized DiD estimate. In the example dataset, the coefficient is -1159.722. Would it then be correct to interpret this as the treated individuals having a salary that is, on average, 1159.722 lower than that of the control group?
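One way to see what the -1159.722 measures is to rebuild it by hand. This sketch assumes, as in the example, a balanced panel, a single outsourcing year (100) and CEM weights that are constant within person; post_period is a hypothetical helper variable. Under those assumptions the two-way fixed-effects coefficient equals the weighted average change in each person's mean salary from before (97 to 99) to after (100 to 103), treated minus control.

```stata
* Assumes the example data above are in memory; run from a do-file.
preserve
* Hypothetical helper: 0 = before outsourcing (97-99), 1 = from 100 onwards
generate byte post_period = time >= 100
* One row per person and period: average salary before and after
collapse (mean) salary (first) treatment cem_weights_all, by(id post_period)
reshape wide salary, i(id) j(post_period)
* Each person's before-to-after change in average salary
generate double change = salary1 - salary0
* Weighted mean change, treated minus control
regress change i.treatment [aw = cem_weights_all]
restore
```

In the example data, the treated group's average salary falls by about 1292 and the matched controls' by about 132, so the estimate compares changes over time rather than salary levels. Only the point estimate carries over; the standard errors of this collapsed regression are not a substitute for the clustered ones.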
Would it make sense to estimate the development from t-3 to t1, from t-3 to t2, and from t-3 to t3 (with t0 = 100) in separate regressions and then list the coefficients, e.g.:
Code:
xtreg salary i.treatment i.active_treatment i.time [aw = cem_weights_all] if inrange(time, 97, 101), fe cluster(id)
xtreg salary i.treatment i.active_treatment i.time [aw = cem_weights_all] if inrange(time, 97, 102), fe cluster(id)
xtreg salary i.treatment i.active_treatment i.time [aw = cem_weights_all] if inrange(time, 97, 103), fe cluster(id)
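Each of those regressions still pools all post-outsourcing years in its window into a single active_treatment coefficient, so the three numbers are cumulative averages rather than year-by-year effects. An event-study version gives one coefficient per year, which can then be plotted. The sketch below assumes, as in the example, that everyone in the treatment group is outsourced in 100, and uses year 99 (t-1) as the reference year; ev_97 to ev_103, evb and evV are hypothetical names. With genuinely staggered dates, the dummies would instead be defined on time relative to each person's own outsourcing year. The plotted intervals are plus/minus 1.96 standard errors, which is only a rough guide with so few clusters.

```stata
* Assumes the example data above are in memory; run from a do-file.
xtset id time
* One dummy per year for the treatment group (treated x year)
forvalues y = 97/103 {
    generate byte ev_`y' = treatment * (time == `y')
}
* Leave out ev_99 so every coefficient is relative to year 99 (t-1)
xtreg salary ev_97 ev_98 ev_100 ev_101 ev_102 ev_103 i.time ///
    [aw = cem_weights_all], fe vce(cluster id)

* Copy the estimates before replacing the data in memory
matrix evb = e(b)
matrix evV = e(V)
preserve
clear
set obs 7
generate year = 96 + _n
generate double coef = 0    // year 99 is the reference and stays at 0
generate double se   = 0
foreach y in 97 98 100 101 102 103 {
    local j = colnumb(evb, "ev_`y'")
    replace coef = evb[1, `j'] if year == `y'
    replace se   = sqrt(evV[`j', `j']) if year == `y'
}
generate double lo = coef - 1.96 * se
generate double hi = coef + 1.96 * se
twoway (rcap lo hi year) (connected coef year), ///
    xline(99.5, lpattern(dash)) yline(0) ///
    xtitle("Year") ytitle("Effect on salary") legend(off)
restore
```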
If so, could I somehow visualize these results?

Would it furthermore be possible to interact the treatment group with, say, a dummy variable for job type (e.g., cleaners vs. health care assistants)? In that case, should I interact it with treatment or with active_treatment?
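A sketch of a job-type interaction, assuming job type is measured before outsourcing (year 99), since in the example some treated people's job_code changes after they are outsourced (e.g., id 1 goes from 10 to 20). job99 is a hypothetical helper standing in for the cleaning/health care dummy. Interacting with active_treatment lets the outsourcing effect differ by job type; the main effects of treatment and job99 do not vary within person and should be absorbed by the fixed effects.

```stata
* Assumes the example data above are in memory; run from a do-file.
* Hypothetical helper: each person's job_code in year 99 (before outsourcing)
bysort id (time): egen job99 = max(cond(time == 99, job_code, .))
xtset id time
* 1.active_treatment is the effect for the base job type; each
* 1.active_treatment#k.job99 term is the difference in the effect for job type k
xtreg salary i.active_treatment##i.job99 i.time [aw = cem_weights_all], ///
    fe vce(cluster id)
```

In the small example some job types contain only treated or only control people, so Stata will omit some of the interaction terms.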
Finally, can somebody explain (e.g., with references to the literature) why it is not possible to simply interact the treatment and time variables, e.g.:
Code:
reg salary i.treatment##i.time [aw = cem_weights_all], cluster(id)
margins treatment, at(time=(97(1)103))
marginsplot
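A variant of the interacted specification that plots the treated-minus-control gap directly. As an assumption of this sketch, year 99 is made the base level with ib99.time, so each 1.treatment#t.time coefficient measures how the gap in year t differs from the gap in 99. The r. contrast operator in margins gives the gap itself in each year; comparing the gaps from 100 onwards with those before is the difference-in-differences comparison.

```stata
* Assumes the example data above are in memory.
* Same model as above, but with year 99 as the base year
regress salary i.treatment##ib99.time [aw = cem_weights_all], vce(cluster id)
* r. contrast: treatment level 1 minus level 0, evaluated in each year
margins r.treatment, at(time = (97(1)103))
* Gaps before 100 show how stable the difference was before outsourcing
marginsplot, yline(0) xline(99.5)
```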
Sorry for all the questions!

Gustav