Diff-in-diff - Basic question about data format, control variables, and comparison across models.

Marco Antonio



13 Mar 2023, 09:02

Hello,

I'm trying to analyze how the creation of an employee venture (spin-out) affects the performance of the firm that previously employed the entrepreneur, depending on the relationship the venture forms with that firm (competition, collaboration, or coopetition).

My sample comprises around 5k firms that had employees leave to create new ventures, pre-matched with another 5k companies that did not have any spin-out. My data are a panel, currently in wide format, and look something like this (a very simplified version):

ID	Treatment_Spinout	Treatment_Spinout_Coopetition	Rev_01	Rev_02	Rev_03	Rev_04	Rev_05	Rev_06	Rev_07	Rev_08
Company 1	1	1	100	110	105	120	125	180	200	230
Company 2	0	0	90	95	92	98	96	100	98	105
Company 3	1	0	50	60	55	60	58	96	98	110
Company 4	0	0	200	202	205	199	198	198	200	205
Company 5	1	0	325	333	332	312	324	485	496	515
Company 6	0	0	552	563	541	523	568	541	549	567

Here, ID identifies the companies; Treatment_Spinout identifies the firms that had a spin-out; Treatment_Spinout_Coopetition identifies the firms whose spin-out formed a coopetitive relationship with the previous employer (I have similar treatment variables for the other types of relationship); and Rev_01 to Rev_08 are the firms' revenues in years 01 to 08. From my understanding, I first have to convert the data to long format, right? So:

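* Wide to long: one row per firm-year, with year taken from the Rev_ suffix
reshape long Rev_, i(ID) j(year)
* Declare the panel (as far as I know, xtset needs ID to be numeric, not a string)
xtset ID year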
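
To make this concrete, here is a minimal sketch that types in the six example firms from the table above and reshapes them. Two things in it are my own assumptions rather than anything I know to be required: I add the string option and then destring year, because I am not sure how reshape handles the leading zeros in the suffixes 01 to 08, and I create a numeric identifier with encode (firm_id is a name I made up), since in this simplified example ID is text.

```stata
clear
input str9 ID Treatment_Spinout Treatment_Spinout_Coopetition Rev_01 Rev_02 Rev_03 Rev_04 Rev_05 Rev_06 Rev_07 Rev_08
"Company 1" 1 1 100 110 105 120 125 180 200 230
"Company 2" 0 0  90  95  92  98  96 100  98 105
"Company 3" 1 0  50  60  55  60  58  96  98 110
"Company 4" 0 0 200 202 205 199 198 198 200 205
"Company 5" 1 0 325 333 332 312 324 485 496 515
"Company 6" 0 0 552 563 541 523 568 541 549 567
end

* Wide to long; keep the suffix as a string so that 01 to 08 are matched exactly
reshape long Rev_, i(ID) j(year) string
destring year, replace

* firm_id is a hypothetical name for a numeric version of ID
encode ID, generate(firm_id)
xtset firm_id year
```

After this there should be one row per firm and year, with year stored as a number.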

Now that the data are in long format, my first question is: how do I tell the program that the treatment occurs between years 05 and 06? Should I create a dummy variable (let's call it TIME) equal to 0 for all years of the control group, 0 for the treatment group up to year 05, and 1 for the treatment group from year 06 onward?
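
In code, I think the dummy I describe would look like the sketch below. It runs on a small stand-in panel (6 firms by 8 years, structure made up for illustration, not my real data). POST and DID are names I invented: POST is a plain before/after indicator, and DID is the dummy described above, equal to 1 only for spin-out firms in the later years. I am assuming year 06 is the first treated year, and I am not sure which of the two variables the command actually expects.

```stata
* Stand-in long panel: 6 firms x 8 years (made-up structure)
clear
set obs 6
generate firm_id = _n
generate byte Treatment_Spinout = mod(_n, 2)   // odd-numbered firms play the treated group
expand 8
bysort firm_id: generate year = _n

* POST: 1 from year 06 onward for every firm (hypothetical name)
generate byte POST = (year >= 6)
* DID: 1 only for spin-out firms from year 06 onward (hypothetical name)
generate byte DID = Treatment_Spinout * POST

* Quick check of which years are flagged
tabulate year DID
```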

My follow-up questions are about the diff-in-diff code. From what I understood, the regression would look something like this:

xtdidregress (Rev_) (Treatment), group(ID) time(TIME)

But I don't understand a few things:
1) In the first parentheses, can I add control variables that are available for both the treatment and control groups, as well as ones that are available only for the treatment group? For example, I have DEBT_ for both groups, but I have EMPLOYEE EDUCATION only for the firms in the treatment group (those that had a spin-out).
2) Do I need the group() option even though my analysis is at the firm level?
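
For reference, this is the fuller version of the command as I currently read the help file, run on simulated data (every number below is made up). It assumes Stata 17 or later, a numeric firm identifier (firm_id, hypothetical), the calendar variable year in time(), and a treatment dummy (DID, hypothetical) that is 1 only for spin-out firms from year 06 onward. Whether this is the right way to set it up is exactly what I am unsure about.

```stata
* Simulated stand-in panel (all numbers made up): 200 firms x 8 years
clear
set seed 12345
set obs 200
generate firm_id = _n
generate byte Treatment_Spinout = mod(_n, 2)
expand 8
bysort firm_id: generate year = _n
generate byte DID = Treatment_Spinout * (year >= 6)
generate DEBT_ = runiform()
generate Rev_ = 100 + 10*Treatment_Spinout + 2*year + 5*DEBT_ + 30*DID + rnormal(0, 5)

xtset firm_id year
* Outcome plus a control in the first parentheses, treatment dummy in the second
xtdidregress (Rev_ DEBT_) (DID), group(firm_id) time(year)
```

Here DEBT_ stands in for a control observed for every firm. I left out anything like EMPLOYEE EDUCATION because I do not know how a control that is missing for the whole control group should be handled.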

And my last question: since I intend to compare the effects of the different treatments (i.e., the different types of relationship), how can I test whether those effects differ significantly from one another (beyond plotting graphs and comparing coefficients)?
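
One idea I had, which may well be wrong, is to put two treatment terms in a single fixed-effects regression and test the extra one. The sketch below uses xtreg with firm and year fixed effects instead of xtdidregress, on simulated data (all numbers made up). DID_any and DID_coop are hypothetical names: DID_any switches on for every spin-out firm from year 06 onward, and DID_coop switches on only for the coopetitive ones, so its coefficient would be the additional effect of coopetition relative to the other spin-outs.

```stata
* Simulated stand-in panel (all numbers made up): 200 firms x 8 years
clear
set seed 2023
set obs 200
generate firm_id = _n
generate byte Treatment_Spinout = mod(_n, 2)
generate byte Treatment_Spinout_Coopetition = (mod(_n, 4) == 1)   // subset of the spin-out firms
expand 8
bysort firm_id: generate year = _n
generate byte POST = (year >= 6)
generate byte DID_any  = Treatment_Spinout * POST
generate byte DID_coop = Treatment_Spinout_Coopetition * POST
generate Rev_ = 100 + 2*year + 30*DID_any + 15*DID_coop + rnormal(0, 5)

xtset firm_id year
* Firm fixed effects via fe, year effects via i.year, errors clustered by firm
xtreg Rev_ DID_any DID_coop i.year, fe vce(cluster firm_id)
* Does the coopetition effect differ from that of the other spin-outs?
test DID_coop = 0
```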

Sorry for the long question!
And thank you in advance! Any help is very welcome!

Last edited by Marco Antonio; 13 Mar 2023, 09:09.
