Hello,
I'm trying to analyze how the creation of an employee venture (spin-out) affects the performance of the firm that previously employed the entrepreneur, depending on the relationship formed between the two ventures (competition, collaboration, or coopetition).
My sample comprises around 5k firms that had employees leave to create new ventures, pre-matched with another 5k companies that did not have any spin-out. I have panel data, but it is currently in wide format, looking something like this (very simplified version):
| ID | Treatment_Spinout | Treatment_Spinout_Coopetition | Rev_01 | Rev_02 | Rev_03 | Rev_04 | Rev_05 | Rev_06 | Rev_07 | Rev_08 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Company 1 | 1 | 1 | 100 | 110 | 105 | 120 | 125 | 180 | 200 | 230 |
| Company 2 | 0 | 0 | 90 | 95 | 92 | 98 | 96 | 100 | 98 | 105 |
| Company 3 | 1 | 0 | 50 | 60 | 55 | 60 | 58 | 96 | 98 | 110 |
| Company 4 | 0 | 0 | 200 | 202 | 205 | 199 | 198 | 198 | 200 | 205 |
| Company 5 | 1 | 0 | 325 | 333 | 332 | 312 | 324 | 485 | 496 | 515 |
| Company 6 | 0 | 0 | 552 | 563 | 541 | 523 | 568 | 541 | 549 | 567 |
Where ID identifies the companies; Treatment_Spinout identifies the firms that had a spin-out; Treatment_Spinout_Coopetition identifies the firms that had a spin-out that formed a coopetitive relationship with its previous employer (I have similar treatment variables for the other types of relationships); and Rev_01 to Rev_08 are the firms' revenues in years 01 to 08. So, from my understanding, I first have to convert the data to long format, right? So:
reshape long Rev_, i(ID) j(year)
xtset ID year
So, now that I have the data in long format, my first question is: how do I tell Stata that the treatment occurs between years 05 and 06? Should I create a dummy variable (let's call it TIME) that is "0" for all years of the control group, "0" for the treatment group before year 06, and "1" for the treatment group from year 06 onward?
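To make the question concrete, here is a minimal sketch of the dummy described above, not a definitive answer. It assumes `year` is coded 1 to 8 after the reshape and that year 06 is the first post-treatment year; `post` and `did_spinout` are hypothetical names introduced only for illustration.

```stata
* Sketch only; post and did_spinout are hypothetical names.
* Assumes long-format data, year coded 1..8 with no missing values,
* and year 6 as the first post-treatment year.
generate byte post = (year >= 6)

* Equals 1 only for spin-out firms in years 6 to 8, and 0 otherwise
generate byte did_spinout = Treatment_Spinout * post

* Check: did_spinout should be 0 before year 6 and 0 for every control firm
tabulate year did_spinout if Treatment_Spinout == 1
tabulate year did_spinout if Treatment_Spinout == 0
```

A dummy built this way is a treated-by-post indicator rather than a time variable, so it would play the role of the treatment variable in the regression while `year` remains the time variable.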
My follow-up questions are about the diff-in-diff code. From what I understand, the regression would look something like this:
xtdidregress (Rev_) (Treatment), group(ID) time(TIME)
But I don't understand a few things:
1) Can I add control variables to the first parenthesis, both ones that are available for the treatment and control groups and ones that are available only for the treatment group? For example, DEBT_ is a variable that I have for both the control and treatment groups, but EMPLOYEE EDUCATION I only have for the firms in the treatment group (those that had a spin-out).
2) Do I need to use the group option even if my analysis is at the firm level?
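For reference, a hedged sketch of how such a call might look with one control variable. In the documented `xtdidregress` syntax, covariates follow the outcome in the first parentheses, the second parentheses hold a binary treated-by-post indicator, `group()` names the level at which treatment is assigned (which can be the firm identifier), and `time()` takes the time variable. The sketch assumes ID is numeric and that DEBT_ is stored wide like Rev_; `did_spinout` and `Emp_Educ` are hypothetical names.

```stata
* Sketch only. Assumes ID is numeric and DEBT_ was reshaped together with Rev_,
* e.g. reshape long Rev_ DEBT_, i(ID) j(year).
xtset ID year

* did_spinout is a hypothetical treated-by-post dummy;
* capture skips this line silently if the variable already exists
capture generate byte did_spinout = Treatment_Spinout * (year >= 6)

* Covariates follow the outcome; time() takes the year variable
xtdidregress (Rev_ DEBT_) (did_spinout), group(ID) time(year)

* Emp_Educ is a hypothetical name for the employee-education variable.
* If this count equals the number of control observations, a model that
* includes Emp_Educ would be estimated on treated firms only, because
* Stata drops observations with missing covariate values.
count if missing(Emp_Educ) & Treatment_Spinout == 0
```

If the count shows that the variable is missing for every control firm, adding it as a covariate would remove the comparison group from the estimation sample.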
And my last question is: as I intend to compare the effects of the different treatments (i.e., the different types of relationships), how can I test whether those effects differ significantly from one another (besides plotting the graphs and comparing the coefficients)?
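One possible approach, sketched under assumptions rather than offered as the established method: since `xtdidregress` takes a single treatment variable, a comparable two-way fixed-effects model could be fit with `xtreg`, using one treated-by-post dummy per relationship type, followed by Wald tests of equality. The variable names for the competition and collaboration treatments are hypothetical, the relationship types are assumed to be mutually exclusive, ID is assumed numeric, and year 06 is assumed to be the first post-treatment year.

```stata
* Sketch only. Treatment_Spinout_Competition and Treatment_Spinout_Collaboration
* are hypothetical names for the other treatment variables.
xtset ID year

generate byte did_coop   = Treatment_Spinout_Coopetition   * (year >= 6)
generate byte did_comp   = Treatment_Spinout_Competition   * (year >= 6)
generate byte did_collab = Treatment_Spinout_Collaboration * (year >= 6)

* Firm fixed effects (fe) plus year effects (i.year), errors clustered by firm
xtreg Rev_ did_coop did_comp did_collab i.year, fe vce(cluster ID)

* Wald tests of whether the estimated effects differ across relationship types
test did_coop = did_comp
test did_coop = did_collab
test did_comp = did_collab
```

Each `test` line reports whether two estimated effects can be distinguished statistically, which is a more direct comparison than checking the significance of each coefficient separately.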
Sorry for this long question!
And thank you in advance! Any help is very welcome!
