Dear Stata community,
I am trying to estimate a difference-in-differences (DID) regression to analyze whether VAR (video assistant referee) has affected the number of goals, fouls, and cards in the top five soccer leagues across the world (Germany-Bundesliga, England-EPL, Spain-LaLiga, France-Ligue1, and Italy-SerieA). My data run from the 2014/15 season to the 2019/20 season for each league. I collapsed the data to sum the goals, fouls, etc. by season and league.
Here is what my data looks like (the full -dataex- listing is further down in this post):
Sorry for the messy data. fthg is home goals, ftag is away goals, hf is home fouls, af is away fouls, and so on. Bundesliga_dummy and the other league dummies equal 1 for the corresponding league. VAR is a dummy equal to 1 from the season in which the league started using VAR: 2017 for Bundesliga and SerieA, 2018 for LaLiga and Ligue1, and 2019 for EPL.
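To make the staggered adoption concrete, here is a minimal sketch of how a dummy like VAR could be rebuilt from the adoption seasons listed above. The names first_var and VAR_check are hypothetical and are not part of the original dataset; the sketch builds only the league-season skeleton, not the outcome variables.

```stata
* Sketch only: rebuild a VAR-style dummy from the adoption seasons in this post.
* first_var and VAR_check are hypothetical names used for illustration.
clear
input str10 div int first_var
"Bundesliga" 2017
"EPL"        2019
"LaLiga"     2018
"Ligue1"     2018
"SerieA"     2017
end

* Six seasons per league: 2014 through 2019
expand 6
bysort div: generate int season = 2013 + _n

* Equals 1 from the adoption season onward, 0 before
generate byte VAR_check = season >= first_var

tabulate div VAR_check
```

Because the switch-on season differs by league, a single common "post" period does not exist here; the dummy itself already encodes treatment status for each league and season.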
My main question is how to set up a DID regression with multiple treatment dates and different control groups. Should I split my data by league and run separate regressions?
I know I can't use xtset because the dates repeat.
I think my best option is to run a regular regression (reg) with an interaction term between the post period and the treatment group, but I'm confused about how to include the post period when the leagues adopted VAR in different seasons. I'm also confused about how to specify the treatment and control groups when several leagues belong to each.
Or should I be using xtreg? From my understanding, you usually use xtset along with that, which I don't seem to be able to do.
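For reference, a minimal sketch of what a panel setup along these lines could look like, using only div, season, VAR, and fthg from the data in this post. It assumes the "dates repeat" problem comes from declaring season without a panel variable; league is a hypothetical numeric identifier created for the sketch. This is one possible specification, not necessarily the right one for this question.

```stata
* Sketch only: league-season panel with home goals (fthg) from the data in this post.
clear
input str10 div int season byte VAR int fthg
"Bundesliga" 2014 0 486
"Bundesliga" 2015 0 440
"Bundesliga" 2016 0 451
"Bundesliga" 2017 1 490
"Bundesliga" 2018 1 464
"Bundesliga" 2019 1 409
"EPL"        2014 0 560
"EPL"        2015 0 567
"EPL"        2016 0 607
"EPL"        2017 0 582
"EPL"        2018 0 596
"EPL"        2019 1 434
"LaLiga"     2014 0 584
"LaLiga"     2015 0 615
"LaLiga"     2016 0 578
"LaLiga"     2017 0 572
"LaLiga"     2018 1 552
"LaLiga"     2019 1 379
"Ligue1"     2014 0 536
"Ligue1"     2015 0 546
"Ligue1"     2016 0 588
"Ligue1"     2017 0 565
"Ligue1"     2018 1 536
"Ligue1"     2019 1 450
"SerieA"     2014 0 570
"SerieA"     2015 0 559
"SerieA"     2016 0 631
"SerieA"     2017 1 553
"SerieA"     2018 1 564
"SerieA"     2019 1 394
end

* league is a hypothetical numeric panel identifier (xtset needs a numeric variable)
encode div, generate(league)

* Seasons repeat across leagues, but each league-season pair occurs only once
xtset league season

* League fixed effects via -fe-, season effects via i.season.
* VAR switches on at each league's adoption season, so it stands in for
* the post x treatment interaction.
xtreg fthg i.VAR i.season, fe vce(cluster league)
```

Two caveats on this sketch: with only five leagues, cluster-robust standard errors are unlikely to be reliable, and because every league eventually adopts VAR, the later adopters serve as controls only for the seasons before they switch.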
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 div season VAR fthg ftag hf af hy ay hr ar Bundesliga_dummy EPL_dummy SerieA_dummy LaLiga_dummy Ligue1_dummy
"Bundesliga" 2014 0 486 357 4547 4797 514 595 29 24 1 0 0 0 0
"Bundesliga" 2015 0 440 335 3812 3970 471 545 14 21 1 0 0 0 0
"Bundesliga" 2016 0 451 335 3754 4025 461 556 26 21 1 0 0 0 0
"Bundesliga" 2017 1 490 365 3956 4221 477 558 22 21 1 0 0 0 0
"Bundesliga" 2018 1 464 380 2915 3009 405 513 16 21 1 0 0 0 0
"Bundesliga" 2019 1 409 344 2519 2649 398 488 11 29 1 0 0 0 0
"EPL" 2014 0 560 415 4273 4349 619 745 26 45 0 1 0 0 0
"EPL" 2015 0 567 459 3822 4356 550 629 25 34 0 1 0 0 0
"EPL" 2016 0 607 457 4204 4430 663 717 21 20 0 1 0 0 0
"EPL" 2017 0 582 436 3883 3984 562 595 17 22 0 1 0 0 0
"EPL" 2018 0 596 476 3858 3916 580 640 18 29 0 1 0 0 0
"EPL" 2019 1 434 350 2960 3140 469 540 17 18 0 1 0 0 0
"LaLiga" 2014 0 584 425 5348 5461 952 1057 44 57 0 0 0 1 0
"LaLiga" 2015 0 615 428 5169 5166 948 1054 45 64 0 0 0 1 0
"LaLiga" 2016 0 578 449 4792 4661 807 909 33 50 0 0 0 1 0
"LaLiga" 2017 0 572 423 5108 5176 866 1001 40 29 0 0 0 1 0
"LaLiga" 2018 1 552 430 5152 5112 963 998 35 46 0 0 0 1 0
"LaLiga" 2019 1 379 270 3444 3529 643 698 25 33 0 0 0 1 0
"Ligue1" 2014 0 536 411 5013 5449 592 683 33 45 0 0 0 0 1
"Ligue1" 2015 0 546 414 4925 5226 660 768 60 57 0 0 0 0 1
"Ligue1" 2016 0 588 406 4676 4813 581 670 27 71 0 0 0 0 1
"Ligue1" 2017 0 565 439 4785 4913 658 774 42 44 0 0 0 0 1
"Ligue1" 2018 1 536 405 4749 5038 616 745 43 56 0 0 0 0 1
"Ligue1" 2019 1 450 305 3802 3835 527 584 36 35 0 0 0 0 1
"SerieA" 2014 0 570 454 5634 5892 862 962 54 64 0 0 1 0 0
"SerieA" 2015 0 559 420 5748 5802 866 983 53 76 0 0 1 0 0
"SerieA" 2016 0 631 492 5312 5300 788 884 36 58 0 0 1 0 0
"SerieA" 2017 1 553 464 4842 4868 725 822 36 53 0 0 1 0 0
"SerieA" 2018 1 564 455 4881 5036 812 955 39 51 0 0 1 0 0
"SerieA" 2019 1 394 352 3527 3645 656 752 32 43 0 0 1 0 0
end
Am I just missing something obvious?
Thanks in advance for the help.