
  • Running a DID regression with multiple treatment groups

    Dear Stata community,

    I am trying to run a difference-in-differences (DID) regression to analyze whether VAR (video assistant referee) has affected the number of goals, fouls, and cards in the top five European soccer leagues (Germany-Bundesliga, England-EPL, Spain-LaLiga, France-Ligue1, and Italy-SerieA). My data run from the 2014/15 season to the 2019/20 season for each league. I collapsed the data to sum the goals, fouls, etc. for each season and league.
    Here is what my data looks like:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
     div       season VAR fthg ftag hf  af   hy  ay hr ar Bundesliga_dummy EPL_dummy SerieA_dummy LaLiga_dummy Ligue1_dummy
    "Bundesliga" 2014 0 486 357 4547 4797 514  595 29 24 1 0 0 0 0
    "Bundesliga" 2015 0 440 335 3812 3970 471  545 14 21 1 0 0 0 0
    "Bundesliga" 2016 0 451 335 3754 4025 461  556 26 21 1 0 0 0 0
    "Bundesliga" 2017 1 490 365 3956 4221 477  558 22 21 1 0 0 0 0
    "Bundesliga" 2018 1 464 380 2915 3009 405  513 16 21 1 0 0 0 0
    "Bundesliga" 2019 1 409 344 2519 2649 398  488 11 29 1 0 0 0 0
    "EPL"        2014 0 560 415 4273 4349 619  745 26 45 0 1 0 0 0
    "EPL"        2015 0 567 459 3822 4356 550  629 25 34 0 1 0 0 0
    "EPL"        2016 0 607 457 4204 4430 663  717 21 20 0 1 0 0 0
    "EPL"        2017 0 582 436 3883 3984 562  595 17 22 0 1 0 0 0
    "EPL"        2018 0 596 476 3858 3916 580  640 18 29 0 1 0 0 0
    "EPL"        2019 1 434 350 2960 3140 469  540 17 18 0 1 0 0 0
    "LaLiga"     2014 0 584 425 5348 5461 952 1057 44 57 0 0 0 1 0
    "LaLiga"     2015 0 615 428 5169 5166 948 1054 45 64 0 0 0 1 0
    "LaLiga"     2016 0 578 449 4792 4661 807  909 33 50 0 0 0 1 0
    "LaLiga"     2017 0 572 423 5108 5176 866 1001 40 29 0 0 0 1 0
    "LaLiga"     2018 1 552 430 5152 5112 963  998 35 46 0 0 0 1 0
    "LaLiga"     2019 1 379 270 3444 3529 643  698 25 33 0 0 0 1 0
    "Ligue1"     2014 0 536 411 5013 5449 592  683 33 45 0 0 0 0 1
    "Ligue1"     2015 0 546 414 4925 5226 660  768 60 57 0 0 0 0 1
    "Ligue1"     2016 0 588 406 4676 4813 581  670 27 71 0 0 0 0 1
    "Ligue1"     2017 0 565 439 4785 4913 658  774 42 44 0 0 0 0 1
    "Ligue1"     2018 1 536 405 4749 5038 616  745 43 56 0 0 0 0 1
    "Ligue1"     2019 1 450 305 3802 3835 527  584 36 35 0 0 0 0 1
    "SerieA"     2014 0 570 454 5634 5892 862  962 54 64 0 0 1 0 0
    "SerieA"     2015 0 559 420 5748 5802 866  983 53 76 0 0 1 0 0
    "SerieA"     2016 0 631 492 5312 5300 788  884 36 58 0 0 1 0 0
    "SerieA"     2017 1 553 464 4842 4868 725  822 36 53 0 0 1 0 0
    "SerieA"     2018 1 564 455 4881 5036 812  955 39 51 0 0 1 0 0
    "SerieA"     2019 1 394 352 3527 3645 656  752 32 43 0 0 1 0 0
    end
    Sorry for the messy data. fthg is home goals, ftag is away goals, hf is home fouls, af is away fouls, and so on. Bundesliga_dummy and the others are dummy variables equal to 1 for the corresponding league. VAR is a dummy equal to 1 from the season in which the league started using VAR (2017 for Bundesliga and SerieA, 2018 for LaLiga and Ligue1, and 2019 for EPL).

    My main question is how do I go about creating a DID regression with multiple treatments and different controls. Should I split my data between different leagues and try running regressions that way?

    I know I can't use
    Code:
    xtset
    because the dates repeat.

    I think my best option is just run a regular regression
    Code:
    reg
    with an interaction term between the post period and treatment group, but I'm confused about how to include my post period when I have multiple different periods of implementation of VAR across the leagues. I'm also confused about how to input treatment and control groups when I have multiple groups that are categorized as the treatment and control groups.

    or should I be using the
    Code:
    xtreg
    regression function? But from my understanding, you usually use xtset along with that, which I seem not to be able to.

    Am I just missing something obvious?

    Thanks in advance for the help.
    Last edited by Irving Oconnor; 29 Mar 2020, 17:28.

  • #2
    Sorry for the messy data.
    The only thing wrong with your data is that somehow you mangled the copy/paste and part of it (-input str 12-) got cut off. But that's the way -dataex- output is supposed to look. It is not intended to be read by human eyes. Its purpose is to quickly and easily replicate your Stata dataset in Stata so that code can be crafted and tested. So no apologies needed.

    My main question is how do I go about creating a DID regression with multiple treatments and different controls.
    You can't, because you do not have any data on any league that never adopted VAR. There is no control group. What you can do is a pre-post study. (Actually, what you really have here is a non-randomized version of a stepped wedge design. It has some of the virtue of a DID design in that it incorporates both a pre-post contrast within leagues and a using/not-using VAR contrast across leagues at different time points.) That's a weaker design than DID, but it's all you can do with this data design.
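
One way to see the missing control group directly is sketched below, using only the div, season, and VAR columns from the -dataex- listing in the first post. The variables ever_var and first_var are hypothetical helpers introduced for this illustration; they are not part of the original data.

```stata
* div, season, and VAR copied from the thread's -dataex- listing
clear
input str12 div float(season VAR)
"Bundesliga" 2014 0
"Bundesliga" 2015 0
"Bundesliga" 2016 0
"Bundesliga" 2017 1
"Bundesliga" 2018 1
"Bundesliga" 2019 1
"EPL"        2014 0
"EPL"        2015 0
"EPL"        2016 0
"EPL"        2017 0
"EPL"        2018 0
"EPL"        2019 1
"LaLiga"     2014 0
"LaLiga"     2015 0
"LaLiga"     2016 0
"LaLiga"     2017 0
"LaLiga"     2018 1
"LaLiga"     2019 1
"Ligue1"     2014 0
"Ligue1"     2015 0
"Ligue1"     2016 0
"Ligue1"     2017 0
"Ligue1"     2018 1
"Ligue1"     2019 1
"SerieA"     2014 0
"SerieA"     2015 0
"SerieA"     2016 0
"SerieA"     2017 1
"SerieA"     2018 1
"SerieA"     2019 1
end

* ever_var (hypothetical helper): 1 if the league used VAR in any season
egen ever_var = max(VAR), by(div)

* first_var (hypothetical helper): first season with VAR in each league
egen first_var = min(cond(VAR == 1, season, .)), by(div)

* staggered adoption: leagues by first VAR season
tabulate div first_var

* every league is eventually treated, so no never-treated group exists
assert ever_var == 1
```

The tabulation shows the staggered adoption that makes this resemble a stepped wedge design rather than a DID with a never-treated comparison group.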

    Should I split my data between different leagues and try running regressions that way?
    No. With only 6 observations for each league, you have almost no chance of learning anything useful if you analyze them separately. Only by keeping them together and, in effect, pooling their information, do you stand a reasonable chance of seeing meaningful results.

    I know I can't use

    Code:
    xtset
    because the dates repeat.
    Not true. Overall the dates repeat, but within league they do not. But you need a numeric variable to indicate the league in order to use -xtset-. See below.
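
As a minimal sketch of that point, the block below uses a two-league subset of the posted data (chosen here only to keep the example short) to check that each league-season pair occurs once and then declares the panel with a numeric league identifier.

```stata
* Two-league subset of the thread's data, for illustration only
clear
input str12 div float(season VAR fthg)
"Bundesliga" 2014 0 486
"Bundesliga" 2015 0 440
"Bundesliga" 2016 0 451
"Bundesliga" 2017 1 490
"Bundesliga" 2018 1 464
"Bundesliga" 2019 1 409
"EPL"        2014 0 560
"EPL"        2015 0 567
"EPL"        2016 0 607
"EPL"        2017 0 582
"EPL"        2018 0 596
"EPL"        2019 1 434
end

* Seasons repeat across leagues, but each league-season pair is unique;
* -isid- exits with an error if that is not the case
isid div season

* -xtset- needs a numeric panel variable, so encode the string league name
encode div, gen(division)
xtset division season
```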

    I think my best option is just run a regular regression

    Code:
    reg
    with an interaction term between the post period and treatment group, but I'm confused about how to include my post period when I have multiple different periods of implementation of VAR across the leagues. I'm also confused about how to input treatment and control groups when I have multiple groups that are categorized as the treatment and control groups.
    Well, since your data are not appropriate for a DID analysis, it actually speaks well of you that you are confused trying to fit your square peg into the round hole!

    Here's how I would handle the data. First, get rid of the league dummies: they are of no use, they are just cluttering up your data set. Next, create a numeric variable indicating the leagues. Then use fixed-effects regression:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str12 div float(season VAR fthg ftag hf af hy ay hr ar)
    "Bundesliga" 2014 0 486 357 4547 4797 514  595 29 24
    "Bundesliga" 2015 0 440 335 3812 3970 471  545 14 21
    "Bundesliga" 2016 0 451 335 3754 4025 461  556 26 21
    "Bundesliga" 2017 1 490 365 3956 4221 477  558 22 21
    "Bundesliga" 2018 1 464 380 2915 3009 405  513 16 21
    "Bundesliga" 2019 1 409 344 2519 2649 398  488 11 29
    "EPL"        2014 0 560 415 4273 4349 619  745 26 45
    "EPL"        2015 0 567 459 3822 4356 550  629 25 34
    "EPL"        2016 0 607 457 4204 4430 663  717 21 20
    "EPL"        2017 0 582 436 3883 3984 562  595 17 22
    "EPL"        2018 0 596 476 3858 3916 580  640 18 29
    "EPL"        2019 1 434 350 2960 3140 469  540 17 18
    "LaLiga"     2014 0 584 425 5348 5461 952 1057 44 57
    "LaLiga"     2015 0 615 428 5169 5166 948 1054 45 64
    "LaLiga"     2016 0 578 449 4792 4661 807  909 33 50
    "LaLiga"     2017 0 572 423 5108 5176 866 1001 40 29
    "LaLiga"     2018 1 552 430 5152 5112 963  998 35 46
    "LaLiga"     2019 1 379 270 3444 3529 643  698 25 33
    "Ligue1"     2014 0 536 411 5013 5449 592  683 33 45
    "Ligue1"     2015 0 546 414 4925 5226 660  768 60 57
    "Ligue1"     2016 0 588 406 4676 4813 581  670 27 71
    "Ligue1"     2017 0 565 439 4785 4913 658  774 42 44
    "Ligue1"     2018 1 536 405 4749 5038 616  745 43 56
    "Ligue1"     2019 1 450 305 3802 3835 527  584 36 35
    "SerieA"     2014 0 570 454 5634 5892 862  962 54 64
    "SerieA"     2015 0 559 420 5748 5802 866  983 53 76
    "SerieA"     2016 0 631 492 5312 5300 788  884 36 58
    "SerieA"     2017 1 553 464 4842 4868 725  822 36 53
    "SerieA"     2018 1 564 455 4881 5036 812  955 39 51
    "SerieA"     2019 1 394 352 3527 3645 656  752 32 43
    end
    
    encode div, gen(division)
    xtset division season
    
    local outcomes fthg-ar
    foreach o of local outcomes {
     xtreg `o' i.VAR i.season, fe
    }
    If you are not familiar with factor-variable notation, read -help fvvarlist- to learn what the i. notation does.
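
To make clearer what the fixed-effects model estimates, here is a hedged sketch on a two-league subset of the posted data (the subset and the scalar names b_fe and b_lsdv are choices made for this illustration). It compares -xtreg, fe- with an ordinary regression that includes league indicators explicitly; the VAR coefficient should agree between the two.

```stata
* Two-league subset of the thread's data, for illustration only
clear
input str12 div float(season VAR fthg)
"Bundesliga" 2014 0 486
"Bundesliga" 2015 0 440
"Bundesliga" 2016 0 451
"Bundesliga" 2017 1 490
"Bundesliga" 2018 1 464
"Bundesliga" 2019 1 409
"EPL"        2014 0 560
"EPL"        2015 0 567
"EPL"        2016 0 607
"EPL"        2017 0 582
"EPL"        2018 0 596
"EPL"        2019 1 434
end

encode div, gen(division)
xtset division season

* Fixed-effects (within) estimator: league effects are absorbed
xtreg fthg i.VAR i.season, fe
scalar b_fe = _b[1.VAR]

* Same model with explicit league indicators
regress fthg i.VAR i.season i.division
scalar b_lsdv = _b[1.VAR]

* The two VAR coefficients should match up to rounding
display "xtreg, fe: " b_fe "   regress with i.division: " b_lsdv
```

The i.season terms absorb shocks common to all leagues in a given season, so the VAR coefficient is identified from leagues that differ in VAR status within the same season.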
    Last edited by Clyde Schechter; 29 Mar 2020, 18:49.



    • #3
      Thank you so much for this reply! This makes a lot more sense than trying to brute force a "square peg into a round hole"!

      You are a lifesaver! Sorry for all the exclamation points, just very glad I found some help. I was stuck on this conundrum for days upon end.

      Thanks again.



      • #4
        Hi Clyde,

        When I run the code with the local macro and the -foreach- loop, Stata does not loop over each variable (from fthg to ar) as the DV. Only fthg appears as the DV in the fixed-effects regression output, and the other variables in the local are entered as IVs alongside VAR.

        From my understanding, the local and the -foreach- loop should have told Stata to use each variable in the local as the DV in its own regression. However, as stated above, the loop produces only one regression, with fthg as the DV and all the other variables in the local as IVs.

        Thanks again.



        • #5
          Sorry, my mistake. The -local- command is wrong. It should be:
          Code:
          unab outcomes: fthg-ar
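
For context, a sketch of why the fix works, run on a two-league, three-outcome subset of the posted data (the subset is an assumption made to keep the example short). With -local outcomes fthg-hf-, the macro holds the literal text "fthg-hf" as a single token, so the loop runs once and Stata reads the range as one varlist with fthg as the DV. -unab- expands the range into separate variable names first.

```stata
* Two leagues and three outcomes from the thread's data, for illustration only
clear
input str12 div float(season VAR fthg ftag hf)
"Bundesliga" 2014 0 486 357 4547
"Bundesliga" 2015 0 440 335 3812
"Bundesliga" 2016 0 451 335 3754
"Bundesliga" 2017 1 490 365 3956
"Bundesliga" 2018 1 464 380 2915
"Bundesliga" 2019 1 409 344 2519
"EPL"        2014 0 560 415 4273
"EPL"        2015 0 567 459 3822
"EPL"        2016 0 607 457 4204
"EPL"        2017 0 582 436 3883
"EPL"        2018 0 596 476 3858
"EPL"        2019 1 434 350 2960
end

encode div, gen(division)
xtset division season

* -unab- expands the range fthg-hf into the individual variable names
unab outcomes: fthg-hf
display "`outcomes'"

* One fixed-effects regression per outcome
foreach o of local outcomes {
    xtreg `o' i.VAR i.season, fe
}

* Alternative: let -foreach- expand the varlist itself
foreach o of varlist fthg-hf {
    xtreg `o' i.VAR i.season, fe
}
```

On the full dataset, replacing fthg-hf with fthg-ar gives one regression for each of the eight outcomes.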



          • #6
            Ah, great, did the trick. Thanks again!

