  • Diff-in-Diff with Waves?

    Hi Statalist Community!

    I have a question that has been puzzling me for weeks, and I decided to post here hoping that someone could kindly help me.

    I am working on a project that tries to understand whether popular protests in autocratic regimes are more or less effective in countries with higher levels of economic growth. I coded a variable “regime” identifying each autocratic regime, “demprotest”, a dummy for when a popular protest occurs, and “e_demotrans”, a dummy that takes the value 1 if there is a transition to democracy (data in the picture below). My independent variable would be “hec”, a dummy taking the value 1 if the country has high levels of economic growth.

    I wanted to apply a diff-in-diff analysis to this project with my dependent variable being the democratisation variable “e_demotrans”.
    My Treatment Group would be the “hec=1” countries, and my Control Group, the “hec=0” countries.

    However, problems arise when I try to define my time variable, because each regime might have multiple protests (see picture). I was wondering if it would be possible to have post-treatment “waves”. For example, in a regime with multiple protests where only the last one leads to democratisation, should I assume that countries have, say, 5 years to democratise after a protest happens? And should I define my time variable as a dummy that takes the value 1 for the five years after a protest (“t5” in picture), until (and if) a democratisation occurs?


    Thank you so much in advance for your help!
    Catarina


  • #2
    Cat, I would suggest defining a "cumulative stock of protests", which is the total number of years so far (including the current year) in which protests occurred. For example, given the data in the picture, the variable = 0 if year <= 1969, = 1 if year = 1970, = 2 if year >= 1971 & year <= 1982, etc. It seems reasonable that a country is more likely to democratize if the cumulative experience of protests is large enough. More importantly, this definition clearly marks the starting point of protests, which is convenient for DiD or event studies. For example, in DiD, you may define the time-dimension indicator to be 1 from the starting point onwards (and 0 before it). You may also take advantage of the cumulative number of years with protests to examine heterogeneous effects along the time dimension after the starting point. In any case, it is going to be a staggered DiD because starting points differ across countries.

    The other dimension of DiD, I think, should be a variable indicating whether a country has ever had a protest. In other words, the control group should be the countries that never had protests. As for the usefulness of protests in high-growth vs. low-growth countries, that is more of a heterogeneity examination built on the DiD framework (dimension 1: ever vs. never having protests; dimension 2: before vs. after the first protest), or a DDD analysis.
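
    As a rough sketch of the two dimensions described above, the variables could be built along these lines. It assumes one row per country-year and uses "country_name", "year" and "demprotest" as they appear in the thread; "stock", "first_year", "ever" and "post" are hypothetical names chosen only for illustration.

    ```stata
    * Sketch only: stock, first_year, ever and post are illustrative names.
    * Cumulative number of protest years up to and including the current year
    bysort country_name (year): gen stock = sum(demprotest)

    * Year of the first protest (missing for countries that never protest)
    bysort country_name: egen first_year = min(cond(demprotest == 1, year, .))

    * Dimension 1: ever vs. never having protests
    gen ever = !missing(first_year)

    * Dimension 2: before vs. after the first protest (0 for never-protest countries)
    gen post = ever & year >= first_year
    ```

    Because "first_year" differs across countries, "post" switches on at different dates, which is what makes the design staggered.
    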

    If all your countries have ever had protests, then you may (1) select certain periods of study such that some countries have never had protests during the period, or (2) use other methods instead of DiD. The first alternative method would be event studies, the second would be a survival analysis in a panel-data setting.
    Last edited by Fei Wang; 10 Nov 2021, 08:34.



    • #3
      Dear Fei Wang,
      Thank you SO much for this brilliant response. It was really helpful, and if I understood your advice correctly, you just solved all my problems.

      I created "t" the cumulative protest variable, as you suggest, and the idea is to use it as the time variable, right?
      Because I do have countries that never had a protest, I will use them as a control group as you also suggest, and create the variable "control"

      So, if I understood correctly, my "did" variable should be: did = t * control, is that correct?

      The part I did not understand so well is how I then test the effects in high/low economic growth countries. Do you think the best way would be a DDD with the didregress command? Something like the following, where "hec" is a dummy for countries with high economic growth?

      didregress (e_demotrans) (hec), group(control) time(t)

      Thank you so much for your precious help, and for your time! I really appreciate it!


      • #4
        Sorry for the follow-up question, but just out of curiosity, why wouldn’t the 5-year waves after a protest work with DiD? Thank you so much.





        • #5
          Cat, I think the definition of "t" is almost correct. For the Philippines, there were protests in 1986, so the value of "t" in 1986 should be 6 instead of 0. Essentially, you may generate "t" with the following code.

          Code:
          bys country_name (year): gen t = sum(demprotest)
          After defining t, you may define a "did" variable (=0 if a country had never had protests by some year, = 1 if a country had ever had protests by some year) with the following code.

          Code:
          gen did = t > 0 & t < .
          The "did" variable is exactly the key regressor for DiD analysis. You may wonder why this variable is not generated by multiplying a "treatment group dummy" by a "post dummy". If the first protest occurred in the same year for all countries (the traditional DiD situation), then you could generate the "did" variable by multiplication. But your case is a staggered DiD: the post period starts at different times across countries, so there is no way to generate the "did" variable with one single multiplication. You may start your DiD analysis as below.

          Code:
          xtset countryid year    // need to define a numeric countryid, as -xtset- doesn't allow for string panel id.
          xtreg y did covariates i.year, fe vce(cluster countryid)
          If you'd like to further explore how the effects of "did" on "y" vary by country economic growth, then you may define a dummy variable "high" (=1 for high-growth country, = 0 for low-growth country), and then interact it with "did", as below (essentially a DDD).

          Code:
          xtreg y c.did#c.high did high covariates i.year, fe vce(cluster countryid)
          You may further take advantage of "t" to examine the effect of "protesting density" on "y", simply by replacing "did" of the above regressions with "t".
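
          For concreteness, the steps above might look as follows with the variable names used earlier in the thread ("country_name", "year", "demprotest", "e_demotrans", "hec"). This is only a sketch: it assumes one row per country-year, creates "countryid" itself, omits other covariates, and writes the interaction in factor-variable notation, which is just another way of expressing the same DDD specification.

          ```stata
          * Sketch only. Skip the first two lines if t and did already exist.
          bysort country_name (year): gen t = sum(demprotest)
          gen did = t > 0 & t < .

          * -xtset- needs a numeric panel id; countryid is created here
          encode country_name, gen(countryid)
          xtset countryid year

          * DiD term, growth dummy and their interaction, with country and year effects
          xtreg e_demotrans i.did##i.hec i.year, fe vce(cluster countryid)

          * Effect of having had a protest in high-growth countries (hec == 1)
          lincom 1.did + 1.did#1.hec
          ```

          If "hec" does not vary over time within a country, its main effect is absorbed by the country fixed effects and Stata will report it as omitted; the interaction term is still identified.
          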

          The procedure above was quite standard for staggered DiD until a couple of years ago, when some papers identified potential flaws in it. Wooldridge has a very nice working paper on this issue: https://papers.ssrn.com/sol3/papers....act_id=3906345.



          • #6
            Originally posted by Cat Santos


            Sorry for the follow up question but just out of curiosity, why wouldn’t the 5 year waves after a protest happens work with DD? Thank you so much


            Two reasons. First, a 5-year window is somewhat arbitrary, and we don't know in advance when a transition is going to happen. Second, I assume protests work in a cumulative way. If you only look at protests within the 5 years before democratization, then the effects of earlier protests would be ignored.
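
            For comparison, a window indicator like the "t5" in the original question could be built roughly as below. This is a sketch, assuming one row per country-year and that the window includes the protest year itself; "last_protest" is a hypothetical helper variable. The width of 5 is a free choice, which is exactly the arbitrariness mentioned above.

            ```stata
            * Sketch only: last_protest is an illustrative helper variable.
            * Year of the most recent protest, carried forward within each country
            bysort country_name (year): gen last_protest = year if demprotest == 1
            bysort country_name (year): replace last_protest = last_protest[_n-1] if missing(last_protest)

            * 1 in the protest year and the 5 years after it, 0 otherwise
            gen t5 = !missing(last_protest) & year - last_protest <= 5
            ```

            Changing the 5 to, say, 3 or 10 and re-estimating would show how sensitive the results are to the window, whereas the cumulative "t" needs no such choice.
            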



            • #7
              Dear Fei Wang,

              Thank you so much for this amazing explanation. Not only did this fix my paper, but it was a very valuable stats lesson. If I do publish this paper, I will certainly include you in the acknowledgements and I will let you know.

              I really appreciate the time you spent helping me!
              Have a lovely day!

              Catarina
