
  • Visualizing treatment effect over time from difference in difference

    I have a question that relates to difference in difference estimation and plots of treatment effects over time. I have created some toy data to illustrate and would just like to verify that this is the correct approach.

    To illustrate the effect of a treatment over time you would do something like:

    Code:
    xtreg y  i.treated##i.year x , fe
    margins year, dydx(treated)  noestimcheck
    marginsplot
    Whereas the standard set up would look something like

    Code:
    xtreg y x  i.treated##i.post i.year , fe
    Is that correct?

    Here’s what I see in many journals. Just would like to verify that above is correct.




    Lastly, how would I reconfigure the code above to create “years relative to the intervention” (which in this case is 2011)?


    Here’s the toy data:

    Code:
    clear
    set more off
     
    set seed 12345
    set obs 10
    gen int firm = _n
     
     
    expand 15
    bys firm: gen year = 2000 + _n
     
    gen y = runiform(10,50)
    gen x = runiform(1,20)
     
    gen int treated = (firm >= 8)
    gen int post = (year >= 2011)
    replace y = 50 + x if year >= 2011 & treated == 1
     
    xtset firm year
     
    * standard did
    xtreg y x  i.treated##i.post i.year , fe
     
     
    * effect over time
    xtreg y  i.treated##i.year x , fe
     
    * plot effect over time
    margins year, dydx(treated)  noestimcheck
    marginsplot , xline(2011) level(50) xlab(, angle(v))  xtitle("")
    Thanks,

    Justin

  • #2
    Is that correct?

    Well, that is a correct approach to generalized difference-in-differences estimation. But since in your data the intervention begins at the same time (2011) for all entities, there is no need to use generalized diff-in-diff; stick to the simpler "standard setup." It's easier to interpret.

    Lastly, how would I reconfigure the code above to create “years relative to the intervention” (which in this case is 2011)?
    Code:
    gen years_relative = year - 2011



    • #3
      Thanks for your help as always, Clyde. I have a couple of followup questions if you don't mind.

      Well, that is a correct approach to generalized difference-in-differences estimation. But since in your data the intervention begins at the same time (2011) for all entities, there is no need to use generalized diff-in-diff; stick to the simpler "standard setup." It's easier to interpret.
      How would I visualize an effect over time in the standard setup? I tend to see these types of charts in journals and would like to know how to set them up in a difference in difference framework.

      Lastly,
      Code:
       gen years_relative = year - 2011
      Does this mean
      Code:
       xtreg y x  i.treated##i.post i.year , fe
      becomes
      Code:
       xtreg y x  i.treated##i.years_relative i.year , fe
      or
      Code:
       xtreg y x i.treated##i.years_relative i.years_relative , fe



      • #4
        How would I visualize an effect over time in the standard setup?

        The same -margins- and -marginsplot- commands you showed will work in the standard setup as well.

        Does this mean
        Code:

        xtreg y x i.treated##i.post i.year , fe
        becomes
        Code:

        xtreg y x i.treated##i.years_relative i.year , fe
        or
        Code:

        xtreg y x i.treated##i.years_relative i.years_relative , fe
        Either one would be fine. The results will be the same, except for the constant term. Possibly, just for ease of understanding, the second approach is preferable.
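
        One practical caveat, offered as a sketch rather than as part of the reply above: Stata's factor-variable notation accepts only nonnegative integers, so i.years_relative is rejected while the variable takes negative values (here -10 to 4). Assuming the toy data from #1 are in memory and already xtset, one workaround is to shift the variable so that it starts at zero and attach value labels that record the original relative years. The names rel_shift and rel_lbl are made up for illustration, and using relative year -1 as the base level is an assumption, not something the thread prescribes.
        Code:
        gen years_relative = year - 2011

        * store the range before any other command overwrites r()
        summarize years_relative, meanonly
        local lo = r(min)
        local hi = r(max)

        * shift so the minimum is 0; factor variables cannot be negative
        gen int rel_shift = years_relative - `lo'

        * label each shifted value with its original relative year
        forvalues k = `lo'/`hi' {
            label define rel_lbl `= `k' - `lo'' "`k'", add
        }
        label values rel_shift rel_lbl

        * assumed choice: the year before the intervention is the base level
        local base = -1 - `lo'
        xtreg y x i.treated##ib`base'.rel_shift, fe

        * treated-vs-control difference at each relative year
        margins rel_shift, dydx(treated) noestimcheck
        marginsplot, xline(`= 0 - `lo'') xlab(, angle(v)) xtitle("Years relative to intervention")
        With the toy data, rel_shift runs from 0 to 14, the base level is 9, and the reference line sits at 10, which corresponds to 2011.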



        • #5
          Thanks, Clyde.

          I really hope you'll one day consider writing a practitioner's guide to difference in difference studies using Stata!



          • #6
            Hi Clyde, I came across your reply to Justin's post.

            I am trying to produce the same plot, and I have used xtdidregress.

            After running xtdidregress, I used estat grangerplot to plot the time-specific ATETs.

            I am getting an error message that states "treatment assignment times vary; not allowed with estat grangerplot".

            My data are a panel data set similar to the artificial example on the xtdidregress Stata manual page (use https://www.stata-press.com/data/r17/hospdd, "Artificial hospital admission procedure data").

            My question is:

            1. How do I first inspect where the time-varying assignments exist in my data set? I have a quarterly data set where I assign the beginning of the treatment at 2019q1, so I don't understand how the treatment assignment varies in this case.

            2. Even with the time-varying treatment assignments, is there a way to produce the plot Justin originally requested or a similar one to the grangerplot?

            Help is much appreciated!

            Thanks.




            • #7
              I'm sorry, but I am not familiar with the -xtdidregress- and -estat grangerplot- commands, so I cannot advise you on their use.

              Concerning your first question, assuming that your panel identifier is called panelid, your time variable is called time, and your pre vs post intervention indicator is called pre_post (with 0 = pre and 1 = post) you can do this:
              Code:
              tabstat time if pre_post == 1, by(panelid) statistics(min)
              and you will see the starting time for intervention in each panel.
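
              A minimal sketch of one way to take this a step further and list only the panels whose first post-intervention period differs from the intended start. It uses the same assumed names (panelid, time, pre_post) and additionally assumes that time is a quarterly (%tq) variable and that 2019q1 is the intended start, as in #6. The variables first_post and one_per_panel are hypothetical.
              Code:
              * first period in which each panel is flagged as post-intervention
              by panelid, sort: egen first_post = min(cond(pre_post == 1, time, .))
              format first_post %tq

              * tag one observation per panel so each panel is counted once
              egen byte one_per_panel = tag(panelid)

              * distribution of start times across panels
              tabulate first_post if one_per_panel, missing

              * panels that do not start in 2019q1; panels with no post-period
              * rows have a missing first_post and are listed as well
              list panelid first_post if one_per_panel & first_post != tq(2019q1)
              If the tabulation shows more than one start time, that would be consistent with the "treatment assignment times vary" message.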





              • #8
                Hi Clyde, this is extremely helpful.

                Let me dig into this more.

                Thanks a lot!



                • #9
                  Hi Clyde,

                  I inspected my data to see which panelid values have such cases.

                  The basic issue is that some panels have no observations in the quarter when treatment is assigned, so their post-treatment data start some quarters later.

                  I would like to drop any panelid whose data start later than the treatment assignment date.

                  For example, consider this example: the treatment begins at 2020 Quarter 1.
                  hospital year qtr yrqtr pre_post
                  1 2019 4 2019q4 0
                  1 2020 1 2020q1 1
                  1 2020 2 2020q2 1
                  2 2021 1 2021q1 1
                  2 2021 2 2021q2 1
                  Here, the panelid is hospital, and hospital 2's data starts at 2021 Quarter 1, so I would like to drop hospital 2 altogether from the data.

                  What might be the best way to do this?

                  I thought about something like:

                  Code:
                  drop if pre_post == 1 & yrqtr != tq(2020q1)
                  But, then I realized it would also drop hospital 1's 2020Q2 data point.

                  Any advice would be much appreciated!

                  Thanks.



                  • #10
                    I would like to drop any panelid whose data start later than the treatment assignment date.
                    Code:
                    * Example generated by -dataex-. For more info, type help dataex
                    clear
                    input byte hospital int year byte(qtr pre_post) float yrqtr
                    1 2019 4 0 239
                    1 2020 1 1 240
                    1 2020 2 1 241
                    2 2021 1 1 244
                    2 2021 2 1 245
                    end
                    format %tq yrqtr
                    
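                    * earliest post-intervention quarter within each hospital;
                    * cond() yields missing for pre-period rows, which min() ignores
                    by hospital (yrqtr), sort: egen earliest_post_quarter = ///
                        min(cond(pre_post, yrqtr, .))

                    * drop hospitals whose first post-period quarter is after 2020q1
                    * (a hospital with no post-period rows has a missing value and is dropped too)
                    drop if earliest_post_quarter > tq(2020q1)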
                    In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



                    • #11
                      Hi Clyde,

                      Thanks for the comments on the -dataex- command. I just got started on this forum, and this convention seems very helpful.

                      Just a quick question on your conditional statement. You have pre_post as the first argument, but what does it mean for this to be true?

                      I looked at the help for the cond() function in Stata, but since there is no explicit Boolean comparison, I am not sure what your command would return.

                      Any clarification would be appreciated.

                      Once again, thanks!



                      • #12
                        In Stata, when a variable or numeric expression is used in a Boolean context, 0 is interpreted as false, and anything other than zero (including missing value) is interpreted as true. So in the -cond()- function, when pre_post is zero, it will return ., and when pre_post is anything else (which, for this variable is just 1), it will return yrqtr.
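
                        A short sketch of that rule, which can be pasted into the Command window as is. The strings are only illustrative; the last line shows an explicit comparison, which is one assumed way to keep a missing pre_post from being treated as true.
                        Code:
                        * zero is false, so the third argument is returned
                        display cond(0, "true branch", "false branch")

                        * any nonzero value is true, so the second argument is returned
                        display cond(1, "true branch", "false branch")
                        display cond(-3, "true branch", "false branch")

                        * missing is also nonzero, so with three arguments it counts as true
                        display cond(., "true branch", "false branch")

                        * an optional fourth argument is returned when the condition is missing
                        display cond(., "true branch", "false branch", "missing branch")

                        * an explicit comparison is false for missing values
                        display cond(. == 1, "true branch", "false branch")
                        In the -egen- call above, writing cond(pre_post == 1, yrqtr, .) would therefore skip observations where pre_post is missing, whereas cond(pre_post, yrqtr, .) includes them.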
