
  • Difference-in-difference (DID) analysis in Stata

    Hello Stata experts,

    I am quite new to Stata. Luckily I found this forum, because I have a specific question that I hope you can help me with. I would like to use the DID approach to estimate the impact of an intervention on another variable [retention], and I'm not sure how to proceed. I am comparing two groups of individuals: one that received the intervention and one that did not. I have only three quarterly time points, two before the intervention and one after. I am trying to follow this tutorial from Princeton, but I have hit a block with the very first command. I am trying to create a dummy variable to indicate the time when the treatment started using the command [generate time = (quarter>=2017Q4)], but this returns the error [2017Q4 invalid name]. I entered the data manually and formatted "quarter" as a numeric variable. I'm not sure what I'm doing wrong here.


  • #2
    Hi Maggy,

    From my understanding of DiD, you need to proceed with the following steps:

    * define treatment and control-group
    gen DummyTreatment = (Group == 1)    // creates a dummy with 1 for Group 1 and 0 for all others
    * define the event (-> when does the post-treatment start? )
    gen DummyPost = (quarter>=4)
    * create the DiD-term (-> interaction)
    gen DiD = DummyTreatment*DummyPost
    * regression
    reg Y DummyTreatment DummyPost DiD Controls
    In your case "2017Q4" can't be numeric. How should Stata know that "2017Q4" is larger than "2017Q3"? I would simply create a column that extracts the quarter and thus contains only 1, 2, 3, 4. Then it should work.
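
    The same steps can also be sketched with Stata's factor-variable notation, which builds the interaction term for you. This is only a minimal sketch: the toy data and the names id, treated, period, post and retention are made up for illustration, so substitute your own variables.

    ```stata
    * Toy data: 4 individuals x 3 periods (hypothetical values, for illustration only)
    clear
    input id treated period retention
    1 1 1 0.60
    1 1 2 0.62
    1 1 3 0.80
    2 1 1 0.58
    2 1 2 0.60
    2 1 3 0.78
    3 0 1 0.55
    3 0 2 0.57
    3 0 3 0.60
    4 0 1 0.53
    4 0 2 0.55
    4 0 3 0.58
    end

    * post = 1 in the single post-intervention period, 0 before
    generate post = (period >= 3)

    * i.treated##i.post expands to both main effects plus their interaction
    regress retention i.treated##i.post

    * The coefficient on the interaction is the DiD estimate
    display _b[1.treated#1.post]
    ```

    In this toy data the treated group's mean goes from 0.60 to 0.79 and the control group's from 0.55 to 0.59, so the interaction coefficient should be close to (0.79 - 0.60) - (0.59 - 0.55) = 0.15. With repeated observations per individual, it is common to cluster the standard errors as well, for example by adding the option vce(cluster id).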


    • #3
      Thanks for your reply.

      I converted 2017Q4 from a string variable to a numeric variable. By default, Stata displays this as a value. The mistake I was making was to format the value to display as quarter [2017Q4]. I started again, this time using Stata's default formats, and I have been able to proceed through all the steps you've outlined. When I input the code
      "reg Y DummyTreatment DummyPost DiD Controls", Stata returns the error "variable y not found". Any help on this?


      • #4
        Welcome to Stata.

        It seems possible that you lack a firm understanding of how Stata treats dates and times.

        Stata's "date and time" variables are complicated and there is a lot to learn. If you have not already read the very detailed Chapter 24 (Working with dates and times) of the Stata User's Guide PDF, do so now. If you have, it's time for a refresher. After that, the help datetime documentation will usually be enough to point the way. You can't remember everything; even the most experienced users end up referring to the help datetime documentation or back to the manual for details. But at least you will get a good understanding of the basics and the underlying principles. An investment of time that will be amply repaid.

        All Stata manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu.

        With that said, if your variable quarter is indeed stored as a SIF (Stata Internal Format) quarterly variable, then - as described in the section of help datetime titled "Conveniently typing SIF values" - you will need to compare it to tq(2017Q4) - the tq() function will create the SIF value that corresponds to the date 2017Q4.

        But if this is at all unclear to you, please save yourself a lot of trouble and do the recommended reading before you proceed further.

        Added in edit: this post crossed with post #3. The mistake was not in using Stata's default formats, the mistake was in not understanding how Stata stores and displays dates and times, and the difference between how they are stored and how they are displayed. You should definitely do the recommended reading. Stata does not require you to know that 231 is the SIF representation of 2017Q4, and your work will be much more comprehensible and less prone to error if you deal with dates and times in the way Stata is designed to.
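
        To make the difference between storage and display concrete, here is a minimal sketch. It assumes the quarters were typed in as strings; the variable names qstr, quarter and time are only for illustration.

        ```stata
        * Hypothetical data: quarters entered as strings
        clear
        input str6 qstr
        "2017Q2"
        "2017Q3"
        "2017Q4"
        end

        * quarterly() converts the string to a SIF quarterly value (quarters since 1960q1)
        generate quarter = quarterly(qstr, "YQ")

        * The format changes only how the number is displayed, not what is stored
        format quarter %tq

        * tq() builds the SIF value for a typed quarter, so there is no need to remember 231
        display tq(2017q4)          // 231
        generate time = (quarter >= tq(2017q4))

        list qstr quarter time
        ```

        The comparison works whether or not the %tq format is applied, because the format affects only the display and the stored value stays numeric.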
        Last edited by William Lisowski; 16 Sep 2018, 08:16.


        • #5
          I just put "Y" as a placeholder for the dependent variable. Of course you need to adjust the code according to your dataset. If your dependent variable is called "retention", replace Y with that. The same applies to what I have specified as "Controls".
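
          If you are unsure what your variables are called, a quick check before running the regression can help. This is only a sketch: the one-observation dataset below is a stand-in so that the commands run on their own; with your data loaded, skip the first three lines.

          ```stata
          * Stand-in dataset (hypothetical) so the commands below run on their own
          clear
          set obs 1
          generate retention = .

          describe                 // lists every variable currently in memory
          lookfor retention        // searches variable names and labels for "retention"

          * Variable names are case-sensitive: Y and y are different names
          capture confirm variable Y
          if _rc {
              display as error "no variable named Y; check spelling and capitalisation"
          }
          ```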

          Some further hints for you, since it seems that you are really very new to Stata:
          • Type "help regress" into the Stata Command window. This will display the documentation of the regress command. Usually that is very helpful and gives you a first grasp of how the command works.
          • There are plenty of helpful introductions to Stata. Just use Google. There are also helpful explanations on YouTube to get started.
          • I found these slides very helpful regarding DiD ->
          Hope I could help.


          • #6
            Building on the advice in post #5, Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it.

            When I began using Stata in a serious way, I started - as others here did - by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through Stata's Help menu. The objective in doing this was not so much to master Stata as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and manual.

            The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.


            • #7
              Hi again Max, I just needed to replace it with retention as you suggested. I've been able to complete the section. A million thanks.


              • #8
                Hello William,

                Thanks for your input. I am completely new to Stata. The focus of my work is the DID approach, and it's a requirement that I use Stata. The learning curve is steep; I'll go through the recommended reading to learn more.