  • DiD with and w/o ID FE (Multiple Time Points)

    I want to estimate a DiD with several time points, where treatment does not start at the same time for all treated units. Treatment is binary. Angrist and Pischke (2009) in MHE p. 233 or this post discuss the differences between DiD with 2 time points and DiD with multiple time points. The proposed estimation is:

    Y_it = a + b Treat_it + d D_it + t Year + e     (1)

    Estimating this in Stata:

    Code:
    webuse nlswork, clear
    
    gen treated = (idcode <= 2500)
    
    gen post = 0 if treated == 1 // treatment not appearing at the same time for everyone
    replace post = 1 if idcode <= 500 & year >= 73 & treated == 1
    replace post = 1 if idcode > 500 & idcode <= 1000 & year >= 71 & treated == 1
    replace post = 1 if idcode > 1000 & idcode <= 2000 & year >= 80 & treated == 1
    replace post = 1 if idcode >= 2000 & idcode <= 2500 & year >= 79 & treated == 1
    
    
    reg ln_wage treated did i.year, cl(idcode)
    My question is: why/why not include individual fixed effects as well? Why not estimate:

    Y_it = a + b Treat_it + d D_it + t Year + x ID + e     (2)

    which would be an xtreg estimation:

    Code:
    xtset idcode
    xtreg ln_wage treated did i.year, fe cl(idcode)
    Clearly the two estimations produce different results because one includes individual FE and the other does not. What is the motivation to choose one estimation over the other GIVEN the DiD framework?
    Last edited by Raluca Brown; 20 Sep 2018, 06:28.

  • #2
    The code you show in the first code block of #1 not only does not make sense, it doesn't even run: the -reg- command includes a variable did which is never defined and does not exist. Moreover the variable post that is created there is never used. And it shouldn't be used because if you did, it would result in the exclusion of all of the untreated observations from the regression. I don't know where the code you show comes from, but it is just wrong on multiple levels.

    In any case, with treatment beginning at different times for different entities, you cannot do a classic DID analysis. You may, instead, use generalized DID. See https://www3.nd.edu/~rwilliam/stats/Margins01.pdf for a full explanation and some examples. You will see there that your instinct is right: the fixed effect for the id is, indeed, a key ingredient.

    My approach to something like what you show in the first code block of #1 would be:

    Code:
    webuse nlswork, clear
    
    gen treated = (idcode <= 2500)
    
    gen now_being_treated = 0
    replace now_being_treated = 1 if idcode <= 500 & year >= 73 & treated == 1
    replace now_being_treated = 1 if idcode > 500 & idcode <= 1000 & year >= 71 & treated == 1
    replace now_being_treated = 1 if idcode > 1000 & idcode <= 2000 & year >= 80 & treated == 1
    replace now_being_treated = 1 if idcode >= 2000 & idcode <= 2500 & year >= 79 & treated == 1
    
    
    xtset idcode year
    xtreg ln_wage now_being_treated i.year, fe vce(cluster idcode)
    By the way, a brief citation like Angrist and Pischke (2009) in MHE p. 233 is not helpful. What is MHE? Maybe everybody in your circle knows, but this is a multi-disciplinary international forum. If you want to provide references, they should be completely spelled out so that anyone can find them (or, better still, if possible, provide a link.)




    • #3
      Thanks for the comments, Clyde. There was indeed an omission: this line of code did not appear
      Code:
      gen did= treated*post
      as per the explanation in the model described above and the link attached.

      I was sure anyone commenting on a causal inference post would know the Mostly Harmless Econometrics book by Angrist and Pischke (2009) as referenced.

      Whilst apologetic for the editorial flaws you singled out so professionally above:
      - I am still not sure from your answer what the motivation is for including individual fixed effects.
      - I am still pretty sure that equation (1) above represents a generalized difference-in-differences regression which allows for different timings of the treatment for different treated units; this means I am not sure why your suggested answer (generating "now_being_treated") is not the same as the "did" variable created.



      • #4
        Well, for one thing, the post variable in #1 has missing values for all of the non-treated id's, which means that the did variable does also. And that means that the -reg- command there would exclude all of the non-treated id's from the estimation.
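
        A minimal sketch of that check, assuming the code from #1 with the gen did line from #3 added (and vce(cluster idcode) written out in place of cl(idcode)). It only counts the missing values and tabulates the estimation sample; it is not a corrected model.

        ```stata
        webuse nlswork, clear
        gen treated = (idcode <= 2500)

        * post as defined in #1: assigned only when treated == 1, so it stays missing otherwise
        gen post = 0 if treated == 1
        replace post = 1 if idcode <= 500 & year >= 73 & treated == 1
        replace post = 1 if idcode > 500 & idcode <= 1000 & year >= 71 & treated == 1
        replace post = 1 if idcode > 1000 & idcode <= 2000 & year >= 80 & treated == 1
        replace post = 1 if idcode >= 2000 & idcode <= 2500 & year >= 79 & treated == 1

        * line supplied in #3; 0 times missing is missing in Stata
        gen did = treated*post

        * the two counts agree: did is missing for every non-treated observation
        count if treated == 0
        count if treated == 0 & missing(did)

        * the regression from #1 therefore uses treated ids only
        reg ln_wage treated did i.year, vce(cluster idcode)
        tab treated if e(sample)
        ```

        Replacing the missing values of post with 0 for the non-treated ids would keep them in the sample, and did would then coincide with the now_being_treated variable from #2.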

        Now, my editorial gaffe. The link I included in #2 was to an excellent introduction to the -margins- command, instead of to the slides on generalized DID. I meant to show this link: https://www.ipr.northwestern.edu/wor.../Day%204.2.pdf, which has a complete explanation of the approach, including the use of fixed-effects. Sorry about that.

        "I was sure anyone commenting on a causal inference post would know the Mostly Harmless Econometrics book by Angrist and Pischke (2009) as referenced."
        Well, as I say, this is a multi-disciplinary forum. While I would guess that economists/econometricians/finance professionals probably constitute about half of the active membership, that still leaves the other half. So when posting here it is best to assume that the only shared knowledge is: some basics about statistics and Stata, and things that any college graduate could be expected to know regardless of what fields he or she studied. That applies to references, jargon, and abbreviations.



        • #5
          Thanks for the slides! I do see your point about the missing values when treatment is zero, but I am thinking equation (1) estimates the average treatment effect for the treated (that is, before and after the treatment switches on)?

          Looking at slide 30 of the slides you attached, I still think there needs to be a "post" variable that switches on to indicate which units get treated and when, while your answer above suggests that one should ignore the moment of the switch. I think that is problematic because the "now_being_treated" variable you define above lumps together, under "0", the control group and the "pre" period of the treatment group. This is a problem because we expect the treatment and control groups to be nevertheless different in the pre-treatment period (parallel trend assumption).

          Would you mind explaining further your answer in this respect? I may be missing something obvious.



          • #6
            The inclusion of the time fixed effect in the model carries the information that a "post" variable would otherwise carry. The now_being_treated variable plays a role analogous to the classical DID's treated#post interaction term. The combination of now_being_treated, individual fixed effect and time fixed effect provides all the information needed.
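
            One way to see this, as a rough sketch rather than a definitive check: add the time-invariant treated indicator to the fixed-effects model from #2 and compare the coefficient on now_being_treated. The scalar name b_twfe is made up for illustration. Because treated is constant within idcode, the individual fixed effects absorb it, Stata omits it for collinearity, and the estimate should not change.

            ```stata
            webuse nlswork, clear
            gen treated = (idcode <= 2500)

            * treatment indicator as built in #2
            gen now_being_treated = 0
            replace now_being_treated = 1 if idcode <= 500 & year >= 73 & treated == 1
            replace now_being_treated = 1 if idcode > 500 & idcode <= 1000 & year >= 71 & treated == 1
            replace now_being_treated = 1 if idcode > 1000 & idcode <= 2000 & year >= 80 & treated == 1
            replace now_being_treated = 1 if idcode >= 2000 & idcode <= 2500 & year >= 79 & treated == 1

            xtset idcode year

            * model from #2: individual and year fixed effects
            xtreg ln_wage now_being_treated i.year, fe vce(cluster idcode)
            scalar b_twfe = _b[now_being_treated]   // hypothetical name, kept for the comparison

            * same model with treated added: it is constant within idcode, so it is omitted
            xtreg ln_wage treated now_being_treated i.year, fe vce(cluster idcode)
            display b_twfe, _b[now_being_treated]
            ```

            The pre-treatment years of the treated ids and all years of the never-treated ids both have now_being_treated equal to 0, but any level difference between the two groups is already taken up by the individual fixed effects.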
