  • Should I include multiple year dummies in my difference-in-differences regression?

    Hi everyone. I am analyzing the effect of access to school-based health centers (SBHCs) on algebra achievement. I have collected data on school districts in Maryland that implemented an SBHC between 2007 and 2016, as well as on control schools that never implemented one. I am using a difference-in-differences design to compare algebra scores before and after SBHC implementation, and I am having trouble with my regression because each school district implemented its SBHC in a different year.

    I have created dummy variables for each year relative to a school's implementation of an SBHC. For example, if an SBHC was implemented in 2010, then that district has a 1 for dummy variables one through six, corresponding to 2011 (the first year following implementation) through 2016, as well as a 1 for dummy variables -1 through -3, corresponding to the pre-treatment years 2007 to 2009.

    My main question is: should each of these dummy variables (I have seventeen of them, ranging from -8 to +9) be included in the regression along with my year fixed effects? This doesn't seem right to me, but it is what my professor suggested, unless I misunderstood her. Alternatively, I have created a single variable, called time, which contains the same information with values ranging from -8 to +9. However, both approaches seem problematic because they only pertain to school districts that implemented an SBHC at some point and ignore those that never did.

    Here is what I've written so far:

    Code:
    reg basicpct treat i.school_id i.year negeight negseven negsix negfive negfour negthree ///
        negtwo negone one two three four five six seven eight nine, vce(cluster school_id)
    basicpct is the percent of students achieving basic proficiency in algebra in a school district, and treat is a dummy for the school's treatment status.

    I feel really off-base, and I'm pretty sure I'm in over my head. I would really appreciate it if anyone who is more experienced with difference-in-differences could walk me through this. Thank you so much in advance.

  • #2
    Your data are not suitable for a classical difference-in-differences analysis because the intervention takes place at different times in different schools. You must, instead, do a generalized difference-in-differences analysis. For a nice overview of the approach, see https://www.ipr.northwestern.edu/wor.../Day%204.2.pdf.

    Specifically, as it applies to your problem, you need three key variables. One is a school identifier. The next is a calendar year identifier. Then you need a variable, let's call it gdid, which takes the value 1 in those observations where the school has an open SBHC, and 0 in all other observations. So this variable will be 0 in every observation for schools that never open an SBHC. In schools that do open one, it will be 0 or 1 depending on whether the observation is for a year before (0) or after (1) the opening. You then run -regress algebra_performance i.school i.year i.gdid-. You can add covariates if that is appropriate. Also, since the observations are probably not independent within schools, you might want to use the -vce(cluster school)- option to account for that. Your -8 to +9 variable is not helpful here, because it is undefined for the schools that never open an SBHC.
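
    To make the coding of gdid concrete, here is a minimal sketch on a made-up toy panel. The variable sbhc_year (the year a school's SBHC opened, missing if it never opened) and the toy data are assumptions for illustration, not part of the original posts. The sketch counts the opening year itself as treated; change >= to > if the opening year should count as pre-treatment. Computing the relative-time variable from sbhc_year also shows why it is missing for the never-treated schools, while gdid is simply 0 there.

```stata
* Toy panel (hypothetical): 4 schools observed 2007-2016
clear
set obs 40
generate school = ceil(_n/10)
bysort school: generate year = 2006 + _n

* sbhc_year (hypothetical name): year the school's SBHC opened;
* left missing for schools that never open one
generate sbhc_year = .
replace sbhc_year = 2010 if school == 1
replace sbhc_year = 2013 if school == 2

* Relative-time variable like the -8 to +9 one: missing for never-treated schools
generate rel_time = year - sbhc_year

* gdid = 1 once the school's SBHC is open, 0 otherwise,
* including every year of the never-treated schools
generate byte gdid = !missing(sbhc_year) & year >= sbhc_year

tabulate school gdid
```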

    • #3
      Thanks for your help, Clyde. The only thing now that's confusing to me is how I would interpret the results of this regression, as I'm getting coefficients for each individual school and year which I'm not particularly concerned with. Is the coefficient for i.gdid the most useful part of the output? Perhaps this is the effect on algebra performance of a school having an open SBHC?

      • #4
        Yes, you can ignore the school and year coefficients. The "pay dirt" here is the coefficient of gdid: it is your generalized DID estimate of the effect of opening an SBHC.

        Added: If the number of schools is large and that part of the output is just cumbersome, you can suppress it by using a slightly different set of commands:

        Code:
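        * declare school as the panel identifier
        xtset school
        * -fe- absorbs the school effects; the coefficient on 1.gdid is the generalized DID estimate
        xtreg algebra_performance i.gdid i.year, fe vce(cluster school)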
        This is equivalent to the earlier analysis using -reg-, but it won't dump a lot of school results on you.
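
        One way to check that the two commands give the same gdid estimate is to run both on simulated data. The sketch below assumes a made-up panel of 8 schools, a hypothetical sbhc_year variable, and a built-in true effect of 5; none of these numbers come from the original posts.

```stata
* Toy panel (hypothetical): 8 schools observed 2007-2016;
* schools 1-4 open an SBHC in 2009-2012, schools 5-8 never do
clear
set seed 12345
set obs 80
generate school = ceil(_n/10)
bysort school: generate year = 2006 + _n
generate sbhc_year = cond(school <= 4, 2008 + school, .)
generate byte gdid = !missing(sbhc_year) & year >= sbhc_year

* Simulated outcome: school effect + common year trend + true effect of 5 + noise
generate algebra_performance = 50 + 2*school + 0.5*(year - 2007) + 5*gdid + rnormal(0, 1)

* Version 1: -regress- with explicit school and year dummies
regress algebra_performance i.school i.year i.gdid, vce(cluster school)
scalar b_reg = _b[1.gdid]

* Version 2: -xtreg, fe- absorbs the school effects, so no school rows are printed
xtset school
xtreg algebra_performance i.gdid i.year, fe vce(cluster school)
scalar b_xt = _b[1.gdid]

* Both gdid estimates should agree and land near the true effect of 5
display "regress: " scalar(b_reg) "   xtreg, fe: " scalar(b_xt)
```

        The gdid coefficient should match to numerical precision. The clustered standard errors may differ slightly, because -regress- counts the school dummies in its small-sample adjustment while -xtreg, fe- does not.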

        • #5
          Okay, this makes sense to me now. Thank you so much Clyde -- you really helped me out!
