
  • Staggered Difference in Difference: How to properly regress

    Hey everyone, I'm completely new to Stata and have no clue how to run my staggered difference-in-differences regression here. My goal is to find out whether crime rates increase in a city after a sports stadium is built. I gathered my data and came up with something like this:
    City with Treat  Y1    Period  CR1     Treatment1  Cities without Treat  Y2    Period2  CR2     Treatment2
    Denver           2000  0       4692.5  0           Sacramento            2000  0        4636.8  0
    Denver           2001  1       4273.5  1           Sacramento            2001  1        4210    0
    Milwaukee        2000  0       4626.7  0           Norfolk               2000  0        4535.2  0
    Milwaukee        2001  1       4539.3  1           Norfolk               2001  1        3737    0
    Pittsburgh       2000  0       2751.4  0           San Jose              2000  0        2776.8  0
    Pittsburgh       2001  1       2598.5  1           San Jose              2001  1        2628.6  0
    Detroit          2001  0       4686.5  0           Chicago               2001  0        5046.3  0
    Detroit          2002  1       4297.8  1           Chicago               2002  1        5132    0
    Foxborough       2001  0       760     0           Weymouth              2001  0        900     0
    Foxborough       2002  1       1267    1           Weymouth              2002  1        1211    0
    Houston          2001  0       5046.3  0           Boston                2001  0        5072    0
    Houston          2002  1       5505.4  1           Boston                2002  1        5361    0
    Seattle          2001  0       5221.1  0           Baltimore             2001  0        5565.9  0
    Seattle          2002  1       5219.4  1           Baltimore             2002  1        5124.3  0
    The control group is cities that did not build a stadium, chosen to be similar in population and crime; the treatment group is cities that built one. The time period is 2000-2016, and I have 69 control and treatment variables. Period 0 is the year before a stadium is built and 1 is the year it is built.

    Can someone help me out with how to write the code to run this or give me some pointers on what to do? Anything would be appreciated.

    Thanks!

  • #2
    Please explain your data organization. Is each row of the tableau you show a matched pair? If not, why do you have data on two cities in each observation? You speak of only a single treatment, building a stadium, but you have two different treatment variables. Why? Y1 and Y2 are always equal in the example you show. Why do you have both variables? (Or is that not true in your data set as a whole?)

    In addition to explaining these when posting back, please provide a usable example from your actual Stata data set, using the -dataex- command. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
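A minimal sketch of the -dataex- workflow described above. It uses the auto dataset that ships with Stata as a stand-in (an assumption, not the poster's data); with your own data in memory you would list your own variables instead.

```stata
* Stand-in data: the auto dataset shipped with Stata (not the thread's data)
sysuse auto, clear

* Print a copy-and-paste example of three variables, first five observations
dataex make price mpg in 1/5
```

The output is a block of -input- statements that can be pasted into a post between code delimiters, so others can recreate the example exactly.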

    Finally, explain what you mean by a "staggered" difference in difference model. What exactly do you wish to see staggered and in what way?



    • #3
      Hi Clyde,

      Thanks for the response! My overall goal is to see what effect adding a stadium to a city has on that city's crime rate. The issue is that stadiums were added in many different years, rather than in one uniform year for all cities. I did some research and found the staggered difference-in-differences model, which looks at the year before a stadium is built, the year it is built (treated), and the year after, to see the effect of adding the stadium in comparison to a city that never added one. With this approach I assign all the cities a common point of treatment: each year gets a period value (0, 1, 2), with 1 being the treatment period. The idea was to include cities that added a stadium between 2001 and 2017, plus an equal number of cities that are similar in population and crime rate at period 0. The cities that never built a stadium are my control group, and the ones that added a stadium are the treatment group.

      The table I posted earlier was a little odd looking (I apologize for that), but here is the -dataex- output for my data. I have more observations, but this is basically what I came up with.
      I also grouped cities by the year they built a stadium and put the control cities that are similar to them in the same group. I don't know if that is needed or not. For example, Denver and Milwaukee added a stadium in 2001, so they are assigned group 1; Detroit and Houston did in 2002, so they are in group 2.

      My data below shows:
      Number: the group (1, 2, 3, 4, 5, etc.)
      City
      Year: year observed (2000, 2001, 2002, etc.)
      Period (0, 1, 2): 0 = year before stadium, 1 = year stadium was opened (or not), 2 = year after stadium was opened
      Crime: the city's number of crimes for that period
      Treatment: 1 in periods 0, 1, 2 for the treatment group and 0 for the control group
      Post: 1 for the year a stadium was built and after; 0 before that or if a stadium was never built


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte number str13 city int year byte period float crime byte(treatment post)
      1 "Denver"        2000 0 4692.5 1 0
      1 "Denver"        2001 1 4273.5 1 1
      1 "Denver"        2002 2   4821 1 1
      1 "Milwaukee"     2000 0 4626.7 1 0
      1 "Milwaukee"     2001 1 4539.3 1 1
      1 "Milwaukee"     2002 2   4690 1 1
      1 "Pittsburgh"    2000 0 2751.4 1 0
      1 "Pittsburgh"    2001 1 2598.5 1 1
      1 "Pittsburgh"    2002 2   2772 1 1
      1 "Sacramento "   2000 0 4636.8 0 0
      1 "Sacramento "   2001 1   4210 0 0
      1 "Sacramento "   2002 2 4830.2 0 0
      1 "Norfolk, VA"   2000 0 4535.2 0 0
      1 "Norfolk, VA"   2001 1   3737 0 0
      1 "Norfolk, VA"   2002 2 4478.8 0 0
      1 "San Jose"      2000 0 2776.8 0 0
      1 "San Jose"      2001 1 2628.6 0 0
      1 "San Jose"      2002 2 2645.4 0 0
      2 "Detroit"       2001 0 4686.5 1 0
      2 "Detroit"       2002 1 4297.8 1 1
      2 "Detroit"       2003 2 3360.3 1 1
      2 "Foxborough"    2001 0    760 1 0
      2 "Foxborough"    2002 1   1267 1 1
      2 "Foxborough"    2003 2   3245 1 1
      2 "Houston"       2001 0 5046.3 1 0
      2 "Houston"       2002 1 5505.4 1 1
      2 "Houston"       2003 2 5097.1 1 1
      2 "Seattle"       2001 0 5221.1 1 0
      2 "Seattle"       2002 1 5219.4 1 1
      2 "Seattle"       2003 2 5458.4 1 1
      2 "Chicago"       2001 0 5046.3 0 0
      2 "Chicago"       2002 1   5132 0 0
      2 "Chicago"       2003 2 5239.7 0 0
      2 "Weymouth"      2001 0    900 0 0
      2 "Weymouth"      2002 1   1211 0 0
      2 "Weymouth"      2003 2   1433 0 0
      2 "Boston"        2001 0   5072 0 0
      2 "Boston"        2002 1   5361 0 0
      2 "Boston"        2003 2 2830.5 0 0
      2 "Baltimore"     2001 0 5565.9 0 0
      2 "Baltimore"     2002 1 5124.3 0 0
      2 "Baltimore"     2003 2 4701.2 0 0
      3 "Los Angeles"   2002 0 3998.3 1 0
      3 "Los Angeles"   2003 1 3675.5 1 1
      3 "Los Angeles"   2004 2 3518.9 1 1
      3 "Chicago"       2002 0 6637.4 1 0
      3 "Chicago"       2003 1 6698.1 1 1
      3 "Chicago"       2004 2   7000 1 1
      3 "Cincinnati"    2002 0 4541.5 1 0
      3 "Cincinnati"    2003 1 4517.8 1 1
      3 "Cincinnati"    2004 2 4032.1 1 1
      3 "Philadelphia"  2002 0 3389.6 1 0
      3 "Philadelphia"  2003 1 3446.1 1 1
      3 "Philadelphia"  2004 2   3851 1 1
      3 "New York"      2002 0 3998.3 0 0
      3 "New York"      2003 1   2659 0 0
      3 "New York"      2004 2 2535.1 0 0
      3 "Washington DC" 2002 0 4047.1 0 0
      3 "Washington DC" 2003 1 3862.3 0 0
      3 "Washington DC" 2004 2   2909 0 0
      3 "Norfolk, VA"   2002 0 4478.8 0 0
      3 "Norfolk, VA"   2003 1 3558.3 0 0
      3 "Norfolk, VA"   2004 2 4066.8 0 0
      3 "Detroit"       2002 0 4297.8 0 0
      3 "Detroit"       2003 1 3360.3 0 0
      3 "Detroit"       2004 2 3070.1 0 0
      4 "Philadelphia"  2003 0 5508.8 1 0
      4 "Philadelphia"  2004 1   3851 1 1
      4 "Philadelphia"  2005 2 3360.8 1 1
      4 "San Diego"     2003 0 4187.8 1 0
      4 "San Diego"     2004 1 4111.2 1 1
      4 "San Diego"     2005 2 3777.1 1 1
      4 "Detroit"       2003 0 3360.3 0 0
      4 "Detroit"       2004 1 3070.1 0 0
      4 "Detroit"       2005 2 3292.1 0 0
      4 "Minneapolis"   2003 0 3766.9 0 0
      4 "Minneapolis"   2004 1 3728.3 0 0
      4 "Minneapolis"   2005 2 3983.5 0 0
      5 "Frisco"        2004 0 5198.7 1 0
      5 "Frisco"        2005 1 3318.2 1 1
      5 "Frisco"        2006 2 3720.3 1 1
      5 "Orlando"       2004 0 4992.2 0 0
      5 "Orlando"       2005 1 5135.4 0 0
      5 "Orlando"       2006 2 5200.1 0 0
      6 "Bridgeview"    2005 0 2261.7 1 0
      6 "Bridgeview"    2006 1 2261.7 1 1
      6 "Bridgeview"    2007 2 2147.2 1 1
      6 "Glendale"      2005 0 2215.1 1 0
      6 "Glendale"      2006 1 2132.4 1 1
      6 "Glendale"      2007 2 2097.1 1 1
      6 "St. Louis"     2005 0 3937.3 1 0
      6 "St. Louis"     2006 1 4297.3 1 1
      6 "St. Louis"     2007 2 4037.7 1 1
      6 "Burbank, IL"   2005 0   4001 0 0
      6 "Burbank, IL"   2006 1   3761 0 0
      6 "Burbank, IL"   2007 2 3872.3 0 0
      7 "Denver"        2006 0 4130.1 1 0
      7 "Denver"        2007 1 2496.6 1 1
      7 "Denver"        2008 2 3236.7 1 1
      7 "Albuquerque"   2005 0 5753.2 0 0
      end
      Thanks for the help. I am very new to Stata and have been going through posts left and right to find a solution, with no luck. I am also toying with the idea of an FE model for this instead, but I figured I'd get some input first.



      • #4
        OK, so your "staggered DID" is what I generally refer to as generalized DID. There's an additional wrinkle here: you have two different treatment periods: year of building the stadium, and subsequent years, contrasting with pre-stadium building years. I can't tell from your example data, but I will assume that your control cities have period coded as 0 in all observations.

        I'm a little bit confused by one aspect of your data. There are instances where the same city has two or more observations for the same year, and in one instance, Detroit, there are three observations for year 2003. It appears that in these instances, the city actually changes state during the course of that year, although I don't get how Detroit could have gone from pre-stadium through building the stadium to being post-stadium all in the space of one year. Is that one a data error? This aspect of your data is also confusing because in some cases you have different values for the crime rates, and in others you have the same value for both observations in the year. What's going on here? Do I have this general understanding right?
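One way to locate the repeated city-year observations described above is the -duplicates- command. This is a minimal sketch using only the Detroit rows for 2003 and 2004 from the -dataex- example in #3; dup is a hypothetical variable name. In these rows the repeats seem to arise because Detroit appears in more than one group (treated in group 2, control in groups 3 and 4), though that reading is an assumption about how the data were assembled.

```stata
clear
input byte number str13 city int year byte period float crime byte(treatment post)
2 "Detroit" 2003 2 3360.3 1 1
3 "Detroit" 2003 1 3360.3 0 0
4 "Detroit" 2003 0 3360.3 0 0
3 "Detroit" 2004 2 3070.1 0 0
4 "Detroit" 2004 1 3070.1 0 0
end

* Summarize how many city-year combinations occur more than once
duplicates report city year

* dup (hypothetical name) = number of extra copies of each city-year
duplicates tag city year, generate(dup)

* Inspect the repeated observations side by side
list number city year period crime treatment if dup > 0, sepby(city year)
```

Running the same three commands on the full data set would show every city-year that appears more than once and which groups the copies belong to.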

        With those assumptions, your variable period is, in fact, the treatment X time interaction variable that is central to any kind of DID analysis. Now, I imagine you will need to adjust your analysis for some additional variables. I won't belabor the point: it's a science question that you and your colleagues know more about than I would.

        So here's how I would set up the bare bones analysis:

        Code:
        encode city, gen(ncity)
        xtset ncity
        xtreg crime i.period i.year, fe // CONSIDER vce(cluster ncity)
        The output of the -xtreg- command will include two rows for period, each of which represents the difference in expected crime rates between the corresponding period and the pre-construction period.

        You definitely should use an FE model for this. First, you cannot just use -regress- because you have repeated observations on the same cities, so the observations cannot be considered independent. So you must use a panel data analysis. Since the effect you are interested in is a purely within-city effect, you are best off with the fixed effects estimator.



        • #5
          Thanks for the clarification Clyde. And thank you for pointing out the discrepancies in my data. I guess it didn't transfer properly. Quick question about the code.
          Code:
           
           encode city, gen(ncity)
           xtset ncity
           xtreg crime i.period i.year, fe
          How is this equation interpreted? Are i.period and i.year just dummies, and is this still a diff-in-diff equation? I'm just trying to visualize what is being applied here, since it doesn't look like they are being interacted, and obviously I'm still not used to Stata.



          • #6
            The -encode- command is just to create a numeric version of the city variable, since -xtset- will not accept a string as the panel identifier.

            As for the -xtreg-, the variable period, as you created it, is the interaction term. It's not written in the usual form of an interaction term, but it has exactly the right values. For a city in the treatment group it takes on the value 0 before the stadium was built, 1 in the year the stadium is built, and 2 in the years afterwards. And for the control group it is everywhere 0. That is exactly the behavior you want from an interaction term. Then, since this is a generalized (or, as you call it, staggered), not a classical, DID, you need panel fixed effects (which the fe option provides) and year fixed effects, which i.year provides. Voila!
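In the -dataex- example in #3, the control cities carry period values 0, 1 and 2, so the assumption that period is 0 everywhere for controls does not hold as posted. A minimal sketch of one way to build a variable that does behave like the interaction term described above, using the group 1 and group 5 rows from that example; did_period is a hypothetical name, and this recoding is an assumption about what was intended, not code from the thread.

```stata
clear
input byte number str13 city int year byte period float crime byte(treatment post)
1 "Denver"      2000 0 4692.5 1 0
1 "Denver"      2001 1 4273.5 1 1
1 "Denver"      2002 2   4821 1 1
1 "Milwaukee"   2000 0 4626.7 1 0
1 "Milwaukee"   2001 1 4539.3 1 1
1 "Milwaukee"   2002 2   4690 1 1
1 "Pittsburgh"  2000 0 2751.4 1 0
1 "Pittsburgh"  2001 1 2598.5 1 1
1 "Pittsburgh"  2002 2   2772 1 1
1 "Sacramento " 2000 0 4636.8 0 0
1 "Sacramento " 2001 1   4210 0 0
1 "Sacramento " 2002 2 4830.2 0 0
1 "Norfolk, VA" 2000 0 4535.2 0 0
1 "Norfolk, VA" 2001 1   3737 0 0
1 "Norfolk, VA" 2002 2 4478.8 0 0
1 "San Jose"    2000 0 2776.8 0 0
1 "San Jose"    2001 1 2628.6 0 0
1 "San Jose"    2002 2 2645.4 0 0
5 "Frisco"      2004 0 5198.7 1 0
5 "Frisco"      2005 1 3318.2 1 1
5 "Frisco"      2006 2 3720.3 1 1
5 "Orlando"     2004 0 4992.2 0 0
5 "Orlando"     2005 1 5135.4 0 0
5 "Orlando"     2006 2 5200.1 0 0
end

* did_period (hypothetical name): equals period for treated cities, 0 for controls
generate byte did_period = treatment * period
tabulate did_period treatment

* Same model as in #4, with the recoded variable and cluster-robust errors
encode city, gen(ncity)
xtset ncity
xtreg crime i.did_period i.year, fe vce(cluster ncity)
```

With so few cities in this excerpt the estimates are illustrative only, and Stata may drop a year indicator as collinear with the city effects. The rows for 1.did_period and 2.did_period correspond to the stadium-year and year-after contrasts against the pre-stadium period.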
