  • Question on Staggered Difference-in-Differences and Parallel-Trends Testing in Stata

    Hi all,

    I am working on a project investigating the impact of rural road construction on firm creation. I have data from the Economic Census for 1990, 1998, 2005, and 2013 (four waves), aggregated to the village level (N = 80,000), and data on the construction year of each road built under the program. Villages received roads in different years, so this is a staggered difference-in-differences setting. Roads were built between 2000 and 2015, which gives two pure pre-treatment periods for all observations (1990 and 1998), one partial treatment period in which some villages had been treated (2005), and a final period by which most treated villages had been treated (2013). However, villages whose roads were built in 2014 or 2015 had not yet been treated by the final year of my data.

    My sample consists of all villages without a paved road at baseline (2001), i.e., the villages the road construction program aimed to treat.

    I have posted example data below. My village identifier is shrid_numeric_id and my treatment indicator is r_pmgsy, which equals 1 if the village has been treated by census year x and 0 otherwise. In other words, a village treated by 2005 has r_pmgsy = 1 in both 2005 and 2013.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(emp_all_ec emp_manuf_ec emp_service_ec) long shrid_numeric_id float(year r_pmgsy y_pmgsy)
     49   3  46 15725 1990 0 0
     98  22  76 15725 1998 0 0
      8   0   8 15725 2005 0 0
     12   0  12 15725 2013 1 6
      4   0   4 24591 1990 0 0
      4   0   4 24591 1998 0 0
      2   0   2 24591 2005 0 0
      4   0   4 24591 2013 0 0
      3   0   3 24838 1990 0 0
      3   1   2 24838 1998 0 0
      5   1   4 24838 2005 0 0
      4   1   3 24838 2013 0 0
     22  12  10 24909 1990 0 0
     27  21   6 24909 1998 0 0
      8   0   8 24909 2005 0 0
     53  23  30 24909 2013 0 0
      8   4   4 26461 1990 0 0
     10   0  10 26461 2005 0 0
     19   2  17 26461 2013 0 0
     59   8  51 26743 1990 0 0
     68   6  62 26743 1998 0 0
    127  14 113 26743 2005 0 0
    114  13 101 26743 2013 0 0
     14   1  13 27711 1990 0 0
     21   3  18 27711 1998 0 0
     25   4  21 27711 2005 0 0
     26   3  23 27711 2013 0 0
      9   0   9 32294 1990 0 0
     16   1  15 32294 1998 0 0
     11   0  11 32294 2005 0 0
    127 106  21 32294 2013 0 0
     11   3   8 32364 1990 0 0
      6   0   6 32364 1998 0 0
     21   2  19 32364 2005 0 0
     80  35  45 32364 2013 0 0
     13   0  13 32814 1990 0 0
     15   1  14 32814 1998 0 0
     16   1  15 32814 2005 0 0
     29  14  15 32814 2013 0 0
      0   0   0 33240 1990 0 0
      6   0   6 33240 2005 0 0
     21  10  11 33240 2013 1 5
      2   0   2 33724 1990 0 0
      8   0   8 33724 1998 0 0
      2   0   2 33724 2005 0 0
      4   0   4 33724 2013 0 0
    280 214  66 33835 1990 0 0
    147  86  61 33835 1998 0 0
    289 247  42 33835 2005 0 0
    159  85  74 33835 2013 1 3
      8   0   8 33919 1990 0 0
     15   1  14 33919 1998 0 0
     21   6  15 33919 2005 0 0
     21   4  17 33919 2013 0 0
      4   1   3 34680 1990 0 0
      9   1   8 34680 1998 0 0
      5   1   4 34680 2005 0 0
     32   6  26 34680 2013 0 0
      2   0   2 34689 1990 0 0
      2   0   2 34689 1998 0 0
      1   0   1 34689 2005 0 0
      2   0   2 34689 2013 0 0
      2   0   2 34697 1990 0 0
     12   0  12 34697 1998 0 0
     15   0  15 34697 2005 0 0
      4   0   4 34697 2013 0 0
     22   4  18 34698 1990 0 0
     35   2  33 34698 1998 0 0
     17   1  16 34698 2005 0 0
     71  57  14 34698 2013 0 0
      0   0   0 34788 1990 0 0
      0   0   0 34788 2005 0 0
      0   0   0 34788 2013 0 0
      0   0   0 34868 1990 0 0
      0   0   0 34868 2005 0 0
      1   0   1 34868 2013 0 0
      5   0   5 35162 1990 0 0
      7   1   6 35162 1998 0 0
      9   0   9 35162 2005 0 0
      6   0   6 35162 2013 0 0
      0   0   0 35163 1990 0 0
     10   0  10 35163 1998 0 0
      0   0   0 35163 2005 0 0
      3   0   3 35163 2013 0 0
      6   1   5 35164 1990 0 0
     24   5  19 35164 1998 0 0
      3   0   3 35164 2005 0 0
     11   3   8 35164 2013 0 0
      0   0   0 35165 1990 0 0
      0   0   0 35165 2005 0 0
      2   0   2 35165 2013 0 0
     17   4  13 35168 1990 0 0
     76   7  69 35168 1998 0 0
     24   5  19 35168 2005 0 0
     35   5  30 35168 2013 0 0
     18   2  16 35169 1990 0 0
     20   4  16 35169 1998 0 0
     28   4  24 35169 2005 0 0
     34   4  30 35169 2013 0 0
      1   0   1 35170 1990 0 0
    end
    format %ty year
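
    Because r_pmgsy is meant to be absorbing (a village treated by 2005 stays treated in 2013) and y_pmgsy counts years with a road, both codings can be checked mechanically before estimating anything. This is a minimal sketch that assumes the panel from the -dataex- example (or the full data with the same variable names) is in memory; the assertions are checks one would expect to hold, not part of the original workflow.

```stata
* Sanity checks on the treatment coding (assumes the -dataex- panel is in memory).

* r_pmgsy should never switch back from 1 to 0 within a village.
bysort shrid_numeric_id (year): assert r_pmgsy >= r_pmgsy[_n-1] if _n > 1

* Years with a road should be 0 whenever the village is not (yet) treated.
assert y_pmgsy == 0 if r_pmgsy == 0

* Treatment status by census wave.
tabulate year r_pmgsy
```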

    I am estimating this in Stata with two-way fixed effects (village and year fixed effects). As a robustness check, I compare models with and without state-specific and state-by-district-specific linear trends. The outcomes are total employment and employment in manufacturing and in services.
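
    A minimal sketch of the two-way fixed effects specification and its trend variants, assuming the user-written -reghdfe- package (ssc install ftools, then ssc install reghdfe). state_id and district_id are hypothetical names for identifiers that are not in the example data, so those lines are commented out; district_id is assumed to be unique across states. Clustering by village is also an assumption; clustering at a higher level (e.g., district) may be preferable.

```stata
* Two-way fixed effects: village and year fixed effects, SEs clustered by village.
foreach y of varlist emp_all_ec emp_manuf_ec emp_service_ec {
    reghdfe `y' r_pmgsy, absorb(shrid_numeric_id year) vce(cluster shrid_numeric_id)
}

* State-specific linear trends (state_id is a hypothetical variable).
* reghdfe emp_all_ec r_pmgsy, absorb(shrid_numeric_id year state_id#c.year) vce(cluster shrid_numeric_id)

* District-specific linear trends (district_id is a hypothetical variable).
* reghdfe emp_all_ec r_pmgsy, absorb(shrid_numeric_id year district_id#c.year) vce(cluster shrid_numeric_id)

* Village-specific linear trends: reghdfe absorbs one slope on year per village
* instead of adding roughly 80,000 interaction regressors to the model.
reghdfe emp_all_ec r_pmgsy, absorb(shrid_numeric_id year shrid_numeric_id#c.year) vce(cluster shrid_numeric_id)
```

    With only four waves, a village-specific trend uses up two of each village's (at most) four observations, so these estimates can be imprecise. Also, under staggered adoption with effects that vary across cohorts or over time, the TWFE coefficient on r_pmgsy averages comparisons that can use already-treated villages as controls, so it is worth comparing it with a heterogeneity-robust estimator.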

    I have four questions about this analysis:

    1) Testing parallel trends. I know how to do this in a classic difference-in-differences design with a single treatment date, by comparing pre-treatment trends, but I am unsure how to do it here, where different villages are treated at different times. I can compare pre-treatment trends for all villages using the 1990 and 1998 observations (plus the 2005 observations for villages first treated in the 2013 wave), but what is the best way to check in Stata that pre-treatment trends are parallel?

    2) (This is more of a theoretical than a practical question.) Some villages in my sample were treated only after the last round (2013); there are relatively few of these (around 3,000). Conversely, a much larger group of around 40,000 villages was never treated, even after the last round. In a difference-in-differences analysis, should I use the never-treated villages as controls, or is it standard practice to compare only villages treated earlier with villages treated later?

    3) I tried to make the regression more robust by including village-specific and/or higher-level (e.g., state or district) linear trends. However, with so many villages in the estimation sample, I have not been able to include village-specific linear trends in Stata. Should I use the higher-level trends instead, and if so, how can I implement them in Stata?

    4) Besides whether a village had been treated by a given round, I also know the year its road was built. From this I constructed y_pmgsy (in addition to the binary indicator r_pmgsy): the number of years the village had had its road at the time of the census round (0 if not yet treated). Is this variable the right way to test whether treatment effects vary with the length of exposure, or should I include interaction terms instead?
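
    One way to approach questions 1, 2 and 4 together is a heterogeneity-robust staggered estimator such as Callaway and Sant'Anna's, available in the user-written -csdid- package (ssc install drdid, then ssc install csdid). This sketch is only a starting point under several assumptions: the four census waves are recoded to consecutive periods 1 to 4, so event time is measured in waves rather than years; a village's cohort (gvar) is the first wave in which r_pmgsy = 1; and villages not treated by 2013, including those treated in 2014 or 2015, get gvar = 0, which -csdid- treats as never treated. How -csdid- handles the unbalanced panel (some villages lack a 1998 observation) should be checked in its help file.

```stata
* Rebuild the period and cohort variables from scratch.
foreach v in wave first_wave gvar {
    capture drop `v'
}

* Census waves 1990/1998/2005/2013 -> periods 1/2/3/4.
egen wave = group(year)

* Cohort = first period in which the village is observed treated; 0 otherwise.
bysort shrid_numeric_id: egen first_wave = min(cond(r_pmgsy == 1, wave, .))
generate gvar = cond(missing(first_wave), 0, first_wave)
tabulate gvar

* Q1 and Q2: by default the comparison group is the never-treated (gvar == 0).
csdid emp_all_ec, ivar(shrid_numeric_id) time(wave) gvar(gvar)
estat pretrend    // joint test that all pre-treatment ATT(g,t) are zero
estat event       // leads and lags, in waves relative to treatment

* Q2: use not-yet-treated villages (including never-treated) as controls instead.
csdid emp_all_ec, ivar(shrid_numeric_id) time(wave) gvar(gvar) notyet
estat pretrend
estat simple      // overall average effect on the treated

* Q4: since y_pmgsy is 0 whenever r_pmgsy is 0, adding y_pmgsy to the TWFE
* model is equivalent to interacting treatment with years of exposure:
* _b[y_pmgsy] is the change in the effect per additional year with a road.
reghdfe emp_all_ec r_pmgsy y_pmgsy, absorb(shrid_numeric_id year) vce(cluster shrid_numeric_id)
```

    The TWFE exposure regression in the last line (which needs -reghdfe-) inherits the same staggered-adoption caveats as the baseline TWFE model; the event-time effects from estat event give a more flexible view of how the effect evolves with exposure.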

    Many thanks,

  • #2
    Read this paper.
