Counterintuitive difference-in-difference results?

Tarjei W. Havneraas

Join Date: Nov 2016
Posts: 136

30 Oct 2018, 07:48

Hi Statalisters

I am evaluating a training program for GPs intended to reduce the suicide rate, and I have some difference-in-difference (DD) results that seem counterintuitive. Descriptive statistics suggest that there should be a reduction in the suicide rate in the intervention regions, but the DD results show no statistically significant effect. Since this looks like a null finding, I want to be completely sure that it is based on a well-specified model. I've been working on the model for some time and can't find any misspecification other than, potentially, how the standard errors are handled. I would be very grateful for input on the specification of the SEs and on whether there are other possible misspecifications.

Here's info on the model: I follow six regions of the country from 2012 to 2017; two are control regions ("intervention" = 0) and four are intervention regions ("intervention" = 1). The intervention was rolled out between January 2016 and July 2016, so the pre-post variable ("pre_post_all") is defined as 0 for months before July 2016 and 1 for months following July 2016. The main outcome is the monthly suicide rate per 100 000. I have checked the common trend assumption visually and formally, with a test following the post by Ricardo Cavalho here. I've included fixed effects, a month variable to control for time, and standard errors clustered at the region level.
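
A minimal sketch of how the pre-post indicator described above could be built. It assumes that July 2016 itself counts as a post-intervention month, which the description leaves open; the stand-alone monthly sequence is only there to make the example runnable.

```stata
* Stand-alone monthly sequence, 2012m1 through 2017m12 (72 months)
clear
set obs 72
gen monthly_date = tm(2012m1) + _n - 1
format %tm monthly_date

* Assumption: July 2016 is the first post-intervention month
gen byte pre_post_all = monthly_date >= tm(2016m7) if !missing(monthly_date)
tabulate pre_post_all
```

If July 2016 should instead be treated as part of the roll-out, the cutoff would move to tm(2016m8).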

Here are some descriptive statistics:

Yearly suicide rate per 100 000	Control region	Intervention region
2012	11.39	11.02
2013	10.75	8.94
2014	11.03	7.29
2015	8.94	8.89
2016	7.81	8.20
2017	10.70	6.27
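
As a rough cross-check on why the descriptives suggest a reduction, the raw difference-in-differences can be computed by hand from the yearly table above. This is only a sketch under assumptions that are not in the original post: 2012-2015 are treated as the pre-period, 2017 as the post-period, and the roll-out year 2016 is left out. The scalar names are illustrative.

```stata
* Yearly rates per 100 000 copied from the table above
scalar ctrl_pre  = (11.39 + 10.75 + 11.03 + 8.94) / 4   // control, 2012-2015 mean
scalar int_pre   = (11.02 + 8.94 + 7.29 + 8.89) / 4     // intervention, 2012-2015 mean
scalar ctrl_post = 10.70                                // control, 2017
scalar int_post  = 6.27                                 // intervention, 2017

* Raw DD: change in intervention regions minus change in control regions
scalar dd_raw = (int_post - int_pre) - (ctrl_post - ctrl_pre)
display "Raw yearly DD: " %6.3f dd_raw
```

Under these assumptions the raw contrast is about -2.9 per 100 000 per year, but it rests on a single post-period year and says nothing about sampling variability.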

How the panel data is set and the model:

Code:

xtset region monthly_date
xtreg suiciderate i.intervention##i.pre_post_all monthly_date, fe cluster(region)

And the output:

Code:

Fixed-effects (within) regression               Number of obs     =        432
Group variable: region                          Number of groups  =          6

R-sq:                                           Obs per group:
     within  = 0.0592                                         min =         72
     between = 0.1034                                         avg =       72.0
     overall = 0.0565                                         max =         72

                                                F(3,5)            =      16.75
corr(u_i, Xb)  = 0.0311                         Prob > F          =     0.0049

                                              (Std. Err. adjusted for 6 clusters in region)
-------------------------------------------------------------------------------------------
                          |               Robust
              suiciderate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------------+----------------------------------------------------------------
             intervention |
            Intervention  |          0  (omitted)
           1.pre_post_all |   .1059561   .2775733     0.38   0.718    -.6075687    .8194809
                          |
intervention#pre_post_all |
          Intervention#1  |  -.0922136   .3400909    -0.27   0.797    -.9664452     .782018
                          |
             monthly_date |  -.0048216   .0016342    -2.95   0.032    -.0090225   -.0006206
                    _cons |   3.938659   1.092318     3.61   0.015     1.130765    6.746552
--------------------------+----------------------------------------------------------------
                  sigma_u |   .1652054
                  sigma_e |  .35753605
                      rho |  .17594101   (fraction of variance due to u_i)
-------------------------------------------------------------------------------------------

Following Wing et al. (2018), "Designing Difference in Difference Studies: Best Practice for Public Health Policy Research", who point out that studies with a small number of clusters should account for this, I've tried a cluster bootstrap approach to the standard errors:

Code:

xtreg suiciderate i.intervention##i.pre_post_all monthly_date, fe vce(bootstrap) cluster(region)
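
A minimal sketch of a panel (cluster) bootstrap with -xtreg, fe-, run on simulated stand-in data rather than the real data. To my understanding, -vce(bootstrap)- in -xtreg- already resamples whole panels, so with region as the panel variable no separate cluster() option should be needed; the number of replications and the seed are arbitrary choices. Note that with only six regions, a resample that draws only intervention regions or only control regions cannot identify the interaction (roughly a 9% chance per draw with four intervention and two control regions), so bootstrap SEs remain fragile here.

```stata
* Simulated stand-in panel: 6 regions x 72 months (not the real data)
clear
set seed 2018
set obs 6
gen long region = _n
gen byte intervention = region > 2                 // hypothetical: regions 3-6 treated
expand 72
bysort region: gen monthly_date = tm(2012m1) + _n - 1
gen byte pre_post_all = monthly_date >= tm(2016m7)
gen suiciderate = 1 + 0.2*region + rnormal(0, 0.35) // placeholder outcome, no true effect

* -xtreg, fe- does not need the time variable, so only the panel id is set
xtset region

* Panels (regions) are the resampling units
xtreg suiciderate i.intervention##i.pre_post_all monthly_date, ///
    fe vce(bootstrap, reps(200) seed(12345))
```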

I've also tried different follow-up periods to examine whether there is a time-limited effect that vanishes 6 or 12 months after the intervention, e.g. by

Code:

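// Post 12 m: July 2016 through June 2017 (<= keeps June 2017, giving 12 months)
// Note: months after the window are coded 0, i.e. grouped with the pre-period
gen byte pre_post12 = monthly_date > tm(2016m6) & monthly_date <= tm(2017m6) if !missing(monthly_date)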
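
The 6-month window mentioned above can be built the same way. A minimal sketch on a stand-alone monthly sequence; the name pre_post6 and the choice to set later months to missing (so that they are not pooled with the pre-period) are assumptions, not taken from the original analysis.

```stata
* Stand-alone monthly sequence, 2012m1 through 2017m12
clear
set obs 72
gen monthly_date = tm(2012m1) + _n - 1
format %tm monthly_date

* 1 = July-December 2016, 0 = all other months
gen byte pre_post6 = inrange(monthly_date, tm(2016m7), tm(2016m12)) if !missing(monthly_date)

* Assumption: exclude months after the window instead of coding them as 0
replace pre_post6 = . if monthly_date > tm(2016m12) & !missing(monthly_date)
tabulate pre_post6, missing
```

Observations with a missing pre_post6 drop out of the regression, so the comparison is between the pre-period and the 6-month window only.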

Here's a data example:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(suiciderate intervention) long region float(monthly_date year)
 .4736469 0 1 625 2012
 .3724667 0 1 673 2016
1.1757903 1 4 652 2014
 .2885867 1 5 684 2017
 .9631134 1 4 642 2013
 .9401236 1 5 639 2013
 .6916414 1 4 648 2014
  .376433 1 6 648 2014
 1.990496 1 2 638 2013
 .4268652 1 6 685 2017
 .3213006 1 2 684 2017
 .7600408 1 5 663 2015
  .635285 1 2 663 2015
.19118175 1 5 673 2016
 .2885867 1 5 695 2017
end
format %tmMCY monthly_date
format %ty year
label values intervention intervention
label def intervention 0 "Control", modify
label def intervention 1 "Intervention", modify
label values region region
label def region 1 "North central region", modify
label def region 2 "North east region", modify
label def region 4 "South central region", modify
label def region 5 "South east region", modify
label def region 6 "South west region", modify

Last edited by Tarjei W. Havneraas; 30 Oct 2018, 08:06.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#2

30 Oct 2018, 08:20

Tarjei:
my two cents about your query:
- with only 6 groups, clustered standard errors (SEs) may be misleading (and their bias can be worse than that of default SEs);
- invoking clustered SEs prevents Stata from reporting, at the foot of the output table, the F-test of -fe- vs pooled OLS; if that test lacks statistical significance, you should switch to pooled OLS. I would check the F-test outcome by running -xtreg, fe- with default SEs, just to get an idea of what's going on with the data.
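
A minimal sketch of the check in the second point, run on simulated stand-in data (the data-generating lines are assumptions, not the original data). With default SEs, -xtreg, fe- prints "F test that all u_i=0" below the coefficient table and stores the statistic in e(F_f).

```stata
* Simulated stand-in panel: 6 regions x 72 months (not the real data)
clear
set seed 2018
set obs 6
gen long region = _n
gen byte intervention = region > 2                 // hypothetical: regions 3-6 treated
expand 72
bysort region: gen monthly_date = tm(2012m1) + _n - 1
gen byte pre_post_all = monthly_date >= tm(2016m7)
gen suiciderate = 1 + 0.2*region + rnormal(0, 0.35) // placeholder outcome
xtset region monthly_date

* Default (non-clustered) SEs, so the F-test of the fixed effects is reported
xtreg suiciderate i.intervention##i.pre_post_all monthly_date, fe
display "F(" e(df_a) ", " e(df_r) ") = " e(F_f)
```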

Kind regards,
Carlo
(Stata 19.0)
Comment
Tarjei W. Havneraas

Join Date: Nov 2016

Posts: 136
#3

30 Oct 2018, 10:53

Thank you for your helpful input. The effect is still not statistically significant after using regular SEs and the F-test supports the use of fixed effects. However, now I'm more assured of a correct specification. Would you recommend any references on clustered vs default SEs when the number of clusters is low?
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#4

30 Oct 2018, 11:11

Tarjei:
the main assumption underlying clustered standard errors is that the number of clusters should go to infinity for -vce(cluster clusterid)- to work properly (see https://www.stata.com/bookstore/micr...metrics-stata/ page 335).
Unfortunately (just like for many other statistical issues) there's no hard and fast rule (or rule of thumb) that tells us when, other things being equal, the number of clusters is large enough to satisfy the asymptotic property mentioned above.

Kind regards,
Carlo
(Stata 19.0)
Comment
Tarjei W. Havneraas

Join Date: Nov 2016

Posts: 136
#5

30 Oct 2018, 12:19

Ok, great, thank you for your explanation. It makes sense now knowing that there's an asymptotic property. I have the Cameron & Trivedi book and I'll look more into it.
Comment

Tarjei W. Havneraas

Join Date: Nov 2016
Posts: 136

31 Oct 2018, 04:46

Hi again

Thank you for the reference, which I've now looked into. After trying the new specification, I've found a statistically significant effect of the intervention on the suicide attempt rate per 100 000 (another main outcome). However, if I interpret it correctly, the effect of the intervention is positive, which in this case corresponds to an increase in suicide attempts in the intervention group. It would be very helpful if you or anyone else could check whether I'm interpreting this correctly.

Code:

xtreg attemptrate i.intervention##i.pre_post_all monthly_date, fe

Code:

Fixed-effects (within) regression               Number of obs     =        432
Group variable: region                          Number of groups  =          6

R-sq:                                           Obs per group:
     within  = 0.0337                                         min =         72
     between = 0.2947                                         avg =       72.0
     overall = 0.0002                                         max =         72

                                                F(3,423)          =       4.92
corr(u_i, Xb)  = -0.1915                        Prob > F          =     0.0023

-------------------------------------------------------------------------------------------
              attemptrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------------+----------------------------------------------------------------
             intervention |
            Intervention  |          0  (omitted)
           1.pre_post_all |  -.2537841   .1587266    -1.60   0.111    -.5657752    .0582071
                          |
intervention#pre_post_all |
          Intervention#1  |   .3769097   .1626356     2.32   0.021     .0572351    .6965844
                          |
             monthly_date |  -.0048548   .0024153    -2.01   0.045    -.0096023   -.0001073
                    _cons |   5.703895   1.571634     3.63   0.000     2.614709     8.79308
--------------------------+----------------------------------------------------------------
                  sigma_u |  .86064654
                  sigma_e |   .6900046
                      rho |  .60872864   (fraction of variance due to u_i)
-------------------------------------------------------------------------------------------
F test that all u_i=0: F(5, 423) = 102.52                    Prob > F = 0.0000
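
For reading the table above: in this specification the interaction term is the DD estimate, i.e. the post-period change in the intervention regions relative to the change in the control regions, while 1.pre_post_all is the post-period change in the control regions net of the linear trend. Adding the two reported coefficients gives the implied change in the intervention regions, -.2537841 + .3769097 = .1231256. A minimal sketch of obtaining that sum with a standard error via -lincom-, run here on simulated stand-in data (the data-generating lines are assumptions, not the real data):

```stata
* Simulated stand-in panel: 6 regions x 72 months (not the real data)
clear
set seed 2018
set obs 6
gen long region = _n
gen byte intervention = region > 2                 // hypothetical: regions 3-6 treated
expand 72
bysort region: gen monthly_date = tm(2012m1) + _n - 1
gen byte pre_post_all = monthly_date >= tm(2016m7)
gen attemptrate = 5 + 0.5*region + rnormal()        // placeholder outcome, no true effect
xtset region monthly_date

xtreg attemptrate i.intervention##i.pre_post_all monthly_date, fe

* Post-period change in the intervention regions (control change + DD term)
lincom 1.pre_post_all + 1.intervention#1.pre_post_all
```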

Comment

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#7

31 Oct 2018, 05:04

Tarjei:
the main issue with your panel data is that you have T>N.
Hence, I would re-run the analysis with -xtregar, fe- instead of -xtreg, fe- before commenting on results.

Kind regards,
Carlo
(Stata 19.0)
Comment
Tarjei W. Havneraas

Join Date: Nov 2016

Posts: 136
#8

31 Oct 2018, 05:31

OK, thank you. Intuitively, the results make more sense now, as there is no increase in the intervention regions. However, I'm not well acquainted with -xtregar, fe-; is this strategy well suited for DD with T>N?
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#9

31 Oct 2018, 05:55

Tarjei:
-xtregar- is recommended whenever you have a T>N panel data structure and the autocorrelation process is AR(1) (something that cannot be modelled with -xtreg-).
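
A minimal sketch of the -xtregar, fe- version of the DD model, run on simulated stand-in data (the data-generating lines are assumptions). The interaction is built by hand as the illustrative variable did, and the lbi option requests the Baltagi-Wu LBI and modified Durbin-Watson statistics as a rough check on serial correlation. As far as I know, -xtregar- does not offer cluster-robust standard errors.

```stata
* Simulated stand-in panel: 6 regions x 72 months (not the real data)
clear
set seed 2018
set obs 6
gen long region = _n
gen byte intervention = region > 2                 // hypothetical: regions 3-6 treated
expand 72
bysort region: gen monthly_date = tm(2012m1) + _n - 1
gen byte pre_post_all = monthly_date >= tm(2016m7)
gen byte did = intervention * pre_post_all          // DD interaction, illustrative name
gen attemptrate = 5 + 0.5*region + rnormal()        // placeholder outcome, no true effect

* -xtregar- needs both the panel and the time variable
xtset region monthly_date

* Fixed-effects model with AR(1) disturbances
xtregar attemptrate pre_post_all did monthly_date, fe lbi
display "Estimated AR(1) coefficient: " e(rho_ar)
```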

Kind regards,
Carlo
(Stata 19.0)
Comment
Tarjei W. Havneraas

Join Date: Nov 2016

Posts: 136
#10

31 Oct 2018, 07:29

Great, thank you. I'll definitely read up on this.
Comment
