  • Small Sample Inference for DID

    I've been curious as of late about how to estimate SEs for small samples in DID. Say we estimate
    Code:
    clear *
    net from "https://raw.githubusercontent.com/jgreathouse9/FDIDTutorial/main"
    net install fdid, replace
    u basque, clear
    fdid gdpcap, tr(treat) gr2opts(scheme(sj))
    
    mkf newframe
    
    cwf newframe
    cls
    svmat e(series), names(col)
    g time = _n
    su time if eventt==0
    
    loc lastneg = r(mean)-1
    bootstrap, nodrop: reg te5 ib(`lastneg').time, nocons
    My goal in the code above is to estimate bootstrap standard errors for each individual treatment effect. Yet this is what is returned:
    Code:
    . bootstrap, nodrop: reg te5 ib(`lastneg').time, nocons
    (running regress on estimation sample)
    
    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    50
    insufficient observations to compute bootstrap standard errors
    no results will be saved
    r(2000);
    
    end of do-file
    
    r(2000);
    Naturally, I'd expect the bootstrap SEs to be returned. If I get rid of the bootstrap prefix, we get

    Code:
    . reg te5 ib(`lastneg').time, nocons
    
          Source |       SS           df       MS      Number of obs   =        43
    -------------+----------------------------------   F(42, 1)        =    197.06
           Model |  20.7657675        42  .494423035   Prob > F        =    0.0565
        Residual |  .002508966         1  .002508966   R-squared       =    0.9999
    -------------+----------------------------------   Adj R-squared   =    0.9948
           Total |  20.7682764        43  .482983173   Root MSE        =    .05009
    
    ------------------------------------------------------------------------------
             te5 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            time |
              1  |   .0952546   .0500896     1.90   0.308    -.5411939    .7317031
              2  |   .0376283   .0500896     0.75   0.590    -.5988202    .6740768
              3  |  -.0217828   .0500896    -0.43   0.739    -.6582313    .6146656
              4  |  -.0741609   .0500896    -1.48   0.378    -.7106094    .5622876
              5  |  -.1258603   .0500896    -2.51   0.241    -.7623088    .5105882
              6  |  -.1159347   .0500896    -2.31   0.260    -.7523832    .5205138
              7  |   -.103331   .0500896    -2.06   0.287    -.7397795    .5331174
              8  |  -.0398849   .0500896    -0.80   0.572    -.6763334    .5965636
              9  |   .0090296   .0500896     0.18   0.886    -.6274189    .6454781
             10  |   .0795808   .0500896     1.59   0.358    -.5568677    .7160293
             11  |   .1404565   .0500896     2.80   0.218     -.495992     .776905
             12  |   .0977903   .0500896     1.95   0.301    -.5386582    .7342388
             13  |   .0518745   .0500896     1.04   0.489    -.5845739     .688323
             14  |   .0529453   .0500896     1.06   0.482    -.5835032    .6893938
             15  |   .0453055   .0500896     0.90   0.532     -.591143     .681754
             16  |  -.0016811   .0500896    -0.03   0.979    -.6381296    .6347674
             17  |  -.0322798   .0500896    -0.64   0.636    -.6687283    .6041687
             18  |  -.0548448   .0500896    -1.09   0.471    -.6912933    .5816037
             19  |  -.0901919   .0500896    -1.80   0.323    -.7266404    .5462566
             21  |   .1747319   .0500896     3.49   0.178    -.4617166    .8111804
             22  |  -.0432762   .0500896    -0.86   0.546    -.6797247    .5931723
             23  |  -.2550735   .0500896    -5.09   0.123     -.891522     .381375
             24  |  -.5257106   .0500896   -10.50   0.060    -1.162159    .1107379
             25  |  -.6823086   .0500896   -13.62   0.047    -1.318757   -.0458601
             26  |  -.8041667   .0500896   -16.05   0.040    -1.440615   -.1677182
             27  |  -.9361285   .0500896   -18.69   0.034    -1.572577     -.29968
             28  |  -1.005573   .0500896   -20.08   0.032    -1.642022   -.3691249
             29  |  -1.074268   .0500896   -21.45   0.030    -1.710716   -.4378194
             30  |  -1.007323   .0500896   -20.11   0.032    -1.643771   -.3708741
             31  |  -.9358075   .0500896   -18.68   0.034    -1.572256    -.299359
             32  |  -1.010143   .0500896   -20.17   0.032    -1.646591   -.3736942
             33  |  -1.068733   .0500896   -21.34   0.030    -1.705182    -.432285
             34  |  -1.149782   .0500896   -22.95   0.028    -1.786231   -.5133338
             35  |  -1.214765   .0500896   -24.25   0.026    -1.851213   -.5783162
             36  |   -1.18513   .0500896   -23.66   0.027    -1.821578    -.548681
             37  |  -1.174418   .0500896   -23.45   0.027    -1.810866   -.5379694
             38  |   -1.11872   .0500896   -22.33   0.028    -1.755169   -.4822716
             39  |  -1.063022   .0500896   -21.22   0.030     -1.69947   -.4265732
             40  |  -1.112293   .0500896   -22.21   0.029    -1.748742   -.4758449
             41  |  -.9926846   .0500896   -19.82   0.032    -1.629133   -.3562361
             42  |  -.9901853   .0500896   -19.77   0.032    -1.626634   -.3537368
             43  |  -.9516248   .0500896   -19.00   0.033    -1.588073   -.3151763
    ------------------------------------------------------------------------------

    We can, however, redo this exact same estimation using xtreg. It is the same model, with a little more legwork:
    Code:
    clear *
    
    u basque, clear
    
    tempvar cohort
    
    bys id: egen `cohort' = min(year) if treat==1
    
    bys id: egen cohort = max(`cohort')
    
    g event = year-cohort
    
    bys id: g time = _n
    
    replace time = 0 if missing(cohort)
    
    summ time
    g shifted_ttt = time - r(min)
    summ shifted_ttt if event == 0
    local true_neg1 = r(mean)-1
    cls
    * Regress on the event-time dummies with unit and year FEs,
    * bootstrapping the VCE
    * use ib# to specify the reference period
    xtreg gdpcap ib(`true_neg1').shifted_ttt i.year if inlist(id,2,5,10), fe vce(bootstrap)
    
    
    * Pull out the coefficients and SEs
    g coef = .
    g se = .
    levelsof shifted_ttt, l(times)
    foreach t in `times' {
        replace coef = _b[`t'.shifted_ttt] if shifted_ttt == `t'
        replace se = _se[`t'.shifted_ttt] if shifted_ttt == `t'
    }
    
    * Make confidence intervals
    g ci_top = coef+1.96*se
    g ci_bottom = coef - 1.96*se
    
    * Limit ourselves to one observation per event time
    * (event holds the original timing relative to treatment)
    keep event coef se ci_*
    duplicates drop
    
    sort event
    
    * Create connected scatterplot of coefficients
    * with CIs included with rcap
    * and a line at 0 both horizontally and vertically
    summ ci_top
    local top_range = r(max)
    summ ci_bottom
    local bottom_range = r(min)
    
    twoway (sc coef event, connect(line)) ///
        (rcap ci_top ci_bottom event), ///
        xtitle("Time to Treatment") caption("95% Confidence Intervals Shown") ///
        scheme(sj) xli(-1, lpat(dash)) yli(0)
    My question, then, is: how can I estimate bootstrapped SEs for the first block of code I provided? That is, I need an SE for every period except the one right before treatment begins. Do I need to write a custom bootstrap program to do this, or what other options might I have?
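    For what it's worth, one custom scheme I could imagine is a residual bootstrap: hold the time dummies fixed and resample only the residuals, so no ib().time level ever goes empty (which is presumably why r(2000) appears above — each replicate that -bootstrap- draws omits some periods, and with one observation per time dummy the saturated regression cannot be refit). This is only a hedged sketch, not fdid's methodology: pick, y_star, and the 500 replications are illustrative, it assumes te5, time, and the local lastneg exist as in the first code block, and with only one residual degree of freedom here its statistical usefulness is debatable.
    Code:
    * Residual bootstrap sketch around the saturated event-time regression
    set seed 1000
    qui reg te5 ib(`lastneg').time, nocons
    predict double xb_hat, xb
    predict double e_hat, resid
    
    tempname sim
    tempfile draws
    postfile `sim' double b1 using `draws', replace
    
    forvalues r = 1/500 {
        * draw one residual per row, with replacement
        qui g long pick = ceil(runiform()*_N)
        qui g double y_star = xb_hat + e_hat[pick]
        qui reg y_star ib(`lastneg').time, nocons
        post `sim' (_b[1.time])   // stores period 1's effect only;
                                  // loop over all periods in practice
        drop pick y_star
    }
    postclose `sim'
    
    preserve
    use `draws', clear
    su b1    // the SD of b1 approximates the bootstrap SE for period 1
    restore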

  • #2
    Look at fect (starting at line 580). It appears to use bsample and then re-estimate the model repeatedly (no Mata). With fdid, that might take some time.
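    Roughly, that loop might look like the following — a hedged sketch, not fect's actual code: the 200 replications, the id offset, the xtset call, and e(att) as fdid's stored ATT are all assumptions to check against -ereturn list-.
    Code:
    * Resample donor units with replacement, keep the treated unit,
    * re-run fdid each time, and collect the ATT across replications
    u basque, clear
    bys id: egen byte anytr = max(treat)   // flag the treated unit's rows
    
    preserve
    keep if anytr==1
    tempfile trdata
    save `trdata'
    restore
    keep if anytr==0
    tempfile donors
    save `donors'
    
    tempname sim
    tempfile atts
    postfile `sim' double att using `atts', replace
    
    forvalues r = 1/200 {
        use `donors', clear
        bsample, cluster(id) idcluster(bid)   // resample whole donor units
        replace id = 1000 + bid               // fresh ids for duplicated donors
        drop bid
        append using `trdata'
        qui xtset id year                     // assuming fdid wants an xtset panel
        qui fdid gdpcap, tr(treat)
        post `sim' (e(att))                   // assumed stored result; check ereturn list
    }
    postclose `sim'
    
    use `atts', clear
    su att    // the SD across replications approximates the ATT's bootstrap SE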



    • #3
      Yeah George Ford, this was actually the idea I'd come up with a few days ago. I'm debating going the extra mile and just having fect handle the final estimation on the reduced donor pool. Of course, I'd need to redo a little of fdid's syntax under the hood so that it respects fect's data structure requirements, but I suspect this would be the most convenient way of doing things. After all, when I estimate
      Code:
      fect cigsale if inlist(id,3,4,5,19,21), ///
      treat(treated) unit(id) time(year) se nboots(500) ///
      vartype("bootstrap")
      the returned point estimate is -13.64671, exactly what fdid currently returns. At that point, one could likely use the e(ATTs) matrix to combine the N_1 treated units, calculate cohort ATTs and standard errors, and build event-study plots without much effort. So I'll consider it! The thing is, Kathy didn't do this in her original paper, and (for Stata Journal purposes anyway) I can see reviewers complaining that this method wasn't the one in the original paper, so it would need more validation (simulations, etc.). On the other hand, it would be the most straightforward way to approach this, I think.



      • #4
        That's one interesting thing about fdid: it basically chooses a control group and then proceeds in a fairly straightforward way. If I can find the time, I'll try to implement the fect-style bootstrap in fdid and send it along.
