Bootstrapped Standard Errors with Panel Data: Works with "simulate" but not "bootstrap"

David Beheshti

Join Date: Mar 2018
Posts: 36

Bootstrapped Standard Errors with Panel Data: Works with "simulate" but not "bootstrap"

24 May 2018, 14:37

I have panel data (state-by-year) and am running an event study where I interact year fixed effects with the treatment variable. I want to get bootstrapped standard errors on the average annual effect for the years after the treatment takes place. Specifically, I want a standard error for the average of six coefficients: 2011.year#c.xvar-2016.year#c.xvar. Since I have panel data, I would like to bootstrap by resampling clusters.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
set more off
input double STFIPS float year byte xvar float yvar
 1 2004 64 18.951807
 1 2005 64    7.7751
 1 2006 64 10.911647
 1 2007 64 14.136546
 1 2008 64 16.875502
 1 2009 64 17.493977
 1 2010 64 15.859438
 1 2011 64 18.819277
 1 2012 64 21.558233
 1 2013 64  24.91566
 1 2014 64  30.17269
 1 2015 64 26.771086
 1 2016 64  37.99197
 2 2004 91 12.251126
 2 2005 91  18.25563
 2 2006 91 13.364865
 2 2007 91  21.20946
 2 2008 91  14.13964
 2 2009 91 14.962838
 2 2010 91 15.108109
 2 2011 91  14.13964
 2 2012 91 22.565315
 2 2013 91 24.114864
 2 2014 91 28.763514
 2 2015 91  32.83108
 2 2016 91  35.73649
 4 2004 74  25.71491
 4 2005 74  25.60088
 4 2006 74  22.29386
 4 2007 74 24.688597
 4 2008 74 24.061403
 4 2009 74  22.06579
 4 2010 74     29.25
 4 2011 74 33.811405
 4 2012 74 36.035088
 4 2013 74  38.71491
 4 2014 74  46.64035
 4 2015 74  46.92544
 4 2016 74  54.56579
 5 2004 64     17.64
 5 2005 64     15.54
 5 2006 64     21.77
 5 2007 64     28.84
 5 2008 64     27.37
 5 2009 64      23.8
 5 2010 64     28.28
 5 2011 64     22.68
 5 2012 64     22.75
 5 2013 64     24.08
 5 2014 64 34.719997
 5 2015 64     41.58
 5 2016 64     41.09
 6 2004 46        26
 6 2005 46  25.42339
 6 2006 46  23.53629
 6 2007 46  23.69355
 6 2008 46  22.22581
 6 2009 46  21.64919
 6 2010 46  20.18145
 6 2011 46  28.46371
 6 2012 46  30.14113
 6 2013 46  33.33871
 6 2014 46 36.326614
 6 2015 46 37.008064
 6 2016 46  38.52822
 8 2004 60    22.125
 8 2005 60 18.160713
 8 2006 60 18.160713
 8 2007 60    19.875
 8 2008 60 22.660713
 8 2009 60  23.94643
 8 2010 60 22.982143
 8 2011 60  25.82143
 8 2012 60 28.767857
 8 2013 60 32.142857
 8 2014 60  39.21429
 8 2015 60 35.732143
 8 2016 60  44.78571
 9 2004 89  34.79427
 9 2005 89    34.875
 9 2006 89  35.76302
 9 2007 89 29.466146
 9 2008 89 36.005207
 9 2009 89  30.27344
 9 2010 89 29.466146
 9 2011 89 27.609375
 9 2012 89  31.96875
 9 2013 89  39.88021
 9 2014 89  51.42448
 9 2015 89  52.95833
 9 2016 89  71.92969
10 2004 69  14.25352
10 2005 69  35.94366
10 2006 69  45.03286
10 2007 69 33.361504
10 2008 69  45.65258
10 2009 69 33.568077
10 2010 69  26.02817
10 2011 69  37.69953
10 2012 69  48.23474
end


cap program drop xt_boot
program define xt_boot, rclass
    preserve
        bsample, cluster(STFIPS) idcluster(newID)
        reg `1' b2009.year##c.`2' i.STFIPS
        return scalar avg_effect = (_b[2011.year#c.`2']+_b[2012.year#c.`2']+ ///
                       _b[2013.year#c.`2']+_b[2014.year#c.`2']+ ///
                       _b[2014.year#c.`2']+_b[2016.year#c.`2'])*(1/6) 
    restore
end 

reg yvar b2009.year##c.xvar i.STFIPS, cluster(STFIPS)
scalar observed_avg = (_b[2011.year#c.xvar]+_b[2012.year#c.xvar]+ ///
           _b[2013.year#c.xvar]+_b[2014.year#c.xvar]+ ///
           _b[2014.year#c.xvar]+_b[2016.year#c.xvar])*(1/6)

preserve
    simulate avg_effect=r(avg_effect), reps(10) seed(123): xt_boot yvar xvar
    bstat avg_effect, stat(observed_avg)
restore

I believe the above code does what I describe. However, when I try to do the same thing with the bootstrap command below (see this post for reference: https://www.stata.com/support/faqs/s...th-panel-data/)

Code:

cap program drop xt_boot2
program define xt_boot2, rclass
    reg `1' b2009.year##c.`2' i.STFIPS
    return scalar avg_effect = (_b[2011.year#c.`2']+_b[2012.year#c.`2']+ ///
                   _b[2013.year#c.`2']+_b[2014.year#c.`2']+ ///
                   _b[2014.year#c.`2']+_b[2016.year#c.`2'])*(1/6) 
end 

gen newID = STFIPS
xtset newID year
bootstrap avg_effect = r(avg_effect), rep(10) nodrop seed(123) ///
    cluster(STFIPS) idcluster(newID): xt_boot2 yvar xvar

I get the following error:

insufficient observations to compute bootstrap standard errors
no results will be saved
r(2000);

I think I am making a mistake in setting the cluster and idcluster options, but I'm not sure what exactly the issue is.

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#2

24 May 2018, 15:48

If you add the -noisily- option to your bootstrap command you will see that all of the regressions fail with a message that the colinearity pattern in the individual bootstrap runs do not match that in the original full sample. The problem is that you cannot both sample on clusters and then use i.cluster in the regression, because there will be different clusters in each regression, so that the results from different bootstrap samples will be based on different models and cannot be pooled together.

I think what you want is to use -strata()- instead of -cluster-. This, however, will sample individual observations within strata, and will guarantee representation from every STFIPS.

Code:

bootstrap avg_effect = r(avg_effect), rep(10) nodrop seed(123) /// strata(STFIPS): xt_boot2 yvar xvar

Or, if you really want to do bootstrap resampling at the STFIPS cluster level, with the -cluster()- option, then you have to take i.STFIPS out of the regression model because there is no way to assure that it will mean the same thing each time. (In fact you can be assured that it will vary considerably from one call to the next.)

You may be wondering why it worked with your first approach. The answer is that by using -bsample- inside the program instead of calling -bootstrap- you bypassed the code that checks for problems like this. So you fooled Stata into putting together the results from different regression models as if they were one. Stata complied and gave you answers, but I believe those answers are meaningless. You are bootstrapping coefficients from different regressions on each cycle and that makes no sense.

Finally, I don't really get why you want to do any of this in the first place. You're not trying to get standard errors on some exotic statistic: your looking for standard errors on regression coefficients. Why do you feel the need to -bootstrap- for this?
1 like
Comment
David Beheshti

Join Date: Mar 2018

Posts: 36
#3

24 May 2018, 16:45

Hi Clyde,

Thank you for your very helpful answer! In response to your last point, I just greatly underestimated how simple it is to get standard errors on linear combinations of regression coefficients using lincom.

However, I'm still curious (say if I was trying to get the standard error on an exotic statistic) if I changed the first approach to include i.newID as opposed to i.STFIPS, i.e.:

cap program drop xt_boot
program define xt_boot, rclass
preserve
bsample, cluster(STFIPS) idcluster(newID)
qui reg `1' b2009.year##c.`2' i.newID
return scalar avg_effect = (_b[2011.year#c.`2']+_b[2012.year#c.`2']+ ///
_b[2013.year#c.`2']+_b[2014.year#c.`2']+ ///
_b[2014.year#c.`2']+_b[2016.year#c.`2'])*(1/6)
restore
end

reg yvar b2009.year##c.xvar i.STFIPS, cluster(STFIPS)
scalar observed_avg = (_b[2011.year#c.xvar]+_b[2012.year#c.xvar]+ ///
_b[2013.year#c.xvar]+_b[2014.year#c.xvar]+ ///
_b[2014.year#c.xvar]+_b[2016.year#c.xvar])*(1/6)

preserve
simulate avg_effect=r(avg_effect), reps(2000) seed(123): xt_boot yvar xvar

bstat avg_effect, stat(observed_avg)
restore

This seems to do what I initially thought I wanted and (when I increase reps) gives standard errors which are nearly identical to what I get from

qui reg yvar b2009.year##c.xvar i.STFIPS, cluster(STFIPS)
lincom (_b[2011.year#c.xvar]+_b[2012.year#c.xvar]+ ///
_b[2013.year#c.xvar]+_b[2014.year#c.xvar]+ ///
_b[2014.year#c.xvar]+_b[2016.year#c.xvar])*(1/6)

although maybe this is a coincidence? I'm not sure this actually addresses your point about there being different clusters in each regression.
Comment

Announcement

Bootstrapped Standard Errors with Panel Data: Works with "simulate" but not "bootstrap"

Comment

Comment