Bootstrapping failures when including regional fixed effects

Basile Boulay

Join Date: Jun 2016

Posts: 4
#1

27 Jun 2016, 07:20

Dear Stata users
I am having issues with bootstrapping standard errors after including regional fixed effects in my regressions. I know this has been discussed in some previous posts and it seems to be a recurrent issue, but I have not yet been able to solve my problem with the information posted there.

If I run my command without regional fixed effects as

Code:

bootstrap, reps(1000) seed(8976): probit y x z, cluster(hh)

Stata has no problem and all 1,000 replications are successful. However, if I run instead:

Code:

bootstrap, reps(1000) seed(8976): probit y x z i.region, cluster(hh)

many replications fail. Using the noisily option delivers the usual message:

collinearity in replicate sample is not the same as the full sample, posting missing values

insufficient observations to compute bootstrap standard errors
no results will be saved

My bet is that when a new sample is drawn, observations for a particular region can be missing and hence cause the problem (given that some regions have a small number of observations, this is quite possible). My sample size is quite big and the number of regions is low, so that I am confident this is not a problem of insufficient degrees of freedom.

I have read about using the nodrop option, but this does not work. If what is happening is the problem I just mentioned (some regions not appearing in some replications), then I was thinking about specifying more than 1,000 replications and keeping the first thousand successful ones. This may look like a bit of a dirty practice, but to the extent that it is only due to some regions having few observations in the dataset, it could be a solution. Does anyone know how to keep the first 1,000 successful replications out of (say) 3,000?

Many thanks in advance for your help,

Basile
Tags: bootstrap, fixed effects, replication
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#2

27 Jun 2016, 07:31

Basile:
welcome to the list.
Just out of curiosity: can't you switch to clustered standard errors if -bootstrap- is bothering you?
That said, if you want to go along the way that you have described, you can use -runiform()- to act as a filter to delete the -bootstrap- replications in excess.
The following toy-example considers discarding 50 out of 100 faked bootstrap replications:

Code:

. set obs 100
number of observations (_N) was 0, now 100

. g faked_boot_repl=23*runiform()

. g counter=runiform()

. sort counter

. drop if _n>50
(50 observations deleted)
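Applied to real bootstrap output rather than faked replications, the question of keeping the first 1,000 successful replications out of 3,000 might be sketched as follows. This is only a sketch under assumptions: boot_reps is a hypothetical file name, b_x and b_z are hypothetical names for the saved coefficients, and it relies on -bootstrap- writing failed replications to the -saving()- file as missing values, as the "posting missing values" message suggests.

```stata
* Run more replications than needed and save every replicate to disk.
* boot_reps, b_x and b_z are illustrative names, not from the original post.
bootstrap b_x=(_b[x]) b_z=(_b[z]), reps(3000) seed(8976) ///
    saving(boot_reps, replace): probit y x z i.region, vce(cluster hh)

* Load the replicates (this replaces the data in memory).
use boot_reps, clear

* Drop replications that failed (assumed to be stored as missing values).
drop if missing(b_x, b_z)

* Keep the first 1,000 successful replications in their original order.
keep if _n <= 1000

* The standard deviation of each column is the bootstrap standard error.
summarize b_x b_z
```

Note that discarding failed replications means the retained ones are all resamples in which every region happened to appear, so they are no longer an unrestricted bootstrap sample.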

Kind regards,
Carlo
(Stata 19.0)
Basile Boulay

Join Date: Jun 2016

Posts: 4
#3

27 Jun 2016, 07:51

Dear Carlo
Many thanks for your answer. I am sorry I forgot to add that the reason I am bootstrapping in the first place is that one of the regressors is a generated regressor from a previous estimation. Sorry about this. So the code would read:

Code:

bootstrap, reps(1000) seed(8976): probit y x zhat i.region, cluster(hh)

where zhat is the generated regressor.
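If the bootstrap is meant to reflect the uncertainty in zhat, one common arrangement is to re-estimate the first stage inside every resample by wrapping both stages in a program. The following is a minimal sketch, not the code actually used here: the first stage (a linear regression of z on a variable w) and the program name twostage are hypothetical, since the first-stage model is not described in this thread. It does not by itself address the failed replications.

```stata
* Wrap both stages in one program so each bootstrap resample re-estimates zhat.
* The first stage (regress z on w) is hypothetical; w is not a variable from this thread.
capture program drop twostage
program define twostage, eclass
    capture drop zhat                 // remove zhat left over from a previous call
    quietly regress z w               // hypothetical first stage
    quietly predict double zhat, xb   // generated regressor
    probit y x zhat i.region          // second stage; its e(b) is what gets bootstrapped
end

* Resample whole households, matching the clustering used in the thread.
bootstrap _b, reps(1000) seed(8976) cluster(hh): twostage
```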

Thanks a lot
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#4

27 Jun 2016, 08:09

Your diagnosis of the problem may well be correct. I think there is a better solution. Specify the -cluster(region)- and -idcluster(pick_a_new_variable_name)- options in the -bootstrap- prefix and the re-sampling will respect the region structure of your data.

Also, while Stata still accepts the older syntax -cluster(hh)- to specify cluster-robust covariance estimation in the -probit- command, the current and preferred syntax for this is -vce(cluster hh)-. Although it doesn't affect how anything runs, in the context where you need to specify a -cluster()- option in the bootstrap prefix, using -vce(cluster hh)- will be less confusing to read.

All of that said, it looks like you are trying to get around the non-existence of an -xtprobit, fe-. Without trying to counsel you on the wisdom of that, I should point out that if -xtprobit, fe- did exist, you would not be able to specify region as the panel variable and then obtain robust VCE clustered on hh: in fixed-effect estimators the robust VCE clusters have to have panels nested in them, not the other way around. My theoretical knowledge in these areas is limited, so I can't give you a full and robust (no pun intended) explanation of my concerns, but I sense you are skating on thin statistical ice here, even if you can get your code to run and produce output.
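Another -bootstrap- option that may be relevant to respecting the region structure is -strata()-, which is a different option from -cluster(region)-: with -strata(region)-, resampling is carried out separately within each region, so every region is represented in every replicate. A minimal sketch, assuming households are nested within regions and that resampling households within regions suits the design:

```stata
* Resample households (clusters) separately within each region (stratum),
* so that no level of i.region can vanish from a replicate.
bootstrap _b, reps(1000) seed(8976) strata(region) cluster(hh): ///
    probit y x zhat i.region
```

Replications could still fail for other reasons, for example if the outcome is perfectly predicted within a small region in some resample.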
Basile Boulay

Join Date: Jun 2016

Posts: 4
#5

27 Jun 2016, 08:28

Dear Clyde. Thanks for your answer. I have tried this option but it does not work either. In fact, using -cluster(region)- and -idcluster(new_var)- produces a 100 per cent failure rate, so even worse than before... For example:

Code:

bootstrap id, reps(1000) seed(9876) cluster(region) idcluster(id): probit y x zhat i.region, vce(cluster hh)

produces full failure...

Regarding the 'big picture' problem I do agree this can be seen as thin ice, and I will give it more thought!
Basile Boulay

Join Date: Jun 2016

Posts: 4
#6

28 Jun 2016, 03:30

Hi again,
I have found that using the jackknife option instead of the bootstrap option works perfectly fine, even after inclusion of regional fixed effects. It seems that jackknife is a 'special case' of bootstrap, so I'm wondering whether it is OK to use it? In economics research one often comes across bootstrap but not so much jackknife.

I am thinking that if bootstrapping fails because some regions are not included after resampling, then jackknife could be a solution since once the observation has been 'used', it would be dropped in the next resampling procedure (which has N-1 observations), so that it could not cause any failure in the future (as in bootstrap). Does this reasoning make sense or am I on the wrong track?
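For reference, a delete-one-household jackknife of this model might be written as below. This is a sketch that assumes households are the unit to delete; it is not necessarily the command that was actually run. Each jackknife replicate omits one observation (or one cluster) and restores it before the next replicate, so every replicate has one unit fewer than the full sample and the omitted units do not accumulate.

```stata
* Delete-one-cluster jackknife: each replicate drops one household.
jackknife _b, cluster(hh): probit y x zhat i.region
```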

Many thanks again!
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#7

28 Jun 2016, 06:41

Code:

bootstrap id, reps(1000) seed(9876) cluster(region) idcluster(id): probit y x zhat i.region, vce(cluster hh)

is legal syntax, but it is probably not what you intend, and it may be why you get failure on every replication here. The first id, the one immediately after the word bootstrap, does not belong there. (The one in -idcluster(id)- is fine, assuming id does not already exist as a variable in your data.) It is telling Stata to save "the value of id" and bootstrap it. Given that id does not even exist until after the resampling is done (because it is a new variable created by the -idcluster()- option), and given that it is a variable, not a scalar, I'm not sure what Stata makes of it. At the very least, I'm sure that Stata is unable to find anything called id in the returned results of -probit- and therefore is, at best, posting a missing value each time.

If I were you, I would try re-running this with that id removed.
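With that id removed, the command would read as follows. This is a sketch of the syntax only; the thread does not report whether it resolves the failures.

```stata
* Same command as above with the stray id after -bootstrap- removed.
* Assumes no variable named id already exists, since idcluster() creates it.
bootstrap, reps(1000) seed(9876) cluster(region) idcluster(id): ///
    probit y x zhat i.region, vce(cluster hh)
```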
Francis Ostermeijer

Join Date: Nov 2018

Posts: 1
#8

07 Nov 2018, 02:43

Because you are clustering your bootstraps at the regional level and include regional fixed effects, in some bootstrap runs not all the regions are sampled. This means you cannot get an estimate for the fixed effect parameter, which causes an issue with the bootstrap. I had a similar issue. One way I managed to get around it (assuming you are not interested in the fixed effect coefficients) is by estimating the probit model quietly inside a small program, e.g.

Code:

capture program drop spec
program spec, eclass
    qui: probit y x z, vce(cluster hh)
end
bootstrap _b, reps(1000) seed(9876) cluster(region): spec

Then extract the coefficients of interest using -esttab- or -estout-.

Note I also ran an additional -margins- command after quietly running the model, which may be the reason why quietly works in my case.