Best practice for cluster bootstrapping with fixed effects

Martin Andresen

Join Date: Apr 2018

Posts: 6
#1

23 Apr 2018, 03:22

Dear Statalist,

This is my first post here, though I've been lurking for a while; thanks for all the useful information. I am the author of the mtefe package, which estimates marginal treatment effects and was released in the latest issue of the Stata Journal.

I am now revising the package, in particular how it handles the cluster bootstrap, and I've come across a problem for which there must be a best-practice solution that I haven't been able to find.

The program allows the user to specify a cluster variable and request the bootstrap, which tells mtefe to cluster bootstrap on that variable. The user may also include the cluster variable in any of the independent variable lists, most commonly as a set of fixed effects. In those cases the size of the coefficient matrix varies across bootstrap replications and Stata throws an error. In the current version, I solved this by simply omitting all coefficients related to the cluster variable before posting the results, a very manual solution. I am now trying to rewrite the program to use the idcluster() option of bootstrap. To this end, I replace the cluster variable with the temporary idcluster variable before running the bootstrap, but this creates a few problems:

1. The idcluster() option creates ID values ranging from 1 to G, the total number of clusters. What if the user's original variable wasn't sequentially coded? If the cluster variable enters only as fixed effects, the only problem is the labeling of the output table. But if the user specified the cluster variable to enter linearly, the coefficients change. This is arguably a weird specification, but ideally I want my program to allow it in some reasonable way.

2. This solution also clusters on the temporary variable, so the text "(Replications based on 10 clusters in __00001C)" appears above the table rather than the name of the original cluster variable.

Here are some examples using a simple regression to illustrate the problem:

Code:

sysuse auto, clear
bootstrap, reps(2) cluster(rep78): regress price i.rep78 // Does not work because of varying coefficient size
replace rep78 = rep78 + 5 if rep78 == 4 // Just to create a non-sequential variable
gen tempclustvar = rep78
bootstrap, reps(2) cluster(tempclustvar) idcluster(rep78): regress price i.rep78 // Does not work because fixed effects are numbered 1-5 for bootstrap replications
drop rep78
egen rep78 = group(tempclustvar)
bootstrap, reps(2) cluster(tempclustvar) idcluster(rep78): regress price i.rep78 // This works, but a) the cluster variable label is "tempclustvar" and b) the fixed effects are mislabeled
bootstrap, reps(2) cluster(tempclustvar) idcluster(rep78): regress price rep78 // Does not even recover the coefficient on the original c.rep78
regress price tempclustvar // For comparison

This problem seems so fundamental that there must be a best-practice solution I've missed. It should arise in any estimation command that accepts fixed effects and a cluster bootstrap. Any help is greatly appreciated.

Stata Journal | Article

https://www.stata-journal.com
Martin Andresen

Join Date: Apr 2018

Posts: 6
#2

30 Apr 2018, 02:38

For anyone else interested, I solved this by a) generating a new cluster variable ranging from 1 to G before calling bootstrap, then cluster bootstrapping on a temporary copy of this variable, and b) replacing the name of the cluster variable with the original name after the bootstrap using ereturn.

I do the replacement in a) only if the variable list contains .clustvar, so I think a user could still recreate the problem with a specification containing, e.g., fixed effects for the cluster variable _and_ a linear term in the cluster variable. This is arguably a weird specification, but ideally I'd like my program to reproduce the same coefficients with and without the bootstrap even for weird specifications. Any input on how to solve this is appreciated.
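Step a) can be sketched roughly as follows, using the auto data from the first post. This is a minimal illustration and not the actual mtefe code: the variable names clust_seq and clust_fe, the added covariate weight, and the reps() and seed() values are assumptions made for the example. It covers only the fixed-effects case, not the linear-term case or the ereturn relabeling in b).

```stata
sysuse auto, clear
drop if missing(rep78)                      // keep only observations with a cluster value
replace rep78 = rep78 + 5 if rep78 == 4     // make the cluster codes non-sequential (1 2 3 5 9)

* Hypothetical names: clust_seq is the sequential 1..G recoding,
* clust_fe is the copy that idcluster() overwrites in each resample.
egen long clust_seq = group(rep78)
generate long clust_fe = clust_seq

* Resample whole clusters on clust_seq; idcluster() renumbers the drawn
* clusters 1..G in clust_fe, so i.clust_fe has the same levels in every
* replication as in the original sample.
bootstrap, reps(50) seed(12345) cluster(clust_seq) idcluster(clust_fe): ///
    regress price weight i.clust_fe
```

Note that within a replication, level k of clust_fe refers to the k-th resampled cluster rather than to a fixed original cluster, so the bootstrap standard errors are meaningful for weight but not for the individual fixed effects.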

Last edited by Martin Andresen; 30 Apr 2018, 02:58.
Rezart Hoxhaj

Join Date: Aug 2018

Posts: 21
#3

04 Oct 2018, 04:43

Hi Martin,

Has there been any development on this? I am facing the same problem when estimating a biprobit with both fixed effects and cluster bootstrapping.
Any idea on how to solve it?

Regards
Sergey Alexeev

Join Date: Oct 2016

Posts: 34
#4

17 Aug 2022, 21:18

Hey guys,

Bootstrapping with FE is a known pain in the back. Please inspect this solution.
https://twitter.com/instrumenthull/s...69316010389516
This is a neat way to proceed.

Kind regards,
Sergey Alexeev | The University of Sydney
https://alexeev.pw/
