Dear statalist,
This is my first post here, but I've been lurking for a while. Thanks for all the useful information. I am the author of the mtefe package, which estimates marginal treatment effects and was released in the latest issue of the Stata Journal.
I am now revising the package, in particular how it handles the cluster bootstrap, and I've come across a problem that seems so fundamental that there must be a best-practice solution I haven't been able to find. It should arise in any estimation command that accepts both fixed effects and a cluster bootstrap. Any help is greatly appreciated.
The problem is that the program lets the user specify a cluster variable together with the bootstrap, which tells mtefe to cluster bootstrap on that variable. The user may also include the cluster variable in any of the independent variable lists, most commonly as a set of fixed effects. In those cases the size of the coefficient matrix varies over bootstrap replications and Stata throws an error. In the current version, I solved this by simply omitting all coefficients related to the cluster variable before posting the results, a very manual solution. I am now trying to rewrite the program to use the idcluster() option of bootstrap. To this end, I replace the cluster variable with the temporary idcluster variable before running the bootstrap, but this creates a few problems:
- The idcluster() option creates ID values ranging from 1 to G, the total number of clusters. What if the user's original variable wasn't sequentially coded? If the cluster variable enters only as fixed effects, then the only problem is the labeling of the output table.
- But if the user specified the cluster variable to enter linearly, the coefficients change. This is arguably a weird specification, but ideally I want my program to allow it in some reasonable way.
- This solution also clusters on the temporary variable, so the text "(Replications based on 10 clusters in __00001C)" appears above the table rather than the name of the original cluster variable.
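To isolate the recoding issue in the first two points from the bootstrap itself, here is a minimal sketch. It assumes egen's group() as a stand-in for the 1-to-G recoding (idcluster() additionally gives repeated draws of the same cluster distinct IDs), and seqid is a hypothetical variable name used only for illustration:

```stata
sysuse auto, clear
replace rep78 = rep78 + 5 if rep78 == 4   // codes are now 1, 2, 3, 5, 9
egen seqid = group(rep78)                 // assumption: stand-in for a 1..G recoding
tabulate rep78 seqid                      // shows the mapping from original to sequential codes

* As fixed effects, both codings span the same dummies: same fit, different labels
regress price i.rep78
regress price i.seqid

* Entered linearly, the spacing between codes matters, so the slope generally differs
regress price c.rep78
regress price c.seqid
```

The two fixed-effects regressions should differ only in how the dummies are labeled, while the two linear regressions estimate different slopes because the recoding changes the distance between cluster values.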
Code:
sysuse auto, clear
bootstrap, reps(2) cluster(rep78): regress price i.rep78 // Does not work because of varying coefficient size
replace rep78=rep78+5 if rep78==4 // Just to create a non-sequential variable
gen tempclustvar=rep78
bootstrap, reps(2) cluster(tempclustvar) idcluster(rep78): regress price i.rep78 // Does not work because fixed effects are numbered 1-5 for bootstrap replications
drop rep78
egen rep78=group(tempclustvar)
bootstrap, reps(2) cluster(tempclustvar) idcluster(rep78): regress price i.rep78 // This works, but a) the cluster variable label is "tempclustvar" and b) the fixed effects are mislabeled
bootstrap, reps(2) cluster(tempclustvar) idcluster(rep78): regress price rep78 // Doesn't even recover the coefficient on the original c.rep78
regress price tempclustvar // For comparison