
  • Best practice for cluster bootstrapping with fixed effects

    Dear Statalist,

    This is my first post here, but I've been lurking for a while; thanks for all the useful information. I am the author of the mtefe package, which estimates marginal treatment effects and was released in the latest issue of the Stata Journal.

    I am now making some revisions to the package, in particular to how it handles the cluster bootstrap, and I've come across a problem for which there must be some best-practice solution that I haven't been able to find.

    The problem is that the program allows the user to specify a cluster variable and use the bootstrap, which tells mtefe to cluster bootstrap on that variable. The user may also include the cluster variable in any of the independent variable lists, most commonly as a set of fixed effects. In those cases the number of coefficients varies across bootstrap replications, because not every cluster is drawn in every replication, and Stata throws an error. In the current version, this was solved by simply omitting all coefficients related to the cluster variable before posting the results, a very manual solution. I am now trying to rewrite the program to use the idcluster() option of bootstrap. To this end, I replace the cluster variable with the temporary idcluster variable before running the bootstrap, but this creates a few problems:
    • The idcluster() option creates ID values ranging from 1 to G, the total number of clusters. What if the user's original variable wasn't sequentially coded? If the cluster variable enters only as fixed effects, then the only problem is the labeling of the output table.
    • But if the user specified the cluster variable to enter linearly, the coefficients change. This is arguably a weird specification, but ideally I want my program to allow it in some reasonable way.
    • This solution also clusters on the temporary variable, so the text "(Replications based on 10 clusters in __00001C)" appears above the table rather than the name of the original cluster variable.
    Here are some examples using a simple regression to illustrate the problem:

    Code:
    sysuse auto, clear
    bootstrap, reps(2) cluster(rep78): regress price i.rep78 // Does not work because the number of coefficients varies across replications
    replace rep78=rep78+5 if rep78==4 // Just to create a non-sequential variable
    gen tempclustvar=rep78
    bootstrap, reps(2) cluster(tempclustvar) idcluster(rep78): regress price i.rep78 // Does not work because the fixed effects are numbered 1-5 in the bootstrap replications, unlike the original coding
    drop rep78
    egen rep78=group(tempclustvar)
    bootstrap, reps(2) cluster(tempclustvar) idcluster(rep78): regress price i.rep78 // This works, but a) the cluster variable label is "tempclustvar" and b) the fixed effects are mislabeled
    bootstrap, reps(2) cluster(tempclustvar) idcluster(rep78): regress price rep78 // Does not even recover the coefficient on the original c.rep78
    regress price tempclustvar // For comparison
    This problem seems so fundamental that there must exist some best-practice solution I've missed. It should arise in any estimation command that accepts fixed effects and a cluster bootstrap. Any help is greatly appreciated.

  • #2
    For anyone else interested, I solved this by a) generating a new cluster variable ranging from 1 to G before calling bootstrap and cluster bootstrapping on a temporary copy of this variable and b) replacing the name of the clustered variable with the original name after the bootstrap using ereturn.

    I do the replacement in a) only if the variable list contains .clustvar, so I think it is possible for a user to recreate the problem using a specification containing e.g. some fixed effects for the cluster variable _and_ a linear term in the cluster variable. This is arguably a weird specification, but I'd ideally like my program to reproduce the same coefficients with and without the bootstrap even for weird specifications. Any input on how to solve this is appreciated.
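
    A minimal sketch of steps a) and b) on the auto data, for illustration only; it is not the actual mtefe code. The names relabel_cluster, fe_id and boot_cl are hypothetical, and the sketch assumes that bootstrap stores the cluster variable name in e(cluster), which is worth confirming with ereturn list. The small eclass wrapper is there because ereturn local can only be called from within an eclass program.

    Code:
    * Hypothetical helper (not part of mtefe): overwrite the stored cluster name.
    * Assumes bootstrap keeps that name in e(cluster); confirm with -ereturn list-.
    capture program drop relabel_cluster
    program define relabel_cluster, eclass
        args origname
        ereturn local cluster "`origname'"
    end

    sysuse auto, clear
    drop if missing(rep78)
    replace rep78 = rep78 + 5 if rep78 == 4   // make the coding non-sequential
    egen fe_id = group(rep78)                 // a) recode the clusters as 1..G
    gen boot_cl = fe_id                       // temporary copy to resample on
    bootstrap, reps(50) seed(12345) cluster(boot_cl) idcluster(fe_id): regress price i.fe_id
    relabel_cluster rep78                     // b) put the original name back
    display "`e(cluster)'"                    // check what is now stored

    Because fe_id already runs from 1 to G, the levels of i.fe_id are the same in the full sample and in every replication. As noted above, this does not help when the cluster variable also enters linearly.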
    Last edited by Martin Andresen; 30 Apr 2018, 02:58.


    • #3
      Hi Martin,

      Has there been any development on this interesting idea? I am facing a similar problem estimating a biprobit model with both fixed effects and a cluster bootstrap.
      Any ideas on how to solve it?

      Regards



      • #4
        Hey guys,

        Bootstrapping with fixed effects is a known pain. Please take a look at this solution:
        https://twitter.com/instrumenthull/s...69316010389516
        This is a neat way to proceed.
        Kind regards,
        Sergey Alexeev | The University of Sydney
        https://alexeev.pw/
