
  • boottest ~100X faster after IV/GMM

    Prompted by commentary from James MacKinnon, I completely rethought and reworked how boottest performs the wild bootstrap after IV/GMM--more precisely, how it performs the Wild Restricted Efficient bootstrap of Davidson & MacKinnon (2010). On my benchmark, the Steven Levitt criminology example at the end of the "Fast and Wild" paper about boottest, it is now 233 times faster! This change is distinct from the ~10X speed-up after OLS.

    The main things you need to know if you use boottest, or are boottest-curious:
    • One new change can slightly affect results, specifically the bounds for confidence sets--and not just after IV/GMM. I switched to a different search algorithm for pinpointing CI bounds (Chandrupatla 1997). In the bootstrap, the p value as a function of the trial parameter value is a step function when viewed at high resolution: it can only take values n/B where n is an integer and B is the number of bootstrap replications. As a result, when searching for the cross-over point for, say, p = 0.05, values in a small range are equally valid. The new algorithm happens to settle on slightly different points. These discrepancies disappear as you increase the number of replications.
    • The new version is available on SSC with
      ssc install boottest, replace
    • The release history back to 2017 is preserved on GitHub. To install an old version, run
      net install boottest, replace from(
      where X.Y.Z is the version number, such as 3.1.0.
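    To illustrate the first bullet above, here is a minimal pure-Python sketch (toy numbers, not boottest's internals) of why the bootstrap p value is a step function of the trial parameter value, so that a root-finder searching for p = 0.05 faces flat plateaus on which many points are equally valid CI bounds:

```python
import random

random.seed(1)
B = 99                                  # bootstrap replications
# Hypothetical bootstrap t statistics; fixed here for simplicity,
# though in the real WRE bootstrap they vary with the trial value.
boot_t = [random.gauss(0, 1) for _ in range(B)]

def p_value(beta0, beta_hat=1.0, se=0.5):
    """Bootstrap p value for H0: beta = beta0. It counts replications
    whose |t| is at least the observed |t|, so it can only take
    values n/B for integer n."""
    t_obs = abs((beta_hat - beta0) / se)
    n = sum(1 for t in boot_t if abs(t) >= t_obs)
    return n / B

# Moving beta0 away from beta_hat lowers p only in discrete jumps of
# 1/B; between jumps the function is flat, so different root-finding
# algorithms (bisection, Chandrupatla 1997, ...) can legitimately stop
# at slightly different crossover points for p = 0.05.
for beta0 in (1.0, 1.5, 2.0, 2.5):
    print(beta0, p_value(beta0))
```

    Raising B shrinks the step size 1/B, which is why the discrepancies between search algorithms disappear as the number of replications grows.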
    The big idea behind the update is to exploit the Frisch-Waugh-Lovell theorem to reduce the dimensionality of the estimation problem. As explained in "Fast and Wild," when performing a wild bootstrap on an IV/GMM regression, all of the endogenous variables see their values change from replication to replication. Whereas in OLS the dependent variable is the only endogenous variable and the estimator beta = (X'X)^-1 X'Y is linear in it, in IV/GMM there are more endogenous variables, and the estimator is nonlinear in some of them--the instrumented variables. E.g., delta = (Z'MZ)^-1 Z'MY, where Z is the regressor set, which includes endogenous variables. This nonlinearity gets in the way of the optimization tricks I developed for the OLS wild bootstrap.

    What I realized is that by partialling out a control set that consists of, say, 100 exogenous dummy variables, the dimensionality of the estimator can be reduced during bootstrap replication. Then the matrix to be inverted and multiplied each time, Z'MZ, can be 1x1 instead of 100x100. The cost is that on each replication, the partialled-out variables must also be partialled out of the bootstrap versions of the endogenous variables. But this is a linear operation, of the sort that occurs when wild-bootstrapping OLS, which I already knew how to speed up.
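    The partialling-out step rests on the Frisch-Waugh-Lovell theorem, which can be demonstrated in a small pure-Python sketch (hypothetical data and ad-hoc helpers, nothing from boottest): the coefficient on a variable of interest from the full regression equals the coefficient from a one-variable regression of residualized y on residualized x, after the controls are partialled out of both.

```python
def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting, for small systems."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """OLS coefficients via the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def residuals(X, y):
    """Residuals from regressing y on X -- i.e., y with X partialled out."""
    b = ols(X, y)
    return [yi - sum(bi * xi for bi, xi in zip(b, row)) for row, yi in zip(X, y)]

# Made-up data: two exogenous control dummies, one variable of interest
d1 = [1, 1, 0, 0, 1, 0, 1, 0]
d2 = [0, 1, 1, 0, 0, 1, 0, 1]
x  = [1.0, 2.0, 0.5, 1.5, 2.5, 0.8, 1.2, 2.2]
y  = [3.1, 3.8, 0.05, 3.1, 5.9, 0.8, 3.4, 3.35]

# Full regression of y on [x, d1, d2]: a 3x3 problem
beta_full = ols([[xi, a, b] for xi, a, b in zip(x, d1, d2)], y)[0]

# FWL: partial the dummies out of both y and x, then run a 1-variable
# regression -- the matrix to invert shrinks to 1x1
D = [[a, b] for a, b in zip(d1, d2)]
x_res = residuals(D, x)
y_res = residuals(D, y)
beta_fwl = sum(a * b for a, b in zip(x_res, y_res)) / sum(a * a for a in x_res)

print(beta_full, beta_fwl)
```

    The two coefficients agree up to floating-point error, which is what lets the per-replication work shrink from a 100x100 inversion to a 1x1 one.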

    The surgery was radical, so please let me know if you run into problems.
    Last edited by David Roodman; 15 Mar 2021, 09:45.

  • #2
    Dear David,

    My version is 3.1.0. The help file says that it does not support linear GMM regression. May I ask if there is any way to get bootstrapped confidence intervals after GMM regression?

    Many thanks.


    • #3
      You can use Stata's bootstrap prefix command. Equivalently, some Stata commands accept vce(bootstrap). These will do a nonparametric bootstrap. For GMM, this has the advantage that the GMM weight matrix will be recomputed on each bootstrap replication, which boottest never did.
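      What a nonparametric (pairs) bootstrap does can be sketched generically in Python--this is a toy illustration with made-up data, not Stata's implementation: resample whole observations with replacement and redo the entire estimation on each draw, which is why quantities like a GMM weight matrix get recomputed every replication.

```python
import random

random.seed(7)
# Toy data (hypothetical): y roughly 2*x plus noise
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [1.1, 2.3, 2.9, 4.2, 4.8, 6.1, 7.2, 7.9]

def slope(xs, ys):
    """OLS slope through the origin; stands in for the full estimation
    step (moments, weight matrix, ...) that would be redone each draw."""
    return sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)

reps = 999
n = len(x)
draws = []
for _ in range(reps):
    idx = [random.randrange(n) for _ in range(n)]  # resample pairs (x, y)
    draws.append(slope([x[i] for i in idx], [y[i] for i in idx]))

draws.sort()
ci = (draws[24], draws[974])  # approximate 95% percentile interval
print(slope(x, y), ci)
```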


      • #4
        Dear David,

        Thanks for your suggestion. I tried the bootstrap prefix command, and it raised the error
        "repeated time values within panel
        the most likely cause for this error is misspecifying the cluster(), idcluster(), or group() option".

        I searched this forum a bit for this issue, but I couldn't find a solution for GMM estimation.