On bootstrapping on clusters

Alfonso Sánchez-Peñalver



20 Sep 2019, 14:39

Hi everyone,

I have a panel data set of experimental data on which I want to bootstrap the partial effects. I have programmed my own mixed logit estimator, and I need to identify which clusters are included in each bootstrap iteration so that I can pull the matching draws from the random draw set for the random parameters. Since I will be using my own estimator with ml model, and my own program to calculate the partial effects of the different variables after estimating the mixed logit model, I am wondering exactly how the cluster(varname), idcluster(newvar), and group(varname) options work.

My first question is about the group(varname) option. My understanding is that when bootstrapping with clusters, it is the clusters that are resampled: the cases (individuals) in my data. Every time a case is included in the random sample used for estimation, all the choices (groups) related to that case should be included. Am I right about this? If so, since all occasions (choices) for the case are included each time that case is in the iteration's sample, what do we need group(varname) for? Is it only needed when the command being bootstrapped itself requires a group variable, so that the new variable name can be passed to that command? In my case I would form the data for the estimation myself, based on the clusters selected in the variable created by idcluster(newvar). If my understanding is correct, I won't need this variable, so do I still have to specify it in the bootstrap prefix to avoid an error?
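To make concrete what I mean by resampling clusters, here is a minimal sketch of the logic as I understand it. It is written in Python rather than Stata only to illustrate the idea; it is not how the bootstrap prefix is implemented. The function name resample_clusters, the field names person, occasion, and new_id, and the toy panel are all hypothetical.

```python
import random

def resample_clusters(rows, cluster_key, rng):
    """Draw clusters with replacement, keeping every row of each drawn cluster."""
    # Collect the rows belonging to each original cluster.
    by_cluster = {}
    for row in rows:
        by_cluster.setdefault(row[cluster_key], []).append(row)
    ids = list(by_cluster)
    # Draw as many clusters as there are in the original data, with replacement.
    drawn = [rng.choice(ids) for _ in ids]
    sample = []
    for new_id, orig_id in enumerate(drawn, start=1):
        for row in by_cluster[orig_id]:
            # new_id (hypothetical name) tells repeated draws of one cluster apart.
            sample.append({**row, "new_id": new_id})
    return sample

# Hypothetical toy panel: 3 individuals with 2 choice occasions each.
panel = [{"person": p, "occasion": t} for p in (1, 2, 3) for t in (1, 2)]
boot = resample_clusters(panel, "person", random.Random(12345))
```

In this sketch every drawn individual brings along all of its occasions, which is the behaviour I am assuming in the question above.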

The second question is what to expect of the variable formed by idcluster(newvar). My understanding is that it basically creates a variable with the same number of observations as in the sample, holding the cluster id for all the observations where that cluster was sampled in the iteration sample. Is this correct? If it is, I can then create my own matrices of observations for the dependent and independent variables, and of Halton draws, to use with my estimator in each iteration, as long as the assumption holds that all occasions (groups) within the cluster are used, which goes back to my first question. Also, is the variable created a full variable, or is it a temporary variable that I need to access with the `newvar' syntax?
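The reason I need to know which clusters were drawn is the lookup sketched below, again in Python purely for illustration. The function draws_for_resample, the pairing of original and new ids, and the numbers standing in for Halton draws are all made up; the only point is that a case drawn twice must receive the same draws both times.

```python
def draws_for_resample(drawn_pairs, draws_by_person):
    """Map each resampled cluster to the draws of the original case it came from."""
    # drawn_pairs holds (original id, new id) tuples for one bootstrap iteration.
    return {new_id: draws_by_person[orig_id] for orig_id, new_id in drawn_pairs}

# Placeholder values standing in for pre-generated draws, one list per case.
draws_by_person = {1: [0.11, 0.61], 2: [0.22, 0.72], 3: [0.33, 0.83]}

# Hypothetical iteration in which case 2 is drawn twice and case 1 not at all.
drawn_pairs = [(2, 1), (2, 2), (3, 3)]
draws_for_sample = draws_for_resample(drawn_pairs, draws_by_person)
```

For this lookup to work I need both the original case id and some way of telling repeated draws of the same case apart, which is why the exact content of the idcluster(newvar) variable matters to me.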

My final question is about which estimates of the partial effects to use, and of the mixed logit parameters for that matter. This comes after reading Stata's documentation about bootstrapping. It says that the estimates we should use are those from the estimation on the whole sample, and that bootstrapping should be used only for the standard errors; we shouldn't use the mean of the bootstrap estimates as our estimates of the model's parameters, because those estimators would be biased. Is this correct?
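As I read the documentation, the division of labour would look like the sketch below: the reported estimate comes from the full sample, and the bootstrap replicates are used only for their spread. This is Python for illustration only, with a simple mean standing in for my partial effect; cluster_bootstrap_se and the toy data are hypothetical.

```python
import random
import statistics

def cluster_bootstrap_se(values_by_cluster, estimator, reps, rng):
    """Return the standard deviation of the estimator over cluster resamples."""
    ids = list(values_by_cluster)
    replicates = []
    for _ in range(reps):
        # Resample whole clusters with replacement and pool their observations.
        drawn = [rng.choice(ids) for _ in ids]
        pooled = [v for cid in drawn for v in values_by_cluster[cid]]
        replicates.append(estimator(pooled))
    return statistics.stdev(replicates), replicates

# Hypothetical toy data: 4 clusters with 2 observations each.
data = {1: [0.2, 0.4], 2: [0.5, 0.7], 3: [0.1, 0.3], 4: [0.6, 0.8]}
full_sample = [v for values in data.values() for v in values]

# The reported estimate is computed once, on the whole sample.
point_estimate = statistics.mean(full_sample)
# The replicates are used only to measure variability, not averaged into the estimate.
se, replicates = cluster_bootstrap_se(data, statistics.mean, 200, random.Random(2019))
```

Here point_estimate is what I would report, and se is the bootstrap standard error attached to it; the mean of replicates is never used as the estimate.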

I'd really appreciate any help you guys can provide. Thanks!

Last edited by Alfonso Sánchez-Peñalver; 20 Sep 2019, 14:48.

Alfonso Sanchez-Penalver