The choice between strata() and cluster() when using vce(bootstrap)

kristoffer.backstrom

Join Date: Nov 2014

Posts: 2
#1

12 Nov 2014, 00:49

Hi Stata users,

I would like to fit a negative binomial model with bootstrapped standard errors to an unbalanced panel data set covering a number of countries over a thirty-year period. I intend to include fixed effects by adding a dummy variable for each country. However, when using vce(bootstrap) I have to decide between cluster() and strata(). Which should I choose? As far as I understand, I must specify one of the two in order to obtain correct standard errors, given the panel structure of the data.

To give you an idea of the model I'm trying to run, I intend to use the following code:

nbreg PAT QUOTA DPRICEIND RDEXP TOTP DAUSTRIA DBELGIUM DDENMARK DFINLAND DFRANCE DSPAIN DSWEDEN, dispersion(mean) vce(bootstrap, strata(CTRYID) reps(200))

In vce(bootstrap), should I replace strata() with cluster()?

One more question: is it necessary to drop observations containing missing values when using vce(bootstrap)?

Many thanks,
Kristoffer Bäckström
PhD Candidate
Luleå University of Technology
Maarten Buis

Join Date: Mar 2014

Posts: 3458
#2

12 Nov 2014, 01:34

With the strata() option you will draw independent samples within each country. With the cluster() option you will resample entire countries. Neither will take into account any time dependence within countries. You don't need to remove observations with missing values.
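
As a rough illustration of the two resampling schemes, here is how each would be requested for the model in post #1. This is only a sketch and not part of the original reply: the seed value is arbitrary, the /// line continuations assume the commands are run from a do-file, and which scheme (if either) suits this panel is exactly the question being discussed.

```stata
* strata(CTRYID): observations are resampled with replacement
* separately within each country, so every country keeps its
* original number of observations in each replication.
set seed 12345
nbreg PAT QUOTA DPRICEIND RDEXP TOTP DAUSTRIA DBELGIUM DDENMARK ///
    DFINLAND DFRANCE DSPAIN DSWEDEN, dispersion(mean)            ///
    vce(bootstrap, strata(CTRYID) reps(200))

* cluster(CTRYID): whole countries are resampled with replacement,
* so all rows of a drawn country enter the replication together
* and some countries may not be drawn at all.
set seed 12345
nbreg PAT QUOTA DPRICEIND RDEXP TOTP DAUSTRIA DBELGIUM DDENMARK ///
    DFINLAND DFRANCE DSPAIN DSWEDEN, dispersion(mean)            ///
    vce(bootstrap, cluster(CTRYID) reps(200))
```

One caution about the second form: when a country is not drawn in a replication, its dummy is zero for every row in that replication, so with only a handful of countries some replications may drop coefficients or fail.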

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
kristoffer.backstrom

Join Date: Nov 2014

Posts: 2
#3

12 Nov 2014, 02:07

Thank you for the reply. But when is it reasonable to choose strata() over cluster(), or the other way around? I assume there are situations where one is preferred over the other. Could you also please elaborate on "Neither will take into account any time dependence within countries"?

According to the Stata manual, observations with missing values need to be removed when using the bootstrap prefix. However, it does not say anything about such observations when using vce(bootstrap), which made me assume that they do not have to be removed in that case. Still, I do not get the same results when I fit the same model twice, once on the dataset that includes observations with missing values and once on the dataset without them (I set the same random-number seed for both runs).
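
One way to put such a comparison on an equal footing is to restrict the data to the rows the model actually uses before bootstrapping. The sketch below assumes the variable names from post #1; nmiss is a hypothetical helper variable, the seed is arbitrary, and the /// continuations assume a do-file. It does not explain why the two runs differ, it only removes the differing row set as a possible cause.

```stata
preserve    // keep the full dataset recoverable

* Count rows with a missing value in any non-dummy model variable.
egen byte nmiss = rowmiss(PAT QUOTA DPRICEIND RDEXP TOTP)
count if nmiss > 0

* Fit once without bootstrapping to mark the estimation sample,
* then keep only the rows that were used.
quietly nbreg PAT QUOTA DPRICEIND RDEXP TOTP DAUSTRIA DBELGIUM ///
    DDENMARK DFINLAND DFRANCE DSPAIN DSWEDEN, dispersion(mean)
keep if e(sample)

* Bootstrap on the restricted data with a fixed seed.
set seed 12345
nbreg PAT QUOTA DPRICEIND RDEXP TOTP DAUSTRIA DBELGIUM          ///
    DDENMARK DFINLAND DFRANCE DSPAIN DSWEDEN, dispersion(mean)  ///
    vce(bootstrap, strata(CTRYID) reps(200))

restore     // bring back the original data
```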
MB Ross

Join Date: Apr 2015

Posts: 15
#4

03 Apr 2016, 10:24

I've also been looking for an explanation of the difference between bootstrapping on strata and bootstrapping on clusters, and haven't found one. I'm hoping that nudging this old thread might get some answers.

I'm wondering in what situation you would use strata rather than clusters. For instance, if clusters are unbalanced in size and some of them are very small, would it be better to use strata, which ensures that every stratum is sampled? Does Stata compute the SE differently for each?