  • The choice between strata() and cluster() when using vce(bootstrap)

    Hi Stata users,

    I would like to fit a negative binomial model with bootstrapped standard errors on an unbalanced panel data set covering a number of countries over a thirty-year period. I intend to include fixed effects by adding a dummy variable for each country. However, with vce(bootstrap) it seems I have to choose between cluster() and strata(): which one should I use? As far as I understand, one of the two is needed to obtain correct standard errors given the panel structure of the data.

    To give you an idea of the model I'm trying to run, I intend to use the following code:

    nbreg PAT QUOTA DPRICEIND RDEXP TOTP DAUSTRIA DBELGIUM DDENMARK DFINLAND DFRANCE DSPAIN DSWEDEN, dispersion(mean) vce(bootstrap, strata(CTRYID) reps(200))

    In vce(bootstrap), should I replace strata() with cluster()?


    Finally, is it necessary to drop observations containing missing values when using vce(bootstrap)?



    Many thanks,
    Kristoffer Bäckström
    PhD Candidate
    Luleå University of Technology

  • #2
    With the strata() option you will draw independent samples within each country. With the cluster() option you will draw entire countries. Neither will take into account any time dependence within countries. You don't need to remove observations with missing values.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------
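
    To make the difference concrete, here is a minimal sketch of the two variants, using the model from the first post. The local macro name rhs and the seed value are assumptions added for illustration; they are not in the original command. Note that when only a small number of countries is available, resampling whole countries yields relatively few distinct bootstrap samples.

    ```stata
    * Regressors from the original post, stored in a local macro (name is hypothetical)
    local rhs QUOTA DPRICEIND RDEXP TOTP DAUSTRIA DBELGIUM DDENMARK DFINLAND DFRANCE DSPAIN DSWEDEN

    * Variant 1: strata() resamples observations with replacement within each country,
    * so every country appears in every bootstrap sample
    nbreg PAT `rhs', dispersion(mean) vce(bootstrap, strata(CTRYID) reps(200) seed(12345))

    * Variant 2: cluster() resamples whole countries with replacement,
    * keeping all observations of a drawn country together
    nbreg PAT `rhs', dispersion(mean) vce(bootstrap, cluster(CTRYID) reps(200) seed(12345))
    ```

    The point estimates are identical in both runs; only the bootstrap standard errors differ, because the resampling unit differs.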



    • #3
      Thank you for the reply. But when is it reasonable to choose one over the other, e.g. strata() over cluster()? I assume there are situations where one is preferred. Could you please also elaborate on "Neither will take into account any time dependence within countries"?

      According to the Stata manual, observations with missing values need to be removed when using bootstrap. However, it does not say anything about such observations when using vce(bootstrap). This made me assume that you don't have to remove observations with missing values when using vce(bootstrap). Still, I don't receive the same results when fitting the same model twice using a dataset with/without missing value observations (when doing the comparison I set the same random-number seed).
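
      One way to set up such a comparison explicitly is sketched below: flag the complete cases once and fit the model on exactly that subsample. The variable name nmiss and the seed value are hypothetical, and the list passed to rowmiss() assumes that only the non-dummy model variables can contain missing values.

      ```stata
      * Count missing values per observation across the model variables
      * (nmiss is a hypothetical helper variable)
      egen nmiss = rowmiss(PAT QUOTA DPRICEIND RDEXP TOTP)

      * Fit on complete cases only, with a fixed seed so that runs are repeatable
      nbreg PAT QUOTA DPRICEIND RDEXP TOTP DAUSTRIA DBELGIUM DDENMARK DFINLAND DFRANCE DSPAIN DSWEDEN if nmiss == 0, dispersion(mean) vce(bootstrap, strata(CTRYID) reps(200) seed(12345))

      * Number of observations actually used in estimation
      display e(N)
      ```

      Comparing e(N) and the standard errors from this run with those from the run on the full dataset shows whether the two fits use the same estimation sample.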



      • #4
        I've been looking around for an answer to the difference between bootstrapping on strata vs. clusters as well and haven't found an answer. Hoping nudging this old thread might get some answers.

        I'm wondering in what situation you'd use strata() rather than cluster(). For instance, if clusters are unbalanced in size and some clusters are very small, would it be better to use strata(), which ensures that every stratum is sampled? Does Stata compute the SE differently for each?
