  • Bootstrapping with dummies

    Dear Stata Forum,
    I have data with a small number of clusters but a large number of individuals within each cluster.
    I have four household surveys covering 4 countries and 26 regions (pooled cross-section).

    To overcome the overstated standard errors, I'd like to bootstrap my results.
    However, a regional dummy variable is causing me trouble.
    I already know that sometimes a region is included in a bootstrap sample and sometimes left out.
    That is all right, since I am not interested in the coefficients of the regional dummies.
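    A rough illustration of how often this happens, assuming -bootstrap- with cluster(country) draws the 4 countries with replacement: a given country, and every region inside it, is absent from a replicate with probability (3/4)^4 = 81/256 (about 0.32), and only 24/256 of replicates contain all four countries. The simulation below checks that figure; the seed, the number of draws and the variable names absent1 and draw1-draw4 are arbitrary illustrative choices.

```stata
* Monte Carlo check: how often is one given country (country 1)
* left out when 4 countries are resampled with replacement?
clear
set seed 12345
set obs 10000                      // one observation = one bootstrap draw
generate byte absent1 = 1          // stays 1 until country 1 is drawn
forvalues k = 1/4 {
    generate byte draw`k' = runiformint(1, 4)
    replace absent1 = 0 if draw`k' == 1
}
summarize absent1                  // mean should be close to (3/4)^4
display "exact probability country 1 is absent: " (3/4)^4     // = 81/256
display "probability all 4 countries appear: " 4*3*2*1/4^4    // = 24/256
```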

    I am using Stata 16.

    Below you can see my data:
    Code:
    clear
    input double rem_edu float(handling_edu1 ln_exp_total ln_remittance) double(rem_freq oecd femmigrant childprim childsec childtert elderlyageabove62 womenabove15) float(hh_edu head_gender) byte(hagriland urban) float(incomepoverty school) byte region
                      0 .03949346  6.050091  4.544163  5 0 1 0 0 0 1 1 2 0 1 1 .5724485 .9207876 11
                      0 .03949346   6.18593  5.642776  3 1 1 0 0 0 1 1 1 0 1 1 .5724485 .9207876 11
       .800000011920929 .03949346  6.777756  6.510276 10 0 0 0 0 0 0 4 3 1 1 1 .5724485 .9207876 11
                      1 .03949346  6.255384  6.510276  5 0 1 2 0 0 0 1 1 1 1 1 .5724485 .9207876 11
                      0 .03949346  5.062107  5.370842  5 1 1 3 2 0 0 2 1 1 1 1 .5724485 .9207876 11
                      0 .03949346  5.837912  7.818609 16 1 1 2 0 0 0 3 3 1 0 1 .5724485 .9207876 11
                    .75 .03949346  7.687905  8.685028  3 1 0 2 1 0 0 3 3 0 0 1 .5724485 .9207876 11
                      0 .03949346  6.927889  5.163203  4 1 1 0 0 0 1 3 2 0 1 1 .5724485 .9207876 11
                      0         0  4.999174  5.999451  2 1 1 1 0 0 0 1 2 0 1 0 .6428978 .7905115 12
                      0         0  4.515176         .  3 1 1 0 0 0 0 1 2 0 1 1 .6428978 .7905115 12
      .4000000059604645         0  8.181123  7.203424  2 1 1 0 0 0 0 3 2 0 1 0 .6428978 .7905115 12
      .5882353186607361         0  6.092485  7.916373 12 0 0 3 0 0 0 3 1 1 1 0 .6767302        1 12
     .44117647409439087         0  4.240481  6.181772 12 0 0 5 2 0 0 1 1 1 1 0 .6767302        1 12
    end
    And this is the regression command I use:
    Code:
    xtset country
    bootstrap, nodrop cluster(country) idcluster(countryb): xtreg rem_edu handling_edu1 ln_exp_total ln_remittance rem_freq oecd femmigrant childprim childsec childtert elderlyageabove62 womenabove15 i.hh_edu head_gender i.hagriland i.urban incomepoverty school i.region, re vce(cluster country)

    Is there any solution to the dummy problem?

    Kind regards
    Alexander
    Last edited by Alexander Marx; 30 May 2020, 03:01.

  • #2
    Alexander:
    can't you simply bootstrap your standard errors (see the SE options under -xtreg-)?
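    A minimal sketch of what requesting bootstrap standard errors through -xtreg-'s own vce() option looks like. It uses Stata's nlswork example dataset (downloaded by -webuse-, so an internet connection is assumed) purely to show the syntax; reps(50) and seed(12345) are arbitrary illustrative values. As the output in the next post shows ("Replications based on 4 clusters in country"), vce(bootstrap) with -xtreg- resamples whole panels.

```stata
* Syntax demo on an example panel dataset, not on the thread's data
webuse nlswork, clear
xtset idcode year
* reps() = number of bootstrap replications; seed() makes them reproducible
xtreg ln_wage grade age ttl_exp tenure, re vce(bootstrap, reps(50) seed(12345))
* With the thread's variables the call would start like this (list shortened):
* xtreg rem_edu handling_edu1 ln_exp_total ln_remittance, re vce(bootstrap, reps(500) seed(12345))
```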
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      Do you mean by using
      Code:
      xtreg y x, re vce(boot)
      ?

      It doesn't give me standard errors:

      Code:
      Random-effects GLS regression                   Number of obs     =      2,268
      Group variable: country                         Number of groups  =          4
      
      R-sq:                                           Obs per group:
           within  = 0.0805                                         min =        332
           between = 0.9977                                         avg =      567.0
           overall = 0.2180                                         max =        818
      
                                                      Wald chi2(0)      =          .
      corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =          .
      
                                             (Replications based on 4 clusters in country)
      ------------------------------------------------------------------------------------
                         |   Observed   Bootstrap                         Normal-based
                 rem_edu |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------------+----------------------------------------------------------------
           handling_edu0 |  -.0427359          .        .       .            .           .
            ln_exp_total |   .0197525          .        .       .            .           .
           ln_remittance |   .0023672          .        .       .            .           .

      • #4
        Alexander:
        can you please share via -dataex- an excerpt of your data that mirrors exactly your last -xtreg- code?
        In your previous example -country- was not included.
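        For reference, a minimal sketch of how -dataex- is used: it prints an excerpt of the data in memory in a form that can be pasted into a post and re-created by others. The auto dataset and the variables listed here are only an illustration; on the thread's data every variable in the -xtreg- call, including -country-, would be listed.

```stata
* -dataex- prints a copy-and-paste-ready excerpt of the data in memory
sysuse auto, clear
dataex make price mpg foreign in 1/5
* On the thread's data, list all variables used by -xtreg-, starting e.g. with:
* dataex country rem_edu handling_edu1 ln_exp_total ln_remittance region in 1/20
```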
        Kind regards,
        Carlo
        (Stata 19.0)

        • #5
          I am sorry for my mistake.
          Here is the data, including the country variable:
          Code:
          clear
          input float country double rem_edu float(handling_edu1 ln_exp_total ln_remittance) double(rem_freq oecd femmigrant elderlyageabove62 womenabove15) float(hh_edu head_gender) byte(hagriland urban) float(incomepoverty school) byte region
          6 .5882353186607361 0 6.092485 7.916373 12 0 0 0 3 1 1 1 0 .6767302 1 12
          6 .44117647409439087 0 4.240481 6.181772 12 0 0 0 1 1 1 1 0 .6767302 1 12
          6 0 0 3.257499 4.544163 3 0 1 1 2 2 0 1 0 .6767302 1 12
          15 0 .09039909 6.283722 7.56499 6 1 1 0 6 1 0 0 1 .6026112 .238213 19
          15 0 .09039909 4.4023714 . 1 1 0 0 2 1 1 0 1 .6026112 .238213 19
          15 0 .09039909 5.271214 . 10 1 1 0 3 1 0 0 1 .6026112 .238213 19
          15 .062339331954717636 .09039909 5.506657 8.006822 11 1 0 1 5 1 0 1 1 .6026112 .238213 19
          15 .04739336669445038 .09039909 4.201944 7.325038 24 1 0 0 3 1 0 0 1 .6026112 .238213 19
          18 .7627118644067796 .04118309 4.2753835 6.311872 4 0 1 0 1 3 0 1 1 .3454125 .9154598 1
          18 0 .04118309 3.675561 7.353326 2 1 0 0 2 2 0 0 1 .3454125 .9154598 1
          18 0 .04118309 3.433923 6.311872 3 0 0 0 1 2 1 0 1 .3454125 .9154598 1
          14 0 .0625 3.010644 3.9633024 3 0 1 1 1 1 0 1 0 .75 1 15
          14 .18181818181818182 .0625 6.479462 . 0 0 1 0 4 1 0 0 0 .75 1 15
          14 0 .0625 1.3308897 2.7105396 2 0 0 0 1 1 1 1 0 .75 1 15
          end

          This is the regression I performed:
          Code:
          xtreg rem_edu handling_edu1 ln_exp_total ln_remittance rem_freq oecd femmigrant childprim childsec childtert elderlyageabove62 womenabove15 i.hh_edu head_gender i.hagriland i.urban incomepoverty school road i.region, re vce(boot)
          And here is the Stata output:
          Code:
          Random-effects GLS regression                   Number of obs     =      2,268
          Group variable: country                         Number of groups  =          4
          
          R-sq:                                           Obs per group:
               within  = 0.0817                                         min =        332
               between = 0.9982                                         avg =      567.0
               overall = 0.2192                                         max =        818
          
                                                          Wald chi2(0)      =          .
          corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =          .
          
                                                      (Replications based on 4 clusters in country)
          -----------------------------------------------------------------------------------------
                                  |   Observed   Bootstrap                         Normal-based
                          rem_edu |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          ------------------------+----------------------------------------------------------------
                    handling_edu1 |  -.1360502          .        .       .            .           .
                     ln_exp_total |    .019857          .        .       .            .           .
                    ln_remittance |   .0020914          .        .       .            .           .
                         rem_freq |  -.0011525          .        .       .            .           .
                             oecd |  -.0100997          .        .       .            .           .
                       femmigrant |   .0115505          .        .       .            .           .
                        childprim |   .0059648          .        .       .            .           .
                         childsec |   .0248458          .        .       .            .           .
                        childtert |   .0284721          .        .       .            .           .
                elderlyageabove62 |  -.0032485          .        .       .            .           .
                     womenabove15 |   .0002082          .        .       .            .           .
          Last edited by Alexander Marx; 30 May 2020, 08:25.

          • #6
            Alexander:
            Setting aside that the -child*- predictors were not included in your excerpt, your dataset seems to suffer from quasi-extreme multicollinearity.
            This suspicion is also supported by the sky-rocketing between R-sq.
            My advice is to consider a more parsimonious model.
            Please find below what I got after running -xtreg- on your data excerpt:

            Code:
            . xtreg rem_edu handling_edu1 ln_exp_total ln_remittance rem_freq oecd femmigrant elderlyageabove62 womenabove15 i.hh_edu head_gender i.hagriland i.urban incomepoverty school i.region, re vce(cluster country)
            note: 3.hh_edu omitted because of collinearity
            note: head_gender omitted because of collinearity
            note: 1.hagriland omitted because of collinearity
            note: 1.urban omitted because of collinearity
            note: incomepoverty omitted because of collinearity
            note: school omitted because of collinearity
            note: 12.region omitted because of collinearity
            note: 15.region omitted because of collinearity
            note: 19.region omitted because of collinearity
            insufficient observations
            r(2001);
            
            . xtreg rem_edu handling_edu1 ln_exp_total ln_remittance rem_freq oecd femmigrant elderlyageabove62 womenabove15 i.hh_edu head_gender i.hagriland i.urban incomepoverty school i.region, re vce(boot)
            note: head_gender omitted because of collinearity
            note: 1.hagriland omitted because of collinearity
            note: 1.urban omitted because of collinearity
            note: incomepoverty omitted because of collinearity
            note: school omitted because of collinearity
            note: 12.region omitted because of collinearity
            note: 15.region omitted because of collinearity
            note: 19.region omitted because of collinearity
            (running xtreg on estimation sample)
            insufficient observations
            an error occurred when bootstrap executed xtreg
            r(2001);
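            A small check on the excerpt from post #5 (only its country and region columns are re-entered here): if every region belongs to a single country, the region dummies carry the same information as country indicators, which is consistent with the 12.region, 15.region and 19.region collinearity notes above, and it means a country left out of a bootstrap replicate takes all of its regions with it. The variable names same_country and first_country are illustrative choices.

```stata
* Country and region columns of the excerpt in post #5
clear
input float country byte region
 6 12
 6 12
 6 12
15 19
15 19
15 19
15 19
15 19
18 1
18 1
18 1
14 15
14 15
14 15
end
* Region nested in country: within each region, lowest and highest country agree
bysort region (country): generate byte same_country = country[1] == country[_N]
assert same_country == 1
* Number of distinct countries = number of clusters the bootstrap can resample
egen byte first_country = tag(country)
count if first_country
```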
            Kind regards,
            Carlo
            (Stata 19.0)
