  • Possible bug in -bootstrap-

    Currently, I only have access to Stata 11 to 14. In these releases, there appears to be a bug in bootstrap. Here is an example:

    Code:
    // example data
    webuse rate2 , clear
    
    // we only keep the relevant variables
    keep rada radb
    describe
    
    // estimate kappa coefficient
    kap rada radb
    
    // correct results with -bootstrap-
    bootstrap kappa = r(kappa) : kap rada radb
    
    // incorrect results with -bootstrap-
    // probably because -bootstrap- includes its temporary variables
    bootstrap kappa = r(kappa) : kap *
    bootstrap appears to unabbreviate the passed variable list after it has added its own temporary variables to the dataset, so * also picks up those temporary variables. This changes the estimated coefficient(s). A while ago, I found a similar bug in egen and reported it to tech-support; that one has since been fixed. Could someone replicate the above with Stata 15 and/or 16 and confirm that this is a bug?

    Best
    Daniel

  • #2
    Hello Daniel,

    Here is the output from Stata 16 IC, Current update level: 16 Oct 2019

    Code:
    . // estimate kappa coefficient
    . kap rada radb
    
                 Expected
    Agreement   Agreement     Kappa   Std. Err.         Z      Prob>Z
    -----------------------------------------------------------------
      63.53%      30.82%     0.4728     0.0694       6.81      0.0000
    
    . 
    . // correct results with -bootstrap-
    . bootstrap kappa = r(kappa) : kap rada radb
    (running kap on estimation sample)
    
    Warning:  Because kap is not an estimation command or does not set e(sample), bootstrap has no way to determine which observations are used in calculating the statistics and so assumes that all observations
              are used.  This means that no observations will be excluded from the resampling because of missing values or other reasons.
    
              If the assumption is not true, press Break, save the data, and drop the observations that are to be excluded.  Be sure that the dataset in memory contains only the relevant data.
    
    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
    ..................................................    50
    
    Bootstrap results                               Number of obs     =         85
                                                    Replications      =         50
    
          command:  kap rada radb
            kappa:  r(kappa)
    
    ------------------------------------------------------------------------------
                 |   Observed   Bootstrap                         Normal-based
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           kappa |      0.473      0.068    6.985   0.000        0.340       0.605
    ------------------------------------------------------------------------------
    
    . 
    . // incorrect results with -bootstrap-
    . // probably because -bootstrap- includes its temporary variables
    . bootstrap kappa = r(kappa) : kap *
    (running kap on estimation sample)
    
    Warning:  Because kap is not an estimation command or does not set e(sample), bootstrap has no way to determine which observations are used in calculating the statistics and so assumes that all observations
              are used.  This means that no observations will be excluded from the resampling because of missing values or other reasons.
    
              If the assumption is not true, press Break, save the data, and drop the observations that are to be excluded.  Be sure that the dataset in memory contains only the relevant data.
    
    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
    ..................................................    50
    
    Bootstrap results                               Number of obs     =         85
                                                    Replications      =         50
    
          command:  kap *
            kappa:  r(kappa)
    
    ------------------------------------------------------------------------------
                 |   Observed   Bootstrap                         Normal-based
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           kappa |      0.062      0.041    1.526   0.127       -0.018       0.142
    ------------------------------------------------------------------------------
    Martyn


    • #3
      Martyn, thanks for confirming this. In my view, there is no question that this is a bug.

      I have not looked into the code, but here is what I think happens: bootstrap creates a temporary variable to mark the estimation sample. In the example, that variable is constant and selects all observations; this is what the warning tells us. The temporary variable is added to the dataset before bootstrap or, more likely, the bootstrapped command expands the variable list (here: *). Indeed, I can replicate the observed coefficient by adding a constant variable that equals 1 for all observations to the dataset:

      Code:
      // example data
      webuse rate2 , clear
      
      // we only keep the relevant variables
      keep rada radb
      
      // add a constant
      generate byte one = 1
      describe
      
      // replicate the wrong kappa coefficient
      kap rada radb one
      yields

      Code:
      (output omitted)
      
      . kap rada radb one
      
      There are 3 raters per subject:
      
               Outcome |    Kappa          Z     Prob>Z
      -----------------+-------------------------------
                     1 |   -0.0255      -0.41    0.6581
                     2 |    0.0628       1.00    0.1579
                     3 |    0.1905       3.04    0.0012
                     4 |    0.2380       3.80    0.0001
      -----------------+-------------------------------
              combined |    0.0622       1.39    0.0816
      Obviously, we would neither expect nor want this to happen.

      Note that in this example kap, when called through bootstrap, even estimates a different coefficient: Fleiss' kappa for three raters, whereas the original call to kap estimated Cohen's kappa for two raters.
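
      One way to keep such a call from silently switching estimators is a thin wrapper that accepts exactly two variables. The following is only a sketch: kap2 is a hypothetical program name, not part of Stata, and it assumes that the wildcard is expanded inside the wrapper after bootstrap has added its temporary variable, in which case syntax should exit with an error instead of passing three raters to kap.

      Code:
      // hypothetical wrapper: allow exactly two rater variables
      capture program drop kap2
      program define kap2, rclass
          syntax varlist(min=2 max=2 numeric)
          kap `varlist'
          // pass the results of kap, including r(kappa), on to the caller
          return add
      end
      
      // expected to stop with an error rather than report a three-rater kappa
      bootstrap kappa = r(kappa) : kap2 *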

      I will now report this to tech-support.

      Edit:

      Thanks to Carlo, too.

      Best
      Daniel


      • #4
        Daniel and Martyn:
        this is what I get from Stata 15.1 (all files updated as of 27 Oct 2019):
        Code:
        . webuse rate2 , clear
        (Altman p. 403)
        
        . kap rada radb
        
                     Expected
        Agreement   Agreement     Kappa   Std. Err.         Z      Prob>Z
        -----------------------------------------------------------------
          63.53%      30.82%     0.4728     0.0694       6.81      0.0000
        
        . bootstrap kappa = r(kappa) : kap rada radb
        (running kap on estimation sample)
        
        Warning:  Because kap is not an estimation command or does not set e(sample), bootstrap has no way to
                  determine which observations are used in calculating the statistics and so assumes that all
                  observations are used.  This means that no observations will be excluded from the resampling
                  because of missing values or other reasons.
        
                  If the assumption is not true, press Break, save the data, and drop the observations that
                  are to be excluded.  Be sure that the dataset in memory contains only the relevant data.
        
        Bootstrap replications (50)
        ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
        ..................................................    50
        
        Bootstrap results                               Number of obs     =         85
                                                        Replications      =         50
        
              command:  kap rada radb
                kappa:  r(kappa)
        
        ------------------------------------------------------------------------------
                     |   Observed   Bootstrap                         Normal-based
                     |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               kappa |   .4727891   .0705139     6.70   0.000     .3345845    .6109937
        ------------------------------------------------------------------------------
        
        . bootstrap kappa = r(kappa) : kap *
        (running kap on estimation sample)
        
        Warning:  Because kap is not an estimation command or does not set e(sample), bootstrap has no way to
                  determine which observations are used in calculating the statistics and so assumes that all
                  observations are used.  This means that no observations will be excluded from the resampling
                  because of missing values or other reasons.
        
                  If the assumption is not true, press Break, save the data, and drop the observations that
                  are to be excluded.  Be sure that the dataset in memory contains only the relevant data.
        
        Bootstrap replications (50)
        ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
        ..................................................    50
        
        Bootstrap results                               Number of obs     =         85
                                                        Replications      =         50
        
              command:  kap *
                kappa:  r(kappa)
        
        ------------------------------------------------------------------------------
                     |   Observed   Bootstrap                         Normal-based
                     |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               kappa |   .0622045   .0338795     1.84   0.066    -.0041982    .1286071
        ------------------------------------------------------------------------------
        Kind regards,
        Carlo
        (Stata 18.0 SE)


        • #5
          I have reported this to tech-support. Here is more evidence that my diagnosis of the cause of the problem is correct:

          Code:
          // example data
          webuse rate2 , clear
          
          // we only keep the relevant variables
          keep rada radb
          describe
          
          // wrong kappa coefficient
          // note that variable __000000 does not (yet) exist in the dataset
          bootstrap kappa = r(kappa) : kap rada radb __000000
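
          A quick check supports this reading. The sketch below assumes a fresh copy of the data in which no temporary variables are left over; without the bootstrap prefix, the same variable list should be rejected because __000000 does not exist.

          Code:
          // fresh copy of the example data
          webuse rate2 , clear
          keep rada radb
          
          // nonzero return code: no variable named __000000 in the dataset
          capture confirm variable __000000
          display _rc
          
          // without -bootstrap-, kap should fail with a variable-not-found error
          capture noisily kap rada radb __000000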
          Best
          Daniel


          • #6
            Thanks Daniel for pointing this out.
            Kind regards,
            Carlo
            (Stata 18.0 SE)


            • #7
              Stata tech-support and the developers have looked into this. They basically confirm my diagnosis of the problem, but they do not see a quick and easy fix. They might add a warning message if bootstrap encounters * among the arguments of a command. They have also pointed out that jackknife, permute, and statsby behave in the same way as bootstrap.
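
              Until there is a fix, one possible workaround is to expand the wildcard yourself before calling the prefix command, so that the command only ever sees explicit variable names. This is a sketch, not something suggested by tech-support; the local macro name raters is arbitrary.

              Code:
              // example data
              webuse rate2 , clear
              keep rada radb
              
              // expand * now, before bootstrap adds its temporary variables
              unab raters : *
              display "`raters'"
              
              // bootstrap receives the explicit list instead of the wildcard
              bootstrap kappa = r(kappa) : kap `raters'

              The same local macro can be passed to jackknife, permute, or statsby in place of *.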

              Best
              Daniel


              • #8
                Thanks Daniel for sharing.
                Kind regards,
                Carlo
                (Stata 18.0 SE)
