  • Using bootstrap on performance calculations (sens, spec, PPV, NPV) to obtain CI and standard error

    Hi everyone,

    I have spent the weekend browsing this excellent forum and the web for a solution to my problem, but unfortunately without succeeding. Any help would be very much appreciated.

    In Stata 16.1, I have a dataset of 18 patients, of whom 8 have viable disease, and we are testing a biomarker. Four tests are positive, and three of those are true positives. The data are added below; the variable PADRPLND indicates viable disease and the variable POSMIR a positive marker.

    I have done performance calculations to obtain sensitivity, specificity, PPV, and NPV, either with the diagt command or with logistic regression followed by estat classification.

    I now need to add a bootstrap with 1,000 replications to obtain the 95% CI and the standard error for each of these four measures.

    The main problem seems to be that I can't store the previous estimates for the bootstrap procedure to use. I have therefore tried to write a program with the calculations for the bootstrap to loop over, and I seem to have success with sensitivity and specificity but not with PPV and NPV.

    I also need an upper bound so that the CIs don't surpass 100% in any of the calculations.

    Thank you for your help and I hope this is somewhat clear, since it's my first post.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int Id byte(PADRPLND POSMIR)
    7019 0 0
    7020 1 0
    7036 0 0
    7038 0 0
    7056 0 1
    7067 0 0
    7071 0 0
    7074 1 0
    7075 0 0
    7078 1 0
    7081 0 0
    7084 0 0
    7506 1 1
    7507 1 1
    2045 1 0
    2057 1 1
    2220 0 0
    2236 1 0
    end
    
    
    logit PADRPLND POSMIR
    estat classification

    scalar sensitivity = r(Sensitivity)
    scalar specificity = r(Specificity)
    scalar ppv = r(Positive predictive value)
    scalar npv = r(Negative predictive value)

    // Bootstrap sensitivity
    bootstrap, reps(1000) seed(12345):
        // Calculate sensitivity
        r(sensitivity) = sens_est

    // Bootstrap specificity
    bootstrap, reps(1000) seed(12345):
        // Calculate specificity
        r(specificity) = spec_est

    // Bootstrap PPV
    bootstrap, reps(1000) seed(12345):
        // Calculate PPV
        r(ppv) = ppv_est

    // Bootstrap NPV
    bootstrap, reps(1000) seed(12345):
        // Calculate NPV
        r(npv) = npv_est
    With this, Stata replies "last estimates not found".

    Also if I use,

    Code:
    
    di "Sensitivity: " sensitivity
    di "Specificity: " specificity
    di "PPV: " ppv
    di "NPV: " npv
    I only get dots, which I think means that the estimates are not stored.

    The other approach I tried uses this code I found on the forum:

    Code:
    capture program drop operating_characteristics
    program define operating_characteristics, rclass
        summ POSMIR if PADRPLND, meanonly
        local sensitivity = r(mean)
        summ POSMIR if !PADRPLND, meanonly
        local specificity = 1 - r(mean)
        return scalar sensitivity = `sensitivity'
        return scalar specificity = `specificity'
        return scalar likelihood_ratio = `sensitivity'/(1-`specificity')
        exit
    end
    
    bootstrap sens = r(sensitivity) spec = r(specificity) lr = r(likelihood_ratio), ///
        reps(500): operating_characteristics
    These are the results

    Code:
    Bootstrap results                               Number of obs     =         18
                                                    Replications      =        325
    
          command:  operating_characteristics
             sens:  r(sensitivity)
             spec:  r(specificity)
               lr:  r(likelihood_ratio)
    
    ------------------------------------------------------------------------------
                 |   Observed   Bootstrap                         Normal-based
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            sens |       .375   .1804689     2.08   0.038     .0212874    .7287126
            spec |         .9    .075812    11.87   0.000     .7514113    1.048589
              lr |       3.75   2.094575     1.79   0.073    -.3552913    7.855291
    ------------------------------------------------------------------------------
    Note: One or more parameters could not be estimated in 175 bootstrap
          replicates; standard-error estimates include only complete replications.
    The problem here is that the upper limit of the interval for spec exceeds 100%.

    If anyone could please help me expand the code for PPV and NPV, my problem would be solved. Thank you!

  • #2
    As far as I can see, the observation that the upper limit of the CI exceeds 1 (or 100%), and it could equally happen that the lower limit falls below 0, is due to the fact that the CIs calculated here are normal-based. Proportions, however, especially those close to 0 or 1, should have asymmetric CIs. See, for example, Brown et al. (2001) [the reference is given below].
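
    As an aside, exact or Wilson confidence intervals for these proportions can also be obtained directly from the 2x2 counts in the classification table shown further below; a minimal sketch (not a bootstrap, shown only for comparison):
    Code:
    * Clopper-Pearson (exact) CIs for the four proportions; add the
    * -wilson- option to obtain Wilson intervals instead
    cii proportions  8 3     // sensitivity: 3 of 8 with disease test positive
    cii proportions 10 9     // specificity: 9 of 10 without disease test negative
    cii proportions  4 3     // PPV: 3 of 4 positive tests are true positives
    cii proportions 14 9     // NPV: 9 of 14 negative tests are true negatives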

    At the risk of committing a serious blunder (I don't feel that I am an expert here, so others should correct me!): if you are looking for 95% CIs of PPV and NPV, you need no bootstrap procedure but can calculate these values directly from the logistic regression results. To obtain asymmetric confidence intervals you can use margins to predict logits and subsequently transform the logits using transform_margins (see here and in the syntax):
    Code:
    . input int Id byte(PADRPLND POSMIR)
    
               Id  PADRPLND    POSMIR
      1. 7019 0 0
      2. 7020 1 0
      3. 7036 0 0
      4. 7038 0 0
      5. 7056 0 1
      6. 7067 0 0
      7. 7071 0 0
      8. 7074 1 0
      9. 7075 0 0
     10. 7078 1 0
     11. 7081 0 0
     12. 7084 0 0
     13. 7506 1 1
     14. 7507 1 1
     15. 2045 1 0
     16. 2057 1 1
     17. 2220 0 0
     18. 2236 1 0
     19. end
    
    . // Install transform_margins if necessary:
    . cap which transform_margins
    
    . if _rc net install transform_margins, from(http://www.stata.com/users/jpitblado)
    
    . logit PADRPLND i.POSMIR
    
    Iteration 0:  Log likelihood = -12.365308  
    Iteration 1:  Log likelihood = -11.376829  
    Iteration 2:  Log likelihood = -11.373933  
    Iteration 3:  Log likelihood = -11.373932  
    
    Logistic regression                                     Number of obs =     18
                                                            LR chi2(1)    =   1.98
                                                            Prob > chi2   = 0.1591
    Log likelihood = -11.373932                             Pseudo R2     = 0.0802
    
    ------------------------------------------------------------------------------
        PADRPLND | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
        1.POSMIR |   1.686398   1.282359     1.32   0.188    -.8269787    4.199775
           _cons |  -.5877867   .5577734    -1.05   0.292    -1.681002     .505429
    ------------------------------------------------------------------------------
    
    . estat classification
    
    Logistic model for PADRPLND
    
                  -------- True --------
    Classified |         D            ~D  |      Total
    -----------+--------------------------+-----------
         +     |         3             1  |          4
         -     |         5             9  |         14
    -----------+--------------------------+-----------
       Total   |         8            10  |         18
    
    Classified + if predicted Pr(D) >= .5
    True D defined as PADRPLND != 0
    --------------------------------------------------
    Sensitivity                     Pr( +| D)   37.50%
    Specificity                     Pr( -|~D)   90.00%
    Positive predictive value       Pr( D| +)   75.00%
    Negative predictive value       Pr(~D| -)   64.29%
    --------------------------------------------------
    False + rate for true ~D        Pr( +|~D)   10.00%
    False - rate for true D         Pr( -| D)   62.50%
    False + rate for classified +   Pr(~D| +)   25.00%
    False - rate for classified -   Pr( D| -)   35.71%
    --------------------------------------------------
    Correctly classified                        66.67%
    --------------------------------------------------
    
    . // The PPV corresponds to the predicted proportion for POSMIR = 1
    . // With transform_margins of the predicted logits you get the logit CI:
    . qui margins POSMIR, predict(xb)
    
    . transform_margins invlogit(@)
    ----------------------------------------------
                 |         b         ll         ul
    -------------+--------------------------------
          POSMIR |
              0  |  .3571429   .1569628   .6237343
              1  |  .7499999   .2378398   .9664886
    ----------------------------------------------
    
    . // The NPV corresponds to the predicted proportion for POSMIR = 0
    . // if PADRPLND is reversed:
    . // With transform_margins of the predicted logits you get the logit CI:
    . recode PADRPLND (1=0) (0=1), gen(PADRPLND_r)
    (18 differences between PADRPLND and PADRPLND_r)
    
    . logit PADRPLND_r i.POSMIR
    
    Iteration 0:  Log likelihood = -12.365308  
    Iteration 1:  Log likelihood = -11.376829  
    Iteration 2:  Log likelihood = -11.373933  
    Iteration 3:  Log likelihood = -11.373932  
    
    Logistic regression                                     Number of obs =     18
                                                            LR chi2(1)    =   1.98
                                                            Prob > chi2   = 0.1591
    Log likelihood = -11.373932                             Pseudo R2     = 0.0802
    
    ------------------------------------------------------------------------------
      PADRPLND_r | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
        1.POSMIR |  -1.686398   1.282359    -1.32   0.188    -4.199775    .8269787
           _cons |   .5877867   .5577734     1.05   0.292     -.505429    1.681002
    ------------------------------------------------------------------------------
    
    . qui margins POSMIR, predict(xb)
    
    . transform_margins invlogit(@)
    ----------------------------------------------
                 |         b         ll         ul
    -------------+--------------------------------
          POSMIR |
              0  |  .6428571   .3762657   .8430372
              1  |  .2500001   .0335114   .7621602
    ----------------------------------------------
    Reference:
    • Brown, L. D., Cai, T. T., & DasGupta, A. (2001). Interval estimation for a binomial proportion. Statistical Science, 16, 101–133.
    Last edited by Dirk Enzmann; 12 May 2024, 15:07.



    • #3
      Thank you so much, Prof. Enzmann, for the great reply and code. However, the request to bootstrap my performance calculations was made by the chief statistician at a medical journal, so I have to use the bootstrap in my analysis.



      • #4
        Originally posted by Anna Thor View Post
        . . . the request to bootstrap my performance calculations was made by the chief statistician at a medical journal, so I have to use the bootstrap in my analysis.
        Just for my edification, I'd like to confirm the following.

        You presented the medical journal this table:

        .
        . version 18.0

        .
        . clear *

        .
        . quietly input int Id byte(PADRPLND POSMIR)

        .
        . label define Findings 0 Negative 1 Positive

        . label values POSMIR Findings

        . label define Truth 0 Nodisease 1 Disease

        . label values PADRPLND Truth

        . tabulate POSMIR PADRPLND

                   |       PADRPLND
            POSMIR | Nodisease    Disease |     Total
        -----------+----------------------+----------
          Negative |         9          5 |        14
          Positive |         1          3 |         4
        -----------+----------------------+----------
             Total |        10          8 |        18

        .
        . exit

        end of do-file


        .


        And the journal's chief statistician said that you must bootstrap that?



        • #5
          Correct.
          I have two additional groups of patients; one of them, for instance, has 24 patients, 23 with viable disease and 17 with a positive test. Very simple statistics, but the bootstrap is requested for the paper to be considered for acceptance.
          Last edited by Anna Thor; 13 May 2024, 04:56.



          • #6
            If they request it, why don't you then simply report the confidence intervals obtained with the bootstrap? Perhaps you should use bias-corrected and accelerated (bca) CIs.
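
            A minimal sketch of how that might look, assuming the operating_characteristics program from the first post is defined in memory; the bca option must be given on the bootstrap prefix itself, and the percentile, bias-corrected, and BCa intervals are computed from the replicates, so they cannot exceed the [0, 1] bounds:
            Code:
            * bootstrap with BCa support (assumes -operating_characteristics-
            * from the first post is already defined)
            bootstrap sens = r(sensitivity) spec = r(specificity), ///
                reps(1000) seed(12345) bca: operating_characteristics

            * report percentile, bias-corrected, and BCa intervals alongside
            * the normal-based one
            estat bootstrap, all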



            • #7
              Originally posted by Dirk Enzmann View Post
              If they request it, why then don't you simply report the confidence intervals obtained with bootstrap? Perhaps you should use bias-corrected and accelerated (bca) CIs.
              I am sorry if I was unclear; I am able to do the bootstrap for sensitivity and specificity, but I can't figure out how to extend the bootstrap code to PPV and NPV. Below is the code I found on the forum and used.

              So I am kindly requesting help to add the bootstrap for PPV and NPV, either by the commands below or any other strategy.

              Code:
                
              capture program drop operating_characteristics
              program define operating_characteristics, rclass
                  summ POSMIR if PADRPLND, meanonly
                  local sensitivity = r(mean)
                  summ POSMIR if !PADRPLND, meanonly
                  local specificity = 1 - r(mean)
                  return scalar sensitivity = `sensitivity'
                  return scalar specificity = `specificity'
                  return scalar likelihood_ratio = `sensitivity'/(1-`specificity')
                  exit
              end

              bootstrap sens = r(sensitivity) spec = r(specificity) lr = r(likelihood_ratio), ///
                  reps(500): operating_characteristics
              You can see the results of this at the end of my first post; I was not able to copy them into the forum in a format with a reasonable appearance.

              And thank you!
              Last edited by Anna Thor; 13 May 2024, 10:17.



              • #8
                What about
                Code:
                clear
                input int Id byte(PADRPLND POSMIR)
                7019 0 0
                7020 1 0
                7036 0 0
                7038 0 0
                7056 0 1
                7067 0 0
                7071 0 0
                7074 1 0
                7075 0 0
                7078 1 0
                7081 0 0
                7084 0 0
                7506 1 1
                7507 1 1
                2045 1 0
                2057 1 1
                2220 0 0
                2236 1 0
                end
                
                logit PADRPLND POSMIR
                estat classification
                
                capture program drop operating_characteristics
                program define operating_characteristics, rclass    
                   summ POSMIR if PADRPLND, meanonly    
                   local sensitivity = r(mean)    
                   summ POSMIR if !PADRPLND, meanonly    
                   local specificity = 1 - r(mean)
                   summ PADRPLND if POSMIR, meanonly
                   local PPV = r(mean)
                   summ PADRPLND if !POSMIR, meanonly
                   local NPV = 1 - r(mean)
                   return scalar sensitivity = `sensitivity'    
                   return scalar specificity = `specificity'
                   return scalar PPV = `PPV'
                   return scalar NPV = `NPV'
                   return scalar likelihood_ratio = `sensitivity'/(1-`specificity')    
                end  
                
                bootstrap sens = r(sensitivity) spec = r(specificity) lr = r(likelihood_ratio) ///
                          PPV = r(PPV) NPV = r(NPV), ///    
                          reps(1700): operating_characteristics
                If you wonder why, try to make sense of the symbols in Pr(...) in the following output table:
                Code:
                --------------------------------------------------
                Sensitivity                     Pr( +| D)   37.50%
                Specificity                     Pr( -|~D)   90.00%
                Positive predictive value       Pr( D| +)   75.00%
                Negative predictive value       Pr(~D| -)   64.29%
                --------------------------------------------------
                By the way: you should run at least 1,000 bootstrap replications, but because so many replications fail here, I set reps() to 1,700. The failures are most likely due to the small sample size (and/or small cell frequencies), which makes the analysis a questionable undertaking.



                • #9
                  Thank you so much! I will try this. So grateful for your help.



                  • #10
                    To be able to reproduce the results you should use the seed() option, e.g. seed(12345).
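
                    For instance, post #8's call with the seed added would look like this (a sketch; the program and data are as in post #8):
                    Code:
                    bootstrap sens = r(sensitivity) spec = r(specificity) lr = r(likelihood_ratio) ///
                              PPV = r(PPV) NPV = r(NPV), ///
                              reps(1700) seed(12345): operating_characteristics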



                    • #11
                      Dear prof Enzmann and the rest of the users of Statalist,

                      It worked fine and I am so grateful for your help.

                      However, I was not able to use the same code for my next group of patients, because in this group the predictor variable perfectly predicts the outcome variable with 100% specificity, and Stata therefore drops the 17 patients with viable disease and positive tests from the logistic regression.

                      I have therefore used firthlogit instead. However, the estat classification command that I previously used is not available after firthlogit or exlogistic. I have made a program called my_stats to calculate sens, spec, PPV, and NPV. However, when I add the bootstrap, Stata replies "'r(sensitivity)' evaluated to missing in full sample". Is this the same issue repeating...?


                      Code:
                      clear
                      input int Id byte(PADRPLND POSMIR)
                      7009 1 1
                      7013 1 1
                      7014 1 1
                      7026 1 0
                      7027 1 0
                      7028 1 1
                      7030 1 1
                      7037 1 1
                      7040 1 1
                      7046 1 1
                      7049 0 0
                      7052 1 0
                      7053 1 1
                      7055 1 0
                      7062 1 1
                      7079 1 1
                      7085 1 0
                      7502 1 1
                      7504 1 1
                      2070 1 1
                      2076 1 1
                      2094 1 0
                      2184 1 1
                      2192 1 1
                      end


                      Code:
                      . diagt PADRPLND POSMIR
                      
                                 |        POSMIR
                        PADRPLND |      Pos.       Neg. |     Total
                      -----------+----------------------+----------
                        Abnormal |        17          6 |        23
                          Normal |         0          1 |         1
                      -----------+----------------------+----------
                           Total |        17          7 |        24
                      True abnormal diagnosis defined as PADRPLND = 1
                      
                      
                                                                        [95% Confidence Interval]
                      ---------------------------------------------------------------------------
                      Prevalence                         Pr(A)     95.8%     78.9%      99.9%
                      ---------------------------------------------------------------------------
                      Sensitivity                      Pr(+|A)     73.9%     51.6%     89.8%
                      Specificity                      Pr(-|N)    100.0%      2.5%    100.0%
                      ROC area               (Sens. + Spec.)/2      0.87         .      1.00
                      ---------------------------------------------------------------------------
                      Likelihood ratio (+)     Pr(+|A)/Pr(+|N)         .         .         .
                      Likelihood ratio (-)     Pr(-|A)/Pr(-|N)      0.26      0.13      0.52
                      Odds ratio                   LR(+)/LR(-)         .      0.00         .
                      Positive predictive value        Pr(A|+)    100.0%     80.5%    100.0%
                      Negative predictive value        Pr(N|-)     14.3%      0.4%     57.9%
                      ---------------------------------------------------------------------------
                      
                        Missing values or confidence intervals may be estimated
                        using the -sf- or -sf0- options.

                      Code:
                      
                      program define my_stats, rclass
                          firthlogit PADRPLND POSMIR
                          predict prob

                          gen predicted_class = (prob > 0.5)

                          tabulate PADRPLND predicted_class, matcell(confusion_matrix)

                          local TP = confusion_matrix[2,2]  // True Positives
                          local TN = confusion_matrix[1,1]  // True Negatives
                          local FP = confusion_matrix[1,2]  // False Positives
                          local FN = confusion_matrix[2,1]  // False Negatives

                          local sensitivity = `TP' / (`TP' + `FN')
                          local specificity = `TN' / (`TN' + `FP')
                          local PPV = `TP' / (`TP' + `FP')
                          local NPV = `TN' / (`TN' + `FN')

                          return scalar sensitivity = `sensitivity'
                          return scalar specificity = `specificity'
                          return scalar PPV = `PPV'
                          return scalar NPV = `NPV'
                      end
                      Code:
                      
                      . my_stats
                      
                      initial:       penalized log likelihood = -4.9879767
                      rescale:       penalized log likelihood = -4.9879767
                      Iteration 0:   penalized log likelihood = -4.9879767  
                      Iteration 1:   penalized log likelihood = -3.7872439  
                      Iteration 2:   penalized log likelihood = -3.7558953  
                      Iteration 3:   penalized log likelihood = -3.7558086  
                      Iteration 4:   penalized log likelihood = -3.7558086  
                      
                                                                      Number of obs     =         24
                                                                      Wald chi2(1)      =       1.52
                      Penalized log likelihood = -3.7558086           Prob > chi2       =     0.2181
                      
                      ------------------------------------------------------------------------------
                          PADRPLND |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                            POSMIR |   2.089011   1.696366     1.23   0.218    -1.235804    5.413826
                             _cons |   1.466337   .9058217     1.62   0.105    -.3090408    3.241715
                      ------------------------------------------------------------------------------
                      
                                 | predicted_
                                 |   class
                        PADRPLND |         1 |     Total
                      -----------+-----------+----------
                               0 |         1 |         1
                               1 |        23 |        23
                      -----------+-----------+----------
                           Total |        24 |        24
                      Now the last step, which does not work:

                      Code:
                      
                      . bootstrap r(sensitivity) r(specificity) r(PPV) r(NPV), reps(2000) seed(12345): my_stats
                      (running my_stats on estimation sample)
                      'r(sensitivity)' evaluated to missing in full sample
                      Any help would be very much appreciated. Thank you in advance.
                      Last edited by Anna Thor; 15 May 2024, 13:38.



                      • #12
                        But what else would you expect from a sample size of 1 in the group defined by PADRPLND = 0? You simply cannot do things that are impossible.
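
                        To make this concrete with a quick back-of-the-envelope check of my own: with only a single PADRPLND = 0 patient among the 24, roughly 36% of bootstrap resamples contain no PADRPLND = 0 observation at all, and in every such replicate specificity and NPV are undefined.
                        Code:
                        * probability that a resample of size 24 omits the single
                        * PADRPLND = 0 patient: (23/24)^24, about 0.36
                        di (23/24)^24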



                        • #13
                          Thank you again, Prof. Enzmann, for your help. That is what we thought too; however, this has been a specific request made by the chief statistician, yes, twice. I have now sent this issue to the chief statistician at our department for review.

