  • advice on bootstrap sampling to internally validate a logistic model

    I am not familiar with this type of analysis, and so would greatly appreciate advice on the following.

    I would like to develop and internally validate a model using a logistic regression analysis and a bootstrap sampling process.

    Using the nlsw88 dataset installed with Stata as an example, I wrote the following code:

    Code:
    sysuse nlsw88, clear
    cd "C:\Documents"
    save nlsw88, replace

    capture program drop mysim
    program define mysim, rclass
    use nlsw88, clear
    bsample
    merge m:1 idcode using nlsw88
    * Fit logistic regression model on the bootstrap sample
    logit union south grade if _merge == 3
    matrix b = e(b)
    * test the model on the subjects that were not sampled
    lroc union if _merge == 2, nograph beta(b)
    return scalar area=r(area)
    end

    simulate area=r(area), reps(10000): mysim
    _pctile area, p(2.5 50 97.5)
    ret list
    * Gives the validation AUROC and accompanying 95% probability interval.

    Does this way of going about the task make sense?

    Thank you for your feedback.
    Best wishes,
    Miranda

  • #2
    Well, your code is a bit convoluted and hard to follow, so I can't really say if it will work or not. But this code will, I believe, accomplish what you seek:

    Code:
    clear *
    
    capture program drop mysim
    program define mysim, rclass
        logit union south grade
        lroc, nograph
        return scalar area = r(area)
        exit
    end
    
    webuse nlsw88, clear
    tempfile areas
    bootstrap area = r(area), saving(`areas') reps(100): mysim
    use `areas', clear
    _pctile area, p(2.5 50 97.5)
    return list

    • #3
      Thank you for your response. I agree that my code is convoluted, and indeed that's why I wanted advice.
      My understanding of the process of internal validation is to draw a bootstrap sample (with replacement), fit the model on this training dataset, and then test that same model on the unsampled data. That is why I used merge (to identify which subjects were not sampled) and stored the coefficients from the model built on the training dataset so they could then be applied to the testing dataset.
      It doesn't seem to me that the code you suggest applies the model built on the training dataset to the testing dataset; rather, it looks like it fits the model on bootstrap datasets and, for each of these, calculates the AUROC. Am I missing something here? This is not what I understand internal model validation to be. But then again, I am a novice at this, so I may have gotten the wrong idea of the process...

      • #4
        OK, you have correctly understood my code, and I did not understand what you were trying to do.

        Based on your explanation, your code looks correct to me.

        Added: That said, I don't quite understand your approach to "validation." The AUROC quantifies the discriminatory power of the logistic model; it is not a test of model fit or validity. If your goal is to validate a model, it seems to me you would want to use a criterion that reflects the fit of the model to the data. AUROC does not do that. A better choice for that would be, for example, the Hosmer-Lemeshow chi square statistic, which compares the predicted to the observed outcome probabilities (and is sometimes called a calibration statistic). Calibration and discrimination are two separate aspects of a logistic model, and they are more or less independent of each other.
        Last edited by Clyde Schechter; 21 Oct 2016, 15:29.
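        In Stata, that statistic is available after -logit- through -estat gof- with the group() option. A minimal sketch, reusing the nlsw88 variables from #1 (the choice of 10 groups is just the usual convention, not anything specific to your problem):

        Code:
        sysuse nlsw88, clear
        logit union south grade
        * Hosmer-Lemeshow goodness-of-fit (calibration) test with 10 groups of predicted risk
        estat gof, group(10) table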

        • #5
          Thank you for your suggestion; that's very helpful, and I will look into it. I was also planning on working out the error rate, as I noticed when reading up on validation that this seems to be a standard approach. Is this correct?

          • #6
            I'm not sure that there is any "standard" approach to this. I've seen lots of things done and lots of arguments back and forth about the merits of each, which suggests to me that there really is no one best way. As for validating based on an error rate, remember that the overall probability of correct prediction is not a good measure of validity because it is sensitive to the base rate of the outcome. So if you are going to use an error rate, you need two error rates: the error rate when the observed value is 0 and the error rate when the observed value is non-zero. That gets around the sensitivity to base rates.
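            To make that concrete, here is a minimal sketch of computing the two error rates separately, again on the nlsw88 example; the 0.5 classification cutoff is an assumption for illustration, not something settled here:

            Code:
            sysuse nlsw88, clear
            logit union south grade
            predict phat, pr
            gen byte yhat = phat >= 0.5 if !missing(phat)
            * error rate among observations with observed 0 (false positive rate)
            quietly count if union == 0 & !missing(yhat)
            local n0 = r(N)
            quietly count if union == 0 & yhat == 1
            display "error rate given observed 0 = " r(N)/`n0'
            * error rate among observations with observed 1 (false negative rate)
            quietly count if union == 1 & !missing(yhat)
            local n1 = r(N)
            quietly count if union == 1 & yhat == 0
            display "error rate given observed 1 = " r(N)/`n1'
            * -estat classification- reports the complements of these (specificity and sensitivity) at the same cutoff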

            • #7
              Sorry for the late response; I've been away unexpectedly.
              Thank you very much for this advice. I am a little confused, though, as I thought the error rate by definition incorporates the false positives and false negatives ((FP+FN)/All), which I would have thought covers the two errors you mention (when the observed value is 0 and when it is 1). Is this not what you mean?

              • #8
                The problem with (FP+FN)/All is that it confounds two different error rates. Suppose that the true distribution of events is 100 Positive (1) and 10 Negative (2). Suppose that the predictor gets it right 99% of the time when the true event is Positive, but gets it right only 10% of the time when the true event is Negative. So the joint distribution of predicted and observed events is:
                Code:
                           |          col
                       row |         1          2 |     Total
                -----------+----------------------+----------
                         1 |        99          1 |       100 
                           |     91.67      50.00 |     90.91 
                -----------+----------------------+----------
                         2 |         9          1 |        10 
                           |      8.33      50.00 |      9.09 
                -----------+----------------------+----------
                     Total |       108          2 |       110 
                           |    100.00     100.00 |    100.00
                So the sensitivity, TP/(TP+FN), is 99% (awesome!), and the specificity, TN/(TN+FP), is 10% (awful). But look at the overall error rate: (FP+FN)/All = (1+9)/110 = 10/110 = 9.1%. That looks like a very good, low error rate, but it completely obscures the dreadful performance when the true condition is negative. There is no single statistic that adequately reflects the performance of a predictor: you always need separate validity measures for positive and negative results.
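                For what it's worth, that table can be replayed directly from the four cell counts with -tabi- (the counts are just the made-up ones from this example):

                Code:
                * observed events in rows (1 = Positive, 2 = Negative), predicted in columns
                tabi 99 1 \ 9 1, column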

                • #9
                  I see. Thank you so much for your time explaining these things; I really appreciate it!
                  So sensitivity and specificity are better measures to use to assess the validity of the prediction model when I run my bootstrapping?

                  • #10
                    Yes.
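                    In case it is useful, here is a rough sketch (not from this thread) of how sensitivity and specificity might be returned alongside the AUROC inside a bootstrap program along the lines of the one in #1. The program name mysim2, the 0.5 cutoff, and the smaller reps() are illustrative assumptions, and it reuses the nlsw88.dta file saved in #1:

                    Code:
                    capture program drop mysim2
                    program define mysim2, rclass
                        use nlsw88, clear
                        bsample
                        merge m:1 idcode using nlsw88
                        * fit on the bootstrap (training) sample
                        logit union south grade if _merge == 3
                        matrix b = e(b)
                        * out-of-sample predicted probabilities for the unsampled (test) subjects
                        predict phat if _merge == 2, pr
                        gen byte yhat = phat >= 0.5 if _merge == 2 & !missing(phat)
                        * discrimination (AUROC) on the test subjects
                        lroc union if _merge == 2, nograph beta(b)
                        return scalar area = r(area)
                        * sensitivity: share of observed 1s predicted as 1 (0.5 cutoff assumed)
                        quietly count if _merge == 2 & union == 1 & !missing(yhat)
                        local n1 = r(N)
                        quietly count if _merge == 2 & union == 1 & yhat == 1
                        return scalar sens = r(N)/`n1'
                        * specificity: share of observed 0s predicted as 0
                        quietly count if _merge == 2 & union == 0 & !missing(yhat)
                        local n0 = r(N)
                        quietly count if _merge == 2 & union == 0 & yhat == 0
                        return scalar spec = r(N)/`n0'
                        exit
                    end

                    simulate area=r(area) sens=r(sens) spec=r(spec), reps(1000): mysim2
                    * percentile summaries as in #1, for example:
                    _pctile sens, p(2.5 50 97.5)
                    return list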

                    • #11
                      OK, great. Thank you so much for all your help with this!
                      Best wishes,
                      Miranda
