  • advice on bootstrap sampling to internally validate a logistic model

    I am not familiar with this type of analysis, and so would greatly appreciate advice on the following.

    I would like to develop and internally validate a model using a logistic regression analysis and a bootstrap sampling process.

    Using the nlsw88 dataset installed with Stata as an example, I wrote the following code:

    Code:
    sysuse nlsw88, clear
    cd "C:\Documents"
    save nlsw88, replace

    capture program drop mysim
    program define mysim, rclass
    use nlsw88, clear
    bsample
    merge m:1 idcode using nlsw88
    * Fit logistic regression model on the bootstrap sample
    logit union south grade if _merge == 3
    matrix b = e(b)
    * test the model on the subjects that were not sampled
    lroc union if _merge == 2, nograph beta(b)
    return scalar area=r(area)
    end

    simulate area=r(area), reps(10000): mysim
    _pctile area, p(2.5 50 97.5)
    ret list
    * Gives the validation AUROC and accompanying 95% probability interval.

    Does this way of going about the task make sense?

    Thank you for your feedback.
    Best wishes,
    Miranda

  • #2
    Well, your code is a bit convoluted and hard to follow, so I can't really say if it will work or not. But this code will, I believe, accomplish what you seek:

    Code:
    clear *
    
    capture program drop mysim
    program define mysim, rclass
        logit union south grade
        lroc, nograph
        return scalar area = r(area)
        exit
    end
    
    webuse nlsw88, clear
    tempfile areas
    bootstrap area = r(area), saving(`areas') reps(100): mysim
    use `areas', clear
    _pctile area, p(2.5 50 97.5)
    return list

    • #3
      Thank you for your response. I agree that my code is convoluted, and indeed that's why I wanted advice.
      My understanding of the process of internal validation is to draw a bootstrap sample (with replacement), fit the model on this training dataset, and then test that same model on the unsampled data. That is why I used merge (to identify which subjects were not sampled) and stored the coefficients from the model built on the training dataset so they could then be applied to the testing dataset.
      It doesn't seem to me that the code you suggest applies the model built on the training dataset to the testing dataset; rather, it looks like it fits the model on bootstrap datasets and, for each of these, calculates the AUROC. Am I missing something here? This is not what I understand internal model validation to be. But then again, I am a novice at this, so I may have gotten the wrong idea of the process...

      • #4
        OK, you have correctly understood my code, and I did not understand what you were trying to do.

        Based on your explanation, your code looks correct to me.

        Added: That said, I don't quite understand your approach to "validation." The AUROC quantifies the discriminatory power of the logistic model; it is not a test of model fit or validity. If your goal is to validate a model, it seems to me you would want to use a criterion that reflects the fit of the model to the data. AUROC does not do that. A better choice for that would be, for example, the Hosmer-Lemeshow chi square statistic, which compares the predicted to the observed outcome probabilities (and is sometimes called a calibration statistic). Calibration and discrimination are two separate aspects of a logistic model, and they are more or less independent of each other.
        Last edited by Clyde Schechter; 21 Oct 2016, 15:29.
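        In Stata, that statistic is available after -logit- through -estat gof- with the group() option. A minimal sketch, reusing the nlsw88 variables from #1 (the choice of 10 groups is just the usual convention, not anything specific to your problem):

        Code:
        sysuse nlsw88, clear
        logit union south grade
        * Hosmer-Lemeshow goodness-of-fit (calibration) test with 10 groups of predicted risk
        estat gof, group(10) table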

        • #5
          Thank you for your suggestion; that's very helpful, and I will look into it. I was also planning on working out the error rate, as I noticed when reading up on validation that this seems to be a standard approach. Is this correct?

          • #6
            I'm not sure that there is any "standard" approach to this. I've seen lots of things done and lots of arguments back and forth about the merits of each, which suggests to me that there really is no one best way. As for validating based on an error rate, remember that the overall probability of correct prediction is not a good measure of validity because it is sensitive to the base rate of the outcome. So if you are going to use an error rate, you need two error rates: the error rate when the observed value is 0 and the error rate when the observed value is non-zero. That gets around the sensitivity to base rates.
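            To make that concrete, here is a minimal sketch of computing the two error rates separately, again on the nlsw88 example; the 0.5 classification cutoff is an assumption for illustration, not something settled here:

            Code:
            sysuse nlsw88, clear
            logit union south grade
            predict phat, pr
            gen byte yhat = phat >= 0.5 if !missing(phat)
            * error rate among observations with observed 0 (false positive rate)
            quietly count if union == 0 & !missing(yhat)
            local n0 = r(N)
            quietly count if union == 0 & yhat == 1
            display "error rate given observed 0 = " r(N)/`n0'
            * error rate among observations with observed 1 (false negative rate)
            quietly count if union == 1 & !missing(yhat)
            local n1 = r(N)
            quietly count if union == 1 & yhat == 0
            display "error rate given observed 1 = " r(N)/`n1'
            * -estat classification- reports the complements of these (specificity and sensitivity) at the same cutoff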

            • #7
              Sorry for the late response; I've been away unexpectedly.
              Thank you very much for this advice. I am a little confused, though, as I thought the error rate by definition incorporates the false positives and false negatives ((FP+FN)/All), which I would have thought covers the two errors you mention (when the observed value is 0 and when it is 1). Is this not what you mean?

              • #8
                The problem with (FP+FN)/All is that it confounds two different error rates. Suppose that the true distribution of events is 100 Positive (1) and 10 Negative (2). Suppose that the predictor gets it right 99% of the time when the true event is Positive, but gets it right only 10% of the time when the true event is Negative. So the joint distribution of predicted and observed events is:
                Code:
                           |          col
                       row |         1          2 |     Total
                -----------+----------------------+----------
                         1 |        99          1 |       100 
                           |     91.67      50.00 |     90.91 
                -----------+----------------------+----------
                         2 |         9          1 |        10 
                           |      8.33      50.00 |      9.09 
                -----------+----------------------+----------
                     Total |       108          2 |       110 
                           |    100.00     100.00 |    100.00
                So the sensitivity, TP/(TP+FN), is 99% (awesome!), and the specificity, TN/(TN+FP), is 10% (awful). But look at the overall error rate: (FP+FN)/All = (1+9)/110 = 10/110 = 9.1%. That looks like a very good, low error rate, but it completely obscures the dreadful performance when the true condition is negative. There is no single statistic that adequately reflects the performance of a predictor: you always need separate validity measures for positive and negative results.
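                For what it's worth, that table can be replayed directly from the four cell counts with -tabi- (the counts are just the made-up ones from this example):

                Code:
                * observed events in rows (1 = Positive, 2 = Negative), predicted in columns
                tabi 99 1 \ 9 1, column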

                • #9
                  I see. Thank you so much for your time explaining these things; I really appreciate it!
                  So sensitivity and specificity are better measures to use to assess the validity of the prediction model when I run my bootstrapping?

                  • #10
                    Yes.
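                    In case it is useful, here is a rough sketch (not from this thread) of how sensitivity and specificity might be returned alongside the AUROC inside a bootstrap program along the lines of the one in #1. The program name mysim2, the 0.5 cutoff, and the smaller reps() are illustrative assumptions, and it reuses the nlsw88.dta file saved in #1:

                    Code:
                    capture program drop mysim2
                    program define mysim2, rclass
                        use nlsw88, clear
                        bsample
                        merge m:1 idcode using nlsw88
                        * fit on the bootstrap (training) sample
                        logit union south grade if _merge == 3
                        matrix b = e(b)
                        * out-of-sample predicted probabilities for the unsampled (test) subjects
                        predict phat if _merge == 2, pr
                        gen byte yhat = phat >= 0.5 if _merge == 2 & !missing(phat)
                        * discrimination (AUROC) on the test subjects
                        lroc union if _merge == 2, nograph beta(b)
                        return scalar area = r(area)
                        * sensitivity: share of observed 1s predicted as 1 (0.5 cutoff assumed)
                        quietly count if _merge == 2 & union == 1 & !missing(yhat)
                        local n1 = r(N)
                        quietly count if _merge == 2 & union == 1 & yhat == 1
                        return scalar sens = r(N)/`n1'
                        * specificity: share of observed 0s predicted as 0
                        quietly count if _merge == 2 & union == 0 & !missing(yhat)
                        local n0 = r(N)
                        quietly count if _merge == 2 & union == 0 & yhat == 0
                        return scalar spec = r(N)/`n0'
                        exit
                    end

                    simulate area=r(area) sens=r(sens) spec=r(spec), reps(1000): mysim2
                    * percentile summaries as in #1, for example:
                    _pctile sens, p(2.5 50 97.5)
                    return list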

                    • #11
                      OK, great. Thank you so much for all your help with this!
                      Best wishes,
                      Miranda
