Internal validation/Correction for optimism in linear regression using Harrell's method

Marta Garcia-Granero

Join Date: Apr 2014

Posts: 20
#1

09 Mar 2018, 04:16

Hi:

I'm internally validating a linear regression model using the bootstrap method described by Harrell. Basically:

1) Draw a random bootstrap sample
2) Fit the model on the bootstrap sample, then use that bootstrap model to compute R-square (or adjusted R-square) both on the bootstrap sample and on the original sample (<- my unsolved problem)
3) Compute the difference between the bootstrap-sample R-square and the original-sample R-square
4) Repeat at least 100 times
5) Average the differences between the R-squares
6) Use that average as the correction for optimism, subtracting it from the original R-square obtained while developing the model
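
In symbols, the steps above amount to the following. The notation is introduced here purely for illustration: B is the number of bootstrap replications, and the subscripts "boot", "orig" and "app" mark the R-square of bootstrap model b in its own bootstrap sample, the R-square of that same model applied to the original sample, and the apparent R-square of the model developed on the original data.

```latex
\hat{O} = \frac{1}{B} \sum_{b=1}^{B} \left( R^2_{b,\mathrm{boot}} - R^2_{b,\mathrm{orig}} \right),
\qquad
R^2_{\mathrm{corrected}} = R^2_{\mathrm{app}} - \hat{O}
```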

My idea is that I need -simulate- together with an rclass program of my own, but I'm stuck: all the code I have seen on Statalist and elsewhere seems to do the opposite, that is, it applies the original model to the bootstrap samples, which is not what I want. Besides, the code I found is for logistic and Cox models (for Harrell's C statistic, not R-square).

Any pointers?

Thanks in advance

Sorry for not posting code, I'm not even there yet...

Marta

Marta Garcia-Granero


09 Mar 2018, 04:24

This is the closest I got, with a lot of red crosses when running it:

Code:

program define optimism, rclass
    preserve
    bsample
    regress DIF_PESO rs2605100_LYPLAL1 Edad rs4929949_STK33 rs1801133_MTHFR#c.Edad ///
        peso1 rs3813929_HTR2C rs659366_UCP2 rs1801133_MTHFR rs11030104_BDNF
    return scalar rsquare=e(er2)
    return scalar psquare=e(r2_adj)
end

tempfile sim_results
simulate r2 = r(rsquare) r2adj=r(psquare), reps(200) seed(12345) saving(`sim_results'): optimism

Marta


Marta Garcia-Granero

#3

09 Mar 2018, 04:44

I spotted part of the problem, which I blame on being really tired.

Part of the code should have been:

Code:

return scalar rsquare=e(r2)
return scalar psquare=e(r2_a)

But now I only have one set of R-square and adjusted R-square values, and I can't tell whether they come from the bootstrap sample or from the original sample with the bootstrap model applied. Either way, I'm still missing the other pair, which I need to compute the difference and get the correction for optimism.

Marta
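
One possible way to get the missing pair is sketched below. It is not a tested solution. The variable names y, x1 and x2 are placeholders for the outcome and predictors, and optimism2 is a hypothetical program name. The sketch assumes that the R-square of the bootstrap model on the original sample is taken as 1 - SSE/SST with the bootstrap coefficients held fixed; a squared correlation between observed and predicted values would be a different, also plausible, choice. It relies on the fact that the e() results from -regress- remain available after -restore-, so -predict- can apply the bootstrap coefficients to the original data.

```stata
* Sketch only. y, x1, x2 are placeholder variable names; optimism2 is a
* hypothetical program name, not an existing command.
capture program drop optimism2
program define optimism2, rclass
    tempvar yhat sqerr
    tempname sse
    preserve
    bsample                                // step 1: draw a bootstrap sample
    quietly regress y x1 x2                // step 2: fit the bootstrap model
    return scalar r2_boot = e(r2)          // R-square in the bootstrap sample
    restore                                // original data back; e(b) is kept
    quietly predict double `yhat', xb      // apply the bootstrap coefficients
    quietly generate double `sqerr' = (y - `yhat')^2
    quietly summarize `sqerr', meanonly
    scalar `sse' = r(sum)                  // SSE on the original sample
    quietly summarize y if !missing(`sqerr')
    * Assumption: R-square on the original sample defined as 1 - SSE/SST
    return scalar r2_orig = 1 - `sse' / (r(Var) * (r(N) - 1))
end

* Apparent R-square of the model developed on the original data
quietly regress y x1 x2
scalar r2_app = e(r2)

* Steps 3 to 6: simulate replaces the data in memory with the results
simulate r2_boot = r(r2_boot) r2_orig = r(r2_orig), reps(200) seed(12345): optimism2
generate double optimism = r2_boot - r2_orig
summarize optimism, meanonly
display "optimism = " r(mean) "   corrected R2 = " scalar(r2_app) - r(mean)
```

Because -simulate- leaves only the replication results in memory, the original dataset should be saved beforehand or reloaded afterwards. The same pattern would extend to adjusted R-square by returning e(r2_a) from the bootstrap fit and applying the usual degrees-of-freedom adjustment to r2_orig.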