
  • Crossfold -- k-fold cross-validation

    Hello,

    I am a fairly elementary Stata user, currently on Stata 14.1. I am trying to perform k-fold cross-validation using crossfold (http://fmwww.bc.edu/repec/bocode/c/crossfold.html).
    However, I am having trouble understanding what the output is telling me -- even with the help file -- and how I can reasonably choose a model. I am doing 10-fold cross-validation.

    crossfold reports the R2 (or another measure of model fit) for each fold (in my case, 10 folds). I am unsure what to do from there. If I remove or add a variable and get another 10 estimates, how do I compare the two models? Is there a way to average the 10 estimates from each run and then compare the averages? Is that the best way to compare the models?
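
    To make the question concrete, here is a sketch of the workflow I have in mind, using the auto data purely for illustration. The matrix name r(est) is an assumption about where crossfold saves the per-fold estimates (running return list after the command should confirm it), and fitA/fitB are just illustrative names:

        * Sketch only: compare two models by the mean of their per-fold estimates.
        * Assumes crossfold stores them in r(est); check with -return list-.
        sysuse auto, clear
        set seed 12345

        * Model A: 10-fold cross-validation
        crossfold regress price mpg weight, k(10)
        matrix A = r(est)
        svmat A, names(fitA)       // fold-level estimates as a variable
        summarize fitA1            // mean over the 10 folds

        * Model B: drop a variable and repeat
        crossfold regress price mpg, k(10)
        matrix B = r(est)
        svmat B, names(fitB)
        summarize fitB1            // compare this mean with Model A's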

    Thank you for any help you can provide!

    Best
    Leo

  • #2
    If it is just a matter of averaging the 10 estimates, then I suppose one non-Stata solution would be to put the values into Excel. I'm unsure whether that is what is required...
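
    Something like the following is what I am imagining, if it can be done directly in Stata instead (again assuming crossfold leaves the per-fold values in r(est), which return list should confirm):

        * Mean of the per-fold estimates, computed in Mata rather than Excel.
        crossfold regress price mpg weight, k(10)
        mata: mean(st_matrix("r(est)"))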



    • #3
      I note that the URL you very helpfully provided includes a reference at the end to the 2005 Stata Journal article that introduced this program. Have you had a chance to look at it? (I have not; I'm just posting this quickly between other tasks.) The article is old enough that it is freely available:

      http://www.stata-journal.com/article...article=st0087

      Added in edit: I see now that the article may not be directly related, but it may still be of some help.



      • #4
        I also downloaded crossfold and did not understand how to make it useful for either model selection or for evaluating model performance. I saw that here:
        https://www.statalist....oss-validation


        a for-loop is proposed that, I guess, can be a starting point (to be adapted to one's own needs) for getting results from cross-validation. For example, using the same data, I ran a two-fold (half-and-half) cross-validation of a logistic regression with foreign as the outcome, obtaining the estimated probabilities on the held-out halves:
        sysuse auto, clear

        * container for the cross-validated predicted probabilities
        generate prp = .

        * random half-and-half split of the 74 observations
        set seed 2016
        gen u = runiform()
        sort u
        gen split = _n <= 37

        * fit on one half, predict out-of-sample on the other
        forvalues i = 0/1 {
            logit foreign price mpg headroom if split == `i'
            predict p, pr
            replace prp = p if split != `i'
            drop p
        }
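
        To then evaluate the held-out predictions, a small sketch (the variable name brier is just illustrative): the Brier score is the mean squared difference between the 0/1 outcome and the predicted probability, so

        * Brier score of the cross-validated probabilities (lower is better)
        gen brier = (foreign - prp)^2
        summarize brier        // the mean reported here is the Brier score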



        • #5
          There is a free book here with a good section on cross-validation in Chapter 5: http://www-bcf.usc.edu/~gareth/ISL/. Cross-validation can be used for both model assessment and model selection. Note that Chapter 6 and the later chapters cover model selection in detail using other methods; these might also prove useful.

