
  • Crossfold -- k-fold cross-validation

    Hello,

    I am a fairly elementary Stata user, currently on Stata 14.1. I am trying to perform k-fold cross-validation using crossfold (http://fmwww.bc.edu/repec/bocode/c/crossfold.html).
    However, I am having trouble understanding what the output is telling me -- even with the help file -- and how I can reasonably choose a model. I am doing 10-fold cross-validation.

    crossfold reports the R2 (or another measure of model fit) for each fold (in my case, 10 folds). I am unsure what to do from there. If I remove or add a variable and get another 10 estimates, how do I compare the two models? Is there a way to average the 10 estimates from each run and then compare the averages? Is that the best way to compare the models?
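
    To make the question concrete, here is a sketch of the workflow I have in mind, using the auto data purely for illustration. The matrix name r(est) is an assumption about where crossfold saves the per-fold estimates (running return list after the command should confirm it), and fitA/fitB are just illustrative names:

        * Sketch only: compare two models by the mean of their per-fold estimates.
        * Assumes crossfold stores them in r(est); check with -return list-.
        sysuse auto, clear
        set seed 12345

        * Model A: 10-fold cross-validation
        crossfold regress price mpg weight, k(10)
        matrix A = r(est)
        svmat A, names(fitA)       // fold-level estimates as a variable
        summarize fitA1            // mean over the 10 folds

        * Model B: drop a variable and repeat
        crossfold regress price mpg, k(10)
        matrix B = r(est)
        svmat B, names(fitB)
        summarize fitB1            // compare this mean with Model A's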

    Thank you for any help you can provide!

    Best
    Leo

  • #2
    If it is just a matter of averaging the 10 estimates, then I suppose one non-Stata solution would be to put the values into Excel. I'm unsure whether that is what is required...
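
    Something like the following is what I am imagining, if it can be done directly in Stata instead (again assuming crossfold leaves the per-fold values in r(est), which return list should confirm):

        * Mean of the per-fold estimates, computed in Mata rather than Excel.
        crossfold regress price mpg weight, k(10)
        mata: mean(st_matrix("r(est)"))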



    • #3
      I note that the URL you very helpfully provided includes a reference at the end to the 2005 Stata Journal article that introduced this program. Have you had a chance to look at it? (I have not; I'm just posting this quickly between other tasks.) The article is old enough that it is freely available:

      http://www.stata-journal.com/article...article=st0087

      Added in edit: I see now that the article may not be directly related, but it may still be of some help.



      • #4
        I also downloaded crossfold and did not understand how to make it useful for either model selection or for evaluating model performance. I saw that here:
        https://www.statalist....oss-validation


        a for-loop is proposed that, I guess, can be a starting point (to be adapted to one's own needs) for getting results from cross-validation. For example, using the same data, I ran a two-fold (half-and-half) cross-validation of a logistic regression with foreign as the outcome, obtaining the estimated probabilities on the held-out halves:
        sysuse auto, clear

        * container for the cross-validated predicted probabilities
        generate prp = .

        * random half-and-half split of the 74 observations
        set seed 2016
        gen u = runiform()
        sort u
        gen split = _n <= 37

        * fit on one half, predict out-of-sample on the other
        forvalues i = 0/1 {
            logit foreign price mpg headroom if split == `i'
            predict p, pr
            replace prp = p if split != `i'
            drop p
        }
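
        To then evaluate the held-out predictions, a small sketch (the variable name brier is just illustrative): the Brier score is the mean squared difference between the 0/1 outcome and the predicted probability, so

        * Brier score of the cross-validated probabilities (lower is better)
        gen brier = (foreign - prp)^2
        summarize brier        // the mean reported here is the Brier score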



        • #5
          There is a free book here with a good section on cross-validation in Chapter 5: http://www-bcf.usc.edu/~gareth/ISL/. Cross-validation can be used for both model assessment and model selection. Note that Chapter 6 and the later chapters cover model selection in detail using other methods; these might also prove useful.

