
  • Leave One Out Cross Validation

    Hello,

    I am trying to look for the best n given:


    set obs 200
    gen x = runiform(0,5)
    gen U = rnormal(0,100)
    gen m = exp(x) - 4*(x^2)
    gen Y = m+U

    and the equation in the attached image:

    // build the cosine and sine terms for i = 1, ..., 20
    // (forvalues takes "=" rather than "in", and the sine term is sin(x*`i'))

    forvalues i = 1(1)20 {
    gen cosx`i' = cos(x*`i')
    }

    forvalues i = 1(1)20 {
    gen sinx`i' = sin(x*`i')
    }

    forvalues i = 1/20 {
    gen csx`i' = (cosx`i')+(sinx`i')
    }

    // Y(n=1)
    regress Y csx1
    predict Y1

    //Y(n=5)
    regress Y csx1-csx5
    predict Y2

    //Y(n=20)
    regress Y csx1-csx20
    predict Y3

    scatter Y1 Y2 Y3 m x, legend(order(1 "Y1" 2 "Y2" 3 "Y3" 4 "m"))

    //with a scatter plot looking like the attached scatter.png:

    -------------------------------------------------------------------------------------------------------------------------

    How should I perform LOOCV in Stata to find the best n? I tried help in Stata but found no information on it.

    (The choice of n is kind of like finding the bandwidth in kernel regression but I'm not sure how to approach it with the syntax)

    Thanks,
    Rayne
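
One possible way to approach this, offered as a minimal sketch rather than a definitive answer: because each candidate model is an ordinary least squares regression, the leave-one-out prediction error for observation i can be obtained without refitting, as e_i/(1 - h_i), where e_i is the usual residual and h_i is the leverage (both available from predict after regress). The loop below assumes Y and csx1-csx20 already exist as created above and that csx1-csx20 are stored contiguously in the dataset; the names e, h, loo2, best_n and best_cv are illustrative only.

```stata
* Sketch: LOOCV mean squared error for n = 1, ..., 20 via the OLS leverage shortcut
* Assumes Y and csx1-csx20 exist; e, h, loo2, best_n, best_cv are hypothetical names
local best_n = .
local best_cv = .
forvalues n = 1/20 {
    quietly regress Y csx1-csx`n'
    quietly predict double e, residuals
    quietly predict double h, hat
    * leave-one-out residual for a linear regression: e_i/(1 - h_i)
    quietly gen double loo2 = (e/(1 - h))^2
    quietly summarize loo2, meanonly
    display "n = `n'   LOOCV MSE = " r(mean)
    * missing (.) sorts above every number, so the first n always replaces it
    if r(mean) < `best_cv' {
        local best_n = `n'
        local best_cv = r(mean)
    }
    drop e h loo2
}
display "n with the smallest LOOCV MSE: `best_n'"
```

The n reported at the end is the one that minimises the average squared leave-one-out error, which plays the same role as a cross-validated bandwidth in kernel regression. Since x and U are random draws, the selected n will vary from run to run unless a seed is set first.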