K-fold cross validation

Caleb Hall-Paterson

Join Date: Apr 2020

Posts: 17
#1

K-fold cross validation

01 May 2020, 16:36

Hi all,

I am trying to conduct K-fold cross validation for both Logistic and OLS regressions. Having read plenty online regarding this topic, the following appear to be my options.

-cvauroc- for logistic regressions, and -loocv- or -crossfold- for OLS regressions.

I cannot use lasso as I do not have Stata 16. I can run the -cvauroc- commands, but whenever I try to enter either of the OLS options, Stata simply rejects the command, stating that command x is unrecognised, with reference to the dependent variable. I was hoping someone might be able to assist me.

Thanks in advance,
Caleb
Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#2

01 May 2020, 19:04

Each of those three is user-written (community-contributed); have you downloaded and installed them? If not, use -search- to find and install them.
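A minimal sketch of that check: -which- reports whether Stata can find a command, and -ssc install- fetches a package from SSC. It assumes all three packages are hosted on SSC under these names; if one is not found there, -search- will locate it.

```stata
* Check whether each community-contributed command is installed;
* install it from SSC if it is missing (assumes the package is on SSC).
foreach cmd in cvauroc loocv crossfold {
    capture which `cmd'
    if _rc {
        display as text "`cmd' not found, attempting: ssc install `cmd'"
        ssc install `cmd'
    }
    else {
        display as text "`cmd' is already installed"
    }
}
```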
Caleb Hall-Paterson

Join Date: Apr 2020

Posts: 17
#3

01 May 2020, 19:15

Hi Rich,

I know that all are in fact user-written, and I have now proceeded to use -loocv- for my out-of-sample performance estimates.

I suppose the only question I have on the back of this is: is there a disadvantage to using this method, given that there are no 'folds' in the out-of-sample estimates, i.e. it holds out one observation at a time rather than a larger sample of, say, 100?
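To make the comparison concrete, here is a minimal sketch of k-fold cross-validation done by hand for an OLS model. It is an illustration only, not the internals of -loocv- or -crossfold-; the auto dataset, the variables price, mpg and weight, the seed, and k = 5 are all assumptions chosen for the example. Leave-one-out is the special case where k equals the number of observations, so each model is fitted on all but one observation.

```stata
* Manual k-fold cross-validation for OLS on Stata's bundled auto data.
* Setting k equal to the number of observations gives leave-one-out.
sysuse auto, clear
set seed 12345                       // arbitrary seed, for reproducible folds
local k 5

* Assign each observation to one of k roughly equal random folds
generate double u = runiform()
sort u
generate int fold = mod(_n - 1, `k') + 1

* Fit on k-1 folds, then predict the held-out fold
generate double yhat = .
forvalues j = 1/`k' {
    quietly regress price mpg weight if fold != `j'
    quietly predict double tmp, xb
    quietly replace yhat = tmp if fold == `j'
    drop tmp
}

* Out-of-sample root mean squared error
generate double sqerr = (price - yhat)^2
quietly summarize sqerr, meanonly
display as text "k-fold CV RMSE (k = `k'): " as result %9.2f sqrt(r(mean))
```

With larger k, each fit uses more of the data but more models have to be estimated; k = 74 on this dataset would reproduce the leave-one-out setup.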
FernandoRios

Join Date: Apr 2014

Posts: 2470
#4

01 May 2020, 19:24

Hi Caleb
Let me suggest a command I wrote, -cv_regress-. It is faster than -loocv- for linear regressions. You can get it from SSC (ssc install cv_regress).
I'm also working on another command for k-fold cross-validation with other estimation commands such as logit, probit, mprobit, etc.
Best Regards
Caleb Hall-Paterson

Join Date: Apr 2020

Posts: 17
#5

01 May 2020, 19:32

Hi Fernando,

Thank you for the advice; I have indeed also used the -cv_regress- command for OLS.

Any suggestions for logit? I have simply used -loocv-, and it seems to give robust results.

Kind regards

FernandoRios

Join Date: Apr 2014

Posts: 2470
#6

01 May 2020, 19:53

Use this piece of code. It doesn't have all the safeguards, but it works well.

Code:

capture program drop cross_probit
program cross_probit, rclass
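    syntax, k(int) reps(int) [seed(str)]
    ** make the random fold assignment reproducible when seed() is given
    if "`seed'"!="" set seed `seed'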
    tempname eqreg
    ** save eq
    qui:est sto `eqreg'
    ** get what I need.
    tempvar touse
    qui:gen byte `touse'=e(sample)
    local  cmdln=subinstr("`e(cmdline)'","`e(cmd)'","",1)
    qui:reparser `cmdln'
    local y_x `r(y_x)'
    local wgt `r(wgt)'
    local opts  `r(opts)'
    local cmd  `e(cmd)'
    tempvar y
    qui:gen byte `y'=(`e(depvar)'!=0) if `touse'
    tempname binit
    matrix `binit'=e(b)
    ** holds each observation's out-of-sample log likelihood
    tempvar kfld resid tmpresid
    tempname msqr
    local mmsqr=0
    qui:gen double `resid'=.
    forvalues i=1/`reps' {
        capture drop `kfld'
        qui:xtile `kfld'=runiform() if `touse', n(`k')
        forvalues j=1/`k' {
            qui:`cmd' `y_x' `wgt' if `touse' & `kfld'!=`j', `opts' from(`binit',skip)
            qui:capture drop `tmpresid'
            qui:predict double `tmpresid', pr 
            qui:replace `resid'=cond(`y'==1,log(`tmpresid'),log(1-`tmpresid')) if `touse' & `kfld'==`j'
            
        }
        ** total out-of-sample log likelihood for this repetition
        qui:sum `resid' if `touse', meanonly
        qui:matrix `msqr'=nullmat(`msqr')\ (r(mean)*r(N))
        local mmsqr=`mmsqr'+(r(mean)*r(N))
    }
    local mmsqr = `mmsqr'/`reps'
    matrix colname `msqr'=msqr
    return local mmsqr = `mmsqr'
    return matrix msqr = `msqr'
    return local k = `k'
    return local reps = `reps'
    return local seed  `seed'
    qui:est restore `eqreg'
    display as result "k-fold Cross validation"
    display as text   "Number of Folds       : " %10.0f `k' 
    display as text   "Number of Repetitions : " %10.0f `reps'
    display as text   "Avg LL                : " %10.3f `mmsqr'
end


Caleb Hall-Paterson

Join Date: Apr 2020

Posts: 17
#7

02 May 2020, 07:25

Hi Fernando,

I really appreciate you sending the above code over. I ran your program and then the following command:

Code:

cross_logit NoRecoveryCL CreditSpread_n OriginalMaturity HardCallY FlexStatusU LoanTrancheSizeMM SponsorLedY IHealthY, k(10) reps(1) [seed(110)]

But it keeps getting rejected. I am unsure where I have gone wrong.
FernandoRios

Join Date: Apr 2014

Posts: 2470
#8

02 May 2020, 08:55

You only need to set reps and k.
Everything else is taken from the original regression, so estimate the model first and then call the program with only those two options.
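A minimal sketch of that calling pattern, using a hypothetical helper, show_last_fit, that is not part of the code posted above. It only illustrates how a program run after estimation can read the command line, dependent variable, and estimation sample from e(), which is why no variable list needs to be passed. The auto dataset and the logit specification are assumptions chosen for the example.

```stata
* Hypothetical helper (not part of the thread's code): shows what a
* post-estimation program can read from the model that was just fitted.
capture program drop show_last_fit
program show_last_fit
    display as text "command      : " as result "`e(cmd)'"
    display as text "command line : " as result "`e(cmdline)'"
    display as text "dep. variable: " as result "`e(depvar)'"
    quietly count if e(sample)
    display as text "obs. used    : " as result r(N)
end

sysuse auto, clear
logit foreign mpg weight          // 1. estimate the model first
show_last_fit                     // 2. the program picks up the e() results

* With cross_probit from the earlier post defined (and its helper
* reparser available), the call follows the same pattern, with no
* variable list and no square brackets around the options:
*     cross_probit, k(10) reps(1)
*     return list
```

Because cross_probit is declared rclass, its stored results can be inspected with -return list- after it finishes.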
Caleb Hall-Paterson

Join Date: Apr 2020

Posts: 17
#9

02 May 2020, 19:28

Fernando, thank you for being so patient with me. I am still relatively new to the Forum.

I ran your code after my logistic regression, setting reps and k as you said, and it ran without difficulty. However, no results are displayed. Do I need some further command to display the results of the code you have provided?

Thanks again,
Caleb
Achim Ahrens

Join Date: Jun 2014

Posts: 49
#10

03 May 2020, 04:30

Check lassologit, which is part of lassopack. ssc install lassopack.

--
Tag me or email me for ddml/pdslasso/lassopack/pystacked related questions. I don't check Statalist.
Caleb Hall-Paterson

Join Date: Apr 2020

Posts: 17
#11

03 May 2020, 06:25

Thank you, Achim. I am trying to find the RSE and RAE, though; is there any way to do this with your model?