
  • elegant code for out-of-sample tests

    Hello folks,

    Is there elegant code for out-of-sample tests (or test statistics)?
    For example, let's say I'd like to perform the Clark and West (2007) MSPE-adjusted test. Suppose I have data for 300 periods, of which the first 100 will be in-sample and the last 200 will be out-of-sample. Let y denote the observations on the dependent variable, and let yhat_model and yhat_simple denote the fitted values from the larger model and the simple model, respectively. I'd like to run the test (or get a test statistic) without copying and pasting parameters, etc. Could someone please recommend an elegant way to code this?

    Thank you very much,
    Stan

  • #2
    I have never heard of this particular test. But assuming that it is a test that is used following some Stata estimation command, you can do the in-sample and out-of-sample tests as follows:

    Code:
    Stata_estimation_command
    predict whatever, needed_prediction  // IF NECESSARY FOR TEST
    command_for_clark_and_west_mspe_adjusted_test if e(sample) // IN-SAMPLE TEST
    command_for_clark_and_west_mspe_adjusted_test if !e(sample) // OUT-OF-SAMPLE TEST
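
    For concreteness, here is a minimal sketch of that template with -regress- standing in for the estimation command. The names y, x1, x2 and the 100-period window are assumptions taken from #1; the command implementing the test itself is exactly what #1 is asking for, so it is omitted here.

    Code:
    regress y x1 x2 in 1/100        // fit on the first 100 periods (the in-sample window)
    predict double yhat_model, xb   // linear predictions are computed for all 300 periods
    count if e(sample)              // observations used in estimation (in-sample)
    count if !e(sample)             // held-out observations (out-of-sample)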

    Comment


    • #3
      Thank you, Clyde.
      The test is from Clark, T.E. and West, K.D., 2007. Approximately normal tests for equal predictive accuracy in nested models. Journal of Econometrics, 138(1), pp. 291-311. I will reproduce the calculations (from a publicly available working-paper version on Google Scholar) below.
      [Attached image Picture1.png: equation 2.1, the MSPE-adjusted test calculations, reproduced from the working paper]

      Comment


      • #4
        Basically, it's a test of whether a time-series model with predictors X1, X2, X3, ... is statistically significantly better than a model that has only a subset of those predictors.

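
        Spelling out what that comparison computes (this is my plain-text rendering of equation 2.1 from #3; yhat1 is the nested model's forecast and yhat2 the larger model's): for each out-of-sample period t, define

        f_t = (y_t - yhat1_t)^2 - [ (y_t - yhat2_t)^2 - (yhat1_t - yhat2_t)^2 ]

        The MSPE-adjusted test is then a one-sided t-test that the mean of f_t is zero, against the alternative that it is positive, i.e. that the larger model predicts better once the adjustment term corrects for the noise its extra parameters introduce.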
        Comment


        • #5
          OK, here is a program that, I think, implements what you show in equation 2.1 in #3. You should consider this program a hack, not a program ready for widespread release. It has only the bare-bones features necessary to do the calculation. It incorporates no validity checks on the data, other than verifying that they have been -tsset-. If used with panel data rather than a single time series, it calculates the appropriate lead values within each panel, but it then calculates a single summary test for the entire sample; I have no idea whether that makes sense here (I suspect not, but I'm not sure what, if anything, would be appropriate). Also, it does not offer the option of an autocorrelation-robust standard error, as I don't know how to do that.

          The syntax to call it is:
          Code:
          cwmpse_adj observed_value forecast1 forecast2, tau(#)
          where observed_value is the name of the observed outcome, forecast1 is the name of the first forecast, and forecast2 is the name of the other forecast, which nests forecast1. -tau()- is the desired lead time and must be specified, with # replaced by an integer. You can also restrict the program's operation to a subset of the data with appropriate -if- or -in- qualifiers (as proposed in #2).

          It returns three things in -r()-: r(t) is the actual t-statistic calculated; r(sig_5) is 1 if the t-statistic is significant by a one-tailed t-test at the 0.05 level, 0 otherwise; and r(sig_10) is 1 if the t-statistic is significant by a one-tailed t-test at the 0.1 level, 0 otherwise.

          I have tested it to the extent that I am confident it will not produce syntax errors, and it produces results that are not obviously insane. But I cannot say whether it is a correct implementation of the method shown in #3, as I have no data set with known results to test it on. So use it at your own risk. But it's a start, and it might be sufficient for now.

          Code:
          capture program drop cwmpse_adj
          program define cwmpse_adj, rclass
              syntax varlist(min = 3 max = 3) [if] [in], tau(integer)
              marksample touse
              tokenize `varlist'
              local observed `1'
              local forecast1 `2'
              local forecast2 `3'
              capture tsset
              if c(rc) != 0 {
                  display as error "Data must be tsset"
                  exit 9
              }
              tempvar delta01 delta02 delta12 fhat
              quietly {
                  gen `delta01' = F`tau'.`observed' - F`tau'.`forecast1' if `touse'  // forecast error of the nested model
                  gen `delta02' = F`tau'.`observed' - F`tau'.`forecast2' if `touse'  // forecast error of the larger model
                  gen `delta12' = F`tau'.`forecast1' - F`tau'.`forecast2' if `touse' // difference of the two forecasts (adjustment term)
                  gen `fhat' = `delta01'^2 - (`delta02'^2 - `delta12'^2) if `touse'  // MSPE-adjusted loss differential (eq. 2.1)
                  ttest `fhat' = 0  // one-sample t test; one-sided decisions are taken from r(t) below
              }
              return scalar t = r(t)
              return scalar sig_10 = (r(t) > 1.282)
              return scalar sig_5 = (r(t) > 1.645)
              exit
          end
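
          By way of illustration, a call might look like this (the series names are placeholders, and yhat_simple must be the forecast that yhat_model nests):

          Code:
          tsset period
          cwmpse_adj y yhat_simple yhat_model, tau(1)
          display "t = " r(t) ", significant at the 5% level: " r(sig_5)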

          Comment


          • #6
            Wow, thank you very much! I will test this out by manually computing everything one time and post what I find (hopefully over the weekend).

            Comment


            • #7
              I tested this out, and it appears to give answers identical to those from my less elegant code, but the call to the program needs to be modified to
              Code:
               cwmpse_adj observed_value forecast1 forecast2 if insample==0, tau(#)
              to obtain the out-of-sample t statistic.

              Thank you very much for the program!

              P.S. I didn't realize that Stata can do out-of-sample predictions based on in-sample estimation; I thought I would have to copy and paste regression coefficients.
              Last edited by Stan Peterburgsky; 26 Jan 2025, 14:46.

              Comment


              • #8
                I didn't realize that Stata can do out-of-sample predictions based on in-sample estimation...
                It depends on exactly what you are predicting and it is not supported for some estimation commands, but it works for most of them.

                Comment
