
  • elegant code for out-of-sample tests

    Hello folks,

    Is there elegant code for out-of-sample tests (or test statistics)?
    For example, let's say I'd like to perform the Clark and West (2007) MSPE-adjusted test. Suppose I have data for 300 periods, of which the first 100 will be in-sample and the last 200 will be out-of-sample. Let y denote the observations on the dependent variable, and let yhat_model and yhat_simple denote the fitted values from the larger model and the simple model, respectively. I'd like to run the test (or get a test statistic) without copying and pasting parameters, etc. Could someone please recommend an elegant way to code this?

    Thank you very much,
    Stan

  • #2
    I have never heard of this particular test. But assuming that it is a test that is used following some Stata estimation command, you can do the in-sample and out-of-sample tests as follows:

    Code:
    Stata_estimation_command
    predict whatever, needed_prediction  // IF NECESSARY FOR TEST
    command_for_clark_and_west_mspe_adjusted_test if e(sample) // IN-SAMPLE TEST
    command_for_clark_and_west_mspe_adjusted_test if !e(sample) // OUT-OF-SAMPLE TEST
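
    For concreteness, here is a minimal sketch of that template with -regress- standing in for the estimation command. The names y, x1, x2 and the 100-period window are assumptions taken from #1; the command implementing the test itself is exactly what #1 is asking for, so it is omitted here.

    Code:
    regress y x1 x2 in 1/100        // fit on the first 100 periods (the in-sample window)
    predict double yhat_model, xb   // linear predictions are computed for all 300 periods
    count if e(sample)              // observations used in estimation (in-sample)
    count if !e(sample)             // held-out observations (out-of-sample)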

    Comment


    • #3
      Thank you, Clyde.
      The test is from Clark, T.E. and West, K.D., 2007. Approximately normal tests for equal predictive accuracy in nested models. Journal of Econometrics, 138(1), pp. 291-311. I will reproduce the calculations (from a publicly available working-paper version on Google Scholar) below.
      [Attached image Picture1.png: equation 2.1, the MSPE-adjusted test calculations, reproduced from the working paper]

      Comment


      • #4
        Basically, it's a test of whether a time-series model with predictors X1, X2, X3, ... is statistically significantly better than a model that has only a subset of those predictors.

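
        Spelling out what that comparison computes (this is my plain-text rendering of equation 2.1 from #3; yhat1 is the nested model's forecast and yhat2 the larger model's): for each out-of-sample period t, define

        f_t = (y_t - yhat1_t)^2 - [ (y_t - yhat2_t)^2 - (yhat1_t - yhat2_t)^2 ]

        The MSPE-adjusted test is then a one-sided t-test that the mean of f_t is zero, against the alternative that it is positive, i.e. that the larger model predicts better once the adjustment term corrects for the noise its extra parameters introduce.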
        Comment


        • #5
          OK, here is a program that, I think, implements what you show in equation 2.1 in #3. You should consider this program a hack, not a program ready for widespread release. It has only the bare-bones features necessary to do the calculation. It incorporates no validity checks on the data, other than verifying that they have been -tsset-. If used with panel data rather than a single time series, it calculates the appropriate lead values within each panel, but it then calculates a single summary test for the entire sample; I have no idea whether that makes sense here (I suspect not, but I'm not sure what, if anything, would be appropriate). Also, it does not offer the option of an autocorrelation-robust standard error, as I don't know how to do that.

          The syntax to call it is:
          Code:
          cwmpse_adj observed_value forecast1 forecast2, tau(#)
          where observed_value is the name of the observed outcome, forecast1 is the name of the first forecast, and forecast2 is the name of the other forecast, which nests forecast1. -tau()- is the desired lead time and must be specified, with # replaced by an integer. You can also restrict the program's operation to a subset of the data with appropriate -if- or -in- qualifiers (as proposed in #2).

          It returns three things in -r()-: r(t) is the actual t-statistic calculated; r(sig_5) is 1 if the t-statistic is significant by a one-tailed t-test at the 0.05 level, 0 otherwise; and r(sig_10) is 1 if the t-statistic is significant by a one-tailed t-test at the 0.1 level, 0 otherwise.

          I have tested it to the extent that I am confident it will not produce syntax errors, and it produces results that are not obviously insane. But I cannot say whether it is a correct implementation of the method shown in #3, as I have no data set with known results to test it on. So use it at your own risk. But it's a start, and it might be sufficient for now.

          Code:
          capture program drop cwmpse_adj
          program define cwmpse_adj, rclass
              syntax varlist(min = 3 max = 3) [if] [in], tau(integer)
              marksample touse
              tokenize `varlist'
              local observed `1'
              local forecast1 `2'
              local forecast2 `3'
              capture tsset
              if c(rc) != 0 {
                  display as error "Data must be tsset"
                  exit 9
              }
              tempvar delta01 delta02 delta12 fhat
              quietly {
                  gen `delta01' = F`tau'.`observed' - F`tau'.`forecast1' if `touse'  // forecast error of the nested model
                  gen `delta02' = F`tau'.`observed' - F`tau'.`forecast2' if `touse'  // forecast error of the larger model
                  gen `delta12' = F`tau'.`forecast1' - F`tau'.`forecast2' if `touse' // difference of the two forecasts (adjustment term)
                  gen `fhat' = `delta01'^2 - (`delta02'^2 - `delta12'^2) if `touse'  // MSPE-adjusted loss differential (eq. 2.1)
                  ttest `fhat' = 0  // one-sample t test; one-sided decisions are taken from r(t) below
              }
              return scalar t = r(t)
              return scalar sig_10 = (r(t) > 1.282)
              return scalar sig_5 = (r(t) > 1.645)
              exit
          end
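
          By way of illustration, a call might look like this (the series names are placeholders, and yhat_simple must be the forecast that yhat_model nests):

          Code:
          tsset period
          cwmpse_adj y yhat_simple yhat_model, tau(1)
          display "t = " r(t) ", significant at the 5% level: " r(sig_5)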

          Comment


          • #6
            Wow, thank you very much! I will test this out by manually computing everything one time and post what I find (hopefully over the weekend).

            Comment


            • #7
              I tested this out, and it appears to give answers identical to those from my less elegant code, but the call to the program needs to be modified to
              Code:
               cwmpse_adj observed_value forecast1 forecast2 if insample==0, tau(#)
              to obtain the out-of-sample t statistic.

              Thank you very much for the program!

              P.S. I didn't realize that Stata can do out-of-sample predictions based on in-sample estimation; I thought I would have to copy and paste regression coefficients.
              Last edited by Stan Peterburgsky; 26 Jan 2025, 14:46.

              Comment


              • #8
                I didn't realize that Stata can do out-of-sample predictions based on in-sample estimation...
                It depends on exactly what you are predicting and it is not supported for some estimation commands, but it works for most of them.

                Comment
