  • Prediction intervals for out-of-sample predictions using a model with bootstrapped standard errors

    I have a linear model with bootstrapped standard errors for the parameters. I then used a separate data set to generate out-of-sample forecasts from this model. I would like to construct prediction intervals for these forecast values, but when I try to generate the standard errors of the forecasts with -predict PIse, stdf- I get the error "option stdf not allowed after bootstrap estimation r(198);". However, Stata does let me generate the standard errors of the predictions with -predict CIse, stdp-, which can be used to construct confidence intervals.

    Why am I able to estimate the standard errors of the predictions after bootstrap estimation, but not the standard errors of the forecasts?

    Is there a mathematical reason why standard errors of the forecasts cannot be calculated after bootstrap estimation, or is it a software limitation?
    Last edited by Dominique Pride; 08 Dec 2014, 16:19.

  • #2
    Welcome to Statalist, Dominique!

    Bottom line: To compute a standard error for a forecast (a new observation \(y^*\)), you need an estimate of the variance of the error or deviation term. OLS assumes that this variance is a constant \(\sigma^2\) or a known function of a constant. The bootstrap makes no assumption about the variance of the error terms, so it provides no information for estimating the out-of-sample variance.
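
    Here is a minimal sketch reproducing the behavior with the auto dataset that ships with Stata (the model and variables are illustrative):

    Code:
    sysuse auto, clear
    bootstrap, reps(200) seed(12345): regress price mpg weight
    predict CIse, stdp           // allowed: uses the bootstrap VCE of the coefficients
    capture predict PIse, stdf   // rejected after -bootstrap-
    display _rc                  // 198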

    Longer version:

    If you do a regression, the prediction for a new individual (*) with covariates \(x\) is the estimated mean \(\widehat{\mu}(x)\).

    However, the actual value for a new individual (*) with covariate values \(x\) will be the true mean plus a deviation or error term:

    \[
    y^* = \mu(x) + e^*
    \]

    Here \(\mu(x)\) is the mean of \(y\) evaluated at the covariates \(x\), and \(e^*\) is the new random part of \(y^*\). The only assumption about \(e^*\) is that it has expectation zero.

    We don't know \(\mu(x)\), so we have to substitute the estimated mean from the regression:

    \[
    y^*(x) = \widehat{\mu}(x) + e^*
    \]

    The variance of the new observation is the sum of the variances of the two parts on the right-hand side:

    \[
    \text{var}(y^*(x)) = \text{var}(\widehat{\mu}(x)) + \text{var}(e^*)
    \]

    Both OLS and the bootstrap estimate the first term, \(\text{var}(\widehat{\mu}(x))\). But only OLS estimates the second term, and it does so under the assumption of constant error variance. Bootstrapping the regression coefficients alone provides no information about the error variance.
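
    After a plain -regress-, by contrast, both terms are available: -stdp- estimates the first and the root MSE estimates the second, with \(\text{stdf}^2 = \text{stdp}^2 + \text{rmse}^2\). A minimal sketch checking that identity (same illustrative model as above):

    Code:
    sysuse auto, clear
    regress price mpg weight
    predict double sep, stdp                    // sqrt of estimated var(muhat(x))
    predict double sef, stdf                    // sqrt of estimated var(muhat(x)) + var(e*)
    gen double check = sqrt(sep^2 + e(rmse)^2)
    assert reldif(sef, check) < 1e-8            // stdf^2 = stdp^2 + rmse^2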


    The modern approach to estimating what is called "prediction error" is cross-validation or a bootstrap analysis built for that purpose; see, e.g., Efron and Tibshirani (1997).
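
    Not the .632+ estimator of the cited paper, but a simple holdout split illustrates the idea of estimating prediction error on data not used for fitting (the 70/30 split and seed are arbitrary):

    Code:
    sysuse auto, clear
    set seed 12345
    gen byte train = runiform() < 0.7
    regress price mpg weight if train
    predict double yhat
    gen double sqerr = (price - yhat)^2 if !train
    summarize sqerr, meanonly
    display "holdout RMSE: " sqrt(r(mean))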


    Reference
    Efron, Bradley, and Robert Tibshirani. 1997. Improvements on cross-validation: The .632+ bootstrap method. Journal of the American Statistical Association 92(438): 548-560.
    Steve Samuels
    Statistical Consulting
    [email protected]

    Stata 14.2
