Request guidance in Out-of-sample on Probit model

Dipen Modi

Join Date: Aug 2020

Posts: 6
#1


12 Aug 2020, 03:03

Hi,
Forgive me for my silly question as I am new to this. I do not understand how I can generate Out-of-sample forecasts.
I am working on a probit model where the binary dependent variable is Recession Indicator.
The sample is broken into two - one for in-sample analysis (1981m5 - 2001m4) and the other for out-of-sample analysis (2001m5 - 2020m3). I have spent over a week researching this with no success. I understand the technique described in other pages but don't understand what command I should use to create the out-of-sample forecasts.
For your ease, I have attached part of my data (so that I can try this on my own with the entire data once I know how).

Thank you so much for your time and kind help,
DM
Attached Files

Essay.xlsx (33.8 KB, 2 views)

Last edited by Dipen Modi; 12 Aug 2020, 03:23.
Tags: logit, Out of sample, probit, Suggestion, Time Series
Andrew Musau

Join Date: Oct 2014

Posts: 10200
#2

12 Aug 2020, 04:25

You do not show your model and do not present a data example as advised in FAQ Advice #12. Therefore, only a generic solution can be suggested. So run your model using the -if- qualifier to constrain observations to the in-sample years and then predict using the full sample containing the out-of-sample observations.

Code:

xtprobit ... if inrange(ym, tm(1981m5), tm(2001m4))
predict prob, pu0

where "ym" is your year-month variable.

Last edited by Andrew Musau; 12 Aug 2020, 04:30.
Dipen Modi

Join Date: Aug 2020

Posts: 6
#3

12 Aug 2020, 12:28

Originally posted by Andrew Musau

You do not show your model and do not present a data example as advised in FAQ Advice #12. Therefore, only a generic solution can be suggested. So run your model using the -if- qualifier to constrain observations to the in-sample years and then predict using the full sample containing the out-of-sample observations.

Code:

xtprobit ... if inrange(ym, tm(1981m5), tm(2001m4))
predict prob, pu0

where "ym" is your year-month variable.

Sorry about that. I am still trying to get the hang of this. If you can tell me how I can share the data example, that'd be great.

Wow! It actually worked. Thank you so much, Sir! You are very helpful.
So basically, I am running a probit on the restricted sample first.

Next, should I run the probit on the full sample or only on the out-of-sample period?

Code:

probit ... if inrange(Time, tm(2001m5), tm(2020m4))
* OR
probit ... if inrange(Time, tm(1981m5), tm(2020m4))

Also, does it matter whether I take probit or xtprobit?

Last edited by Dipen Modi; 12 Aug 2020, 12:38.
Andrew Musau

Join Date: Oct 2014

Posts: 10200
#4

12 Aug 2020, 12:54

So basically, I am running a probit on the restricted sample first.

Yes, so according to your description in #1, your in-sample period is 1981m5-2001m4, so your estimation should be based on these years. Your out-sample period is 2001m5-2020m4, so you use the in-sample estimates to generate out-sample predictions. Running the predict command with no conditions will achieve this.

Also, does it matter whether I take probit or xtprobit?

If you have panel data, you should use xtprobit. In fact, I would switch to xtlogit and compare fixed effects and random effects using a Hausman test.
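
For the panel case, the fixed-effects versus random-effects comparison might look like the sketch below. It runs on simulated data: the variable names id, t, x1 and y, the panel dimensions and the data-generating process are all assumptions made purely for illustration. Note that xtprobit has no fixed-effects estimator, which is one reason to move to xtlogit for this comparison.

```stata
* Simulated panel: 200 units observed for 10 periods (illustrative assumption only)
clear
set seed 2020
set obs 200
generate id = _n
* Unit-specific effect
generate u = rnormal()
expand 10
bysort id: generate t = _n
* Regressor deliberately correlated with the unit effect
generate x1 = rnormal() + 0.5*u
generate byte y = runiform() < invlogit(x1 + u)
xtset id t

* Fixed-effects (conditional) logit, then random-effects logit
xtlogit y x1, fe
estimates store fe
xtlogit y x1, re
estimates store re

* Hausman test: the always-consistent estimator (fe) is listed first
hausman fe re
```

A small p-value would point towards fixed effects. With a single time series, as opposed to a panel, none of this applies and plain probit is the relevant command.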
Dipen Modi

Join Date: Aug 2020

Posts: 6
#5

12 Aug 2020, 16:00

Originally posted by Andrew Musau

Yes, so according to your description in #1, your in-sample period is 1981m5-2001m4, so your estimation should be based on these years. Your out-sample period is 2001m5-2020m4, so you use the in-sample estimates to generate out-sample predictions. Running the predict command with no conditions will achieve this.

If you have panel data, you should use xtprobit. In fact, I would switch to xtlogit and compare fixed effects and random effects using a Hausman test.

Your answer is to the point and indeed very helpful. I'm working with time series data. Just to be sure that I'm getting you right:

Step 1:

Code:

probit y x1 x2 x3 if inrange(Time, tm(2001m5), tm(2020m4))

Step 2: Get the estimates from in-sample data.

Code:

predict Probability, pr

Step 3: Use the in-sample estimates to make out of sample predictions.

Code:

How???

I think I'm still not there yet.

Last edited by Dipen Modi; 12 Aug 2020, 16:04.
Andrew Musau

Join Date: Oct 2014

Posts: 10200
#6

12 Aug 2020, 16:13

Step 1:
Code:
probit y x1 x2 x3 if inrange(Time, tm(2001m5), tm(2020m4))

Now you are running the regression using the out-of-sample observations. Change the time period.

Step 2: Get the estimates from in-sample data.
Code:
predict Probability, pr

So you have done the out-of-sample predictions in step 2, given that you have addressed my first point. If you browse your data, you will see a variable named "Probability", which holds your predicted probabilities. This variable will cover the entire in-sample and out-sample periods. The predictions for observations falling within the period 2001m5 to 2020m4 are what are referred to as out-of-sample predictions.
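
The two steps can be sketched end to end as follows. This is a minimal illustration on simulated data, not the poster's dataset: the variables y, x1, x2, x3 and Time, and the coefficients used to generate them, are assumptions made only so that the example runs. The estimation window is the in-sample period given in #1.

```stata
* Simulated monthly data, 1981m5 to 2020m3 (illustrative assumption only)
clear
set seed 12345
set obs 467
generate Time = tm(1981m5) + _n - 1
format Time %tm
tsset Time
generate x1 = rnormal()
generate x2 = rnormal()
generate x3 = rnormal()
generate byte y = (0.8*x1 - 0.5*x2 + 0.3*x3 + rnormal()) > 1

* Step 1: estimate on the in-sample window only
probit y x1 x2 x3 if inrange(Time, tm(1981m5), tm(2001m4))

* Step 2: predict with no -if-, so every observation gets a probability
predict double Probability, pr

* Flag the observations that were not used in estimation
generate byte oos = !e(sample)
tabstat Probability, by(oos) statistics(n mean min max)
```

The rows with oos equal to 1 are the out-of-sample predictions: they use coefficients estimated only from the earlier window.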

Last edited by Andrew Musau; 12 Aug 2020, 16:35.
Dipen Modi

Join Date: Aug 2020

Posts: 6
#7

12 Aug 2020, 16:32

Originally posted by Andrew Musau

Step 1:

Code:

probit y x1 x2 x3 if inrange(Time, tm(2001m5), tm(2020m4))

Now you are running the regression using the out-of-sample observations. Change the time period.

So you have done the out-of-sample predictions in step 2, given that you have addressed my first point. If you browse your data, you will see a variable named "Probability", which holds your predicted probabilities. This variable will cover the entire in-sample and out-sample periods. The predictions for observations falling within the period 2001m5 to 2020m4 are what are referred to as out-of-sample predictions.

Thanks for pointing out the error.
So it is a simple two-step process.

Step 1:

Code:

probit y x1 x2 x3 if inrange(Time, tm(1981m5), tm(2020m4))

Step 2:

Code:

predict Prob_OOS, pr

In essence, the difference between In-Sample and Out of Sample lies in the range we are using to run the probit regression. Then, I can simply compare the two with the help of time plots:

Code:

line Prob_IS Prob_OOS Time
brier

Is there any other testing that can be done?
Andrew Musau

Join Date: Oct 2014

Posts: 10200
#8

12 Aug 2020, 16:40

Step 1:
Code:
probit y x1 x2 x3 if inrange(Time, tm(1981m5), tm(2020m4))

Again, look at when your in-sample period ends from #1. In your code you are using all observations.

In essence, the difference between In-Sample and Out of Sample lies in the range we are using to run the probit regression. Then, I can simply compare the two with the help of time plots:

Code:
line Prob_IS Prob_OOS Time
brier
Is there any other testing that can be done?

For the out-sample, you can plot (compare) actual and predicted as you have these values in the dataset. What other analyses can be done depends on the goal of your prediction, so consult relevant literature for this.
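
One way to make the actual-versus-predicted comparison concrete is sketched below: a time plot restricted to the out-of-sample months, the Brier score, and a simple classification table. The data are simulated and the variable names, coefficients and 0.5 cutoff are assumptions for illustration only; which diagnostics are appropriate depends on the goal of the forecast.

```stata
* Simulated monthly data, 1981m5 to 2020m3 (illustrative assumption only)
clear
set seed 12345
set obs 467
generate Time = tm(1981m5) + _n - 1
format Time %tm
tsset Time
generate x1 = rnormal()
generate x2 = rnormal()
generate x3 = rnormal()
generate byte y = (0.8*x1 - 0.5*x2 + 0.3*x3 + rnormal()) > 1

* Estimate in-sample, predict for all observations
probit y x1 x2 x3 if inrange(Time, tm(1981m5), tm(2001m4))
predict double Probability, pr

* Actual indicator against predicted probability, out-of-sample months only
twoway (line y Time) (line Probability Time) if Time >= tm(2001m5), ///
    ytitle("Recession indicator / predicted probability")

* Brier score for the out-of-sample forecasts (lower is better)
brier y Probability if Time >= tm(2001m5)

* Classification at a 0.5 cutoff (the cutoff is an arbitrary choice)
generate byte yhat = Probability >= 0.5 if Time >= tm(2001m5)
tabulate y yhat
```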
Dipen Modi

Join Date: Aug 2020

Posts: 6
#9

12 Aug 2020, 16:45

Originally posted by Andrew Musau

Again, look at when your in-sample period ends from #1. In your code you are using all observations.

For the out-sample, you can plot (compare) actual and predicted as you have these values in the dataset. What other analyses can be done depends on the goal of your prediction, so consult relevant literature for this.

Yes, you are right.

For In-Sample:

Code:

probit y x1 x2 x3 if inrange(Time, tm(1981m5), tm(2001m4))

FOR Out-of-Sample:

Code:

probit y x1 x2 x3 if inrange(Time, tm(1981m5), tm(2020m4))

Some people on ResearchGate recommended something called a rolling procedure. It seemed to be an iterative process and I didn't understand it. Is the approach above an alternative to that?
This is so much simpler!
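
For reference, a rolling procedure is usually understood as re-estimating the model each period and predicting only the next observation. A rough sketch of an expanding-window version is given below. It uses simulated data, and the variable names, the coefficients and the choice of an expanding rather than fixed-length window are all assumptions, not something stated in this thread.

```stata
* Simulated monthly data, 1981m5 to 2020m3 (illustrative assumption only)
clear
set seed 12345
set obs 467
generate Time = tm(1981m5) + _n - 1
format Time %tm
tsset Time
generate x1 = rnormal()
generate x2 = rnormal()
generate x3 = rnormal()
generate byte y = (0.8*x1 - 0.5*x2 + 0.3*x3 + rnormal()) > 1

generate double Prob_roll = .
local first = tm(2001m5)
local last  = tm(2020m3)

forvalues t = `first'/`last' {
    * Re-estimate using every month before t (expanding window)
    quietly probit y x1 x2 x3 if Time < `t'
    * Predict month t only and store it
    quietly predict double p_t if Time == `t', pr
    quietly replace Prob_roll = p_t if Time == `t'
    drop p_t
}

summarize Prob_roll if Time >= `first'
```

Unlike the single-estimation approach, the coefficients here are updated every month. For a genuine forecast, the predictors would normally be lagged so that only information available before month t is used.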