  • Request guidance on out-of-sample forecasts from a probit model

    Hi,
    Forgive me for the silly question; I am new to this. I do not understand how to generate out-of-sample forecasts.
    I am working on a probit model in which the binary dependent variable is a recession indicator.
    The sample is split into two parts: one for in-sample analysis (1981m5-2001m4) and the other for out-of-sample analysis (2001m5-2020m3). I have spent over a week researching this without success. I understand the technique described on other pages, but I don't understand which command I should use to create the out-of-sample forecasts.
    For convenience, I have attached part of my data (so that I can try this on my own with the full dataset once I know how).

    Thank you so much for your time and kind help,
    DM
    Last edited by Dipen Modi; 12 Aug 2020, 03:23.

  • #2
    You do not show your model and do not present a data example as advised in FAQ Advice #12. Therefore, only a generic solution can be suggested. So run your model using the -if- qualifier to constrain observations to the in-sample years and then predict using the full sample containing the out-of-sample observations.

    Code:
    xtprobit ... if inrange(ym, tm(1981m5), tm(2001m4))
    predict prob, pu0
    where "ym" is your year-month variable.
    Last edited by Andrew Musau; 12 Aug 2020, 04:30.
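A self-contained sketch of the same two steps, assuming a single monthly time series rather than a panel, so it uses plain -probit- with -predict, pr- instead of -xtprobit- with -pu0-. The data are simulated: the variable names Time, y, x1, x2 and x3 follow the later posts in this thread, while the coefficients, the seed and the name prob are arbitrary choices for illustration.

```stata
* Simulated monthly data, 1981m5-2020m4 (hypothetical; replace with your own dataset)
clear
set seed 12345
set obs 468
generate Time = tm(1981m5) + _n - 1
format Time %tm

* Made-up regressors and a 0/1 outcome drawn from a probit-type model
generate x1 = rnormal()
generate x2 = rnormal()
generate x3 = rnormal()
generate y = (0.5*x1 - 0.8*x2 + 0.3*x3 + rnormal() > 1)

* Step 1: estimate on the in-sample window only
probit y x1 x2 x3 if inrange(Time, tm(1981m5), tm(2001m4))

* Step 2: predict for every observation (no -if- on predict)
predict prob, pr

* Rows after 2001m4 were not used in estimation: these are out-of-sample forecasts
list Time y prob if inrange(Time, tm(2001m5), tm(2001m10)), noobs
```

With real data, only the -probit- and -predict- lines carry over; the simulation block would be replaced by loading your dataset.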

    • #3
      Originally posted by Andrew Musau
      You do not show your model and do not present a data example as advised in FAQ Advice #12. Therefore, only a generic solution can be suggested. So run your model using the -if- qualifier to constrain observations to the in-sample years and then predict using the full sample containing the out-of-sample observations.

      Code:
      xtprobit ... if inrange(ym, tm(1981m5), tm(2001m4))
      predict prob, pu0
      where "ym" is your year-month variable.
      Sorry about that. I am still trying to get the hang of this. If you can tell me how I can share the data example, that'd be great.

      Wow! It actually worked. Thank you so much, Sir! You are very helpful.
      So basically, I am running a probit on the restricted sample first.

      Next, should I run the probit on the out-of-sample period or on the full sample?

      Code:
      probit ... if inrange(Time, tm(2001m5), tm(2020m4))

      * OR

      probit ... if inrange(Time, tm(1981m5), tm(2020m4))
      Also, does it matter whether I use probit or xtprobit?
      Last edited by Dipen Modi; 12 Aug 2020, 12:38.

      • #4
        So basically, I am running a probit on the restricted sample first.

        Yes. According to your description in #1, your in-sample period is 1981m5-2001m4, so your estimation should be based on these months. Your out-of-sample period is 2001m5-2020m4, so you use the in-sample estimates to generate out-of-sample predictions. Running -predict- without an -if- qualifier will achieve this.
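One way to check which observations the model was estimated on, as a hedged sketch on simulated data (the single regressor x1, the seed and the variable names insample and phat are illustrative only): -e(sample)- marks the estimation sample, and -predict- without -if- fills in every row.

```stata
* Simulated monthly series, 1981m5-2020m4 (illustrative only)
clear
set seed 2020
set obs 468
generate Time = tm(1981m5) + _n - 1
format Time %tm
generate x1 = rnormal()
generate y = (x1 + rnormal() > 1)

* Estimate on the in-sample window only
probit y x1 if inrange(Time, tm(1981m5), tm(2001m4))

* e(sample) is 1 for observations used in estimation, 0 otherwise
generate byte insample = e(sample)

* No -if- on predict: probabilities are computed for all 468 months
predict phat, pr

* insample == 0 marks the out-of-sample months (2001m5 onward)
tabulate insample
```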


        Also, does it matter whether I take probit or xtprobit?

        If you have panel data, you should use xtprobit. In fact, I would switch to xtlogit and compare fixed effects and random effects using a Hausman test.
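For the panel-data case only, a sketch of the comparison suggested above, run on made-up panel data (50 hypothetical units observed for 20 periods, with the unit effect u deliberately correlated with x1 so the two estimators should differ). -hausman- takes the estimator that is consistent under both hypotheses (fixed effects) first and the efficient one (random effects) second. All names and the data-generating process are assumptions for illustration.

```stata
* Simulated panel: 50 units x 20 periods (illustrative only)
clear
set seed 11
set obs 50
generate id = _n
generate u = rnormal()            // unit-specific effect, constant within id
expand 20
bysort id: generate t = _n
xtset id t

* x1 is correlated with u, so random effects should be inconsistent here
generate x1 = rnormal() + 0.5*u
* logit(runiform()) draws a standard logistic error term
generate y = (x1 + u + logit(runiform()) > 0)

* Conditional (fixed-effects) logit; units with no variation in y are dropped
xtlogit y x1, fe
estimates store fe

* Random-effects logit
xtlogit y x1, re
estimates store re

* Hausman test: consistent estimator first, efficient estimator second
hausman fe re
```

A small p-value is evidence against the random-effects assumption. With a single time series there is no panel dimension, and plain -probit- is the relevant command.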

        • #5
          Originally posted by Andrew Musau
          Yes, so according to your description in #1, your in-sample period is 1981m5-2001m4, so your estimation should be based on these years. Your out-of-sample period is 2001m5-2020m4, so you use the in-sample estimates to generate out-of-sample predictions. Running the predict command with no conditions will achieve this.

          If you have panel data, you should use xtprobit. In fact, I would switch to xtlogit and compare fixed effects and random effects using a Hausman test.

          Your answer is to the point and very helpful. I'm working with time-series data. Just to make sure I'm understanding you correctly:

          Step 1:
          Code:
          probit y x1 x2 x3 if inrange(Time, tm(2001m5), tm(2020m4))
          Step 2: Get the estimates from in-sample data.
          Code:
          predict Probability, pr
          Step 3: Use the in-sample estimates to make out-of-sample predictions.
          Code:
          How???
          I think I'm still not there yet.
          Last edited by Dipen Modi; 12 Aug 2020, 16:04.

          • #6

            Step 1:
            Code:
            probit y x1 x2 x3 if inrange(Time, tm(2001m5), tm(2020m4))
            Now you are running the regression using the out-of-sample observations. Change the time period.


            Step 2: Get the estimates from in-sample data.
            Code:
            predict Probability, pr

            So you have done the out-of-sample predictions in step 2, provided you have addressed my first point. If you browse your data, you will see a variable named "Probability", which contains your predicted probabilities. This variable covers both the in-sample and the out-of-sample periods. The predictions for observations falling within the period 2001m5 to 2020m4 are what are referred to as out-of-sample predictions.
            Last edited by Andrew Musau; 12 Aug 2020, 16:35.
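A sketch of the point above on simulated data (the data, seed and single regressor are made up; Prob_IS and Prob_OOS reuse the names from the next post): both sets of probabilities come from the same in-sample estimation, and the out-of-sample ones are simply the rows after 2001m4.

```stata
* Simulated monthly series, 1981m5-2020m4 (illustrative only)
clear
set seed 7
set obs 468
generate Time = tm(1981m5) + _n - 1
format Time %tm
generate x1 = rnormal()
generate y = (x1 + rnormal() > 1)

* One estimation, on the in-sample window only
probit y x1 if inrange(Time, tm(1981m5), tm(2001m4))
predict Probability, pr

* Split the same predictions by period
generate Prob_IS  = Probability if inrange(Time, tm(1981m5), tm(2001m4))
generate Prob_OOS = Probability if inrange(Time, tm(2001m5), tm(2020m4))
summarize Prob_IS Prob_OOS
```

Prob_IS holds fitted values for the estimation months and Prob_OOS holds forecasts for months the model never saw; no second -probit- is involved.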

            • #7
              Originally posted by Andrew Musau
              Step 1:
              Code:
              probit y x1 x2 x3 if inrange(Time, tm(2001m5), tm(2020m4))

              Now you are running the regression using the out-of-sample observations. Change the time period.

              So you have done the out-of-sample predictions in step 2, given that you have addressed my first point. If you browse your data, you will see a variable named "Probability", which are your predicted probabilities. This variable will cover the entire in-sample and out-of-sample periods. The predictions for observations falling within the period 2001m5 to 2020m4 are what are referred to as out-of-sample predictions.

              Thanks for pointing out the error.
              So it is a simple two-step process.

              Step 1:
              Code:
              probit y x1 x2 x3 if inrange(Time, tm(1981m5), tm(2020m4))
              Step 2:
              Code:
              predict Pr_OOS, pr
              In essence, the difference between in-sample and out-of-sample lies in the range we use to run the probit regression. Then I can simply compare the two with the help of time plots:

              Code:
              line Prob_IS Prob_OOS Time
              brier y Prob_OOS
              Is there any other testing that can be done?
              • #8
                Step 1:
                Code:
                probit y x1 x2 x3 if inrange(Time, tm(1981m5), tm(2020m4))
                Again, look at when your in-sample period ends from #1. In your code you are using all observations.


                In essence, the difference between in-sample and out-of-sample lies in the range we use to run the probit regression. Then I can simply compare the two with the help of time plots:

                Code:
                line Prob_IS Prob_OOS Time
                brier y Prob_OOS
                Is there any other testing that can be done?

                For the out-of-sample period, you can plot (compare) actual and predicted values, as you have both in the dataset. What other analyses can be done depends on the goal of your prediction, so consult the relevant literature for this.
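A hedged sketch of such an out-of-sample comparison on simulated data: a plot of actual and predicted values over 2001m5-2020m4, the Brier score (the mean squared difference between the 0/1 outcome and the forecast probability, computed both with Stata's -brier- command and by hand), and a hit rate at a 0.5 cutoff. The data, seed, cutoff and the variable names sqerr and yhat are assumptions for illustration; which measures matter depends on your goal.

```stata
* Simulated monthly series, 1981m5-2020m4 (illustrative only)
clear
set seed 99
set obs 468
generate Time = tm(1981m5) + _n - 1
format Time %tm
generate x1 = rnormal()
generate y = (x1 + rnormal() > 1)

* Estimate in-sample, forecast only the out-of-sample months
probit y x1 if inrange(Time, tm(1981m5), tm(2001m4))
predict Prob_OOS if inrange(Time, tm(2001m5), tm(2020m4)), pr

* Actual vs predicted over the out-of-sample window (nodraw: build without displaying)
twoway (line y Time) (line Prob_OOS Time) if !missing(Prob_OOS), nodraw

* Brier score via the built-in command and by hand
brier y Prob_OOS
generate double sqerr = (Prob_OOS - y)^2
summarize sqerr
display "Out-of-sample Brier score = " r(mean)

* Hit rate at a 0.5 cutoff; the -if- guard matters because missing >= 0.5 is true in Stata
generate byte yhat = Prob_OOS >= 0.5 if !missing(Prob_OOS)
tabulate y yhat
```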

                • #9
                  Originally posted by Andrew Musau
                  Again, look at when your in-sample period ends from #1. In your code you are using all observations.

                  For the out-of-sample period, you can plot (compare) actual and predicted as you have these values in the dataset. What other analyses can be done depends on the goal of your prediction, so consult relevant literature for this.

                  Yes, you are right.

                  For In-Sample:
                  Code:
                  probit y x1 x2 x3 if inrange(Time, tm(1981m5), tm(2001m4))
                  For Out-of-Sample:
                  Code:
                  predict Pr_OOS if inrange(Time, tm(1981m5), tm(2020m4)), pr
                  Some people on ResearchGate recommended something called a rolling procedure. It seemed to be an iterative process and I didn't understand it. Is this method an alternative to that?
                  This is so much simpler!
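On the rolling procedure: the approach in this thread estimates once and forecasts the whole 2001m5-2020m4 period from those estimates. One common rolling variant instead re-estimates the model every month on a moving window and forecasts only the next month. A minimal sketch on simulated data, assuming a 240-month window (the length of the in-sample period in #1); the window length, seed and the names p_roll and ptmp are illustrative. Fixing the lower bound at tm(1981m5) instead of moving it gives an expanding (recursive) window.

```stata
* Simulated monthly series, 1981m5-2020m4 (illustrative only)
clear
set seed 42
set obs 468
generate Time = tm(1981m5) + _n - 1
format Time %tm
generate x1 = rnormal()
generate y = (x1 + rnormal() > 1)

* Assumed window: 240 months, matching the original in-sample length
local window = 240
generate double p_roll = .

* For each out-of-sample month t: estimate on the previous `window' months only,
* then forecast month t alone; the window moves forward one month per iteration
forvalues t = `=tm(2001m5)'/`=tm(2020m4)' {
    quietly probit y x1 if inrange(Time, `t' - `window', `t' - 1)
    quietly predict double ptmp if Time == `t', pr
    quietly replace p_roll = ptmp if Time == `t'
    drop ptmp
}

summarize p_roll
```

Each forecast in p_roll uses only data available before that month, at the cost of 228 separate estimations.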
