  • Probit Analysis Error

    Dear Stata Community,


    I am trying to run -probit- with several dummy variables (DVs) for different years, but I keep receiving error r(2000). None of my variables is a string.
    The analysis covers 5 years, and I have 3 different dummy variables that apply to different years (DV1 for year 1 and 2, DV2 for year 2 and 3, DV3 for year 5).


    My command is:
    probit emp_stat urbdum1 age yrseduc Under5 if year == 2001


    I was able to run the code for the first two years, but substituting the DV and the year for the third period produces error r(2000), "no observations".

    My data looks like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte emp_stat int year float urbdum1 int age byte yrseduc float Under5
    0 1995 1 40 10 0
    1 1995 1 28 13 1
    1 1995 1 23 13 0
    0 1995 1 31 10 0
    1 1995 1 37 13 0
    1 1995 1 45 10 0
    1 1995 1 47 12 0
    1 1995 1 35 12 1
    1 1995 1 35 12 1
    0 1995 1 40 10 0
    1 1995 1 39  . 0
    1 1995 1 42 12 0
    1 1995 1 21 12 0
    1 1995 1 34 12 1
    1 1995 1 41 12 0
    1 1995 1 23 12 0
    1 1995 1 55 13 0
    1 1995 1 31 15 0
    1 1995 1 46 12 0
    1 1995 1 21 12 0
    1 1995 1 34 15 1
    0 1995 1 43 12 0
    1 1995 1 29 13 1
    1 1995 1 43 12 0
    1 1995 1 36 12 0
    1 1995 1 39 11 0
    1 1995 1 45 12 0
    1 1995 1 37 12 0
    1 1995 1 41 12 0
    1 1995 1 38 12 1
    1 1995 1 39 12 0
    1 1995 1 45 10 0
    1 1995 1 36 11 1
    1 1995 1 25 10 0
    1 1995 1 27 12 0
    1 1995 1 34  6 1
    1 1995 1 38 10 1
    1 1995 1 28  9 0
    0 1995 1 31  9 1
    1 1995 1 38  6 0
    0 1995 1 22  9 1
    1 1995 1 41  7 1
    1 1995 1 26  8 1
    1 1995 1 19 12 0
    1 1995 1 20 12 1
    1 1995 1 33 10 1
    1 1995 1 43  9 0
    1 1995 1 19 10 0
    1 1995 1 23 12 0
    1 1995 1 46  7 0
    0 1995 1 23 10 0
    1 1995 1 33 10 0
    1 1995 1 42 13 0
    1 1995 1 27 12 1
    1 1995 1 25 12 0
    1 1995 1 23 12 0
    0 1995 1 20 11 1
    1 1995 1 23 12 1
    1 1995 1 39 12 0
    1 1995 1 50  7 0
    1 1995 1 33  8 0
    1 1995 1 26 10 1
    1 1995 1 19 12 1
    1 1995 1 28  9 1
    1 1995 1 29 12 0
    1 1995 1 34  9 1
    0 1995 1 20  7 1
    1 1995 1 30  8 1
    1 1995 1 43  9 0
    1 1995 0 32 15 1
    0 1995 0 29  8 0
    0 1995 0 21 12 0
    1 1995 0 22 12 0
    1 1995 0 49  6 0
    1 1995 0 50 12 0
    1 1995 0 30  8 0
    1 1995 0 41  6 1
    1 1995 0 22  6 1
    1 1995 0 24  6 1
    1 1995 0 30  6 1
    1 1995 0 52  6 0
    1 1995 1 24 10 0
    1 1995 1 30 12 0
    1 1995 1 24 12 1
    0 1995 1 28 12 1
    0 1995 1 24 12 0
    1 1995 1 43 10 0
    0 1995 1 34 15 1
    1 1995 1 32 12 0
    1 1995 1 39 12 0
    1 1995 1 54 10 0
    1 1995 1 25 12 0
    1 1995 1 28  8 0
    1 1995 1 27 10 0
    1 1995 1 42 12 0
    1 1995 1 21 12 0
    1 1995 1 24 12 0
    1 1995 1 23 12 0
    1 1995 1 26 12 1
    1 1995 1 24 13 0
    end
    label values age age

    What are your recommendations for this?

  • #2
    Your example data is not very helpful because all of the observations there are from 1995, so, of course, it shows no observations when you run a command with -if year == 2001-.

    But I'll assume that's not true in your real data set (though you should verify that you really do have data for 2001 by running -count if year == 2001-). There are two ways you can end up with no observations even though you do have data for year 2001.

    1. In order for an observation to be included in any regression command, it must have non-missing values on all of its variables. If even one of the variables has a missing value, the observation is excluded. So it doesn't take a large amount of missing data overall to end up with a situation where every observation in year 2001 has a missing value for some variable. You can check this by running:
    Code:
    count if !missing(emp_stat, urbdum1, age, yrseduc, Under5) & year == 2001
    If the result is 0, then you have your answer: all 2001 observations have a missing value somewhere.

    2. If the outcome variable, emp_stat, is always 0 or always 1 for year 2001, then you have what is known as "perfect prediction", and those observations will all be dropped, leaving nothing behind. This is done because in the situation of perfect prediction, the probit regression coefficient will be infinite in magnitude, and -probit- would not be able to converge. Stata tests for this before starting the estimation process, and tells you about the problem rather than trying a pointless estimation that is doomed to fail. The simplest way to check for this is:
    Code:
    tab emp_stat if year == 2001
    If the result shows only 0 or only 1, then you have your answer.
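
    A minimal sketch that extends the first check, assuming the variable names from the command in #1 (the loop is illustrative and was not part of the original post). It counts non-missing values one variable at a time, so a variable that is entirely missing in 2001 stands out even when the joint count alone does not say which variable is responsible.

    ```stata
    * Sketch: per-variable non-missing counts for a single year
    foreach v of varlist emp_stat urbdum1 age yrseduc Under5 {
        quietly count if !missing(`v') & year == 2001
        display as text "`v': " as result r(N) as text " non-missing in 2001"
    }

    * The same information for every year at once
    tabstat emp_stat urbdum1 age yrseduc Under5, by(year) statistics(n)
    ```

    If any variable shows 0 for a year, every observation from that year is excluded from a regression that uses that variable.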


    • #3
      Dear Prof. Clyde,

      Thank you for your response.


      1. The data excerpt shows only 1995 because I have thousands of observations and 2001 is not the first year in my data. I was only able to cull the first 100 observations. As I said, I was able to run the probit command for the first two years of my analysis, but running it for subsequent years and DVs produces error r(2000).

      2. Running
      count if !missing(emp_stat, urbdum1, age, yrseduc, Under5) & year == 2001
      returns quite a number of observations that are not missing for the year 2001. Again, the probit went through for the year 1995, which also has some missing values.

      3. emp_stat is a binary choice variable that has both 0 and 1 as outcome for the year 2001. Running the code
      tab emp_stat if year == 2001
      shows that emp_stat has both values for year 2001


      • #4
        I was able to combine the three different dummy variables into one composite dummy variable (Urbdummy). The commands for the first two years of my analysis ran. However, running them for the subsequent years produces error r(2000), "no observations". Any way forward, please?

        My Commands:
        probit emp_stat Urbdummy age yrseduc Under5 if year==1995
        probit emp_stat Urbdummy age yrseduc Under5 if year==2001
        probit emp_stat Urbdummy age yrseduc Under5 if year==2008
        probit emp_stat Urbdummy age yrseduc Under5 if year==2014
        probit emp_stat Urbdummy age yrseduc Under5 if year==2017
        Last edited by Olabisi Matthew; 23 Aug 2019, 02:57.


        • #5
          What do you get for
          Code:
          tab emp_stat if year == 2001 & !missing(emp_stat, Urbdummy, age, yrseduc, Under5)
          If you have only one value of emp_stat showing there, then that would explain the no observations message. That's actually what I should have asked you to do in #2.


          • #6
            Running the command
            tab emp_stat if year == 2001 & !missing(emp_stat, Urbdummy, age, yrseduc, Under5) gives a total of 57,877 observations: 15,762 with the value 0 and 42,115 with the value 1.


            • #7
              Offhand I cannot think of any other reason for this. Please post an example of your data (using the -dataex- command), and please be sure that the example includes several observations that you believe should be included in the regression but Stata is not recognizing. And also show the exact code you ran and the exact output Stata is giving you: copy from your Results window or log file and paste directly here between code delimiters. (If you are not familiar with code delimiters, see FAQ #12 for details.) Do not edit the code or output in any way.

              If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

              When asking for help with code, always show example data. When showing example data, always use -dataex-.

              Added: Actually I can think of one other way you could encounter this problem. If there is some predictor in your model that, when it takes on a certain value, perfectly predicts the outcome, then all observations having that value of that variable get omitted from the estimation sample. So it might be that all of your 2001 observations get omitted by such a situation, or by several of them in combination.
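
              One way to look for such a predictor, sketched here under the assumption that Urbdummy and Under5 are the 0/1 predictors from the commands in #4 (the marker variable name insample is hypothetical): cross-tabulate each indicator against the outcome within the complete-case 2001 rows.

              ```stata
              * Sketch: mark 2001 rows with no missing model variables
              generate byte insample = year == 2001 & !missing(emp_stat, Urbdummy, age, yrseduc, Under5)

              * Cross-tabulate each 0/1 predictor against the outcome
              tabulate Urbdummy emp_stat if insample
              tabulate Under5 emp_stat if insample
              ```

              A row with a zero cell means that value of the predictor is always paired with the same outcome, which is the situation described above.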
              Last edited by Clyde Schechter; 26 Aug 2019, 09:29.


              • #8
                While preparing a reply to this message, I realized that one of my variables has no observations for the year 2001 and subsequent years. I guess that is where the error came from.

                Thank you Prof. Clyde.
