
  • r(2000) no observations when running a regression

    First of all, I am new to Stata and trying to learn the basics. I am trying to run a simple regression on some data that I have. I have panel data which is set up with the company number as the panel variable and the YEAR variable as the time variable:

    . xtset CONO YEAR, yearly
    panel variable: CONO (strongly balanced)
    time variable: YEAR, 1999 to 2012
    delta: 1 year


    I want to regress a PRICE variable against 7 other variables. However, when I run the regression I get the following error:

    . regress PRICE BV NI PPE1 PPE2 PPE3 PPE4 PPE5
    no observations
    r(2000);


    The PRICE variable has 1553 observations; the independent variables have differing numbers of observations: BV (1701), NI (1698), PPE1 (103), PPE2 (190), PPE3 (394), PPE4 (282), PPE5 (158).

    I'm trying to understand what the problem is here and how to overcome it. Any help is greatly appreciated. Thanks in advance.

  • #2
    Hi,
    First of all, a reminder that the protocol in the Forum is to use your real name and not (as in this case) a random sequence of numbers. Most people do not appreciate that. You can always ask to change your user name using the "Contact us" option.
    Second, while it is true that you have observations on all your variables, nothing says that the information is complete for each observation. For example, in your regression you might have 103 observations with information on variable PPE1 and 190 completely different observations with information on PPE2. Since no observations have information on all the variables at once, Stata finds no observations available to run the regression.
    HTH
    Fernando



    • #3
      The error code 2000 may mean that one or more of your variables is string when numeric variables are needed. Here the problem is more likely to be that your data contain missing values.

      What's crucial is whether the variables specified are all non-missing in at least some observations. The evidence implies No. Try

      Code:
      count if !missing(PRICE, BV, NI, PPE1, PPE2, PPE3, PPE4, PPE5)
      I note that your regression ignores the panel structure you carefully explain.
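
      As a rough sketch of how to inspect the missing-value patterns behind that count (assuming the variable names from the original post; this is an illustration, not something posted in the thread), misstable may help:

      Code:
      * number of missing values per variable
      misstable summarize PRICE BV NI PPE1 PPE2 PPE3 PPE4 PPE5
      * which combinations of variables are missing together
      misstable patterns PRICE BV NI PPE1 PPE2 PPE3 PPE4 PPE5

      If no pattern has every variable observed, regress has no complete observations to use, which is what r(2000) reports.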

      Please respect the request to register with your full real name. You can use the "Contact us" button on the home page to give a full real name.



      • #4
        Apologies for using my student number as my username; I have sent a request to Admin to change it to my name. Thank you for the answers to my question.

        Fernando: the nature of the data is such that if there are values in PPE1 then there won't be values in PPE2, PPE3, PPE4 & PPE5. There will only be a value in one of these (PPE1, PPE2, PPE3, PPE4 & PPE5) for each company and the other 4 will be missing values. I want a single set of results for presentation purposes. Apart from running 5 different regressions - is there any way to run the regression taking this into account?

        Nick: none of the variables are string - there are numerous missing values throughout the data.

        2 questions:
        1. For each company, there will only be one value for PPE1, PPE2, PPE3, PPE4 & PPE5 - the other four will be missing values. Is there any way to factor that into the regression?
        2. How do I take the panel structure into account with my regression?

        . count if !missing(PRICE, BV, NI, PPE1, PPE2, PPE3, PPE4, PPE5)
        0


        Thanks again for the help.



        • #5
          You can't do regression with data in that form, as Stata is already telling you. What's more likely to work is that you combine variables

          Code:
          gen PPE = max(PPE1, PPE2, PPE3, PPE4, PPE5)
          and then create indicator variables based on

          Code:
           
          gen whichPPE = . 
          forval j = 1/5 { 
             replace whichPPE = `j' if PPE`j' < . 
          }
          i.e. whichPPE will be a factor variable you might include in your model.
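
          Before relying on the combined variable, it may be worth checking the assumption that at most one of PPE1-PPE5 is non-missing in any observation. A minimal sketch (nPPE is a hypothetical helper name, not from the thread):

          Code:
          * count the non-missing PPE values in each observation
          egen nPPE = rownonmiss(PPE1 PPE2 PPE3 PPE4 PPE5)
          tabulate nPPE
          * stops with an error if any observation has two or more values
          assert nPPE <= 1

          If the assertion fails, the loop above would assign whichPPE from the highest-numbered non-missing variable and max() would keep only the largest value, so such observations would need a closer look.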



          • #6
            For a simple linear regression, the only solution is either to impute something for the missing values or to use only one of the five variables (PPE1-PPE5) at a time. (I doubt if the latter is what you want, though). It sounds like there is some dependence between PPE1-PPE5 so a better solution may be to combine them somehow into a single composite variable. If you can explain more about the nature of these five variables, perhaps there will be some suggestions on how to combine them and/or how to impute the missings.

            There may also be other more sophisticated regression methods for this situation which I am not aware of.



            • #7
              Joe: The regression is intended to assess the association between the valuation of an item of Plant & Equipment in a company and the share price. Companies can choose between 5 (PPE1 - PPE5) different policies to value their Plant & Equipment so each company can only have a value in one of the 5 variables.

              Would filling in a 0 instead of a missing value for the other four variables be a possible way to overcome this? Or is that flawed?



              • #8
                Just consider whether that is consistent with the overall idea that you are fitting a hyperplane to your data.



                • #9
                  Sean,

                  What are the possible values of PPE1-PPE5?

                  If they are just missing or 1, then, yes, you can make the missings into 0 and you just have a series of dummy variables indicating the 5 different policies. (Alternatively, and more efficiently, you can have a single variable with values 1-5 (see Nick's post #5), which you can treat as categorical in your regression using i.whichPPE notation.)

                  Otherwise, you probably don't want to change the missings to 0 because that will probably result in a series of very skewed distributions. In that case, you should do what Nick suggested in his post #5 and look at the interaction between PPE and i.whichPPE.
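
                  A hedged sketch of what that interaction might look like, assuming PPE and whichPPE were created as in post #5 (the fixed-effects estimator is only one illustrative way to use the panel structure, not a recommendation made in the thread):

                  Code:
                  * pooled regression: separate intercept and PPE slope for each policy
                  regress PRICE BV NI i.whichPPE##c.PPE
                  * one way to use the panel structure declared with xtset
                  xtreg PRICE BV NI i.whichPPE##c.PPE, fe

                  Note that with fe, any regressor that never changes within a company (possibly whichPPE itself) is dropped from the model.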

                  Regards,
                  Joe



                  • #10
                    Hi,
                    I'm having a similar problem. I tried mvreg and sureg but I get error 2000 as well.
                    There are missing values in my dataset, but I can regress the variables one at a time, each with a few data points, because Stata handles the missing values. I have 250 variables and I have to regress each of them against one single independent variable. Since I can't do it manually, is there a way to regress all the dependent variables with one command?
                    Thanks for the help



                    • #11
                      Stefano:
                      are you looking for something like the following toy example?
                      Code:
                      foreach var of varlist y_1 y_2 {
                          regress `var' x
                      }
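
                      With 250 dependent variables, some may have no usable observations and would stop such a loop with r(2000). A possible variant, assuming the dependent variables share a hypothetical y_ prefix and the regressor is called x as in the toy example:

                      Code:
                      foreach var of varlist y_* {
                          * capture traps the error; noisily still shows the output
                          capture noisily regress `var' x
                          if _rc == 2000 {
                              display as text "`var': no observations, skipped"
                          }
                      }
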
                      Kind regards,
                      Carlo
                      (Stata 18.0 SE)



                      • #12
                        Perfect! Thanks



                        • #13
                          I want to run SVAR/VR, but when I run it I get "no observations", even though I can run a regression and tsset mth, monthly works fine. How can I solve the problem? Please help.



                          • #14
                            Hasan, please read the FAQ, especially section 12 ("What should I say about the commands and data I use?"), and then start a new topic with a more detailed description of your problem. The other list members need more information in order to help.



                            • #15
                              Hello. I have a problem running a first-difference regression: I get the "r(2000) no observations" error. I'm quite sure I have all my observations and no missing data, because my "describe" command showed so. However, after the first-difference regression failed, I ran "count if !missing" for all the regression variables and got 234, which is basically all my observations. Does this make any sense? I'd appreciate your support. Thanks.
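
                              One possible explanation, offered only as a guess since no commands were posted: a difference such as D.y is missing whenever the previous period is absent, so the levels can all be non-missing while the differences are not. A sketch with hypothetical variable names y and x:

                              Code:
                              * confirm how the time (and panel) variable is declared
                              tsset
                              * report on gaps in the time variable
                              tsreport
                              * count observations where the differenced variables exist
                              count if !missing(D.y, D.x)

                              If that count is 0, the time variable probably has gaps or is not in the units that tsset assumes.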
