  • But with a gap

    Dear Researchers,

    I have an unbalanced panel data set covering 1990 to 2019. After I declared the data as a panel, Stata showed a message saying that the years run from 1990 to 2019 "but with a gap". Do you think this will affect the analysis? Should I fix it or leave it as it is? And why did this message appear?

    Secondly, since some of my variables have missing values, Stata will drop the observations with missing values. Is there any way to find out how many firms are included in my analysis after running the regression?

    Many thanks in advance.

  • #2
    It probably will make no difference. Historically, there were analyses that could only be carried out on balanced panel data sets with no gaps, but that seldom matters with modern statistical software. You would have to ask the Stata developers why they chose to have Stata issue that message. I imagine that given the historic preference for a nice, balanced panel with consecutive (or evenly spaced) years, it just seemed prudent to warn the user if the data at hand didn't live up to this ideal. If you are surprised that your data set has gaps in it, then Stata has appropriately warned you that your data are not what you think they are, and before you proceed to analyze them in any way, you should go back and find out why the data differ from what you expected. If you fully understood ahead of time that this data comes with gaps, then this warning can be simply ignored.
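
    One way to see where the gaps are before deciding whether they matter is sketched below. It assumes the panel identifier is called firm_id and the time variable year; adjust the names to your data.

    ```stata
    * Declare the panel; the summary Stata prints notes when the time variable has gaps
    xtset firm_id year

    * Show the participation patterns: which years are observed, and for how many firms
    xtdescribe
    ```

    If the patterns match what you know about how the data were collected, the gaps are expected and the message needs no further action.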

    If you are using -xtreg-, then the only observations that will be omitted are those with missing values on one or more of the variables in your model. So if those variables are y, u, v, w, and x, you can approximate the number of firms in your estimation sample by running -distinct firm_id if !missing(y, u, v, w, x)-. (-distinct- is written by Nick Cox and is available from SSC.) If, however, you are including lags or leads or differences in the model, then it gets more complicated, because if, for example, x is missing in the third observation of a given panel, then not only will that third observation be omitted due to missingness of x, but the fourth one also will due to the missingness of L1.x. Similarly, if you are running -xtlogit, fe- then there are other situations where observations can be omitted, as when the outcome doesn't vary. It starts getting complicated trying to calculate all of that in advance, so it's usually just simpler to run the model and see what sample size comes out!
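
    Once the model has been fitted, the estimation sample can also be counted directly rather than reconstructed. This is a minimal sketch, assuming the data are xtset on firm_id and year and using the placeholder variable names from above. The variable first_obs is a hypothetical name introduced only for this illustration.

    ```stata
    xtset firm_id year
    xtreg y u v w x, fe

    * e(sample) marks the observations the last estimation command actually used
    count if e(sample)

    * Tag one observation per firm within the estimation sample, then count the tags
    egen first_obs = tag(firm_id) if e(sample)
    count if first_obs == 1
    drop first_obs
    ```

    Because e(sample) is set by the estimation command itself, this should also account for observations lost to lags, leads, or differences. With -distinct- installed, -distinct firm_id if e(sample)- would be a shorter equivalent.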


    • #3
      Dear Prof. Clyde

      As always, I can't thank you enough for the reply and for the very valuable information. Greatly appreciated.

      Regarding point one, I have fully understood it.

      As for point two, I am using the following model
      Code:
       reg Y X1 X2 X3 i.year, r
      So, do you think the -distinct- command will work with -reg-?



      Many thanks in advance.


      • #4
        Code:
        distinct firm_id if !missing(Y, X1, X2, X3, year)
        will tell you the number of firms that will be included in the estimation sample for that command.


        • #5
          Dear Prof. Clyde,

          Thank you very much for your reply.

          I apologize, it was my mistake. I did not mean the model above; I meant the following model, which includes lags.

          Code:
           
           reg l.Y l.X1 l.X2 l.X3 i.year, r
          I tried the above model with lags and then ran the following command:
          Code:
          distinct firm_id if !missing(Y, l.X1, l.X2, l.X3, year)
          Stata then showed me the following message:
          -command distinct is unrecognized-

          Do you think this is due to the complexity you mentioned in #2 for models with lags?

          I am so sorry for asking again.

          Thank you very much in advance.


          • #6
            No. -distinct-, as I noted in #2, is not a part of official Stata. You have to install it from SSC to use it. -ssc install distinct-

            Your model, however, isn't really a model with lags. Because all of the variables other than year are lagged, it's really just an ordinary model, with the variable year mislabeled by one. On the assumption that year is never missing, you could just as simply get your answer with -distinct firm_id if !missing(Y, X1, X2, X3)-.
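
            In a panel with gaps, a lagged value is missing whenever the previous year is absent for that firm, so the lagged and unlagged conditions need not select exactly the same firms. A quick way to compare them is sketched below, assuming the data are xtset on firm_id and year and that -distinct- has been installed from SSC:

            ```stata
            xtset firm_id year

            * Firms with at least one complete observation, ignoring the lag structure
            distinct firm_id if !missing(Y, X1, X2, X3)

            * Firms with at least one observation whose lagged values are all non-missing
            distinct firm_id if !missing(L.Y, L.X1, L.X2, L.X3)
            ```

            If the two counts agree, the unlagged shortcut gives the same answer.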


            • #7
              Dear Prof. Clyde,

              Thank you very much for the reply. I truly appreciate your response.

              Also, thank you very much for the code. It works for me.

              One last question, and I am so sorry again for asking: is there a way to also see the names of the firms included in the analysis?

              Thank you very much in advance.


              • #8
                Code:
                tab firm_id if !missing(Y, X1, X2, X3)
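
                If the regression has just been run, the listing can also be restricted to the observations Stata actually used. This sketch assumes the data set contains a variable holding the firm names; firm_name is a hypothetical name, not one taken from this thread.

                ```stata
                * Run immediately after the regression so that e(sample) is still available
                tabulate firm_name if e(sample)
                ```

                The frequencies in the table are the number of observations each firm contributes to the estimation sample.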


                • #9
                  Dear Prof. Clyde,

                  Thanks from the bottom of my heart for all your answers and for your consideration and cooperation. Greatly appreciated.
