
  • Unbalanced panel data

    Hello

    I downloaded WDI data in panel format from the World Bank website, did some cleaning in Excel, and imported it into Stata, but the xtset command reports:

    xtset CountryNum Year, yearly
    panel variable: CountryNum (unbalanced)
    time variable: Year, 1967 to 2016, but with gaps
    delta: 1 year


    Is there a way to find out which year is missing so that I can drop that year and all earlier ones? The earlier years mostly have missing data anyway. Or is there a better solution? Hoping for a response. Thank you!
    Last edited by Krizia Garcia; 06 Feb 2017, 06:24.

  • #2
    Hello Krizia,

    Welcome to the Stata Forum.

    Panel data with gaps are not necessarily a cause for much concern. Moreover, if I understand correctly, you wish to "drop years" with gaps. But the gaps are exactly the years that are lacking, so there is nothing there to drop. If you need to cope with missing values in the variables, there are many resources, multiple imputation being one to consider. Finally, you may - tsfill, full - your data and - list - the observations it adds, i.e., to see the gaps.
    Best regards,

    Marcos
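
    The tsfill suggestion above can be sketched roughly as follows. This is only an illustration, assuming the data are loaded with the CountryNum and Year variables from post #1; the marker variable added is a name made up here.

    ```stata
    * Sketch only: "added" is a hypothetical helper variable.
    xtset CountryNum Year, yearly
    preserve                              // leave the working data untouched
    generate byte added = 0               // 0 = row was in the imported data
    tsfill, full                          // add a row for every absent country-year
    replace added = 1 if missing(added)   // rows created by tsfill
    list CountryNum Year if added == 1, sepby(CountryNum)
    tabulate Year if added == 1           // number of countries lacking each year
    restore
    ```

    Because tsfill leaves every variable other than the panel and time identifiers missing in the rows it creates, a flag set to 0 beforehand is enough to tell the added rows from the original ones.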



    • #3
      Dear Krizia,

      You do not necessarily need to download data from the WDI website and then import it into Stata, which I know takes a lot of work. To make things easier, the World Bank has developed a program. Type:

      Code:
      ssc install wbopendata
      With this program you can download all the data you need directly into Stata, and you can select which years you want.

      Hope this helps
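
      A possible way to use the command once it is installed is sketched below. The indicator code NY.GDP.PCAP.CD (GDP per capita, current US$) is chosen here only as an example, and the variable names countrycode and year are what the command is understood to create in long format; running describe after the download confirms the actual names.

      ```stata
      * Sketch only: the indicator and year range are illustrative choices.
      ssc install wbopendata, replace
      wbopendata, indicator(NY.GDP.PCAP.CD) year(1996:2015) long clear
      describe                                   // check the variable names created
      encode countrycode, generate(CountryNum)   // numeric panel identifier
      xtset CountryNum year, yearly
      ```

      The download may also include regional and income-group aggregates alongside countries, so it is worth inspecting the country list before estimating anything.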



      • #4
        Krizia:
        as an aside to Marcos' helpful remarks, please note that by deleting years with missing data you're implicitly applying a sort of make-up to your dataset. Hence, you may end up with a sample that has only a tenuous link with the original one.
        Kind regards,
        Carlo
        (Stata 18.0 SE)



        • #5
          Originally posted by Marcos Almeida
          Hello Krizia,

          Welcome to the Stata Forum.

          Panel data with gaps are not necessarily a cause for much concern. Moreover, if I understand correctly, you wish to "drop years" with gaps. But the gaps are exactly the years that are lacking, so there is nothing there to drop. If you need to cope with missing values in the variables, there are many resources, multiple imputation being one to consider. Finally, you may - tsfill, full - your data and - list - the observations it adds, i.e., to see the gaps.
          Thank you

          I actually intend to drop the missing year and all years preceding it. For example, if I find out that 1989 is missing, I drop years < 1990. Thank you so much for your comments and suggestions! I am very new to the intricacies of Stata, so I apologize for my elementary question. For now I am finding out how to perform multiple imputation.
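
          The cutoff described here could be located along the following lines. This is a sketch under the assumption that "missing" means a country-year row is absent from the data (not that a variable is empty in an existing row); n_countries, total, and cutoff are names made up for the illustration.

          ```stata
          * Sketch only: helper names are hypothetical.
          bysort Year: generate n_countries = _N   // countries observed in each year
          quietly tabulate CountryNum
          local total = r(r)                       // number of distinct countries
          summarize Year if n_countries < `total', meanonly
          if r(N) > 0 {
              local cutoff = r(max)
              display "Latest year lacking at least one country: `cutoff'"
              * drop if Year <= `cutoff'   // uncomment after checking the cutoff
          }
          else {
              display "Every observed year contains all countries"
          }
          ```

          The drop is left commented out on purpose: if even one country lacks a recent year, the cutoff lands near the end of the sample and dropping would remove almost everything. A year that is absent for every country would also not be detected this way.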



          • #6
            Originally posted by Dias Rafaj
            Dear Krizia,

            You do not necessarily need to download data from the WDI website and then import it into Stata, which I know takes a lot of work. To make things easier, the World Bank has developed a program. Type:

            Code:
            ssc install wbopendata
            With this program you can download all the data you need directly into Stata, and you can select which years you want.

            Hope this helps
            This does help a lot! Thank you!



            • #7
              Originally posted by Carlo Lazzaro
              Krizia:
              as an aside to Marcos' helpful remarks, please note that by deleting years with missing data you're implicitly applying a sort of make-up to your dataset. Hence, you may end up with a sample that has only a tenuous link with the original one.
              My original plan for this panel is to include only 1996-2015, covering the most recent 20 years for reasons of data availability, but yes, I do agree with you about this. I am a little concerned about multiple imputation, however, because I have read somewhere on the internet that "nobody relies heavily on a study that uses imputed data", which I hope is not true.
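
              Restricting the panel to the 1996-2015 window and checking what remains might look like this, assuming the CountryNum and Year variables from post #1:

              ```stata
              * Sketch only: keeps the 20-year window mentioned above.
              keep if inrange(Year, 1996, 2015)
              xtset CountryNum Year, yearly   // reports whether the panel is now balanced
              xtdescribe                      // participation patterns across the window
              ```

              If xtset still reports gaps, xtdescribe shows which patterns of observed years account for them.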



              • #8
                Krizia:
                the acceptability of multiple imputation is conditional on its assumptions (which should cover both the missingness mechanism and its patterns).
                To avoid any criticism, you may want to present the results of your panel data regression with and without imputed missing values and comment on any difference in the Discussion section of your paper/article.
                Kind regards,
                Carlo
                (Stata 18.0 SE)



                • #9
                  Originally posted by Carlo Lazzaro
                  Krizia:
                  the acceptability of multiple imputation is conditional on its assumptions (which should cover both the missingness mechanism and its patterns).
                  To avoid any criticism, you may want to present the results of your panel data regression with and without imputed missing values and comment on any difference in the Discussion section of your paper/article.
                  Thank you so much for your suggestion, Carlo! I will definitely take note of this. Again, thanks!
                  Best,
                  Krizia



                  • #10
                    Unless you research imputation very carefully to understand the underlying assumptions, I would not apply it to a data set such as a panel on countries. For one, if your imputation procedure treats the panel as if it's one cross section, it is highly likely that the imputed variables would no longer be exogenous in equations for which the actual variables are anything except strictly exogenous. That is because future values will be used to impute current values, and this means that you could not estimate a model where the variables are only weakly but not strictly exogenous.

                    Plus, countries are very different from one another. This isn't like imputing data on individuals where you can view the data as a sample from a large population.

                    In my view, you are much better off using the data you have and including country and year fixed effects. Then the missingness can vary systematically by country and time, and this is about as general as things get. Do not fall for the siren song of imputation.

                    If you want further discussion, here's a link to lecture notes that discuss the shortcomings of imputation:

                    http://www.irp.wisc.edu/newsevents/w.../schedule1.htm

                    It's the very last set, Lecture 18.

                    JW
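
                    A minimal sketch of the fixed effects approach recommended above, using the data as they are. The names y, x1, and x2 are placeholders rather than variables from this thread, and clustering the standard errors by country is an added assumption, not something stated in the post.

                    ```stata
                    * Sketch only: y, x1, x2 are placeholder variable names.
                    xtset CountryNum Year, yearly
                    * fe absorbs the country effects; i.Year adds the year effects.
                    xtreg y x1 x2 i.Year, fe vce(cluster CountryNum)
                    ```

                    xtreg with the fe option runs on an unbalanced panel as it stands, using the country-years in which all model variables are observed.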

