  • Unbalanced Panel Dataset: How to exclude all countries with one or more missing variables

    Hi everyone

    I am working with the Penn World Table dataset pwt10.0. Ultimately I want to run a fixed-effects regression for the years 1971 to 2019, where the dependent variable is GDP per capita and, later on, GDP per worker. For my regression I need only a few variables from the dataset, so I dropped the rest.

    Now I need a command that checks whether there are any missing values in, for example, the "saving" variable. If a value is missing in any year, I have to drop that country. Do you have any advice on what this command would look like?

    Furthermore, it would be nice to have a command that tells me whether every country has observations for all years, that is, from 1971 to 2019.

    Does anyone have an idea how to proceed here?

    If you need more information, just ask.

  • #2

    Code:
    egen wanted = count(foobar), by(country)


    This returns 49 for each country in which foobar is never missing across the 49 years 1971 to 2019, and a smaller value otherwise. It assumes the data are already restricted to those years.
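
A minimal sketch of how this count could be combined with a check that every country covers all years, run on made-up data. The toy panel, the country codes 1 to 3, and the variable name allyears are illustrative assumptions, not part of pwt10.0.

```stata
* Toy panel: 3 hypothetical countries, years 1971-2019
clear
set obs 3
gen country = _n
expand 49
bysort country: gen year = 1970 + _n
gen foobar = year - 1970
replace foobar = . if country == 2 & year == 1980   // one missing value
drop if country == 3 & year == 2000                  // one missing year

* Stop with an error if any country-year pair is duplicated
isid country year

* Number of non-missing values of foobar per country (49 = complete)
egen wanted = count(foobar), by(country)

* 1 if the country has exactly 49 rows running from 1971 to 2019
bysort country (year): gen byte allyears = (_N == 49) & (year[1] == 1971) & (year[_N] == 2019)

tabulate country allyears
```

In this toy data, country 1 passes both checks, country 2 has every year but one missing value, and country 3 lacks a year altogether.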



    • #3
      Stata practices something called "case-wise deletion," meaning that if you run
      reg Y X Z W
      any row (case) in your data in which any of Y, X, Z, or W is missing is dropped from the regression.

      Probably the easiest way to do what you want is to run the regression you have in mind, and then to use the e(sample) function to see which observations were included in your regression and which were dropped.
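
A rough sketch of the e(sample) approach on made-up data. The variables y and x, the two-country panel, and the names used and nused are assumptions for illustration only.

```stata
* Toy panel: 2 hypothetical countries, years 1971-2019
clear
set obs 2
gen country = _n
expand 49
bysort country: gen year = 1970 + _n
gen x = mod(year, 7)
gen y = 2 + 3*x + mod(year, 5)
replace x = . if country == 2 & year == 1980   // one missing regressor value

regress y x

* e(sample) is 1 for observations used in the last estimation, 0 otherwise
gen byte used = e(sample)
tabulate country used

* Observations used per country; fewer than 49 means something was dropped
egen nused = total(used), by(country)
```

Note that case-wise deletion removes single observations, not whole countries: country 2 here still contributes its other 48 years, so a per-country total like nused is needed if the whole country is to be excluded.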



      • #4
        Thanks Nick. Your code helps a lot. Now I can create a separate wanted variable for each variable and then drop all countries that have fewer than 49 observations, right?



        • #5
          If several variables are in question, you can count missing values across variables with the egen function rowmiss() and then sum within countries as in #2, but using total() instead of count(). The total should then be zero for every country you want to keep.

          Code:
          egen rowmiss = rowmiss(frog toad newt)
          egen totalmiss = total(rowmiss), by(country)

          You don't need to drop countries. You can just fit conditionally on totalmiss being zero.
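
A minimal sketch of fitting conditionally on totalmiss being zero, on made-up data. The outcome y, the toy values of frog, toad and newt, and the choice of xtreg with the fe option (to match the fixed-effects regression asked about in #1) are assumptions for illustration.

```stata
* Toy panel: 3 hypothetical countries, years 1971-2019
clear
set obs 3
gen country = _n
expand 49
bysort country: gen year = 1970 + _n
gen frog = mod(year, 7)
gen toad = mod(year, 5)
gen newt = mod(year, 3)
gen y = frog + 2*toad - newt + mod(year*country, 11)
replace toad = . if country == 2 & year == 1980   // country 2 is incomplete

* Missing values per row, then summed within country
egen rowmiss = rowmiss(frog toad newt)
egen totalmiss = total(rowmiss), by(country)

* Fixed-effects fit restricted to countries with no missing values at all
xtset country year
xtreg y frog toad newt if totalmiss == 0, fe
```

Restricting with if rather than drop keeps the full dataset in memory, so the same data can be reused with a different set of variables later.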
          Last edited by Nick Cox; 25 Mar 2023, 10:03.

