
  • Selecting the subset of data that contains all observations

    Hi All,

    I have data that resemble the following:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(Country GDP var3)
    1   2 1990
    2  32 1990
    3  12 1990
    4  21 1990
    1 123 1991
    2  12 1991
    3  32 1991
    4 321 1991
    5 321 1991
    1 321 1992
    2  32 1992
    3   1 1992
    4  32 1992
    5  12 1992
    end
    The data above contain the GDP of countries for different years (var3 holds the year). I wish to keep only the subset of this dataset in which all countries are present. In this case, the resulting dataset should look like:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(Country GDP var3)
    1 123 1991
    2  12 1991
    3  32 1991
    4 321 1991
    5 321 1991
    1 321 1992
    2  32 1992
    3   1 1992
    4  32 1992
    5  12 1992
    end
    As can be seen, year 1990 has been dropped because it does not contain country 5. I realize that it is tricky to define "all countries". For this purpose, I would take "all" to mean the maximum number of countries that appear in the dataset, and then keep only those years in which all of these countries are present.

    Any help is much appreciated.

    Best,
    CS

  • #2
    If I understand you correctly, the following does what you want:
    Code:
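    * count the observations (one per country) in each year
    egen sumc = total(1), by(var3)
    * keep only the years that contain all 5 countries
    keep if sumc == 5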



    • #3
      Thanks a lot! I get the logic of the code. For my case, I will perhaps modify it as follows:

      Code:
      egen sumc = total(1), by(var3)
      * largest number of observations found in any single year
      egen max = max(sumc)
      keep if sumc == max
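
The code above counts observations per year, so it relies on each country appearing at most once per year and on at least one year containing every country. A minimal sketch that counts distinct countries instead, under the assumption that "all" means every country appearing anywhere in the dataset, could look like the following. The variable names first, n_year, ctag, and n_all are made up for illustration.

```stata
* Rebuild the example data from post #1
clear
input float(Country GDP var3)
1   2 1990
2  32 1990
3  12 1990
4  21 1990
1 123 1991
2  12 1991
3  32 1991
4 321 1991
5 321 1991
1 321 1992
2  32 1992
3   1 1992
4  32 1992
5  12 1992
end

* flag the first observation of each country within a year
bysort var3 Country: gen byte first = _n == 1
* number of distinct countries in each year
egen n_year = total(first), by(var3)

* tag one observation per country, then count countries in the whole dataset
egen byte ctag = tag(Country)
egen n_all = total(ctag)

* keep only the years in which every country appears
keep if n_year == n_all
drop first n_year ctag n_all
```

With the example data this drops 1990, as in the desired result. If no year contains every country, this version keeps nothing, whereas the max-based version would keep the years with the most observations.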
