Selecting the subset of data that contains all observations

Chinmay Sharma

Join Date: Nov 2015

Posts: 351
#1

06 Jul 2018, 15:06

Hi All,

I have data that resemble the following:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(Country GDP var3)
1   2 1990
2  32 1990
3  12 1990
4  21 1990
1 123 1991
2  12 1991
3  32 1991
4 321 1991
5 321 1991
1 321 1992
2  32 1992
3   1 1992
4  32 1992
5  12 1992
end

The data record the GDP of each country by year (var3 holds the year). I want to keep only the years in which all countries are present. In this case, the resulting dataset should look like:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(Country GDP var3)
1 123 1991
2  12 1991
3  32 1991
4 321 1991
5 321 1991
1 321 1992
2  32 1992
3   1 1992
4  32 1992
5  12 1992
end

As can be seen, year 1990 has been dropped because it does not contain country 5. I realize that it is tricky to define "all countries". For this purpose, I would take "all" to mean the maximum number of countries that appear in the dataset, and keep only those years in which all of these countries are present.

Any help is much appreciated.

Best,
CS
Rich Goldstein

Join Date: Mar 2014

Posts: 4459
#2

06 Jul 2018, 15:22

If I understand you correctly, the following does what you want:

Code:

egen sumc=sum(1), by(var3)
keep if sumc==5
Chinmay Sharma

Join Date: Nov 2015

Posts: 351
#3

06 Jul 2018, 15:33

Thanks a lot! I get the logic of the code. For my case, I will perhaps modify it as:

Code:

egen sumc=sum(1),by(var3)
egen max=max(sumc)
keep if sumc==max
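
The two approaches above count observations per year, so they rely on each country appearing at most once per year, and the max-based version keeps the fullest year even if no year contains every country. A possible variant, offered only as a sketch, counts distinct countries instead. It assumes var3 is the year, as in the example data; the helper variable names (pairtag, ncountry, ctag, nall) are made up for illustration. It uses egen's tag() and total() functions, total() being the documented name for what sum() does in the posts above.

```stata
* Example data from post #1
clear
input float(Country GDP var3)
1   2 1990
2  32 1990
3  12 1990
4  21 1990
1 123 1991
2  12 1991
3  32 1991
4 321 1991
5 321 1991
1 321 1992
2  32 1992
3   1 1992
4  32 1992
5  12 1992
end

* Flag one observation per Country-year pair, then count distinct countries per year
egen byte pairtag = tag(Country var3)
egen ncountry = total(pairtag), by(var3)

* Flag one observation per Country, then count distinct countries in the whole dataset
egen byte ctag = tag(Country)
egen nall = total(ctag)

* Keep only the years in which every country in the dataset appears
keep if ncountry == nall
drop pairtag ctag ncountry nall
```

With the example data this drops 1990 and leaves the ten observations for 1991 and 1992. If no year contained all five countries, the result would be an empty dataset rather than the fullest available year, which may or may not be what is wanted.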