Subsample with complicated condition

Mariano Rossi

Join Date: Aug 2022

Posts: 4
#1

28 Aug 2022, 11:41

Hello,

I have a dataset with missing values for some of the variables. I would like to select a subsample of observations with no missing values, i.e., a subsample in which every row has a value for every variable.
But here's the kicker: I don't want to modify my dataset in any way, meaning that I don't want to use the keep and/or drop commands.
The point is to create a summary statistics table with the same number of observations for all variables.

Here is what I have attempted:

sum varlist // identify the variable with the fewest observations; call it X
sum X

cap drop min_sample
gen min_sample = e(sample)

sum varlist

But it doesn't work.
Would like to hear your take on this.
Thank you in advance.
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#2

28 Aug 2022, 11:53

Your approach doesn't work because -sum- is not an estimation command, so it doesn't leave behind any -e(sample)-. In fact, it doesn't leave behind any -e()- at all. You can accomplish your goal with:

Code:

egen long mcount = rowmiss(_all)
gen byte in_subsample = (mcount == 0)
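
To get the summary table itself, the indicator can go in an -if- qualifier, which restricts the command to the complete cases without changing the data. Here is a minimal sketch, assuming Stata's bundled auto dataset as a stand-in for your data and price, mpg, and rep78 as illustrative table variables (none of these come from your post). It passes only the table's variables to -rowmiss()- rather than -_all-, on the assumption that missing values in variables outside the table should not exclude an observation:

```stata
* Stand-in data: the auto dataset shipped with Stata (an assumption, not the poster's data)
sysuse auto, clear

* Count missing values per observation across the variables that go into the table
* (price, mpg, rep78 are illustrative choices only)
egen long mcount = rowmiss(price mpg rep78)

* Flag observations that are complete on all of those variables
gen byte in_subsample = (mcount == 0)

* Every variable is summarized over the same observations; nothing is dropped
summarize price mpg rep78 if in_subsample
```

Because no observations are removed, the full dataset stays available for later commands, and the same -if in_subsample- qualifier can be added to any other command that should use only the complete cases.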
Mariano Rossi

Join Date: Aug 2022

Posts: 4
#3

28 Aug 2022, 13:31

Ah! I understand, thank you for your answer.