  • Subsample with complicated condition

    Hello,

    I have a dataset with missing values in some of the variables. I would like to select a subsample of observations with no missing values, i.e., observations for which every variable has a value.
    But here's the kicker: I don't want to modify my dataset in any way, meaning that I don't want to use the keep and/or drop commands.
    The point is to create a summary statistics table with the same number of observations for all variables.

    Here is what I have attempted:

    sum varlist // identify var with smallest number of observations, let's call it X
    sum X

    cap drop min_sample
    gen min_sample = e(sample)

    sum varlist

    But it doesn't work.
    Would like to hear your take on this.
    Thank you in advance.

  • #2
    Your approach doesn't work because -sum- is not an estimation command, so it doesn't leave behind any -e(sample)-. In fact, it doesn't leave behind any -e()- at all; its results are stored in -r()- instead. You can accomplish your goal with:
    Code:
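    * count the missing values across all variables in each observation
    egen long mcount = rowmiss(_all)
    * flag observations that have no missing values
    gen byte in_subsample = (mcount == 0)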
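    A minimal sketch of how such a flag might then be used for the summary table, assuming Stata's bundled auto dataset as stand-in data and price, mpg, rep78 and weight as a hypothetical stand-in for varlist. Counting missings over the listed variables rather than _all excludes an observation only when one of the tabulated variables is missing. The built-in -mark-/-markout- pair is shown as an alternative way to build the same kind of flag; neither approach drops any observations.

```stata
* Example data shipped with Stata; rep78 contains missing values
sysuse auto, clear

* Count missings only over the variables that go into the table
egen long mcount = rowmiss(price mpg rep78 weight)
gen byte in_subsample = (mcount == 0)

* Every variable is summarized over the same observations
summarize price mpg rep78 weight if in_subsample

* Alternative: -mark- sets touse to 1 for every observation,
* then -markout- resets it to 0 wherever a listed variable is missing
mark touse
markout touse price mpg rep78 weight
summarize price mpg rep78 weight if touse
```

    Both flags stay in the dataset as ordinary variables, so the same sample can be reused with -if in_subsample- (or -if touse-) in later commands.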



    • #3
      Ah! I understand, thank you for your answer.
