
  • Efficiently draw subset of data

    I have a dataset that is too large to load completely into Stata, and I want to draw a subsample based on a condition: each person kept should have at least one spell == X in a specific year.

    That's why the usual

    Code:
    use vars if spell == X using data
    doesn't work: I want all spells of each person who meets the condition, not just the spells for which the condition is true.

    The easy but inefficient way would be to load the person IDs of people who satisfy the condition and then merge them onto the entire dataset. But that would require loading the whole dataset. An alternative would be to write a loop and merge the person IDs over slices of the original data. That could be feasible.

    Are there cleverer ways to deal with this that I am missing?

  • #2
    The easy but inefficient way would be to load the person IDs of people who satisfy the condition and then merge them onto the entire dataset. But that would require loading the whole dataset.
    No, that doesn't require loading the whole data set.
    Code:
    use person_id if spell == X using dataset, clear
    duplicates drop
    merge 1:m person_id using dataset, keep(match) nogenerate
    will get exactly what you are asking for and never load anything that isn't part of the end result.
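
    For completeness, here is a minimal sketch of the same approach with the "specific year" part of the question added. The variable name year, the file name dataset, and the values stored in the two locals are assumptions for illustration only. If spell is a string variable, the comparison would need quotes around the value.

    ```stata
    * Hypothetical values: replace with the spell type and year you need
    local X 3
    local Y 2010

    * Read only the ID column, and only the rows that satisfy the condition
    use person_id if spell == `X' & year == `Y' using dataset, clear

    * Keep one row per qualifying person
    duplicates drop

    * Bring in all spells (in every year) of those people
    merge 1:m person_id using dataset, keep(match) nogenerate
    ```

    If only some variables are needed in the final sample, adding keepusing(varlist) to the merge should restrict which variables are taken from the using file.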



    • #3
      Oh okay, that's great, thanks!
