I would greatly appreciate your help as I am encountering some issues with panel datasets.
I have very large panel datasets (a 15-year observation period, with around 10 million observations per year). I would like to draw a 10% random subsample from the entire sample. However, if I merge all the files together and then assign a random number by unit of analysis, I am afraid Stata will not be able to process that many observations smoothly.
Is there a proper way to randomly select part of the sample year by year before merging? I don't want to follow just one entry cohort, so in addition to the 10% random subsample from the previous year, I would like to randomly include 10% of the new entrants in each following year. But I am not sure this is doable, because a fresh draw in a later year might pick up people who are not actually new to the dataset, but who simply were not selected in the previous year.
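To make the idea concrete, here is a rough, untested sketch of one possible approach: draw the 10% sample over unique unit identifiers rather than over observations, so that each unit gets exactly one draw no matter which year it first appears in. The file names (panel_2001.dta to panel_2015.dta), the identifier variable id, the output file names, and the seed are all hypothetical placeholders, not my actual data.

```stata
* Assumption: yearly files panel_2001.dta ... panel_2015.dta, unit identifier "id"

* Step 1: build a list of unique ids, loading only the id variable from each year
clear
tempfile ids
save `ids', emptyok
forvalues y = 2001/2015 {
    use id using panel_`y'.dta, clear
    duplicates drop                 // one row per id within the year
    append using `ids'
    duplicates drop                 // one row per id across the years read so far
    save `ids', replace
}

* Step 2: one random draw per id, keeping roughly 10% of units
sort id                             // fixed order so the draw is reproducible
set seed 12345                      // arbitrary seed, chosen for illustration
generate double u = runiform()
keep if u < 0.10
keep id
save sample_ids.dta, replace

* Step 3: keep only the sampled ids in each yearly file
forvalues y = 2001/2015 {
    use panel_`y'.dta, clear
    merge m:1 id using sample_ids.dta, keep(match) nogenerate
    save sub_`y'.dta, replace
}
```

If something like this is valid, a sampled unit would be kept in every year in which it appears, and new entrants would be sampled at the same 10% rate as everyone else, since each id is drawn only once. Only the id list and one yearly file would be in memory at any time. The reduced sub_ files could then be appended.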
Do you know how I can solve this problem? Many, many thanks indeed!