I have a huge dataset of individual performance over time. I want to do several things with it but am not sure how.
First, I would like to tell Stata to ignore any individual who doesn't have at least four scores. (Call person 1 p1, so I have p1, p2, p3, and so on, and call Test 1 t1, so t1, t2, t3, up to t15. Of course, my actual variable names are not that simple.) Not everyone took every test, so I only want to look at the people who took at least four tests. Does that make sense?
Once that is done, I want to flag all cases where the person improved over time. Yes, I could look at graphs, but that would take forever with hundreds of thousands of people. Is there code for this? I haven't done any stats in a year or two, so I'm rusty, and most of my advanced stats knowledge is self-taught.
The eventual goal is to examine all the people who improved and see whether they have things in common. But I'm struggling to work out how to work with just those individuals' data.
Thanks in advance.