Dear Statalist user,
I have a problem with my survey panel data that should be fairly easy to solve, but I have not managed to solve it. I found similar threads but none with a solution that fits my case
(e.g. https://www.statalist.org/forums/for...-in-panel-data).
My data are structured as follows. I could not post a data example straight from my dataset because my school computer will not let me install dataex:
PID_____year
101_____2000
101_____2001
101_____2002
101_____2003
201_____2001
201_____2002
I have multiple person IDs (PID) and I want to keep only the PIDs that have an observation in every year from 2000 to 2003.
For example, here it would keep PID 101, but would drop PID 201 since it does not have all these year values.
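To make the intended rule concrete, here is a minimal sketch of the logic I am after, typed by hand rather than produced by dataex. It assumes the variables are named PID and year exactly as above and that each PID has at most one observation per year; n_years is just a hypothetical helper name.

```stata
* Recreate the small example above
clear
input PID year
101 2000
101 2001
101 2002
101 2003
201 2001
201 2002
end

* Within each PID, count the observations that fall in 2000-2003
* (assumes at most one observation per PID and year)
bysort PID: egen n_years = total(inrange(year, 2000, 2003))

* Keep only PIDs observed in all four years, then drop the helper
keep if n_years == 4
drop n_years

list, sepby(PID)
```

If a PID could have more than one observation in the same year, the count would overstate the number of distinct years, so duplicates would need to be handled first. Note also that this keeps every observation of a qualifying PID, including any years outside 2000 to 2003.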
So far, I have only managed to select on a range of years, but that keeps every PID with any observations between 2000 and 2003, regardless of whether all the years in between are present.
I have tried keep commands restricted to the years I want, as well as bysort PID year: drop if _N<=3, but these either keep PIDs with gaps in their yearly observations or delete all my observations.
If you know a way to keep only the PIDs that have observations for all the years from 2000 to 2003, I would greatly appreciate it.
Thanks,
Josh