Dear all,
I am finally posting because I am struggling with a dataset. It is a rotating panel from INDEC called the EPH (some of you may be familiar with it), and I have to drop from my sample all the households for which I do not have all 4 interviews.
The structure of the data is fairly complex: a household enters the survey and is interviewed for 2 consecutive quarters, is then out for 2 quarters, and re-enters for 2 final quarters (2-2-2). More about the dataset and my sample: it covers 6 years (2006-2011) and I have details for approximately 50,000 individuals per quarter. Each of them has a unique identifier (id), plus the identifier of the household they belong to (codusu).
Using duplicates report and duplicates tag, I have been able to create variables that identify, for each individual, when the different interviews took place: I have 24 variables, one per quarter, each taking the value 1 if the interview for this individual occurred in that quarter.


What constrains me most is that my data are in long shape, so individuals are repeated for every quarter in which they appear. So my question really is: is there a way to generate a new variable that gathers the information on the number of interviews for each individual (or each household), even though the number of household members often differs?
(By the way, I tried collapse, but it did not work when I tried to restore.)
I hope my explanation was clear. I am truly grateful to anyone reading and responding, and I am open to any suggestion if you can think of something better!
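To make the goal concrete, the kind of variable described above could perhaps be built along the following lines. This is only a sketch under assumptions: it supposes the data hold one row per individual per quarter and that a variable identifying the survey quarter exists. The name quarter is hypothetical, as are hh_q_tag and n_interviews; only id and codusu come from the dataset as described. It stays in long shape, so neither collapse nor restore is involved.

```stata
* Assumption: one row per individual per quarter; `quarter' is a
* hypothetical variable identifying the survey quarter.

* Flag exactly one row per household-quarter combination, so that
* households with many members are not counted more than once.
egen hh_q_tag = tag(codusu quarter)

* Count the distinct quarters in which each household was interviewed;
* the result is repeated on every row of that household.
bysort codusu: egen n_interviews = total(hh_q_tag)

* Keep only the households observed in all 4 interviews.
keep if n_interviews == 4
```

Because the tag marks a single row per household and quarter, the count does not depend on how many members the household has. The same pattern with id in place of codusu would give a per-individual count instead.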
Thank you very much,
Anouck.