Dealing with survey waves in a rotating panel - delete households

Anouck Daubree

Join Date: Jul 2023

Posts: 5
#1


14 Jul 2023, 13:43

Dear all,

I am finally posting because I am struggling with a dataset. It is a rotating panel from INDEC called the EPH (some of you may be familiar with it), and I have to delete from my sample all the households for which I do not have all 4 interviews.
The structure of the data is fairly complex: a household enters the survey and is interviewed for 2 consecutive quarters, is then out for 2 quarters, and re-enters for 2 final quarters (2-2-2). More about the dataset and my sample: it covers 6 years (2006-2011), and I have details for approximately 50,000 individuals per quarter. Each individual has a unique identifier (id), plus the identifier of the household he or she belongs to (codusu).

Using duplicates report and tag, I have been able to create variables that identify, for each individual, when the different interviews took place: I have 24 variables, one per quarter, each taking the value 1 if the interview for that individual occurred in that quarter.

What constrains me most is that the data are in long form, so each individual is repeated for every quarter in which he or she appears. So my question is: is there a way to generate a new variable that gathers the information on the number of interviews for each individual (or each household), even though the number of household members often differs?
(By the way, I tried collapse, but it did not work when I tried to restore.)

I hope my explanations were clear. I am truly grateful to anyone reading and responding, and I am open to any suggestion if you can think of something better!

Thank you very much,
Anouck.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17731
#2

16 Jul 2023, 04:32

Anouck:
I find it impossible to reply more positively without a workable example from your side, provided via -dataex-.
That said, I would consider -bysort- and the -count- function available from -egen-.

Kind regards,
Carlo
(Stata 19.0)
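
A minimal sketch of that suggestion, assuming codusu identifies the household and a variable such as y_t identifies the survey quarter (as in the -dataex- example posted later in the thread). It uses the tag() and total() functions of -egen- rather than count(), because count() would count person-rows rather than distinct quarters. The names hh_q_tag and n_waves are hypothetical.

```stata
* Sketch only: count the distinct quarters in which each household appears.
* Assumes codusu = household id and y_t = quarter id; new names are hypothetical.
egen byte hh_q_tag = tag(codusu y_t)           // 1 on exactly one row per household-quarter
bysort codusu: egen n_waves = total(hh_q_tag)  // distinct quarters per household, on every row
tabulate n_waves if hh_q_tag                   // inspect the distribution before dropping anything
* keep if n_waves == 4                         // uncomment to keep households with all 4 interviews
```

Because n_waves is repeated on every row of the household, the data can stay in long form and no collapse/restore step is needed.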

Anouck Daubree

Join Date: Jul 2023
Posts: 5

16 Jul 2023, 10:42

Carlo,

Thank you for your answer, and my apologies for the data I showed you; I am new to dataex. I believe you will be able to use the example now:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long codusu double(id ano4) int y_t byte(dupl_q106 dupl_q206 dupl_q306 dupl_q406 dupl_q107 dupl_q207 dupl_q407)
125001  12500111 2009 20091 0 0 0 0 0 0 0
125001  12500112 2009 20091 0 0 0 0 0 0 0
125001  12500112 2009 20092 0 0 0 0 0 0 0
125001  12500111 2009 20092 0 0 0 0 0 0 0
125001  12500112 2010 20101 0 0 0 0 0 0 0
125001  12500111 2010 20101 0 0 0 0 0 0 0
125001  12500113 2010 20102 0 0 0 0 0 0 0
125001  12500112 2010 20102 0 0 0 0 0 0 0
125001  12500111 2010 20102 0 0 0 0 0 0 0
125002  12500211 2009 20091 0 0 0 0 0 0 0
125002  12500212 2009 20091 0 0 0 0 0 0 0
125002  12500211 2009 20092 0 0 0 0 0 0 0
125002  12500212 2009 20092 0 0 0 0 0 0 0
125002  12500212 2010 20101 0 0 0 0 0 0 0
125002  12500211 2010 20101 0 0 0 0 0 0 0
125002  12500212 2010 20102 0 0 0 0 0 0 0
125002  12500211 2010 20102 0 0 0 0 0 0 0
125005  12500513 2009 20093 0 0 0 0 0 0 0
125005  12500512 2009 20093 0 0 0 0 0 0 0
125005  12500511 2009 20093 0 0 0 0 0 0 0
125005  12500511 2009 20094 0 0 0 0 0 0 0
125005  12500512 2009 20094 0 0 0 0 0 0 0
125005  12500513 2009 20094 0 0 0 0 0 0 0
125005  12500512 2010 20103 0 0 0 0 0 0 0
125005  12500513 2010 20103 0 0 0 0 0 0 0
125005  12500511 2010 20103 0 0 0 0 0 0 0
125005  12500511 2010 20104 0 0 0 0 0 0 0
125005  12500513 2010 20104 0 0 0 0 0 0 0
125005  12500512 2010 20104 0 0 0 0 0 0 0
125008  12500818 2007 20071 0 0 0 0 1 0 0
125008  12500814 2007 20071 0 0 0 0 1 0 0
125008  12500811 2007 20071 0 0 0 0 1 0 0
125008  12500817 2007 20071 0 0 0 0 1 0 0
125008  12500816 2007 20071 0 0 0 0 1 0 0
125008  12500815 2007 20071 0 0 0 0 1 0 0
125008  12500812 2007 20071 0 0 0 0 1 0 0
125008  12500813 2007 20071 0 0 0 0 1 0 0
125008  12500812 2007 20072 0 0 0 0 0 1 0
125008  12500811 2007 20072 0 0 0 0 0 1 0
125008  12500814 2007 20072 0 0 0 0 0 1 0
125008  12500815 2007 20072 0 0 0 0 0 1 0
125008  12500817 2007 20072 0 0 0 0 0 1 0
125008  12500816 2007 20072 0 0 0 0 0 1 0
125008  12500813 2007 20072 0 0 0 0 0 1 0
125008  12500818 2007 20072 0 0 0 0 0 1 0
125008  12500818 2008 20081 0 0 0 0 0 0 0
125008  12500815 2008 20081 0 0 0 0 0 0 0
125008  12500816 2008 20081 0 0 0 0 0 0 0
125008  12500812 2008 20081 0 0 0 0 0 0 0
125008  12500811 2008 20081 0 0 0 0 0 0 0
125008  12500817 2008 20081 0 0 0 0 0 0 0
125008  12500814 2008 20081 0 0 0 0 0 0 0
125008  12500813 2008 20081 0 0 0 0 0 0 0
125009  12500914 2006 20061 1 0 0 0 0 0 0
125009  12500913 2006 20061 1 0 0 0 0 0 0
125009  12500912 2006 20061 1 0 0 0 0 0 0
125009  12500911 2006 20061 1 0 0 0 0 0 0
125009  12500912 2006 20062 0 1 0 0 0 0 0
125009  12500914 2006 20062 0 1 0 0 0 0 0
125009  12500913 2006 20062 0 1 0 0 0 0 0
125009  12500911 2006 20062 0 1 0 0 0 0 0
125009  12500911 2007 20071 0 0 0 0 1 0 0
125009  12500914 2007 20071 0 0 0 0 1 0 0
125009  12500913 2007 20071 0 0 0 0 1 0 0
125009  12500912 2007 20071 0 0 0 0 1 0 0
125009  12500913 2007 20072 0 0 0 0 0 1 0
125009  12500914 2007 20072 0 0 0 0 0 1 0
125009  12500915 2007 20072 0 0 0 0 0 1 0
125009  12500912 2007 20072 0 0 0 0 0 1 0
125012  12501218 2006 20061 1 0 0 0 0 0 0
125012 125012111 2006 20061 1 0 0 0 0 0 0
125012  12501214 2006 20061 1 0 0 0 0 0 0
125012  12501213 2006 20061 1 0 0 0 0 0 0
125012  12501215 2006 20061 1 0 0 0 0 0 0
125012  12501216 2006 20061 1 0 0 0 0 0 0
125012  12501219 2006 20061 1 0 0 0 0 0 0
125012  12501211 2006 20061 1 0 0 0 0 0 0
125012  12501212 2006 20061 1 0 0 0 0 0 0
125012  12501217 2006 20061 1 0 0 0 0 0 0
125012 125012110 2006 20061 1 0 0 0 0 0 0
125012 125012112 2006 20061 1 0 0 0 0 0 0
125012 125012111 2006 20062 0 1 0 0 0 0 0
125012  12501212 2006 20062 0 1 0 0 0 0 0
125012  12501211 2006 20062 0 1 0 0 0 0 0
125012  12501215 2006 20062 0 1 0 0 0 0 0
125012  12501213 2006 20062 0 1 0 0 0 0 0
125012  12501214 2006 20062 0 1 0 0 0 0 0
125012  12501218 2006 20062 0 1 0 0 0 0 0
125012 125012110 2006 20062 0 1 0 0 0 0 0
125012  12501216 2006 20062 0 1 0 0 0 0 0
125012 125012112 2006 20062 0 1 0 0 0 0 0
125012  12501217 2006 20062 0 1 0 0 0 0 0
125012  12501219 2006 20062 0 1 0 0 0 0 0
125014  12501411 2006 20062 0 1 0 0 0 0 0
125014  12501412 2006 20062 0 1 0 0 0 0 0
125014  12501412 2006 20063 0 0 1 0 0 0 0
125014  12501411 2006 20063 0 0 1 0 0 0 0
125017  12501711 2006 20061 1 0 0 0 0 0 0
125017  12501711 2006 20062 0 1 0 0 0 0 0
125018  12501812 2007 20072 0 0 0 0 0 1 0
end

To complement my previous explanation, I am looking to identify the households whose first interview(s) occurred before the start of my sample, i.e. before Q1 2006, so that I can remove them. I tried using bysort but it did not work, so I might be using it incorrectly:

bysort codusu (ano4) : egen duplicate = count(dupl_q106) if dupl_q106==1

Best regards,
Anouck.
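
A note on the attempt above: with the `if dupl_q106==1` qualifier, -egen- fills the new variable only on rows where the condition holds and leaves it missing elsewhere, so the result does not carry over to the household's other quarters. A minimal sketch of one alternative, assuming the aim is a household-level flag built from the dupl_q106 indicator in the -dataex- example; in_q106 and n_q106 are hypothetical names.

```stata
* Sketch only: spread a quarter indicator to every row of the household.
* dupl_q106 and codusu come from the -dataex- example; new names are hypothetical.
bysort codusu: egen byte in_q106 = max(dupl_q106)   // 1 if any member was interviewed in Q1 2006
bysort codusu: egen n_q106 = total(dupl_q106)       // number of person-rows observed in Q1 2006
```

Dropping the `if` qualifier lets every row of the household receive the same value, which can then be used in a later `drop if` or `keep if`.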


Andrew Musau

Join Date: Oct 2014

Posts: 10281
#4

16 Jul 2023, 11:12

Create a proper quarter variable that you can refer to.

Code:

gen quarter = yq(ano4, real(substr(string(y_t), -1, 1))), after(ano4)
format quarter %tq
bys codusu: egen todrop = max(quarter < tq(2006q1))
drop if todrop

See https://www.stata.com/support/faqs/d...ble-recording/ for a review of the technique.
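
The technique relies on a logical expression evaluating to 1 or 0, so its group maximum is 1 if the condition holds on any row and its group minimum is 1 only if it holds on every row. A self-contained sketch on made-up data (the four rows below are hypothetical, not from the EPH); it extracts the quarter with mod(y_t, 10), which is an alternative to the string approach used above.

```stata
* Sketch only: "any" and "all" within groups via egen max()/min() of a logical expression.
clear
input long codusu int(ano4 y_t)
1 2005 20054
1 2006 20061
2 2006 20061
2 2006 20062
end

gen quarter = yq(ano4, mod(y_t, 10))   // last digit of y_t is the quarter
format quarter %tq

bysort codusu: egen any_before = max(quarter < tq(2006q1))     // 1 if any row is before 2006q1
bysort codusu: egen all_from2006 = min(quarter >= tq(2006q1))  // 1 only if every row is 2006q1 or later

list, sepby(codusu)

* Household 1 has a 2005q4 row, household 2 does not.
assert any_before == 1 & all_from2006 == 0 if codusu == 1
assert any_before == 0 & all_from2006 == 1 if codusu == 2
```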
Anouck Daubree

Join Date: Jul 2023

Posts: 5
#5

17 Jul 2023, 11:43

Thank you very much Andrew, your code worked and the variable allowed me to identify the households I needed to remove!