  • Dealing with survey waves in a rotating panel - delete households

    Dear all,

    I am finally posting because I am struggling with a dataset. It is a rotating panel from INDEC, the EPH (some of you may be familiar with it), and I have to delete from my sample all the households for which I do not have all 4 interviews.
    The structure of the data is fairly complex: a household enters the survey and is interviewed in 2 consecutive quarters, then leaves for 2 quarters, and re-enters for 2 final quarters (a 2-2-2 scheme). More about the dataset and my sample: it covers 6 years (2006-2011), with approximately 50,000 individuals per quarter. Each individual has a unique identifier (id) as well as the identifier of the household they belong to (codusu).
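To illustrate the 2-2-2 scheme: a household entering in quarter t is interviewed in t, t+1, t+4 and t+5. The following is a minimal sketch using Stata's quarterly date function tq() and the %tq display format. The entry quarter 2009q1 is borrowed from household 125001 in the -dataex- listing later in the thread, and the local macro t is a hypothetical name. Run as is, the loop lists 2009q1, 2009q2, 2010q1 and 2010q2.

```stata
* 2-2-2 rotation: interviewed in t and t+1, out for 2 quarters,
* then interviewed again in t+4 and t+5.
* The local macro t is hypothetical; 2009q1 matches codusu 125001.
local t = tq(2009q1)
foreach k in 0 1 4 5 {
    display %tq `t' + `k'
}
```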

    Using -duplicates report- and -duplicates tag-, I have created variables that show, for each individual, when the different interviews took place: I have 24 variables, one per quarter, that take the value 1 if the interview for this individual occurred in that quarter.
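If y_t always stores the year followed by the quarter digit (e.g. 20061), the same 24 indicators could probably be built directly, without -duplicates-. The sketch below works under that assumption. It follows the naming visible in the -dataex- output later in the thread, where dupl_q106 is 2006q1 and dupl_q407 is 2007q4. The few input rows are copied from that listing.

```stata
clear
input long codusu double id int y_t
125008 12500811 20071
125008 12500811 20072
125008 12500811 20081
125009 12500911 20061
125009 12500911 20062
end

* One 0/1 indicator per quarter, 2006q1 to 2011q4 (assumes y_t = year*10 + quarter)
forvalues yr = 2006/2011 {
    local yy = substr("`yr'", 3, 2)    // two-digit year, e.g. "06"
    forvalues q = 1/4 {
        * e.g. dupl_q106 = 1 on rows interviewed in 2006q1
        generate byte dupl_q`q'`yy' = (y_t == `yr'`q')
    }
}
```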

    [Attached: two screenshots of the data, "Capture d’écran 2023-07-14 à 21.34.02.png" and "Capture d’écran 2023-07-14 à 21.34.15.png".]

    What constrains me most is that my data are in long format, so each individual is repeated for every quarter in which they appear. So my question really is: is there a way to generate a new variable that gathers the number of interviews for each individual (or each household), even though the number of household members often differs?
    (By the way, I tried -collapse-, but it didn't work when I tried to -restore-.)
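The thread does not say why -restore- failed. However, -restore- only works after a prior -preserve-, so a missing -preserve- is one possible cause. Below is a minimal sketch of the collapse-then-merge pattern. It assumes the variable names from the -dataex- example later in the thread and uses household 125001 from that listing as input. The names n_int and counts are hypothetical.

```stata
clear
input long codusu double id int y_t
125001 12500111 20091
125001 12500112 20091
125001 12500112 20092
125001 12500111 20092
125001 12500112 20101
125001 12500111 20101
125001 12500113 20102
125001 12500112 20102
125001 12500111 20102
end

* -restore- needs a prior -preserve-
preserve
* One row per individual: count of nonmissing y_t values = interviews
collapse (count) n_int = y_t, by(id)
tempfile counts
save "`counts'"
restore

* Attach the per-person counts to every person-quarter row
merge m:1 id using "`counts'", nogenerate
```

After the -merge-, every row of 12500111 and 12500112 carries n_int = 4, and the single row of 12500113 carries n_int = 1.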

    I hope my explanation is clear. I am truly grateful to anyone who reads and responds, and I am open to any suggestion if you can think of a better approach!

    Thank you very much,
    Anouck.

  • #2
    Anouck:
    I find it impossible to give a more positive reply without a workable example from your side, provided via -dataex-.
    That said, I would consider -bysort- and the -count()- function available from -egen-.
    Kind regards,
    Carlo
    (Stata 19.0)
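Here is a sketch of what the -bysort-/-egen- approach could look like while staying in long format. It assumes the variable names from the -dataex- example in #3 and uses the rows of household 125001 as input. The names n_int_id, hh_q_tag and n_int_hh are hypothetical. The household count uses -egen, tag()- so that each household-quarter is counted once, regardless of how many members were interviewed.

```stata
clear
input long codusu double id int y_t
125001 12500111 20091
125001 12500112 20091
125001 12500112 20092
125001 12500111 20092
125001 12500112 20101
125001 12500111 20101
125001 12500113 20102
125001 12500112 20102
125001 12500111 20102
end

* Interviews per individual: number of nonmissing y_t values within each id
bysort id: egen n_int_id = count(y_t)

* Interviews per household: tag one row per household-quarter, then sum
* the tags, so the result does not depend on the number of members
egen byte hh_q_tag = tag(codusu y_t)
bysort codusu: egen n_int_hh = total(hh_q_tag)
```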

    • #3
      Carlo,

      Thank you for your answer, and my apologies for the way I presented the data; I am new to -dataex-. I believe you will be able to work with it now:

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input long codusu double(id ano4) int y_t byte(dupl_q106 dupl_q206 dupl_q306 dupl_q406 dupl_q107 dupl_q207 dupl_q407)
      125001  12500111 2009 20091 0 0 0 0 0 0 0
      125001  12500112 2009 20091 0 0 0 0 0 0 0
      125001  12500112 2009 20092 0 0 0 0 0 0 0
      125001  12500111 2009 20092 0 0 0 0 0 0 0
      125001  12500112 2010 20101 0 0 0 0 0 0 0
      125001  12500111 2010 20101 0 0 0 0 0 0 0
      125001  12500113 2010 20102 0 0 0 0 0 0 0
      125001  12500112 2010 20102 0 0 0 0 0 0 0
      125001  12500111 2010 20102 0 0 0 0 0 0 0
      125002  12500211 2009 20091 0 0 0 0 0 0 0
      125002  12500212 2009 20091 0 0 0 0 0 0 0
      125002  12500211 2009 20092 0 0 0 0 0 0 0
      125002  12500212 2009 20092 0 0 0 0 0 0 0
      125002  12500212 2010 20101 0 0 0 0 0 0 0
      125002  12500211 2010 20101 0 0 0 0 0 0 0
      125002  12500212 2010 20102 0 0 0 0 0 0 0
      125002  12500211 2010 20102 0 0 0 0 0 0 0
      125005  12500513 2009 20093 0 0 0 0 0 0 0
      125005  12500512 2009 20093 0 0 0 0 0 0 0
      125005  12500511 2009 20093 0 0 0 0 0 0 0
      125005  12500511 2009 20094 0 0 0 0 0 0 0
      125005  12500512 2009 20094 0 0 0 0 0 0 0
      125005  12500513 2009 20094 0 0 0 0 0 0 0
      125005  12500512 2010 20103 0 0 0 0 0 0 0
      125005  12500513 2010 20103 0 0 0 0 0 0 0
      125005  12500511 2010 20103 0 0 0 0 0 0 0
      125005  12500511 2010 20104 0 0 0 0 0 0 0
      125005  12500513 2010 20104 0 0 0 0 0 0 0
      125005  12500512 2010 20104 0 0 0 0 0 0 0
      125008  12500818 2007 20071 0 0 0 0 1 0 0
      125008  12500814 2007 20071 0 0 0 0 1 0 0
      125008  12500811 2007 20071 0 0 0 0 1 0 0
      125008  12500817 2007 20071 0 0 0 0 1 0 0
      125008  12500816 2007 20071 0 0 0 0 1 0 0
      125008  12500815 2007 20071 0 0 0 0 1 0 0
      125008  12500812 2007 20071 0 0 0 0 1 0 0
      125008  12500813 2007 20071 0 0 0 0 1 0 0
      125008  12500812 2007 20072 0 0 0 0 0 1 0
      125008  12500811 2007 20072 0 0 0 0 0 1 0
      125008  12500814 2007 20072 0 0 0 0 0 1 0
      125008  12500815 2007 20072 0 0 0 0 0 1 0
      125008  12500817 2007 20072 0 0 0 0 0 1 0
      125008  12500816 2007 20072 0 0 0 0 0 1 0
      125008  12500813 2007 20072 0 0 0 0 0 1 0
      125008  12500818 2007 20072 0 0 0 0 0 1 0
      125008  12500818 2008 20081 0 0 0 0 0 0 0
      125008  12500815 2008 20081 0 0 0 0 0 0 0
      125008  12500816 2008 20081 0 0 0 0 0 0 0
      125008  12500812 2008 20081 0 0 0 0 0 0 0
      125008  12500811 2008 20081 0 0 0 0 0 0 0
      125008  12500817 2008 20081 0 0 0 0 0 0 0
      125008  12500814 2008 20081 0 0 0 0 0 0 0
      125008  12500813 2008 20081 0 0 0 0 0 0 0
      125009  12500914 2006 20061 1 0 0 0 0 0 0
      125009  12500913 2006 20061 1 0 0 0 0 0 0
      125009  12500912 2006 20061 1 0 0 0 0 0 0
      125009  12500911 2006 20061 1 0 0 0 0 0 0
      125009  12500912 2006 20062 0 1 0 0 0 0 0
      125009  12500914 2006 20062 0 1 0 0 0 0 0
      125009  12500913 2006 20062 0 1 0 0 0 0 0
      125009  12500911 2006 20062 0 1 0 0 0 0 0
      125009  12500911 2007 20071 0 0 0 0 1 0 0
      125009  12500914 2007 20071 0 0 0 0 1 0 0
      125009  12500913 2007 20071 0 0 0 0 1 0 0
      125009  12500912 2007 20071 0 0 0 0 1 0 0
      125009  12500913 2007 20072 0 0 0 0 0 1 0
      125009  12500914 2007 20072 0 0 0 0 0 1 0
      125009  12500915 2007 20072 0 0 0 0 0 1 0
      125009  12500912 2007 20072 0 0 0 0 0 1 0
      125012  12501218 2006 20061 1 0 0 0 0 0 0
      125012 125012111 2006 20061 1 0 0 0 0 0 0
      125012  12501214 2006 20061 1 0 0 0 0 0 0
      125012  12501213 2006 20061 1 0 0 0 0 0 0
      125012  12501215 2006 20061 1 0 0 0 0 0 0
      125012  12501216 2006 20061 1 0 0 0 0 0 0
      125012  12501219 2006 20061 1 0 0 0 0 0 0
      125012  12501211 2006 20061 1 0 0 0 0 0 0
      125012  12501212 2006 20061 1 0 0 0 0 0 0
      125012  12501217 2006 20061 1 0 0 0 0 0 0
      125012 125012110 2006 20061 1 0 0 0 0 0 0
      125012 125012112 2006 20061 1 0 0 0 0 0 0
      125012 125012111 2006 20062 0 1 0 0 0 0 0
      125012  12501212 2006 20062 0 1 0 0 0 0 0
      125012  12501211 2006 20062 0 1 0 0 0 0 0
      125012  12501215 2006 20062 0 1 0 0 0 0 0
      125012  12501213 2006 20062 0 1 0 0 0 0 0
      125012  12501214 2006 20062 0 1 0 0 0 0 0
      125012  12501218 2006 20062 0 1 0 0 0 0 0
      125012 125012110 2006 20062 0 1 0 0 0 0 0
      125012  12501216 2006 20062 0 1 0 0 0 0 0
      125012 125012112 2006 20062 0 1 0 0 0 0 0
      125012  12501217 2006 20062 0 1 0 0 0 0 0
      125012  12501219 2006 20062 0 1 0 0 0 0 0
      125014  12501411 2006 20062 0 1 0 0 0 0 0
      125014  12501412 2006 20062 0 1 0 0 0 0 0
      125014  12501412 2006 20063 0 0 1 0 0 0 0
      125014  12501411 2006 20063 0 0 1 0 0 0 0
      125017  12501711 2006 20061 1 0 0 0 0 0 0
      125017  12501711 2006 20062 0 1 0 0 0 0 0
      125018  12501812 2007 20072 0 0 0 0 0 1 0
      end
      To add to my previous explanation: I am looking to identify the households whose first interview(s) occurred before the start of my sample, i.e. before Q1 2006, so that I can remove them. I tried using -bysort-, but it didn't work, so I may be using it incorrectly:

      bysort codusu (ano4) : egen duplicate = count(dupl_q106) if dupl_q106==1

      Best regards,
      Anouck.
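A likely reason the command above does not flag whole households is its -if dupl_q106==1- qualifier. With it, -egen- only fills rows where the condition holds and sets every other row to missing. The sketch below uses a few rows from the listing above (households 125001 and 125009). The variable duplicate reproduces the original command. The hypothetical hh_in_q106 instead spreads a household-wide 0/1 flag to every row with max().

```stata
clear
input long codusu double id int y_t byte dupl_q106
125001 12500111 20091 0
125001 12500111 20092 0
125009 12500911 20061 1
125009 12500912 20061 1
125009 12500911 20062 0
125009 12500912 20062 0
end

* Original attempt: rows with dupl_q106 == 0 are excluded by -if-,
* so they receive missing values instead of a household-level result
bysort codusu: egen duplicate = count(dupl_q106) if dupl_q106 == 1

* Alternative: the maximum of a 0/1 indicator within a household is 1
* if any member was interviewed in 2006q1, and it fills every row
bysort codusu: egen byte hh_in_q106 = max(dupl_q106)
```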

      • #4
        Create a proper quarter variable that you can refer to.

        Code:
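        * Quarterly date from the year (ano4) and the last digit of y_t
        gen quarter= yq(ano4, real(substr(string(y_t), -1, 1))), after(ano4)
        format quarter %tq
        * Flag every household with at least one interview before 2006q1, then drop it
        bys codusu: egen todrop= max(quarter<tq(2006q1))
        drop if todrop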
        See https://www.stata.com/support/faqs/d...ble-recording/ for a review of the technique.
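The original goal was to keep only households with all 4 interviews, so the quarter variable could also be used to check each household against the 2-2-2 schedule. The sketch below assumes the schedule is exactly t, t+1, t+4 and t+5, counted from the first quarter observed in the file. The names first_q, offset, has_0, has_1, has_4, has_5 and complete are hypothetical. The rows come from the -dataex- listing in #3. Households with interviews outside the file (before 2006q1 or after 2011q4) fail the check, because not all four offsets appear.

```stata
clear
input long codusu double id int(ano4 y_t)
125001 12500111 2009 20091
125001 12500111 2009 20092
125001 12500111 2010 20101
125001 12500111 2010 20102
125008 12500811 2007 20071
125008 12500811 2007 20072
125008 12500811 2008 20081
125014 12501411 2006 20062
125014 12501411 2006 20063
end

* Quarterly date, built as in the reply above
generate quarter = yq(ano4, real(substr(string(y_t), -1, 1)))
format quarter %tq

* Offset of each interview from the household's first observed quarter
bysort codusu: egen first_q = min(quarter)
generate offset = quarter - first_q

* complete = 1 only if the household has rows at offsets 0, 1, 4 and 5
generate byte complete = 1
foreach k in 0 1 4 5 {
    bysort codusu: egen byte has_`k' = max(offset == `k')
    replace complete = complete & has_`k'
}

drop if !complete
```

In these rows only household 125001 (2009q1, 2009q2, 2010q1, 2010q2) is complete. Households 125008 and 125014 are dropped.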

        • #5
          Thank you very much, Andrew! Your code worked, and the variable allowed me to identify the households I needed to remove!
