  • Panel Data 2: Dropping individuals when you only want 1 subject per family unit

    I'm using PSID tools. I cleaned all my data, ran some numbers, and was an almost happy camper until I discovered two problems. (1) I had unbalanced my panel and hacked it apart using -drop- (see my previous post "Panel Data 1"). (2) I only wanted family-level variables for my family-level unit of analysis, but PSID tools effectively imports a panel of individuals and grafts the family-level data onto each person. As a result, I have a panel consisting of every individual in every family, with the same family-level values repeated for each individual. By way of example, I have four identical family income observations for 4-person family A in year x, and one family income observation for 1-person family B in year x. I just want one observation for each family in each year.

    How do I solve this second problem? Here are the variables:

    x11101ll = person identification number, unique to each person
    wave = year

    x11102 = 1999 interview number. This is a family-level ID number, but it changes every year. Within a family unit, person IDs will have the same 1999 interview number in a given year. But that interview number will go to a different family in a different year. I have 5 waves: 1999, 2001, 2003, 2005, 2007.

    xsqnr = sequence number. As far as I can tell, this is used for multifamily households to identify who was interviewed in what order.

    So for any given year, the family-level info is all the same for each family member, i.e. for family 9 I have four identical family income observations with different sequence numbers and different personal ID numbers, and four identical age of head of household observations, etc.

    As mentioned before, I just want one set of observations per family. Any help dropping extra family members is much appreciated.

    In my previous post, I failed at dataex despite reading help dataex, but I will nonetheless try again to use dataex to post my relevant variables here:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long x11101ll int(wave x11102) byte(xsqnr state famcompch)
    4003    1    . .  . .
    4003    3    . .  . .
    4003    5    . .  . .
    4003    7    . .  . .
    4003 1999    2 1 41 0
    4003 2001   96 1 41 1
    4003 2003 1392 1 41 0
    4003 2005  289 1 41 0
    4003 2007  148 1 41 0
    4004    1    . .  . .
    4004    3    . .  . .
    4004    5    . .  . .
    4004    7    . .  . .
    4004 1999 6129 1 41 3
    4004 2001 5987 1 41 0
    4004 2003 6278 1 41 0
    4004 2005 2356 1 41 0
    4004 2007 5399 1 41 0
    4006    1    . .  . .
    4006    3    . .  . .
    4006    5    . .  . .
    4006    7    . .  . .
    4006 1999 4920 2 15 0
    4006 2001 5599 2 15 0
    4006 2003 4812 1 15 3
    4006 2005 4097 1 15 0
    4006 2007  720 1 41 0
    4031    1    . .  . .
    4031    3    . .  . .
    4031    5    . .  . .
    4031    7    . .  . .
    4031 1999 1702 1 41 0
    4031 2001  285 2 41 4
    4031 2003 1427 2 41 0
    4031 2005 1157 2 41 0
    4031 2007  196 2 41 0
    4033    1    . .  . .
    4033    3    . .  . .
    4033    5    . .  . .
    4033    7    . .  . .
    4033 1999    2 4 41 0
    4033 2001 5479 1 41 5
    4033 2003 6061 1 41 2
    4033 2005  641 1 41 0
    4033 2007  189 1 41 0
    4039    1    . .  . .
    4039    3    . .  . .
    4039    5    . .  . .
    4039    7    . .  . .
    4039 1999    2 3 41 0
    4039 2001   96 3 41 1
    4039 2003 1392 3 41 0
    4039 2005  289 3 41 0
    4039 2007  148 3 41 0
    4041    1    . .  . .
    4041    3    . .  . .
    4041    5    . .  . .
    4041    7    . .  . .
    4041 1999 1702 2 41 0
    4041 2001  285 3 41 4
    4041 2003 1427 3 41 0
    4041 2005 1157 3 41 0
    4041 2007  196 3 41 0
    4042    1    . .  . .
    4042    3    . .  . .
    4042    5    . .  . .
    4042    7    . .  . .
    4042 1999 1702 3 41 0
    4042 2001  285 4 41 4
    4042 2003 1427 4 41 0
    4042 2005 1157 4 41 0
    4042 2007  196 4 41 0
    4173    1    . .  . .
    4173    3    . .  . .
    4173    5    . .  . .
    4173    7    . .  . .
    4173 1999    2 2 41 0
    4173 2001   96 2 41 1
    4173 2003 1392 2 41 0
    4173 2005  289 2 41 0
    4173 2007  148 2 41 0
    4180    1    . .  . .
    4180    3    . .  . .
    4180    5    . .  . .
    4180    7    . .  . .
    4180 1999 3818 3 41 4
    4180 2001 5964 3 41 1
    4180 2003 6443 2 41 6
    4180 2005  771 2 41 0
    4180 2007 1130 1 41 3
    5002    1    . .  . .
    5002    3    . .  . .
    5002    5    . .  . .
    5002    7    . .  . .
    5002 1999  376 1 41 0
    5002 2001  444 1 41 1
    5002 2003 3724 1 41 1
    5002 2005 1654 1 41 0
    5002 2007 1210 1 41 0
    5003    1    . .  . .
    end

  • #2
    Ha! Dataex worked! Here is a better example sorting by wave, 1999 family ID, person ID, sequence number:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(wave x11102) long x11101ll byte xsqnr
    1999  1 1654001 1
    1999  2    4003 1
    1999  2    4033 4
    1999  2    4039 3
    1999  2    4173 2
    1999  4  573004 2
    1999  4  573033 3
    1999  4  573035 4
    1999  4  573036 5
    1999  4  573171 1
    1999  5 1268001 1
    1999  5 1268002 2
    1999  6 2119002 1
    1999  6 2119008 2
    1999  6 2119182 3
    1999  6 2119183 4
    1999  6 2119184 5
    1999  6 2119185 6
    1999  6 2119187 7
    1999  6 2119188 8
    1999  7 6051004 2
    1999  8  673030 2
    1999  8  673033 3
    1999 10   61003 1
    1999 10   61035 3
    1999 10   61036 4
    1999 10   61172 2
    1999 11   61002 1
    1999 12  907001 1
    1999 13  419030 1
    1999 13  419036 2
    1999 13  419037 3
    1999 13  419038 4
    1999 14 2621001 2
    1999 15 2621003 1
    1999 16  534001 1
    1999 16  534002 2
    1999 17 1168003 1
    1999 17 1168034 2
    1999 17 1168035 3
    1999 17 1168176 5
    1999 18  575002 1
    1999 19 1758031 4
    1999 19 1758032 3
    1999 22  920001 1
    1999 23  931030 2
    1999 23  931032 3
    1999 23  931033 4
    1999 23  931172 1
    1999 24  931031 1
    1999 24  931173 2
    1999 26 2673002 1
    1999 31   16002 1
    1999 32   68002 1
    1999 33  586004 1
    1999 35   40001 1
    1999 35   40002 2
    1999 36   40030 2
    1999 36   40037 3
    1999 36   40038 4
    1999 36   40039 5
    1999 36   40173 1
    1999 37  452002 1
    1999 38  691006 1
    1999 38  691034 3
    1999 38  691035 4
    1999 38  691172 2
    1999 39 1221004 1
    1999 39 1221032 5
    1999 39 1221173 2
    1999 40 1221005 1
    1999 40 1221035 3
    1999 40 1221036 4
    1999 40 1221037 5
    1999 40 1221172 2
    1999 41  127003 2
    1999 41  127170 1
    1999 42  450021 1
    1999 42  450033 5
    1999 42  450172 2
    1999 43  691001 1
    1999 43  691002 2
    1999 44 1221001 1
    1999 44 1221002 2
    1999 45   57001 1
    1999 45   57002 2
    1999 46  441030 1
    1999 47 6210001 1
    1999 48 6210003 1
    1999 48 6210030 2
    1999 48 6210036 3
    1999 48 6210041 4
    1999 48 6210042 5
    1999 49  639002 1
    1999 49  639008 2
    1999 50 5065003 1
    1999 51 1994005 1
    1999 52 5066009 1
    1999 53 5052050 3
    1999 53 5052174 2
    end


    • #3
      So, to clarify my thinking, for any given year, there are multiple entries for each family-level id, and I just want one of each family-level id per year. I want to get rid of duplicates, but don’t want to unbalance the panel by willy-nilly hacking away unique person ids, which are the only IDs consistent across years.
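
      A minimal sketch of one way to get one row per family per year, assuming the data are in long format with the variable names from the dataex listings (wave, x11102, x11101ll, xsqnr). It is an illustration under those assumptions, not a tested solution for the full PSID extract.

      ```stata
      * Sketch only: assumes long-format data with wave, x11102, x11101ll, xsqnr.
      * Drop placeholder rows that carry no family interview number.
      drop if missing(x11102)

      * Within each wave-by-family cell, sort by sequence number (ties broken
      * by person ID) and keep only the first row.
      bysort wave x11102 (xsqnr x11101ll): keep if _n == 1

      * Confirm that wave and x11102 now uniquely identify observations.
      isid wave x11102
      ```

      Note that the person retained for a family can differ from wave to wave, and x11102 is reassigned every year, so the result is best read as a set of yearly family-level cross-sections rather than a panel that can be followed by person ID.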


      • #4
        Solved my own problem. I went back to a previous file, saved before I ran -psid long-, in which the data were still in wide format. From there, I could drop everyone who had a sequence number other than 1 in the first wave (1 corresponds to head of household). That left me with one individual per family, and I could track each family across waves regardless of whether the individual's status as head changed from year to year.
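
        For readers who only have the long file, a rough long-format equivalent of this idea might look like the sketch below. It assumes the variable names from the dataex listings and that 1999 is the first wave; head1999 is a made-up helper variable, and this is not the code that was actually run.

        ```stata
        * Sketch only: long-format version of "keep the 1999 sequence-number-1 person".
        * Flag every row of a person who had sequence number 1 in the 1999 wave.
        bysort x11101ll: egen byte head1999 = max(wave == 1999 & xsqnr == 1)

        * Keep all waves for those persons and discard everyone else.
        keep if head1999 == 1
        drop head1999

        * One row per person per wave, so the person ID can serve as the panel ID.
        xtset x11101ll wave
        ```

        A family with no member recorded at sequence number 1 in 1999 drops out entirely under this rule, so it is worth counting families before and after.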


        • #5
          Max, I am also working on PSID data for my research and struggling with tracking the same families over the years (my sample period is 2001-2019). I want to clarify the following two questions:

          We can generate a unique ID number for each individual by using the equation [(ER30001*1000) + ER30002], but many of these unique individuals belong to the same family, so tracking these individuals over the years does not mean tracking the same families over the years, right?

          Also, many of these individuals are not the head of their household. So, if I keep only the heads (using relationship to head) I can track only the household heads over the years, but that still does not mean that I am able to track the same households over the years, right?
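
          For reference, the ID formula quoted above can be sketched in Stata as follows, assuming ER30001 (1968 interview number) and ER30002 (person number) are in memory; pid is a made-up variable name.

          ```stata
          * Sketch only: build the person identifier from the two PSID components.
          * Storing it as long keeps the integer exact.
          generate long pid = ER30001 * 1000 + ER30002

          * In a wide file (one row per person) pid should be unique.
          isid pid
          ```

          In a long file the corresponding check would be -isid pid wave-. The identifier pins down a person, so on its own it says nothing about which family unit that person belongs to in a given wave.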


          • #6
            For those who find this topic at a later date, the discussion at

            https://www.statalist.org/forums/for...a-household-id

            addresses similar questions. (I regret not having replied to this topic when it first went up in December 2020.)
