  • Within a group, create an indicator variable for pairs of respondents

    I have this data:
    Code:
    clear
    input int year str2 country str10 pid str8 hhid str10 rb240
    2010 "AT" "265850002" "2658500" "265850001"
    2010 "AT" "265850001" "2658500" "265850002"
    2010 "AT" "265870001" "2658700" "."        
    2010 "AT" "265900003" "2659000" "."        
    2010 "AT" "265900004" "2659000" "."        
    2010 "AT" "265900002" "2659000" "265900001"
    2010 "AT" "265900001" "2659000" "265900002"
    2010 "AT" "265910002" "2659100" "265910001"
    2010 "AT" "265910001" "2659100" "265910002"
    2010 "AT" "265940003" "2659400" "."        
    2010 "AT" "265940005" "2659400" "."        
    2010 "AT" "265940004" "2659400" "."        
    2010 "AT" "265940002" "2659400" "265940001"
    2010 "AT" "265940001" "2659400" "265940002"
    2010 "AT" "265950001" "2659500" "."        
    2010 "AT" "265970002" "2659700" "265970001"
    2010 "AT" "265970001" "2659700" "265970002"
    2010 "AT" "265980002" "2659800" "265980001"
    2010 "AT" "265980001" "2659800" "265980002"
    2010 "AT" "266050003" "2660500" "."        
    2012 "IT" "265340001" "2653400" "265340002"
    2012 "IT" "265340002" "2653400" "265340001"
    2012 "IT" "265340003" "2653400" "265340004"
    2012 "IT" "265980004" "2653400" "265340003"
    2012 "IT" "266050005" "2653400" "."        
    end
    The columns are year, country, pid (personal ID), hhid (household ID), and rb240 (partner ID). I want to create an indicator that identifies partners within a household. For example, row 1 shows that in household 2658500 (hhid), person 265850002 (pid) is in a relationship with person 265850001 (rb240). Equivalently, row 2 shows that person 265850001 (pid) is in a relationship with person 265850002 (rb240). Within that household, then, both members of the couple should get a 1. Starting at row 21 there is a household with two couples in the same residence. The indicator should be 1 for one of the couples and 2 for the other.

    The final result should look like this (the code should be able to identify N couples within a household, not just one or two):

    Code:
    clear
    input int year str2 country str10 pid str8 hhid str10 rb240 str2 n
    2010 "AT" "265850002" "2658500" "265850001" "1"
    2010 "AT" "265850001" "2658500" "265850002" "1"
    2010 "AT" "265870001" "2658700" "."         "."
    2010 "AT" "265900003" "2659000" "."         "."
    2010 "AT" "265900004" "2659000" "."         "."
    2010 "AT" "265900002" "2659000" "265900001" "1"
    2010 "AT" "265900001" "2659000" "265900002" "1"
    2010 "AT" "265910002" "2659100" "265910001" "1"
    2010 "AT" "265910001" "2659100" "265910002" "1"
    2010 "AT" "265940003" "2659400" "."         "."
    2010 "AT" "265940005" "2659400" "."         "."
    2010 "AT" "265940004" "2659400" "."         "."
    2010 "AT" "265940002" "2659400" "265940001" "1"
    2010 "AT" "265940001" "2659400" "265940002" "1"
    2010 "AT" "265950001" "2659500" "."         "."
    2010 "AT" "265970002" "2659700" "265970001" "1"
    2010 "AT" "265970001" "2659700" "265970002" "1"
    2010 "AT" "265980002" "2659800" "265980001" "1"
    2010 "AT" "265980001" "2659800" "265980002" "1"
    2010 "AT" "266050003" "2660500" "."         "."
    2012 "IT" "265340001" "2653400" "265340002" "1"
    2012 "IT" "265340002" "2653400" "265340001" "1"
    2012 "IT" "265340003" "2653400" "265340004" "2"
    2012 "IT" "265980004" "2653400" "265340003" "2"
    2012 "IT" "266050005" "2653400" "."         "."
    end
    Any help is appreciated!

  • #2
    Often discussed here. Searching the forum for mentions of dm0043 will find about 20 hits. Naturally, you couldn't know that incantation in advance, but the question has arisen twice in the last month alone. "split identity" is a tag for this problem.

    (Incidentally, I'd wonder about giving the same identifier to different couples who just happen to be in the same household. Couple #1 in household A and couple #1 in B won't be guaranteed any similarity otherwise. You're committing yourself to more complicated code thereafter if you do that.)
    Last edited by Nick Cox; 07 Nov 2016, 09:26.
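
A common way to handle this kind of "split identity" problem is to store the two IDs of each pair in a fixed order, so that both partners carry the same pair of values, and then number the distinct pairs. The following is only a minimal sketch under stated assumptions, not necessarily the method in the threads mentioned above: the rows are a small illustrative subset with consistent IDs, the variable names `lo`, `hi` and `couple_id` are made up for the example, and an rb240 of "." is taken to mean "no partner". It produces one identifier per couple that is distinct across the whole dataset, in line with the parenthetical remark above.

```stata
clear
input str10 pid str8 hhid str10 rb240
"265850002" "2658500" "265850001"
"265850001" "2658500" "265850002"
"265870001" "2658700" "."
"265340001" "2653400" "265340002"
"265340002" "2653400" "265340001"
"265340003" "2653400" "265340004"
"265340004" "2653400" "265340003"
"265340005" "2653400" "."
end

* Store the two IDs in a fixed order so both partners carry the same (lo, hi)
* Assumption: rb240 == "." means the person has no partner in the household
generate str10 lo = cond(pid < rb240, pid, rb240) if rb240 != "."
generate str10 hi = cond(pid < rb240, rb240, pid) if rb240 != "."

* One number per distinct couple; rows with empty lo/hi are left missing
egen couple_id = group(lo hi)

sort couple_id pid
list pid hhid rb240 couple_id, sepby(couple_id)
```

Because `egen, group()` ignores observations with missing values unless the `missing` option is given, people without a partner end up with a missing `couple_id`.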



    • #3
      I'm not sure what you're trying to do, so let me explain my confusion.

      Why, in the last household in your example (hhid==2653400), would pid 001 be matched with pid 002, and pid 003 with 004, while pid 005 is left alone?
      Do you have any information on the construction of your dataset that indicates that 001's partner is 002 rather than 003, 004 or 005?

      Why, then, are 003 and 004 in the third household (hhid==2659000) not matched together, while they are in the last household?

      It is hard to set up a general rule when we don't understand the logic behind it.


      Also, unless you really need to keep hhid and pid this long (or you have that many of them), consider recoding the IDs; they are too long to read without getting confused.

      In addition, since the last digits of the person ID seem to matter for the match (001 with 002, and so on), I would advise you to create a new ID variable (within the household) that keeps only the last two digits (I assume 99 household members is sufficient).

      Since your IDs seem to be string variables, you could try:
      Code:
      gen pid_hh=substr(pid,-2,.)
      Best,
      Charlie
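
Before relying on a shortened ID it may be worth checking that it behaves as expected. The sketch below assumes that pid is the hhid followed by a two-digit person number, which is an inference from the posted rows rather than something stated in the thread; `prefix_ok` is a hypothetical helper variable. It uses the last household from #1, where two of the posted pid values do not start with the household's hhid and so would be listed by the check.

```stata
clear
input str10 pid str8 hhid
"265340003" "2653400"
"265980004" "2653400"
"266050005" "2653400"
end

* Short person number: last two characters of pid
generate str2 pid_hh = substr(pid, -2, .)

* Assumption: pid should start with the household's hhid
generate byte prefix_ok = substr(pid, 1, strlen(hhid)) == hhid
list pid hhid pid_hh if !prefix_ok

* The short ID is only usable if it is unique within each household;
* isid stops with an error if it is not
isid hhid pid_hh
```

If rows like these are flagged in the real data, matching on the full pid and rb240 may miss couples that matching on the short within-household number would still find.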



      • #4
        Charlie:
        Variable pid is the unique identifier of each person within the household, and rb240 holds the pid of that person's partner. If it's empty, that person is not partnered with anyone in the household. So, for example, in hhid==2653400 person 001 is partnered with person 002 and person 003 is partnered with person 004 (two couples in the household). Person 005 is not partnered with anyone.

        In hhid==2659000, persons 003 and 004 are not partners, but persons 001 and 002 are. Persons 003 and 004 are children (irrelevant to the solution, but something I can tell from other variables in my dataset).

        Try this data, which has the pid in a shorter format:
        Code:
        clear
        input int year str2(country pid_hh) str8 hhid str10 rb240
        2010 "AT" "02" "2658500" "265850001"
        2010 "AT" "01" "2658500" "265850002"
        2010 "AT" "01" "2658700" "."        
        2010 "AT" "03" "2659000" "."        
        2010 "AT" "04" "2659000" "."        
        2010 "AT" "02" "2659000" "265900001"
        2010 "AT" "01" "2659000" "265900002"
        2010 "AT" "02" "2659100" "265910001"
        2010 "AT" "01" "2659100" "265910002"
        2010 "AT" "03" "2659400" "."        
        2010 "AT" "05" "2659400" "."        
        2010 "AT" "04" "2659400" "."        
        2010 "AT" "02" "2659400" "265940001"
        2010 "AT" "01" "2659400" "265940002"
        2010 "AT" "01" "2659500" "."        
        2010 "AT" "02" "2659700" "265970001"
        2010 "AT" "01" "2659700" "265970002"
        2010 "AT" "02" "2659800" "265980001"
        2010 "AT" "01" "2659800" "265980002"
        2010 "AT" "03" "2660500" "."        
        2012 "IT" "01" "2653400" "265340002"
        2012 "IT" "02" "2653400" "265340001"
        2012 "IT" "03" "2653400" "265340004"
        2012 "IT" "04" "2653400" "265340003"
        2012 "IT" "05" "2653400" "."        
        end
        Nick:
        Thanks, I couldn't find anything in my previous search but I'll search the tags. As for the more complicated code, you're right, but I'm only gonna use this in one more step.
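
One possible way to get the within-household numbering asked for in #1 from the shorter-format data is sketched below. It is a hedged illustration, not a confirmed solution from the thread. The assumptions are that the last two characters of rb240 are the partner's within-household number, that "." means no partner, and that a household is identified by year, country and hhid together. The names `partner_hh`, `lo`, `hi`, `newpair` and `n` are made up for the example, and only a subset of the posted rows is used.

```stata
clear
input int year str2(country pid_hh) str8 hhid str10 rb240
2010 "AT" "02" "2658500" "265850001"
2010 "AT" "01" "2658500" "265850002"
2010 "AT" "03" "2659000" "."
2010 "AT" "04" "2659000" "."
2010 "AT" "02" "2659000" "265900001"
2010 "AT" "01" "2659000" "265900002"
2012 "IT" "01" "2653400" "265340002"
2012 "IT" "02" "2653400" "265340001"
2012 "IT" "03" "2653400" "265340004"
2012 "IT" "04" "2653400" "265340003"
2012 "IT" "05" "2653400" "."
end

* Partner's within-household number (assumed: last two characters of rb240)
generate str2 partner_hh = substr(rb240, -2, .) if rb240 != "."

* Store each pair in a fixed order so both partners share the same (lo, hi)
generate str2 lo = cond(pid_hh < partner_hh, pid_hh, partner_hh) if rb240 != "."
generate str2 hi = cond(pid_hh < partner_hh, partner_hh, pid_hh) if rb240 != "."

* Flag the first row of each distinct couple, then count couples per household
bysort year country hhid lo hi: generate byte newpair = (_n == 1) & (lo != "")
by year country hhid: generate n = sum(newpair) if lo != ""

list year country hhid pid_hh rb240 n, sepby(hhid)
```

Here `n` is numeric and missing for people without a partner, whereas the desired output in #1 stores it as a string. The running sum is not limited to one or two couples, so a household with more couples would simply get higher numbers.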
