
  • How would you create a unique ID for the people who live in the same household but at different times?

    The "dataset" is below in the table. Every year you see a different set of persons (either 2 or 3) living at the same physical address. You want to create a unique identifier (ID2 below) that says they are all related--maybe they are all biologically related children. However, right now you can only create ID1 (e.g., using egen group). How would you create ID2?

    Here is an example using the names in the table below. In 2000, Mary and Todd live together; let's say they have the same parents. We never see Mary live together with Rick, but we see Todd and Rick live together in 2010. Because we see Todd live with Mary in 2000, we want to conclude that Mary is also related to Rick, even though we never see Mary live with Rick. The same kind of reasoning applies for Carmen. We never see Carmen live with Rick, but we know Carmen lived with Todd, and Todd lived with Rick. Therefore, Carmen and Rick are also relatives. Ultimately we want to create ID2.

    year  person 1  person 2  person 3  ID1  ID2
    2000  Mary      Todd      Carmen    1    1
    2005  Todd      Carmen              2    1
    2010  Rick      Todd                3    1



  • #2
    First, I'd remark that you'd have a better chance of getting a helpful answer if you posted a data example using -dataex-, as described in the Statalist FAQ for new members. In your case, I'd say that a more extensive listing is needed, and that it should include examples showing the variations that are relevant in your data.
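
    As a rough illustration of that request, and assuming the variables are named year, person1, person2, and person3 (hypothetical names; the original post does not give them), a data example could be produced along these lines. Recent Stata releases include dataex; on older installations it may need to be installed from SSC first.

    ```stata
    * Only needed if dataex is not already installed:
    * ssc install dataex

    * Variable names are assumptions; substitute the names in your own data.
    * With no options, dataex lists the first 100 observations.
    dataex year person1 person2 person3
    ```

    The output can be pasted directly into a forum post, so that others can recreate the data exactly.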

    Second, I'm not clear on what you want; I'll assume that you want an ID that identifies people who are related, even though your subject line requests something different. I would say that you need to describe in detail the rules you want to use in classifying people as "related." Perhaps you just mean "Assign the same ID value to all people who have ever lived in the same household," but that won't identify all related persons, so I'm thinking you have something different in mind, something more along the lines of what might be detected in a relationship chain in a social network ("A is linked to B, who is linked to C, and therefore A has a network link to C"). Whatever you have in mind, I'm not able to discern it. Perhaps someone else can, or you could try giving a more detailed and rigorous exposition of the rules that define "related" for you in this application.



    • #3
      Presumably your data includes some sort of address ID so that you can identify the same physical address in different years. Why does that ID not suit your needs for ID2?



      • #4
        Alternatively, perhaps the community-contributed group_twoway command will do what you need.
        Code:
        search group_twoway
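
        If installing a community-contributed command is not an option, the chain rule discussed in this thread (anyone linked through a shared household-year gets the same ID) can be approximated by hand. What follows is a minimal sketch of label propagation, not group_twoway's own implementation. It assumes wide variables named person1 to person3 and id1 (hypothetical names), and that a name uniquely identifies a person across years.

        ```stata
        clear
        * Toy data from the question; variable names are assumptions
        input int year str10 person1 str10 person2 str10 person3 byte id1
        2000 "Mary" "Todd"   "Carmen" 1
        2005 "Todd" "Carmen" ""       2
        2010 "Rick" "Todd"   ""       3
        end

        * One row per person-year; drop unused person slots
        reshape long person, i(year) j(slot)
        drop if person == ""

        * Start from the household-year ID, then repeatedly pass the smallest
        * label along shared persons and shared household-years
        gen long id2 = id1
        local changed = 1
        while `changed' {
            bysort person: egen long by_person = min(id2)
            bysort id1: egen long by_house = min(by_person)
            count if by_house != id2
            local changed = r(N)
            replace id2 = by_house
            drop by_person by_house
        }

        sort year slot
        list year person id1 id2, sepby(year)
        ```

        With the toy data every row ends up with id2 equal to 1, matching the ID2 column in the question. The loop stops once a full pass changes nothing. The resulting labels are the smallest id1 in each linked group, so they need not be consecutive; egen's group() function can renumber them if that matters.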
