  • Preparing household data

    Dear Stata list

    My data has the following format:

    Code:
    clear
    input householdID    personID    personinHHID    personsfatherID
    1001    5001    1    .
    1001    5002    2    .
    1002    5003    1    .
    1002    5004    2    .
    1002    5005    3    1
    1003    5006    1    .
    1003    5007    2    .
    end
    
    list, sepby(householdID) abbrev(20)
    
         +---------------------------------------------------------+
         | householdID   personID   personinHHID   personsfatherID |
         |---------------------------------------------------------|
      1. |        1001       5001              1                 . |
      2. |        1001       5002              2                 . |
         |---------------------------------------------------------|
      3. |        1002       5003              1                 . |
      4. |        1002       5004              2                 . |
      5. |        1002       5005              3                 1 |
         |---------------------------------------------------------|
      6. |        1003       5006              1                 . |
      7. |        1003       5007              2                 . |
         +---------------------------------------------------------+
    That is, the data set consists of households (householdID) with people in them (personID). People within a household are numbered consecutively (personinHHID), and a variable (personsfatherID) gives the within-household ID of their father (if known). How can I create a data set that gives me each person's father's personID, i.e. make the data set look like this:


    Code:
         +-----------------------------------+
         | householdID   personID   fatherID |
         |-----------------------------------|
      1. |        1001       5001          . |
      2. |        1001       5002          . |
         |-----------------------------------|
      3. |        1002       5003          . |
      4. |        1002       5004          . |
      5. |        1002       5005       5003 |
         |-----------------------------------|
      6. |        1003       5006          . |
      7. |        1003       5007          . |
         +-----------------------------------+
    Thanks for your consideration
    KS

  • #2
    This should do it:

    Code:
    preserve
        generate puid = householdID*100+personinHHID
        rename personID fatherID
        keep puid fatherID
        sort puid
        isid puid // keep this check!
        tempfile tmp
        save `"`tmp'"'
    restore
    
    generate puid = householdID*100+personsfatherID
    merge m:1 puid using `"`tmp'"', keep(match master)
    drop _merge puid
    keep householdID personID fatherID
    sort householdID personID
    This relies on some reasonable assumptions (fewer than 100 people per household, non-negative IDs, and others) that should be re-checked in your context.

    Best, Sergiy Radyakin

    Code:
    . list , sepby(hou)
    
         +--------------------------------+
         | househ~D   personID   fatherID |
         |--------------------------------|
      1. |     1001       5001          . |
      2. |     1001       5002          . |
         |--------------------------------|
      3. |     1002       5003          . |
      4. |     1002       5004          . |
      5. |     1002       5005       5003 |
         |--------------------------------|
      6. |     1003       5006          . |
      7. |     1003       5007          . |
         +--------------------------------+
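
    The assumptions behind the composite key can be checked before building it. The following is only a minimal sketch, assuming the variable names from post #1 and the multiplier of 100 used above; the particular set of checks is an illustration, not part of the original answer.

    ```stata
    * Run on the original data, before generating puid.
    * The multiplier 100 only leaves room for within-household numbers 1..99.
    assert personinHHID >= 1 & personinHHID < 100

    * Household IDs must be present and positive for the key to be unambiguous.
    assert householdID > 0 & !missing(householdID)

    * A father reference is either unknown (missing) or a valid within-household number.
    assert missing(personsfatherID) | inrange(personsfatherID, 1, 99)

    * Each person must be uniquely identified within the household.
    isid householdID personinHHID
    ```

    If any `assert` fails, Stata stops with an error and reports how many observations contradict the condition, which points to the rows that need attention.
    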



    • #3
      Thank you very much, that was exactly what I was looking for. As a note to myself:

      Code:
      generate double puid = ...
      helps if the IDs are longer than in my example.
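
      The reason is that `generate` stores new variables as float by default, and a float holds integers exactly only up to 16,777,216 (2^24). A small sketch of the effect, using an arbitrary number just above that limit:

      ```stata
      * float cannot represent 16777217 exactly, so it is rounded
      display %12.0f float(16777217)    // shows 16777216

      * the same number held in double precision is exact
      display %12.0f 16777217           // shows 16777217
      ```

      With long IDs, two different people could therefore end up with the same float key; storing the key as double avoids this for integers up to about 9.0e15.
      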



      • #4
        Sergiy's solution is probably more robust than mine, but I wanted to see whether there was an alternative. I added a few more observations and a p_motherID variable.

        Code:
        * Data shared via -dataex-. To install: ssc install dataex
        clear
        input float(householdID personID personinHHID personsfatherID p_motherID)
        1001 5001 1 . .
        1001 5002 2 . .
        1002 5003 1 . .
        1002 5004 2 . .
        1002 5005 3 1 2
        1003 5006 1 . .
        1003 5007 2 . .
        1003 5008 3 1 2
        1004 5009 1 . .
        1004 5010 2 . .
        1004 5011 3 2 1
        end
        
        
        . list householdID personID personinHHID personsfatherID p_motherID, noobs sepby( householdID) abbrev(14)
        
          +---------------------------------------------------------------------+
          | householdID   personID   personinHHID   personsfathe~D   p_motherID |
          |---------------------------------------------------------------------|
          |        1001       5001              1                .            . |
          |        1001       5002              2                .            . |
          |---------------------------------------------------------------------|
          |        1002       5003              1                .            . |
          |        1002       5004              2                .            . |
          |        1002       5005              3                1            2 |
          |---------------------------------------------------------------------|
          |        1003       5006              1                .            . |
          |        1003       5007              2                .            . |
          |        1003       5008              3                1            2 |
          |---------------------------------------------------------------------|
          |        1004       5009              1                .            . |
          |        1004       5010              2                .            . |
          |        1004       5011              3                2            1 |
          +---------------------------------------------------------------------+
        
        
        sort householdID personinHHID
        bysort householdID ( personinHHID): gen fath_id = personID[1] if personinHHID[1]==1 & personsfatherID==1 // this only works if 1st person is father
        bysort householdID ( personinHHID): gen fath_id2 = personID[personsfatherID] if personinHHID[personsfatherID]== personsfatherID
        bysort householdID ( personinHHID): gen fath_id3 = personID[personsfatherID]  // I wondered if I could simplify the above expression
        
        * Doing the same for mother_id
        sort householdID personinHHID
        bysort householdID (personinHHID): gen mother_id  = personID[ p_motherID] if personinHHID[ p_motherID ]== p_motherID
        bysort householdID (personinHHID): gen mother_id2 = personID[ p_motherID]
        
        . list, sepby( householdID) abbrev(18) noobs
        
          +----------------------------------------------------------------------------------------------------------------------+
          | householdID   personID   p_inHHID   p_fatherID   p_motherID   fath_id   fath_id2   fath_id3   mother_id   mother_id2 |
          |----------------------------------------------------------------------------------------------------------------------|
          |        1001       5001          1            .            .         .          .          .           .            . |
          |        1001       5002          2            .            .         .          .          .           .            . |
          |----------------------------------------------------------------------------------------------------------------------|
          |        1002       5003          1            .            .         .          .          .           .            . |
          |        1002       5004          2            .            .         .          .          .           .            . |
          |        1002       5005          3            1            2      5003       5003       5003        5004         5004 |
          |----------------------------------------------------------------------------------------------------------------------|
          |        1003       5006          1            .            .         .          .          .           .            . |
          |        1003       5007          2            .            .         .          .          .           .            . |
          |        1003       5008          3            1            2      5006       5006       5006        5007         5007 |
          |----------------------------------------------------------------------------------------------------------------------|
          |        1004       5009          1            .            .         .          .          .           .            . |
          |        1004       5010          2            .            .         .          .          .           .            . |
          |        1004       5011          3            2            1         .       5010       5010        5009         5009 |
          +----------------------------------------------------------------------------------------------------------------------+
        I think this solution will break down, though, if there are gaps in the numbering of personinHHID.
        Last edited by David Benson; 31 Jan 2019, 12:26.
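
        The caveat about gaps can be made concrete. Explicit subscripting such as `personID[personsfatherID]` picks a row by position within the household, so it is only safe when row position and personinHHID agree. The sketch below, which assumes the variable names used in this thread, first counts rows where they disagree and then shows a lookup by `merge` that does not depend on row position. The names `gap`, `fathers`, and `fatherID` are introduced here for illustration only.

        ```stata
        * Flag rows whose position within the household differs from personinHHID
        bysort householdID (personinHHID): generate byte gap = personinHHID != _n
        count if gap    // 0 means subscripting by personsfatherID is safe

        * Position-independent lookup: match each person's father reference
        * against (householdID, personinHHID) of the same data
        preserve
            keep householdID personinHHID personID
            rename personinHHID personsfatherID
            rename personID fatherID
            isid householdID personsfatherID    // one row per person per household
            tempfile fathers
            save `"`fathers'"'
        restore

        merge m:1 householdID personsfatherID using `"`fathers'"', keep(match master) nogenerate
        ```

        Observations with a missing personsfatherID find no match and keep a missing fatherID. The same steps with p_motherID in place of personsfatherID would give the mother's personID.
        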
