Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Deleting household member when panelist and moving other household members into the gap

    Hi,
    I’ve been working on a large panel study. We’ve reshaped a database for the entire population by address to get up to 12 household members for each address. I then used address to match the panel members to household members.
    The household members include some data about the panel member. One way around this is to use name and age to identify the panel member. Loops to identify the number of people of a particular race or the oldest person in the household can easily be written to exclude the panelist.




    dataex id forename surname house street year panel_14_fm fore_14_fm_1 sur_14_fm_1 race_14_fm_1 start_yr_14_fm_1 fore_14_fm_2 sur_14_fm_2 race_14_fm_2 star
    > t_yr_14_fm_2

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte id str7(forename surname) byte house str11 street int year byte panel_14_fm str7 fore_14_fm_1 str9 sur_14_fm_1 byte race_14_fm_1 int start_yr_14_fm_1 str5 fore_14_fm_2 str10 sur_14_fm_2 byte race_14_fm_2 int start_yr_14_fm_2
    1 ""        "Who"      . "Tardis"      1776 2 "Tegan"   "Jokanovic" 1 1960 ""      "Who"        1 1776
    2 "Barbara" "Wright"  17 "Coroonation" 1940 1 "Barbara" "Wright"    1 1940 "Ian"   "Chesterton" 1 1940
    3 "Ben "    "Jackson" 32 "Every"       1935 1 "Ben"     "Jackson"   1 1935 "Polly" "Jackson"    . 1940
    4 "Vicky"   "Water"    . "London"      1850 2 "Jamie"   "McCrimmon" 1 1730 "Vicky" "Water"      1 1850
    end
    However, I’ve been told that the data will be cleaner if we take out the panel member from the household data!


    Code:
     forvalues j = 1/2 {
    replace fore_14_fm_`j'="" if panel_14_fm==`j'
    replace sur_14_fm_`j'="" if panel_14_fm==`j'
    replace race_14_fm_`j'=. if panel_14_fm==`j'
    replace start_yr_14_fm_`j'=. if panel_14_fm==`j' 
    *…
    }
    This leaves gaps for household members where the panelist was. I’ve been looking at ways of moving the remaining household members to the left to remove the gaps. Reshape might be an option, but by this stage we have data for five time periods. Another complication in moving the data across is that although every household member has a surname, title and gender, etc, are optional fields.


    dataex id forename surname house street year panel_14_fm fore_14_fm_1 sur_14_fm_1 race_14_fm_1 start_yr_14_fm_1 fore_14_fm_2 sur_14_fm_2 race_14_fm_2 star
    > t_yr_14_fm_2

    copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte id str7(forename surname) byte house str11 street int year byte panel_14_fm str7 fore_14_fm_1 str9 sur_14_fm_1 byte race_14_fm_1 int start_yr_14_fm_1    str5    fore_14_fm_2    str10    sur_14_fm_2    byte    race_14_fm_2    int    start_yr_14_fm_2
    1 ""        "Who"      . "Tardis"      1776 2 "Tegan" "Jokanovic" 1 1960 ""      ""           .    .
    2 "Barbara" "Wright"  17 "Coroonation" 1940 1 ""      ""          .    . "Ian"   "Chesterton" 1 1940
    3 "Ben "    "Jackson" 32 "Every"       1935 1 ""      ""          .    . "Polly" "Jackson"    . 1940
    4 "Vicky"   "Water"    . "London"      1850 2 "Jamie" "McCrimmon" 1 1730 ""      ""           .    .
    end
    I've tried the below code, but it isn't working. I wondered if, short of reshaping the data, there are solutions for ensuring there are no gaps. Alternatively we could live with the gaps or simply write programs that work around panel members also being household members. There are up to 12 housemates (e.g. religious communities) so I'd like to try a loop or automated command.
    Code:
    forvalues j = 1/2 {
    replace fore_14_fm_`j'=fore_14_fm_`j+1' if sur_14_fm_`j'==""
    }
    Suggestions are welcome,
    ​​​​​​​Matthew Gibbons

  • #2
    I think I've come up with a solution.

    Code:
    gen hh_size_14_N=.
    replace hh_size_14_N=0 if id!=.
    forvalues j = 1/2 {
    replace hh_size_14_N=hh_size_14_N+1 if sur_14_fm_`j'!=""
    }
    tab hh_size_14_N, missing
    
    label var hh_size_14_N "Hh size 2014, incl panelist"
    
    list *
    
    *create variables for the final hh member
    gen fore_14_fm_f="" 
    gen sur_14_fm_f="" 
    gen race_14_fm_f=.  
    gen start_yr_14_fm_f=. 
     
    forvalues j = 1/2 {
    replace fore_14_fm_f=fore_14_fm_`j' if `j'==hh_size_14_N
    replace sur_14_fm_f=sur_14_fm_`j' if `j'==hh_size_14_N
    replace race_14_fm_f=race_14_fm_`j' if `j'==hh_size_14_N
    replace start_yr_14_fm_f=start_yr_14_fm_`j' if `j'==hh_size_14_N
    }
    
    list *
    
    *now replace values for panel member with those for the final household member
    forvalues j = 1/2 {
    replace fore_14_fm_`j'=fore_14_fm_f if panel_14_fm==`j'
    replace sur_14_fm_`j'=sur_14_fm_f if panel_14_fm==`j'
    replace race_14_fm_`j'=race_14_fm_f if panel_14_fm==`j'
    replace start_yr_14_fm_`j'=start_yr_14_fm_f if panel_14_fm==`j'
    }
    
    list *
    
    tab panel_14_fm
    tab hh_size_14_N
    
    *now want to take out the last value, which is either a duplicate or the panelist
    forvalues j = 1/2 {    
    replace fore_14_fm_`j'="" if hh_size_14_N==`j'
    replace sur_14_fm_`j'="" if hh_size_14_N==`j'
    replace race_14_fm_`j'=. if hh_size_14_N==`j'
    replace start_yr_14_fm_`j'=. if hh_size_14_N==`j'
    }
    
    list *
    *seems to be working
    
    drop fore_14_fm_f sur_14_fm_f race_14_fm_f start_yr_14_fm_f
    *now drop the working variables
    
    list *
    drop hh_size_14_N //as this is the old hh size including the panelist
    
    list *
    dataex id forename surname house street year panel_14_fm fore_14_fm_1 sur_14_fm_1 race_14_fm_1 start_yr_14_fm_1 fore_14_fm_2 sur_14_fm_2 race_14_fm_2 star
    > t_yr_14_fm_2

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte id str7(forename surname) byte house str11 street int year byte panel_14_fm str7 fore_14_fm_1 str10 sur_14_fm_1 byte race_14_fm_1 int start_yr_14_fm_1 str5 fore_14_fm_2 str10 sur_14_fm_2 byte race_14_fm_2 int start_yr_14_fm_2
    1 ""        "Who"      . "Tardis"      1776 2 "Tegan" "Jokanovic"  1 1960 "" "" . .
    2 "Barbara" "Wright"  17 "Coroonation" 1940 1 "Ian"   "Chesterton" 1 1940 "" "" . .
    3 "Ben "    "Jackson" 32 "Every"       1935 1 "Polly" "Jackson"    . 1940 "" "" . .
    4 "Vicky"   "Water"    . "London"      1850 2 "Jamie" "McCrimmon"  1 1730 "" "" . .
    end
    This solution also seems to work for much larger datasets with more than two household members. I still wonder if it was necessary though.

    Comment

    Working...
    X