Individual data to household data

Alberto Nunez

Join Date: Jan 2016

Posts: 27
#1

Individual data to household data

30 Jan 2017, 15:41

Hi, I have a dataset of individual-level data. There is a household identifier, so I believe it can be converted into household-level data, but I am a bit lost.
A quick look, sorting by hhld_id, shows that household members may have different occupations.
I have generated a dummy variable for the occupation group I am interested in (lawyers, occupation code 25):

generate byte occupation_d = (occupation == 25) if !missing(occupation)

However, if I keep only the observations where occupation_d == 1, I lose valuable household data (for instance, the household members who are not lawyers).

I would like to keep ALL household data if either the head of household or the partner is a lawyer, while the other member could be employed (in the same or a different occupation) or inactive. In other words, the two members may or may not work in the same occupation.

The variables I have include hhld_id, sex, age (of the respondent) and employment_status (1 = employed, 2 = unemployed, 3 = inactive), among others. I am not sure whether the question is clear or whether I have explained it properly.
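If only the head of household or the partner should count as the lawyer (rather than any household member), a minimal sketch follows. It assumes a hypothetical variable `relation` coding each person's relationship to the head (1 = head, 2 = partner, 3 = other); the thread does not say whether such a variable exists, so adjust the name and codes to your data. The `clear`/`input` lines only build toy data and should be skipped with a real dataset.

```stata
* Toy data for illustration only; relation is HYPOTHETICAL (1 = head, 2 = partner, 3 = other)
clear
input hhld_id relation occupation employment_status
1 1 25 1
1 2 14 1
1 3  . 3
2 1 14 1
2 3 25 1
3 1  . 3
3 2 25 1
end

* 1 if this person is the head or partner AND a lawyer (occupation code 25), else 0
generate byte headpartner_lawyer = inlist(relation, 1, 2) & occupation == 25

* Spread the flag to every member of the household
bysort hhld_id: egen hp_lawyer = max(headpartner_lawyer)

* Keep ALL members of households where the head or partner is a lawyer
keep if hp_lawyer == 1
```

Household 2 is dropped here because its only lawyer is neither the head nor the partner; households 1 and 3 are kept with all their members.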

I would be very grateful if anyone has an idea of how to deal with this.

Thanks
Tags: None
Oded Mcdossi

Join Date: Jun 2014

Posts: 577
#2

30 Jan 2017, 17:22

To find households with at least one lawyer, use:

Code:

bys hhld_id: egen lawyers=total(occupation==25)

Then you can drop households with no lawyers:

Code:

drop if lawyers==0
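To illustrate that this keeps the non-lawyer members of lawyer households, here is the same two-step approach run on a small toy dataset (the values are made up for illustration). Note that `occupation == 25` evaluates to 0 when occupation is missing, so such members simply do not add to the count.

```stata
* Toy data for illustration only (replace with your own dataset)
clear
input hhld_id occupation
1 25
1 14
2 14
2  .
3 25
3 25
end

* Count lawyers (occupation code 25) in each household
bysort hhld_id: egen lawyers = total(occupation == 25)

* Drop households with no lawyers; every member of the remaining households is kept
drop if lawyers == 0
```

Household 2 is removed entirely, while the non-lawyer member of household 1 stays in the data.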

Last edited by Oded Mcdossi; 30 Jan 2017, 17:24.
Comment
Alberto Nunez

Join Date: Jan 2016

Posts: 27
#3

30 Jan 2017, 18:23

Thanks Oded, that worked great.

May I ask about another issue? Because I am pooling different cross-sections, the same hhld_id can appear at different points in time (say in 2001q1 and 2007q2 or 2013q4). However, the people living in the household may be different because the former occupants left. I have a time variable and a wave variable indicating when people answered the survey. The wave variable takes values 1 to 5, each corresponding to a quarter, and people should be followed for a maximum of 5 consecutive quarters.
How could I group the household information so that it reflects the same family living in the household? For example, how could I tell whether respondents with the same hhld_id (say hhld_id = 543) belong to the same family? (I think controlling for time and wave may help.) This came up while I was trying to work out the ages of head_hhld and partner_hhld: because hhld_id can repeat over time, I get cases where the ages differ considerably (for example, head_hhld aged 35 and partner_hhld aged 65, which is quite a large difference).

Would something like this work?

bys hhld_id time wave: egen lawyers=total(occupation==25)
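Grouping by hhld_id, time and wave counts lawyers separately in each quarter, so one family would be split across its interviews. A minimal sketch of an alternative, under two assumptions not confirmed in the thread: `time` is a Stata quarterly date (%tq), and a household interviewed in wave w entered the survey w - 1 quarters earlier (consecutive quarters, as described in the post). The entry quarter then identifies a household spell, so the same hhld_id with a different entry quarter is treated as a different family. `relation` (1 = head, 2 = partner) and the variable names `entry`, `hh_spell`, `any_lawyer`, `head_age` and `partner_age` are hypothetical.

```stata
* Toy data for illustration only; relation is HYPOTHETICAL (1 = head, 2 = partner)
clear
input hhld_id year quarter wave relation age occupation
543 2001 1 1 1 35 25
543 2001 1 1 2 33 14
543 2001 2 2 1 35 25
543 2001 2 2 2 34 14
543 2007 2 1 1 65 14
543 2007 2 1 2 62  .
end

* Assumed: time is a quarterly date (%tq); skip these two lines if it already exists
generate time = yq(year, quarter)
format time %tq

* Quarter in which the spell entered the survey (assumes consecutive quarters)
generate entry = time - (wave - 1)
format entry %tq

* One identifier per household spell: same hhld_id AND same entry quarter
egen hh_spell = group(hhld_id entry)

* Lawyers in each quarter of the spell (each quarter's roster counted separately)
bysort hh_spell time: egen lawyers = total(occupation == 25)

* 1 if the spell has a lawyer in any quarter, else 0
bysort hh_spell: egen any_lawyer = max(lawyers > 0)

* Age of head and partner in each quarter of the spell
bysort hh_spell time: egen head_age    = max(cond(relation == 1, age, .))
bysort hh_spell time: egen partner_age = max(cond(relation == 2, age, .))
```

Counting lawyers per quarter and then taking the maximum avoids counting the same person once for every wave in which they were interviewed. In the toy data, the 2001 and 2007 occupants of hhld_id 543 end up in different spells, so the 35-year-old head is no longer paired with the 65-year-old from the later family.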
Comment
Oded Mcdossi

Join Date: Jun 2014

Posts: 577
#4

30 Jan 2017, 23:45

Alberto, I think I understand your question, but you'll get better answers if you follow the FAQ and post some data. Help the forum members help you, so they can suggest solutions based on the actual structure of your data.
Comment
Alberto Nunez

Join Date: Jan 2016

Posts: 27
#5

31 Jan 2017, 03:35

Thanks Oded. I will check it and follow your advice. My database is huge, but I will try to post only a very small sample.
Comment

Announcement

Individual data to household data

Comment

Comment

Comment

Comment