Create dummy variables for household groups based on characteristics of individuals in the household

Sumayyah Tasnim

Join Date: May 2017

Posts: 13
#1


28 May 2017, 06:45

Hi,

I need to create various dummy variables that would put households into groups based on characteristics of individuals in the household.

One of the groups consists of households that have at least one eligible son and at least one eligible daughter. (I have already created person-level dummy variables that equal 1 if the individual is eligible.) I've tried using egen with the max() function but have had no luck; for instance, I tried by hhid: egen hld_typ4 = max(eligible_son & eligible_daughter). Have I simply got the syntax wrong, or is there another way to do this?

I've used the following FAQ as guidance (http://www.stata.com/support/faqs/da...ble-recording/), but it's not clear to me how to use max() on two variables.

Any help would be much appreciated!

Thanks
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

28 May 2017, 07:00

Welcome to the Stata Forum / Statalist.

Unfortunately, you didn't share any data, which makes it harder to give a helpful reply.

That said, if I understood your query correctly, you could use egen with group() instead of max().

This way, you'd get the combinations you mentioned in the first line.

You cannot combine by with group(), but I gather that is not necessary in your case.

Hopefully that helps.

Best regards,

Marcos

Sumayyah Tasnim


28 May 2017, 07:37

I am unable to attach the data, but hopefully the following table will give you an idea:

HHID: household ID
Line number: Individual entry within household
Age: in years
Sex: male or female
eligible daughter: =1 if daughter and eligible (i.e. 11 years old)
eligible son: =1 if son and eligible (i.e. 11 years old)
HasEligible_daughter: all individuals in household assigned value 1 if at least one individual in the household is an eligible daughter

HHID	Line number	Age	Sex	Eligible daughter	Eligible son	HasEligible_daughter
1	1	40	M	0	0	1
1	2	37	F	0	0	1
1	3	11	F	1	0	1
1	4	10	F	0	0	1
2	1	45	M	0	0	1
2	2	38	F	0	0	1
2	3	11	M	0	1	1
2	4	11	F	1	0	1
3	1	55	M	0	0	1
3	2	50	F	0	0	1
3	3	11	F	1	0	1
3	4	13	F	1	0	1

I am trying to create dummy variables for the following groups of households:

1. Households which have at least one eligible daughter: I've created a variable called HasEligible_daughter which equals one when at least one daughter in the household is eligible
2. Households which have at least one eligible daughter AND at least one eligible son. I have dummy variables for each (equal to one when eligible)
3. Households which have at least one eligible daughter AND at least one 10-year-old daughter
4. Households which have at least one eligible daughter AND at least one daughter aged 12-14

I think I will need to use by to do this? I created the HasEligible_daughter dummy with:

Code:

bysort hhid: egen HasEligible_daughter = max(eligible_daughter)
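A minimal sketch of all four groups, following the same egen max() idea within households. Assumptions: the toy data copy the table above; is_daughter (1 = daughter of the household head) is a hypothetical variable that the table does not contain; the has_* and hld_grp* names are made up for illustration.

```stata
* Toy data from the table above; the is_daughter column is ASSUMED,
* because the table has no relationship-to-head variable
clear
input byte(hhid line age) str1 sex byte(is_daughter eligible_daughter eligible_son)
1 1 40 "M" 0 0 0
1 2 37 "F" 0 0 0
1 3 11 "F" 1 1 0
1 4 10 "F" 1 0 0
2 1 45 "M" 0 0 0
2 2 38 "F" 0 0 0
2 3 11 "M" 0 0 1
2 4 11 "F" 1 1 0
3 1 55 "M" 0 0 0
3 2 50 "F" 0 0 0
3 3 11 "F" 1 1 0
3 4 13 "F" 1 1 0
end

* Person-level flags for the extra conditions in groups 3 and 4
generate byte daughter10    = is_daughter == 1 & age == 10
generate byte daughter12_14 = is_daughter == 1 & inrange(age, 12, 14)

* Household-level "at least one" flags: the within-household max of a
* 0/1 variable is 1 if any member has the flag (bysort sorts on hhid first)
bysort hhid: egen byte has_elig_dau = max(eligible_daughter)
by hhid:     egen byte has_elig_son = max(eligible_son)
by hhid:     egen byte has_dau10    = max(daughter10)
by hhid:     egen byte has_dau12_14 = max(daughter12_14)

* The four groups: combine household-level flags, never person-level ones
generate byte hld_grp1 = has_elig_dau
generate byte hld_grp2 = has_elig_dau & has_elig_son
generate byte hld_grp3 = has_elig_dau & has_dau10
generate byte hld_grp4 = has_elig_dau & has_dau12_14
```

Because every has_* variable is already constant within a household, the final generate lines need no by prefix.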

Thanks,

Sumayyah


William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

28 May 2017, 08:34

It seems to me that the next step is to create HasEligible_son in the same way that you created HasEligible_daughter, then

Code:

generate hld_typ4 = HasEligible_son & HasEligible_daughter

or equivalently (since both variables are 0/1)

Code:

generate hld_typ4 = HasEligible_son==1 & HasEligible_daughter==1

or again equivalently

Code:

generate hld_typ4 = min(HasEligible_son,HasEligible_daughter)

The same approach should deal with your other problems.
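To see why the attempt in #1 returns 0 for every household while aggregating first works, here is a minimal sketch on the eligibility flags from the first table in this thread (wrong_typ4 is a made-up name used only for the comparison):

```stata
* Eligibility flags copied from the first table (12 people, 3 households)
clear
input byte(hhid eligible_daughter eligible_son)
1 0 0
1 0 0
1 1 0
1 0 0
2 0 0
2 0 0
2 0 1
2 1 0
3 0 0
3 0 0
3 1 0
3 1 0
end

* The attempt in #1: the & is evaluated person by person, and nobody is
* both an eligible son and an eligible daughter, so the household max is 0
bysort hhid: egen byte wrong_typ4 = max(eligible_son & eligible_daughter)

* Aggregate each flag to the household first, then combine the results
by hhid: egen byte HasEligible_son      = max(eligible_son)
by hhid: egen byte HasEligible_daughter = max(eligible_daughter)
generate byte hld_typ4 = HasEligible_son & HasEligible_daughter

list, sepby(hhid) noobs
```

The expression inside max() is evaluated for each person before the maximum is taken, so max(eligible_son & eligible_daughter) asks whether any single person is both; taking the max of each flag separately asks the household-level question instead.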

Sumayyah Tasnim


04 Jun 2017, 08:28

Thank you! I've created the various household types using the script above.

My data now look something like this (I've added a column for years of schooling and the three household-type dummies):

HHID	Line no.	Age	Sex	Years of schooling	Eligible daughter	Eligible son	HasEligible_daughter	Hld_typ1	Hld_typ2	Hld_typ3
1	1	40	M	7	0	0	1	1
1	2	37	F	7	0	0	1	1
1	3	11	F	5	1	0	1	1
1	4	10	F	4	0	0	1	1
2	1	45	M	6	0	0	1		1
2	2	38	F	4	0	0	1		1
2	3	11	M	6	0	1	1		1
2	4	11	F	5	1	0	1		1
3	1	55	M	8	0	0	1			1
3	2	50	F	4	0	0	1			1
3	3	11	F	4	1	0	1			1
3	4	13	F	4	1	0	1			1

Now I want to do some analysis of individuals within different types of households. For instance, I want to calculate the mean number of years of schooling for all 11-year-olds in household type 2. The example above includes only one household of this type, but let's say there are around 50. How would I calculate that, and do I need to use by?
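A minimal sketch: an if qualifier is enough here, so by is not required. The variable names (yrs_school, hld_typ2, etc.) are guesses based on the table above, and the toy data are its values.

```stata
* Toy data copied from the table above (blank cells = missing)
clear
input byte(hhid line age) str1 sex byte(yrs_school hld_typ1 hld_typ2 hld_typ3)
1 1 40 "M" 7 1 . .
1 2 37 "F" 7 1 . .
1 3 11 "F" 5 1 . .
1 4 10 "F" 4 1 . .
2 1 45 "M" 6 . 1 .
2 2 38 "F" 4 . 1 .
2 3 11 "M" 6 . 1 .
2 4 11 "F" 5 . 1 .
3 1 55 "M" 8 . . 1
3 2 50 "F" 4 . . 1
3 3 11 "F" 4 . . 1
3 4 13 "F" 4 . . 1
end

* The if condition selects 11-year-olds living in type-2 households
summarize yrs_school if age == 11 & hld_typ2 == 1

* -mean- also reports a standard error and confidence interval
mean yrs_school if age == 11 & hld_typ2 == 1
```

With 50 type-2 households the same two lines apply unchanged; the if condition simply picks up more observations.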

Thanks


Sumayyah Tasnim

#6

04 Jun 2017, 08:32

Actually, I've just realised how simple that is!

How would I look at the distribution of educational attainment within each household type? For instance, how could I find out whether older siblings (say those aged 12-14) within a household have more years of schooling than younger siblings (say those aged 10-11)? Or how attainment differs by gender within each household type?
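One way to look at age group and sex together within each household type, sketched under assumptions: the variable names are guesses based on the table in #5, the toy data are its values, and agegrp is a hypothetical variable built with -recode-.

```stata
* Toy data copied from the table in #5; variable names are guesses
clear
input byte(hhid line age) str1 sex byte(yrs_school hld_typ1 hld_typ2 hld_typ3)
1 1 40 "M" 7 1 . .
1 2 37 "F" 7 1 . .
1 3 11 "F" 5 1 . .
1 4 10 "F" 4 1 . .
2 1 45 "M" 6 . 1 .
2 2 38 "F" 4 . 1 .
2 3 11 "M" 6 . 1 .
2 4 11 "F" 5 . 1 .
3 1 55 "M" 8 . . 1
3 2 50 "F" 4 . . 1
3 3 11 "F" 4 . . 1
3 4 13 "F" 4 . . 1
end

* Hypothetical age groups: 1 = younger siblings (10-11), 2 = older (12-14);
* everyone else (e.g. parents) becomes missing and drops out of the tables
recode age (10/11 = 1 "10-11") (12/14 = 2 "12-14") (else = .), generate(agegrp)

* For each household type: years of schooling by age group (rows) and
* sex (columns); by default each cell shows mean, std. dev. and frequency
forvalues t = 1/3 {
    display as text _newline "Household type `t'"
    tabulate agegrp sex if hld_typ`t' == 1, summarize(yrs_school)
}
```

This only describes the differences; a formal comparison would still need something like the regression or t-test suggested elsewhere in the thread.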
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#7

04 Jun 2017, 08:40

Sumayyah:
- first of all, I think you can make -Hld_typ*- more efficient with -label-, creating a single variable instead of three:

Code:

replace Hld_typ1=2 if Hld_typ2==1
replace Hld_typ1=3 if Hld_typ3==1
rename Hld_typ1 Hld_typ_all
label define Hld_typ_all 1 "Hld_typ1" 2 "Hld_typ2" 3 "Hld_typ3"
label val Hld_typ_all Hld_typ_all
drop Hld_typ2 Hld_typ3

As far as your question is concerned, you may want to try:

Code:

tabstat Years_of_schooling if age==11 & Hld_typ_all==2, stat(count mean sd p50 min max)

Last edited by Carlo Lazzaro; 04 Jun 2017, 08:45.

Kind regards,
Carlo
(Stata 19.0)

Carlo Lazzaro


04 Jun 2017, 08:48

Sumayyah:
you may want to try:

Code:

gen age_flag=0 if inrange(age,10,11)
replace age_flag=1 if inrange(age,12,14)
label define age_flag 0 "younger_siblings" 1 "older_siblings"
label val age_flag age_flag
bysort Hld_typ_all: regress Years_of_schooling i.age_flag

Kind regards,
Carlo
(Stata 19.0)


Sumayyah Tasnim

#9

10 Jun 2017, 10:08

Thanks, Carlo, for the more efficient labelling. I think I will keep the variables separate, because households may fall into more than one type (e.g. a household with an eligible daughter, a 10-year-old daughter and a 14-year-old daughter would satisfy all three household types; unlikely, but still theoretically possible).

Currently, when I run "sum hld_typ*", it gives me the number of individuals who are in a household of that type. How do I check how many HOUSEHOLDS there are of each type?
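A minimal sketch that keeps the separate dummies: -egen, tag()- marks exactly one observation per household, so counting tagged observations counts households. The variable names and toy data follow the table in #5 and are assumptions; hh_tag is a made-up name.

```stata
* Toy data copied from the table in #5; variable names are guesses
clear
input byte(hhid line age) str1 sex byte(yrs_school hld_typ1 hld_typ2 hld_typ3)
1 1 40 "M" 7 1 . .
1 2 37 "F" 7 1 . .
1 3 11 "F" 5 1 . .
1 4 10 "F" 4 1 . .
2 1 45 "M" 6 . 1 .
2 2 38 "F" 4 . 1 .
2 3 11 "M" 6 . 1 .
2 4 11 "F" 5 . 1 .
3 1 55 "M" 8 . . 1
3 2 50 "F" 4 . . 1
3 3 11 "F" 4 . . 1
3 4 13 "F" 4 . . 1
end

* 1 for one observation per household, 0 for all the others
egen byte hh_tag = tag(hhid)

* Households of each type; a household that qualifies for several types
* is counted once under each of them
foreach v of varlist hld_typ1 hld_typ2 hld_typ3 {
    quietly count if hh_tag == 1 & `v' == 1
    display as text "`v': " as result r(N) " household(s)"
}
```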
Carlo Lazzaro

#10

10 Jun 2017, 10:43

Sumayyah:
if a household falls into two different levels of the categorical variable -Hld_typ_all-, you can simply assign it an additional value (e.g. 4 for a household that is both type 2 and type 3) and extend the -label- accordingly.
That said, you may want to try:

Code:

egen flag=tag(HHID)
total flag, over(Hld_typ_all)

The following toy example may help:

Code:

sysuse auto.dta
egen flag=tag(mpg)
total flag, over(foreign)

Last edited by Carlo Lazzaro; 10 Jun 2017, 10:47.

Kind regards,
Carlo
(Stata 19.0)
Sumayyah Tasnim

#11

10 Jun 2017, 11:16

Thank you that's worked!

I have calculated average years of schooling for various age groups within each type of household. Is there a way to store these means and present them in a publication-quality table in Stata? Say, a table where the columns are the household types and the rows are the age groups.
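One possible approach, sketched under assumptions: collect the means in a matrix (rows = age groups, columns = household types) and export it with -putexcel-. The putexcel syntax shown is the one introduced in Stata 14 (check -help putexcel- for your version). agegrp, M and the file name schooling_means.xlsx are hypothetical; the toy data come from the table in #5.

```stata
* Toy data copied from the table in #5; variable names are guesses
clear
input byte(hhid line age) str1 sex byte(yrs_school hld_typ1 hld_typ2 hld_typ3)
1 1 40 "M" 7 1 . .
1 2 37 "F" 7 1 . .
1 3 11 "F" 5 1 . .
1 4 10 "F" 4 1 . .
2 1 45 "M" 6 . 1 .
2 2 38 "F" 4 . 1 .
2 3 11 "M" 6 . 1 .
2 4 11 "F" 5 . 1 .
3 1 55 "M" 8 . . 1
3 2 50 "F" 4 . . 1
3 3 11 "F" 4 . . 1
3 4 13 "F" 4 . . 1
end

* Hypothetical age groups: 1 = 10-11, 2 = 12-14, others missing
recode age (10/11 = 1 "10-11") (12/14 = 2 "12-14") (else = .), generate(agegrp)

* Fill a 2 x 3 matrix of mean years of schooling; cells with no
* observations stay missing
matrix M = J(2, 3, .)
forvalues t = 1/3 {
    forvalues g = 1/2 {
        quietly summarize yrs_school if hld_typ`t' == 1 & agegrp == `g'
        matrix M[`g', `t'] = r(mean)
    }
}
matrix rownames M = age10_11 age12_14
matrix colnames M = type1 type2 type3
matrix list M, format(%5.2f)

* Write the matrix, with its row and column names, to an Excel sheet
putexcel set "schooling_means.xlsx", replace
putexcel A1 = matrix(M), names
```

Building the matrix by hand keeps the separate, possibly overlapping household-type dummies intact, which a single by() variable could not do.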
Sumayyah Tasnim

#12

02 Jul 2017, 02:07

I have created a comparison group for each type of household, and I now want to carry out a t-test for each household type against its corresponding comparison group. How do I run a t-test for specific individuals within each household type, as opposed to the entire household? For instance, a t-test of the difference in average educational attainment between 10-year-old girls in household type 1 and those in the comparison households.
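A minimal sketch, assuming a hypothetical 0/1 flag comp_typ1 marks the comparison households for type 1 (grp1 is also a made-up name). The data below are entirely made up, only so that the code runs. The idea is to build a grouping variable with exactly two values (type 1 vs its comparison group) and put the individual-level restriction (10-year-old girls) in the if qualifier.

```stata
* MADE-UP data, only so the sketch runs: 50 households of 4 people each
clear
set obs 200
generate int  hhid       = ceil(_n / 4)
generate byte age        = 10 + mod(_n, 4)                  // ages 10-13
generate str1 sex        = cond(mod(hhid, 2) == 0, "F", "M")
generate byte yrs_school = age - 6 + mod(hhid, 3)
generate byte hld_typ1   = 1 if hhid <= 20                  // 1/missing, as in the thread
generate byte comp_typ1  = 1 if inrange(hhid, 21, 40)       // hypothetical comparison flag

* Two-valued grouping variable: 1 = type 1, 0 = comparison group, . = neither
generate byte grp1 = cond(hld_typ1 == 1, 1, cond(comp_typ1 == 1, 0, .))

* Restrict to 10-year-old girls in either group; -unequal- requests
* Welch's t-test, which does not assume equal variances
ttest yrs_school if age == 10 & sex == "F" & !missing(grp1), by(grp1) unequal
```

Unlike recoding every missing hld_typ1 to 0, this keeps households that belong to neither group out of the test.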
Carlo Lazzaro

#13

02 Jul 2017, 08:14

Sumayyah:
you may want to try:

Code:

replace hld_typ1=0 if hld_typ1==.
ttest yearsofschooling if age==10, by(hld_typ1) unequal

Kind regards,
Carlo
(Stata 19.0)
