Counting households for which there is info on both 1 male and 1 female

Allie Sun

Join Date: Jul 2018

Posts: 16
#1

26 Jul 2018, 09:46

Hi,
I have a dataset that should include information on 2 interviewees per household: a female and a male partner. However, I noticed that in many households information was collected on only one of the partners, or on several members of the same gender (e.g. 2 females, or 3 males).

I would like to count the number of households for which we have information on both 1 male and 1 female.

An example of the data looks like this (with plenty more observations):
ID     Gender
       1. male   2. female   Total
1      0         1           1
2      2         0           2
4      0         1           1
5      0         1           1
6      2         0           2
7      0         3           3
11     0         1           1
12     0         2           2
15     0         2           2
17     0         1           1
21     1         1           2

In this small extract of data, only Household ID 21 has info on 1 male and 1 female. How can I derive an exact count of the Households that are like ID 21 (for which there is info on 1 male and 1 female)?

Thank you very much.
Tags: None
Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#2

26 Jul 2018, 09:51

Please provide an extract of your dataset with the dataex command (see FAQ 12.2 https://www.statalist.org/forums/help)

Stata/MP 14.1 (64-bit x86-64)
Revision 19 May 2016
Win 8.1
Comment

Allie Sun

Join Date: Jul 2018
Posts: 16
#3

26 Jul 2018, 10:30

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int sectionaa_new_householda1_househ byte sectionaa_new_householda8_gender
  851 1
 1101 1
  541 1
10361 1
  667 1
  624 1
  261 1
  811 1
 1017 1
  821 1
10654 1
  296 1
 1101 1
  751 1
  386 1
  474 1
  917 1
  743 1
  213 1
 1017 1
  292 1
10693 1
  131 1
 1061 1
  295 1
  296 1
  472 1
  764 1
   72 1
  337 1
  913 1
  386 1
    2 1
  312 1
  624 1
10351 1
  971 1
   36 1
  281 1
  322 1
  855 1
  951 1
  337 1
  161 1
    2 1
    6 1
  963 1
  967 1
  747 1
  457 1
10141 1
10681 1
  163 1
  951 1
  366 1
  723 1
  603 1
  661 1
  476 1
  941 1
  672 1
 1127 1
  724 1
10654 1
10613 1
   51 1
10547 1
10161 1
  617 1
  291 1
  781 1
  161 1
  192 1
  917 1
  473 1
  263 1
  516 1
10613 1
  312 1
 1147 1
   43 1
  384 1
 1207 1
   31 1
10151 1
  874 1
10391 1
 1201 1
   81 1
  747 1
  501 1
  184 1
  554 1
10361 1
  805 1
  452 1
10547 1
  141 1
   71 1
  443 1
end
label values sectionaa_new_householda8_gender gender
label def gender 1 "1. male", modify

Comment

Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#4

26 Jul 2018, 11:01

So there is no variation on gender here, but I'm assuming there is variation later in the dataset. I added a little variation by putting a few women in the dataset (this is just for illustration with your sample data, you would not do this in your own):

Code:

sort sectionaa_new_householda1_househ
replace sectionaa_new_householda8_gender=2 in 96
replace sectionaa_new_householda8_gender=2 in 2
replace sectionaa_new_householda8_gender=2 in 82

You can then count the number of women and men in each household:

Code:

bysort sectionaa_new_householda1_househ: egen count_men=total( sectionaa_new_householda8_gender==1)
bysort sectionaa_new_householda1_househ: egen count_women=total( sectionaa_new_householda8_gender==2)

If all you want to know is the number of households with only one of each:

Code:

count if count_men==1 & count_women==1

Then just divide that number by 2 (since you know there will be exactly 2 people in each household).

This solution assumes that you do not have any missing values on your gender variable such that you might have 1 male, 1 female, 1 unknown.
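The missing-value caveat can be checked directly. What follows is only a minimal sketch on made-up data: the short names hh and gender stand in for your two variables, and n_missing is a hypothetical helper variable, not something in your dataset. Household 3 has 1 male, 1 female and 1 unknown, so it is excluded from the count.

```stata
clear
input int hh byte gender
1 1
1 2
2 1
2 1
3 1
3 2
3 .
4 2
end

* men, women and missing gender values per household
bysort hh: egen count_men = total(gender == 1)
bysort hh: egen count_women = total(gender == 2)
bysort hh: egen n_missing = total(missing(gender))

* observations in households with exactly 1 man, 1 woman and nobody unknown
count if count_men == 1 & count_women == 1 & n_missing == 0

* two observations per such household, so halve the count
display r(N) / 2
```

If n_missing is 0 everywhere in your data, the extra condition changes nothing and the simpler count above is enough.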

Comment
Allie Sun

Join Date: Jul 2018

Posts: 16
#5

27 Jul 2018, 01:52

Thank you very much. It worked!

Last edited by Allie Sun; 27 Jul 2018, 01:55.
Comment
Allie Sun

Join Date: Jul 2018

Posts: 16
#6

27 Jul 2018, 02:57

Would you be able to tell me how to generate a variable that just tells us the number of households for which we have info on the couples. Meaning, a variable that summarizes the info we just derived - including the divided by two part.

I need this to then be able to graph the number of households with couples interviewed per cluster. If I do so right now using the following command:

graph bar (count) sectionaa_new_householda1_househ if count_men==1 & count_women==1, by(sectionaa_new_householda3_cluste)

I clearly get double the number.

I need a variable that already includes the count divided by 2.

I tried egen and count, but it keeps giving me error messages.

Thank you very much - your help is truly appreciated; as I am new to this.
Comment
Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#7

27 Jul 2018, 06:53

The values for count_men & count_women are constant within households. We can easily create a new variable that is 1 if the number of men is 1 and the number of women is 1--this will also be constant within households:

Code:

gen only_mf=0
replace only_mf=1 if count_men==1 & count_women==1

Now we still have the problem that we have multiple observations per household. So if we want characteristics of the household, we only want to observe/graph a single observation of that household. The command -egen- has a function -tag( )- that selects one observation per specified group:

Code:

egen tag=tag(sectionaa_new_householda1_househ)

I don't have your variable sectionaa_new_householda3_cluste, but I can create a made up group variable:

Code:

gen group=0
replace group=1 in 34/66
replace group=2 in 67/100

And graph using the following command:

Code:

graph bar (count) sectionaa_new_householda1_househ if only_mf==1 & tag==1, over(group) allcategories
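If a variable holding the number of couple households per cluster is wanted, rather than only a graph, one possible approach is to sum the tagged observations within each cluster. This is a sketch on made-up data: hh, gender and cluster are short stand-ins for your variables, and n_couples is a hypothetical name.

```stata
clear
input int hh byte gender byte cluster
1 1 1
1 2 1
2 1 1
2 1 1
3 1 2
3 2 2
4 2 2
5 1 2
5 2 2
end

bysort hh: egen count_men = total(gender == 1)
bysort hh: egen count_women = total(gender == 2)

* 1 if the household has exactly one man and one woman, 0 otherwise
gen byte only_mf = count_men == 1 & count_women == 1

* tag exactly one observation per household
egen tag = tag(hh)

* each couple household contributes 1 to its cluster total, not 2
bysort cluster: egen n_couples = total(only_mf & tag)

* the same counts as a table, one row per cluster
tabulate cluster if only_mf & tag
```

Because only the tagged observation of each household enters the sum, no division by 2 is needed.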

Comment
Allie Sun

Join Date: Jul 2018

Posts: 16
#8

30 Jul 2018, 02:03

Very thankful - it worked well.

Your help is truly appreciated.
Comment
Allie Sun

Join Date: Jul 2018

Posts: 16
#9

30 Jul 2018, 03:10

May I ask your support again:

I would like to draw a bar graph with the total count on the y axis and, on the x axis, the household number and the gender within each household. That is, I would like to show each household number and, within it, how many females and males we have.

I have used this command:

graph bar (count), over(sectionaa_new_householda8_gender) over(sectionaa_new_householda1_househ)

However, with a total of 871 observations and 527 households, I can't see much from the graph.

I fear that also in this case, the household numbers are repeated more than once - while I would like just one household number and for each to be able to see how many males and females we have info on.

I suppose that the tag function would help me but I am not sure how to apply it in this case; or I might just need to work on the graph so that 527 households fit clearly in it.

Thanks again for all your support.

Last edited by Allie Sun; 30 Jul 2018, 03:27.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35625
#10

30 Jul 2018, 04:50

527 households identified explicitly on a graph??? I can't imagine a design that would work unless your graph size is a few metres.

The data example in #3 -- as Carole tactfully pointed out -- is not helpful here. Modifying the data example in #1 I can suggest one kind of graph that should work: it shows the joint distribution of number of females and number of males.

Note that

1. It's not clear why you have more observations than households if your data are like #1.

2. You need to install tabplot before you can use it.

Code:

search tabplot, sj

to get a clickable download link. At the time of writing you need files from gr0066_1

SJ-17-3 gr0066_1 . . . . . . . . . . . . . . . . Software update for tabplot
(help tabplot if installed) . . . . . . . . . . . . . . . . N. J. Cox
Q3/17 SJ 17(3):779
added options for reversing axis scales; improved handling of
axis labels containing quotation marks

SJ-16-2 gr0066 . . . . . . Speaking Stata: Multiple bar charts in table form
(help tabplot if installed) . . . . . . . . . . . . . . . . N. J. Cox
Q2/16 SJ 16(2):491--510
provides multiple bar charts in table form representing
contingency tables for one, two, or three categorical variables

Code:

clear
input ID male female Total
1 0 1 1
2 2 0 2
4 0 1 1
5 0 1 1
6 2 0 2
7 0 3 3
11 0 1 1
12 0 2 2
15 0 2 2
17 0 1 1
21 1 1 2
end

tabplot male female, showval yasis xasis bfcolor(none)
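If the data are really in the long layout of #3 (one observation per person), that may be why there are more observations than households. One way to get to the household-level layout of #1 is collapse. This is a sketch on made-up data, with hh and gender as stand-ins for the real variable names; note that collapse replaces the data in memory, so save first or wrap it in preserve and restore.

```stata
clear
input int hh byte gender
1 1
1 2
2 1
2 1
3 1
3 2
4 2
5 1
5 2
end

* indicator variables; a missing gender counts as neither
gen byte male = gender == 1
gen byte female = gender == 2

* one observation per household, holding the number of males and females
collapse (sum) male female, by(hh)
gen Total = male + female

* joint distribution as a table; tabplot shows the same thing as a graph
tabulate male female
* tabplot male female, showval yasis xasis bfcolor(none)   // needs tabplot installed
```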
Comment
