Create %female of a categorical variable with 99 categories

Julie Well

Join Date: Aug 2020

Posts: 9
#1

10 Sep 2020, 14:25

Hello - I have a categorical variable with 99 categories ('city'). The city variable shows me how many people are in each of the 99 cities.

I have a female variable for each person in the data set (0/1). I can use the tab code to see how many men and women are in each individual city (category).

Is there a way to create a new variable 'city_percentfemale' that will list the %female (female in that city N/total N in that city) for each individual city (category) within the city variable?

Thank you!

Julie
Andrew Musau

Join Date: Oct 2014

Posts: 10223
#2

10 Sep 2020, 15:05

With no missing values in the dataset

Code:

bys city: egen wanted= total(female)
bys city: replace wanted= (wanted/_N)*100
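If female does have missing values, dividing by _N would understate the percentage, because _N counts every observation in the city. A minimal sketch of one way around that, assuming the same city and female variables; the names n_female, n_valid and wanted2 are illustrative only and do not come from the thread:

```stata
* total() ignores missing values; count() counts nonmissing values only
bysort city: egen n_female = total(female)
bysort city: egen n_valid  = count(female)

* percent female among observations where female is observed
generate wanted2 = 100 * n_female / n_valid
```

A city in which female is missing for everyone gets a missing wanted2, since the denominator is zero there.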
Julie Well

Join Date: Aug 2020

Posts: 9
#3

10 Sep 2020, 16:37

Thanks Andrew! Any way to do it and preserve the city label? Right now it looks like the label is the percent?
Andrew Musau

Join Date: Oct 2014

Posts: 10223
#4

11 Sep 2020, 04:00

I am not sure I understand what you mean. There are no labels here (at least in Stata's definition of a label). The wanted variable is numeric and gives you the percentage of female inhabitants in a given city, i.e., the variable varies between cities but is constant within cities. This should correspond to the output from the tab command. Since you have a city variable, to identify which value belongs to which city, there are a number of ways:

Code:

egen tag= tag(city)
browse city wanted if tag
list city wanted if tag, sep(0)
Nick Cox

Join Date: Mar 2014

Posts: 35730
#5

11 Sep 2020, 04:15

Even with missing values present

Code:

bysort city : egen percent = mean(100 * female)

will do what you want. There is a subtle detail that

Code:

100 * mean(female)

is not allowed, but the allowed syntax gives what you want anyway. As Andrew Musau explains, your variable city is unchanged. In addition to his excellent technique

Code:

tabdisp city, c(percent) format(%2.1f)

should work fine -- and you can specify any other format that you might want.
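For anyone who wants to try this without the original data, here is a small sketch on the auto dataset shipped with Stata. It assumes foreign (0/1) stands in for female and rep78 stands in for city. The variable names percent, m and percent2 are illustrative only. It also checks that scaling inside mean() agrees with taking the mean first and multiplying afterwards in a separate step.

```stata
sysuse auto, clear

* scale inside mean(): allowed egen syntax
bysort rep78 : egen percent = mean(100 * foreign)

* two-step equivalent: mean first, then multiply outside egen
egen m = mean(foreign), by(rep78)
generate percent2 = 100 * m

* the two should agree up to rounding in float storage
assert reldif(percent, percent2) < 1e-6

tabdisp rep78, c(percent) format(%4.1f)
```

The comparison uses reldif() rather than strict equality because both results are stored as float and might differ in the last bit.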

However, with 99 cities I doubt that I would want to see alphabetical order.

I could easily imagine wanting to see a listing of cities and their means, but in order of those means. There are several ways of doing that and here's one which depends on installing groups -- which is a community-contributed command from the Stata Journal.

The example is self-contained

Code:

. sysuse auto, clear
(1978 Automobile Data)

. egen mean = mean(mpg) , by(rep78)

. format mean %2.1f

. groups mean rep78, colorder(2) show(none)

  +--------------+
  | rep78   mean |
  |--------------|
  |     2   19.1 |
  |     3   19.4 |
  |     1   21.0 |
  |     4   21.7 |
  |     5   27.4 |
  +--------------+

. search st0496, entry

Search of official help files, FAQs, Examples, and Stata Journals

SJ-18-1 st0496_1  . . . . . . . . . . . . . . . . . Software update for groups
        (help groups if installed)  . . . . . . . . . . . . . . . .  N. J. Cox
        Q1/18   SJ 18(1):291
        groups exited with an error message if weights were specified;
        this has been corrected

SJ-17-3 st0496  . . . . . Speaking Stata: Tables as lists: The groups command
        (help groups if installed)  . . . . . . . . . . . . . . . .  N. J. Cox
        Q3/17   SJ 17(3):760--773
        presents command for listing group frequencies and percents and
        cumulations thereof; for various subsetting and ordering by
        frequencies, percents, and so on; for reordering of columns;
        and for saving tabulated data to new datasets

There are two minor perversities here on my part. First, back in 2003 I hijacked the name groups for what was intended as a fairly general listing or tabulation command. (StataCorp retain the right to use the name for an official command, in which case that will break my command and I will need a new command name.) However, a side-effect of my using a simple English word is that "groups" is a lousy search term, so the code above gives the detail that st0496 is what works as you want.

Second,

Code:

groups rep78 mean

is perfectly legal, but in essence sorts first on the first variable and then on the second variable before displaying results. Reversing the variables changes the sort order. Often people would still want to see the categorical variable in the first column and the colorder() allows you to have it both ways.
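If installing groups is not an option, a similar ordered listing can be sketched with official commands only, by combining the tag() idea from earlier in the thread with a sort on the means. This is just one possible approach, shown on the auto data; the variable name tag is illustrative.

```stata
sysuse auto, clear
egen mean = mean(mpg), by(rep78)
format mean %2.1f

* tag one observation per distinct nonmissing value of rep78
egen tag = tag(rep78)

* order by the group means, then list one row per group
sort mean rep78
list rep78 mean if tag, noobs sep(0)
```

Unlike groups, this changes the sort order of the dataset in memory, which may or may not matter for what follows.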

The write-up in SJ 17-3 will emerge very shortly from behind a paywall, but the main story can be seen at https://www.statalist.org/forums/for...updated-on-ssc and the software may be installed regardless of whether you or your workplace subscribe to the Stata Journal.

Last edited by Nick Cox; 11 Sep 2020, 05:07.
Julie Well

Join Date: Aug 2020

Posts: 9
#6

11 Sep 2020, 13:45

Thank you Andrew Musau and Nick Cox! Everything works and the information is very helpful - I appreciate it! The bys command collapses some of the data but with the list and tabdisp commands I am able to double check the integrity. The groups command looks useful and I will be working with that too.