
  • Create %female of a categorical variable with 99 categories

    Hello - I have a categorical variable with 99 categories ('city'). The city variable shows me how many people are in each of the 99 cities.

    I have a female variable for each person in the data set (0/1). I can use the tab code to see how many men and women are in each individual city (category).

    Is there a way to create a new variable 'city_percentfemale' that will list the %female (female in that city N/total N in that city) for each individual city (category) within the city variable?

    Thank you!

    Julie

  • #2
    With no missing values in the dataset

    Code:
    bys city: egen wanted= total(female)
    bys city: replace wanted= (wanted/_N)*100
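
    To see why the no-missing-values caveat matters, here is a minimal sketch on the auto data. It is an illustration under assumptions, not part of the original answer: rep78 stands in for city, and female is a hypothetical copy of foreign with a few values set to missing. total() ignores missing values but _N counts every observation in the group, so the two denominators can differ.

    ```stata
    * Sketch only: auto data stand in for the poster's data.
    * rep78 plays the role of city; "female" is a hypothetical copy of foreign.
    sysuse auto, clear
    generate byte female = foreign
    replace female = . in 1/5          // introduce some missing values

    * total() ignores missings, but _N counts every observation in the group
    bysort rep78: egen n_female = total(female)
    bysort rep78: generate pct_N = 100 * n_female / _N

    * count() counts only nonmissing values, so this denominator is safe
    bysort rep78: egen n_valid = count(female)
    generate pct_valid = 100 * n_female / n_valid

    * one row per group for comparison
    egen tag = tag(rep78)
    list rep78 n_female n_valid pct_N pct_valid if tag, sep(0)
    ```

    Wherever female is missing within a group, pct_N is computed over a larger denominator than pct_valid, which is why dividing by _N is only appropriate when nothing is missing.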



    • #3
      Thanks Andrew! Any way to do it and preserve the city label? Right now it looks like the label is the percent?



      • #4
        I am not sure I understand what you mean. There are no labels here (at least in Stata's definition of a label). The wanted variable is numeric and gives you the percentage of female inhabitants in a given city, i.e., the variable varies between cities but is constant within cities. This should correspond to the output from the tab command. Since you have a city variable, to identify which value belongs to which city, there are a number of ways:

        Code:
        egen tag= tag(city)
        browse city wanted if tag
        list city wanted if tag, sep(0)



        • #5
          Even with missing values present

          Code:
          bysort city : egen percent = mean(100 * female)
          will do what you want. There is a subtle detail that

          Code:
          100 * mean(female)
          is not allowed, but the allowed syntax gives what you want anyway. As Andrew Musau explains, your variable city is unchanged. In addition to his excellent technique,

          Code:
          tabdisp city, c(percent) format(%2.1f)
          should work fine -- and you can specify any other format that you might want.

          However, with 99 cities I doubt that I would want to see alphabetical order.

          I could easily imagine wanting to see a listing of cities and their means, but in order of those means. There are several ways of doing that and here's one which depends on installing groups -- which is a community-contributed command from the Stata Journal.

          The example is self-contained

          Code:
          . sysuse auto, clear
          (1978 Automobile Data)
          
          . egen mean = mean(mpg) , by(rep78)
          
          . format mean %2.1f
          
          . groups mean rep78, colorder(2) show(none)
          
            +--------------+
            | rep78   mean |
            |--------------|
            |     2   19.1 |
            |     3   19.4 |
            |     1   21.0 |
            |     4   21.7 |
            |     5   27.4 |
            +--------------+
          
          . search st0496, entry
          
          Search of official help files, FAQs, Examples, and Stata Journals
          
          SJ-18-1 st0496_1  . . . . . . . . . . . . . . . . . Software update for groups
                  (help groups if installed)  . . . . . . . . . . . . . . . .  N. J. Cox
                  Q1/18   SJ 18(1):291
                  groups exited with an error message if weights were specified;
                  this has been corrected
          
          SJ-17-3 st0496  . . . . .  Speaking Stata: Tables as lists: The groups command
                  (help groups if installed)  . . . . . . . . . . . . . . . .  N. J. Cox
                  Q3/17   SJ 17(3):760--773
                  presents command for listing group frequencies and percents and
                  cumulations thereof; for various subsetting and ordering by
                  frequencies, percents, and so on; for reordering of columns;
                  and for saving tabulated data to new datasets


          There are two minor perversities here on my part. First, back in 2003 I hijacked the name groups for what was intended as a fairly general listing or tabulation command. (StataCorp retain the right to use the name for an official command, in which case that will break my command and I will need a new command name.) However, a side-effect of my using a simple English word is that "groups" is a lousy search term, so the code above gives the detail that st0496 is what works as you want.

          Second,


          Code:
          groups rep78 mean
          is perfectly legal, but in essence sorts first on the first variable and then on the second variable before displaying results. Reversing the variables changes the sort order. Often people would still want to see the categorical variable in the first column, and the colorder() option allows you to have it both ways.
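
          For anyone who cannot install groups, a rough equivalent of the ordered listing can be sketched with official commands only. This is just one of the several ways mentioned above, not the groups approach itself; the variable name tag is my own choice for illustration.

          ```stata
          * Sketch only: official commands, no community-contributed installs assumed.
          sysuse auto, clear
          egen mean = mean(mpg), by(rep78)
          format mean %2.1f
          egen tag = tag(rep78)              // flags one observation per distinct rep78
          sort mean rep78                    // order by the group means
          list rep78 mean if tag, noobs sep(0)
          ```

          Because tag() skips observations with missing rep78 by default, the listing shows only the nonmissing categories, with the categorical variable in the first column and the rows in order of the means.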

          The write-up in SJ 17-3 will emerge very shortly from behind a paywall, but the main story can be seen at https://www.statalist.org/forums/for...updated-on-ssc and the software may be installed regardless of whether you or your workplace subscribe to the Stata Journal.
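
          As a hedged sketch of the install step: assuming net access from within Stata and that the copy on SSC is the one you want, something like this should fetch the command and open its help file. The search output above gives the Stata Journal route as an alternative.

          ```stata
          * Sketch only: assumes an internet connection and that the SSC copy is wanted.
          ssc install groups
          help groups
          ```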
          Last edited by Nick Cox; 11 Sep 2020, 05:07.



          • #6
            Thank you Andrew Musau and Nick Cox! Everything works and the information is very helpful - I appreciate it! The bys command collapses some of the data but with the list and tabdisp commands I am able to double check the integrity. The groups command looks useful and I will be working with that too.
