  • How to count the number of males and females per household?

    I have a dataset that records the household id (varname: nquest), a number assigned to each household member (varname: nord), and each member's gender (sex = 1 if male, sex = 2 if female):
    nquest nord sex
    100 1 1
    100 2 1
    100 3 2
    101 1 1
    102 1 2
    102 2 1
    103 1 2
    103 2 2
    103 3 1

    I would like to create two variables, one reporting the number of males per household and the other the number of females per household, with a result like this:

    nquest nord sex males females
    100 1 1 2 1
    100 2 1 2 1
    100 3 2 2 1
    101 1 1 1 0
    102 1 2 1 1
    102 2 1 1 1
    103 1 2 0 3
    103 2 2 0 3
    103 3 2 0 3

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int nquest byte(nord sex)
    100 1 1
    100 2 1
    100 3 2
    101 1 1
    102 1 2
    102 2 1
    103 1 2
    103 2 2
    103 3 1
    end
    
    by nquest (nord), sort: egen hh_males = total(sex == 1)
    by nquest: egen hh_females = total(sex == 2)
    Note: The results this generates do not agree with what you show in #1. Specifically, in the input data, nquest 103 nord 3 is shown as a male, but in your expected output you show him as female and thereby produce a total of 0 males and 3 females for the household, when it should be 1 male and 2 females.

    In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    • #3

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int nquest byte(nord sex)
      100 1 1
      100 2 1
      100 3 2
      101 1 1
      102 1 2
      102 2 1
      103 1 2
      103 2 2
      103 3 1
      end
      
      egen males = total(sex == 1), by(nquest)
      
      egen females = total(sex == 2), by(nquest)
      What is key here is that expressions like sex == 1 evaluate as 1 if true and 0 if false. So for males in the first household you are adding 1 and 1 and 0 and getting 2. That is how to count.
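To make the counting step visible, here is a minimal sketch that stores the true/false expression as its own 0/1 variable before summing it within each household. The names is_male and males_check are hypothetical and do not appear in the thread. The data are the example from #1 as entered above.

```stata
* Sketch only: is_male and males_check are hypothetical names
clear
input int nquest byte(nord sex)
100 1 1
100 2 1
100 3 2
101 1 1
102 1 2
102 2 1
103 1 2
103 2 2
103 3 1
end

* The expression sex == 1 is 1 when true and 0 when false
generate byte is_male = (sex == 1)

* Summing the 0/1 indicator within a household counts its males
bysort nquest: egen males_check = total(is_male)

* Show the indicator next to the household total
list, sepby(nquest)
```

One thing to keep in mind with this approach: an observation with missing sex makes both sex == 1 and sex == 2 evaluate to 0, so it would be counted in neither total.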

      See also e.g.

      SJ-19-1 dm0099 . . . . . . How best to generate indicator or dummy variables
      . . . . . . . . . . . . . . . . . . . . N. J. Cox and C. B. Schechter
      Q1/19 SJ 19(1):246--259 (no commands)
      discusses how to best generate indicator or dummy variables

      SJ-16-1 dm0087 . . . Speaking Stata: Truth, falsity, indication, and negation
      . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
      Q1/16 SJ 16(1):229--236 (no commands)
      looks at the following concepts: indicator variables, by: for
      groupwise calculations, and control of sort order to enable
      exactly what you want
