
  • Creating age brackets from numerical data and a population pyramid graph

    I have a household survey with the following variables:
    Age of respondents (in single years: 0, 1, 2, 3, 4, and so on up to 100)
    Gender (i.e. male or female)

    I am trying to create a new variable that groups age into 5-year brackets (0 to 5, 6 to 10, 11 to 15, and so on up to 100) and is also broken down by gender, so that each bracket is available for males and females separately.

    I've tried several approaches, including foreach, forvalues, and generating a new variable with totals, but I cannot get the variable I want. Could you please let me know the best way to go about this?

    Many thanks,
    Rana

  • #2
    Have a look at the FAQs for advice on how to post data examples using dataex. Ideally, you want your data in long form, as below, with one variable for age and another for gender. I get frustrated when I look at papers that call the gender variable "gender". That name is uninformative to the reader because it does not say which category is coded 1, so you should name this variable either male or female.

    Code:
    clear
    input float(age male)
     5 1
    20 1
    40 1
    99 1
    13 0
    77 0
    22 0
     7 0
    end
    With the data in this form, you can define the start or end year of each bracket using either the floor() or the ceil() function. You do not need to create separate variables for males and females because the indicator variable "male" already identifies which gender each observation belongs to.

    Code:
    *5 YEAR AGE GROUPS
    bys male: gen wanted= 5*ceil( age/5 )
    gen label = string(wanted-5) + " to " +string(wanted) + " years"
    *TO INSTALL TYPE findit labmask AND FOLLOW INSTRUCTIONS
    labmask wanted, val(label)
    drop label
    l, sepby(male)
    Code:
    . l, sepby(male)
    
         +------------------------------+
         | age   male            wanted |
         |------------------------------|
      1. |   7      0     5 to 10 years |
      2. |  22      0    20 to 25 years |
      3. |  77      0    75 to 80 years |
      4. |  13      0    10 to 15 years |
         |------------------------------|
      5. |  99      1   95 to 100 years |
      6. |  40      1    35 to 40 years |
      7. |  20      1    15 to 20 years |
      8. |   5      1      0 to 5 years |
         +------------------------------+
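
The floor-based alternative mentioned above is not shown in the thread. A minimal sketch, assuming the conventional demographic brackets 0 to 4, 5 to 9, and so on (an assumption; these differ from the 0 to 5, 6 to 10 brackets in the question), could look like this. The names agegrp and agelabel are illustrative, and only official Stata commands are used, so labmask is not needed:

```stata
* Sketch: floor-based 5-year brackets (0-4, 5-9, ..., 95-99, 100-104)
clear
input float(age male)
 0 1
 5 1
20 1
99 1
 7 0
13 0
end

* Lower bound of each bracket; age 0 falls into the 0-4 bracket
gen agegrp = 5*floor(age/5)

* Readable label built from the lower bound (hypothetical variable name)
gen agelabel = string(agegrp) + " to " + string(agegrp + 4) + " years"

* Counts by bracket and gender; tabulating agegrp keeps numeric order
tabulate agegrp male
list, sepby(male)
```

Because floor() rounds down, age 0 needs no special handling here, and the brackets do not overlap.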
    Last edited by Andrew Musau; 13 Jun 2017, 15:11.



    • #3
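      The age "0 years" presents a problem in #2: ceil(0/5) is 0, so a newborn ends up in a bracket labelled "-5 to 0 years", that is, grouped with the unborn up to five years from today. One way to deal with this is to treat ages up to 5 as a special case with cond(). The labels below also start each later bracket one year after the previous one ends (6 to 10, 11 to 15, and so on), so the brackets no longer overlap: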

      Code:
      clear
      input float(age male)
       5 1
      20 1
      40 1
      99 1
      13 0
      77 0
      22 0
       7 0
       0 1
      end
      
      
      bys male: gen wanted= cond(age<5.0001, 5,  5*ceil( age/5 ))
      gen label = cond(age<5.0001, string(0) + " to " +string(5) + " years", string(wanted-4) + " to " +string(wanted) + " years")
      labmask wanted, val(label)
      drop label
      l, sepby(male)

      Code:
      
      . l, sepby(male)
      
           +------------------------------+
           | age   male            wanted |
           |------------------------------|
        1. |   7      0     6 to 10 years |
        2. |  13      0    11 to 15 years |
        3. |  22      0    21 to 25 years |
        4. |  77      0    76 to 80 years |
           |------------------------------|
        5. |   0      1      0 to 5 years |
        6. |  20      1    16 to 20 years |
        7. |   5      1      0 to 5 years |
        8. |  99      1   96 to 100 years |
        9. |  40      1    36 to 40 years |
           +------------------------------+
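
The original question also asks for a population pyramid, which the thread does not show. A minimal sketch, assuming the same example data and the same bracket rule as above, could collapse the data to counts per bracket and gender with contract and draw back-to-back horizontal bars with twoway bar. The variable names n and plotn, the bar width, and the axis labels are illustrative choices, not part of the original posts:

```stata
* Sketch: population pyramid from counts by age bracket and gender
clear
input float(age male)
 5 1
20 1
40 1
99 1
13 0
77 0
22 0
 7 0
 0 1
end

* Upper bound of each bracket, with ages 0 to 5 all in the first bracket
gen wanted = cond(age <= 5, 5, 5*ceil(age/5))

* One observation per bracket-gender cell; n holds the cell count
contract wanted male, freq(n)

* Negative counts place the male bars to the left of zero
gen plotn = cond(male == 1, -n, n)

twoway (bar plotn wanted if male == 1, horizontal barwidth(4)) ///
       (bar plotn wanted if male == 0, horizontal barwidth(4)), ///
       legend(order(1 "Male" 2 "Female")) ///
       ytitle("Age bracket (upper bound, years)") xtitle("Respondents") ///
       xlabel(-2 "2" -1 "1" 0 "0" 1 "1" 2 "2")
```

The xlabel() option relabels the negative ticks with their absolute values so that both sides read as counts; with real survey data the tick values would need to match the actual range of n.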
      Last edited by Andrew Musau; 14 Jun 2017, 07:02.
