  • Number of children in household variable from individual level dataset

    Dear all
    I am using the PALMS dataset (Post-Apartheid Labour Market Survey). It is a stacked cross-sectional dataset of the labour market dynamics survey in South Africa (and its predecessors) covering the years 1994 to 2017. It is at the individual level.
    I want to create a variable for the number of children in the household under the age of 7 (i.e. 'Young children present'). I gather you do this using the 'egen' command. The dataset has a household identifier and an individual identifier.

    The code that I have tried to run is:

    egen Y_CHILD = count(personnr) if age <= 7 , by (uqnr)
    * Counts number of young children in every uqnr (household identifier)
    egen YChild = max (YCHILD), by (uqnr)
    *Assigns total number of children to each individual within the same household
    tab YChild
    recode YChild (0/2=1 "1. 0 - 2 children") (3/4=2 "2. 3 - 4 children") (5/6=3 "3. 5 - 6 children") (7/max=4 "4. 7 or more children"), gen(YChild_cat)
    label var YChild_cat "Categorical variable: Number of Young Children present in Household"
    *Assess years
    tab YChild_cat if year == 2007
    tab YChild_cat if year == 2012
    tab YChild_cat if year == 2017

    However, this does not produce the desired result. I believe I am missing something fundamental about the egen command.
    Could anyone help?

  • #2
    this does not produce the desired result
    That's on all fours with "doesn't work". What is wrong specifically?

    Here is some technique that may help:

    Code:
    egen num_child = total(age <= 7) , by (uqnr)
    egen tag = tag(uqnr)
    tab num_child if tag
    The first command gives you the number of children under 8 in every observation in every household. The second and third insist that each household is tabulated just once; otherwise you are tabulating observations, not households. I guess that's the main problem here.
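
    As a minimal sketch of the difference between tabulating observations and tabulating households, here is the same technique on a made-up toy dataset. The three households and their ages are invented for illustration only; the variable names uqnr, personnr and age follow the question.

    ```stata
    clear
    input uqnr personnr age
    1 1 35
    1 2 5
    1 3 2
    2 1 40
    2 2 12
    3 1 29
    3 2 6
    end

    * Household count of members aged 7 or under, repeated on every member's row
    egen num_child = total(age <= 7), by(uqnr)

    * tag() marks exactly one observation in each household with 1, the rest with 0
    egen tag = tag(uqnr)

    list, sepby(uqnr)

    * Tabulating every observation counts people (7 rows here)
    tab num_child

    * Restricting to tagged observations counts households (3 rows here)
    tab num_child if tag
    ```

    In this toy data household 1 has two young children, household 2 has none and household 3 has one, so the second table has one household in each of the categories 0, 1 and 2.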

    See also

    Code:
    help egen
    for more on tag(). Recoding and restricting to years is extra: I am guessing that is not where your problem lies.

    By the way, watch out that


    Code:
    egen num_others = total(age > 7) , by (uqnr)
    catches missing values on age too; if you didn't want that, then

    Code:
    egen num_others = total(inrange(age, 8, .)) , by (uqnr)
    will not include those missings.
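
    A small sketch of that difference, on an invented one-household dataset in which one member has missing age (the data and the names others_naive and others_safe are made up for illustration):

    ```stata
    clear
    input uqnr age
    1 30
    1 5
    1 .
    end

    * Numeric missing is treated as larger than any number,
    * so age > 7 is true for the row with missing age
    egen others_naive = total(age > 7), by(uqnr)

    * inrange(age, 8, .) returns 0 whenever age is missing
    egen others_safe = total(inrange(age, 8, .)), by(uqnr)

    list
    ```

    Here others_naive is 2 (the 30-year-old plus the missing age) while others_safe is 1. Writing the condition as age > 7 & age < . should be an equivalent way to exclude the missings.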
    Last edited by Nick Cox; 08 May 2019, 03:26.

    • #3
      Thank you for your response.

      My confusion is that I do not know how to use the egen function tag() to pick out one observation per household and then report the number of children within each household.

      I have now attempted to use it; however, I am unsure whether it is correct.
      The code:

      egen hh_one=tag(uqnr)

      * tag() creates a dummy variable that is 1 for exactly one observation in each household
      * and 0 for all other observations in that household

      egen num_ychild = total(age <= 7) , by (uqnr)
      tab num_ychild if hh_one==1

      This produces the result:

       num_ychild |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |    339,982       56.12       56.12
                1 |     76,915       12.70       68.81
                2 |     57,762        9.53       78.34
                3 |     34,183        5.64       83.99
                4 |     42,876        7.08       91.06
                5 |      9,409        1.55       92.62
                6 |     11,541        1.90       94.52
                7 |      5,636        0.93       95.45
                8 |     13,499        2.23       97.68
                9 |      3,197        0.53       98.21
               10 |      2,099        0.35       98.55
               11 |      1,749        0.29       98.84
               12 |      3,353        0.55       99.40
               13 |        759        0.13       99.52
               14 |        599        0.10       99.62
               15 |        518        0.09       99.71
               16 |        799        0.13       99.84
               17 |        203        0.03       99.87
               18 |        200        0.03       99.90
               19 |        122        0.02       99.92
               20 |        180        0.03       99.95
               21 |         68        0.01       99.97
               22 |         43        0.01       99.97
               23 |         38        0.01       99.98
               24 |         56        0.01       99.99
               25 |         23        0.00       99.99
               26 |          8        0.00       99.99
               27 |          5        0.00       99.99
               28 |         13        0.00      100.00
               29 |          1        0.00      100.00
               30 |          4        0.00      100.00
               31 |          6        0.00      100.00
               32 |          6        0.00      100.00
               33 |          1        0.00      100.00
               34 |          3        0.00      100.00
               35 |          1        0.00      100.00
               36 |          1        0.00      100.00
               37 |          1        0.00      100.00
               40 |          1        0.00      100.00
               42 |          1        0.00      100.00
      ------------+-----------------------------------
            Total |    605,861      100.00

      How do I now restrict this to a single value for the number of children per household? That is, I would like to avoid having to use the 'if' qualifier in the command.

      Regards

      • #4
        Sorry, but I don't think I understand your new question.

        But at its simplest, the number of children in a household is the same for all people in a household. If you want to tabulate households not people, then either you need to use an if restriction to see each household just once, or you need a reduced dataset with one observation per household.
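
        The two options might look like the following sketch. The toy data are invented for illustration; the names uqnr, num_ychild and hh_one follow #3. Using preserve and restore to get a temporary household-level dataset is one possible way to do the reduction, assumed here rather than taken from the thread.

        ```stata
        clear
        input uqnr personnr age
        1 1 35
        1 2 5
        1 3 2
        2 1 40
        2 2 12
        3 1 29
        3 2 6
        end

        egen num_ychild = total(age <= 7), by(uqnr)
        egen hh_one = tag(uqnr)

        * Option 1: keep the person-level data and use an if restriction
        tab num_ychild if hh_one

        * Option 2: reduce temporarily to one observation per household,
        * so that no if restriction is needed
        preserve
        keep if hh_one
        keep uqnr num_ychild
        tab num_ychild
        restore
        ```

        Both tabulations should agree, with one row of data per household. After restore the person-level dataset is back in memory unchanged.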
