Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a new variable by Household ID

    I have a dataset where everybody has household ID and each respondent has a given economic status: "Unemployed/Student/Employed/Home Duties". I want to create a variable that records the proportion of the household which is unemployed. i.e. if there is a household of 4 members and 2 are unemployed, then I want this variable to record that 50% of the household is unemployed. Does anyone know how can I create a new variable by household ID to record this?

    Thanks,

    John

  • #2
    At 84 posts in, you may be expected to be familiar with the request to include a dataex example and if possible some code you tried so far.

    Comment


    • #3
      I agree with Jorrit Gosens You might find this example instructive:


      Code:
      . sysuse auto, clear
      (1978 Automobile Data)
      
      . egen pcforeign = mean(100 * (foreign == 1)), by(rep78)
      
      . tabdisp rep78, c(pcforeign)
      
      ----------------------
      Repair    |
      Record    |
      1978      |  pcforeign
      ----------+-----------
              1 |          0
              2 |          0
              3 |         10
              4 |         50
              5 |   81.81818
              . |         20
      ----------------------
      I can't suggest code for you because you don't give any variable names. Nor do you make clear whether your status variable is string or numeric with value labels.

      Comment


      • #4
        Hi all,

        Thank you Nick and Jorrit for very good points, and I do apologize for not being more explicit.


        In the dataset a respondent for the household is asked to describe the principle economic status of each household member, as below: Here PES P1 is the principle economic status of Person 1 in the household, this is repeated for person 2 (PES P2) and so on and so forth all the way up to PES P7. I have ID which is the households id. My intention is to create a by household variable which will describe how many members of each household are unemployed. Secondly I will create a variable that is the percent of household members who are unemployed in each household. I can do this as the variable MMA4 describes how many people are in each household.

        Code:
        
        . tab MMa5pesp1
        
                  PES P1 |      Freq.     Percent        Cum.
        -----------------+-----------------------------------
        School/Education |        176        1.58        1.58
        At work/training |      6,356       57.09       58.67
              Unemployed |        422        3.79       62.46
             Home duties |      4,061       36.47       98.93
                   Other |        112        1.01       99.94
              Don't Know |          7        0.06      100.00
        -----------------+-----------------------------------
                   Total |     11,134      100.00
        
        . tab MMa5pesp1, nolab
        
             PES P1 |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  2 |        176        1.58        1.58
                  3 |      6,356       57.09       58.67
                  4 |        422        3.79       62.46
                  6 |      4,061       36.47       98.93
                  7 |        112        1.01       99.94
                  9 |          7        0.06      100.00
        ------------+-----------------------------------
              Total |     11,134      100.00
        
        
        . tab MMA4
        
            A4. How |
        many people |
                 in |
          household |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  2 |        383        3.44        3.44
                  3 |      3,732       33.52       36.96
                  4 |      3,611       32.43       69.39
                  5 |      2,064       18.54       87.93
                  6 |        893        8.02       95.95
          7 or more |        451        4.05      100.00
        ------------+-----------------------------------
              Total |     11,134      100.00
        
        . tab MMA4, nolab
        
            A4. How |
        many people |
                 in |
          household |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  2 |        383        3.44        3.44
                  3 |      3,732       33.52       36.96
                  4 |      3,611       32.43       69.39
                  5 |      2,064       18.54       87.93
                  6 |        893        8.02       95.95
                  7 |        451        4.05      100.00
        ------------+-----------------------------------
              Total |     11,134      100.00

        My first approach was as below:

        Code:
        gen X_propunemphhsd_y0=.
        replace X_propunemphhsd_y0 = 1 if MMa5pesp1 == 4 & MMa5pesp2!=4 & MMa5pesp3!=4 & MMa5pesp4!=4 & MMa5pesp5!=4 & MMa5pesp6!=4 & MMa5pesp7!=4 
        replace X_propunemphhsd_y0 = 1 if MMa5pesp2 == 4 & MMa5pesp1!=4 & MMa5pesp3!=4 & MMa5pesp4!=4 & MMa5pesp5!=4 & MMa5pesp6!=4 & MMa5pesp7!=4 
        replace X_propunemphhsd_y0 = 1 if MMa5pesp3 == 4 & MMa5pesp2!=4 & MMa5pesp1!=4 & MMa5pesp4!=4 & MMa5pesp5!=4 & MMa5pesp6!=4 & MMa5pesp7!=4 
        replace X_propunemphhsd_y0 = 1 if MMa5pesp4 == 4 & MMa5pesp2!=4 & MMa5pesp3!=4 & MMa5pesp1!=4 & MMa5pesp5!=4 & MMa5pesp6!=4 & MMa5pesp7!=4 
        replace X_propunemphhsd_y0 = 1 if MMa5pesp5 == 4 & MMa5pesp2!=4 & MMa5pesp3!=4 & MMa5pesp4!=4 & MMa5pesp1!=4 & MMa5pesp6!=4 & MMa5pesp7!=4 
        replace X_propunemphhsd_y0 = 1 if MMa5pesp6 == 4 & MMa5pesp2!=4 & MMa5pesp3!=4 & MMa5pesp4!=4 & MMa5pesp5!=4 & MMa5pesp1!=4 & MMa5pesp7!=4 
        replace X_propunemphhsd_y0 = 1 if MMa5pesp7 == 4 & MMa5pesp2!=4 & MMa5pesp3!=4 & MMa5pesp4!=4 & MMa5pesp5!=4 & MMa5pesp6!=4 & MMa5pesp1!=4
        Where I created a variable that is only equal to one as above, with the intention of adding to this cumulatively to make the variable equal to 2, then 3, then 4, etc. But this approach is very brute-force and I think there must be a better way.

        Please find an example of my data as below:



        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input double(ID MMA4 MMa5pesp1 MMa5pesp2 MMa5pesp3 MMa5pesp4 MMa5pesp5 MMa5pesp6 MMa5pesp7)
         128900 7 2 . 3 3 3 2 2
         783000 6 2 . 3 3 3 3 .
         768500 4 2 . 1 1 . . .
         143100 3 2 . 2 . . . .
         265400 3 2 . 3 . . . .
         258900 3 2 . 2 . . . .
         108800 5 2 . 3 2 1 . .
         100100 4 2 . 6 2 . . .
         252500 3 2 . 4 . . . .
         777000 5 2 . 2 2 2 . .
          236900 7 2 . 1 3 3 2 2
         526900 3 2 . 3 . . . .
         226700 6 2 . 4 3 2 1 .
         906200 2 2 . . . . . .
         391500 7 2 . 3 6 2 2 2
         247800 3 2 . 1 . . . .
         505300 3 2 . 3 . . . .
         328700 6 2 . 3 6 2 2 .
          75300 7 2 . 3 6 2 2 2
         711300 2 2 . . . . . .
         338900 7 2 . 3 3 2 2 2
         635100 3 2 . 3 . . . .
         508800 3 2 . 3 . . . .
         541500 4 2 . 3 3 . . .
         443400 2 2 . . . . . .
        1025800 6 2 . 2 2 2 1 .
         158500 3 2 . 3 . . . .
        1087400 4 2 . 3 1 . . .
         493100 7 2 . 6 4 2 2 2
         109100 5 2 . 6 2 2 . .
         471100 2 2 . . . . . .
         474400 3 2 . 3 . . . .
         778000 5 2 . 9 3 3 . .
         120900 6 2 . 3 3 3 2 .
         203900 5 2 . 2 4 2 . .
         595900 6 2 . 4 2 2 7 .
         536900 4 2 . 3 2 . . .
         158700 3 2 . 3 . . . .
         311600 5 2 . 3 2 2 . .
          35100 3 2 . 2 . . . .
         267600 3 2 . 1 . . . .
          86100 3 2 . 4 . . . .
         688400 5 2 . 3 1 1 . .
         456500 4 2 . 2 2 . . .
         328600 6 2 . 3 3 3 3 .
         234300 7 2 . 4 6 4 2 2
         242800 2 2 . . . . . .
         699800 6 2 . 7 3 7 3 .
         324400 5 2 . 3 3 7 . .
        end
        label values MMA4 MMA4
        label def MMA4 7 "7 or more", modify
        label values MMa5pesp1 MMa5pesp1
        label def MMa5pesp1 2 "School/Education", modify
        label def MMa5pesp1 7 "Other", modify
        label values MMa5pesp2 MMa5pesp2
        label values MMa5pesp3 MMa5pesp3
        label def MMa5pesp3 1 "Pre-school", modify
        label def MMa5pesp3 2 "School/Education", modify
        label def MMa5pesp3 3 "At work/training", modify
        label def MMa5pesp3 4 "Unemployed", modify
        label def MMa5pesp3 6 "Home duties", modify
        label def MMa5pesp3 7 "Other", modify
        label def MMa5pesp3 9 "Don't Know", modify
        label values MMa5pesp4 MMa5pesp4
        label def MMa5pesp4 1 "Pre-school", modify
        label def MMa5pesp4 2 "School/Education", modify
        label def MMa5pesp4 3 "At work/training", modify
        label def MMa5pesp4 4 "Unemployed", modify
        label def MMa5pesp4 5 "Retired", modify
        label def MMa5pesp4 6 "Home duties", modify
        label values MMa5pesp5 MMa5pesp5
        label def MMa5pesp5 1 "Pre-school", modify
        label def MMa5pesp5 2 "School/Education", modify
        label def MMa5pesp5 3 "At work/training", modify
        label def MMa5pesp5 4 "Unemployed", modify
        label def MMa5pesp5 6 "Home duties", modify
        label def MMa5pesp5 7 "Other", modify
        label values MMa5pesp6 MMa5pesp6
        label def MMa5pesp6 1 "Pre-school", modify
        label def MMa5pesp6 2 "School/Education", modify
        label def MMa5pesp6 3 "At work/training", modify
        label def MMa5pesp6 6 "Home duties", modify
        label def MMa5pesp6 7 "Other", modify
        label values MMa5pesp7 MMa5pesp7
        label def MMa5pesp7 1 "Pre-school", modify
        label def MMa5pesp7 2 "School/Education", modify


        Person 2 is always missing their economic status as they are a baby.


        Comment


        • #5
          Contrary to any guess from #1 that an observation is an individual it seems that each observation is one household. From your output for

          Code:
           
          MMa5pesp1
          we infer that 2 means unemployed. That being so, you need to count instances of 2 across MMa5esp* and then divide by the number of people in the household. egen has a suitable row function for the counting.
          Last edited by Nick Cox; 15 Feb 2019, 07:31.

          Comment

          Working...
          X