
  • Create dummy variables for household groups based on characteristics of individuals in the household

    Hi,

    I need to create various dummy variables that would put households into groups based on characteristics of individuals in the household.

    One of the groups is a household which has at least one eligible son and one eligible daughter (I have already created dummy variables for individuals who are eligible, equal to one if the individual is eligible). I've tried using egen with the max() function but have had no luck (for instance, I've tried by hhid: egen hld_typ4 = max(eligible_son & eligible_daughter)). Have I simply got the syntax wrong, or is there another way?
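A minimal sketch of why that attempt gives 0 for every household, using three hypothetical rows (the names wrong, has_son, has_daughter and both are illustrative). The expression inside max() is evaluated one row, i.e. one person, at a time, and no single person is both an eligible son and an eligible daughter. Taking the household maximum of each indicator separately and then combining the two flags gives the intended result.

```stata
* Hypothetical toy data: household 1 has only an eligible daughter,
* household 2 has an eligible son and an eligible daughter
clear
input hhid line age str1 sex eligible_daughter eligible_son
1 3 11 F 1 0
2 3 11 M 0 1
2 4 11 F 1 0
end

* Wrong: "eligible_son & eligible_daughter" is computed row by row, and no
* single person is both, so the household maximum is always 0
bysort hhid: egen wrong = max(eligible_son & eligible_daughter)

* Right: household maximum of each indicator, then combine the two flags
bysort hhid: egen has_son      = max(eligible_son)
bysort hhid: egen has_daughter = max(eligible_daughter)
generate both = has_son & has_daughter

list hhid line wrong has_son has_daughter both, sepby(hhid)
```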

    I've used the following FAQ as guidance (http://www.stata.com/support/faqs/da...ble-recording/), but it's not clear to me how I would use max() on two variables.

    Any help would be much appreciated!

    Thanks


  • #2
    Welcome to the Stata Forum / Statalist.

    Unfortunately, you didn't share any data, so it is harder to provide a helpful reply.

    That said, if I understood your query, you could use egen with group() instead of max().

    This way, you'd get the combinations you mentioned in the first line.

    You cannot use by with group(), but I gather it is not necessary in your case.

    Hopefully that helps.
    Best regards,

    Marcos



    • #3
      I am unable to attach the data but hopefully the following table will give you an idea:

      HHID: household ID
      Line number: Individual entry within household
      Age: in years
      Sex: male or female
      eligible daughter: =1 if daughter and eligible (i.e. 11 years old)
      eligible son:=1 if son and eligible (i.e. 11 years old)
      HasEligible_daughter: all individuals in household assigned value 1 if at least one individual in the household is an eligible daughter
      HHID  Line number  Age  Sex  Eligible daughter  Eligible son  HasEligible_daughter
      1     1            40   M    0                  0             1
      1     2            37   F    0                  0             1
      1     3            11   F    1                  0             1
      1     4            10   F    0                  0             1
      2     1            45   M    0                  0             1
      2     2            38   F    0                  0             1
      2     3            11   M    0                  1             1
      2     4            11   F    1                  0             1
      3     1            55   M    0                  0             1
      3     2            50   F    0                  0             1
      3     3            11   F    1                  0             1
      3     4            13   F    1                  0             1
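For anyone who wants to reproduce the table above in Stata, the same data can be typed in with input, which is also the layout that the -dataex- command recommended on Statalist produces. The variable names are assumptions based on the column headings.

```stata
* Data from the table above; variable names are assumed from the headings
clear
input hhid line age str1 sex eligible_daughter eligible_son HasEligible_daughter
1 1 40 M 0 0 1
1 2 37 F 0 0 1
1 3 11 F 1 0 1
1 4 10 F 0 0 1
2 1 45 M 0 0 1
2 2 38 F 0 0 1
2 3 11 M 0 1 1
2 4 11 F 1 0 1
3 1 55 M 0 0 1
3 2 50 F 0 0 1
3 3 11 F 1 0 1
3 4 13 F 1 0 1
end

list, sepby(hhid)
```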
      I am trying to create dummy variables for the following groups of households:

      1. Households which have at least one eligible daughter: I've created a variable called HasEligible_daughter which equals one when at least one daughter in the household is eligible
      2. Households which have at least one eligible daughter AND at least one eligible son. I have dummy variables for each (equal to one when eligible)
      3. Households which have at least one eligible daughter AND at least one 10-year-old daughter
      4. Households which have at least one eligible daughter AND at least one daughter aged 12-14


      Do I need to use by to do this? I created the HasEligible_daughter dummy with the following code:
      Code:
      bysort hhid: egen HasEligible_daughter = max(eligible_daughter)
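A minimal sketch of all four household dummies, numbered as in the list above and using the data from the table. The by prefix (here bysort, which also sorts the data by hhid) is what spreads each household maximum to every member of the household. Assumption: there is no relationship-to-head variable, so female members aged 10 or 12-14 are treated as daughters; the helper names (daughter10, Has_daughter10 and so on) are hypothetical.

```stata
clear
input hhid line age str1 sex eligible_daughter eligible_son
1 1 40 M 0 0
1 2 37 F 0 0
1 3 11 F 1 0
1 4 10 F 0 0
2 1 45 M 0 0
2 2 38 F 0 0
2 3 11 M 0 1
2 4 11 F 1 0
3 1 55 M 0 0
3 2 50 F 0 0
3 3 11 F 1 0
3 4 13 F 1 0
end

* Hypothetical individual-level helpers (assumes girls of these ages are
* daughters; add a relationship condition if your data have one)
generate daughter10    = sex == "F" & age == 10
generate daughter12_14 = sex == "F" & inrange(age, 12, 14)

* Household-level flags: 1 for every member if any member qualifies
bysort hhid: egen HasEligible_daughter = max(eligible_daughter)
bysort hhid: egen HasEligible_son      = max(eligible_son)
bysort hhid: egen Has_daughter10       = max(daughter10)
bysort hhid: egen Has_daughter12_14    = max(daughter12_14)

* The four household types from the list above
generate hld_typ1 = HasEligible_daughter
generate hld_typ2 = HasEligible_daughter & HasEligible_son
generate hld_typ3 = HasEligible_daughter & Has_daughter10
generate hld_typ4 = HasEligible_daughter & Has_daughter12_14

list hhid line hld_typ1-hld_typ4, sepby(hhid)
```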

      Thanks,

      Sumayyah



      • #4
        It seems to me that the next step is to create HasEligible_son in the same way that you created HasEligible_daughter, then
        Code:
        generate hld_typ4 = HasEligible_son & HasEligible_daughter
        or equivalently (since both variables are 0/1)
        Code:
        generate hld_typ4 = HasEligible_son==1 & HasEligible_daughter==1
        or again equivalently
        Code:
        generate hld_typ4 = min(HasEligible_son,HasEligible_daughter)
        The same approach should deal with your other problems.
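A quick check, assuming both household flags are strictly 0/1, that the three versions agree on every combination of values:

```stata
* Every combination of two 0/1 flags
clear
input HasEligible_son HasEligible_daughter
0 0
0 1
1 0
1 1
end

generate hld_typ4_a = HasEligible_son & HasEligible_daughter
generate hld_typ4_b = HasEligible_son == 1 & HasEligible_daughter == 1
generate hld_typ4_c = min(HasEligible_son, HasEligible_daughter)

* Stops with an error if any of the three versions disagree
assert hld_typ4_a == hld_typ4_b & hld_typ4_b == hld_typ4_c
list
```

If either flag could be missing, the versions are no longer interchangeable: Stata treats a missing value as true in logical expressions, so HasEligible_son & HasEligible_daughter is 1 when one flag is missing and the other is 1, whereas the ==1 version gives 0.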



        • #5
          Thank you! I've created the various household types using the script above.

          My data now looks something like this (I've included a column for years of schooling and three different household types):
          HHID  Line no.  Age  Sex  Years of schooling  Eligible daughter  Eligible son  HasEligible_daughter  Hld_typ1  Hld_typ2  Hld_typ3
          1     1         40   M    7                   0                  0             1                     1
          1     2         37   F    7                   0                  0             1                     1
          1     3         11   F    5                   1                  0             1                     1
          1     4         10   F    4                   0                  0             1                     1
          2     1         45   M    6                   0                  0             1                     1
          2     2         38   F    4                   0                  0             1                     1
          2     3         11   M    6                   0                  1             1                     1
          2     4         11   F    5                   1                  0             1                     1
          3     1         55   M    8                   0                  0             1                     1
          3     2         50   F    4                   0                  0             1                     1
          3     3         11   F    4                   1                  0             1                     1
          3     4         13   F    4                   1                  0             1                     1

          Now I want to do some analysis of individuals within different types of households. For instance, I want to calculate the mean number of years of schooling for all 11-year-olds in household type 2. The example above includes only one household of this type, but let's say there are around 50. How would I calculate that, and do I need to use by?
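A minimal sketch using the children's rows from the table above, with hld_typ2 filled in from its definition (household 2 is the only one with both an eligible son and an eligible daughter). It assumes hld_typ2 is a 0/1 dummy that is constant within each household, so the if qualifier alone picks out the right individuals.

```stata
* Children's rows from the table; hld_typ2 filled in from its definition
clear
input hhid line age Years_of_schooling hld_typ2
1 3 11 5 0
1 4 10 4 0
2 3 11 6 1
2 4 11 5 1
3 3 11 4 0
3 4 13 4 0
end

* Individuals aged 11 living in type-2 households; hld_typ2 is the same
* for every member of a household, so no by: prefix is needed
summarize Years_of_schooling if age == 11 & hld_typ2 == 1
```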

          Thanks



          • #6
            Actually, I've just realised how simple that is!

            How would I calculate the distribution of educational attainment within each household type? For instance, I'd like to find out whether older siblings (say those aged 12-14) within a household have more years of schooling than younger siblings (say those aged 10-11), and how this differs by gender within each household type.



            • #7
              Sumayyah:
              - first of all, I think you can make -Hld_typ*- more efficient with -label-, creating a single variable instead of three:
              Code:
              replace Hld_typ1=2 if Hld_typ2==1
              replace Hld_typ1=3 if Hld_typ3==1
              rename Hld_typ1 Hld_typ_all
              label define Hld_typ_all 1 "Hld_typ1" 2 "Hld_typ2" 3 "Hld_typ3"
              label val Hld_typ_all Hld_typ_all
              drop Hld_typ2 Hld_typ3
              As far as your question is concerned, you may want to try:
              Code:
              tabstat Years_of_schooling if age==11 & Hld_typ_all==2, stat(count mean sd p50 min max)
              Last edited by Carlo Lazzaro; 04 Jun 2017, 08:45.
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                Sumayyah:
                you may want to try:
                Code:
                * younger siblings: aged 10-11; older siblings: aged 12-14; everyone else missing
                gen age_flag=0 if inrange(age,10,11)
                replace age_flag=1 if inrange(age,12,14)
                label define age_flag 0 "younger_siblings" 1 "older_siblings"
                label val age_flag age_flag
                bysort Hld_typ_all: regress Years_of_schooling i.age_flag
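To bring gender in as well, one option is a two-way table of mean schooling by age group and sex. A sketch using the children's rows from the table in #5, assuming sex is stored as the strings "M" and "F" (female is a hypothetical helper variable); restricting it to one household type is a matter of adding, say, if Hld_typ_all==2.

```stata
* Children's rows from the table in #5
clear
input hhid age str1 sex Years_of_schooling
1 11 F 5
1 10 F 4
2 11 M 6
2 11 F 5
3 11 F 4
3 13 F 4
end

* younger siblings: aged 10-11; older siblings: aged 12-14; others missing
generate age_flag = 0 if inrange(age, 10, 11)
replace  age_flag = 1 if inrange(age, 12, 14)
label define age_flag 0 "younger_siblings" 1 "older_siblings"
label values age_flag age_flag

* Hypothetical numeric version of sex for tabulation
generate female = sex == "F"
label define female 0 "boys" 1 "girls"
label values female female

* Mean, standard deviation and frequency of schooling in each age group x sex cell
tabulate age_flag female, summarize(Years_of_schooling)
```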
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Thanks, Carlo, for suggesting a more efficient labelling. I think I will keep the variables separate, as households may fall into more than one type (e.g. a household with an eligible daughter, a 10-year-old daughter and a 14-year-old daughter would satisfy all three household types; unlikely, but still theoretically possible).

                  Currently when I run "sum hld_typ*" it gives me the number of individuals that are in a household of that type. How do I check how many HOUSEHOLDS there are of each type?



                  • #10
                    Sumayyah:
                    if a family falls into two different levels of the categorical variable -Hld_typ_all-, you can simply assign it another value and then add that value to the -label- list.
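A minimal sketch of that idea, with hypothetical household-level dummies in which household 4 belongs to both type 2 and type 3; the code 4 and its label are illustrative. The order of the replace statements decides which code a household ends up with, so the combined category has to come last.

```stata
* Hypothetical household-level dummies; household 4 is both type 2 and type 3
clear
input hhid hld_typ1 hld_typ2 hld_typ3
1 1 0 1
2 1 1 0
3 1 0 0
4 1 1 1
end

* Later replace statements overwrite earlier ones
generate Hld_typ_all = 1 if hld_typ1 == 1
replace  Hld_typ_all = 2 if hld_typ2 == 1
replace  Hld_typ_all = 3 if hld_typ3 == 1
replace  Hld_typ_all = 4 if hld_typ2 == 1 & hld_typ3 == 1

label define Hld_typ_all 1 "Hld_typ1" 2 "Hld_typ2" 3 "Hld_typ3" 4 "Hld_typ2 and 3"
label values Hld_typ_all Hld_typ_all
tabulate Hld_typ_all
```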
                    That said, you may want to try:
                    Code:
                    egen flag=tag(HHID)
                    total flag, over(Hld_typ_all)
                    The following toy-example can hopefully help:
                    Code:
                    sysuse auto, clear
                    egen flag=tag(mpg)
                    total flag,over(foreign)
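Because the type dummies are to be kept separate, the same tag() idea can be applied to each dummy in turn. A sketch with hypothetical data (two members per household); the scalars n_typ1 to n_typ3 are illustrative names.

```stata
* Hypothetical data: three households, two members each
clear
input hhid line hld_typ1 hld_typ2 hld_typ3
1 1 1 0 1
1 2 1 0 1
2 1 1 1 0
2 2 1 1 0
3 1 1 0 0
3 2 1 0 0
end

* Tag one row per household so that each household is counted once
egen hh_tag = tag(hhid)

forvalues t = 1/3 {
    quietly count if hh_tag & hld_typ`t' == 1
    scalar n_typ`t' = r(N)
    display "hld_typ`t': " r(N) " households"
}
```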
                    Last edited by Carlo Lazzaro; 10 Jun 2017, 10:47.
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      Thank you that's worked!

                      I have calculated average years of schooling for various age groups within each type of household. Is there a way to store the various means and present them in a publication-quality table in Stata? Say, a table where the columns are the different household types and the rows are different age groups.
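One possible approach, sketched under assumptions: loop over the cells, store each mean in a matrix whose rows are age groups and whose columns are household types, then list it. Because a household can belong to more than one type, each column uses its own 0/1 dummy. The data are the children's rows from the table in #5 with hld_typ1 to hld_typ3 filled in from the definitions in #3; agegrp and M are hypothetical names. The official putexcel command can then write such a matrix to a spreadsheet.

```stata
* Children's rows from #5; type dummies filled in from the definitions in #3
clear
input hhid age Years_of_schooling hld_typ1 hld_typ2 hld_typ3
1 11 5 1 0 1
1 10 4 1 0 1
2 11 6 1 1 0
2 11 5 1 1 0
3 11 4 1 0 0
3 13 4 1 0 0
end

* Hypothetical age groups: 1 = aged 10-11, 2 = aged 12-14
generate agegrp = 1 if inrange(age, 10, 11)
replace  agegrp = 2 if inrange(age, 12, 14)

* Rows = age groups, columns = household types; empty cells stay missing
matrix M = J(2, 3, .)
forvalues g = 1/2 {
    forvalues t = 1/3 {
        quietly summarize Years_of_schooling if agegrp == `g' & hld_typ`t' == 1, meanonly
        if r(N) > 0 matrix M[`g', `t'] = r(mean)
    }
}
matrix rownames M = age10_11 age12_14
matrix colnames M = Hld_typ1 Hld_typ2 Hld_typ3
matrix list M, format(%4.2f)
```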



                      • #12
                        I have created a comparison group for each type of household, and I now want to carry out a t-test comparing each household type with its corresponding comparison group. How do I do a t-test for specific individuals within each household type, as opposed to the entire household? For instance, a t-test of the difference in average educational attainment between 10-year-old girls in household type 1 and those in the comparison households.



                        • #13
                          Sumayyah:
                          you may want to try:
                          Code:
                          replace hld_typ1=0 if hld_typ1==.
                          ttest yearsofschooling if age==10, by(hld_typ1) unequal
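To restrict the test to 10-year-old girls and to the matching comparison households only, all the conditions can go into the if qualifier. A sketch with hypothetical numbers, assuming sex is stored as "M"/"F" and that comp1 is a (hypothetical) 0/1 flag marking the comparison households for type 1, so that by() sees exactly two groups:

```stata
* Hypothetical data: households 1-3 and 7 are type 1, households 4-6 are comparisons
clear
input hhid age str1 sex Years_of_schooling hld_typ1 comp1
1 10 F 4 1 0
2 10 F 5 1 0
3 10 F 3 1 0
4 10 F 3 0 1
5 10 F 2 0 1
6 10 F 4 0 1
7 10 M 6 1 0
end

* 10-year-old girls in type-1 or comparison households only;
* group 1 of the output is hld_typ1==0 (comparison), group 2 is hld_typ1==1
ttest Years_of_schooling if age == 10 & sex == "F" & (hld_typ1 == 1 | comp1 == 1), by(hld_typ1) unequal
```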
                          Kind regards,
                          Carlo
                          (Stata 19.0)
