  • Appropriate egen=rowmax() command

    Hello everybody,

    I'm quite new to Stata so sorry if this question might seem a little trivial to you but I just cannot seem to find a solution for the following problem:

    I'm working with a data set where each id is one household, and the respondent is asked to give information about the other household members, such as age and gender, with 8 variables for each piece of information (kinship1-kinship8, age1-age8, etc.).

    For the first step, I've generated the number of children in each household. The problem that I'm now facing is generating the age and gender of the oldest child in each household.

    I've looked into the egen=rowmax() function which seems like the way to go for generating the age.

    [egen maxagec=rowmax(age1-age8) if chil > 0] was my first attempt, now knowing that I'm simply generating the max age for households with children in them.

    My problem now is to find a way to get the oldest child of the household, not the oldest member of households with children.

    After that, I would have to find a way to get to the sex of the oldest child for which I have yet to find the appropriate egen function as well.

    I would really appreciate your help with this issue.

    Best regards

  • #2
    Such problems are usually much easier in long format. In fact the final analysis will probably require long format anyhow.

    Code:
    // prepare some example data
    clear
    input famid kinship1 kinship2 kinship3 kinship4 age1 age2 age3 age4 female1 female2 female3 female4
    1 1 2 3 3 41 42 12 10 0 1 1 1
    2 1 2 4 3 41 42 81 10 0 1 1 0
    end
    label define kinship 1 "hh head" ///
                         2 "spouse"  ///
                         3 "child"   ///
                         4 "parent"  ///
                         5 "other, family" ///
                         6 "other, not family"
    label value kinship* kinship
    
    label define female 0 "male" ///
                        1 "female"
    label value female* female
    
    // look at the example data
    list
    
    // turn this into long format
    reshape long kinship age female, i(famid) j(persid)
    list, sepby(famid) // look at the result
    
    // find the oldest child
    bys famid kinship (age) : gen idoc = persid[_N] if kinship == 3
    list, sepby(famid) // look at the result
    bys famid (idoc) : replace idoc = idoc[1]
    list, sepby(famid) // look at the result
    
    // use that id to get the age and sex of the oldest child
    sort famid persid
    by famid : gen int  ocage           = age[idoc]
    by famid : gen byte ocfemale:female = female[idoc]
    
    list, sepby(famid) // look at the result
    This solution works because reshape makes the persid variable a sequential number, so the third observation in a family has persid 3.

    This solution does not take into account the possibility that the oldest child is one of a pair of dizygotic twins, in which case there are children of the same age (assuming that age is measured in years and not minutes) who could differ in sex. In that case it is random which child is deemed to be the oldest.
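
    A minimal sketch of how such ties could be flagged in the long layout. The toy data and the variable names ocage_chk and n_oldest are assumptions made for illustration; they are not part of the code above.

    ```stata
    // toy data in long layout: family 1 has two children of the same age
    clear
    input famid persid kinship age female
    1 1 1 41 0
    1 2 3 12 1
    1 3 3 12 0
    2 1 1 41 0
    2 2 3 10 0
    end

    // age of the oldest child per family (kinship == 3 marks a child)
    bys famid : egen ocage_chk = max(cond(kinship == 3, age, .))

    // number of children who share that age; values above 1 signal a tie
    bys famid : egen n_oldest = total(kinship == 3 & age == ocage_chk & !missing(age))

    assert n_oldest == 2 if famid == 1
    list, sepby(famid)
    ```

    Families with n_oldest greater than 1 could then be inspected by hand or handled with an explicit tie-breaking rule.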
    Last edited by Maarten Buis; 23 May 2018, 05:38.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      See the concurrent thread https://www.statalist.org/forums/for...ariable-dhs-hr . My contribution there makes the point that this kind of analysis is often much easier with a long layout (some say structure or format), in which each family member defines their own observation. But the article cited there (which is freely accessible) covers other tricks too.

      With your structure, there isn't going to be a single egen function that does what you want unless you write the code yourself. But egen functions for row-wise calculations just hide loops over variables in any case.
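
      A minimal sketch of that last point, assuming toy data with only three age variables (age1-age3): an explicit loop over the variables gives the same result as egen's rowmax(). The names maxage_egen and maxage_loop are made up for this illustration.

      ```stata
      // toy data; the last row is missing on every age variable
      clear
      input age1 age2 age3
      41 12 10
      42  . 81
       .  .  .
      end

      // row maximum via egen
      egen maxage_egen = rowmax(age1-age3)

      // the same calculation as an explicit loop over the variables
      gen maxage_loop = .
      forval j = 1/3 {
          replace maxage_loop = max(maxage_loop, age`j')
      }

      // the two versions should agree, including in the all-missing row
      assert maxage_egen == maxage_loop
      list
      ```

      Once the loop is written out, extra conditions (such as "only children") can be added to each replace, which is what a ready-made egen function does not allow.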

      Let's imagine female1-female8, where 1 means female and 0 means male, and age1-age8, which are the ages of the people in the household.

      Initialise variables for the age and gender of the oldest child.

      Code:
      gen age_o_c = .
      gen female_o_c = . 
      Now we loop over the members of the household. One criterion might be that children are 17 or younger. (Another criterion might be that the kinship variable indicates a child.) Warning: code not tested.

      Code:
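      quietly forval j = 1/8 {
          // raise the running maximum only for members aged 0 to 17
          replace age_o_c = max(age_o_c, age`j') if inrange(age`j', 0, 17)
          // copy the gender only from a child whose age equals the current maximum
          replace female_o_c = female`j' if age_o_c == age`j' & inrange(age`j', 0, 17)
      }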
      So, in pseudocode, the loop is:

      for each household member:
          if the age is 17 or younger and also bigger than any child's age seen so far, it becomes the new maximum
          update the female variable if this member is now the oldest child seen so far

      The paper cited in the linked thread explains that e.g. max(., 17) is returned as 17 so that initialising age of oldest child to missing is safe.
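
      A quick check of that behaviour, which can be typed at the command line:

      ```stata
      // max() ignores missing arguments unless all of them are missing
      display max(., 17)    // shows 17
      display max(., .)     // shows .
      ```

      So a household with no child keeps a missing age for the oldest child, while the first child found replaces the missing value.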

      Detail: What happens if there are two or more oldest children (twins, etc.) with the same age and different genders?

      If this is not enough of an answer, you may need to give a (realistic) data example, as we do always ask! (FAQ Advice #12).

      EDIT: This was written while Maarten was writing his reply, but the two replies don't contradict each other.


      • #4
        Hey Maarten, hey Nick,

        Thanks for your quick responses!
        @Maarten: I've had some trouble following your Syntax which is probably why I got this output:
        Code:
        variable id does not uniquely identify the observations
            Your data are currently wide.  You are performing a reshape long.  You specified i(famid) and j(persid).  In the current wide form, variable famid should uniquely
            identify the observations."
        @Nick:
        Code:
        quietly forval j= 2/8 { 
          replace age_o_c=max(age_o_c, hh`j'age) if inrange(hh`j'age,0,99)
          replace male_o_c=hh`j'sex if age_o_c==hh`j'age
          
          }
        I adjusted the code to my actual variables. It would be perfect if I could somehow integrate this suggestion of yours into the loop:
        (Another criterion might be that the kinship variable indicates a child.)
        I'm theoretically also interested in children aged 40 or 60 who are living at home. Is that possible?


        • #5
          Happy to help further once you follow up on my previous request:

          you may need to give a (realistic) data example, as we do always ask! (FAQ Advice #12).


          • #6
            Hi Nick,

            of course, I'm sorry for the late reply:

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input int respid byte(hh2kin-hh3sex)
            65   1   1  56   3   1
            66 -10 -10 -10 -10 -10
            68   1   2  44   3   1
            69   2   2  29 -10 -10
            70   1   1  47   3   2
            71   2   2  29   3   1
            72 -10 -10 -10 -10 -10
            73   1   2  33   3   2
            74   1   1  36 -10 -10
            75   2   2  64 -10 -10
            76   2   1  40   3   1
            77   1   2  50   3   2
            78   2   2  38   3   1
            end
            label values hh2kin hh2kin
            label def hh2kin -10 "FILTER", modify
            label def hh2kin 1 "SPOUSE", modify
            label def hh2kin 2 "PARTNER>", modify
            label values hh2sex hh2sex
            label def hh2sex -10 "FILTER", modify
            label def hh2sex 1 "MALE", modify
            label def hh2sex 2 "FEMALE", modify
            label values hh2age hh2age
            label def hh2age -10 "FILTER", modify
            label values hh3kin hh3kin
            label def hh3kin -10 "FILTER", modify
            label def hh3kin 3 "BIO. CHILD", modify
            label values hh3sex hh3sex
            label def hh3sex -10 "FILTER", modify
            label def hh3sex 1 "MALE", modify
            label def hh3sex 2 "FEMALE", modify
            I hope this is sufficient; let me know if you need more cases.

            Thanks again for your help!


            • #7
              Originally posted by Lydia Thornton
              @Maarten: I've had some trouble following your Syntax which is probably why I got this output:
              Code:
              variable id does not uniquely identify the observations
              Your data are currently wide. You are performing a reshape long. You specified i(famid) and j(persid). In the current wide form, variable famid should uniquely
              identify the observations."
              The code I gave you runs without an error message. (You copy the code, paste it into the do-file editor, and then run that do-file.) So apparently you are doing something else. If you don't tell me what that something else is, then I obviously cannot tell you what is wrong with it and how to fix it.



              • #8
                The code in #6 fails early on. It's not straight out of dataex. I see 6 variables, but I think there should be 7: respid plus kin, sex and age for each of hh2 and hh3.

                Here is a guess at syntax:

                Code:
                gen ocage = . 
                gen ocsex = . 
                
                forval j = 2/3 { 
                    replace ocage = max(ocage, hh`j'age) if hh`j'kin == 3 
                    replace ocsex = hh`j'sex if ocage == hh`j'age & hh`j'kin == 3 
                } 
                
                list
                Twins???
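
                A minimal sketch of one way to see whether ties occur, assuming toy data with the 7 variables guessed above. The variable n_oldest and the example values are made up for illustration.

                ```stata
                // toy data: household 2 has two children of the same age
                clear
                input respid hh2kin hh2sex hh2age hh3kin hh3sex hh3age
                1   1   1  40   3   2  12
                2   3   1  10   3   2  10
                3   1   2  35 -10 -10 -10
                end

                // age of the oldest child, as in the loop above
                gen ocage = .
                forval j = 2/3 {
                    replace ocage = max(ocage, hh`j'age) if hh`j'kin == 3
                }

                // count the children whose age equals that maximum
                gen byte n_oldest = 0
                forval j = 2/3 {
                    replace n_oldest = n_oldest + 1 if hh`j'kin == 3 & hh`j'age == ocage & !missing(ocage)
                }

                assert n_oldest == 2 if respid == 2
                list respid ocage n_oldest
                ```

                Households with n_oldest greater than 1 are the ones where the sex of "the" oldest child depends on the order of the loop.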
