  • Matching characteristics of children with mothers in the same household when there is more than one mother in the household

    Hello all,

    Using household survey data, I am trying to match each child's age with the mother's characteristics. The mother's age and her children's ages are all held in the same 'Age' variable.
    Note that a household can contain more than one child per mother and, in some cases, more than one mother (each with her own children); these are households with multiple families.

    There is a household ID used to identify members of a household and a unique person ID for each individual in the household. Each individual also has a variable which can be used to identify whether their mother is resident in the household.

    I have tried creating two separate datasets, one with information on mothers and one with information on children, and then merging them on a common identifier for mother and child. However, the mother's characteristics are then replicated once per child, which incorrectly increases the sample size. This method also does not account for more than one mother in a household, and it requires that all individuals other than mothers and children be removed from the dataset.

    Essentially, I want to calculate the mother's age at birth by subtracting her child's age from her age but in order to do this I need to:
    1. Match mothers with their children (including cases where a household contains more than one child per mother and more than one mother-and-children set)
    2. Create a variable where the child's age is on the same row as the mother's characteristics. This needs to be done for each child and each mother in the household.

    Any suggestions?

    Here is an example of the data - childage1 and childage2 are variables I want to create using information from the Age variable


    clear
    input HHID PID UQNR AGE Gender motherinhh PIDofmother Childage1 Childage2
    001 01 00101 45 M N
    001 02 00102 43 F N 12
    001 03 00103 12 F Y 02

    002 01 00201 39 F N 15 09
    002 02 00202 34 M N
    002 03 00203 15 M Y 01
    002 04 00204 09 M Y 01


    003 01 00301 75 F N 45
    003 02 00302 45 F Y 01 17 10
    003 03 00303 17 M Y 02
    003 04 00304 10 F N 02
    003 05 00305 38 F Y 09 06
    003 06 00306 09 M Y 05
    003 07 00307 06 M Y 05
    end


    Note:
    All variables are numeric
    HHID = household ID
    PID = person ID
    UQNR = unique person number
    Gender = gender where M = male and F = female
    motherinhh = mother in household where Y = yes N = no
    PIDofmother = PID of mother (extracted from the PID variable)
    Childage1 = child age 1 (extracted from the Age variable)


    Many thanks for any assistance received.
    Reesha

  • #2
    Two points
    1. Please use dataex to present a data example. Because you don't add quotation marks around your strings, your example data cannot be used without modifications.
    2. You still have to clarify how to resolve the issue of two mothers per household. How do you identify which child belongs to which mother?



    • #3
      Dear Andrew,

      Apologies for not using dataex. Here is an example of the data:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str13 uqnr str2 person str15 identify byte Gender int Age byte(Q14mpart Q14mpsnn)
      "1011001002201" "01" "101100100220101" 2 46 8 88
      "1011001002201" "02" "101100100220102" 2 25 1  1
      "1011001002201" "03" "101100100220103" 1 24 1  1
      "1011001002201" "04" "101100100220104" 1 21 1  1
      "1011001004901" "01" "101100100490101" 1 38 2 88
      "1011001004901" "02" "101100100490102" 2 38 8 88
      "1011001004901" "03" "101100100490103" 2 17 1  2
      "1011001004901" "04" "101100100490104" 1 13 1  2
      "1011001007501" "01" "101100100750101" 2 38 2 88
      "1011001007501" "02" "101100100750102" 2 10 1  1
      "1011001007501" "03" "101100100750103" 2  9 1  1
      "1011001007501" "04" "101100100750104" 2  6 1  1
      "1011001007502" "01" "101100100750201" 2 36 2 88
      "1011001007502" "02" "101100100750202" 2  9 2 88
      "1011001010101" "01" "101100101010101" 2 56 2 88
      "1011001012801" "01" "101100101280101" 1 49 8 88
      "1011001012801" "02" "101100101280102" 2 38 8 88
      "1011001012801" "03" "101100101280103" 1 16 1  2
      "1011001012801" "04" "101100101280104" 2 13 1  2
      "1011001015401" "01" "101100101540101" 1 75 8 88
      "1011001015401" "02" "101100101540102" 2 74 8 88
      end
      label values Gender Gender
      label def Gender 1 "Male", modify
      label def Gender 2 "Female", modify
      label values Q14mpart Q14mpart
      label def Q14mpart 1 "Yes", modify
      label def Q14mpart 2 "No", modify
      label def Q14mpart 8 "Not applicable", modify
      label values Q14mpsnn Q14mpsnn
      label def Q14mpsnn 88 "Not applicable", modify


      The variable Q14mpart identifies whether the mother is resident in the household, while Q14mpsnn gives the mother's person number (used to identify her within the household). Using Q14mpsnn, I am able to match mothers to their children, including in cases where there is more than one mother in the household. Unfortunately, the above example does not contain a household with more than one mother.

      Thanks.



      • #4
        Essentially, I want to calculate the mother's age at birth by subtracting her child's age from her age

        Thanks for the data example. You will need a bit of back and forth between data sets, but it should be nothing too complicated. I have modified your example slightly to include multiple mothers per household, but this should not matter for your end goal.


        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str13 uqnr str2 person str15 identify byte Gender int Age byte(Q14mpart Q14mpsnn)
        "1011001002201" "01" "101100100220101" 2 46 8 88
        "1011001002201" "02" "101100100220102" 2 25 1  1
        "1011001002201" "03" "101100100220103" 1 24 1  1
        "1011001002201" "04" "101100100220104" 1 21 1  1
        "1011001002201" "05" "101100100220111" 2 39 8 88
        "1011001002201" "06" "101100100220112" 2 15 1  5
        "1011001002201" "07" "101100100220113" 1 17 1  5
        "1011001004901" "01" "101100100490101" 1 38 2 88
        "1011001004901" "02" "101100100490102" 2 38 8 88
        "1011001004901" "03" "101100100490103" 2 17 1  2
        "1011001004901" "04" "101100100490104" 1 13 1  2
        "1011001007501" "01" "101100100750101" 2 38 2 88
        "1011001007501" "02" "101100100750102" 2 10 1  1
        "1011001007501" "03" "101100100750103" 2  9 1  1
        "1011001007501" "04" "101100100750104" 2  6 1  1
        "1011001007502" "01" "101100100750201" 2 36 2 88
        "1011001007502" "02" "101100100750202" 2  9 2 88
        "1011001010101" "01" "101100101010101" 2 56 2 88
        "1011001012801" "01" "101100101280101" 1 49 8 88
        "1011001012801" "02" "101100101280102" 2 38 8 88
        "1011001012801" "03" "101100101280103" 1 16 1  2
        "1011001012801" "04" "101100101280104" 2 13 1  2
        "1011001015401" "01" "101100101540101" 1 75 8 88
        "1011001015401" "02" "101100101540102" 2 74 8 88
        end
        label values Gender Gender
        label def Gender 1 "Male", modify
        label def Gender 2 "Female", modify
        label values Q14mpart Q14mpart
        label def Q14mpart 1 "Yes", modify
        label def Q14mpart 2 "No", modify
        label def Q14mpart 8 "Not applicable", modify
        label values Q14mpsnn Q14mpsnn
        label def Q14mpsnn 88 "Not applicable", modify
        In summary: create a children's data set, retrieve each mother's age, and merge it back into the children's data set. Finally, append the result to the full data set with the children's observations excluded.


        Code:
        *ENCODE STRINGS
        foreach var in uqnr person identify{
        encode `var', gen (r`var') 
         }
        
        *SAVE FULL DATA SET IN A TEMPORARY FILE
        tempfile data
        save `data'
        
        *CREATE CHILDREN'S DATA SET
        gen child= Q14mpsnn!= 88
        gen SHHID= string(ruqnr) + string(Q14mpsnn)
        tempfile children
        save `children'
        
        
        *CREATE DATA SET WITH MOTHERS' AGES
        use `data'
        gen SHHID= string(ruqnr)+string(rperson) if Q14mpsnn ==88 & Gender==2
        keep SHHID Age 
        rename Age Mage
        contract SHHID Mage, nomiss
        drop _freq
        tempfile mage
        save `mage'
        merge 1:m SHHID using `children'
        drop _merge
        keep if child==1
        tempfile mcage
        save `mcage'
        
        *REVERT TO FULL DATASET WITH MOTHERS' AGES INCLUDED
        use `children'
        drop if child==1
        gen Mage=.
        append using `mcage'
        sort ruqnr rperson
        list uqnr person Gender Age Mage child , sepby(ruqnr)
        Code:
        . list uqnr person Gender Age Mage child , sepby(ruqnr)
        
             +------------------------------------------------------+
             |          uqnr   person   Gender   Age   Mage   child |
             |------------------------------------------------------|
          1. | 1011001002201       01   Female    46      .       0 |
          2. | 1011001002201       02   Female    25     46       1 |
          3. | 1011001002201       03     Male    24     46       1 |
          4. | 1011001002201       04     Male    21     46       1 |
          5. | 1011001002201       05   Female    39      .       0 |
          6. | 1011001002201       06   Female    15     39       1 |
          7. | 1011001002201       07     Male    17     39       1 |
             |------------------------------------------------------|
          8. | 1011001004901       01     Male    38      .       0 |
          9. | 1011001004901       02   Female    38      .       0 |
         10. | 1011001004901       03   Female    17     38       1 |
         11. | 1011001004901       04     Male    13     38       1 |
             |------------------------------------------------------|
         12. | 1011001007501       01   Female    38      .       0 |
         13. | 1011001007501       02   Female    10     38       1 |
         14. | 1011001007501       03   Female     9     38       1 |
         15. | 1011001007501       04   Female     6     38       1 |
             |------------------------------------------------------|
         16. | 1011001007502       01   Female    36      .       0 |
         17. | 1011001007502       02   Female     9      .       0 |
             |------------------------------------------------------|
         18. | 1011001010101       01   Female    56      .       0 |
             |------------------------------------------------------|
         19. | 1011001012801       01     Male    49      .       0 |
         20. | 1011001012801       02   Female    38      .       0 |
         21. | 1011001012801       03     Male    16     38       1 |
         22. | 1011001012801       04   Female    13     38       1 |
             |------------------------------------------------------|
         23. | 1011001015401       01     Male    75      .       0 |
         24. | 1011001015401       02   Female    74      .       0 |
             +------------------------------------------------------+
        "Mage" is the mother's age. From here, you can do the subtraction.
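
        For comparison, here is a minimal sketch of a more compact route to the same mother's age, using a single m:1 merge on the household ID plus the mother's person number. It can be run with either the example data or the result of the code above in memory. It assumes that uqnr and person together identify each person uniquely and that Q14mpsnn holds the mother's person number; pnum, mother_age and age_at_birth_alt are hypothetical names introduced only for illustration. Keeping the two keys as separate variables also avoids any ambiguity from concatenating numbers into one string (string(1) + string(11) and string(11) + string(1) both give "111").

        ```stata
        * Sketch only, under the assumptions stated above.
        destring person, generate(pnum)      // numeric person number, e.g. "01" -> 1
        isid uqnr pnum                       // stops with an error if the key is not unique

        * Build a lookup of every person's age, keyed the way children point to mothers
        preserve
        keep uqnr pnum Age
        rename (pnum Age) (Q14mpsnn mother_age)
        tempfile lookup
        save `lookup'
        restore

        * Each child picks up the age of the person whose number equals Q14mpsnn;
        * rows with Q14mpsnn == 88 find no match and keep mother_age missing
        merge m:1 uqnr Q14mpsnn using `lookup', keep(master match) nogenerate
        gen age_at_birth_alt = mother_age - Age
        sort uqnr person
        list uqnr person Age mother_age age_at_birth_alt, sepby(uqnr)
        ```

        Because the lookup has one row per person, the merge never adds rows to the full data, so mothers are not duplicated.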



        • #5
          Dear Andrew,

          Thank you for your response, I really appreciate it. But as mentioned in my first post, I need for the mother to take on the ages of her children - so the age of each child needs to be on the same row as the mother's characteristics. This would ideally be presented in a new variable for each child's age. For example: childage1, childage2, ...., childage12.

          The code provided above results in a mother being counted more than once if she has more than one child, and using a onememb variable does not work in cases with multiple mothers in a household.

          I have tried a few ways but cannot figure out how to stop overcounting mothers who have more than one child and undercounting when there is more than one mother in a household.

          Thank you very much once again.



          • #6
            To get to a single observation per mother with separate variables for each child's age, use -reshape wide-. Read -help reshape-.

            That said, be warned that arranging the data that way is usually a bad idea in Stata. Most Stata data management and analysis commands work best (or only) with the data in the long layout that Andrew's code gives you. There are some exceptions, and perhaps what you plan to do next is among them. But unless you are pretty sure that you are going to do something that really requires the wide layout, you are best advised to leave your data long, as they are.
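
            For reference, a minimal sketch of what such a -reshape wide- could look like. It assumes the variables from the data example (uqnr, Age, Q14mpsnn, with 88 meaning "not applicable"); k and childage are hypothetical names chosen for illustration. The -preserve-/-restore- pair leaves the long data untouched.

            ```stata
            * Sketch only, under the assumptions stated above.
            preserve
            keep if Q14mpsnn != 88                    // children with a mother in the household
            keep uqnr Q14mpsnn Age
            bysort uqnr Q14mpsnn (Age): gen k = _n    // child index within mother, youngest first
            rename Age childage
            reshape wide childage, i(uqnr Q14mpsnn) j(k)
            list, sepby(uqnr)                         // one row per mother: childage1, childage2, ...
            restore
            ```

            Each resulting row is keyed by the household ID and the mother's person number, so it could be merged back onto the mother's own observation if the wide layout is really needed.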



            • #7
              Hello Clyde,

              Thank you for this.

              All of this preparation is towards the development of an aged_mom variable where 1 = mothers who gave birth between ages 30-49 years and 0 = other.
              In order to calculate the mother's age at birth for all of her births (to determine whether at least one birth occurred between ages 30-49 years), I would need for the children's ages to be on the same row as the mother's characteristics.

              I understand that reshaping is not the best option and that is why I am looking for an alternative way of creating the aged_mom variable or trying to ascertain whether it is even possible to create this variable given the structure of the data.

              I have been trying various options for months and have still not come across a suitable method.



              • #8
                In order to calculate the mother's age at birth for all of her births (to determine whether at least one birth occurred between ages 30-49 years), I would need for the children's ages to be on the same row as the mother's characteristics.
                No, that is not true at all. In fact, it will be harder to do it that way. If you leave the data in long layout, you can very easily identify mothers who gave birth between ages 30 and 49 years:

                Code:
                gen age_at_birth = Mage - Age
                by uqnr, sort: egen gave_birth_30_to_49 = max(inrange(age_at_birth, 30, 49))



                • #9
                  I agree with Clyde's remarks. Given that "uqnr" is the household identifier, his code in #8 needs to be changed slightly to accommodate multiple mothers per household. A sub-household identifier is therefore required when specifying the age condition (i.e., a modified SHHID from my code in #4). It will also be created for single-unit households (for the reason that "every set is a subset of itself").

                  Code:
                  gen mother= Q14mpsnn ==88 & Gender==2
                  gen SHHID2=SHHID
                  replace SHHID2=  subinstr(SHHID, "88", "", .)+ string(rperson) if mother
                  gen age_at_birth = Mage - Age
                  *CONSIDER MOTHERS WHO GAVE BIRTH BETWEEN 18-21 YRS. (COMPARE FOR THE FIRST HOUSEHOLD)
                  by SHHID2, sort: egen gave_birth_18_to_21 = max(inrange(age_at_birth, 18, 21))
                  sort uqnr person
                  To count how many mothers satisfy this condition

                  Code:
                  count if gave_birth_18_to_21 & mother




                  • #10
                    Hello,

                    Thank you very much for the clarification and codes.

                    I have spent some time understanding the code and then attempted to execute it. However, I get the following message when the loop in post #4 executes.

                    too many values
                    r(134);


                    --------------------------------------------------------------------------------
                    search for r(134) (manual: [R] search)
                    --------------------------------------------------------------------------------

                    Search of official help files, FAQs, Examples, SJs, and STBs

                    [P] error . . . . . . . . . . . . . . . . . . . . . . . . Return code 134
                    too many values;
                    1) You attempted to encode a string variable that takes on
                    more than 65,536 unique values. 2) You attempted to tabulate
                    a variable or pair of variables that take on too many values.
                    If you specified two variables, try interchanging them.
                    3) You issued a graph command using the by option. The
                    by-variable takes on too many different values to construct
                    a readable chart.

                    (end of search)


                    The dataset has 102 359 variables, is there a way to deal with this?
                    If I drop adult males from the dataset I am left with 73 909 observations and thus still get the same error when I run the code. I cannot shrink the dataset anymore because I need mothers and children.

                    Thank you for all the assistance.



                    • #11
                      The issue is not with the number of variables, it's the number of distinct values of uqnr, person, and identify. Getting rid of the adult males certainly makes sense as they contribute no information for the current problem.

                      But with 73,909 remaining observations, you are still going to encounter the limit of 65,536 unique values with -encode-. Fortunately, there is an easy workaround. The only purpose of -encode- here is to create numeric variables in one-to-one correspondence with these string variables. So you can do that with -egen-'s -group()- function, which has no limit on the number of distinct values it can handle.

                      Code:
                      *ENCODE STRINGS
                      foreach var in uqnr person identify{
                      //    encode `var', gen (r`var')  REPLACE THIS WITH THE NEXT LINE
                            egen long r`var' = group(`var')
                      }
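
                      To check how close a string variable comes to the -encode- limit, a minimal sketch like the following counts its distinct values. It uses identify from the data example; first_obs is a hypothetical flag name, and note that -bysort- changes the sort order of the data.

                      ```stata
                      * Sketch only: count the distinct values of identify without -encode-.
                      bysort identify: gen byte first_obs = _n == 1   // 1 on the first row of each distinct value
                      count if first_obs                              // number of distinct values
                      drop first_obs
                      ```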



                      • #12
                        Hello,

                        Thank you Clyde, your code in post #11 worked wonderfully.

                        However, (I am really sorry about this) I am having more problems with the code not running.

                        I receive the following error message when I try to run the first command (use `data') under the "CREATE DATA SET WITH MOTHERS' AGES" section of post #4.

                        . use `data'
                        invalid file specification
                        r(198);

                        end of do-file




                        search for r(198) (manual: [R] search)
                        --------------------------------------------------------------------------------

                        Search of official help files, FAQs, Examples, SJs, and STBs

                        [P] error . . . . . . . . . . . . . . . . . . . . . . . . Return code 198
                        invalid syntax;
                        option __________ incorrectly specified;
                        option __________ not allowed;
                        __________ invalid;
                        range invalid;
                        __________ invalid obs no;
                        invalid filename;
                        __________ invalid varname;
                        __________ invalid name;
                        multiple by's not allowed;
                        __________ found where number expected;
                        on or off required;
                        All items in this list indicate invalid syntax. These errors
                        are often, but not always, due to typographical errors. Stata
                        attempts to provide you with as much information as it can.
                        Review the syntax diagram for the designated command.
                        In giving the message "invalid syntax", Stata is not helpful.
                        Errors in specifying expressions often result in this message.

                        (end of search)


                        The code before this line runs well. Please advise, as I am not sure how to deal with this error.

                        Thank you.



                        • #13
                          If you run the code in a do file, you should select and run the full code from the start. Alternatively, you can run the code directly in the command window.



                          • #14
                            Andrew is absolutely correct. Whenever code includes local macros (and tempfiles are local macros), you must run the entire code from start to finish in one fell swoop. If you run a block of code that defines a local macro, the definition of the local macro "expires" after the block runs. (And if the local macro is a tempfile, the actual file itself gets erased.) If you then try to refer to that same local macro in code run separately, the local macro will be undefined and the code will fail (in various ways depending on the particular command that the undefined macro is found in).

                            So, load up the code in #4 into the do-file editor and run the whole thing. Do not select parts of it to run a little at a time.
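
                            A minimal sketch of the behaviour, using the auto data shipped with Stata; the tempfile name demo is just an illustration.

                            ```stata
                            * Sketch only: run this whole block at once from the do-file editor.
                            sysuse auto, clear
                            tempfile demo              // defines the local macro `demo' holding a temporary filename
                            save `demo'
                            display `"`demo'"'         // the path is visible while the macro is defined
                            use `demo', clear          // works: same run, so `demo' is still defined
                            ```

                            If the last line is selected and run on its own afterwards, `demo' is no longer defined and expands to nothing, so -use- receives no filename. That is the situation described in #12.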



                            • #15
                              Dear Clyde and Andrew,

                              Thank you very much for your assistance - this is my first time using temp files so there is still a lot for me to learn. Thank you for your patience.

                              I ran the code in post #4 as a whole but received the following error:

                              . merge 1:m SHHID using `children'
                              variable SHHID does not uniquely identify observations in the master data
                              r(459);

                              end of do-file



                              search for r(459) (manual: [R] search)
                              --------------------------------------------------------------------------------

                              Search of official help files, FAQs, Examples, SJs, and STBs

                              [P] error . . . . . . . . . . . . . . . . . . . . . . . . Return code 459
                              something that should be true of your data is not;
                              data have changed since estimation;
                              This is the generic form of this message; more likely, you
                              will see messages such as "y must be between 0 and 1" or
                              "x not positive". You have attempted to do something that,
                              given your data, does not make sense.

                              (end of search)


                              The error was from the 9th line of the code below:

                              *CREATE DATA SET WITH MOTHERS' AGES
                              use `data'
                              gen SHHID= string(ruqnr)+string(rperson) if Q14mpsnn ==88 & Gender==2
                              keep SHHID Age
                              rename Age Mage
                              contract SHHID Mage, nomiss
                              drop _freq
                              tempfile mage
                              save `mage'
                              merge 1:m SHHID using `children'
                              drop _merge
                              keep if child==1
                              tempfile mcage
                              save `mcage'


                              I am very grateful for all the assistance.
