  • inspecting and comparing same variable content in relation to other variable

    I am new to Stata and need help extracting wrong inputs from a dataset.
    The dataset contains information on household members, for example age, marital status, relationship to head of household, gender, and a unique reference number. These are represented with numerical codes, as indicated in the table below.
    From the table, notice that the ages of some children in a household are higher than those of their parents (Head and Spouse).
    I need syntax that can extract or list all households with such abnormal ages, where:
    1. The age of a child or grandchild is above that of the head or spouse
    2. The age difference between Spouse (mother) and child is less than child-bearing age
    3. The age difference between Head (father) and child is less than child-bearing age

    The sample data below are from 3 different households, 2 from the same county and one from a different county:
    ref num of first household = Jon/home/1
    ref num of second household = Mon/arm/1
    ref num of third household = Can/arm/2
    Thanks for your kind assistance.

    ref num         name           relationship   relationship code   gender   Age
    Jon/home/1/01   John Gore      Head           1                   male     56
    Jon/home/1/02   Susan Gore     Spouse         2                   female   53
    Jon/home/1/03   George Gore    child          3                   male     58
    Jon/home/1/04   Mark Gore      child          3                   male     49
    Jon/home/1/05   Susan Gore     Grandchild     4                   female   16
    Jon/home/1/06   Mathew Gore    child          3                   male     20
    Mon/arm/1/01    Mary Robert    Head           1                   female   30
    Mon/arm/1/02    David Robert   child          3                   male     25
    Mon/arm/1/03    Fidel Robert   child          3                   male     10
    Can/arm/2/01    Okon David     Head           1                   male     45
    Can/arm/2/02    Mike David     child          3                   male     17
    Can/arm/2/03    Martha David   child          3                   female   47

  • #2
    Some of your conditions do not automatically signal erroneous observations. For example, some people have spouses younger than their children. Secondly, you assume that the spouse is the mother of the children living within a household or the head is the father, which realistically is not always the case (#1 implies 2 here). In any case, here is one way to tag observations that satisfy your conditions.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str13 refnum str12 name str10 relationship byte relationshipcode str6 gender byte age
    "Jon/home/1/01" "John Gore"    "Head"       1 "male"   56
    "Jon/home/1/02" "Susan Gore"   "Spouse"     2 "female" 53
    "Jon/home/1/03" "George Gore"  "child"      3 "male"   58
    "Jon/home/1/04" "Mark Gore"    "child"      3 "male"   49
    "Jon/home/1/05" "Susan Gore"   "Grandchild" 4 "female" 16
    "Jon/home/1/06" "Mathew Gore"  "child"      3 "male"   20
    "Mon/arm/1/01"  "Mary Robert"  "Head"       1 "female" 30
    "Mon/arm/1/02"  "David Robert" "child"      3 "male"   25
    "Mon/arm/1/03"  "Fidel Robert" "child"      3 "male"   10
    "Can/arm/2/01"  "Okon David"   "Head"       1 "male"   45
    "Can/arm/2/02"  "Mike David"   "child"      3 "male"   17
    "Can/arm/2/03"  "Martha David" "child"      3 "female" 47
    end
    
    *#1. The age of a child or grandchild is above that of the head or spouse
    bys refnum: egen tag= min(age) if inlist(relationshipcode, 1, 2)
    bys refnum: egen lowage= max(tag)
    gen problematic1= age>=lowage if inlist(relationshipcode, 3, 4)
    
    *# 2. The age difference between Spouse(mother) and child is not up to child bearing age
    *(ASSUMING CHILDBEARING AGE IS AGE>=12)
    bys refnum: gen tag2= age if relationshipcode==2
    bys refnum: egen spouseage= max(tag2)
    gen problematic2= (spouseage-age) <12 if relationshipcode==3
    
    *#3. The age difference between Head(Father) and child is not up to child bearing age.
    *(I SET AGE AT 12 BUT YOU ARE THE EXPERT HERE; REPLACE OTHERWISE)
    bys refnum: gen tag3= age if relationshipcode==1
    bys refnum: egen hhage= max(tag3)
    gen problematic3= (hhage-age) <12 if relationshipcode==3

    ADDED IN EDIT: Running the code, it is apparent that refnum is not the unique household identifier. Replace this variable in the code with the household id.
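
As a rough sketch of the replacement described in the edit note, a household id could be derived from refnum by dropping the final "/" and the member number. This ASSUMES every refnum ends in "/" followed by the member number, as in the sample data; the variable name hhid is chosen here for illustration, and strrpos() needs Stata 14 or later. Only a subset of the sample rows is used.

```stata
* Sketch: derive a household id from refnum, ASSUMING every refnum
* ends in "/" followed by the member number (e.g. "Jon/home/1/01").
clear
input str13 refnum byte relationshipcode byte age
"Jon/home/1/01" 1 56
"Jon/home/1/02" 2 53
"Jon/home/1/03" 3 58
"Mon/arm/1/01"  1 30
"Mon/arm/1/02"  3 25
end

* strrpos() returns the position of the last "/" in refnum;
* keep everything before it as the household id
gen hhid = substr(refnum, 1, strrpos(refnum, "/") - 1)

* Check #1 again, now grouping by household rather than by person
bysort hhid: egen lowage = min(cond(inlist(relationshipcode, 1, 2), age, .))
gen problematic1 = age >= lowage if inlist(relationshipcode, 3, 4)
list hhid refnum age lowage problematic1, sepby(hhid)
```

With the household id in place, the remaining checks can use hhid wherever the code above uses refnum.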
    Last edited by Andrew Musau; 02 Aug 2019, 07:56.



    • #3
      Thanks a lot Andrew, it worked!
      Where spouses are younger than children, those will be adopted children or stepchildren; there is a different relationshipcode for stepchildren.
      Grateful if you can also help with syntax to identify all households with duplicate Heads.
      Very grateful. Thanks



      • #4
        Code:
        bys hhid (relationshipcode): egen wanted= max(relationshipcode==1 & relationshipcode[_n+1]==1)
        tab hhid if wanted
        where "hhid" is the household id. See also

        Code:
        help duplicates
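
An alternative sketch that does not depend on sort order or on the subscript [_n+1]: count the heads in each household and flag any household with more than one. The egen documentation cautions against explicit subscripting (_n, _N) inside egen, so this version may be the safer one to adapt. The data below are made up, and nheads and duphead are names chosen for illustration.

```stata
* Sketch with made-up households; hhid stands for the household id.
clear
input byte hhid byte relationshipcode
1 1
1 1
1 3
2 1
2 2
end

* Count heads per household; the logical expression is 1 for each head
bysort hhid: egen nheads = total(relationshipcode == 1)

* More than one head flags a duplicate
gen byte duphead = nheads > 1
tab hhid if duphead
```

The same count also reveals households with no head at all (nheads == 0), which may be worth checking.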



        • #5
          Thanks a lot, Andrew, very much appreciated



          • #6
            Here's a way to rewrite the excellent code of Andrew Musau

            Code:
            *#1. The age of a child or grandchild is above that of the head or spouse
            bys refnum: egen lowage = min(cond(inlist(relationshipcode, 1, 2), age, .)) 
            gen problematic1 = age>=lowage if inlist(relationshipcode, 3, 4)
            
            *# 2. The age difference between Spouse(mother) and child is not up to child bearing age
            *(ASSUMING CHILDBEARING AGE IS AGE >= 12)
            bys refnum: egen spouseage = max(cond(relationshipcode==2, age, .)) 
            gen problematic2 = (spouseage - age) < 12 if relationshipcode==3
            
            *#3. The age difference between Head(Father) and child is not up to child bearing age.
            *(I SET AGE AT 12 BUT YOU ARE THE EXPERT HERE; REPLACE OTHERWISE)
            bys refnum: egen hhage = max(cond(relationshipcode==1, age, .)) 
            gen problematic3 = (hhage - age) < 12 if relationshipcode==3
            For more discussion, see Section 9 in https://www.stata-journal.com/articl...article=dm0055



            • #7
              Thanks Andrew and Nick. I need further help, please.
              Using the same dataset and variables, I would like to find:
              (1) Households where a male head of household is married to a male spouse, or a female head of household is married to a female spouse (that is, within a household, relationshipcode = 1 with gender = male, and relationshipcode = 2 (spouse) with gender = male).
              (2) Households where the head is widowed (male or female) but is still indicated to have a spouse (that is, households where relationshipcode = 1 and maritalstatuscode = 4 (widowed), but there is a spouse (relationshipcode 2) within the household).
              Marital status is represented in codes: married = 1, separated = 2, divorced = 3, widowed = 4, not married = 5, others = 6
              ref num         name           relationship   relationship code   gender   Age   maritalstatuscode
              Jon/home/1/01   John Gore      Head           1                   male     56    1
              Jon/home/1/02   Susan Gore     Spouse         2                   male     53    1
              Jon/home/1/03   George Gore    child          3                   male     58    5
              Jon/home/1/04   Mark Gore      child          3                   male     49    1
              Jon/home/1/05   Susan Gore     Grandchild     4                   female   16    5
              Jon/home/1/06   Mathew Gore    child          3                   male     20    6
              Mon/arm/1/01    Mary Robert    Head           1                   female   30    4
              Mon/arm/1/02    David Robert   Spouse         2                   male     25    1
              Mon/arm/1/03    Fidel Robert   child          3                   male     10    5
              Can/arm/2/01    Okon David     Head           1                   male     45    1
              Can/arm/2/02    Mike David     child          3                   male     17    5
              Can/arm/2/03    Martha David   child          2                   female   47    4



              • #8
                I would also like to know how to list or identify single names in a dataset containing first names and surnames,
                that is, where a name is supposed to be "john smith" but we have only "john" and no other name.



                • #9
                  (1) households where a male head of household is married to a male spouse
                  and where a female head of household is married to a female spouse.(That is to say within a household relationshipcode =1,gender=male, and relationshipcode(spouse)=2 and gender=male)
                  This assumes that there are no duplicate heads in a household. Again, -hhid- below refers to the household identifier.

                  Code:
                  bys hhid (relationshipcode): egen wanted1= max(cond((relationshipcode==1 & gender=="female" ///
                  & maritalstatuscode==1 & relationshipcode[_n+1]==2 & gender[_n+1]=="female" & ///
                  maritalstatuscode[_n+1]==1) | (relationshipcode==1 & gender=="male" & maritalstatuscode==1 ///
                  & relationshipcode[_n+1]==2 & gender[_n+1]=="male" & maritalstatuscode[_n+1]==1), 1, .))
                  (2) Household where the head is widowed(male or female) but still indicated to have a spouse.(That is households where relationshipcode=1 marital statuscode(widowed)=4 but there's a spouse(relationshipcode 2) within the household.
                  Code:
                  bys hhid (relationshipcode): egen wanted2 = max(cond(relationshipcode==1 & maritalstatuscode ==4 ///
                  &relationshipcode[_n+1]==2,1, .))
                  Also like to know how to list/identify single names from a data set containing first names and surnames.
                  …..that is where a name is supposed to be "john smith" but we have only "john" and no other name.
                  Code:
                  gen firstname = word(name, 1)
                  list if name==firstname
                  or more easily

                  Code:
                  list name if wordcount(name)==1
                  In future, please use dataex to present data examples. This makes it easier both for you, in terms of getting a direct solution, and for those who want to respond, in terms of trying out code. (For example, for a variable with value labels, using dataex makes it unnecessary for you to explain which values relate to which categories.)
                  Last edited by Andrew Musau; 26 Aug 2019, 06:11.



                  • #10
                    Thanks so much for all your help. Truly appreciated

                    2) Household where the head is widowed(male or female) but still indicated to have a spouse.(That is households where relationshipcode=1 marital statuscode(widowed)=4 but there's a spouse(relationshipcode 2) within the household.
                    bys hhid (relationshipcode): egen wanted2 = max(cond(relationshipcode==1 & maritalstatuscode ==4 ///
                    &relationshipcode[_n+1]==2,1, .))
                    The above command did not give the desired results; perhaps I didn't apply it properly. I would be grateful for further explanation.



                    • #11
                      The reshape command revealed that a variable that should be constant within a household does not have the same value within the same hhid. For example, the variable typeofroof is not constant within hhid,
                      even though all household members should have the same type of roof.
                      Is there a command I can use to correct the errors?
                      Perhaps the value of typeofroof for the first household member listed in the household could be used for all household members.
                      Thanks for your help.
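
One possible approach to the question above, as a sketch only: flag the households where typeofroof varies so they can be inspected, then copy the first listed member's value to everyone in the household. The data are made up, and member (the listing order within a household) and roofvaries are hypothetical names; whether the first member's value is actually the correct one is an assumption that should be checked against the source records.

```stata
* Sketch with made-up data; member is a hypothetical variable giving
* the order in which household members are listed.
clear
input byte hhid byte member byte typeofroof
1 1 2
1 2 2
1 3 3
2 1 1
2 2 1
end

* Flag households where typeofroof is not constant: after sorting by
* typeofroof within hhid, the first and last values differ
bysort hhid (typeofroof): gen byte roofvaries = typeofroof[1] != typeofroof[_N]
list if roofvaries, sepby(hhid)

* Copy the first listed member's value to every member of the household
bysort hhid (member): replace typeofroof = typeofroof[1]
```

Listing the flagged households before overwriting anything keeps a record of what was changed.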



                      • #12
                        If you can give a data example which shows that the code fails, I can find out what is wrong. I assume that the spouse is sorted second in order after the head of the household, hence

                        Code:
                        &relationshipcode[_n+1]==2
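
If the spouse is not guaranteed to be sorted directly after the head (for example, when a household has duplicate heads), a sketch that avoids the [_n+1] assumption is to test the two conditions separately at household level and then combine them. The data below are made up; headwidowed, hasspouse and wanted2_alt are names chosen for illustration, and the codes are assumed to be stored as numeric variables.

```stata
* Sketch with made-up households; codes follow the thread
* (relationshipcode 1 = head, 2 = spouse; maritalstatuscode 4 = widowed).
clear
input byte hhid byte relationshipcode byte maritalstatuscode
1 1 4
1 3 5
1 2 1
2 1 1
2 2 1
3 1 4
3 3 5
end

* 1 if any member of the household is a widowed head
bysort hhid: egen byte headwidowed = max(relationshipcode == 1 & maritalstatuscode == 4)

* 1 if any member of the household is recorded as a spouse
bysort hhid: egen byte hasspouse = max(relationshipcode == 2)

* Household has a widowed head and a spouse, wherever the spouse is listed
gen byte wanted2_alt = headwidowed & hasspouse
list, sepby(hhid)
```

Here household 1 is flagged even though the spouse is listed after a child, while households 2 and 3 are not.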



                        • #13
                          name              sex   relationship   age   maritalstatus
                          christiana egor   2     1(head)        78    4 (widowed)
                          sunny ego         1     2(spouse)      57    1 (married)
                          emmanuel ego      1     3(child)       32    5 (never married)
                          blessing ego      2     3(child)       30    5 (never married)
                          comfort egbe      2     3(child)       31    5 (never married)
                          ferdinand amon    1     3(child)       28    5 (never married)
                          An example is the dataset above. Note that the head of household is a widow but has a spouse.



                          • #14
                            Again, please read the FAQs, especially Section 12.2 on how to present data examples using dataex. After fixing your data example, I don't find evidence that the code does not work.

                            Code:
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input str15 name byte sex str9 relationship byte age str17 maritalstatus
                            "christiana egor" 2 "1(head)"   78 "4 (widowed)"      
                            "sunny ego"       1 "2(spouse)" 57 "1 (married)"      
                            "emmanuel ego"    1 "3(child)"  32 "5 (never married)"
                            "blessing ego"    2 "3(child)"  30 "5 (never married)"
                            "comfort egbe"    2 "3(child)"  31 "5 (never married)"
                            "ferdinand amon"  1 "3(child)"  28 "5 (never married)"
                            end
                            
                            gen relationshipcode= real(substr(relationship, 1, 1))
                            gen maritalstatuscode= real(substr(maritalstatus, 1, 1))
                            gen hhid=1
                            bys hhid (relationshipcode): egen wanted2 = max(cond(relationshipcode==1 & maritalstatuscode ==4 &relationshipcode[_n+1]==2,1, .))
                            The household is identified by wanted2 == 1.

                            Res.:

                            Code:
                            . l hhid name sex relationshipcode maritalstatuscode wanted2, sep(6)
                            
                                 +--------------------------------------------------------------+
                                 | hhid              name   sex   relati~e   marita~e   wanted2 |
                                 |--------------------------------------------------------------|
                              1. |    1   christiana egor     2          1          4         1 |
                              2. |    1         sunny ego     1          2          1         1 |
                              3. |    1      emmanuel ego     1          3          5         1 |
                              4. |    1      blessing ego     2          3          5         1 |
                              5. |    1      comfort egbe     2          3          5         1 |
                              6. |    1    ferdinand amon     1          3          5         1 |
                                 +--------------------------------------------------------------+



                            • #15
                              It worked! Initially I didn't destring the variables before running the command. Thanks again.
