  • Problem with creating sample of couples

    Dear Statalister,

    I have run into a problem while trying to create a sample of couples over two years (2013 and 2015). I restrict the sample to couples who: 1) both worked in 2013; 2) were observed in both 2013 and 2015; 3) were aged above 25 in 2013; and 4) are related as household head and spouse
    (id refers to individual id, hhid refers to household id)

    Code:
    use fulldataset, clear
    keep if worked2013==1
      
    duplicates tag id2013 if (year==2013 | year==2015), gen(dup1315)
    gen d1315=(dup1315==1)
    keep if d1315==1
    
    keep if age2013>=25
    
    gen temp = 1 if relationship ==1
    replace temp = 2 if relationship == 2
    egen couple = total(temp) if (relationship ==1 | relationship == 2), by (hhid2013 year)
    keep if couple ==3
    When I change the order of the restrictions [for example, 1) aged above 25 in 2013; 2) related as household head and spouse; ...], I get a different number of couples.
    I have tried different approaches, but I could not get a sample of couples with equal numbers of males and females, equal numbers of household heads and spouses, and equal numbers of individuals observed in 2013 and 2015.
    I guess my code is wrong somewhere.
    I would appreciate any advice on this.

    Thanks in advance.

  • #2
    Your conditions are somewhat complicated and will require multiple lines of code to implement. Without example data to develop it in and test it on, it is likely that what I try to do in my head will contain mistakes. So please post back and provide example data. Make sure your example includes some households that include a couple meeting all the criteria, and also some that don't. Use the -dataex- command to show your example so that it will be usable for present purposes.

    If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
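
    For illustration only, a minimal -dataex- call might look like the sketch below. It uses the auto data set that ships with Stata rather than your data, and the variable list and the range of observations are arbitrary choices; substitute the variables relevant to your question.
    Code:
    sysuse auto, clear
    // show the first 10 observations of three variables in copy-pasteable form
    dataex make price mpg in 1/10

    The output can then be pasted directly into a post.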


    • #3
      Dear Clyde,
      Here are my code and my example data.

      Code:
      use fulldataset, clear
      keep if worked2013==1    
      
      duplicates tag id2013 if (year==2013 | year==2015), gen(dup1315)
      gen d1315=(dup1315==1)
      keep if d1315==1  
      
      keep if age2013>=25  
      
      gen temp = 1 if relationship ==1
      replace temp = 2 if relationship == 2
      egen couple = total(temp) if (relationship ==1 | relationship == 2), by (hhid2013 year)
      keep if couple ==3
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input long(id2013 hhid2013 id2015 hhid2015) int year byte(sex age) int age2013 str15 relationship byte(worked worked2013)
      31400903 314009 30141703 301417 2013 1 19 19 "children"        0 0
      31400903 314009 30141703 301417 2015 1 21 19 "children"        0 0
      31400904 314009 30141704 301417 2013 0 10 10 "children"        0 0
      31400904 314009 30141704 301417 2015 0 12 10 "children"        0 0
      31400905 314009 30141705 301417 2013 0 82 82 "parents"         0 0
      31400905 314009 30141705 301417 2015 0 84 82 "children in law" 0 0
      31400906 314009 30141706 301417 2013 0 59 59 "sister/brother"  0 0
      31400906 314009 30141706 301417 2015 0 61 59 "grandparent"     0 0
      31401001 314010        .      . 2013 0 38 40 "household head"  0 0
      31401002 314010        .      . 2013 1 28 30 "sister/brother"  1 1
      31401003 314010        .      . 2013 0 26 28 "sister/brother"  1 1
      31401004 314010        .      . 2013 0 13 15 "children"        0 0
      31401101 314011 30140601 301406 2013 1 44 44 "household head"  1 1
      31401101 314011 30140601 301406 2015 1 46 44 "household head"  1 1
      31401102 314011 30140602 301406 2013 0 45 45 "spouse"          1 1
      31401102 314011 30140602 301406 2015 0 47 45 "spouse"          1 1
      31401103 314011        . 301406 2013 1 21 23 "children"        1 1
      31401104 314011 30140603 301406 2013 1 10 10 "children"        0 0
      31401104 314011 30140603 301406 2015 1 12 10 "children"        0 0
      31401201 314012 30140701 301407 2013 1 45 45 "household head"  1 1
      31401201 314012 30140701 301407 2015 1 47 45 "household head"  1 1
      31401202 314012 30140702 301407 2013 0 37 37 "spouse"          0 0
      31401202 314012 30140702 301407 2015 0 39 37 "spouse"          1 0
      31401203 314012 30140703 301407 2013 1  7  7 "children"        0 0
      31401203 314012 30140703 301407 2015 1  9  7 "children"        0 0
      31401204 314012 30140704 301407 2013 1  1  1 "children"        0 0
      31401204 314012 30140704 301407 2015 1  3  1 "children"        0 0
      31401301 314013 30140801 301408 2013 0 57 57 "household head"  0 0
      31401301 314013 30140801 301408 2015 0 59 57 "household head"  0 0
      31401302 314013 30140802 301408 2013 1 60 60 "spouse"          1 1
      31401302 314013 30140802 301408 2015 1 62 60 "spouse"          0 1
      31401303 314013 30140803 301408 2013 1 25 25 "children"        1 1
      31401303 314013 30140803 301408 2015 1 27 25 "children"        1 1
      31401401 314014        .      . 2013 1 46 48 "household head"  1 1
      31401402 314014        .      . 2013 0 41 43 "spouse"          1 1
      31401403 314014        .      . 2013 1 16 18 "children"        0 0
      31401404 314014        .      . 2013 1 11 13 "children"        0 0
      31401405 314014        .      . 2013 0 70 72 "parents"         0 0
      31401406 314014        .      . 2013 0 45 47 "sister/brother"  1 1
      31401501 314015 30140901 301409 2013 1 46 46 "household head"  1 1
      31401501 314015 30140901 301409 2015 1 48 46 "household head"  1 1
      31401502 314015 30140902 301409 2013 0 37 37 "spouse"          1 1
      31401502 314015 30140902 301409 2015 0 39 37 "spouse"          1 1
      31401503 314015 30140903 301409 2013 0 13 13 "children"        0 0
      31401503 314015 30140903 301409 2015 0 15 13 "children"        0 0
      31401504 314015 30140904 301409 2013 1  7  7 "parents"         0 0
      31401504 314015 30140904 301409 2015 1  9  7 "children"        0 0
      31401601 314016        .      . 2013 0 78 80 "household head"  0 0
      31401602 314016        .      . 2013 1 40 42 "children"        0 0
      31401603 314016        .      . 2013 0 36 38 "children in law" 1 1
      31401604 314016        .      . 2013 0 12 14 "grandchildren"   0 0
      end
      I am trying to get a sample of couples with equal numbers of males and females, equal numbers of household heads and spouses, and equal numbers of individuals observed in 2013 and 2015.
      My criteria are: 1) both worked in 2013; 2) observed in both 2013 and 2015; 3) aged above 25 in 2013; and 4) related as household head and spouse
      (id refers to individual id, hhid refers to household id).


      • #4
        Thank you. The following code selects the observations pertaining to those household heads and spouses who fulfill all of your requirements (including that the household include both the head and the spouse):
        Code:
        //    KEEP ONLY HOUSEHOLDS APPEARING IN BOTH 2013 AND 2015
        drop if missing(hhid2013, hhid2015)
        
        //    KEEP ONLY OBSERVATIONS FOR HOUSEHOLD HEAD OR SPOUSE
        keep if inlist(relationship, "household head", "spouse")
        
        //    KEEP ONLY OBSERVATIONS WITH AGE >= 25 IN 2013
        keep if inrange(age2013, 25, .)
        
        //    KEEP ONLY OBSERVATIONS WHERE THE PERSON WORKED IN 2013
        keep if worked2013 == 1
        
        //    KEEP ONLY HOUSEHOLDS HAVING BOTH A HOUSEHOLD HEAD AND A SPOUSE
        by hhid2013 (relationship), sort: ///
            keep if relationship[1] == "household head" & relationship[_N] == "spouse"
        Now, this code will have eliminated the information about other members of the households. If you will need that information, this slightly different code will enable you to get it back:

        Code:
        //    SAVE THE ORIGINAL DATA
        tempfile copy
        save `copy'
        
        //    KEEP ONLY HOUSEHOLDS APPEARING IN BOTH 2013 AND 2015
        drop if missing(hhid2013, hhid2015)
        
        //    KEEP ONLY OBSERVATIONS FOR HOUSEHOLD HEAD OR SPOUSE
        keep if inlist(relationship, "household head", "spouse")
        
        //    KEEP ONLY OBSERVATIONS WITH AGE >= 25 IN 2013
        keep if inrange(age2013, 25, .)
        
        //    KEEP ONLY OBSERVATIONS WHERE THE PERSON WORKED IN 2013
        keep if worked2013 == 1
        
        //    KEEP ONLY HOUSEHOLDS HAVING BOTH A HOUSEHOLD HEAD AND A SPOUSE
        by hhid2013 (relationship), sort: ///
            keep if relationship[1] == "household head" & relationship[_N] == "spouse"
            
        //    NOW RECOVER ALL OF THE INFORMATION ABOUT THE HOUSEHOLD (NOT JUST HEAD & SPOUSE)
        keep hhid2013
        duplicates drop
        merge 1:m hhid2013 using `copy', assert(match using) keep(match) nogenerate
        Note: The requirement that the sample balance on heads/spouses and males/females is met with the example data. If the full data set does include some same-sex marriages, then it will balance on heads/spouses but not necessarily on males/females. It is not clear to me whether or how you actually want to enforce both of these requirements. You could exclude same-sex couples, though this might restrict the generalizability of your findings in ways that are unsatisfactory. Alternatively, we could add some code at the end that checks the sex balance and removes, at random, some same-sex couples of the modal sex to produce balance. Or maybe you just leave it the way it is and do not require sex balance. It really depends on your research goals.
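
        Whichever option you choose, it helps to look at the balance first. The following is only a sketch on a small made-up data set: the eight observations and the 0/1 coding of sex are assumptions for illustration, not your data, and it presumes the data already hold exactly one head and one spouse per household and year.
        Code:
        clear
        input long hhid2013 int year byte sex str15 relationship
        1 2013 1 "household head"
        1 2013 0 "spouse"
        1 2015 1 "household head"
        1 2015 0 "spouse"
        2 2013 0 "household head"
        2 2013 0 "spouse"
        2 2015 0 "household head"
        2 2015 0 "spouse"
        end

        // heads vs. spouses, and the two sexes, by year
        tabulate relationship year
        tabulate sex year

        // flag couples whose head and spouse report the same sex in a given year
        by hhid2013 year (relationship), sort: gen byte same_sex = sex[1] == sex[_N]
        tabulate same_sex

        In this toy data, household 2 is flagged. Whether to drop such couples remains a research decision, not a coding one.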


        • #5
          Dear Clyde,

          Thank you for your suggestion, and specifically for raising the same-sex couple issue. I am also trying to compare female heads with male heads, and I am thinking about making that comparison on exactly the same sample.

          I have tried your code. I get roughly equal numbers of heads and spouses; heads have about 2 more observations. I do not know why there is a slight difference like that. It might be due to differences between the example data and the full data.

          When I tried it with the full sample, I could not get a sample balanced on year (2013/2015). I need to use FE in my main analysis, so I added code to keep couples who were observed in both years.
          Code:
              duplicates tag id2013 if (year==2013 | year==2015), gen(dup1315)
              gen d1315=(dup1315==1)
              keep if d1315==1
          I am not sure whether it is appropriate to add this at the end to try to get balance on year.


          • #6
            I cannot see any reason why the code I gave in #4 would result in an unbalanced sample. It should be perfectly balanced between spouses and heads and also produce only couples who were surveyed in both 2013 and 2015. If you can post a data example that produces unbalanced results, I would be happy to troubleshoot this and fix whatever the problem is.


            • #7
              Dear Clyde,

              Sorry for my late reply. I was waiting for cleaner data so that I could give you a representative sample.
              I attach the sample here.
              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input long(hhid2013 ivid2013 hhid2015 ivid2015) int year str4 sex str15 relationship byte age str12 worked byte(age2013 worked2013)
              111002 11100201 101101        . 2002 "Male" "household head"  65 "Worked"       65 1
              111002 11100202 101101 10110101 2002 "0"    "spouse"          60 "Worked"       60 1
              111002 11100202 101101 10110101 2004 "0"    "household head"  62 "Worked"       60 1
              111003 11100301 101102 10110201 2002 "Male" "household head"  57 "Did not work" 57 0
              111003 11100301 101102 10110201 2004 "Male" "household head"  59 "Worked"       57 0
              111003 11100302 101102 10110202 2002 "0"    "spouse"          47 "Did not work" 47 0
              111003 11100302 101102 10110202 2004 "0"    "spouse"          49 "Worked"       47 0
              111003 11100303 101102 10110203 2002 "0"    "children"        20 "Did not work" 20 0
              111003 11100303 101102 10110203 2004 "0"    "children"        22 "Did not work" 20 0
              111003 11100304 101102 10110204 2002 "Male" "children"        18 "Did not work" 18 0
              111003 11100304 101102 10110204 2004 "Male" "children"        20 "Did not work" 18 0
              111004 11100401 101103 10110301 2002 "Male" "household head"  38 "Worked"       38 1
              111004 11100401 101103 10110301 2004 "Male" "household head"  40 "Worked"       38 1
              111004 11100402 101103 10110302 2002 "0"    "spouse"          39 "Worked"       39 1
              111004 11100402 101103 10110302 2004 "0"    "spouse"          41 "Worked"       39 1
              111004 11100403 101103 10110303 2002 "Male" "children"        14 "Did not work" 14 0
              111004 11100403 101103 10110303 2004 "Male" "children"        16 "Did not work" 14 0
              111005 11100501 101104 10110401 2002 "Male" "household head"  54 "Worked"       54 1
              111005 11100501 101104 10110401 2004 "Male" "household head"  56 "Worked"       54 1
              111005 11100502 101104 10110402 2002 "0"    "spouse"          45 "Worked"       45 1
              111005 11100502 101104 10110402 2004 "0"    "spouse"          47 "Worked"       45 1
              111005 11100503 101104 10110403 2002 "0"    "children"        21 "Worked"       21 1
              111005 11100503 101104 10110403 2004 "0"    "children"        23 "Worked"       21 1
              111005 11100504 101104 10110404 2002 "0"    "children"        19 "Did not work" 19 0
              111005 11100504 101104 10110404 2004 "0"    "children"        21 "Did not work" 19 0
              111005 11100505 101104 10110405 2002 "0"    "children"        11 "Did not work" 11 0
              111005 11100505 101104 10110405 2004 "0"    "children"        13 "Did not work" 11 0
              111005 11100506 101104 10110406 2002 "0"    "parents"         86 "Did not work" 86 0
              111005 11100506 101104 10110406 2004 "0"    "children in law" 88 "Did not work" 86 0
              111006 11100601 101117 10111701 2002 "Male" "household head"  64 "Worked"       64 1
              111006 11100601 101117 10111701 2004 "Male" "household head"  67 "Worked"       64 1
              111006 11100602 101117 10111702 2002 "0"    "spouse"          58 "Worked"       58 1
              111006 11100602 101117 10111702 2004 "0"    "spouse"          61 "Did not work" 58 1
              111006 11100603 101117        . 2002 "0"    "children"        36 "Worked"       65 1
              111006 11100604 101117        . 2002 "0"    "grandparent"      5 "Did not work" 65 0
              111006 11100605 101117 10111703 2002 "Male" "children"        31 "Worked"       31 1
              111006 11100605 101117 10111703 2004 "Male" "children"        33 "Worked"       31 1
              111006 11100606 101117 10111704 2002 "0"    "children in law" 23 "Worked"       23 1
              111006 11100606 101117 10111704 2004 "0"    "children"        25 "Worked"       23 1
              111007 11100701 101105 10110501 2002 "Male" "household head"  68 "Did not work" 68 0
              111007 11100701 101105 10110501 2004 "Male" "household head"  70 "Did not work" 68 0
              111007 11100702 101105 10110502 2002 "0"    "spouse"          64 "Did not work" 64 0
              111007 11100702 101105 10110502 2004 "0"    "spouse"          67 "Did not work" 64 0
              111007 11100703 101105 10110503 2002 "Male" "children"        39 "Worked"       39 1
              111007 11100703 101105 10110503 2004 "Male" "children"        41 "Worked"       39 1
              111007 11100704 101105 10110504 2002 "0"    "children in law" 40 "Worked"       40 1
              111007 11100704 101105 10110504 2004 "0"    "children"        42 "Worked"       40 1
              111007 11100705 101105 10110505 2002 "Male" "grandchildren"    9 "Did not work"  9 0
              111007 11100705 101105 10110505 2004 "Male" "sister/brother"  11 "Did not work"  9 0
              111007 11100706 101105 10110506 2002 "Male" "grandchildren"    7 "Did not work"  7 0
              111007 11100706 101105 10110506 2004 "Male" "sister/brother"   9 "Did not work"  7 0
              end
              Thank you.


              • #8
                Thank you. I see the problem. It arises because a household might have both a head and spouse in the first year, but then one of them is no longer present in the second year.

                If I can make the assumptions that 1) each household has at most one head and at most one spouse in each year (i.e., no polygamy), and 2) each individual head or spouse appears in the data at most twice (once in each of two years), then the following code (which verifies these assumptions in its second and third commands) will give you a balanced sample:
                Code:
                //    KEEP ONLY OBSERVATIONS FOR HOUSEHOLD HEAD OR SPOUSE
                keep if inlist(relationship, "household head", "spouse")
                
                //      VERIFY EACH PERSON APPEARS IN THE DATA AT MOST TWICE
                by hhid2013 ivid2013 (year), sort: assert _N <= 2
                
                //      VERIFY AT MOST ONE HEAD AND ONE SPOUSE PER HOUSEHOLD EACH YEAR
                by hhid2013 year (relationship), sort: assert _N <= 2 ///
                    & relationship[1] != relationship[2]
                
                //    KEEP ONLY OBSERVATIONS WITH AGE >= 25 IN 2013
                keep if inrange(age2013, 25, .)
                
                //    KEEP ONLY OBSERVATIONS WHERE THE PERSON WORKED IN 2013
                keep if worked2013 == 1
                
                //    KEEP ONLY HOUSEHOLDS HAVING BOTH A HOUSEHOLD HEAD AND A SPOUSE
                //      IN BOTH YEARS
                by hhid2013 (relationship), sort: ///
                    keep if relationship[1] == "household head" & relationship[_N] == "spouse" ///
                    & _N == 4
                I notice, by the way, a big change in the data: originally the variable year took on only the values 2013 and 2015. Now year takes on only the values 2002 and 2004, although the hhid variables are still called hhid2013 and hhid2015. This makes me worry that I'm misunderstanding something about the data that will also cause this code to break.
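
                To see why the _N == 4 condition matters, here is a minimal sketch on made-up data (the household and person ids are hypothetical). Household 2 has both a head and a spouse in 2013, but the spouse is gone in 2015.
                Code:
                clear
                input long(hhid2013 ivid2013) int year str15 relationship
                1 101 2013 "household head"
                1 101 2015 "household head"
                1 102 2013 "spouse"
                1 102 2015 "spouse"
                2 201 2013 "household head"
                2 201 2015 "household head"
                2 202 2013 "spouse"
                end

                // household 2 has a head and a spouse but only 3 observations, so it is dropped
                by hhid2013 (relationship), sort: ///
                    keep if relationship[1] == "household head" & relationship[_N] == "spouse" ///
                    & _N == 4

                // every remaining household-year should hold exactly one head and one spouse
                by hhid2013 year (relationship), sort: assert _N == 2 ///
                    & relationship[1] == "household head" & relationship[2] == "spouse"
                tabulate relationship year

                Without the _N == 4 condition, household 2 would pass the head-and-spouse test and leave the sample unbalanced across years.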


                • #9
                  Thank you for your comments.
                  The years 2002 and 2004 appeared because of a mistake in my code when merging with another data set. Sorry for confusing you.

                  I have tried your code with the full data, but I cannot get a balanced sample. It seems to be due to the example again: the full data includes cases in which the id and hhid of some individuals are missing.
                  I attach a larger sample here. I hope it represents my full data.

                  Code:
                  * Example generated by -dataex-. For more info, type help dataex
                  clear
                  input long(hhid2013 ivid2013 hhid2015 ivid2015) int year str6 sex str15 relationship byte(age worked) int age2013 byte worked2013
                  106016 10601603      .        . 2013 "Male"   "children"        18 1  75 1
                  904004 90400401      .        . 2013 "Male"   "household head"  36 1  75 1
                  911010 91101001      .        . 2013 "Male"   "household head"  75 1  75 1
                  301104 30110401 300114 30011401 2015 "Male"   "household head"  78 1   . 0
                  104012 10401201 100406 10040601 2015 "Male"   "household head"  56 1   . 0
                  505002 50500203      .        . 2013 "Male"   "children"        19 1  75 1
                  903018 90301805 900310 90031005 2013 "Male"   "children"        23 1  23 1
                  307101 30710101 300713 30071301 2013 "Male"   "household head"  56 1  56 1
                  302103 30210304 300215 30021504 2015 "Female" "children"        16 1   . 0
                  913011 91301104 901307 90130704 2013 "Male"   "children"        16 1  16 1
                  113020 11302001      .        . 2013 "Male"   "household head"  59 1  75 1
                  301104 30110405      .        . 2013 "Female" "children in law" 21 1  75 1
                  302104 30210401      .        . 2013 "Male"   "household head"  54 1  75 1
                  307103 30710310      .        . 2013 "Male"   "grandchildren"   22 1  75 1
                  104011 10401101      .        . 2013 "Male"   "household head"  53 1  75 1
                  903012 90301202 900307 90030702 2015 "Female" "spouse"          37 1   . 0
                  514018 51401801      .        . 2013 "Male"   "household head"  45 1  75 1
                  514103 51410302      .        . 2013 "Female" "spouse"          50 1  75 1
                  713013 71301302      .        . 2013 "Female" "spouse"          31 1  75 1
                  705020 70502001 700512 70051201 2013 "Male"   "household head"  42 1  42 1
                  904103 90410306      .        . 2013 "Female" "children in law" 19 1  75 1
                  316010 31601002 301607 30160702 2015 "Female" "spouse"          42 1   . 0
                  118019 11801901      .        . 2013 "Male"   "household head"  65 1  75 1
                  103011 10301103 100308 10030803 2013 "Male"   "children"        16 1  16 1
                  912002 91200202 901201 90120102 2015 "Female" "spouse"          33 1   . 0
                  305008 30500802      .        . 2013 "Female" "children"        48 1  75 1
                  905103 90510306 902914        . 2013 "Female" "children in law" 21 1  75 1
                  112002 11200201      .        . 2013 "Female" "household head"  45 1  75 1
                  905101 90510102      .        . 2013 "Female" "children"        23 1  75 1
                  911015 91101501      .        . 2013 "Male"   "household head"  38 1  75 1
                  117103 11710302      .        . 2013 "Female" "spouse"          49 1  75 1
                  107006 10700601      .        . 2013 "Female" "household head"  57 1  75 1
                  304009 30400903      .        . 2013 "Male"   "children"        16 1  75 1
                  502007 50200704 500204 50020404 2015 "Female" "children"        50 1   . 0
                  903001 90300102      .        . 2013 "Female" "spouse"          38 1  75 1
                  507102 50710204      .        . 2013 "Male"   "children"        10 1  75 1
                  122002 12200203      .        . 2013 "Female" "children"        23 1  75 1
                  138105 13810506      .        . 2013 "Male"   "children"        21 1  75 1
                  913019 91301904 901310        . 2013 "Male"   "children in law" 38 1  75 1
                  113002 11300202      .        . 2013 "Male"   "spouse"          45 1  75 1
                  522104 52210404 302214 30221404 2013 "Female" "children in law" 31 1  31 1
                  745018 74501802 704516 70451602 2013 "Male"   "children"        15 1  15 1
                       .        . 500214 50021403 2015 "Female" "children"        19 1   . 0
                       .        . 700814 70081403 2015 "Female" "children"        29 1   . 0
                       .        . 302503 30250302 2015 "Female" "spouse"          31 1   . 0
                       .        . 100402 10040201 2015 "Male"   "household head"  36 1   . 0
                       .        . 501501 50150102 2015 "Male"   "spouse"          42 1   . 0
                       .        . 700609 70060902 2015 "Female" "spouse"          59 1   . 0
                       .        . 500511 50051102 2015 "Female" "spouse"          40 1   . 0
                       .        . 904816 90481601 2015 "Male"   "household head"  52 1   . 0
                       .        . 501713 50171302 2015 "Male"   "children"        22 1   . 0
                  111004 11100401 101103 10110301 2013 "Male"   "household head"  38 1  38 1
                  111004 11100401 101103 10110301 2015 "Male"   "household head"  40 1  38 1
                  111004 11100402 101103 10110302 2013 "Female" "spouse"          39 1  39 1
                  111004 11100402 101103 10110302 2015 "Female" "spouse"          41 1  39 1
                  111006 11100601 101117 10111701 2013 "Male"   "household head"  64 1  64 1
                  111006 11100601 101117 10111701 2015 "Male"   "household head"  67 1  64 1
                  111006 11100602 101117 10111702 2013 "Female" "spouse"          58 1  58 1
                  111006 11100602 101117 10111702 2015 "Female" "spouse"          61 0  58 1
                  314009 31400903 301417 30141703 2013 "Male"   "children"        19 0  19 0
                  314009 31400903 301417 30141703 2015 "Male"   "children"        21 0  19 0
                  314009 31400904 301417 30141704 2013 "Female" "children"        10 0  10 0
                  314009 31400904 301417 30141704 2015 "Female" "children"        12 0  10 0
                  314009 31400905 301417 30141705 2013 "Female" "parents"         82 0  82 0
                  314009 31400905 301417 30141705 2015 "Female" "children in law" 84 0  82 0
                  314009 31400906 301417 30141706 2013 "Female" "sister/brother"  59 0  59 0
                  314009 31400906 301417 30141706 2015 "Female" "grandparent"     61 0  59 0
                  314010 31401001      .        . 2013 "Female" "household head"  38 0 107 0
                  314010 31401002      .        . 2013 "Male"   "sister/brother"  28 1 107 1
                  314010 31401003      .        . 2013 "Female" "sister/brother"  26 1 107 1
                  314010 31401004      .        . 2013 "Female" "children"        13 0 107 0
                  314011 31401101 301406 30140601 2013 "Male"   "household head"  44 1  44 1
                  314011 31401101 301406 30140601 2015 "Male"   "household head"  46 1  44 1
                  314011 31401102 301406 30140602 2013 "Female" "spouse"          45 1  45 1
                  314011 31401102 301406 30140602 2015 "Female" "spouse"          47 1  45 1
                  314011 31401103 301406        . 2013 "Male"   "children"        21 1 107 1
                  314011 31401104 301406 30140603 2013 "Male"   "children"        10 0  10 0
                  314011 31401104 301406 30140603 2015 "Male"   "children"        12 0  10 0
                  314012 31401201 301407 30140701 2013 "Male"   "household head"  45 1  45 1
                  314012 31401201 301407 30140701 2015 "Male"   "household head"  47 1  45 1
                  314012 31401202 301407 30140702 2013 "Female" "spouse"          37 0  37 0
                  314012 31401202 301407 30140702 2015 "Female" "spouse"          39 1  37 0
                  314012 31401203 301407 30140703 2013 "Male"   "children"         7 0   7 0
                  314012 31401203 301407 30140703 2015 "Male"   "children"         9 0   7 0
                  314012 31401204 301407 30140704 2013 "Male"   "children"         1 0   1 0
                  314012 31401204 301407 30140704 2015 "Male"   "children"         3 0   1 0
                  314013 31401301 301408 30140801 2013 "Female" "household head"  57 0  57 0
                  314013 31401301 301408 30140801 2015 "Female" "household head"  59 0  57 0
                  314013 31401302 301408 30140802 2013 "Male"   "spouse"          60 1  60 1
                  314013 31401302 301408 30140802 2015 "Male"   "spouse"          62 0  60 1
                  314013 31401303 301408 30140803 2013 "Male"   "children"        25 1  25 1
                  314013 31401303 301408 30140803 2015 "Male"   "children"        27 1  25 1
                  314014 31401401      .        . 2013 "Male"   "household head"  46 1 107 1
                  314014 31401402      .        . 2013 "Female" "spouse"          41 1 107 1
                  314014 31401403      .        . 2013 "Male"   "children"        16 0 107 0
                  314014 31401404      .        . 2013 "Male"   "children"        11 0 107 0
                  314014 31401405      .        . 2013 "Female" "parents"         70 0 107 0
                  314014 31401406      .        . 2013 "Female" "sister/brother"  45 1 107 1
                  314015 31401501 301409 30140901 2013 "Male"   "household head"  46 1  46 1
                  314015 31401501 301409 30140901 2015 "Male"   "household head"  48 1  46 1
                  end


                  • #10
                    OK. To be honest, the missingness of id's makes me uncomfortable with this data at all, because I cannot now be confident about when two observations refer to the same person, or even the same household. Nevertheless, I can understand that sometimes people move from one household to another, perhaps with the origin or destination even not being in the survey, so that the hhid and individual id can be missing.

                    The big question is whether you want to include people who migrate in or out of the sample at all. If not, then the simple solution is to drop them:
                    Code:
                    drop if missing(hhid2013, hhid2015, ivid2013, ivid2015, year)
                    and then run the code in #8.
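
                    Before dropping anything, it may be worth seeing how much of the data is affected. This is only a sketch: the four observations below are a small made-up extract with ids borrowed from your example, not your full data.
                    Code:
                    clear
                    input long(hhid2013 ivid2013 hhid2015 ivid2015) int year
                    111004 11100401 101103 10110301 2013
                    111004 11100401 101103 10110301 2015
                    904004 90400401 . . 2013
                    . . 302503 30250302 2015
                    end

                    // how many observations lack at least one identifier?
                    count if missing(hhid2013, hhid2015, ivid2013, ivid2015)

                    // which combinations of identifiers tend to be missing together?
                    misstable patterns hhid2013 hhid2015 ivid2013 ivid2015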

                    If, however, you want to retain such people in your sample, then I think the following code will give you a balanced sample:
                    Code:
                    //    VERIFY HHID2013, HHID2015, IVID2013, IVID2015, AND YEAR UNIQUELY
                    //    IDENTIFY OBSERVATIONS
                    isid hhid* ivid* year, missok
                    
                    //    KEEP ONLY OBSERVATIONS FOR HOUSEHOLD HEAD OR SPOUSE
                    keep if inlist(relationship, "household head", "spouse")
                    
                    //      VERIFY EACH PERSON APPEARS IN THE DATA AT MOST TWICE
                    by hhid2013 ivid2013 hhid2015 ivid2015 (year), sort: assert _N <= 2
                    
                    //      VERIFY AT MOST ONE HEAD AND ONE SPOUSE PER HOUSEHOLD EACH YEAR
                    by hhid2013 hhid2015 year (relationship), sort: assert _N <= 2 ///
                        & relationship[1] != relationship[2]
                    
                    //    KEEP ONLY OBSERVATIONS WITH AGE >= 25 IN 2013
                    keep if inrange(age2013, 25, .)
                    
                    //    KEEP ONLY OBSERVATIONS WHERE THE PERSON WORKED IN 2013
                    keep if worked2013 == 1
                    
                    //    KEEP ONLY HOUSEHOLDS HAVING BOTH A HOUSEHOLD HEAD AND A SPOUSE
                    //      IN BOTH YEARS
                    by hhid2013 hhid2015 (relationship), sort: ///
                        keep if relationship[1] == "household head" & relationship[_N] == "spouse" ///
                        & _N == 4
                    Now, I also want to caution you about proper and improper ways to use code. I raise this, because you should not have been able to even know whether or not the code was producing a balanced sample in #8. Used properly, it would produce no sample at all. That's because the code would halt with an error message when executing the command -by hhid2013 ivid2013 (year), sort: assert _N <= 2-. When code halts with an error message, you should not then execute the rest of the code. Halting with an error message results when some condition that is required in order for the rest of the code to produce correct results is not met. The best you can hope for by running the rest of the code is getting some result that is obviously wrong (which, it seems, is what happened to you), and you will have wasted time running it. A much worse result is the possibility of getting a result that looks OK, but is in fact wrong. Later, you, or somebody relying on your work, may uncover the incorrectness of the result--when it will be harder to figure out what went wrong, and when someone may have already been harmed by reliance on the erroneous results. Error-halts are your friends: they are there to protect you from errors. You continue past them at your peril.

                    So never continue to run code that halts with an error message. Once you get an error-halt you should look at the command that gives the error message, figure out why you are getting that error, and then fix the underlying problem. Fixing the problem may entail modifying the code so that it can work correctly in the actual conditions it encountered, or it may entail re-creating the data set to eliminate observations that violate the necessary assumptions.
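
                    As one possible way to find the cause of such an error-halt, you can compute the quantity being asserted as a variable and list the offending observations instead of running on. The sketch below uses made-up data in which three head/spouse observations have missing ids and are therefore lumped together as if they were one person; the variable name n_obs is my own choice.
                    Code:
                    clear
                    input long(hhid2013 ivid2013) int year str15 relationship
                    1 101 2013 "household head"
                    1 101 2015 "household head"
                    . .   2015 "spouse"
                    . .   2015 "household head"
                    . .   2015 "spouse"
                    end

                    // count observations per person instead of asserting, so nothing halts
                    by hhid2013 ivid2013 (year), sort: gen int n_obs = _N

                    // the observations that would make -assert _N <= 2- fail
                    list if n_obs > 2, sepby(hhid2013 ivid2013)

                    Once you understand why those observations violate the assumption, you can decide whether to fix the code or the data.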


                    • #11
                      Dear Clyde,

                      Thank you for your kind suggestions.
                      May I ask how to browse the observations that do not fulfill the conditions in my case? Specifically, those that do not meet the condition: "by hhid02 hhid04 year (relationship), _N <= 2 & relationship[1] != relationship[2]".
                      I want to know where the problem lies, because the "assert" tells me there is a problem somewhere.
                      Last edited by June Le; 14 Dec 2023, 04:58.


                      • #12
                        Code:
                        by hhid2013 hhid2015 year (relationship), sort: gen byte problem = !(_N <= 2 ///
                            & relationship[1] != relationship[2])
                        browse if problem


                        • #13
                          Thank you, Clyde.
