Hello,
I am working with individuals (ind) and households(hh). The first variable is the unique identifier of an individual, and that individual may appear as a row multiple times (due to other variables, and most likely due to appearing in multiple years, but not always due to year). The second variable is the household(hh), and this just means that the individuals with the same hh in the same year are married, if they share the same hh# in THE SAME year (I say this in caps because hh # may be the same for unrelated individuals in another year). The binary variable called "credit (0/1), identifies individuals that have had a line of credit in the specific year. What I want to do is remove the associated spouse from the data if they are in the same household in the same year, with an individual who had a line of credit that year. If both individuals in the hh have credit==0 in that year, I want to keep them in the data. However, if in one year, an individual has credit==1, I want to remove their spouse (the associated person with their hh in that year, if their spouses credit==0). There may be cases where credit=1, for both members of the household and I want to keep them both. So if both have 0, or both have 1, keep. I think this would require tagging the spouse by year and household, if they have credit==0, but their associated member of the household has a 1 in that year. This is a large dataset with approx 3 million rows. I tried to use dataex for the first time, I hope this is the correct way:
Any help would be greatly appreciated! Thank you!
I am working with individuals (ind) and households(hh). The first variable is the unique identifier of an individual, and that individual may appear as a row multiple times (due to other variables, and most likely due to appearing in multiple years, but not always due to year). The second variable is the household(hh), and this just means that the individuals with the same hh in the same year are married, if they share the same hh# in THE SAME year (I say this in caps because hh # may be the same for unrelated individuals in another year). The binary variable called "credit (0/1), identifies individuals that have had a line of credit in the specific year. What I want to do is remove the associated spouse from the data if they are in the same household in the same year, with an individual who had a line of credit that year. If both individuals in the hh have credit==0 in that year, I want to keep them in the data. However, if in one year, an individual has credit==1, I want to remove their spouse (the associated person with their hh in that year, if their spouses credit==0). There may be cases where credit=1, for both members of the household and I want to keep them both. So if both have 0, or both have 1, keep. I think this would require tagging the spouse by year and household, if they have credit==0, but their associated member of the household has a 1 in that year. This is a large dataset with approx 3 million rows. I tried to use dataex for the first time, I hope this is the correct way:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str5 ind long hh float(credit year) "00647" 4 0 2006 "00647" 4 0 2006 "01983" 1358446 1 2007 "02493" 9 1 2014 "02914" 3244657 1 2008 "03226" 1000004 0 2011 "03226" 1000004 0 2011 "03226" 1154965 1 2019 "03226" 1154965 0 2019 "03226" 1140594 0 2021 "06196" 1988446 0 2020 "06343" 2589055 1 2010 "07326" 20 1 2006 "07831" 698297 1 2007 "08341" 19 0 2008 "08379" 21 1 2010 "09023" 28 0 2017 "09098" 2418063 0 2020 "09098" 2418063 0 2020 "09234" 2731752 0 2009 "10233" 30 0 2019 "10596" 27 0 2008 "11358" 33 0 2017 "11662" 1423110 0 2015 "11673" 2469273 0 2014 "11673" 2871647 0 2015 "11673" . 1 2019 "14118" 45 0 2007 "14330" 39 0 2013 "14330" 44 1 2015 "14882" 1287845 1 2006 "14882" 1268001 1 2007 "14882" 1263519 1 2010 "15055" 51 0 2019 "18042" 61 0 2016 "18181" 58 1 2006 "18181" 54 0 2008 "18241" 521318 0 2009 "18241" 497639 0 2014 "19034" 66 1 2020 end
Any help would be greatly appreciated! Thank you!

Comment