Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • egen command to tag certain individual in group and remove them using "if" something else is true

    Hello,

    I am working with individuals (ind) and households(hh). The first variable is the unique identifier of an individual, and that individual may appear as a row multiple times (due to other variables, and most likely due to appearing in multiple years, but not always due to year). The second variable is the household(hh), and this just means that the individuals with the same hh in the same year are married, if they share the same hh# in THE SAME year (I say this in caps because hh # may be the same for unrelated individuals in another year). The binary variable called "credit (0/1), identifies individuals that have had a line of credit in the specific year. What I want to do is remove the associated spouse from the data if they are in the same household in the same year, with an individual who had a line of credit that year. If both individuals in the hh have credit==0 in that year, I want to keep them in the data. However, if in one year, an individual has credit==1, I want to remove their spouse (the associated person with their hh in that year, if their spouses credit==0). There may be cases where credit=1, for both members of the household and I want to keep them both. So if both have 0, or both have 1, keep. I think this would require tagging the spouse by year and household, if they have credit==0, but their associated member of the household has a 1 in that year. This is a large dataset with approx 3 million rows. I tried to use dataex for the first time, I hope this is the correct way:



    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str5 ind long hh float(credit year)
    "00647"       4 0 2006
    "00647"       4 0 2006
    "01983" 1358446 1 2007
    "02493"       9 1 2014
    "02914" 3244657 1 2008
    "03226" 1000004 0 2011
    "03226" 1000004 0 2011
    "03226" 1154965 1 2019
    "03226" 1154965 0 2019
    "03226" 1140594 0 2021
    "06196" 1988446 0 2020
    "06343" 2589055 1 2010
    "07326"      20 1 2006
    "07831"  698297 1 2007
    "08341"      19 0 2008
    "08379"      21 1 2010
    "09023"      28 0 2017
    "09098" 2418063 0 2020
    "09098" 2418063 0 2020
    "09234" 2731752 0 2009
    "10233"      30 0 2019
    "10596"      27 0 2008
    "11358"      33 0 2017
    "11662" 1423110 0 2015
    "11673" 2469273 0 2014
    "11673" 2871647 0 2015
    "11673"       . 1 2019
    "14118"      45 0 2007
    "14330"      39 0 2013
    "14330"      44 1 2015
    "14882" 1287845 1 2006
    "14882" 1268001 1 2007
    "14882" 1263519 1 2010
    "15055"      51 0 2019
    "18042"      61 0 2016
    "18181"      58 1 2006
    "18181"      54 0 2008
    "18241"  521318 0 2009
    "18241"  497639 0 2014
    "19034"      66 1 2020
    end


    Any help would be greatly appreciated! Thank you!








  • #2
    I'm not sure I understand what you are asking for, but I think it's this:
    Code:
    by hh year (credit), sort: drop if credit[1] != credit[_N] & credit == 0

    Comment


    • #3
      Thank you so much. I apologize if its confusing. This code does all that I need except in the case where where the same individual is listed twice in the same household in the same year.

      The code then drops one of the rows that that has a credit=zero, even though it should stay in the data as it is not the spouse, its the individual themselves.

      I think I could get around this if I could create an identifier to count the number of UNIQUE individuals in each house. Then I could add a restriction into the code "if the individual comes from a household of more than 1".

      I think the code would be something like this, but I don't know how to make it only count unique "ind".

      by year hh: egen numppl = count(ind)

      Would you know how to count unique individuals by year and household? Thanks!

      Comment


      • #4
        Would you know how to count unique individuals by year and household? Thanks!
        Code:
        by hh year (ind), sort: gen numppl = sum(ind != ind[_n-1])
        by hh year (ind) replace numppl = numppl[_N]

        Comment

        Working...
        X