Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Complicated loop with sizes of households and income

    Hello,

    I am trying to calculate the number of people that are in a low income family under the low income cut off. The low income cut (LICO in Canada) off varies by year and by number of people in the family. I have 20 years in my data, tiny sample of exporting it using dataex looks like this:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int taxyear str5 id float(fam_inc fam_size)
    2002 "M0311" 38855 4
    2002 "S3172" 38855 4
    2016 "M2263" 28061 3
    2016 "M2494" 26052 9
    2016 "S9925" 26052 3
    2017 "M2966"  7104 3
    2017 "M3977"  1164 2
    2017 "M6388" 33560 4
    2018 "O9499" 32551 4
    2018 "M6311" 32551 4
    end

    I want to put the following code into a loop to drop claimants if they are above the low income cut-off, not to have to copy and paste 20 times, and have to write out each cut-off under each family size (2 to 10 family members)


    * 2016
    drop if fam_size ==2 & taxyear==2016 & fam_inc > 32084
    drop if fam_size ==3 & taxyear==2016 & fam_inc > 39295
    drop if fam_size ==4 & taxyear==2016 & fam_inc > 45374
    drop if fam_size ==5 & taxyear==2016 & fam_inc > 50730
    drop if fam_size ==6 & taxyear==2016 & fam_inc > 55572
    drop if fam_size ==7 & taxyear==2016 & fam_inc > 60024
    drop if fam_size ==8 & taxyear==2016 & fam_inc > 64169
    drop if fam_size ==9 & taxyear==2016 & fam_inc > 68061
    drop if fam_size ==10 & taxyear==2016 & fam_inc > 71743

    * 2017
    drop if fam_size ==2 & taxyear==2017 & fam_inc > 33076
    drop if fam_size ==3 & taxyear==2017 & fam_inc > 40509
    drop if fam_size ==4 & taxyear==2017 & fam_inc > 46776
    drop if fam_size ==5 & taxyear==2017 & fam_inc > 52297
    drop if fam_size ==6 & taxyear==2017 & fam_inc > 57289
    drop if fam_size ==7 & taxyear==2017 & fam_inc > 61879
    drop if fam_size ==8 & taxyear==2017 & fam_inc > 66151
    drop if fam_size ==9 & taxyear==2017 & fam_inc > 70164
    drop if fam_size ==10 & taxyear==2017 & fam_inc > 73959

    * ETC FOR 20 years

    I have the 20 years of family income cut-offs for each family size. Here is a sample of the table of thresholds for how much the family income should be and family size for a person to fall under the low family income threhsold:
    Household size5 2 persons 3 persons 4 persons 5 persons 6 persons 7 persons 8 persons 9 persons 10 persons
    2007 25066 30699 35448 39632 43415 46893 50131 53172 56048
    2008 26490 32443 37462 41884 45881 49558 52979 56193 59233
    2009 26695 32694 37752 42208 46237 49941 53389 56628 59691
    2010 27208 33323 38478 43020 47126 50902 54416 57717 60839


    Any help would be appreciated! Thank you!



  • #2
    No, you shouldn't attempt to code all of those -replace- statements. It is unlikely you will get them all right. The way to do this is to create a new data file containing the thresholds. If the table you show at the end of your post is in a spreadsheet or a csv file, you can import it to Stata, and change the variable names to taxyear, threshold2, threshold3, threshold4,... If it's in a word processing document or PDF, then you can try to copy the table to your clipboard and paste the data in to Stata's data editor, and, again, change the variable names. Then save that data file. It will look like this:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int(taxyear threshold2) long(threshold3 threshold4 threshold5 threshold6 threshold7 threshold8 threshold9 threshold10)
    2007 25066 30699 35448 39632 43415 46893 50131 53172 56048
    2008 26490 32443 37462 41884 45881 49558 52979 56193 59233
    2009 26695 32694 37752 42208 46237 49941 53389 56628 59691
    2010 27208 33323 38478 43020 47126 50902 54416 57717 60839
    end
    Then, you -reshape- that file to long layout, and -merge- it into your household data set. Finally, you then drop the observations where fam_inc exceeds threshold. So the whole thing looks like this:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int(taxyear threshold2) long(threshold3 threshold4 threshold5 threshold6 threshold7 threshold8 threshold9 threshold10)
    2007 25066 30699 35448 39632 43415 46893 50131 53172 56048
    2008 26490 32443 37462 41884 45881 49558 52979 56193 59233
    2009 26695 32694 37752 42208 46237 49941 53389 56628 59691
    2010 27208 33323 38478 43020 47126 50902 54416 57717 60839
    end
    reshape long threshold, i(taxyear) j(fam_size)
    tempfile thresholds
    save `thresholds'
    
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int taxyear str5 id float(fam_inc fam_size)
    2002 "M0311" 38855 4
    2002 "S3172" 38855 4
    2016 "M2263" 28061 3
    2016 "M2494" 26052 9
    2016 "S9925" 26052 3
    2017 "M2966"  7104 3
    2017 "M3977"  1164 2
    2017 "M6388" 33560 4
    2018 "O9499" 32551 4
    2018 "M6311" 32551 4
    end
    
    merge m:1 taxyear fam_size using `thresholds', keep(master match) nogenerate
    drop if fam_inc > threshold & !missing(fam_inc, threshold)
    Note: This code does not work in your example data because the threshold data you show applies to years 2007-2010, but the years in the household data are all outside that range.

    Comment


    • #3
      Thats a great idea to merge it in as a separate data set. Thank YOU

      Comment

      Working...
      X