Hello, all, I have a coding problem that I'm hoping smarter people than I can help solve. I am working with tax data that is at the return level. I am trying to group together returns filed by the same people, based on their unique IDs (a masked version of SSN). Each return has a primary filer ID and a spouse ID (if two people filed a joint return; if not, the spouse ID field has a missing value). It is easy to find returns filed by the same people if the same ID is primary on each return. However, I have a large number of instances where they switch places. That is, the records look like this:
I'd like to add a "household ID" to this. At the moment, I have the following solution, which works but is very slow to run over hundreds of thousands of records:
Also, when I really get large numbers of records, the line
won't run because the local idlevels exceeds Stata's macro character limits.
Any thoughts on a faster and neater way to do this? Thanks very much in advance!!
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float returnID str1(primID spID) 1 "A" "B" 2 "B" "A" 3 "C" "D" 4 "D" "C" end
I'd like to add a "household ID" to this. At the moment, I have the following solution, which works but is very slow to run over hundreds of thousands of records:
Code:
gen id1 = primID gen id2 = spID reshape long id, i(returnID) j(filernum) levelsof id, local(idlevels) clean gen hhid = . local i = 1 foreach lvl in `idlevels' { levelsof primID if id=="`lvl'" , local(prim) clean levelsof spID if id=="`lvl'" , local(sp) clean local allids `prim' `sp' local allids : list uniq allids tokenize `allids' replace hhid = `i' if (id=="`1'" | id=="`2'") & hhid==. local i = `i'+1 } reshape wide id, i(returnID) j(filernum)
Code:
levelsof id, local(idlevels) clean
Any thoughts on a faster and neater way to do this? Thanks very much in advance!!
Comment