Hi all,
I am trying to see if I can simplify some code I was given to improve readability and efficiency.
My goal is to merge HRS (Health and Retirement Study) respondent-helper level files with respondent level files and then create some aggregate information on the care respondents get.
The respondent-helper level file looks like the following. Each respondent can have multiple helpers. The helperid is not unique, but the combination of helperid and id is. This file also has more information about each helper, such as whether they are paid.
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(id helperid) float paid
1 21 1
1 11 0
2 21 0
end
The respondent level files look like the following, where h1 and h2 are the helperids for helper 1 and helper 2. Here person 2 has only one helper. In the real data there is also other information we want to keep from the respondent level file that is not in the helper level file, hence the need to merge.
* Example generated by -dataex-. For more info, type help dataex
clear
input byte id float(h1 h2)
1 21 11
2 21 .
end
The initial merged product might look like the following:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte id float(h1 h2 h1_paid h2_paid)
1 21 11 1 0
2 21 . 0 .
end
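As a rough sketch only, this wide layout could also be reached with a single merge by first reshaping the respondent file to one row per respondent-slot. It assumes the slot variables share the stub h (h1, h2, ...), that every non-missing helper id appears in the helper file under the same id, and that the tempfile name helpers and the variable name slot are free to use; none of this has been checked against the real HRS files.

```stata
* Helper-level example data, saved to a temporary file (name is hypothetical)
clear
input byte(id helperid) float paid
1 21 1
1 11 0
2 21 0
end
tempfile helpers
save `helpers'

* Respondent-level example data
clear
input byte id float(h1 h2)
1 21 11
2 21 .
end

* One row per respondent-slot, then one merge on (id, helperid)
reshape long h, i(id) j(slot)
rename h helperid
drop if missing(helperid)        // empty slots have nothing to match
merge 1:1 id helperid using `helpers', keep(master match) nogenerate

* Back to one row per respondent: h1 h2 h1_paid h2_paid
rename (helperid paid) (h h_paid)
reshape wide h h@_paid, i(id) j(slot)
list, noobs
```

One caveat with this sketch: a respondent whose slots are all missing is dropped by the `drop if missing(helperid)` line, so such cases would need to be merged back from the respondent file afterwards.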
The final product should look like the following, where any_paid is whether the respondent had any helper that was paid in that period.
* Example generated by -dataex-. For more info, type help dataex
clear
input byte id float any_paid
1 1
2 0
end
The current code I have looks something like the below. The idea is to load the helper file once per helper slot, rename helperid to h1, h2, etc. and the characteristics to h1_paid, h2_paid, etc., save each version as a new file, and then merge each one with the original respondent file. This works, but the do-file quickly becomes confusing to read (in reality we don't just have h1 and h2 but ha1, ha2, etc. and hb1, hb2, etc.) and is not very intuitive, as we are saving the same file over and over again under different names. I would really appreciate any suggestions on how to simplify this process. Let me know if you need any further clarification.
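// make one renamed copy of the helper file per helper slot
forval x = 1/2 {
    use "helperfile.dta", clear
    rename helperid h`x'
    rename paid h`x'_paid
    tempfile h`x'
    save `h`x''                 // save to the tempfile, not a file literally named h1
}
// merge with respondent file and get final information
use "respondentfile.dta", clear
forval x = 1/2 {
    // keep only respondent rows; nogenerate avoids a _merge clash on the second pass
    merge 1:1 id h`x' using `h`x'', keep(master match) nogenerate
}
// create the flag once, then switch it on for any paid helper
gen any_paid = 0
forval x = 1/2 {
    replace any_paid = 1 if h`x'_paid == 1
}
keep id any_paid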
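If only respondent-level aggregates such as any_paid are needed, one possible shortcut is to aggregate the helper file by id first and merge once. The following is a minimal sketch on the example data, assuming that paid is coded 0/1 and that every helper row in the helper file corresponds to a helper listed in the respondent file for that period; the tempfile name anypaid is hypothetical.

```stata
* Helper-level example data
clear
input byte(id helperid) float paid
1 21 1
1 11 0
2 21 0
end

* any_paid = 1 if at least one of the respondent's helpers is paid
collapse (max) any_paid = paid, by(id)
tempfile anypaid
save `anypaid'

* Respondent-level example data; other respondent variables would be kept too
clear
input byte id float(h1 h2)
1 21 11
2 21 .
end
merge 1:1 id using `anypaid', keep(master match) nogenerate
list id any_paid, noobs
```

In this sketch, respondents with no rows in the helper file end up with a missing any_paid rather than 0, and `collapse (max)` ignores missing values of paid, so whether to recode those cases to 0 is a separate decision.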