When I think I'm nearly there - another problem crops up.
I have two big datasets - here's a sample
hospital dataset
procedure dataset
Aim:
I would like to merge the hospital data set onto the procedure dataset.
The hospital dataset consists of duplicate patients with the same procedureno but different uniqueno. In order to decide row to keep, I need to merge the hospital dataset onto the proceduredataset.
Then create a dummy variable whereby Admissiondate - date. The one with the admissiondate closest to the opdate will be kept, the other dropped.
*date is the day the operation took place
However:
A.
There are duplicates in the procedure dataset due to patient having a Right + Left operation - therefore have a different indexcode but the same procedureno (as the operation was done in the same sitting)
Both right and left operation need to remain in the data set.
Only procedureno is the variable that is shared amongst the two datasets
Intially I was going to merge 1:m - however on realising the problem with A of course I couldn't do it.
Then I thought of merge m:m --> however there has been a lot of critism about this.
So then I though of what if I pull the date variable only from the procedure dataset merge this onto the hospital dataset, create the dummy variable Admission - date. Keep the ones I need. Save this as unique dataset.
use "hospitaldataset.dta"
merge procedureno using "proceduredataset.dta", keep (date)
It then tells me i'M USING old syntax - but it still does the job. Any critisims?
I have two big datasets - here's a sample
hospital dataset
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float(procedureno ssi dehiscence pe) str3 uniqueno float admissiondate 112 0 1 1 "11A" 22645 113 1 1 1 "12A" 22690 113 1 1 1 "13A" 22705 114 1 0 1 "14A" 22737 115 1 0 1 "15A" 22738 end format %td admissiondate
procedure dataset
Code:
* Example generated by -dataex-. For more info, type help dataex clear input float(procedureno indexcode new_dvt new_mi pe date) 112 9 . 1 1 22646 112 10 1 1 1 22646 113 13 1 1 . 22706 113 15 1 0 . 22706 114 16 1 0 1 22738 115 17 . . . 22739 end format %td date
Aim:
I would like to merge the hospital data set onto the procedure dataset.
The hospital dataset consists of duplicate patients with the same procedureno but different uniqueno. In order to decide row to keep, I need to merge the hospital dataset onto the proceduredataset.
Then create a dummy variable whereby Admissiondate - date. The one with the admissiondate closest to the opdate will be kept, the other dropped.
*date is the day the operation took place
However:
A.
There are duplicates in the procedure dataset due to patient having a Right + Left operation - therefore have a different indexcode but the same procedureno (as the operation was done in the same sitting)
Both right and left operation need to remain in the data set.
Only procedureno is the variable that is shared amongst the two datasets
Intially I was going to merge 1:m - however on realising the problem with A of course I couldn't do it.
Then I thought of merge m:m --> however there has been a lot of critism about this.
So then I though of what if I pull the date variable only from the procedure dataset merge this onto the hospital dataset, create the dummy variable Admission - date. Keep the ones I need. Save this as unique dataset.
use "hospitaldataset.dta"
merge procedureno using "proceduredataset.dta", keep (date)
It then tells me i'M USING old syntax - but it still does the job. Any critisims?
Comment