Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge several observations on one ID variable

    Dear Statalist,

    I am trying to merge two datasets. When trying to do so, I realize that the using dataset has several observations of the variable ICD10 within one ID (PID_107945), meaning that the ID variable is listed several times.

    Code:
    . merge 1:1 PID_107945 using "dataset2"
    variable PID_107945 does not uniquely identify observations in the master data
    HTML Code:
      	 		 			. list 			ICD10 			PID_107945 if nobs 			>1 			in 			1/100, 			sepby(PID_107945) 		 		 			  			  		 		 			ICD10 			PID_107945 		 		 			  			- 		 		 			1. 			J31.0 			1079450000020 		 		 			2. 			1079450000020 		 		 			3. 			J31.0 			1079450000020 		 		 			4. 			1079450000020 		 		 			5. 			J31.0 			1079450000020 		 		 			  			- 		 		 			8. 			1079450000198 		 		 			9. 			1079450000198 		 		 			10. 			1079450000198 		 		 			11. 			1079450000198 		 		 			  			- 		 		 			13. 			J44.9 			1079450000228 		 		 			14. 			J44.9 			1079450000228 		 		 			15. 			J44.9 			1079450000228 		 		 			16. 			J44.9 			1079450000228 		 		 			17. 			M80.4 			1079450000228 		 		 			18. 			J44.8 			1079450000228 		 		 			19. 			J44.9 			1079450000228 		 		 			20. 			J44.9 			1079450000228 		 		 			21. 			M80.4 			1079450000228 		 		 			22. 			J44.9 			1079450000228 		 		 			23. 			J44.9 			1079450000228 		 		 			24. 			J44.1 			1079450000228 		 		 			25. 			J44.9 			1079450000228 		 		 			26. 			J44.9 			1079450000228 		 		 			  			-
    The dataset looks like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str13 PID_107945 str7 ICD10 double admissiondate
    "1079450000020" "J31.0"  20030213
    "1079450000020" ""       19980924
    "1079450000020" "J31.0"  20040507
    "1079450000020" ""       19951214
    "1079450000020" "J31.0"  20040212
    "1079450000068" ""       19970220
    "1079450000082" "J44.9"  20180820
    "1079450000198" ""       19950929
    "1079450000198" ""       19950314
    "1079450000198" ""       19950210
    "1079450000198" ""       19950925
    "1079450000204" "J31.0"  20010621
    "1079450000228" "J44.9"  20100921
    "1079450000228" "J44.9"  20100309
    "1079450000228" "J44.9"  20100618
    "1079450000228" "J44.9"  20110121
    "1079450000228" "M80.4"  20081110
    "1079450000228" "J44.8"  20020906
    "1079450000228" "J44.9"  20021008
    "1079450000228" "J44.9"  20100525
    "1079450000228" "M80.4"  20090821
    "1079450000228" "J44.9"  20100930
    "1079450000228" "J44.9"  20090227
    "1079450000228" "J44.1"  20060306
    "1079450000228" "J44.9"  20030131
    "1079450000228" "J44.9"  20011217
    "1079450000235" "M81.9"  20130221
    "1079450000334" "J44.0"  20160311
    "1079450000334" "M81.9"  20170407
    "1079450000334" "J43.9"  20180508
    "1079450000334" "M81.9"  20190705
    "1079450000334" "M81.99" 20171113
    "1079450000334" "J44.1"  20151228
    "1079450000334" "M81.99" 20180805
    "1079450000334" "J44.9"  20121114
    "1079450000334" "J44.9"  20130515
    "1079450000334" "J44.1"  20150831
    "1079450000334" "J44.1"  20180504
    "1079450000334" "J44.0"  20140908
    "1079450000334" "M81.9"  20140908
    "1079450000334" "J44.9"  20160502
    "1079450000334" "J44.9"  20150130
    "1079450000334" "J44.9"  20141215
    "1079450000334" "M81.99" 20180508
    "1079450000334" "J44.0"  20160225
    "1079450000334" "J44.9"  20150608
    "1079450000334" "J44.9"  20141215
    "1079450000334" "J44.8"  20160712
    "1079450000334" "J44.9"  20160726
    "1079450000334" "J44.9"  20170210
    "1079450000334" "M81.9"  20170525
    "1079450000334" "J44.9"  20141215
    "1079450000334" "J44.9"  20141126
    "1079450000334" "M81.9"  20170826
    "1079450000334" "J44.1"  20190410
    "1079450000334" "J44.1"  20170826
    "1079450000334" "J44.9"  20180418
    "1079450000334" "M81.9"  20170509
    "1079450000334" "J44.9"  20160812
    "1079450000334" "J44.9"  20141215
    "1079450000334" "J44.0"  20180508
    "1079450000334" "J44.1"  20161112
    "1079450000334" "J44.1"  20170525
    "1079450000334" "J44.1"  20190404
    "1079450000334" "J44.1"  20151126
    "1079450000334" "M81.9"  20190410
    "1079450000334" "J44.9"  20141215
    "1079450000334" "J44.1"  20161020
    "1079450000334" "J44.1"  20171113
    "1079450000334" "M81.9"  20170327
    "1079450000334" "M81.9"  20170929
    "1079450000334" "J44.0"  20170327
    "1079450000334" "J20.9"  20180508
    "1079450000334" "M81.9"  20190207
    "1079450000334" "J44.9"  20181221
    "1079450000334" "J44.1"  20160927
    "1079450000334" "M81.9"  20150709
    "1079450000334" "M81.9"  20150928
    "1079450000334" "J44.0"  20170407
    "1079450000334" "J44.1"  20170929
    "1079450000334" "J44.9"  20140925
    "1079450000334" "J44.1"  20170509
    "1079450000334" "M81.9"  20140925
    "1079450000334" "J44.9"  20141215
    "1079450000334" "J44.1"  20151028
    "1079450000334" "J44.9"  20141215
    "1079450000334" "J44.1"  20160331
    "1079450000334" "M81.9"  20190404
    "1079450000334" "J20.2"  20140908
    "1079450000334" "M81.9"  20140904
    "1079450000334" "J44.1"  20160117
    "1079450000334" "J44.1"  20190115
    "1079450000334" "J44.9"  20141215
    "1079450000334" "M81.9"  20151028
    "1079450000334" "J44.9"  20150928
    "1079450000334" "J44.1"  20190207
    "1079450000334" "J44.1"  20181129
    "1079450000334" "M81.9"  20190403
    "1079450000334" "J44.9"  20141215
    "1079450000334" "J44.1"  20180805
    end
    I'm interested in the first occasion where the code is registered (e.g. the first heart infarction). Is there a way that I can recode the ICD10 variable to only contain the first (by date) registered ICD10 code? I'm only looking for the main groups of codes (J44, doesn't matter if it's J44.1 or J44.9). I will need to do the same thing with the admissiondate variable (only keep the first observation date), since I'm using it as the end-date variable for -stset- later on.

    Grateful for any help with this.

    Best regards,
    Sigrid V

  • #2
    You can merge 1 to many (1:m) or many to 1 (m:1), but the main issue with your data are the observations with missing values for ICD10 if you intend this to be an identifier in the merge.

    I'm only looking for the main groups of codes (J44, doesn't matter if it's J44.1 or J44.9). I will need to do the same thing with the admissiondate variable (only keep the first observation date), since I'm using it as the end-date variable for -stset- later on.
    You need a date variable recognized by Stata. For your first request, I assume that the first 3 characters of ICD10 identify a group, but you should ensure that this is the case.

    #2
    Code:
    gen date= date(string(admissiondate, "%17.0g"), "YMD")
    format date %td
    bys ICD10 (date): keep if _n==1
    #1
    Code:
    gen ICDG= substr(ICD10, 1, 3)
    bys ICDG (date): keep if _n==1
    Last edited by Andrew Musau; 02 Apr 2020, 10:51.

    Comment


    • #3
      Thank you, Andrew Musau! I will give it a try on my data.

      Comment

      Working...
      X