I am working with a dataset containing 4 fields of interest:
SubjectID (unique for each subject, but each subject can have more than one encounter)
NoteDate
NoteID (which is unique for each subject encounter)
NoteText (which is split into several “lines” depending on length)
NoteLine (which numbers between 1 and as many as 5 or more, again depending on how many splits of NoteText were made, which I have no control over)
The dataset was delivered with the NoteText string field split into two or more parts, with the NoteLine field denoting the parts. For a few observations, they are recorded as follows:
SubjectID NoteDate NoteID NoteLine NoteText
1 23apr2012 4322 1 This field
1 23apr2012 4322 2 is long
1 30apr2012 4976 1 This field
1 30apr2012 4976 2 is very
1 30apr2012 4976 3 long
2 24apr2012 4329 1 This field
2 24apr2012 4329 2 is very
2 24apr2012 4329 3 long
2 30apr2012 4978 1 This field
2 30apr2012 4978 2 is extra-
2 30apr2012 4978 3 ordinarily
2 30apr2012 4978 4 long
Thus, NoteText for the 30apr2014 visit of Subject 1 consists of:
NoteID NoteLine NoteText
4976 1 This field
4976 2 is very
4976 3 long
I would like to:
- append the three different parts of NoteText into a new field, e.g. FullNote, and keeping the same NoteID
- keep the rows containing unique NoteIDs, that now include FullNote, and which were originally NoteLine==1.
- delete the rows for NoteLine==2 or more (but without losing any NoteIDs that were originally NoteLine==1, and which now contain the appended FullNote.
I know that the appending is straightforward:
gen FullNote = note1 + note2,
But, I’m not sure how that process would be done so that I can append as many lines as there are for a note (whether it was 2, 3, 4 or more), but within subsets of the same NoteID.
SubjectID (unique for each subject, but each subject can have more than one encounter)
NoteDate
NoteID (which is unique for each subject encounter)
NoteText (which is split into several “lines” depending on length)
NoteLine (which numbers between 1 and as many as 5 or more, again depending on how many splits of NoteText were made, which I have no control over)
The dataset was delivered with the NoteText string field split into two or more parts, with the NoteLine field denoting the parts. For a few observations, they are recorded as follows:
SubjectID NoteDate NoteID NoteLine NoteText
1 23apr2012 4322 1 This field
1 23apr2012 4322 2 is long
1 30apr2012 4976 1 This field
1 30apr2012 4976 2 is very
1 30apr2012 4976 3 long
2 24apr2012 4329 1 This field
2 24apr2012 4329 2 is very
2 24apr2012 4329 3 long
2 30apr2012 4978 1 This field
2 30apr2012 4978 2 is extra-
2 30apr2012 4978 3 ordinarily
2 30apr2012 4978 4 long
Thus, NoteText for the 30apr2014 visit of Subject 1 consists of:
NoteID NoteLine NoteText
4976 1 This field
4976 2 is very
4976 3 long
I would like to:
- append the three different parts of NoteText into a new field, e.g. FullNote, and keeping the same NoteID
- keep the rows containing unique NoteIDs, that now include FullNote, and which were originally NoteLine==1.
- delete the rows for NoteLine==2 or more (but without losing any NoteIDs that were originally NoteLine==1, and which now contain the appended FullNote.
I know that the appending is straightforward:
gen FullNote = note1 + note2,
But, I’m not sure how that process would be done so that I can append as many lines as there are for a note (whether it was 2, 3, 4 or more), but within subsets of the same NoteID.
Comment