Appending string fields that were split, keeping the row with appended field, but dropping remaining rows

Michael McCulloch

Join Date: Jul 2025

Posts: 24
#1

02 Sep 2014, 17:45

I am working with a dataset containing 5 fields of interest:
SubjectID (unique for each subject, but each subject can have more than one encounter)
NoteDate
NoteID (which is unique for each subject encounter)
NoteText (which is split into several “lines” depending on length)
NoteLine (which runs from 1 up to 5 or more, again depending on how many splits of NoteText were made, which I have no control over)

The dataset was delivered with the NoteText string field split into two or more parts, with the NoteLine field denoting the parts. For a few observations, they are recorded as follows:

SubjectID NoteDate NoteID NoteLine NoteText
1 23apr2012 4322 1 This field
1 23apr2012 4322 2 is long

1 30apr2012 4976 1 This field
1 30apr2012 4976 2 is very
1 30apr2012 4976 3 long

2 24apr2012 4329 1 This field
2 24apr2012 4329 2 is very
2 24apr2012 4329 3 long

2 30apr2012 4978 1 This field
2 30apr2012 4978 2 is extra-
2 30apr2012 4978 3 ordinarily
2 30apr2012 4978 4 long

Thus, NoteText for the 30apr2012 visit of Subject 1 consists of:
NoteID NoteLine NoteText
4976 1 This field
4976 2 is very
4976 3 long

I would like to:
- append the different parts of NoteText into a new field, e.g. FullNote, keeping the same NoteID
- keep one row per unique NoteID (the row that was originally NoteLine==1), now including FullNote
- delete the rows with NoteLine==2 or higher, without losing any NoteID row that was originally NoteLine==1 and now contains the appended FullNote

I know that the appending is straightforward:
gen FullNote = note1 + note2

But I’m not sure how to do this so that I can append as many lines as a note has (whether 2, 3, 4 or more), within subsets of the same NoteID.
Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#2

02 Sep 2014, 18:12

How about this:

Code:

reshape wide NoteText, i(NoteID) j(NoteLine)
egen FullNote = concat(NoteText*)
drop NoteText*

As a bonus, -reshape- will also verify that SubjectID and NoteDate are constant within NoteID. Actually, you didn't explicitly say that there couldn't be two different SubjectIDs with the same NoteID. If that can happen, then the i() option in the reshape command should be i(SubjectID NoteID).

You might also need to use the -punct()- option in the -egen- statement to put spaces between the concatenated parts of FullNote to avoid having words run together.
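As a minimal sketch of how those two adjustments could look together, the block below uses a few of the example rows from the first post, the i(SubjectID NoteID) variant, and punct(" "). The trim() step is an addition of mine, on the assumption that notes with fewer parts end up with trailing spaces from the empty NoteText* variables; NoteDate is entered as a plain string here purely for illustration.

```stata
* Example rows taken from the first post (NoteDate kept as a string)
clear
input SubjectID str9 NoteDate NoteID NoteLine str12 NoteText
1 "23apr2012" 4322 1 "This field"
1 "23apr2012" 4322 2 "is long"
1 "30apr2012" 4976 1 "This field"
1 "30apr2012" 4976 2 "is very"
1 "30apr2012" 4976 3 "long"
2 "30apr2012" 4978 1 "This field"
2 "30apr2012" 4978 2 "is extra-"
2 "30apr2012" 4978 3 "ordinarily"
2 "30apr2012" 4978 4 "long"
end

* One row per note; the parts become NoteText1, NoteText2, ...
reshape wide NoteText, i(SubjectID NoteID) j(NoteLine)

* Join the parts with a single space between them
egen FullNote = concat(NoteText*), punct(" ")

* Assumption: shorter notes carry trailing spaces from empty parts
replace FullNote = trim(FullNote)

drop NoteText*
list, noobs
```

With punct(" ") a word that was hyphenated across two lines keeps a space after the hyphen, so some manual clean-up may still be needed.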

Last edited by Clyde Schechter; 02 Sep 2014, 18:15.
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35711
#3

03 Sep 2014, 03:54

Alternatively,

Code:

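* sort on NoteLine within each note so the parts are joined in order
bysort SubjectID NoteID (NoteLine) : gen Alltext = NoteText[1]
bysort SubjectID NoteID (NoteLine) : replace Alltext = Alltext[_n-1] + " " + NoteText if _n > 1
* the last row of each note now holds the complete text
bysort SubjectID NoteID (NoteLine) : keep if _n == _N

Some clean-up is required wherever the extra spaces were the wrong guess.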
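One possible form of that clean-up, sketched on the four-part note from the first post: it assumes that a part ending in "-" marks a word broken across lines, so the space inserted after the hyphen should go. That assumption is mine and would also alter any text where "- " is intended, so it is worth checking on real notes first.

```stata
* The four-part example note from the first post
clear
input SubjectID NoteID NoteLine str12 NoteText
2 4978 1 "This field"
2 4978 2 "is extra-"
2 4978 3 "ordinarily"
2 4978 4 "long"
end

* Build the running concatenation in NoteLine order
bysort SubjectID NoteID (NoteLine) : gen Alltext = NoteText[1]
by SubjectID NoteID : replace Alltext = Alltext[_n-1] + " " + NoteText if _n > 1
by SubjectID NoteID : keep if _n == _N

* Assumption: "- " comes from a word hyphenated across two parts
replace Alltext = subinstr(Alltext, "- ", "-", .)

list Alltext, noobs
```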

Comment

Michael McCulloch

Join Date: Jul 2025

Posts: 24
#4

27 Oct 2014, 14:56

Thanks to Clyde and Nick for the two very helpful approaches!
Comment
Michael McCulloch

Join Date: Jul 2025

Posts: 24
#5

23 Jul 2015, 19:24

Greetings Clyde and Nick, the concatenation methods suggested by both of you work perfectly. After some head scratching, I discovered, quite by accident, that the methods had succeeded, and I am asking for advice to clarify how this happened.
The concatenated note is a strL with up to 10,000 characters.
When I tried to view the result with the command <list CONCATENATED-VAR, notrim>, it displayed only up to 2077 characters.
However, when I ran the command <codebook CONCATENATED-VAR>, the full content of a test record with 6,829 characters was displayed.
Of course I was relieved and satisfied with the outcome, but I would like to learn how to list the full contents for a more complete validation of the concatenation.
Michael
Comment
Michael McCulloch

Join Date: Jul 2025

Posts: 24
#6

24 Jul 2015, 16:28

Well, I've been able to answer my own question with the command below. Thanks again for your help.

Code:

display _asis mystr
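
For anyone checking a particular record, here is a small sketch of the same idea with an explicit observation subscript, plus a length check with strlen(). The variable name FullNote and the 3000-character test string are made up for illustration.

```stata
* Hypothetical test data: one strL value of 3000 characters
clear
set obs 1
gen strL FullNote = 3000 * "x"

* Show the complete text stored in observation 1
display _asis FullNote[1]

* Compare stored lengths with what was expected
gen long notelen = strlen(FullNote)
summarize notelen
```

Without a subscript, display shows the value from the first observation, so FullNote[1] just makes that choice explicit.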
Comment