Multiple lines to a single line

Yuqing Hu

Join Date: May 2014

Posts: 17
#1

02 Mar 2017, 21:16

Hello everyone,

I have a question about creating a string variable. I have a document that contains an ID and a story for each record. For example:

001$"[begin]title: balabala...
time: balabala
place: balabala...
people: 3
balabala..
[end]"

002$[begin]title: balabala....
time: balabala
place: balabala...
people: 4
balabala...
balabala
[end]"

003$....

...

I want the multiple lines of each story (between [begin] and [end]) to become a single line stored in a variable "story", and I also want to create variables for time, place, people, etc. How can I do that in Stata? Or should I use Python?

Thanks,
Yuqing
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#2

02 Mar 2017, 22:11

I do not understand how what you show between [begin] and [end] is laid out as a Stata data set. If you post back and use -dataex- to show a sample of the data you have in Stata, I'm confident you will get a helpful and timely response. To install the -dataex- command run -ssc install dataex- from the Stata command window. Then run -help dataex- and read the easy instructions.

Mike Lacy

Join Date: Apr 2014
Posts: 2418
#3

02 Mar 2017, 22:17

I second Clyde's point about the problems with your example. However, making some wild guesses about your intent, here's my approach:

Code:

// Create workable example data, which you did not do, since you did
// not read the FAQ as you were requested to do when you joined StataList.
clear
input str80 s
"001$[begin]title: balabala..."
"time: balabala"
"place: balabala..."
"people: 3"
"balabala.."
"[end]"
"002$[begin]title: balabala...."
"time: balabala"
"place: balabala..."
"people: 4"
"balabala..."
"balabala"
"[end]"
end
//
// Create and spread the id variable
// You did not tell us anything about the id, so I will
// read your mind and assume the first "$" terminates the id.
gen int pos = strpos(s, "$") -1
gen id = real(substr(s, 1, pos))
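// pos is the last character of the id, so the text starts two characters later (after the "$")
replace s = substr(s, pos + 2, .) if !missing(id)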
replace id = id[_n-1] if missing(id)
drop pos
//

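// Put all the lines pertaining to one story into a series of variables on one row.
// Record the line order first: sorting on id alone does not preserve it within id.
gen long order = _n
bysort id (order): gen int seq = _n
drop order
compress
reshape wide s, i(id) j(seq)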
gen strL TheStory = ""
// Combine the story into one string, with a space between the original lines
foreach s of varlist s* {
  replace TheStory = trim(TheStory + " " + `s')
}
drop s*
//
// Get rid of begin and end
replace TheStory = subinstr(TheStory, "[begin]", "", .)
replace TheStory = trim(subinstr(TheStory, "[end]", "", .))
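
The same concatenation can also be done without -reshape-, by building the string cumulatively within each id. What follows is only a minimal sketch, not part of the original reply. It assumes the id has already been extracted and filled down as above; the variable names order and story are illustrative, and a space is inserted between the original lines.

```stata
clear
input int id str40 s
1 "[begin]title: balabala..."
1 "time: balabala"
1 "place: balabala..."
1 "people: 3"
1 "balabala.."
1 "[end]"
2 "[begin]title: balabala...."
2 "time: balabala"
2 "place: balabala..."
2 "people: 4"
2 "balabala..."
2 "balabala"
2 "[end]"
end
// Remember the original line order so the sort within id is well defined
gen long order = _n
// Start each story with its first line, then append one line at a time;
// -replace- works through the observations in order, so story[_n-1] is already updated
bysort id (order): gen strL story = s if _n == 1
by id: replace story = story[_n-1] + " " + s if _n > 1
// The last observation of each id now holds the complete story
by id: keep if _n == _N
replace story = trim(subinstr(subinstr(story, "[begin]", "", .), "[end]", "", .))
keep id story
```

This avoids creating one variable per line, which may be more convenient when some stories run to many lines.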

Last edited by Mike Lacy; 02 Mar 2017, 22:21.


Yuqing Hu

Join Date: May 2014

Posts: 17
#4

03 Mar 2017, 09:09

Thank you very much! It is a .txt file. I want to create a .dta file that contains the ID and story variables.

I used the following code:

Code:

import delimited story.txt, delimiter("$") bindquote(strict) varnames(1) stripquote(no) encoding(UTF-8)

But the "story" variable contains only the first line of "[begin]title: balabala..."

Last edited by Yuqing Hu; 03 Mar 2017, 09:25.
Mike Lacy

Join Date: Apr 2014

Posts: 2418
#5

04 Mar 2017, 09:25

"$" is not a delimiter in your example. A delimiter is something that separates the content of different variables, which it does not do in your case.

To use my code, you need to read each line as one long string. This is usually a good strategy for complex text data.

Your -import delimited- command told Stata to separate each line into different variables, breaking it at each "$". Instead, you want to read each line as a string without the line being broken up into variables by a delimiter. Below, in my command, I use "zzz" as a "fake" delimiter because this sequence of characters is very unlikely to occur anywhere in your text. The result will be that no delimiter will be found by Stata, so that each line will be treated as one string. (I'm sure there's a nicer way to make -import delimited- not use any delimiter, but I don't recall how to do that.)

Code:

import delimited "story.txt", delimiter("zzz", asstring)
rename v1 s

Once you have read the text file as I describe, you should apply the previous code I suggested.
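
The first post also asked for separate time, place, and people variables. Once each line of the file sits in its own observation with the id filled down (as in #3), those labelled lines can be picked out before the lines are combined. This is only a minimal sketch, not from the original replies. It assumes each label appears in lowercase at the start of its own line, followed by a colon; the variable names title, time, place, people, and order are illustrative.

```stata
clear
input int id str40 s
1 "[begin]title: balabala..."
1 "time: balabala"
1 "place: balabala..."
1 "people: 3"
1 "balabala.."
1 "[end]"
2 "[begin]title: balabala...."
2 "time: balabala"
2 "place: balabala..."
2 "people: 4"
2 "balabala..."
2 "balabala"
2 "[end]"
end
gen long order = _n                        // remember the original line order
replace s = subinstr(s, "[begin]", "", 1)  // so the title line starts with its label
foreach key in title time place people {
    gen str40 `key' = ""
    // Keep the text after "key:" on the line that starts with that label
    replace `key' = trim(substr(s, strlen("`key':") + 1, .)) if strpos(s, "`key':") == 1
    // "" sorts before any non-empty string, so the value ends up last within id
    bysort id (`key'): replace `key' = `key'[_N]
}
destring people, replace                   // "3" becomes the number 3
sort id order                              // restore the original line order
```

Every line of a story now carries that story's title, time, place, and people values, so they survive when the lines are later collapsed to one observation per id. If a label occurs more than once within a story, this sketch keeps the alphabetically last value.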
Yuqing Hu

Join Date: May 2014

Posts: 17
#6

05 Mar 2017, 10:34

Thank you very much!