
  • Multiple lines to a single line

    Hello everyone,

    I have a question about creating a string variable. I have a document that contains an ID and a story for each record. For example:

    001$"[begin]title: balabala...
    time: balabala
    place: balabala...
    people: 3
    balabala..
    [end]"

    002$[begin]title: balabala....
    time: balabala
    place: balabala...
    people: 4
    balabala...
    balabala
    [end]"

    003$....

    ...

    I want to collapse the multiple lines of each story (between [begin] and [end]) into one single line and store it in a variable "story". I also want to create variables for time, place, people, etc. How can I do that in Stata? Or should I use Python?

    Thanks,
    Yuqing

  • #2
    I do not understand how what you show between [begin] and [end] is laid out as a Stata data set. If you post back and use -dataex- to show a sample of the data you have in Stata, I'm confident you will get a helpful and timely response. To install the -dataex- command run -ssc install dataex- from the Stata command window. Then run -help dataex- and read the easy instructions.



    • #3
      I second Clyde's point about the problems with your example. However, making some wild guesses about your intent, here's my approach:

      Code:
      // Create workable example data, which you did not do, since you did
      // not read the FAQ as you were requested to do when you joined StataList.
      clear
      input str80 s
      "001$[begin]title: balabala..."
      "time: balabala"
      "place: balabala..."
      "people: 3"
      "balabala.."
      "[end]"
      "002$[begin]title: balabala...."
      "time: balabala"
      "place: balabala..."
      "people: 4"
      "balabala..."
      "balabala"
      "[end]"
      end
      //
      // Create and spread the id variable
      // You did not tell us anything about the id, so I will
      // read your mind and assume the first "$" terminates the id.
      gen int pos = strpos(s, "$")
      gen id = real(substr(s, 1, pos - 1)) if pos > 0
      // Keep only the text after the "$" on lines that start a new story
      replace s = substr(s, pos + 1, .) if !missing(id)
      replace id = id[_n-1] if missing(id)
      drop pos
      //
      
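      // Put all the lines pertaining to one story into a series of variables on one line
      // Record the original line order first, because -bysort id- alone
      // does not guarantee the order of lines within an id.
      gen long order = _n
      bysort id (order): gen int seq = _n
      drop order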
      compress
      reshape wide s, i(id) j(seq)
      gen strL TheStory = ""
      // Combine the story into one string
      foreach s of varlist s* {
        replace TheStory = TheStory + `s'
      }
      drop s*
      //
      // Get rid of begin and end
      replace TheStory = subinstr(TheStory, "[begin]", "", .)
      replace TheStory = subinstr(TheStory, "[end]", "", .)
      Last edited by Mike Lacy; 02 Mar 2017, 22:21.
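
On the original question of whether Python would work instead: the following is a minimal sketch of the same grouping idea, not part of the Stata solution in this thread. It assumes, as the Stata code does, that a new story starts on a line beginning with digits followed by "$". The function name `group_stories` and the choice to join lines with a single space are illustrative assumptions.

```python
import re


def group_stories(lines):
    """Collapse raw text lines into {id: story}, one single-line story per ID."""
    stories = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Assumption: a new record starts with digits followed by "$".
        m = re.match(r'(\d+)\$(.*)', line)
        if m:
            current_id = m.group(1)
            line = m.group(2)
            stories[current_id] = []
        if current_id is None:
            continue  # ignore any text before the first ID
        # Drop surrounding quotes and the [begin]/[end] markers.
        line = line.strip('"').replace("[begin]", "").replace("[end]", "").strip()
        if line:
            stories[current_id].append(line)
    # Join the lines of each story with a single space.
    return {sid: " ".join(parts) for sid, parts in stories.items()}
```

Unlike `real()` in the Stata version, this keeps each ID as a string, so leading zeros ("001") survive.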



      • #4
        Thank you very much! It is a .txt file. I want to create a .dta file that contains the ID and story variables.

        I used the following code:
        Code:
        import delimited story.txt, delimiter("$") bindquote(strict) varnames(1) stripquote(no) encoding(UTF-8)
        But the resulting "story" variable contains only the first line of each story ("[begin]title: balabala...").
        Last edited by Yuqing Hu; 03 Mar 2017, 09:25.



        • #5
          "$" is not a delimiter in your example. A delimiter is something that separates the content of different variables, which it does not do in your case.

          To use my code, you need to read each line as one long string. This is usually a good strategy for complex text data.

          Your -import delimited- command told Stata to separate each line into different variables, breaking it at each "$". Instead, you want to read each line as a single string, without a delimiter breaking it up into variables. In my command below, I use "zzz" as a "fake" delimiter because this sequence of characters is very unlikely to occur anywhere in your text. Stata will therefore find no delimiter, and each line will be treated as one string. (I'm sure there's a nicer way to make -import delimited- not use any delimiter, but I don't recall how to do that.)

          Code:
          import delimited "story.txt" , delimiter("zzz", asstring)
          rename v1 s
          Once you have read the text file as I describe, you should apply the previous code I suggested.
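
The original question also asks for separate variables for time, place, and people. One possible approach, sketched here in Python since the question mentions it, is to treat any line of the form `key: value` as a field. The function name `extract_fields` and the default key list are assumptions based on the sample data, and the lines passed in are assumed to be already stripped of the ID, quotes, and [begin]/[end] markers.

```python
def extract_fields(record_lines, keys=("title", "time", "place", "people")):
    """Return {key: value} for lines that look like 'key: value'."""
    fields = {}
    for line in record_lines:
        # Split at the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        # Keep the first occurrence of each expected key.
        if sep and key in keys and key not in fields:
            fields[key] = value.strip()
    return fields
```

Lines without a recognised key, such as the free-text "balabala.." lines in the sample, are skipped.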



          • #6
            Thank you very much!
