  • New line or end of line in strL variables or scalars

    Dear all,

    when I use the function fileread() to load a do-file (or any text file) in Stata, I get the entire text in my strL variable. If I display the variable, I can see that Stata understands where each line ends. In fact, if I use scalars to hold part of the information in such a strL variable, Stata still understands the ends of lines (which I cannot achieve with macros). Is there a way to identify these end-of-line characters in either the strL variable or the scalar?

    Thank you so much,

    Pablo.
    Last edited by Pablo Bonilla; 18 Dec 2015, 09:29. Reason: scalars strL
    Best,
    Pablo Bonilla

  • #2
    You could likely use Unicode regular expressions with the \n metacharacter to match the newline characters, but beyond that it isn't clear exactly what you want to do.
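If the Unicode regex functions (added in Stata 14) are not available, here is a minimal sketch of the same idea for Stata 13. The line breaks that fileread() keeps are ordinary characters: char(10) (line feed), preceded by char(13) (carriage return) in files with Windows line endings. That means strpos(), subinstr(), and length() can locate and count them. The three-line demo file and the variable names first_lf, n_lf, and first_line are made up for illustration. The sketch also assumes these string functions accept strL arguments in your version.

```stata
* Sketch: find and count line breaks in a strL read with fileread().
* A made-up three-line demo file stands in for your own file.
clear
tempname fh
tempfile demo
file open `fh' using "`demo'", write text replace
file write `fh' "alpha" _n "beta" _n "gamma" _n
file close `fh'

set obs 1
gen strL text = fileread("`demo'")

* Position of the first line feed, char(10); 0 if there is none
gen long first_lf = strpos(text, char(10))

* Number of line feeds: how much shorter the text gets when they are removed
gen long n_lf = length(text) - length(subinstr(text, char(10), "", .))

* First line of the file, minus any carriage return, char(13), from CRLF endings
gen strL first_line = substr(text, 1, first_lf - 1) if first_lf > 0
replace first_line = subinstr(first_line, char(13), "", .)
```

To use it on a real file, put your own path in the fileread() call.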



    • #3
      Thank you for your answer, Wbuchanan.

      I have Stata 13.1, so I don’t know if the Unicode solution works for me. Let me tell you what I am trying to do. On the one hand, let’s say that I have some information in my locals a, b, and c, and I need to put the three together in a single scalar such that, when I display the scalar, I read the same as
      Code:
      display "`a'"
      display "`b'"
      display "`c'"
      That is, I need to see the content of each local on a different line, while all of it is contained in one single scalar. Do you know how I can do that? I tried using \n, but it didn’t work, or at least I didn’t know how to make it work…

      On the other hand, I need to load into Stata a text file with specific information on each line and then analyze each line using regular expressions. I have found several ways to do this. One way is to use the file read command and go through the file line by line. I found this a little inefficient: because I need to find different patterns in the text and then put the lines together, I have to read the file over and over again. Now I have found another way, in which I load the text file into Stata and each line of the file becomes a different observation.

      Code:
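      drop _all
      set obs 100000                    // must be at least the number of lines in the file
      tempname myfile
      local file "myfile.txt"
      gen line = .                      // line number
      gen strL code = ""                // text of that line
      file open `myfile' using "`file'", read
      file read `myfile' line           // local macro line now holds the first line
      local i = 1
      qui while r(eof)==0 {
          replace line = `i' in `i'
          replace code = `"`macval(line)'"' in `i'
          file read `myfile' line
          local i = `i' + 1
      }
      file close `myfile'
      drop if missing(line)             // discard the unused observations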
      However, I have now found that with the function fileread() I can load the entire text file into a single observation of my dataset!! That is great. The problem I am facing now is that I have no idea how to identify the ends of lines in that large observation that contains my whole text file.
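Here is a minimal sketch of one way to split that single observation back into one observation per line. It assumes that Mata's st_sdata() and st_sstore() handle strL variables in your version. The sketch strips carriage returns, then repeatedly cuts the text at each line feed, char(10). The demo file and the names s, p, lines, and nlines are illustrative; line and code mirror the loop above.

```stata
* Sketch: split the strL returned by fileread() into one observation per line.
* A made-up three-line demo file stands in for your own file.
clear
tempname fh
tempfile demo
file open `fh' using "`demo'", write text replace
file write `fh' "alpha" _n "beta" _n "gamma" _n
file close `fh'

set obs 1
gen strL text = fileread("`demo'")

mata:
    s = st_sdata(1, "text")                  // the whole file as one Mata string
    s = subinstr(s, char(13), "")            // drop CRs left by CRLF line endings
    lines = J(0, 1, "")
    while ((p = strpos(s, char(10))) > 0) {
        lines = lines \ substr(s, 1, p - 1)  // text before the line feed
        s = substr(s, p + 1, .)              // continue after it
    }
    if (s != "") lines = lines \ s           // last line had no trailing line feed
    st_local("nlines", strofreal(rows(lines)))
end

* Rebuild the dataset: one observation per line of the file
clear
set obs `nlines'
gen long line = _n
gen strL code = ""
mata: st_sstore(., "code", lines)
```

Each substr() call copies the remaining text, so the loop slows down on very large files. For that case, Mata's cat() is a swapped-in alternative worth considering. It reads a file directly into a column vector with one line per element, skipping fileread() altogether.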

      Thank you for your help!
      Pablo



      • #4
        Pablo, you can store a whole dataset (not just a text file) in a single cell, as long as its size is below the 2GB limit for strL values. However, this is rarely useful for processing; it is more useful for embedding data in the file for storage and transfer (so that no dependent files get lost).

        Now you are writing that you want to display the contents of the string on several lines.
        I need to see the information of each local in a different line, while they are contained in one single scalar.
        Unfortunately, Stata's display command will not let you do that easily (Mata displays the control codes natively), but you can use a formatting trick:

        Code:
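        local a "alpha"
        local b "beta"
        local c "gamma"
        
        * line feed, char(10), followed by carriage return, char(13)
        local crlf "`=char(10)'`=char(13)'"
        local combo "`a'`crlf'`b'`crlf'`c'"
        
        display "`combo'"                  /* one: Stata display */
        mata st_local("combo")             /* two: Mata echoes the string */
        mata printf(st_local("combo"))     /* three: Mata printf() */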
        Characters with ASCII codes 10 and 13 are the archaic teletype "Line feed" and "Carriage return" commands.

        Best, Sergiy Radyakin



        • #5
          I'm not sure whether the older regex functions work with the newline metacharacter, but I recently started working on a regex package that will use the regular-expression capabilities available in Java. I still need to work out how to organize things, but I want to set them up similarly to Stata's native regex functions (e.g., regexm, regexr, etc.), and Java's implementation of regular expressions seems to be much closer to the Perl and/or POSIX implementations (e.g., support for curly braces, metacharacters, character classes, etc.). Otherwise, your code will likely need to leverage the regexr, regexm, and/or regexs functionality.
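Once each line is its own observation, here is a sketch of the regexm()/regexs() route available in Stata 13. It flags lines that start with local and captures the macro name. The example lines, the pattern, and the variable names is_local and macname are made up for illustration. If regexm() does not accept strL in your version, a str# variable (up to str2045 in Stata 13) can hold the lines instead.

```stata
* Sketch: line-by-line pattern matching with regexm() and regexs().
* The three lines below are made-up examples, not real file content.
clear
input str80 code
"local a 1"
"  local total = 2"
"display 42"
end

* 1 if the line starts (after optional spaces) with the word local
gen byte is_local = regexm(code, "^[ ]*local[ ]+")

* Capture the macro name; regexs(1) returns the first parenthesized group
* of the regexm() evaluated in the if condition for the same observation
gen str32 macname = regexs(1) if regexm(code, "^[ ]*local[ ]+([A-Za-z_][A-Za-z0-9_]*)")

list, clean
```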



          • #6
            Hi Sergiy,
            Thank you for your answer. Your solution is more than appropriate. In fact, I just tested it with scalars and it works perfectly!!! Thank you so much. I just replaced your last line with
            Code:
            scalar combo = "`a'`crlf'`b'`crlf'`c'"
            disp combo
            alpha
            beta
            gamma
            Wbuchanan, I think it would be great if you could do something like that. It would be extremely useful. Please keep us posted!!

            Thank you!
            Pablo

