  • Paragraph as special character in string variables

    I import data from Excel, where some cells contain paragraph breaks (created with Left Alt + Enter in Excel), as shown in picture 1. The import seems to run normally, but the "paragraph character" is obviously not displayed (see picture 2). I tried -dataex-, hoping it would capture this character, but it failed: -dataex- does produce output to copy (code below), but that code does not work.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str7 A str4 S double a
    "aaa
    bbb" "A100" 7
    "
    "       ""     1
    end
    To my knowledge, Stata does recognize the "hidden" character representing a paragraph break, since the length of the string variable includes it. However, I cannot edit or modify that string variable in code.

    I would be very grateful if anyone could show me how to type or capture (in code) this "hidden paragraph character".
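
    A possible workaround for the -dataex- problem, as a minimal sketch: the listing above presumably fails because -input- reads the data one line at a time, so the line break inside the quoted string ends the observation early. The sketch rebuilds the same two observations with char(10) instead of a literal line break, assuming (as suggested in #2) that the hidden character is a line feed, ASCII code 10.

    ```stata
    * Rebuild the -dataex- example without literal line breaks.
    * Assumption: the Excel "paragraph" is a line feed, char(10).
    clear
    set obs 2
    generate str7 A = "aaa" + char(10) + "bbb" in 1
    replace A = char(10) in 2
    generate str4 S = "A100" in 1
    generate double a = cond(_n == 1, 7, 1)

    * The hidden character is counted by strlen(): 3 + 1 + 3 = 7
    display strlen(A[1])

    * Its position can be found like any other character (here 4)
    display strpos(A[1], char(10))
    ```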

    [Attached images: Capture2.PNG, Capture.PNG]


  • #2
    The character is probably a newline (Unicode LINE FEED (LF), U+000A). The font in the Data Browser does not show this character. Try using the -display- command:
    Code:
    display A[1]
    Last edited by Bjarte Aagnes; 08 Jun 2021, 11:55.
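
    Because the Data Browser does not show the line feed, it may help to flag the affected observations in a separate variable. A minimal sketch, again assuming the character is char(10) (LF); the variable name has_lf is only illustrative:

    ```stata
    * Rebuild two observations that contain a line feed, char(10)
    clear
    set obs 2
    generate str7 A = "aaa" + char(10) + "bbb" in 1
    replace A = char(10) in 2

    * 1 if A contains a line feed, 0 otherwise (visible in the Data Browser)
    generate byte has_lf = strpos(A, char(10)) > 0
    count if has_lf

    * -display- renders the line feed, so "aaa" and "bbb" print on separate lines
    display A[1]
    ```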



    • #3
      Many thanks, Bjarte. The -display- command does show the content of the "cell" properly, i.e. aaa, a newline, then bbb. But my problem is that I want to change it to aaa, a newline, then ccc, and I do not know how to do it. Could you please advise?
      [Attached image: a.PNG]
      Last edited by Diana Yoko; 08 Jun 2021, 17:04.



      • #4
        Code:
        gen wanted = ustrregexra(A,"\x0A", `""\\x0A""')
        Last edited by Bjarte Aagnes; 09 Jun 2021, 02:00.
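
        To get from aaa + newline + bbb to aaa + newline + ccc, as asked in #3, the line feed can be written in code as char(10) or, inside a regular expression, as \n (or \x0A, as in the code above). A minimal sketch, assuming the cells contain a bare line feed; if a carriage return, char(13), also turned up, it would need the same treatment. The variable names B and C are hypothetical:

        ```stata
        * Rebuild the first observation: "aaa", a line feed, "bbb"
        clear
        set obs 1
        generate str7 A = "aaa" + char(10) + "bbb"

        * Option 1: plain substring replacement, char(10) stands for the line feed
        generate B = subinstr(A, char(10) + "bbb", char(10) + "ccc", .)

        * Option 2: regular expression; in ICU syntax "\n" matches a line feed
        generate C = ustrregexra(A, "\nbbb", char(10) + "ccc")

        * Prints aaa and ccc on two lines
        display B[1]
        ```

        To overwrite the original variable instead, the same expression can be used with -replace A = ...-.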



        • #5
          Bjarte Aagnes, thanks. Could you please explain further? I do not understand the code in #4: when I run it, the output looks strange.



          • #6
            Code:
            * see
            
            help ustrregexra() // the code in #4 replaces substrings that match "\x0A" (LF) with `""\\x0A""', i.e. the visible text "\x0A" (quotes included)
            
            * in \\ the first backslash character is an escape character https://www.stata.com/support/faqs/programming/backslashes-and-macros/
            
            help quotes // compound double quotes (`" some string with quotes "')
            
            * \xhh    Match the character with 2 digit hex value hh. https://unicode-org.github.io/icu/userguide/strings/regexp.html
            
            * Unicode Character 'LINE FEED (LF)' (U+000A) https://www.fileformat.info/info/unicode/char/000a/index.htm
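
            As a check on the explanation above: the regex escape \x0A and the function char(10) refer to the same character, so a regex replacement and a plain subinstr() replacement should give the same result. A minimal sketch; the marker "|" and the names v1 and v2 are arbitrary choices for illustration:

            ```stata
            * Rebuild the first observation: "aaa", a line feed, "bbb"
            clear
            set obs 1
            generate str7 A = "aaa" + char(10) + "bbb"

            * The regex escape \x0A (hex 0A) and char(10) are the same character,
            * so both calls replace the line feed with a visible "|"
            generate v1 = ustrregexra(A, "\x0A", "|")
            generate v2 = subinstr(A, char(10), "|", .)

            * Both print aaa|bbb
            display v1[1]
            display v2[1]
            ```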
