  • How to replace unicode characters in a string variable

    Hello!

    I have a string variable that should contain several Unicode characters. An example is:

    Verantwoord, hygi<U+0091>nisch, vers en lekker, <U+008F>n met liefde bereid! U krijgt de beste ingredi<U+0091>nten, liefst uit eigen tuin.
    It seems that characters like ë and é have been replaced by codes such as <U+0091> and <U+008F>.

    Is there a way of solving this problem?

    Thank you in advance!

  • #2
    This is not a Unicode string but a string that contains Unicode escape sequences. Also, the hex numbers appear to be wrong. Here are the correct codes:

    Code:
    . dis ustrtohex("ëé")
    \u00eb\u00e9
    The simplest thing to do is to manually replace each escape sequence with the expected character. Something like:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str200 s
    "Verantwoord, hygi<U+0091>nisch, vers en lekker, <U+008F>n met liefde bereid! U krijgt de beste ingredi<U+0091>nten, liefst uit eigen tuin."
    end
    
    clonevar s2 = s
    replace s2 = usubinstr(s2,"<U+0091>","ë",.)
    replace s2 = usubinstr(s2,"<U+008F>","é",.)
    list s2
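
    If more than a couple of codes need replacing, the same idea can be driven from two parallel lists instead of one -replace- per code. This is only a minimal sketch: it assumes the two mappings inferred from the example above (other codes in the real data would have to be identified by hand), and the macro names from, to, f, t and the variable s3 are purely illustrative.

    ```stata
    * Sketch only: the mapping is inferred from the single example in this thread
    * and should be checked against the actual data before use.
    clear
    input str200 s
    "Verantwoord, hygi<U+0091>nisch, vers en lekker, <U+008F>n met liefde bereid!"
    end

    * Parallel lists: the i-th code in `from' is replaced by the i-th character in `to'.
    local from `" "<U+0091>" "<U+008F>" "'
    local to   `" "ë" "é" "'

    clonevar s3 = s
    local n : word count `from'
    forvalues i = 1/`n' {
        local f : word `i' of `from'
        local t : word `i' of `to'
        * usubinstr() with . as the last argument replaces all occurrences
        replace s3 = usubinstr(s3, `"`f'"', `"`t'"', .)
    }
    list s3
    ```

    Adding another code then only means appending one entry to each list, at the same position in both.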


  • #3
    Thank you Robert!