  • Problem with curly quotation marks when doing a merge.

    Hello,

    I am using Stata 18 and attempting to merge on string variables. However, some of the string values in my master data contain curly quotation marks such as ” or ’. I cannot replace them directly in my do-file: when I paste ” into the editor, it is converted to a straight " even though I copied the ” directly.

    I have tried several solutions, but none of them has worked. Here are some of my attempts:

    First solution:

    Code:
    replace language_name = subinstr(language_name, "\u201D", `"\""', .)

    Second solution:

    Code:
    local quote "\""
    replace language_name = subinstr(language_name, "\u201D", "`quote'", .)
    These attempts either raised an invalid syntax error or had no effect. Is there an alternative way to deal with this issue?

    Thank you.
    Last edited by Diego Malo; 05 May 2024, 03:10.

  • #2
    I would try

    replace language_name = subinstr(language_name, XXX, YYY, .)

    where XXX and YYY are replaced by calls to uchar() or ustrunescape(). chartab from SSC can help mightily here.
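
    A minimal sketch of what those two calls could look like, assuming the variable is language_name as in #1 and that the offending character is the right double quotation mark (U+201D). uchar() takes a decimal code point, while ustrunescape() takes a \uXXXX hexadecimal escape; which character is really in the data is something only chartab or a similar inspection can confirm.

    ```stata
    * Sketch only: language_name is the variable from post #1.
    * Both lines should display the same character, the right double quotation mark.
    display uchar(8221)
    display ustrunescape("\u201d")

    * Replace every curly closing double quote with the plain ASCII one (decimal 34).
    replace language_name = subinstr(language_name, ustrunescape("\u201d"), uchar(34), .)
    ```

    Building the characters with functions avoids typing or pasting the curly quote into the do-file, which is where the original attempts went wrong.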

    • #3
      Thanks a lot, Nick Cox.

      I was able to solve the issue with ” using the following code:

      Code:
      replace language_name = subinstr(language_name, uchar(8221), uchar(34), .)
      However, I am still having issues with ’. I am using the following code:

      Code:
      replace language_name = subinstr(language_name, uchar(2019), uchar(27), .)
      However, it does not have any effect. Do you know why?

      • #4
        That looks like uchar(39) to me. If it's something else, then as said chartab from SSC will tell you what you have.
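
        A likely reason the attempt in #3 had no effect: 2019 and 27 are the hexadecimal values of the two characters, whereas uchar() expects decimal code points. The sketch below does the conversion by hand and then uses the decimal values; it assumes the character in the data really is U+2019, which chartab can confirm.

        ```stata
        * Hex 2019 converted to decimal: displays 8217.
        display 2*16^3 + 0*16^2 + 1*16 + 9

        * Hex 27 (the ASCII apostrophe) converted to decimal: displays 39.
        display 2*16 + 7

        * Replace the right single quotation mark with the ASCII apostrophe.
        replace language_name = subinstr(language_name, uchar(8217), uchar(39), .)
        ```

        If the hexadecimal form is more convenient, ustrunescape("\u2019") should yield the same character as uchar(8217).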

        • #5
          Thanks a lot for the help. It works! Here is the code I used, in case it is helpful to anyone else:

          Code:
          ssc install chartab
          
          chartab language_name, noascii
          
          
             decimal  hexadecimal   character |     frequency    unique name
          ------------------------------------+---------------------------------------------
               8,217       \u2019       ’     |           207    RIGHT SINGLE QUOTATION MARK
          
          
          * 8217 is decimal for \u2019; 39 is decimal for the ASCII apostrophe (hex 27)
          replace language_name = subinstr(language_name, uchar(8217), uchar(39), .)
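
          As a possible follow-up check before the merge, one could confirm that no curly quotes remain. This is only a sketch: it assumes the two characters discussed in this thread (U+2019 and U+201D) are the only ones of concern, and the same cleaning would need to be applied to the key variable in the using dataset if it has the same problem.

          ```stata
          * Count observations that still contain either curly quote.
          count if strpos(language_name, uchar(8217)) | strpos(language_name, uchar(8221))

          * Stop with an error if any are left.
          assert r(N) == 0

          * Re-run chartab to see whether any other non-ASCII characters remain.
          chartab language_name, noascii
          ```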
