
  • Trouble merging datasets with variables containing special (Portuguese) characters

    I am using Stata 17 and am having trouble merging datasets on variables that contain Portuguese characters. Both datasets appear to have been translated to Unicode already (although one of them is much older). When I try to merge, values with special characters (such as â) don't get matched.

    Another clue: if I tabulate the variable, the value appears as it should in the output, for example "luziânia".
    Yet in the log file it looks like this: "luzia^nia".

    Can anyone point me in the right direction?


  • #2
    HTML Code:
    https://www.stata.com/statalist/archive/2010-11/msg00100.html
    https://www.statalist.org/forums/forum/general-stata-discussion/general/1413550-remove-special-characters-from-string


    • #3
      I usually run the following to remove such characters and standardize names:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str10 name
      "luziânia"
      end
      
      replace name = ustrto(ustrnormalize(name, "nfd"), "ascii", 2)
      Result:
      Code:
      . l
           +----------+
           |     name |
           |----------|
        1. | luziania |
           +----------+
      
      .
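
For the merge itself, the same expression could be applied to the key in both files before merging. The sketch below assumes the mismatch comes from one file storing â as a single precomposed character and the other storing it as a plus a combining circumflex. That would be consistent with the log showing "luzia^nia", but it is not confirmed in this thread. The toy data, the variable names key and name_using, and the tempfile name are made up for illustration.

```stata
* Hypothetical master file: names forced into decomposed (NFD) form
clear
input str30 name
"luziânia"
"goiânia"
end
replace name = ustrnormalize(name, "nfd")
* Accent-free, lowercase key (same expression as above, plus ustrlower)
generate key = ustrlower(ustrto(ustrnormalize(name, "nfd"), "ascii", 2))
tempfile master
save `master'

* Hypothetical using file: same names forced into precomposed (NFC) form
clear
input str30 name
"luziânia"
"goiânia"
end
replace name = ustrnormalize(name, "nfc")
generate key = ustrlower(ustrto(ustrnormalize(name, "nfd"), "ascii", 2))
rename name name_using

* Merge on the cleaned key rather than on the raw name
merge 1:1 key using `master'
tabulate _merge
```

Stripping accents can make distinct names collide, so it may be worth running isid key in each file before the merge. If the accents need to be kept, normalizing both sides with ustrnormalize(name, "nfc") and merging on that is another option under the same assumption.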
