  • Working with Khmer texts in Stata 14

    The Khmer script is covered by Unicode: https://en.wikipedia.org/wiki/Khmer_alphabet#Unicode

    Stata 14.1, however, displays Khmer text differently from what was entered:
    [Screenshot: khmer.png]


    The text looks similar, but it apparently contains some characters that are not meant to be displayed on their own but to act as modifiers of neighboring characters (to the extent that I understand Khmer).
    The text was copied from Notepad, where it is displayed correctly. That gives me some confidence that all relevant fonts and extensions are set up correctly on my workstation, and that the problem occurs only in Stata, as far as I can see.
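The modifier characters mentioned above can be picked out by their Unicode general category. The following is a minimal Python sketch, not anything Stata does internally; the sample string (the word "Khmer" in Khmer script) and the helper name `combining_marks` are assumptions made for illustration and are not the text from the attachment.

```python
import unicodedata

# Illustrative sample only: the word "Khmer" in Khmer script,
# written with escapes so the source stays ASCII.
SAMPLE = "\u1781\u17d2\u1798\u17c2\u179a"


def combining_marks(text):
    """Return (code point, Unicode name) for every character whose
    general category is a mark (Mn, Mc, or Me)."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if unicodedata.category(ch).startswith("M")
    ]


if __name__ == "__main__":
    print(len(SAMPLE), "code points in the sample")
    for code, name in combining_marks(SAMPLE):
        print(code, name)
```

Characters reported by this helper are the ones a text-shaping engine is expected to attach to a neighboring base character rather than draw in a cell of their own.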

    The problem occurs with direct text input and also when the data are transferred byte by byte directly into a Stata file, which suggests that the data are entered and processed correctly and that this is solely a rendering problem. Here is what else is known:
    • the rendering problem occurs in the output window following the display and list commands.
    • when the data are exported and viewed with external software, the Khmer text is again displayed correctly.
    • when text is copied from the output window to the clipboard and pasted into the command window, it renders correctly again.
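One way to back up the observations above is to compare the stored bytes with the original text outside Stata. This is a small Python sketch under the assumption that both sides are UTF-8; the sample string and the helper names `utf8_hex` and `same_code_points` are hypothetical and only illustrate the check.

```python
# Illustrative sample only: the word "Khmer" in Khmer script.
SAMPLE = "\u1781\u17d2\u1798\u17c2\u179a"


def utf8_hex(text):
    """Hex dump of the UTF-8 encoding, handy for eyeballing a byte-level copy."""
    return text.encode("utf-8").hex()


def same_code_points(original: bytes, exported: bytes) -> bool:
    """True if both byte strings decode to the same code point sequence.
    Raises UnicodeDecodeError if either side is not valid UTF-8."""
    return original.decode("utf-8") == exported.decode("utf-8")


if __name__ == "__main__":
    raw = SAMPLE.encode("utf-8")
    print(utf8_hex(SAMPLE))
    print(same_code_points(raw, bytes(raw)))
```

If the code points match after a round trip through the Stata file, the stored data are intact and any difference on screen comes from rendering.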
    I didn't find any limitations in the Stata manual with regard to the handling of Khmer characters.
    If there are any particular steps for configuring Stata to work with the Khmer language (and render it exactly as Notepad does), please advise.
    I have tried switching between all available fonts (Courier New, Consolas, Lucida Console, and others). The Script setting is "Western", as no other setting makes more sense.

    If there is a Khmer-knowledgeable reader in the forum, could you please indicate whether the text shown on the black background is still readable in Khmer, or whether it is nonsense and only the text shown on the white background is readable?

    An example text is attached to this message.

    Thank you, Sergiy Radyakin

  • #2
    Stata's Results window is not designed for "complex" scripts, that is, scripts that require complex text layout. These include the Arabic alphabet and most of the Brahmic family of scripts (to which I believe Khmer belongs). See

    https://en.wikipedia.org/wiki/Complex_text_layout

    Stata's Results window uses a "simple" text layout, which requires that text run left to right and that characters be fixed width. East Asian characters (Japanese, Chinese, etc.) are fixed width, just wider than Latin characters, and thus they work. Scripts that require varying-width characters will not display well in a window based on a fixed-width font.
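The difference can be illustrated with a toy model of a fixed-width grid. The Python sketch below is not Stata's actual layout code; it assumes a simplified rule (two cells for wide East Asian characters, none for non-spacing marks, one for everything else), and the function name `naive_cell_width` is hypothetical.

```python
import unicodedata


def naive_cell_width(text):
    """Count cells in a simplified fixed-width grid model:
    2 for wide/fullwidth characters, 0 for non-spacing or enclosing marks,
    1 for everything else."""
    cells = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            continue  # mark that takes no cell of its own
        cells += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return cells


if __name__ == "__main__":
    for label, text in [
        ("Latin", "Stata"),
        ("Japanese", "\u65e5\u672c"),
        ("Khmer", "\u1781\u17d2\u1798\u17c2\u179a"),
    ]:
        print(label, naive_cell_width(text))
```

For Latin and Japanese text, a per-character cell count of this kind is enough to place each glyph. For Khmer it only yields a number: it says nothing about which characters should be stacked under or reordered around their base consonant, which is the work a complex text layout engine performs.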



    • #3
      Dear Hua Peng,

      thank you very much for looking into the issue.
      If it affects only rendering in the output window, then it's nothing serious, as long as I/O, processing, and reporting happen correctly.

      Best regards, Sergiy



      • #4
        Yes, it should be a display only issue.
