Working with Khmer texts in Stata 14

Sergiy Radyakin

#1

18 Jan 2017, 03:58

The Khmer script is covered by Unicode: https://en.wikipedia.org/wiki/Khmer_alphabet#Unicode

However, Stata 14.1 displays Khmer text differently from how it was entered.

The text looks similar, but it apparently contains some characters that are not meant to be displayed on their own; instead, they act as modifiers of neighboring characters (to the extent that I understand Khmer).
The text was copied from Notepad, where it displays correctly. This gives me some confidence that all relevant fonts and extensions are set up correctly on my workstation; as far as I can see, the problem occurs only in Stata.
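To see which characters in a string are such modifiers, one can list the Unicode general category of each code point. The sketch below does this outside Stata with Python's standard `unicodedata` module. The sample word ខ្មែរ ("Khmer") is an assumption chosen for illustration; it is not the attached example file. Categories starting with "M" are combining marks.

```python
import unicodedata

# The word "Khmer" in Khmer script, written with escapes so the example
# does not depend on the font of the editor it is viewed in.
text = "\u1781\u17d2\u1798\u17c2\u179a"

def describe(s):
    """Return (code point, general category, Unicode name) for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.category(c), unicodedata.name(c)) for c in s]

for cp, cat, name in describe(text):
    # "Lo" = letter; "Mn"/"Mc" = marks that modify a neighboring letter
    # instead of standing on their own.
    print(cp, cat, name)

marks = [c for c in text if unicodedata.category(c).startswith("M")]
print(len(text), "code points,", len(marks), "of them marks")
```

Two of the five code points are marks: KHMER SIGN COENG (U+17D2), which makes the following consonant render as a subscript under the preceding one, and KHMER VOWEL SIGN AE (U+17C2), which is stored after its consonant but drawn to the left of the consonant cluster. A renderer that simply draws code points left to right in stored order will therefore show something different from Notepad.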

The problem occurs both with direct text input and with byte-by-byte transfer of data directly into a Stata file, which suggests that the data are entered and processed correctly and that this is solely a rendering problem. Here is what else is known:
- The rendering problem occurs in the output window, after the display and list commands.
- When the data are exported and viewed with external software, the Khmer texts are again displayed correctly.
- When text is copied from the output window to the clipboard and pasted into the Command window, it again renders correctly.
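The observation that the bytes survive intact can be illustrated outside Stata with a UTF-8 round trip (Stata 14 stores strings as UTF-8). This is a minimal Python sketch; the sample word is an assumption, not the attached file. If encoding and decoding return exactly the same code points, any difference on screen must come from rendering rather than from the stored data.

```python
# Sample Khmer word "Khmer" (assumption for illustration), written with escapes.
text = "\u1781\u17d2\u1798\u17c2\u179a"

# Encode to UTF-8; each code point in the Khmer block U+1780..U+17FF
# takes 3 bytes.
raw = text.encode("utf-8")
print(len(raw), "bytes:", raw.hex(" "))

# Decode the bytes back: the code points are identical, so no
# information is lost between storage and display.
restored = raw.decode("utf-8")
print("round trip OK:", restored == text)
```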

I didn't find any limitations in the Stata manual with regard to the handling of Khmer characters.
If there are particular steps for configuring Stata to work with Khmer (and render it exactly as Notepad does), please advise.
I have tried all available fonts (Courier New, Consolas, Lucida Console, and others). The Script setting is "Western", as no other setting makes more sense.

If there is a Khmer-knowledgeable reader in the forum, could you please indicate whether the text shown on the black background is still readable in Khmer, or whether it is nonsense and only the text shown on the white background is readable?

An example text is attached to this message.

Thank you, Sergiy Radyakin

Attached file: khmer_text.txt (78 bytes)
Tags: Khmer, unicode
Hua Peng (StataCorp)

StataCorp Employee

#2

18 Jan 2017, 09:30

Stata's Results window is not designed for scripts that are considered "complex", that is, scripts that require complex text layout. These include the Arabic alphabet and most of the Brahmic family of scripts (which I believe includes Khmer). See

https://en.wikipedia.org/wiki/Complex_text_layout

Stata's Results window uses a "simple" text layout, which requires that text run left to right and that characters be fixed-width. Characters of East Asian languages (Japanese, Chinese, etc.) are fixed-width, just wider than Latin characters, so they work. Scripts that require variable-width characters will not display well in a window based on a fixed-width font.
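A rough illustration of why a fixed-width grid works for Latin and CJK text but not for Khmer: the Python sketch below assigns each character a number of cells using a hypothetical rule in the spirit of terminal `wcwidth` logic (wide East Asian characters get 2 cells, non-spacing marks 0, everything else 1) and then places characters strictly in stored order. This is an assumption for illustration only, not Stata's actual implementation.

```python
import unicodedata

def cell_width(ch):
    """Cells a character occupies in a simple fixed-width grid (hypothetical rule)."""
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0  # non-spacing/enclosing marks: no cell of their own
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide East Asian characters: exactly two cells
    return 1      # everything else: one cell

def simple_layout(s):
    """Place characters strictly left to right, in stored order."""
    return [(ch, cell_width(ch)) for ch in s]

latin = "Stata"
cjk = "\u4e2d\u6587"                      # Chinese "zhongwen"
khmer = "\u1781\u17d2\u1798\u17c2\u179a"  # the word "Khmer" in Khmer script

for label, s in [("Latin", latin), ("CJK", cjk), ("Khmer", khmer)]:
    print(label, sum(w for _, w in simple_layout(s)), "cells")
```

For the Latin and CJK strings, the cell widths fully describe how the text should look. For the Khmer word, the grid still places the vowel sign AE after the consonant MO and keeps the subscript consonant on the baseline, whereas correct rendering draws AE to the left of the cluster and stacks MO under KHA. A simple left-to-right cell layout has no way to express that reordering and stacking.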
Sergiy Radyakin

#3

18 Jan 2017, 10:30

Dear Hua Peng,

thank you very much for looking into the issue.
If it is only a rendering issue in the output window, then it's nothing serious, as long as I/O, processing, and reporting happen correctly.

Best regards, Sergiy
Hua Peng (StataCorp)

StataCorp Employee

#4

18 Jan 2017, 10:55

Yes, it should be a display-only issue.