Converting ASCII special characters to UTF-8

Bruce Arnold

Join Date: May 2024

Posts: 3
#1


02 May 2024, 04:33

Today I installed Stata 18BE on both a Windows 10 and a Windows 11 machine. I have a number of .do files from Stata 12 (pre-Covid, long story) that produce graphs suitable for inclusion in lecture slides and manuscripts.

In my old .do files, I have a number of instances of the following line:
note("Figure `chnumber' `=char(150)' `gnumber'", ...)
This labels my graphs "Figure 3–1", "Figure 3–2", and so forth.

When I run the .do files in Stata 18, my graphs read "Figure 3?1", "Figure 3?2", and so forth.

I understand that this is probably a consequence of Stata's switch to UTF-8. But I've spent hours searching for a way to fix my code, to no avail. I tried the unicode analyze command on an old .do file, only to learn that my "File does not need translation". All I need is a way to embed an en dash in my note.

Apologies for cluttering up the forum with such a trivial issue, but I'm at a loss. Thank you to anyone who can help.
Joseph Coveney

Join Date: Apr 2014

Posts: 4423
#2

02 May 2024, 06:05

Originally posted by Bruce Arnold

All I need is a way to embed an en dash in my note.

Try something like the following.

Code:

note("Figure `chnumber'`=uchar(8211)'`gnumber'", ...)
Bruce Arnold

Join Date: May 2024

Posts: 3
#3

03 May 2024, 00:57

Dear Mr(?) Coveney,

I am grateful for your quick response. I see from your profile that you seem to have form in helping people out!

uchar(8211) worked a treat. From what I thought was a definitive table of UTF-8 characters, I was working under the misapprehension that uchar(2013) was associated with an en dash, and uchar(2014) with an em dash. It would have taken quite some time for me to work up to uchar(8211). Obviously I didn't find the table that is, in fact, definitive. If you have the time and inclination (and again, you've already been most generous with your time and knowledge), would you be so kind as to send me a link to YOUR definitive table of UTF-8 codes?

I also noted that Stata 18 doesn't give me perfect alignment when using SMCL with titles, axis labels, etc. For example, a graph using my old code has a sigma (mu too) slightly lower than a preceding numeral. If there is a fix, or if Stata 18 uses a new variation on old SMCL of which I should be aware, I'd appreciate a point in the right direction.

Again, thank you very much.

Best regards.
Joseph Coveney

Join Date: Apr 2014

Posts: 4423
#4

03 May 2024, 17:30

Originally posted by Bruce Arnold

. . . would you be so kind as to send me a link to YOUR definitive table of UTF-8 codes?

I just Googled en dash Unicode and took the first hit that looked helpful. I didn't make a note of its URL, sorry.

Your problem is that what you tried to use is the hex value ("U+2013"). uchar() takes the decimal value instead.
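
A quick way to see the hex/decimal relationship, as a sketch meant to be run from a do-file in Stata 14 or later. The literal numerals "3" and "1" are only placeholders for the chapter and graph numbers, and inbase() is used here purely to show the conversion.

```stata
* U+2013 is a hexadecimal code point; uchar() expects its decimal equivalent
display inbase(16, 8211)          // decimal 8211 written in base 16
display "3" + uchar(8211) + "1"   // en dash (U+2013) between the numerals
display "3" + uchar(8212) + "1"   // em dash (U+2014) for comparison
```

The first line should show 2013, which confirms that 8211 is the decimal form of U+2013. By the same arithmetic, uchar(2013) asks for the unrelated code point U+07DD.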

For the hex value you'd use the ustrunescape() function, e.g., ustrunescape("\u2013"), but that can get a little tricky because it can give rise to nested double-quotation marks in your code.
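
One way to sidestep the nested-quotation issue, sketched under the assumption that the chapter and graph numbers live in the locals chnumber and gnumber as in the original post: evaluate ustrunescape() once, store the result in a local (called endash here, a name chosen only for illustration), and refer to that local inside the note. The auto dataset and the scatterplot are stand-ins for the real graph.

```stata
* Placeholder values; in the real do-file these locals are set elsewhere
local chnumber 3
local gnumber 1

* Evaluate the escape sequence once, outside any quoted graph option
local endash = ustrunescape("\u2013")

* Example data shipped with Stata, used only so the sketch runs as is
sysuse auto, clear
scatter mpg weight, note("Figure `chnumber'`endash'`gnumber'")
```

Because the local already holds the en dash when the note is parsed, the note() option contains only one pair of double quotes.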

For extended ANSI characters in do-files, I find it easiest to type in the character. For the en dash, it's Alt+0150, that is, holding down the Alt key and typing the ANSI code point on the numeric keypad. If I forget the code point, which happens often with less frequently used characters, then I just use the Windows Character Map application and copy-paste the character into the do-file. (This keyboard-entry tactic works even at the command line.)

I haven't noticed any alignment problems with SMCL directives in graph text and so I cannot help you there.
Bruce Arnold

Join Date: May 2024

Posts: 3
#5

03 May 2024, 21:04

Once again, thank you for your help.

I recall having tried inserting a special character into my .do files, but it must not have been all that successful, hence my reverting to =char(). I'll give it another go. And thanks for the tips on the ustrunescape() function. I'll give it a try.

Thanks again, and cheers.