Visually indistinguishable variable name

Zhang_Lu

Join Date: Oct 2014

Posts: 155
#1

15 May 2018, 11:25

My task is to append many separate .dta files to create a panel dataset. For convenience, I used the Chinese characters in the first row as the variable names. What drives me mad is that, after appending, many visually indistinguishable variable names coexist in the panel dataset.

Most likely there are some invisible spaces around one of the variable names in each pair. But apparently the --trim-- command did not work in my case. This thread https://www.stata.com/statalist/arch.../msg00891.html gave me some intuition, but I cannot quite figure out how to read (and use) the results from the --charlist-- command.
Finally, a data sample is attached, and I hope someone can check it and give me some advice. The data sample contains a firm id and a year id, plus three pairs of visually indistinguishable variable names. I have also attached several of the .dta files from before the append. My Stata version is Stata MP 14.
https://www.dropbox.com/s/0pqef9hgta...ample.rar?dl=0
Nick Cox

Join Date: Mar 2014

Posts: 36058
#2

15 May 2018, 11:33

You're citing a post of mine from 2005 and it's flattering to think that it might still be useful. The sad facts are:

1. The function trim() should be irrelevant to variable names. I don't think that spaces can occur in variable names.

2. charlist (SSC) is documented for Stata 9 and up. So, it knows nothing about Unicode except by accident. I haven't tried to update it. I am sadly ignorant about most languages except English and a few others and that doesn't let me revise or test the program on (e.g.) anything Chinese, Japanese, Korean or Vietnamese. In any case charlist tells you about characters in a string variable. It doesn't apply directly to variable names, although that isn't the main problem.

I'd go to StataCorp on this if you don't get a good answer. They have Chinese speakers on the strength.
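
To illustrate the Unicode point above, here is a minimal, language-neutral sketch in Python (not Stata) of how two names can look the same on screen yet differ in their underlying code points. The names `a` and `b` and the helper `describe` are hypothetical and made up for illustration. The actual cause in the attached data is not known and may well be something else, such as full-width versus half-width characters.

```python
import unicodedata


def describe(name):
    """Return one (code point, Unicode character name) pair per character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in name]


# Hypothetical pair of names that render alike. "\u5e74\u4efd" is the two
# Chinese characters for "year"; the second name carries a trailing
# zero-width space (U+200B), which is invisible when displayed.
a = "\u5e74\u4efd"
b = "\u5e74\u4efd\u200b"

print(a == b)            # the two names are different strings
print(b.strip() == b)    # an ordinary trim leaves U+200B in place

# Listing the code points makes the hidden difference visible.
for name in (a, b):
    print(len(name), describe(name))

# Another possible source of look-alikes: full-width Latin letters.
# NFKC normalization folds them to their ordinary ASCII forms.
fullwidth_id = "\uff29\uff24"
print(unicodedata.normalize("NFKC", fullwidth_id) == "ID")
```

The same idea carries over to any tool that can list the code points of a name: once the extra or substituted character is identified, the offending variable can be renamed before appending.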