  • Visually indistinguishable variable names

    My task is to append many separate .dta files to create a panel dataset. For convenience, I used the Chinese characters in the first row as variable names. What frustrates me is that, after appending, many visually indistinguishable variable names coexist in the panel dataset.

    Most likely there are some invisible spaces around one of the variable names, but apparently the --trim-- command did not work in my case. This thread https://www.stata.com/statalist/arch.../msg00891.html gave me some intuition, but I cannot quite figure out how to read (and use) the results from the --charlist-- command.
    Finally, I have attached a data sample in the hope that someone can check it and give me some advice. The sample contains a firm id and a year id, plus three pairs of visually indistinguishable variable names. I have also attached several of the .dta files as they were before appending. My Stata version is Stata/MP 14.
    https://www.dropbox.com/s/0pqef9hgta...ample.rar?dl=0

  • #2
    You're citing a post of mine from 2005, and it's flattering to think that it might still be useful. The sad facts are:

    1. The function trim() should be irrelevant to variable names. I don't think that spaces can occur in variable names.

    2. charlist (SSC) is documented as being for Stata 9 and up, so it knows nothing about Unicode except by accident. I haven't tried to update it. I am sadly ignorant of most languages except English and a few others, and that doesn't let me revise or test the program on (e.g.) anything Chinese, Japanese, Korean or Vietnamese. In any case, charlist tells you about characters in a string variable. It doesn't apply directly to variable names, although that isn't the main problem.
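    As a rough sketch outside Stata, and not something suggested in the thread: if the variable names are copied into a Python list (for example from the output of `describe`), the standard `unicodedata` module can show each character's code point and Unicode name, which is the kind of information charlist reports for string variables. The helper names `describe_name`, `visual_key` and `find_lookalikes` are hypothetical, and treating two names as lookalikes when they match after NFKC normalization and removal of format (Cf) and whitespace characters is an assumption. It catches zero-width spaces, full-width Latin letters, ideographic spaces and some CJK compatibility ideographs, but not every possible confusable character.

```python
import unicodedata
from collections import defaultdict


def describe_name(name):
    """List (character, code point, Unicode name) for every character in a name."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in name]


def visual_key(name):
    """Assumed 'looks the same' key: NFKC-normalize, then drop format
    characters (category Cf, e.g. zero-width space) and whitespace."""
    normalized = unicodedata.normalize("NFKC", name)
    return "".join(
        ch for ch in normalized
        if unicodedata.category(ch) != "Cf" and not ch.isspace()
    )


def find_lookalikes(names):
    """Group names sharing a visual key; keep only groups with 2+ members."""
    groups = defaultdict(list)
    for name in names:
        groups[visual_key(name)].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    # Hypothetical names: one with a trailing zero-width space,
    # one written in full-width Latin letters.
    names = ["年份", "年份\u200b", "ｉｄ", "id", "企业"]
    for key, members in find_lookalikes(names).items():
        print(f"Names that look like {key!r}:")
        for member in members:
            print("   ", describe_name(member))
```

    Running the script prints the code points of every member of each lookalike group, so the extra invisible or full-width character can be spotted and the offending variable renamed in Stata before appending.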

    I'd go to StataCorp on this if you don't get a good answer. They have Chinese speakers on staff.
