The Stata manual is clear about what a variable name may be; see section 11.3 here:
http://www.stata.com/manuals13/u11.pdf
If you don't want to read the rather lengthy post below, just make sure you always follow the rules in section 11.3 of the manual.
Some other manuals abridge the definition or try to rephrase it in their own words. For example, here is the SAS interpretation (can you spot the two mistakes?):
http://support.sas.com/documentation...a003103776.htm
These evidently are still not fixed in the SAS 9.3 documentation:
http://support.sas.com/documentation...9b9qsz4sp2.htm
Rephrasing like this is not a good idea. Why not link to the source?
Occasionally a dataset turns up with illegal variable names. A few user-written commands, including my own usespss (this bug was fixed in the November 2012 version), were to blame for retaining disallowed characters in variable names.
Other software may be guilty of that too:
http://www.stata.com/statalist/archi.../msg00124.html
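To find such datasets, the names of the variables in memory can be checked directly. The following is only a minimal Mata sketch, not an official command: report_bad_varnames() is a hypothetical helper, and it checks just the character and length rules of section 11.3 as assumed here (1 to 32 ASCII letters, digits and underscores, not starting with a digit), not the reserved names. It reads names with st_varname() rather than looping with foreach, because foreach would split a malformed name that contains spaces.

```stata
mata:
// Hypothetical helper (not part of Stata): prints every variable in memory
// whose name breaks the assumed [U] 11.3 character/length rules and
// returns how many such variables were found.
real scalar report_bad_varnames()
{
    real scalar j, nbad
    string scalar name

    nbad = 0
    for (j = 1; j <= st_nvar(); j++) {
        // st_varname() returns the name exactly as stored, spaces included
        name = st_varname(j)
        // ASCII letters, digits, underscores only; first character not a digit
        if (strlen(name) > 32 | !regexm(name, "^[A-Za-z_][A-Za-z0-9_]*$")) {
            printf("variable %s has an illegal name: %s\n", strofreal(j), name)
            nbad = nbad + 1
        }
    }
    return(nbad)
}
end
```

After loading a suspect file, typing mata: report_bad_varnames() lists the offending names and displays their count.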
However, Stata itself seems to fail to validate variable names.
I believe the following code should stop with error 198 at the first generate, not the last one:
Code:
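version 13.0
clear all
sysuse auto
generate F`=char(186)'=32
generate C`=char(186)'=0
generate `=char(186)'F=99

Accepting char(186) could be preparation for Unicode, but the following (a carriage return, char(13), inside the name) should never be valid:
Code:
generate K`=char(13)'=1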
The problem is, of course, not with generate. Other commands are equally affected (egen and recode, for example), so the most likely culprit is the syntax command.
Interestingly, older versions of Stata behaved differently. For example, Stata 5 used to strip illegal characters from variable names and thus never created such malformed names. I suspect the bug may be related to the complete overhaul of the rename command, which happened around version 12:
http://www.stata.com/statalist/archi.../msg01146.html
or to earlier changes, since another similar bug (spaces were allowed in variable names) was discovered by Roger Newson and confirmed by Bill Gould in 2009:
http://www.stata.com/statalist/archi.../msg00016.html
(Or is the new bug a result of the fix for that one?)
I discovered this while debugging code that converts foreign variable names into valid Stata variable names, when it stumbled upon the following cases (all strings come from a file on disk and are simulated here with char()):
Code:
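. mata strtoname(".N")
  _N

. mata strtoname("F`=char(186)'")
  Fº

. mata st_isname("F`=char(186)'")
  1

My expectation is that whatever st_isname() confirms and whatever strtoname() returns should be a valid Stata variable name that generate can create.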
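Until strtoname() itself is stricter, one possible workaround is to post-process its result. This is a hypothetical sketch (strict_strtoname() is not a Stata function): it calls strtoname() and then replaces every remaining byte that is not an ASCII letter, digit or underscore with "_". It assumes single-byte characters, as in Stata 13, and it does not handle reserved names such as the _N returned above.

```stata
mata:
// Hypothetical workaround, not a Stata built-in: run strtoname(), then
// replace any remaining byte that is not an ASCII letter, digit or
// underscore with "_". Assumes single-byte characters (Stata 13).
// Reserved names (e.g. _N) are NOT handled here.
string scalar strict_strtoname(string scalar s)
{
    string scalar name, out, c
    real scalar i

    name = strtoname(s)
    out = ""
    for (i = 1; i <= strlen(name); i++) {
        c = substr(name, i, 1)
        // keep allowed characters, substitute everything else
        if (regexm(c, "^[A-Za-z0-9_]$")) out = out + c
        else out = out + "_"
    }
    return(out)
}
end
```

Given the strtoname() output shown above, the F`=char(186)' case would come out as F_ rather than Fº.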
The manual entries for st_isname() and strtoname() use the term "Stata name" and never mention "Stata variable name" directly; I had inferred the latter from the former. If syntax uses the same Mata functions internally, that would explain the behavior above.
IMHO, strtoname() and st_isname() should both be aware of
1) illegal characters for Stata variable names;
2) blacklisted (reserved) names: byte, long, etc., as shown in the manual.
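Both requirements can be combined into a single check. This is a minimal Mata sketch under stated assumptions: is_strict_varname() is a hypothetical name, the rules are assumed to be those of section 11.3 of the Stata 13 manual (1 to 32 ASCII letters, digits and underscores, not starting with a digit, not a reserved word), and the reserved list below was typed in from that section and should be checked against the manual. The reserved str# is approximated as "str followed by digits".

```stata
mata:
// Hypothetical helper (not a Stata built-in): returns 1 if s obeys the
// [U] 11.3 rules as assumed here, 0 otherwise.
real scalar is_strict_varname(string scalar s)
{
    string rowvector reserved

    // Reserved words as read from [U] 11.3 (Stata 13); verify against the manual
    reserved = ("_all", "_b", "byte", "_coef", "_cons", "double", "float")
    reserved = reserved, ("if", "in", "int", "long", "_n", "_N", "_pi")
    reserved = reserved, ("_pred", "_rc", "_skip", "strL", "using", "with")

    // 1) illegal characters and length: 1 to 32 ASCII letters, digits,
    //    underscores; first character a letter or underscore
    if (strlen(s) < 1 | strlen(s) > 32) return(0)
    if (!regexm(s, "^[A-Za-z_][A-Za-z0-9_]*$")) return(0)

    // 2) reserved names, including str# (approximated as "str" + digits)
    if (anyof(reserved, s)) return(0)
    if (regexm(s, "^str[0-9]+$")) return(0)

    return(1)
}
end
```

With this check, both cases from the examples above (Fº from strtoname() and _N) would be rejected, while an ordinary name such as price passes.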
Best, Sergiy Radyakin