
  • Bad variable names

    In principle, variable names should only include the underscore, letters a-z and A-Z and numeric characters 0-9.

    But in practice problems can creep in one way or another. See for example this thread started by Jeph Herrin at

    http://www.stata.com/statalist/archi.../msg00385.html

    and this thread started by Sezer Alcan at

    http://www.stata.com/statalist/archi.../msg00217.html

    I've put together a quick hack of a program badvarnames that tells you about bad variable names.

    Here it is.

    Code:
    *! 1.0.0 NJC 16 April 2014
    program badvarnames
        version 10
        syntax [varlist]
    
        local I = 0
        local length = 0
    
        foreach v of var `varlist' {
            mata : st_local("chars", invtokens(strofreal(ascii("`v'"))))
    
            * reset once per variable, so that every bad character is collected
            local bad
            foreach c of local chars {
                if inrange(`c', 48, 57) {
                    * OK: 0 to 9
                }
                else if inrange(`c', 65, 90) {
                    * OK: A to Z
                }
                else if inrange(`c', 97, 122) {
                    * OK: a to z
                }
                else if `c' == 95 {
                    * OK: _
                }
                else local bad `bad' `c'
            }
    
            if trim("`bad'") != "" {
                local ++I
                local name`I' "`v'"
                local bad`I'  "`bad'"
                * display column for the codes: longest bad name so far, plus padding
                local length = max(`length', length("`v'") + 5)
            }
        }
    
        if `I' {
            di
            forval i = 1/`I' {
                di "|`name`i''|{col `length'}`bad`i''"
            }
        }
    end
    And here it is in action in Stata 10.1 (which I used to test how far back the program would go):

    Code:
    . sysuse auto
    (1978 Automobile Data)
    
    . rename rep78 rep78`=char(160)'
    
    . badvarnames
    
    |rep78 |  160
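    I could add char(160) to the end of my variable name (I don't recommend this!), but the program flagged the name, with the | delimiters making the trailing character visible, and printed the code of the problematic character. (Strictly, 160 lies outside the ASCII range of 0 to 127, so it is a character code rather than an ASCII code.)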
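
    For a quick check without installing anything, the same rule can be sketched with regexm(). This is only an illustration, not part of badvarnames: the pattern and the message text are my own assumptions, and the pattern deliberately accepts nothing beyond the underscore, a-z, A-Z and 0-9, with no digit in first position. It only lists suspect names and does not report character codes. Recent Statas with Unicode support may allow other characters in names, in which case this check would flag names that are in fact legal.

```stata
* Sketch: list variable names containing anything other than
* letters a-z and A-Z, digits 0-9 and the underscore
foreach v of varlist _all {
    if !regexm("`v'", "^[A-Za-z_][A-Za-z0-9_]*$") {
        * the | delimiters make leading or trailing odd characters visible
        di as txt "suspect name: |`v'|"
    }
}
```

    In the auto example above, this should list only the renamed rep78.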

  • #2
    How is it even possible to rename or create variables such that they have illegal characters in them? In other words, why doesn't Stata reject any attempts to create such variables?



    • #3
      That is a good question to which I have no answer.



      • #4
        Maybe they assumed (probably correctly) that no one would be stupid enough to do such a thing on purpose. Presumably this would only occur by accident in cases where data sets had been created by other programs (e.g., Stat/Transfer).
