  • Is ~ a valid character in a variable name?

    -import delimited- and -insheet- create variable names with a tilde when the name in the first line of the data file is too long. But I find that the -generate- command won't allow such names on the right-hand side. I can -rename-, but I would prefer to avoid the extra work. For example, here I can -describe- or -rename- a variable, but -generate- thinks it has an "invalid name":

    Code:
    import delimited using /tmp/Links_2007.txt
    (18 vars, 99,999 obs)
    
    . des shareholderbv~d
    
                  storage   display    value
    variable name   type    format     label      variable label
    -------------------------------------------------------------------------------
    shareholderbv~d str19   %19s                  Shareholder BvD ID
    
    . gen x=shareholderbv~d
    shareholderbv~d invalid name
    r(198);
    
    . rename shareholderbv~d x
    
    . des x
                  storage   display    value
    variable name   type    format     label      variable label
    -------------------------------------------------------------------------------
    x               str19   %19s                  Shareholder BvD ID
    Am I overlooking something very simple? Any suggestions?

    Daniel Feenberg
    NBER

  • #2
    From the Stata manual: "A name is a sequence of 1 to 32 letters (A–Z, a–z, and any Unicode letter), digits (0–9), and underscores (_)."


    • #3
      -import delimited- and -insheet- create variable names with a tilde when the name in the first line of the data file is too long. But I find that the -generate- command won't allow such names on the right-hand side. I can -rename-, but I would prefer to avoid the extra work.
      Not so. Rather, you are shown an abbreviated name whenever the actual name is too long for the display column.

      The same thing can happen directly with generate:

      Code:
      . clear
      
      . set obs 10
      number of observations (_N) was 0, now 10
      
      . gen somethingabsurdlylong = 42
      
      . d
      
      Contains data
        obs:            10                          
       vars:             1
       size:            40
      --------------------------------------------------------------------------------------------------------------------------------------------------
                    storage   display    value
      variable name   type    format     label      variable label
      --------------------------------------------------------------------------------------------------------------------------------------------------
      somethingabsu~g float   %9.0g                
      --------------------------------------------------------------------------------------------------------------------------------------------------
      Sorted by:
           Note: Dataset has changed since last saved.
      
      . describe, fullnames
      
      Contains data
        obs:            10                          
       vars:             1                          
       size:            40                          
      --------------------------------------------------------------------------------------------------------------------------------------------------
                    storage   display    value
      variable name   type    format     label      variable label
      --------------------------------------------------------------------------------------------------------------------------------------------------
      somethingabsurdlylong
                      float   %9.0g                
      --------------------------------------------------------------------------------------------------------------------------------------------------
      Sorted by:
           Note: Dataset has changed since last saved.


      • #4
        The tilde isn't in the data. -import delimited- added it. Am I the only one who considers it a bug when -import delimited- creates a variable with an invalid name? I suppose it is "by design"?


        • #5
          Ah, now we are getting someplace. The modified name is an abbreviation for the full name, and the abbreviation is acceptable in some places, but not others. The original name had spaces - what happens to them?
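
          A minimal sketch of that distinction, reusing the toy variable from #3 rather than the original data. The ~ abbreviation is part of Stata's varlist syntax and matches exactly one variable, so commands that take a varlist accept it. generate, by contrast, parses its right-hand side as an expression, where ~ is an operator (a synonym for !, "not"), which is presumably why the abbreviated name is rejected there.

          Code:
          clear
          set obs 10
          generate somethingabsurdlylong = 42

          * Varlist context: the ~ abbreviation resolves to the one matching variable
          summarize somethingabsu~g

          * Expression context: expected to fail, as in #1; capture noisily shows the
          * error but lets a do-file keep running
          capture noisily generate x = somethingabsu~g

          * Spelling out the full name works on the right-hand side
          generate x = somethingabsurdlylong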


          • #6
            Posts are crossing here, but import delimited does nothing wrong on your evidence. The only question arising from #1 is what you can see reported.

            Now in #5 there is something new: if the "original name" had spaces, they aren't allowed in Stata variable names and presumably were removed. If the variable label is Shareholder BvD ID, then my guess is that the spaces were removed or replaced with underscores, but the fullnames option in #3 removes any need to guess.


            • #7
              OK, I will rename my variables to something sensible.

              Daniel Feenberg


                • #9
                  In fact, looking at #1 allows the guess shareholderbvdid for that variable name. If so, it's just one character longer than will fit without abbreviation.
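
                  If that guess is right, one way to avoid a rename is to expand the abbreviation into the full name first and use that in the expression. A sketch, assuming the full name really is shareholderbvdid (a guess, recreated here as an empty toy variable rather than the imported file); full and x are illustrative names:

                  Code:
                  clear
                  set obs 3
                  * Stand-in for the imported variable; the name is the guess above, not confirmed
                  generate str19 shareholderbvdid = ""

                  * unab expands a varlist abbreviation and stores the full name in local macro full
                  unab full : shareholderbv~d
                  display "`full'"

                  * The full name is accepted on the right-hand side of generate
                  generate x = `full'

                  Running describe, fullnames on the imported data, as in #3, would show the actual name without any guessing.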
