  • Put shortname in a macro

    Is there any way to get the shortname of a variable into a macro, given the longname? Something similar to the macro extended function "variable label", but that returns the shortname. I have a -foreach- loop:

    Code:
    foreach name of local varlist {
       file write varfile "`name'" _n
    }
    but I would like to add the shortnames to each line of output.

  • #2
    I know of no way to do this, since there's no abbrev function that I know of, but why not extract, say, the first 5 characters of each variable name and use that as an abbreviation?



    • #3
      The output is for documentation of the file, so I need the shortnames that Stata uses.



      • #4
        Sorry, this may be my lack of coffee, but what do you mean by short and long names?



        • #5
          Okay, so I think what you may be getting at is how to convert a full descriptive name of a variable (longname) to a Stata-compatible variable name (shortname). There are two functions to achieve this goal.

          -strtoname()- creates Stata 13-compatible names, truncating to 32 bytes.
          -ustrtoname()- creates Stata 14 (or later) compatible names, truncating to 32 characters.
          The documentation distinguishes between bytes, characters, and display columns in a string, but for plain ASCII text these are all the same.

          Observation #2 below illustrates a difference you may encounter: its first letter is the accented character á, which counts as 2 bytes but 1 character. That longname is 32 characters but 33 bytes long, so strtoname() drops its final character while ustrtoname() keeps it.

          Code:
          clear
          input str80(longname)
          "This is a very long name that can't possibly work as a variable name"
          "ábcdefghijklmnopqrstuvwxyz012345"
          "version"
          end
          compress
          
          gen short1 = strtoname(longname)
          gen short2 = ustrtoname(longname)
          list
          Result:

          Code:
          . list
          
               +--------------------------------------------------------------------------------------------------------------------------------------------+
               |                                                             longname                             short1                             short2 |
               |--------------------------------------------------------------------------------------------------------------------------------------------|
            1. | This is a very long name that can't possibly work as a variable name   This_is_a_very_long_name_that_ca   This_is_a_very_long_name_that_ca |
            2. |                                     ábcdefghijklmnopqrstuvwxyz012345    ábcdefghijklmnopqrstuvwxyz01234   ábcdefghijklmnopqrstuvwxyz012345 |
            3. |                                                              version                            version                            version |
               +--------------------------------------------------------------------------------------------------------------------------------------------+
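          Since the byte/character distinction is what drives the difference in observation #2, it can be checked directly. A minimal sketch, assuming Stata 14 or later (where strings are stored as UTF-8): strlen() counts bytes and ustrlen() counts characters.

          Code:
          * strlen() counts bytes, ustrlen() counts Unicode characters
          local s "ábcdefghijklmnopqrstuvwxyz012345"
          display strlen("`s'")    // 33 bytes: á takes 2 bytes in UTF-8
          display ustrlen("`s'")   // 32 characters

          Because the string exceeds 32 bytes but not 32 characters, only strtoname() has to truncate it.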



          • #6
            Close, but not quite what I want. My long names are not the values of a variable, but the variable names themselves that can be part of a Stata program. I don't think the Stata documentation has the right vocabulary for distinguishing the two. I get the longnames from code like this:

            Code:
            local varlist `r(varlist)'
            I can see that there is an extended macro function

            Code:
            local label_varname : variable label varname
            That assigns the variable label of varname to the local macro label_varname. I was just hoping there was some similar but undocumented way of obtaining the shortname without editing the log file.
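            For comparison, the "variable label" extended macro function mentioned in #1 can be used in a loop over r(varlist). A minimal sketch, using the auto dataset only as a stand-in and -ds- to fill r(varlist):

            Code:
            sysuse auto, clear
            ds                          // ds leaves the variable names in r(varlist)
            local varlist `r(varlist)'
            foreach name of local varlist {
                local lbl : variable label `name'
                display "`name'" _col(16) `"`lbl'"'
            }

            The compound quotes around `lbl' guard against labels that themselves contain double quotes.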



            • #7
              I only used the dataset to illustrate a technique, but nothing prevents you from using those functions in macro assignments. Extending that to loops is then possible once we know what we need. For example:

              Code:
              local longname rep78
              local shortname = ustrtoname("`longname'")
              mac list _longname _shortname
              If longname is your variable name (from #3), then I'm still not sure what you mean by shortname. Could you show an example, without code, to help us better understand your question?

              Edit: in retrospect, short name must refer to something else, since the long names already come from a list of valid variable names.
              Last edited by Leonardo Guizzetti; 07 Aug 2022, 08:11.



              • #8
                Probably I should have been saying "fullname" instead of "longname". I don't know of a documented term for the abbreviation of a variable name that Stata makes by replacing some of the middle characters with a tilde (~), but that is what I wanted to get into a macro. The output of the list command shows the tilde version. For example, if the variable is named x123456012345601234560, the column in the output from list is headed x12345~0.

                However, I am going to terminate this discussion now, since through some experimentation I have just learned that the ~ version is not an alias for the fullname, and is therefore of no use to my users, and therefore I don't need to include it in the documentation. It was just an assumption on my part that turned out to be false.

                We were sent the dataset as a set of tab delimited files where the first row contains variable descriptions rather than variable names. Stata makes these into very long names that are a pain to type. Hence my interest in the tilde versions.

                My apologies to everyone. Sorry for the noise!

                Daniel Feenberg



                • #9
                  FWIW, the tilde is what Stata uses to fit long variable names into a fixed display width. Fixing the width at 8 characters, we get

                  Code:
                  . display abbrev("x123456012345601234560", 8)
                  x12345~0
                  Note that (i) you can use the abbrev() function with macros, and (ii) 8 is the minimum number of characters that the function allows.
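                  Applied to the loop from #1, a minimal sketch might store the tilde form in a local and write it next to the full name. The auto dataset and the output file varnames.txt are stand-ins chosen for illustration, and the width of 8 is assumed to match the abbreviation shown by list in #8.

                  Code:
                  sysuse auto, clear
                  ds
                  local varlist `r(varlist)'
                  file open varfile using varnames.txt, write text replace
                  foreach name of local varlist {
                      local short = abbrev("`name'", 8)     // e.g. displacement -> displa~t
                      file write varfile "`name'" _tab "`short'" _n
                  }
                  file close varfile

                  Names of 8 characters or fewer, such as make or mpg, come back from abbrev() unchanged.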

                  Anyway, the tilde can also be used as a wildcard in varlists. Like the asterisk, it matches 0 or more characters; unlike the asterisk, it must match exactly one variable name. This is documented (incorrectly*) in

                  Code:
                  help varlist

                  The documentation implies that the tilde character matches 1 or more characters (like the question mark); that is not the case. The tilde matches 0 or more characters.
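                  A quick check of the wildcard behaviour, using the auto dataset only for illustration, might look like this. Per the observation above, rep~78 should resolve even though the tilde matches zero characters there, while t~ should fail because both trunk and turn match.

                  Code:
                  sysuse auto, clear
                  describe dis~t             // matches only displacement
                  describe rep~78            // tilde matches zero characters: rep78
                  capture describe t~        // trunk and turn both match
                  local rc = _rc
                  display "return code: `rc'"   // nonzero, since ~ must match exactly one variable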



                  • #10
                    The crucial point here is that Stata's abbrev() does not promise that its output, when unabbreviated, identifies exactly one variable. I don't know whether that matters for the shortnames issue.

                    In contrast, Mata's abbrev() function *can* be used for unique abbreviations and allows more than 8 characters.
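                    For reference, Mata's abbrev() takes a string scalar or string matrix plus a width, so several names can be abbreviated in one call. A minimal sketch, with the auto dataset as a stand-in:

                    Code:
                    sysuse auto, clear
                    mata: abbrev("displacement", 8)
                    mata: abbrev(("displacement", "gear_ratio", "mpg"), 9)   // row vector in, row vector out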

                    Edit: confused the behaviour of two functions with the same name.
                    Last edited by Leonardo Guizzetti; 07 Aug 2022, 11:30.



                    • #11
                      Ok, I'm back at my computer and did a few tests. Mata's and Stata's abbrev() operate the same way, despite differences in their documentation. Unfortunately, both are built-in, so their source code is not available. We can infer that either function first checks whether the string(s) match existing variables in the dataset and, if so, returns abbreviation(s) that are guaranteed to resolve to a single, unique variable (as checked, for instance, with -unab-). However, arbitrary strings are also allowed as inputs, and the requested abbreviation is still performed. If we then attempt to "unabbreviate" a string that was not abbreviated from a variable name, we should expect an error, because there is no logical result.

                      Code:
                      . sysuse auto, clear
                      
                      . // abbreviation of a variable
                      . di abbrev("displacement", 8)
                      displa~t
                      
                      . di abbrev("displacement", 9)
                      displac~t
                      
                      . // check
                      . unab one : `=abbrev("displacement", 8)' `=abbrev("displacement", 9)'
                      
                      . di "`one'"
                      displacement displacement
                      
                      . // one variable with a subtle typo -- note the unique abbreviation
                      . clonevar displaecment = displacement
                      
                      . di abbrev("displaecment", 8)
                      di~cment
                      
                      . di abbrev("displaecment", 9)
                      displae~t
                      
                      . // check
                      . unab two : `=abbrev("displaecment", 8)' `=abbrev("displaecment", 9)'
                      
                      . di "`two'"
                      displaecment displaecment
                      
                      . // abbreviate a word that is not a variable name
                      . di abbrev("acceleration", 9)
                      acceler~n
                      
                      . // check -- error thrown
                      . unab three: `=abbrev("acceleration", 8)'
                      variable accele~n not found
                      Note: the Mata equivalents are omitted, but they return identical results.
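                      Building on the -unab- check above, a sketch that verifies every variable's abbreviation round-trips might look like this. The width of 8 and the auto dataset are assumptions for illustration; -capture- keeps the loop going when a string does not resolve.

                      Code:
                      sysuse auto, clear
                      foreach v of varlist _all {
                          local short = abbrev("`v'", 8)
                          capture unab full : `short'
                          if _rc {
                              display as error "`short' does not resolve to a single variable"
                          }
                          else if "`full'" != "`v'" {
                              display as error "`short' resolves to `full', not `v'"
                          }
                          else {
                              display as text "`v' -> `short'"
                          }
                      }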
