
  • Temporary names for a scalar: Dangerous advice

    In the technical notes to the command scalar in the [P] manual, two methods of naming and referring to scalars that avoid conflicts with the names of existing variables are discussed. In this discussion, the use of the pseudofunction scalar() is deprecated in favor of using tempname:
    One solution—and not a good one—is to place the scalar() pseudofunction around the names of all your scalars when you use them. A much better solution is to obtain the names for your scalars from Stata’s tempname facility; see [P] macro.
    However, this advice is dangerous because this method can produce unintended results. Although it is argued that using tempname is safe because Stata takes care that the (internal) names actually used when employing tempname are unique, this is not quite true if the dataset contains variables created by Stata's tempvar facility; see [P] macro. The example below demonstrates that you may happen to use a dataset containing a variable named __000000 that was previously created by tempvar. If this happens while you are creating a scalar using a name specified by tempname, Stata will not detect the conflict of names. As a consequence, the internal name of the temporary scalar may also be __000000, so there will be a scalar and a variable with the same name.

    But if a variable and a scalar have the same name, Stata will always use the variable, not the scalar (see the technical notes in [P] scalar and Kolev, 2006). The use of tempname does not help in this instance, because the selection rule (data variable in preference to a scalar) also applies to the internal variables and scalars to which the macro names created by tempvar or tempname refer.
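
    The selection rule itself can be seen with ordinary (non-temporary) names. This is only a minimal sketch, assuming the auto dataset shipped with Stata, in which price[1] is 4099; the scalar name price is chosen purely to provoke the conflict:
    Code:
    sysuse auto, clear
    scalar price = 1          // a scalar with the same name as the variable price
    display price             // the variable wins: shows price[1], i.e. 4099
    display scalar(price)     // scalar() forces the scalar: shows 1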

    The following demonstration defines and makes use of four programs:
    • foo1: This program is used to create a temporary variable using tempvar. Because the temporary variable is not yet dropped when saving the dataset, the new dataset will contain the variable __000000. This may happen accidentally, because in [P] macro (section "The tempvar, tempname, and tempfile commands") a cursory reader may understand the statement
      Another advantage of temporary variables is that you do not have to drop them. Stata will do that for you when your program terminates, regardless of the reason for the termination.
      as meaning that failing to drop a temporary variable explicitly can do no harm. This, however, would be a mistake (see below).
    • foo2: Using this program demonstrates that Stata does not recognize a conflict of names when creating a temporary scalar which internally will be named __000000. As a consequence, when trying to use the temporary scalar (as intended), what Stata actually does is to take the first value of the variable __000000 instead of the scalar. Note that a second use of foo2 will not encounter this problem because after finishing foo2 Stata not only drops the temporary scalar but also the variable __000000.
    • foo3: This program is identical to foo2 except that it uses the pseudofunction scalar(), not instead of a temporary scalar obtained from Stata's tempname facility but in addition to it. This avoids the problem encountered with foo2 and shows that the user is ill-advised to rely on tempname alone instead of the pseudofunction scalar().
    • foo0: To shed light on the conditions under which Stata deletes the variable __000000 from the dataset, foo0 demonstrates that any command (such as recode) that creates a temporary scalar with a name that conflicts with an already existing variable will delete the variable with the conflicting name. The user should therefore never save temporary variables, because using a dataset that contains such variables risks losing them accidentally.
    Code:
    * Demonstration:
    
    version 11.2
    set more off
    
    * -------------------------------------------------------------------------------
    cap program drop foo1
    * foo1 will "accidentally" save a temporary variable:
    program define foo1
      args data foovar
      sysuse `data', clear
    
      tempvar newvar
      gen `newvar' = abs(`foovar')
    
    * ... // additional things
    
      save `data'_new   // note: tempvar __000000 will be saved!
    end
    
    * -------------------------------------------------------------------------------
    cap program drop foo2
    /* If a (temporary) variable __000000 exists already in the dataset, -foo2- will
       (accidentally) use the first value of this variable instead of the value of
       the temporary scalar __000000 because when creating the scalar Stata will not
       recognize that the variable __000000 exists already - if there is a variable
       and a scalar using the same name Stata will always use the variable: */
    program define foo2
      args foovar
      tempname fooval
      qui sum `foovar'
      sca `fooval' = max(1,r(max))
    
    * ... // additional things
    
    /* Now change foovar dependent on the value of `fooval'
       (why and how is not important for the argument):     */
    
      local dec = ceil(log10(`fooval'))
      di _n as res "Check: dec = " `dec' ", fooval = " `fooval' _n
      if (round(10^`dec'-1) > `fooval') local maxmi : di round(10^`dec'-1)
      else local maxmi : di round(10^(`dec'+1)-1)
      recode `foovar' (. = `maxmi')
    end
    
    * -------------------------------------------------------------------------------
    cap program drop foo3
    /* -foo3- avoids accidentally using a (temporary) variable __000000 (if it
       exists) instead of the temporary scalar because the pseudofunction scalar()
       avoids the conflict of names of the variable and the scalar: */
    program define foo3  // always use -scalar()- !
      args foovar
      tempname fooval
      qui sum `foovar'
      sca `fooval' = max(1,r(max))
    
    * ... // additional things
    
    /* Now change foovar dependent on the value of `fooval'
       (why and how is not important for the argument):     */
    
      local dec = ceil(log10(scalar(`fooval')))
      di _n as res "Check: dec = " `dec' ", fooval = " scalar(`fooval') _n
      if (round(10^`dec'-1) > scalar(`fooval')) local maxmi : di round(10^`dec'-1)
      else local maxmi : di round(10^(`dec'+1)-1)
      recode `foovar' (. = `maxmi')
    end
    
    * -------------------------------------------------------------------------------
    cap program drop foo0
    /* This program will do nothing to the data in memory but running it will remove
       the (temporary) variable __000000 (if it exists) because the command -recode-
       will also use a temporary variable and Stata will remove both, the variable
       __000000 and the temporary variable created by -recode-. However, -recode-
       will  issue an error if a (temporary) variable __000000 exists already and no
       -tempname- or -tempvar- command has been used: */
    program define foo0
      args foovar
      tempname fooval
      gen new = `foovar'
      recode new (1=0) (0=1)  // works only if tempname or tempvar has been used
      drop new
      di _n "This program did nothing but nevertheless dropped variable __000000" _n
    end
    
    * ===============================================================================
    clear
    foo1 auto price
    
    use auto_new, clear
    
    /* Note that the first case of price has the value 4099 which
       is identical to the first value of (tempvar) __000000: */
    
    di "price[1] = " price[1] " = __000000[1] = " __000000[1]
    
    * -------------------------------------------------------------------------------
    clonevar rep78_1 = rep78
    clonevar rep78_2 = rep78
    
    tab rep78, mi
    foo2 rep78_1   // "wrong" result because (tempvar) __000000 exists
    tab rep78_1
    
    foo2 rep78_2   // "correct" result because __000000 no longer exists
    tab rep78_2
    
    * -------------------------------------------------------------------------------
    use auto_new, clear
    
    di "price[1] = " price[1] " = __000000[1] = " __000000[1]
    
    * -------------------------------------------------------------------------------
    clonevar rep78_1 = rep78
    clonevar rep78_2 = rep78
    
    tab rep78, mi
    foo3 rep78_1   // "correct" result even if (tempvar) __000000 exists
    tab rep78_1
    
    foo3 rep78_2   // "correct" result
    tab rep78_2
    
    * -------------------------------------------------------------------------------
    use auto_new, clear
    
    di "price[1] = " price[1] " = __000000[1] = " __000000[1]
    
    * -------------------------------------------------------------------------------
    foo0 foreign
    
    * di "price[1] = " price[1] " = __000000[1] = " __000000[1]
    
    * -------------------------------------------------------------------------------
    erase auto_new.dta
    To summarize, I recommend always using the pseudofunction scalar() in programs that use temporary scalars. Additionally, the manual entries should be revised to warn the user of the possibly unintended side effects of not following this advice.

    One way to avoid the problem in the first place would be to change Stata's save (and saveold) command so that temporary variables are not saved automatically. At the same time, an option such as "keeptemp" could be added to allow users to save temporary variables if they really want to. In any case, the current manual entries [D] save and [P] macro should also alert the reader to the possible unintended side effects (as demonstrated above) of using and saving temporary variables.

    I know that changing the functionality of save risks breaking existing programs, but on the other hand there may be programs in use that do not take the demonstrated side effects into account. Thus, the question is which does more harm: breaking existing programs, or having programs in use that might fall into the trap of conflicting names of variables and scalars.
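
    Until something along these lines changes, a program author can avoid writing the temporary variable to disk by dropping it explicitly before save. This is only a sketch: foo1safe is a hypothetical variant of foo1 above and is not part of the original demonstration.
    Code:
    * foo1safe: hypothetical variant of foo1 that drops its tempvar before saving
    cap program drop foo1safe
    program define foo1safe
      args data foovar
      sysuse `data', clear

      tempvar newvar
      gen `newvar' = abs(`foovar')

    * ... additional things

      drop `newvar'     // drop explicitly so that __000000 is not written to disk
      save `data'_new
    end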

  • #2
    Interesting. Do the problems potentially go beyond this? For example, if I have accidentally saved a temporary variable in a data file, could that create problems with other commands that use temporary names?
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 17.0 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      If you want to fix this without zapping backward compatibility, maybe Stata could just check to see if there are any variables with temporary names and then avoid using those names as it creates new ones.
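
      In the meantime, a user-level check in the same spirit might look like the sketch below. It assumes that leftover temporary variables follow the __000000 naming pattern shown in post #1 and simply looks for any variable matching __* after a dataset is loaded; the macro name leftovers is arbitrary.
      Code:
      sysuse auto, clear
      * unab exits with an error if no variable matches the pattern, hence capture
      capture unab leftovers : __*
      if _rc == 0 {
          display as error "variables with temporary-style names: `leftovers'"
      }
      else {
          display as text "no variables matching __* found"
      }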
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      Stata Version: 17.0 MP (2 processor)

      EMAIL: [email protected]
      WWW: https://www3.nd.edu/~rwilliam



      • #4
        Originally posted by Richard Williams
        If you want to fix this without zapping backward compatibility, maybe Stata could just check to see if there are any variables with temporary names and then avoid using those names as it creates new ones.
        Yes, this seems to be the best solution to the problem: it would not break backward compatibility, and even if some programs were to produce different results, the new behavior would most likely produce the correct (intended) result, because it is extremely unlikely that someone intentionally relied on the side effect of conflicting names of temporary variables and scalars.
