  • PSA: temporary variables are saved with datasets

    Temporary variables are saved when datasets are saved. This causes surprising behavior, at least to me and to some others. I am creating a post about it so that an explanation of this behavior is more discoverable and to suggest some best practices to users.

    -help tempvar- explains, "When the program or do-file concludes, any variables with these assigned names are dropped." Note that there is no exception for -save-. Any temporary variables present will be saved under their [unlabeled] variable names __000000, __000001, etc.

    As an example of the consequences, consider starting a fresh Stata session and then running a do-file that contains the following:
    Code:
    clear *
    sysuse auto
    tempvar zero
    gen `zero' = 0
    save data.dta, replace
    And then running a do-file that contains the following:
    Code:
    clear *
    use data.dta
    tempvar one
    gen `one' = 1
    The last command will cause error "variable __000000 already defined". This won’t happen if the commands are run interactively, since -tempvar- won’t forget that it has already assigned __000000.
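
    A reader of such a dataset could guard against this error by dropping leftover variables before calling -tempvar-. The following is only a sketch, and it assumes that nothing in data.dta legitimately uses the __ prefix. -capture- is there so that the line does not stop the do-file when no such variables exist.

    ```stata
    clear *
    use data.dta
    // Assumption: every variable matching __* is a leftover temporary variable.
    capture drop __*
    tempvar one
    gen `one' = 1
    ```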

    This behavior does not seem useful to me, except in narrow cases applicable only to programmers. Firstly, it is unlikely that any consumers of the saved dataset have any use for what were probably intended to be program- or do-file-scoped variables. Secondly, even if the dataset consumer wanted to use these variables, these variables do not have consistent variable names (e.g. __000004), since the assigned variable names depend on how many times -tempvar- has been used during the Stata session. Thirdly, the assigned [scoped] macro names (e.g. `foo') are not saved with the dataset, so these too cannot be used reliably.

    Nevertheless, Stata Technical Support confirms that this is intended behavior. Technical Support was also kind enough to reply to a few of my questions and suggestions.

    1. I would like help to mention explicitly that -save- saves tempvars (but not their macro names).
    I will pass along this recommendation. An additional sentence or two clarifying how the temporary variables are stored might be useful.
    2. I would like help to recommend a course of action if the user does not intend to save tempvars. For example, do you recommend preserve, then drop __*, then save, then restore? Is there a way to distinguish tempvars from other variables named __*, such as those set by some user-written programs?
    Yes, your approach of -preserve -> drop -> save -> restore- will do the job. And no, there is no way to distinguish between a user-created variable __000000 and a Stata temporary variable of the same name.
    3. I would like tempvar to remember “where it left off” after loading a dataset with saved tempvars, so that it doesn’t try to reassign them and error. Or is it intended that it isn’t convenient to use tempvar after loading a dataset with saved tempvars? It seems that one possible workaround, if __000000 already exists, is to use tempvar to assign a junk macro to __000000, and only then use tempvar to assign a useful macro to __000001. But this gets ugly if you don’t know how many saved tempvars there were.
    Inconvenience is certainly not the intent, but temporary variables are meant to be just that: temporary. They are not meant to be carried across environments, so Stata always assumes there are none around when it starts working in a new environment (e.g., a new do-file).
    Accordingly, be cautious if you are using -tempvar- with -save-. Perhaps temporary variables "are not meant to be carried across environments", but unless you prevent it, they will be. Be aware that if you do not drop your temporary variables before saving your dataset, it may cause errors if anyone opens that dataset and attempts to use -tempvar-, or use any program that itself uses -tempvar-, or use any command that assigns variables similarly (e.g. -marksample-). If you are saving your dataset in the middle of a program or do-file, but want to continue using your temporary variables, you may also want to do this within a -preserve-/-restore- block.
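
    The -preserve-/-restore- approach might look like the sketch below. It drops the temporary variable through its macro name rather than with -drop __*-, which is my own choice for the example so that any other variables named __* are left alone; the file name data.dta is carried over from the example above.

    ```stata
    clear *
    sysuse auto
    tempvar zero
    gen `zero' = 0

    preserve                  // keep a copy of the data, temporary variable included
    drop `zero'               // drop by macro name, so other __* variables are untouched
    save data.dta, replace    // the saved file no longer contains the temporary variable
    restore                   // bring back the data as it was before the drop

    summarize `zero'          // the temporary variable can still be used here
    ```

    If a do-file creates many temporary variables, each would need to be listed in the -drop-, which is the price of not relying on the __* pattern.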

  • #2
    FWIW, in do-files and programs you might want to save at the highest level only. Here are example layouts:

    Code:
    // begin main do-file
    ...
    do sub_dofile.do
    save ...
    ...
    // end main do-file
    
    // begin sub do-file
    ...
    tempvar tmp
    generate `tmp' = exp
    ...
    // end sub do-file
    Code:
    program foo
        
        ...
        
        bar
        
        save ...
        
    end
    
    
    program bar
        
        ...
        
        tempvar tmp
        generate `tmp' = exp
        
        ...
        
    end

    • #3
      Originally posted by Nils Enevoldsen
      -help tempvar- explains, "When the program or do-file concludes, any variables with these assigned names are dropped." Note that there is no exception for -save-.
      Neither should there be an exception for save. The exception would be save not saving temporary variables in the dataset. If save did not save those variables, that exception should be documented in the help file for save.


      Because the topic came up in another context, let me add that the documentation for tempvar does not refer to any commands. The statement refers to the dataset in memory. Here is the expected behavior:

      Code:
      sysuse auto
      tempvar foo
      generate `foo' = 42
      // save myauto // if uncommented, saves the temporary variable
      describe
      If you run the code above from a do-file, you will notice that describe lists a temporary variable at the end of the dataset. Note that this variable is no longer present when the do-file concludes. This is exactly the situation described in the documentation.

      • #4
        daniel klein is correct that the documentation for save should mention it if temporary variables were not saved, and hence that the documentation for save is correct. However, it would be much clearer and would avoid unexpected results if the documentation of tempvar explicitly mentioned that temporary variables will be saved by save and that this could have unintended consequences. I stumbled across this issue myself a long time ago, see "Temporary names for a scalar: Dangerous advice".
