Temporary variables are saved when datasets are saved. This causes surprising behavior, at least to me and to some others. I am creating this post so that an explanation of this behavior is more discoverable, and to suggest some best practices to users.
-help tempvar- explains, "When the program or do-file concludes, any variables with these assigned names are dropped." Note that it describes no special handling for -save-: any temporary variables present when -save- runs will be saved under their [unlabeled] variable names __000000, __000001, etc.
As an example of the consequences, consider starting a fresh Stata session and then running a do-file that contains the following:
Code:
    clear *
    sysuse auto
    tempvar zero
    gen `zero' = 0
    save data.dta, replace
And then running a do-file that contains the following:
Code:
    clear *
    use data.dta
    tempvar one
    gen `one' = 1
The last command will cause the error "variable __000000 already defined". This won't happen if the commands are run interactively, since -tempvar- won't forget that it has already assigned __000000.
This behavior does not seem useful to me, except in narrow cases applicable only to programmers. Firstly, it is unlikely that any consumers of the saved dataset have any use for what were probably intended to be program- or do-file-scoped variables. Secondly, even if the dataset consumer wanted to use these variables, they do not have consistent variable names (e.g. __000004), since the assigned variable names depend on how many times -tempvar- has been used during the Stata session. Thirdly, the assigned [scoped] macro names (e.g. `foo') are not saved with the dataset, so these too cannot be used reliably.
Nevertheless, Stata Technical Support confirms that this is intended behavior. Technical Support was also kind enough to reply to a few of my questions and suggestions.
1. I would like help to mention explicitly that -save- saves tempvars (but not their macro names).
Technical Support: "I will pass along this recommendation. An additional sentence or two clarifying how the temporary variables are stored might be useful."
2. I would like help to recommend a course of action if the user does not intend to save tempvars. For example, do you recommend preserve, then drop __*, then save, then restore? Is there a way to distinguish tempvars from other variables named __*, such as those set by some user-written programs?
Technical Support: "Yes, your approach of -preserve -> drop -> save -> restore- will do the job. And no, there is no way to distinguish between a user-created variable __000000 and a Stata temporary variable of the same name."
3. I would like tempvar to remember "where it left off" after loading a dataset with saved tempvars, so that it doesn't try to reassign them and error. Or is it intended that it isn't convenient to use tempvar after loading a dataset with saved tempvars? It seems that one possible workaround, if __000000 already exists, is to use tempvar to assign a junk macro to __000000, and only then use tempvar to assign a useful macro to __000001. But this gets ugly if you don't know how many saved tempvars there were.
Technical Support: "Inconvenience is certainly not the intent, but temporary variables are meant to be just that: temporary. They are not meant to be carried across environments, so Stata always assumes there are none around when it starts working in a new environment (e.g., a new do-file)."
Accordingly, be cautious if you are using -tempvar- with -save-. Perhaps temporary variables "are not meant to be carried across environments", but unless you prevent it, they will be. Be aware that if you do not drop your temporary variables before saving your dataset, it may cause errors if anyone opens that dataset and attempts to use -tempvar-, or uses any program that itself uses -tempvar-, or uses any command that assigns variables similarly (e.g. -marksample-). If you are saving your dataset in the middle of a program or do-file, but want to continue using your temporary variables, you may also want to do the drop and save within a -preserve-/-restore- block.
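For anyone who wants to see the preserve, drop, save, restore sequence discussed above written out, here is a minimal sketch. It reuses the file name data.dta and the macro name zero from the example in this post. Dropping the temporary variable by its macro name rather than with -drop __*- is my own choice here, made so that any unrelated variables that happen to be named __* are left alone.

```stata
* A minimal sketch, not an official recommendation.
* Assumption: data.dta and the macro name zero are illustrative only.
clear *
sysuse auto
tempvar zero
gen `zero' = 0

* Snapshot the data, drop the temporary variable, save, then roll back.
preserve
drop `zero'
save data.dta, replace
restore

* The temporary variable is back in memory after -restore-,
* so the rest of the do-file can keep using it.
summarize `zero'
```

If there are several temporary variables, each one would need to be listed in the -drop- line by its macro name.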
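On the loading side, the "junk macro" workaround mentioned in question 3 can be generalized so that you do not need to know how many tempvars were saved. The following is only a sketch under the assumption that -tempvar- behaves as described in this post, i.e. that it hands out __000000, __000001, ... in order and does not check whether the name is already in the dataset. The file name data.dta and the macro name one are again just the ones from the example.

```stata
* A minimal sketch, assuming data.dta may contain saved tempvars.
clear *
use data.dta

* Ask -tempvar- for a name, and keep asking until the name it
* hands out does not already exist in the dataset.
tempvar one
capture confirm new variable `one'
while _rc {
    tempvar one
    capture confirm new variable `one'
}

* The name stored in the macro one is now unused, so this should not error.
gen `one' = 1
```

This leaves the previously saved __* variables in the dataset untouched, which seems safer than dropping them given that they cannot be told apart from user-created variables with the same names.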