Is there a way to keep only variables referred to in a given .do file?

Danny Lempert

Join Date: Jul 2020

Posts: 12
#1

16 May 2021, 10:02

Suppose you have a dataset containing many variables, and you have performed some analyses using only a small number of them. Now it is time to upload the data and .do file to a replication archive (Dataverse etc.). You don't want to upload the entire dataset, only the relevant variables; specifically, you want to give users a dataset on which they will be able to run your .do file containing your analyses, which possibly includes the creation of new variables--but no unnecessary variables.

Is there an efficient way to keep (or otherwise identify) only those variables in a dataset that are referred to in a given .do file (DVs, IVs, weights, in "if conditions", etc.)--but not those that are created within the .do file?

This is not so difficult to do manually, but it would be cool if there's a way to automate it.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

16 May 2021, 13:20

Not really. If you think about it, the possibility of creating variable names for analysis programmatically makes this a complicated task.

Code:

. forvalues i=11/19 {
  2.     local j=strofreal(`i'-5,"%02.0f")
  3.     display `"regress y20`i' x`j'var"'
  4. }
regress y2011 x06var
regress y2012 x07var
regress y2013 x08var
regress y2014 x09var
regress y2015 x10var
regress y2016 x11var
regress y2017 x12var
regress y2018 x13var
regress y2019 x14var

effectively requires executing the loop to find out that y2011 ... y2019 are regressed on x06var ... x14var.
Danny Lempert

Join Date: Jul 2020

Posts: 12
#3

16 May 2021, 15:45

William, thanks. I guess it's not even the creation of variable names that is problematic, since there is no need to know the names of variables that are created in the .do file. Rather, what makes this difficult is that one can refer to existing variables like v1 v2 v3 as v1-v3, and that one can abbreviate variable names in .do files. Probably not impossible, but more trouble than it's worth.
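To illustrate the range and abbreviation problem, here is a minimal sketch using Stata's shipped auto dataset as a stand-in (the dataset and the variable names in it are my choice for illustration, not from the actual project). The text of a .do file might contain only price-rep78 or displ, yet Stata resolves these to full variable names, which is what unab shows:

```stata
* Stand-in data: Stata's shipped auto dataset (an assumption for illustration)
sysuse auto, clear

* A range: the text "price-rep78" literally names only two variables,
* but it refers to every variable stored between them in the dataset
unab fromrange : price-rep78
display "`fromrange'"

* An abbreviation: "displ" is not a variable name, but it resolves to one
* (this assumes variable abbreviation is allowed, which is Stata's default)
unab fromabbrev : displ
display "`fromabbrev'"

* A wildcard: matches every variable whose name starts with t
unab fromwild : t*
display "`fromwild'"
```

A plain text search of the .do file for dataset variable names would miss mpg in the first case and displacement in the second, which is why simple matching is not enough.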
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

16 May 2021, 15:56

Just to be clear, my code constructed (probably a better word than created) the names of pairs of already existing variables; it did not create new variables.

It postulates that the dataset contains variables named with y followed by a year number in this century, and with x followed by a 2-digit number followed by var, and runs 9 regressions on pairs of those already-existing variables. But note that you don't see in the code the variable names y2019 and x14var - they appear only in the output of the command that is run.

My general practice is to create an analytical dataset by extracting the variables I need from the larger dataset. And if I find later I need an additional variable, I modify the program that does the extract, rerun it, then rerun all the succeeding programs, modifying them as needed to put the additional variable to use.

That forces me to start by thinking about just what it is I intend to do.
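A minimal sketch of that extract step, assuming a hypothetical set of needed variables and using Stata's shipped auto dataset as a stand-in for the larger source file (the file name analytic.dta is also made up for illustration):

```stata
* Stand-in for the large source dataset (hypothetical): save auto.dta to a tempfile
sysuse auto, clear
tempfile bigdata
save `bigdata'

* Extract program: load only the variables the analysis needs.
* If another variable turns out to be needed later, add it to this list and rerun.
use price mpg weight foreign using `bigdata', clear

* Save the analytical dataset that all succeeding programs start from
save "analytic.dta", replace
describe, short
```

Because every later program starts from the extract, the list in the use command is by construction the list of variables to upload.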

Last edited by William Lisowski; 16 May 2021, 15:59.
Danny Lempert

Join Date: Jul 2020

Posts: 12
#5

16 May 2021, 18:40

Right, thanks--I ended up getting it from your code. The basic issue is that a variable name does not need to appear in a .do file for it to be used in that .do file.

I ended up messing around with it a little more, but there are too many contingencies to deal with, even assuming the variable name does appear.
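For what it's worth, here is a rough sketch of the naive version, only to show where it falls short. It reads a .do file line by line, splits each line into name-like tokens, and keeps the tokens that exactly match a variable in the dataset in memory. The two-line .do file and the auto dataset are hypothetical stand-ins. It catches only names written out in full; ranges, abbreviations, wildcards, and names built from macros all slip through, and a name that appears only in a comment or a string is wrongly counted.

```stata
* Stand-in data and a tiny stand-in .do file (both hypothetical)
sysuse auto, clear
tempfile dofile
tempname fh
file open `fh' using "`dofile'", write text replace
file write `fh' "regress price mpg weight if foreign == 0" _n
file write `fh' "generate lprice = log(price)" _n
file close `fh'

* All variable names in the dataset currently in memory
quietly ds
local allvars `r(varlist)'

* Scan the .do file: keep tokens that exactly match an existing variable
local found
file open `fh' using "`dofile'", read text
file read `fh' line
while r(eof) == 0 {
    * Replace every character that cannot be part of a name with a space
    local clean = ustrregexra(`"`macval(line)'"', "[^A-Za-z0-9_]", " ")
    * Intersect with the dataset's variables, then add to the running list
    local hits : list allvars & clean
    local found : list found | hits
    file read `fh' line
}
file close `fh'

display "`found'"
* keep `found'   // only sensible after checking the list by hand
```

Note that lprice is not picked up because it does not exist in the dataset before the .do file runs, which is the behaviour wanted here.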
Jocelyn Cherry

Join Date: Jun 2015

Posts: 47
#6

17 May 2021, 03:49

Can you define local macros holding your variable lists before you write the do file, and run your analyses only on the variables in those macros? Then you can just keep the variables named in the macros, resave the dataset under a new name, and upload that to the archive.
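Something along these lines, as a sketch only. The macro names, the model, and the output file name are all made up for illustration, and the auto dataset stands in for your data:

```stata
* Stand-in data (hypothetical): Stata's shipped auto dataset
sysuse auto, clear

* Declare up front every existing variable the analysis will touch
local depvar     price
local indepvars  mpg weight
local condvars   foreign
local needed     `depvar' `indepvars' `condvars'

* Write the analysis in terms of the macros only
regress `depvar' `indepvars' if `condvars' == 0

* Keep just those variables and save under a new name for the archive
keep `needed'
save "replication_data.dta", replace
```

The locals have to be defined in the same do file (or the same run) as the keep command, since local macros disappear when the do file ends.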
Danny Lempert

Join Date: Jul 2020

Posts: 12
#7

30 Jun 2021, 09:54

Thanks--sorry for the late response, I just saw this. I think that would work if I had planned ahead when starting the .do file! Unfortunately, that wasn't the case here.