  • Is there a way to keep only variables referred to in a given .do file?

    Suppose you have a dataset containing many variables, and you have performed some analyses using only a small number of them. Now it is time to upload the data and .do file to a replication archive (Dataverse etc.). You don't want to upload the entire dataset, only the relevant variables; specifically, you want to give users a dataset on which they will be able to run your .do file containing your analyses, which possibly includes the creation of new variables--but no unnecessary variables.

    Is there an efficient way to keep (or otherwise identify) only those variables in a dataset that are referred to in a given .do file (DVs, IVs, weights, in "if conditions", etc.)--but not those that are created within the .do file?

    This is not so difficult to do manually, but it would be cool if there's a way to automate it.

  • #2
    Not really. If you think about it, the possibility of creating variable names for analysis programmatically makes this a complicated task.
    Code:
    . forvalues i=11/19 {
      2.     local j=strofreal(`i'-5,"%02.0f")
      3.     display `"regress y20`i' x`j'var"'
      4. }
    regress y2011 x06var
    regress y2012 x07var
    regress y2013 x08var
    regress y2014 x09var
    regress y2015 x10var
    regress y2016 x11var
    regress y2017 x12var
    regress y2018 x13var
    regress y2019 x14var
    This code effectively requires executing the loop to find out that y2011 ... y2019 are regressed on x06var ... x14var.

    • #3
      William, thanks. I guess it's not even the creation of variable names that is problematic, since there is no need to know variable names that are created in the .do file. Rather, what makes this difficult is that one can refer to existing variables like v1 v2 v3 as v1-v3, and that one can abbreviate variable names in .do files. Probably not impossible, but more trouble than it's worth.
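To illustrate the range and abbreviation point, here is a minimal sketch using `unab`, which expands a varlist against the data in memory. The variables v1, v2, v3 and income are made up for the example and are not from anyone's actual dataset.

```stata
* Toy data: variable names here are hypothetical, for illustration only
clear
set obs 1
generate v1 = 1
generate v2 = 2
generate v3 = 3
generate income = 4

* A range: the names v2 never appears in the text "v1-v3"
unab expanded : v1-v3
display "`expanded'"      // v1 v2 v3

* An abbreviation: "inc" is not a variable name, but resolves to one
set varabbrev on
unab full : inc
display "`full'"          // income
```

In both cases the full variable name never appears in the code, so a plain text search of the .do file would miss it.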

      • #4
        Just to be clear, my code constructed (probably a better word than created) the names of pairs of already existing variables; it did not create new variables.

        It postulates that the dataset contains variables named with y followed by a year number in this century, and with x followed by a 2-digit number followed by var, and runs 9 regressions on pairs of those already-existing variables. But note that you don't see in the code the variable names y2019 and x14var - they appear only in the output of the command that is run.

        My general practice is to create an analytical dataset by extracting the variables I need from the larger dataset. And if I find later I need an additional variable, I modify the program that does the extract, rerun it, then rerun all the succeeding programs, modifying them as needed to put the additional variable to use.

        That forces me to start by thinking about just what it is I intend to do.
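A minimal sketch of what such an extract program might look like, assuming the shipped auto dataset stands in for the larger dataset. The variable list and the output filename analysis_extract are hypothetical choices for the example.

```stata
* Stand-in for the large source dataset (assumption: sysuse auto is used
* here only so the example is self-contained)
sysuse auto, clear
tempfile full
save `full'

* The extract step: load only the variables the analysis needs.
* Adding a variable later means editing this one list and rerunning.
use price mpg weight foreign using `full', clear

* Save the analytical dataset (hypothetical filename)
save analysis_extract, replace
```

Everything downstream then starts from analysis_extract.dta, so that file is also what would go to the replication archive.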
        Last edited by William Lisowski; 16 May 2021, 15:59.

        • #5
          Right, thanks. I ended up getting it from your code: the basic issue is that a variable name does not need to appear in a .do file for it to be used in that .do file.

          I ended up messing around with it a little more, but there are too many contingencies to deal with, even assuming the variable name does appear.
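For the simplest case only, where the full variable name appears literally in the .do file, a rough heuristic might look like the sketch below. It is not a solution: it misses ranges, abbreviations, wildcards, and constructed names, and it will also flag names that appear only in comments or strings. The two-line .do file written here is a made-up stand-in, and the auto dataset is used only to keep the example self-contained.

```stata
sysuse auto, clear

* Write a tiny stand-in .do file (hypothetical contents)
tempfile dofile
tempname fh
file open `fh' using `dofile', write text
file write `fh' "regress price mpg weight if foreign == 0" _n
file write `fh' "generate lprice = log(price)" _n
file close `fh'

* Scan each line for whole-word matches of each variable in memory
local found
file open `fh' using `dofile', read text
file read `fh' line
while r(eof) == 0 {
    foreach v of varlist _all {
        if ustrregexm(`"`macval(line)'"', "\b`v'\b") {
            local found : list found | v
        }
    }
    file read `fh' line
}
file close `fh'

display "`found'"
keep `found'
```

Note that lprice is correctly ignored here only because it does not yet exist in the dataset when the scan runs.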

          • #6
            Can you define local macros listing your variables before you write the .do file, and run your commands only on the variables in those macros? Then you can just keep the variables in the local macros, resave the dataset under a new name, and upload that to the archive.
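A minimal sketch of that idea, assuming the auto dataset as a stand-in. The macro names dv, ivs, and cond, and the filename replication_data, are hypothetical.

```stata
sysuse auto, clear

* Declare every variable the analysis uses, up front
local dv   price
local ivs  mpg weight
local cond foreign

* Run the analysis only through the macros
regress `dv' `ivs' if `cond' == 0

* Keep exactly the declared variables and save under a new name
keep `dv' `ivs' `cond'
save replication_data, replace
```

This only works if the macros are the sole route by which variables enter the analysis, which means planning for it from the start.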

            • #7
              Thanks, and sorry for the late response; I just saw this. I think that would work if I had planned ahead when starting the .do file! Unfortunately, that wasn't the case here.
