  • Is there any way to keep all modified variables?

    I'm working with the ACS PUMS for a project. I'm recoding and relabeling a few variables which I want to look at over time. To keep the file size manageable I'm planning on dropping the variables I'm not working with.

    I think this is a long shot but figure it doesn't hurt to ask: is there any way to tell Stata to keep only variables that have been modified? It'd be really convenient to add a line in my .do file that says "If a variable has been mentioned in a command or generated keep it, otherwise drop it."

    It's not a big deal to just list the variables, but I think it'd be a cool feature.

  • #2
    Branden,

    I don't know that there is a way to do just what you're asking. If you are always creating new variables rather than changing original ones, which many feel is bad practice anyway, you can just drop all the original variables by using a variable list that starts with the first original variable and ends with the last original variable. Some may have more sophisticated solutions.
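
    A minimal sketch of that range approach, using the auto dataset that ships with Stata as a stand-in for the PUMS file (price_k and rep_cat are made-up names for illustration):

    Code:
    sysuse auto, clear
    * new variables are appended after the last original variable (foreign)
    generate price_k = price / 1000
    recode rep78 (1 2 = 1) (3 = 2) (4 5 = 3), generate(rep_cat)
    * drop everything from the first original variable through the last
    drop make-foreign

    This relies on the originals still sitting in one contiguous block, so it would not survive an -order- command that moves variables around.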

    Lance

    • #3
      Some commentators favor creating new variables rather than modifying old variables. I can see the argument either way - creating new variables ensures that each variable means the same thing anywhere in the program, but it does tend to create a pile of variables.

      If you add something consistent when you create new variables (e.g., label them varname1 or whatever), you might be able to use keep or drop with wildcards to get what you want.

      • #4
        I didn't realize some consider creating new variables a bad practice, thanks for letting me know. I just recently graduated and wasn't taught much coding and associated practices during my program.

        It does seem to me that new variables are required at times. For example, recoding something like age into a handful of categories to provide cleaner summary stats in a report, but then using the original variable in a regression. I have been creating new variables in situations where it isn't necessary, however, so I'll try to be more mindful of that.

        I think your suggestion is a good solution, Lance; I should have thought of that. The only things it won't pick up are cases where I recode without generating a new variable, or apply labels to a variable's values.

        Your idea could work as well, Phil. Maybe there's an option for the drop or keep command that allows something like "keep if variable name contains 'x'"?
        Last edited by Branden Galley; 28 Oct 2016, 10:14.

        • #5
          Maybe there's an option for the drop or keep command that allows something like "keep if variable name contains 'x'"?
          That would be
          Code:
          keep *x*

          • #6
            I'd say that keeping the original variables was entirely good practice whenever (a) you might want to go back to them and/or (b) keeping track of what you changed is important.

            But also it's possible to argue that so long as you have the original dataset saved somewhere, you can pull it up and compare.

            -- and as long as you have an audit trail of what you did, meaning here do files and log files, that's keeping the information in equivalent form.

            • #7
              Originally posted by Branden Galley
              I didn't realize some consider creating new variables a bad practice, thanks for letting me know. I just recently graduated and wasn't taught much coding and associated practices during my program.
              I think you misread - it is modifying original variables that is deprecated. Creating new ones is generally considered better practice.

              My general habit is to clone pretty much anything I plan to use, using a naming scheme involving prefixes (eg, all identifiers start with "id_"), whether I'm going to modify it or not. Then I just keep all the variables with those prefixes.
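
              A rough sketch of that habit, with the auto dataset that ships with Stata standing in for real data (the "an_" prefix for analysis variables is a hypothetical choice for illustration):

              Code:
              sysuse auto, clear
              * clone the variables of interest under a prefix, leaving the originals untouched
              clonevar id_make = make
              clonevar an_price = price
              clonevar an_mpg = mpg
              * modify the clone, not the original
              recode an_mpg (min/19 = 1) (20/29 = 2) (30/max = 3)
              * keep only the prefixed variables
              keep id_* an_*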



              • #8
                Originally posted by Clyde Schechter

                That would be
                Code:
                keep *x*
                Thanks! For some reason I didn't consider using an asterisk at the beginning of a variable name. I've only used it at the end, for example coding 11* as Agriculture to categorize NAICS codes. There are a lot of little things like this I still need to learn. I'll have to come up with some sort of naming scheme that works with this method and doesn't result in weird variable names. Alternatively, I realized while typing this that I could probably use a loop to add an "original" suffix or prefix to the original variables and use that to drop them at the end.

                Originally posted by Jeph Herrin

                I think you misread - it is modifying original variables that is deprecated. Creating new ones is generally considered better practice.

                My general habit is to clone pretty much anything I plan to use, using a naming scheme involving prefixes (eg, all identifiers start with "id_"), whether I'm going to modify it or not. Then I just keep all the variables with those prefixes.


                Yes, you're right, I misread that. Thanks for clarifying.

                • #9
                  It might be more efficient to just use -rename-. Type -help rename group- into the command line and I think number 11 in the rules will give you what you need to add a prefix or suffix to all of the variables. Something like,

                  Code:
                  rename * =_
                  should add an underscore suffix to all of the existing variables. If you do that before creating any variables then you don't need to bother with a loop.
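
                  Put together, the workflow might look like this sketch, which uses the auto dataset bundled with Stata and a made-up "_orig" suffix in place of the bare underscore (price_k is also just an illustrative name):

                  Code:
                  sysuse auto, clear
                  * tag every existing variable before creating anything new
                  rename * =_orig
                  * build new variables from the tagged originals
                  generate price_k = price_orig / 1000
                  * drop the originals in one go at the end
                  drop *_orig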

                  Lance
