  • Should I copy code or merge a variable?

    Currently, I have a survey dataset containing 62 variables. Last week, I was focusing on 5 variables; that do-file has about 130 lines. This week, I am focusing on some other variables, and the new do-file has nearly 100 lines. Now I need one variable (only one var) from last week's do-file in this week's do-file. This one var I need is not from the original dataset.

    So I have two options. First, I can just copy and paste the code from last week's do-file into this week's do-file. Second, I can use merge.

    Can any senior researchers tell me which method is better? Also, how do I merge in this case (I mean I just need one var)?

    Many thanks in advance!

  • #2
    When you say "this one var I need is not from the original dataset" that would seem to imply that it is a variable you created in the first do-file. I'm guessing you must have saved that variable in a data set somewhere, otherwise -merge-ing it in would not be a possibility.

    So I'll interpret your question as asking whether it makes sense to copy code from the first do-file to re-calculate this one variable, or to just -merge- in the one variable from the file you created last week.

    I would strongly argue for using -merge-. Here are some reasons why:

    1. Copy/paste is not foolproof. It is easy to inadvertently omit a line or two at the boundary of the section of code you need, or inadvertently include extra lines that might corrupt the calculation.

    2. Suppose a few weeks from now you decide you would like to calculate the variable differently. (Perhaps to correct a mistake, perhaps because you want to try a different "model" of the variable.) If you copy/paste the code, you now have to modify the code in two places--and it's easy to forget that and only change one of them. If you rely on -merge- and just re-run everything, then you only have to change the calculation in its original place and the consequences will automatically propagate down the subsequent calculations without any additional modifications to the code.
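    One way to get the "just re-run everything" behavior from point 2 is a short master do-file that runs the weekly do-files in order. This is only a sketch: the file names master.do, last_week.do, and this_week.do are hypothetical placeholders, and it assumes last week's do-file saves the derived variable to disk before this week's do-file merges it in.

    ```stata
    * master.do (hypothetical name): run the do-files in dependency order.
    * If the calculation in last_week.do changes, re-running this file
    * rebuilds the saved variable and everything that depends on it.
    do "last_week.do"    // assumed to create and save the derived variable
    do "this_week.do"    // assumed to -merge- that saved variable in
    ```

    With this layout the calculation lives in one place only, so a later correction is made once and picked up on the next run.
    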

    The only circumstance where I would consider copy/pasting the code instead of merging is if the datasets involved are extremely large, so large that the time required to read the file from disk during the -merge- is intolerably long and it would be appreciably faster to calculate the variable from scratch. That situation doesn't arise very often, really only with enormous multi-gigabyte data sets.

    As for how to do the merge, you don't supply enough information about your data sets for me to give you specific code. But to focus on the fact that you "just need one var," I call your attention to the -keepusing()- option of the -merge- command. If you specify -keepusing(the_one_var_I_need)- in your -merge- command, Stata will bring in only that variable from the using data set and omit the others.
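
    As a rough illustration of the -keepusing()- approach, here is a minimal sketch. Everything named in it is an assumption, since the thread does not describe the data: respondent_id stands for whatever variable uniquely identifies observations, newvar for the variable created last week, and survey.dta and derived_lastweek.dta for the original and saved files. A 1:1 merge is assumed; a different key structure would call for m:1 or 1:m instead.

    ```stata
    * --- at the end of last week's do-file (hypothetical names) ---
    * Save just the identifier and the derived variable to their own file.
    preserve
    keep respondent_id newvar
    save "derived_lastweek.dta", replace
    restore

    * --- in this week's do-file ---
    use "survey.dta", clear
    * Confirm the assumed key really identifies observations uniquely.
    isid respondent_id
    * Bring in only newvar from the saved file.
    merge 1:1 respondent_id using "derived_lastweek.dta", keepusing(newvar)
    * Inspect how observations matched before discarding the indicator.
    tabulate _merge
    drop _merge
    ```

    Checking _merge before dropping it is worth the extra line, because unmatched observations usually point to a problem with the key or with one of the files.
    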


  • #3
    Great, thank you so much!