Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge, append, joinby, or cross?

    Hello Statalisters,

    I have multiple datasets I am trying to organize/merge/append them -- basically "put them together". Let's say there are 100 cases total, and 40 variables, and one is a matching ID variable. I have ten small datasets, each with a subset of variables and participants. I have a master file with MOST participants and MOST variables. How can I "insert" each of these smaller datasets into the larger one -- I want to both overwrite some of the current missings for existing participants in the master file with the smaller datasets, but each time also I want to add a couple of variables and add a couple of participants, too.

    I've looked at merge, append, joinby, and cross, but I can't seem to figure out how to get the write-over function to work.

    Thank you much, in advance!

    Jeanne

  • #2
    Without actually seeing examples of these data sets, along with a very explicit description of what the resulting data set should like like, it is difficult to be certain. But from your description, it is most likely that you want to use -merge-. To overwrite values in the master data set with values from the other data set, you have to specify the -update replace- options. Now, you have to be careful how you do that, because it will do the overwrite on every variable you bring in. Say you have your master data set, and the other data set you are bringing in contains variables x1 x2 y1 and y2. Suppose you want to overwrite the master data set's missing values of x1 and x2, but for y1 and y2 you want to retain the master's original values, then you would do it like this:

    Code:
    use master_data_set, clear
    merge 1:1 id_variables using other_data_set, keepusing(x1 x2) update
    tab _merge
    drop _merge
    merge 1:1 id_variables using other_data_set, keepusing(y1 y2)
    tab _merge
    drop _merge
    Bear in mind that this code will only import the other data set's values of x1 and x2 for observations in which x1 and x2 have missing values in the master data set. If the master data set's values are non-missing, they will be retained. If you wanted to overwrite non-missing values (so, "correcting" the data rather than "completing" the data), then you have to specify both -update- and -replace-.

    If the other data set contains ID's with observations that have not match in the master data set, new observations will be created for them, but they will contain missing values for any variables that are not found in the other data set.

    If you have an "other" data set that contains only new participants, then that is best put in with -append- rather than -merge-, and in that situation there is clearly no issue about overwriting.

    I've gone a bit out on a limb offering this advice without actually seeing your data. While you have tried to describe it clearly, even the best description is no substitute for showing data examples with the -dataex- command, and things may not be as I imagine them. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    So if this advice does not lead you to the results you want, do post back, this time showing examples of the data sets, the sequence of commands you used to try to put them together, an example from the result data set (assuming you got one, and not just error messages), and an explanation of how the results you got differ from what you were aiming for.

    Comment


    • #3
      Hello Clyde, and WOW, thank you for generously sharing your time and expertise. I will try the solution you shared and if it doesn't work, I'll be back with some dataex examples, too. Many, many thanks!

      Comment

      Working...
      X