  • Error when trying to merge different datasets

    I am investigating the returns to education, and I have 4 different Stata files, each containing roughly the same individuals. In each file I have appended 9 waves together, each wave corresponding to a different year. So each of the 4 files holds 9 waves and roughly the same individuals, but each has different variables. The log of earnings, education and some other variables for years 1 to 9 are in one file, and other variables I would like to use are in the other files. I have already found the common variable in all the data sets.
    When I try the command:

    merge 1:1 commonvariable using file.dta

    I always get the same error: variable commonvariable does not uniquely identify observations in the master data.
    Is this a problem because of the multiple years? Should I separate the years, merge all the needed variables into year-specific files and then append them all together? Or is there another command that could work, perhaps -joinby-?

    Thanks in advance

  • #2
    You say you have found the common variable in all the data sets, but I don't believe you. From what you describe, there should not be a single common variable that identifies observations. Because you have appended together 9 waves on the same individuals in each data set, you need, at a minimum, two variables to uniquely identify observations: one is the person identifier (or perhaps several variables jointly identify a person) and the other is the wave number or year of the survey. You do not show any data examples, but my hunch is that you need to do

    Code:
    use dataset1, clear
    merge 1:1 person_identifier(s) wave_or_year using dataset2
    Replace the placeholders person_identifier(s) and wave_or_year with the names of the corresponding variables in your data sets.
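A minimal, self-contained sketch of that approach, run on made-up data. The names person_id, wave, lnearn, educ and hours are hypothetical stand-ins for the real variables. It checks the key with -isid- before merging: person_id alone fails because it repeats once per wave, while person_id and wave together pass, and the 1:1 merge on both variables then matches every observation in the toy data.

```stata
* Hypothetical toy data: 2 people observed in 3 waves each.
* person_id, wave, lnearn, educ and hours are illustrative names only.
clear
input person_id wave lnearn educ
1 1 2.10 12
1 2 2.15 12
1 3 2.20 12
2 1 2.50 16
2 2 2.55 16
2 3 2.60 16
end
tempfile master using_data
save "`master'"

* The "using" file: same people and waves, a different variable.
clear
input person_id wave hours
1 1 40
1 2 38
1 3 42
2 1 45
2 2 45
2 3 50
end
save "`using_data'"

use "`master'", clear

* person_id alone repeats once per wave, so it is not a unique key.
duplicates report person_id
capture isid person_id
assert _rc != 0

* person_id and wave together identify each observation.
isid person_id wave

* 1:1 merge on the two-variable key.
merge 1:1 person_id wave using "`using_data'"
tab _merge
```

If -isid person_id wave- also fails in the real data, there are genuine duplicates on the intended key; -duplicates list person_id wave- will show them, and they are worth resolving before any merge.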

    Do not use -joinby- here. It is clearly not appropriate for what you're trying to do. And whatever you do, do not even think about -merge m:m-. If what I'm suggesting above doesn't solve your problem, post back with examples from both data sets, created using the -dataex- command. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
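For illustration, a sketch of what posting an example might look like, assuming hypothetical variable names (person_id, wave, lnearn, educ) and a Stata version where -dataex- is already installed. The small toy panel stands in for one of the real data sets.

```stata
* Hypothetical toy panel standing in for one of the real data sets.
clear
input person_id wave lnearn educ
1 1 2.10 12
1 2 2.15 12
2 1 2.50 16
2 2 2.55 16
end

* Uncomment on versions where -dataex- is not part of official Stata:
* ssc install dataex

* Print a copy-and-paste-ready excerpt; -in 1/4- limits the observations shown.
dataex person_id wave lnearn educ in 1/4
```

Running the same command on each data set to be merged shows both sides of the merge key, which is usually what a helper needs to see.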

    When asking for help with code, always show example data. When showing example data, always use -dataex-.



    • #3
      Thank you! Will do next time.
