  • Merging datasets

    Hi All,

    I am trying to merge two large datasets (covering 1996-2018), namely lfs_append_all.dta and rams_append.dta. Both of these datasets contain person_id, firm_id, plant_id and year. I want to merge the datasets using these 4 keys. Each dataset has only one observation for each person_id, firm_id, plant_id and year. Both datasets contain multiple observations per firm_id, plant_id and year. The command I am using is below:

    clear all
    set more off
    set seed 12345

    use "P:\2021\15\Zariab\lfs_append_all"
    merge 1:1 person_id firm_id plant_id year using "P:\2021\15\Zariab\rams_append", nogen

    However, I am getting the following result:

    variables person_id firm_id plant_id year do not uniquely identify observations in the master data
    r(459);


    I do not have a clear understanding of what is going wrong. I sorted the data by person_id, firm_id, plant_id and year. Any help would be highly appreciated.

    Thanks in advance!

    Zariab Hossain
    Uppsala University
    Last edited by Zariab Hossain; 13 Jun 2023, 09:22.

  • #2
    If you want to merge the data using the four key variables you listed, then you need to put those into the merge command, perhaps like so:

    Code:
    merge 1:1 person_id firm_id plant_id year using "P:\2021\15\Zariab\rams_append", nogen
    If you are trying to get the variables dnr202115 etc (and not any others) from the rams_append dataset, then you might want to do

    Code:
    merge 1:1 person_id firm_id plant_id year using "P:\2021\15\Zariab\rams_append", nogen keepusing(dnr202115 dnr2019129_peorgnr dnr2019129_cfar)
    You might want to spend some time carefully reading through the documentation at

    Code:
    help merge


    • #3
      Hi Hemanshu

      Sorry for giving the wrong variable names. I renamed the variables for easier interpretation and forgot that I had not changed the code. I have corrected my post. I had already run the command you suggested and got the same error.


      • #4
        Check the master file for duplicates. For example:

        Code:
        duplicates report person_id firm_id plant_id year
        To learn more, check out: https://www.stata.com/manuals/dduplicates.pdf
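
The uniqueness check can also be scripted so that it runs before every merge. The following is a minimal sketch on invented toy data (the `wage` variable and all values are assumptions for illustration, not taken from the poster's files). It uses `isid`, which is silent when the listed variables uniquely identify observations and exits with an error when they do not.

```stata
* Toy master data; every value here is invented for illustration
clear
input person_id firm_id plant_id year wage
1 10 100 2000 50
1 10 100 2000 50
2 10 100 2000 60
end

* isid exits with an error when the key is not unique.
* capture noisily shows the message but lets the do-file continue.
capture noisily isid person_id firm_id plant_id year
display "isid return code: " _rc

* Summary of how many observations share each key combination
duplicates report person_id firm_id plant_id year
```

For a 1:1 merge the same check would need to pass on the using dataset as well, so it may be worth running it on both files.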


        • #5
          Zariab Hossain you say in #1 that "each dataset has only one observation for each person_id, firm_id, plant_id and year", and yet Stata is telling you this is not the case, at least in the "master" dataset lfs_append_all. You need to investigate and see why the dataset does not meet your assertion. You can do something like

          Code:
          duplicates tag person_id firm_id plant_id year, gen(dups)
          br if dups != 0
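
Once the duplicates are tagged, it can help to separate exact copies from records that share a key but differ in some other variable. A minimal sketch on invented toy data follows; the variable names `wage`, `exact` and `dups` are assumptions for illustration.

```stata
* Toy data: person 1 has an exact copy, person 2 has two conflicting records
clear
input person_id firm_id plant_id year wage
1 10 100 2000 50
1 10 100 2000 50
2 10 100 2000 60
2 10 100 2000 65
end

* Tag on ALL variables first, so the tag variables themselves are not compared
duplicates tag, gen(exact)

* Tag on the merge key only
duplicates tag person_id firm_id plant_id year, gen(dups)

* Rows whose key is repeated but whose other values are not all identical
sort person_id firm_id plant_id year
list if dups > 0 & exact < dups, sepby(person_id firm_id plant_id year)
```

Rows that appear in this listing are not plain copies, so they deserve a closer look before anything is dropped.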


          • #6
            Hi Hemanshu and Ken,

            Yes, I did find duplicates in the master dataset. Shall I use the duplicates drop command and then try to merge? Thanks a lot for your quick help.


            • #7
              I don't think we can answer that for you. Are the duplicates definitely "mistakes"? If so, then you can probably drop them. Or is it that you have misunderstood the dataset? Then, likely not. If the multiple observations are meaningful, you may want an m:1 merge instead of a 1:1 merge.
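
The two routes mentioned above can be sketched as follows, again on invented toy data. The variables `sector` and `spell` and the tempfile name `using_data` are hypothetical. The sketch assumes that exact copies are mistakes and that the remaining repeated keys are meaningful; whether that holds for the real data is for the analyst to judge.

```stata
* Hypothetical using data: one row per key
clear
input person_id firm_id plant_id year str4 sector
1 10 100 2000 "A"
2 10 100 2000 "B"
end
isid person_id firm_id plant_id year
tempfile using_data
save `using_data'

* Hypothetical master data: one exact copy plus two distinct spells for person 1
clear
input person_id firm_id plant_id year spell
1 10 100 2000 1
1 10 100 2000 1
1 10 100 2000 2
2 10 100 2000 1
end

* Route 1: drop rows that are identical on every variable
duplicates drop

* Route 2: the key is still repeated in the master, so merge many-to-one
merge m:1 person_id firm_id plant_id year using `using_data'
tabulate _merge
```

Leaving out `nogen` keeps `_merge`, which makes it easy to see how many observations matched.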


              • #8
                Thanks a lot for your great suggestions. I solved the problem.
                Last edited by Zariab Hossain; 13 Jun 2023, 10:08.
