  • Ask for help: How to retain non-unique values under the same gvkey and fyear when merging two datasets

    Hi all,

    I am trying to merge two datasets, dataset A and dataset B. Dataset A has variables gvkey and fyear (but no ID variable). Dataset B has variables gvkey, fyear, and ID. However, in some (gvkey, fyear) groups in dataset B, ID takes more than one value, so when I check for duplicates using "duplicates report gvkey fyear", it shows that there are duplicates in terms of gvkey and fyear.
    But "duplicates report gvkey fyear ID" reports 0 duplicates.

    Now, I need to merge these two datasets while keeping all the ID values and their corresponding derived values in dataset B. How can I do this?

    Thanks in advance for your help!

    Best,
    Cristiana

  • #2
    Assuming that gvkey and fyear uniquely identify observations in dataset A, it's:
    Code:
    use dataset_A
    merge 1:m gvkey fyear using dataset_B
    If gvkey and fyear do not uniquely identify observations in dataset A then it is not possible to merge these two data sets. What you can do in that situation is pair up every observation in dataset A having a given gvkey and fyear with every observation in dataset B having the same gvkey and fyear. That is done with
    Code:
    use dataset_A
    joinby gvkey fyear using dataset_B
    If gvkey and fyear do not uniquely identify observations in data set A and the above is not a sensible thing to do, then there is something wrong with dataset_A: it shouldn't contain duplicates on gvkey and fyear. So then you need to find those duplicates, figure out what went wrong to cause them to exist, and then fix it.
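    As a minimal sketch of the checking step described above, something like the following could confirm whether gvkey and fyear uniquely identify observations in dataset A before choosing between the two approaches. The file names dataset_A and dataset_B are the placeholders used in this thread, and the variable name dup is hypothetical.

    ```stata
    * dataset_A and dataset_B are placeholder file names from this thread
    use dataset_A, clear

    * isid exits with an error if gvkey and fyear do not uniquely
    * identify observations; capture stores the return code in _rc
    capture isid gvkey fyear
    if _rc {
        * Not unique: flag the surplus observations and list them for inspection
        * (dup is a hypothetical variable name)
        duplicates tag gvkey fyear, generate(dup)
        list gvkey fyear if dup > 0, sepby(gvkey fyear)
    }
    else {
        * Unique in A: each A observation can match several B observations,
        * so every ID value in B is retained
        merge 1:m gvkey fyear using dataset_B
        * _merge records which observations matched and which came from only one file
        tabulate _merge
    }
    ```

    Tabulating _merge after the merge is a quick way to see how many observations matched and how many appear in only one of the two datasets.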

    • #3
      Originally posted by Clyde Schechter (#2)
      Thank you so much, Professor Clyde! I tried merge 1:m gvkey fyear. It runs well and keeps what I want.
