  • Merging does not have to be unique, is this possible?

    Hi,

    I am working with Stata and have a question I need answered so that I can continue my thesis.
    I have a dataset that I want to merge with another dataset. To do so, I want to use two variables: the fiscal year (fyear) and the FIPS code.
    However, neither the FIPS code nor fyear is unique to a company, so I cannot use merge 1:1.
    Is there another way to merge them? Different companies can share the same FIPS code and fiscal year, and that should not be changed.

    Thank you!

  • #2
    Hi Robin,

    You can use a many-to-one, a one-to-many, or a many-to-many merge. If you type in Stata
    Code:
    help merge
    you will find more information on how to do a 1:m, m:1, or m:m merge.
    Best,
    Rhys



    • #3
      Robin Hoed

      In post #2 merge m:m (many-to-many) is suggested. Before I give my advice on the question in post #1, I want to firmly steer you away from merge m:m.

      The following is copied word-for-word from the documentation of the merge command in the Stata Data Management Reference Manual PDF included in the Stata installation and accessible from Stata's Help menu.
      m:m merges

      m:m specifies a many-to-many merge and is a bad idea. In an m:m merge, observations are matched within equal values of the key variable(s), with the first observation being matched to the first; the second, to the second; and so on. If the master and using have an unequal number of observations within the group, then the last observation of the shorter group is used repeatedly to match with subsequent observations of the longer group. Thus m:m merges are dependent on the current sort order—something which should never happen.

      Because m:m merges are such a bad idea, we are not going to show you an example. If you think that you need an m:m merge, then you probably need to work with your data so that you can use a 1:m or m:1 merge. Tips for this are given in Troubleshooting m:m merges below.

      If merge m:m seems to be necessary, it is a near certainty that at least one of the following is true:

      1. Your merge key actually does uniquely identify the observations in one of your data sets, so you can use merge 1:m or merge m:1 or maybe even merge 1:1.

      2. You are failing to take account of one or more additional variables in your data set that, combined with the variables you are trying to use as the merge key, uniquely identify the observations in one or both of your data sets, so you can use merge 1:m or merge m:1 or merge 1:1 with the expanded merge key.

      3. You are really trying to accomplish what joinby, a different command, does: create in the output dataset every possible combination of an observation from the first dataset and an observation from the second dataset, both having the same key. (SQL users take note! I fell for this myself: to Stata m-to-m does not mean m-by-m.)

      4. You actually need to append your datasets rather than merge them.

      5. The data sets you are trying to merge are incorrectly configured or contain data errors that need to be fixed.
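
      Points 1 to 3 can be checked directly in Stata. What follows is only a minimal sketch on made-up toy data: the variable names (fips, fyear, firm, rate) and all values are assumptions for illustration, not taken from the data in post #1. It is meant to be run as a do-file, since it uses a tempfile.

      ```stata
      * Toy data: hypothetical variable names and values, for illustration only
      clear
      input fips fyear str1 firm
      1001 2019 "A"
      1001 2019 "B"
      1003 2019 "C"
      end
      tempfile firms
      save `firms'

      clear
      input fips fyear rate
      1001 2019 4.1
      1001 2019 4.3
      1003 2019 3.7
      end

      * Points 1 and 2: does the candidate key uniquely identify observations?
      duplicates report fips fyear
      capture isid fips fyear
      display "isid return code: " _rc   // nonzero means the key is not unique

      * Point 3: joinby forms every within-key pairing of the two datasets
      joinby fips fyear using `firms'
      list, sepby(fips fyear)
      ```

      In this toy example the key (1001, 2019) appears twice in each dataset, so joinby produces all four pairings for it, plus the single pairing for (1003, 2019). If that is not the result you want, one of the other points applies.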

      So back to your question.

      All that is necessary is that your merge variables uniquely identify the observations in one of the two datasets you want to merge.

      I'd like to say more, but your question really isn't clear without more detail, or at a minimum it is too difficult to guess at a good answer from what you have shared. If what's been posted thus far doesn't help you find a suitable command, please help us help you. Show example data from each of the two datasets that demonstrates what your data is like. The Statalist FAQ provides advice on effectively posing your questions, posting data, and sharing Stata output. It's particularly helpful to copy output from your Stata Results window and paste it into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data.
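
      For the case where the merge variables are unique in one of the two datasets, the merge might look like the sketch below. This assumes, purely for illustration, a county-level dataset with one observation per fips and fyear and a firm-level dataset with several firms per county-year; the variable names (firm, unemp) and all values are made up, and your actual data may be structured differently. Run it as a do-file, since it uses a tempfile.

      ```stata
      * Toy data: hypothetical variable names and values, for illustration only
      clear
      input fips fyear unemp
      1001 2019 3.1
      1001 2020 5.9
      1003 2019 2.8
      end
      isid fips fyear              // stops with an error unless the key is unique
      tempfile county
      save `county'

      clear
      input str1 firm fips fyear
      "A" 1001 2019
      "B" 1001 2019
      "C" 1003 2019
      "D" 1005 2020
      end

      * Many firms per county-year in memory, one county-year row in the using data
      merge m:1 fips fyear using `county'
      tabulate _merge
      list, sepby(fips fyear)
      ```

      Firms A and B both receive the same county-year values, which is what m:1 is for. The _merge variable shows which observations matched and which came from only one of the two datasets.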
      Last edited by William Lisowski; 24 Apr 2021, 13:27.
