Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problems with merging and uniquely identify observations

    Dear all,

    We have tried excessively to merge two datasets by cusip and fiscal year. However, Stata keeps giving the error, which says that our variables do not uniquely identify observations in the master data. We know that in one of the databases there are multiple firm observations (with the same cusip) per year, but even merging many to one does not work. Is there anyone that can give us tips regarding trying to merge this 1:1, m:1 or 1:m, and not m:m?

    We particularly tried this command: merge 1:m cusip_8d fyear using ExecuComp_Query.dta in Stata 14. But we also tried to create a new variable by combining cusip and fiscal year to ahve a unique identifier, but this also didn't work.

    Thank you in advance for all help!

  • #2
    Separately for each dataset, run the command
    Code:
    isid cusip_8d fyear
    You will find that for both datasets, you have fyear values for which the same cusip_8d appears more once. You do not seem to expect there to be duplicates in the master, since you wanted to do the merge 1:m that would be appropriate with duplicates in the using data. You now need to find out why your master data has duplicates and decide what you need to do about them. The duplicates command, sepecially
    Code:
    duplicates report cusip_8d fyear
    duplicates examples cusip_8d fyear
    will help you understand the source of the problem.

    Let me warn you right now: if you have duplicates in each dataset, and you decide you need to include all possible combinations of matches, then you will need to use the joinby command. The merge m:m command never does what people think it does, and it is never appropriate in a case like this.

    Comment


    • #3
      Since it seems you are using common corporate finance datasets, it may be helpful to include some information about the structure of the data in the master file.

      A problem you may have is that Execucomp nests executives (identified by co_per_rol) within companies (identified by gvkeys), while the cusip is an identifier for securities. The fact that many firms issue multiple securities may be causing an issue for you if you are trying to use cusip to identify the firm rather than their securities.

      WRDS, a provider of corporate finance datasets, has a lot of useful guidance. It's possible that you need something like the CRSP-Compustat Merged file: https://wrds-www.wharton.upenn.edu/p...at-merged-ccm/

      Again, more detail about your master file will help us.

      Comment


      • #4
        Hi all,

        I am coping with the same problem of trying to merge two datasets without success. The Compustat masterfile contain the variables cusip and fyear, with cusip as string and fyear as numerical. Being unable to transform cusip into numerical because of cusip containing non-numericals, I transformed fyear to string and generated cusipfyear=cusip+fyear. The KLD (MSCI) database used to merge underwent exactly the same process. When trying to merge, the error code r(111) variable cusipfyear not found appears.

        Any help is welcome here.

        Btw duplicates were removed, and the merge option 1:1 is used.
        Last edited by NORMAN DAVELAAR; 24 May 2022, 00:38.

        Comment


        • #5
          Hi all,

          after searching somewhat further the problem seems to be the length of the string. KLD applies str12, whilst Compustat applies str13 (that is for the combination cusipfyear). So probably this is the problem to be solved.

          Comment


          • #6
            Originally posted by NORMAN DAVELAAR View Post
            Being unable to transform cusip into numerical because of cusip containing non-numericals, I transformed fyear to string and generated cusipfyear=cusip+fyear. The KLD (MSCI) database used to merge underwent exactly the same process.
            You don't need to merge on combinations of variables having the same datatype.

            If the corresponding CUSIP and fiscal year variables in the two datasets have the same variable names (and each is of the same datatype in both datasets), then you can do the following.
            Code:
            merge 1:1 cusip fyear using kld

            Comment

            Working...
            X